Question/difficulty with parallel implementation

I have a PDE problem that I need to solve for a very large number of parameters, and since the PDE problem is the same for every parameter (an embarrassingly parallel problem), I want to run the solves in parallel. I see two options: multiprocessing with joblib, or MPI. I have run into issues with both:

  • Joblib: Python multiprocessing uses pickle to pass data between processes, but FEniCS objects are not pickle-friendly. I tried hard to avoid pickling, with no success. I can write a joblib script in which each process initializes the PDE problem independently (see the sketch after this list), but initialization is about as expensive as the solve itself, so that is not going to work for me. Ideally, I want to create the function spaces and load the mesh only once, and then run the solves in parallel.
  • mpi4py: I tried to run this parallel problem with mpi4py, but when I use the mpirun command, FEniCS automatically assumes that I want domain decomposition, so it splits my (coefficient) vectors into chunks. This causes tons of issues with, e.g., the get_local() and set_local() methods. Is there a way to deactivate FEniCS's parallelization?
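
For concreteness, the joblib pattern from the first bullet looks roughly like the sketch below. This is a simplified stand-in, not my actual problem; solve_one is a hypothetical helper that rebuilds the mesh and function space on every call, which is exactly the cost I want to avoid:

from joblib import Parallel, delayed
import numpy as np

def solve_one(alpha, N):
    # Hypothetical stand-in for the real solve: the mesh and function space
    # are rebuilt on every call, which in my case costs about as much as the
    # solve itself.
    from dolfin import UnitSquareMesh, FunctionSpace, Constant, project, assemble, dx
    mesh = UnitSquareMesh(N, N)
    V = FunctionSpace(mesh, "CG", 1)
    u = project(Constant(float(alpha)), V)
    return assemble(u * dx)

if __name__ == "__main__":
    alphas = np.arange(1, 13)
    values = Parallel(n_jobs=4)(delayed(solve_one)(alpha, 32) for alpha in alphas)
    print(values)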

At this point I am really desperate for any help. A serial computation might take months to complete, while in parallel it would take a couple of days.

Many thanks for any help.

I forgot to mention that the parallel PDE solves must communicate to perform some computations outside the PDE environment (using numpy, for example). Therefore, I cannot simply run multiple independent scripts simultaneously; everything must happen in one script.

Consider the following:

import concurrent.futures
from functools import partial

from dolfin import *
import numpy as np


def f(alpha, N):
    # Each call builds its own mesh on MPI.comm_self, so there is no domain decomposition
    mesh = UnitSquareMesh(MPI.comm_self, N, N)
    z = assemble(alpha * dx(domain=mesh))
    return z


if __name__ == '__main__':
    N = 10
    alphas = np.arange(1, 13)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for alpha, value in zip(alphas, executor.map(partial(f, N=N), alphas)):
            print(f"Volume*{alpha} = {value}")

You can also use mpirun and set the MPI communicator to MPI.comm_self (as done above), since you would then have a single mesh on each process.
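
For example, something along these lines might work (an untested sketch; it reuses the toy volume computation from above and gathers the per-rank results with mpi4py, so any numpy post-processing can happen on rank 0):

from dolfin import *
import numpy as np
from mpi4py import MPI as pyMPI

comm = pyMPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 10
alphas = np.arange(1, 13)
my_alphas = alphas[rank::size]  # round-robin split of the parameters over the ranks

# Each rank gets its own full mesh because of MPI.comm_self (no domain decomposition)
mesh = UnitSquareMesh(MPI.comm_self, N, N)
my_values = [assemble(float(alpha) * dx(domain=mesh)) for alpha in my_alphas]

# Collect everything on rank 0, e.g. for post-processing with numpy
gathered = comm.gather(list(zip(my_alphas, my_values)), root=0)
if rank == 0:
    for alpha, value in sorted(pair for chunk in gathered for pair in chunk):
        print(f"Volume*{alpha} = {value}")

Run it with, e.g., mpirun -np 4 python3 script.py.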

Thank you very much, MPI.comm_self fixed my problem. I now successfully run the code in parallel using MPI. I have another question about the multiprocessing implementation: what if, instead of a function f, we have a class such as

class f:
    def __init__(self, N):
        self.mesh = UnitSquareMesh(MPI.comm_self, N, N)
        self.V = FunctionSpace(self.mesh, 'CG', 1)
        # alpha is a user parameter of the Expression; it is overwritten in do_stuff
        self.fun = Expression("sin(alpha*atan2(x[1], x[0]))", degree=1, alpha=1.0)

    def do_stuff(self, alpha):
        self.fun.alpha = alpha
        proj = project(self.fun, self.V)
        z = assemble(proj*dx)
        return z

where we instantiate f1, …, fk as instances of f and want to run fk.do_stuff(alpha) in parallel. Does this change anything about the parallelization in your code? (This is where I run into the pickling issues.)

If you could show how you specified the parallelization of this, and what error you got, I might be able to help you.
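
In the meantime, one pattern that might sidestep pickling altogether is to build the instance inside each worker process, so the FEniCS objects never cross a process boundary. A rough, untested sketch, assuming the class f above lives in the same file and takes N as a constructor argument:

import concurrent.futures
import numpy as np

_solver = None  # one instance per worker process


def _init_worker(N):
    # Runs once in every worker: the FEniCS objects are created here and never pickled
    global _solver
    _solver = f(N)  # the class f from the post above


def _do_stuff(alpha):
    return _solver.do_stuff(float(alpha))


if __name__ == '__main__':
    N = 10
    alphas = np.arange(1, 13)
    with concurrent.futures.ProcessPoolExecutor(initializer=_init_worker,
                                                initargs=(N,)) as executor:
        for alpha, value in zip(alphas, executor.map(_do_stuff, alphas)):
            print(f"do_stuff({alpha}) = {value}")

Only the alpha values travel between processes, so nothing FEniCS-related has to be pickled; whether this plays nicely with your MPI installation when the workers are forked is something you would have to test.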