I have a PDE problem that I need to solve for a very large number of parameters and I want to run them in parallel as the PDE problem is the same for all parameters (embarrassingly parallel problem). I have 2 options: multi-thread with joblib or mpi. I faced many issues with both options:

Joblib: python multiprocessing uses pickle to pass data. However, FEniCS is not pickle-friendly. I tried a lot to avoid pickling but with no success. I can write a joblib script which each process initializes the PDE independently, however, initialization is as expensive as solving, so this is not going to work for me. Ideal I want to create function spaces and load mesh only once and then run the show in parallel.

mpi4py: I tried to run this parallel problem with mpi4py but when I use mpirun comman, FEniCS automatically assumes that I use domain decomposition. So it divides my (coefficint) vectors into chunks. This has tons of issues with, e.g., get_local() and set_local() methods. Is there a way to deactivate FEniCS parallelization?

At this point I am really desperate for any help. A serial computations might takes months to complete, while in parallel, it would take a couple of days.

I forgot to mention that the parallel PDE problems must communicate to perform some computation outside the PDE environment (using numpy for example). Therefor, I cannot run multiple parallel scripts simultaneously. Everything must be in one script.

import concurrent.futures
from functools import partial
from dolfin import *
import numpy as np
def f(alpha):
mesh = UnitSquareMesh(MPI.comm_self, N, N)
z = assemble(alpha*dx(domain=mesh))
return z
if __name__ == '__main__':
N = 10
alphas = np.arange(1,13)
with concurrent.futures.ProcessPoolExecutor() as executor:
for alpha, value in zip(alphas, executor.map(f, alphas)):
print(f"Volume*{alpha} = {value}")

You can also use mpirun, and set the MPI communicator to comm_self (as done above) as you would have a single mesh on each process then.

Thank you very much MPI.COMM_SELF fixed my problem. Now I successfully run the code in parallel using mpi. I have another question for multithread implementation. If instead of a function f we have a class such as

class f():
def __init__(self):
self.mesh = UnitSquareMesh(MPI.comm_self, N, N)
self.V = FunctionSpace(self.mesh,'CG', 1)
self.fun = Expression("sin(alpha*atan2(x[1], x[0]))", degree=1)
def do_stuff(self, alpha):
self.fun.alpha = alpha
proj = Project(self.fun, self.V)
z = assemble(proj*dx)
return z

where we initiate f1, …,fk as instances of f and we want to run fk.do_stuff(alpha) in parallel. Does this change something with the parallelization in your code? (this is were I have issue with pickling)