Parallelisation with Python multiprocessing

Hey everyone,

I have a case where I want to solve a PDE with many (hundreds or thousands of) parameter variations. In my case, I can't use MPI to parallelise the evaluation (a longer story), so I thought of using Python's multiprocessing module. The idea is to use a pool of processes where each process creates its own copy of the mesh, applies the parameters, defines the variational problem and solves it independently. In practice, this only works with some solvers, though, and I can't explain why.

To give an example, I wrote this short script that reproduces the behaviour. It solves an elasticity problem and the modulus of elasticity is varied. The exact problem that is solved is irrelevant here (and actually the presented example evaluates pointless quantities).

from dolfin import *
import multiprocessing

import numpy as np

# bottom boundary for Dirichlet BC
def bottom(x, on_boundary):
    return near(x[2], 0) and on_boundary

# top boundary for Dirichlet BC
def top(x, on_boundary):
    return near(x[2], 1) and on_boundary

# definitions of stress and strain
def eps(v):
    return sym(grad(v))

def sigma(v, mu, lmbda):
    return lmbda*tr(eps(v))*Identity(3) + 2*mu*eps(v)

def calc_disp(modulus):

    # create mesh of unit cube
    mesh = UnitCubeMesh(15,15,15)

    # define function spaces
    V = VectorFunctionSpace(mesh, "CG", 1)
    v = TestFunction(V)
    du = TrialFunction(V)
    u = Function(V)

    # Lamé constants
    nu = 0.33
    mu = modulus/(2*(1+nu))
    lmbda = modulus*nu/((1+nu)*(1-2*nu))

    # variational problem
    psi = inner(sigma(du, mu, lmbda), eps(v)) * dx
    aM = lhs(psi)
    LM = rhs(psi)

    # BCs
    delta = 1e-1
    bcs = [DirichletBC(V, Constant((0.,0.,0.)), bottom),
            DirichletBC(V.sub(2), Constant(delta), top)]
    
    # solve the variational problem
    solve( aM == LM, u, bcs) # hangs
    # solve( aM == LM, u, bcs, solver_parameters={"linear_solver": "bicgstab"}) # doesn't hang

    # return the x-displacement at a top corner
    return u([1.,1.,1.])[0]

# starting a pool with nproc workers
nproc = 8
pool = multiprocessing.Pool(nproc)

# moduli to loop over
moduli = np.linspace(1, 10, nproc*20)

# loop over all entries and collect the results
displacements = pool.map(calc_disp, moduli)

When I run this script, it starts off fine and solves a number of problems without trouble. But then it suddenly hangs; with the 'top' command I can see that the processes no longer use any CPU.
Interestingly, switching from the default solver to bicgstab (see the commented-out line in the script) avoids the problem and the script runs through fine. I was also able to verify that when the processes start hanging, they indeed hang in the solve call.
Also, the problem only occurs if the mesh is sufficiently large: with a 5x5x5 mesh instead of 15x15x15, there is no hanging.
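
(In case it is useful for reproducing this: one way to see where a worker is stuck is Python's built-in faulthandler module, e.g. with something like the following sketch placed at the top of calc_disp; the 120 s timeout is an arbitrary choice.)

import faulthandler

# Dump this worker's Python traceback to stderr if it is still running after
# 120 s, without killing it; a stuck worker then shows the line it hangs in.
faulthandler.dump_traceback_later(timeout=120, repeat=True, exit=False)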

For now, I can work with bicgstab. However, I'd be glad if anyone has an idea of how to avoid this behaviour with other solvers. I am aware that this is not quite the intended way to parallelise FEniCS, but it would be nice if something like this worked, especially when one has many parameter variations to go through.

Thanks for your help in advance,
Andreas

Hey Andreas,

Maybe in the cases where it doesn't work, an error occurs in one process (the residual diverges, etc.). At least with MPI it can then happen that the processes wait for the others to finish; if one process has crashed, they wait forever and you are basically in a deadlock situation (see mpi4py.run — MPI for Python 3.1.3 documentation). Just an idea, I haven't tried your example though.
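
If you want to make such a situation visible instead of waiting forever, a timeout on the map call turns a silent hang into an exception. A minimal sketch (reusing nproc, calc_disp and moduli from your script; the 600 s timeout is an arbitrary guess):

import multiprocessing

pool = multiprocessing.Pool(nproc)
async_result = pool.map_async(calc_disp, moduli)
try:
    displacements = async_result.get(timeout=600)
except multiprocessing.TimeoutError:
    # At least one worker is still busy (or stuck) after the timeout.
    pool.terminate()
    raise RuntimeError("workers did not finish within the timeout")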

For what it's worth, your example does not hang on the Debian build of dolfin with OpenBLAS installed to provide BLAS (it also works with ATLAS). The example completes normally.

If the issue is not your BLAS implementation, then maybe the size dependence is a clue. Does your system have a "small" amount of RAM? Mine has 24 GB of memory.
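
If it does turn out to be BLAS-related, one thing worth ruling out is a threaded BLAS inside the forked workers. A minimal sketch, assuming an OpenBLAS/OpenMP build (the environment variables must be set before numpy and dolfin are imported):

import os

# Force single-threaded BLAS in the parent and therefore in every forked
# worker. (OPENBLAS_NUM_THREADS and OMP_NUM_THREADS are the usual knobs for
# OpenBLAS/OpenMP builds; other BLAS implementations use different names.)
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

from dolfin import *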

Hi, thanks for your answer. That could be a factor, but I don't think it's the cause of the hanging here. The problem being solved is very simple and converges without trouble. With multiprocessing, the processes can move from problem to problem independently, and there is no reason why one solve call should wait for another (at least intuitively; I don't know much about the implementation). Anyway, thanks for your thoughts!

Hey, thanks a lot for trying this on your system. Regarding memory: initially, I tried it with the default Docker container, which (I think) gets 2 GB, and there it did not work. Increasing the memory for the Docker container to 8 GB fixed the problem, so that could indeed have been the cause.
I also ran the example on our cluster, where OpenBLAS is installed (8 GB RAM for the job), and it worked too.
All in all, I think that solves the issue for the moment.
If somebody has an explanation of why exactly the initial configuration results in hanging, I'd be interested in it. I suspect it is some kind of memory issue or something related to locks. Anyway, thanks a lot for your help!
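
(If memory really is the culprit, one mitigation would be to recycle the workers so that memory does not accumulate across solves. A minimal sketch, using the standard maxtasksperchild argument of multiprocessing.Pool and the names from the script above:)

import multiprocessing

# Each worker process is replaced after a single task, so memory allocated
# during one solve is released before the next parameter set is started.
pool = multiprocessing.Pool(processes=nproc, maxtasksperchild=1)
displacements = pool.map(calc_disp, moduli)
pool.close()
pool.join()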