How to solve in parallel for a parametric study

Hi
I want to solve an elasticity problem while changing the material constants, namely the elasticity modulus (E) and the Poisson ratio (nu), in several steps (let's say 3 steps each). In that case I have 9 combinations of E and nu. I can solve such a problem in serial on a single core. Here is a minimal working example:

from dolfin import *

# Geometry and mesh
L = 25.
H = 1.
Nx = 250
Ny = 10
mesh = RectangleMesh(Point(0., 0.), Point(L, H), Nx, Ny, "crossed")

# Body force (self-weight)
rho_g = 1e-3
f = Constant((0, -rho_g))

# Parameter values to sweep over
E = [Constant(1e5), Constant(2e5), Constant(3e5)]
nu = [Constant(0.15), Constant(0.3), Constant(0.45)]

# The function space and boundary condition are the same for every combination
V = VectorFunctionSpace(mesh, 'Lagrange', degree=2)
du = TrialFunction(V)
u_ = TestFunction(V)

def eps(v):
    return sym(grad(v))

def left(x, on_boundary):
    return near(x[0], 0.)

bc = DirichletBC(V, Constant((0., 0.)), left)

for i in range(3):
    for j in range(3):
        # Lame parameters for this (E, nu) combination
        mu = E[i] / 2 / (1 + nu[j])
        lmbda = E[i] * nu[j] / (1 + nu[j]) / (1 - 2 * nu[j])

        def sigma(v):
            return lmbda * tr(eps(v)) * Identity(2) + 2.0 * mu * eps(v)

        a = inner(sigma(du), eps(u_)) * dx
        l = inner(f, u_) * dx

        u = Function(V, name="Displacement")
        solve(a == l, u, bc)

        # Write the result, naming the file by the parameter indices
        displacement = File(str(i) + str(j) + ".pvd")
        displacement << u

Now I want to solve it in parallel to make it faster. That is, I want to use 9 cores and solve each combination independently on its assigned core. I have been using FEniCS for 4 years, but since I have never used it in parallel (e.g. with MPI), I could not figure this out. I would appreciate your time helping me with that.


One option is to use the MPI_COMM_SELF communicator instead of the default MPI_COMM_WORLD one (which results in e.g. parallel assembly/solve when your program is run with mpirun -np X). The following illustrates the idea; which parameter each process uses is determined by its rank in the global communicator.

# search.py
from dolfin import *

def poisson(alpha, comm=mpi_comm_world()):
    mesh = UnitSquareMesh(comm, 256, 256)
    
    V = FunctionSpace(mesh, 'CG', 1)
    u, v = TrialFunction(V), TestFunction(V)
    bc = DirichletBC(V, Constant(0), 'on_boundary')

    a = inner(Constant(alpha)*grad(u), grad(v))*dx
    L = inner(Constant(1), v)*dx

    uh = Function(V)
    solve(a == L, uh, bc)
    info('alpha %g -> |uh|=%g' % (alpha, uh.vector().norm('l2')))
    File(comm, 'foo%g.pvd' % alpha) << uh
    
alphas = [3, 8, 9, 10]

assert len(alphas) >= MPI.size(mpi_comm_world())
# Get alpha based on global rank
my_alpha = alphas[MPI.rank(mpi_comm_world())]
poisson(my_alpha, mpi_comm_self())  # Do everything on THIS process
# run as `mpirun -np 4 python search.py`
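
A possible variation on this pattern (a sketch I am adding here, reusing the same poisson function and the MPI.rank/MPI.size helpers from above): if there are more parameter values than MPI processes, each rank can work through a strided slice of the list instead of exactly one entry.

# Each rank handles alphas[rank], alphas[rank + size], alphas[rank + 2*size], ...
size = MPI.size(mpi_comm_world())
rank = MPI.rank(mpi_comm_world())
for alpha in alphas[rank::size]:
    poisson(alpha, mpi_comm_self())
# run as e.g. `mpirun -np 2 python search.py`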

An alternative is to use Python's multiprocessing module; in the above, replace everything after the assert statement with

import multiprocessing

pool = multiprocessing.Pool(processes=4)
pool.map(poisson, alphas)
pool.close()
# Run as python search.py
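
One practical caveat worth adding (my note, not part of the original reply): on platforms where multiprocessing spawns rather than forks its workers, the pool creation should sit under an if __name__ == '__main__': guard so the workers do not re-execute it on import. A minimal sketch, assuming the same poisson and alphas as above:

import multiprocessing

if __name__ == '__main__':
    # Created only in the main process; spawned workers import the module
    # without re-running this block.
    pool = multiprocessing.Pool(processes=4)
    pool.map(poisson, alphas)
    pool.close()
    pool.join()
# Run as `python search.py`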

Timings on my machine (FEniCS 2017.2.0) are

  • 1.4s for time mpirun -np 1 python search.py (one alpha)
  • 1.9s for time mpirun -np 4 python search.py (all alphas, 4 CPUs)
  • 1.7s for time python search.py (all alphas, pool of 4 workers)

Thanks MiroK. I just followed your first solution and implemented it for my specific example. I only added naming1 and naming2 to the function because I want to name the PVD files based on the indices of the E and nu parameters in their lists. Here is what I did:

from dolfin import *
import time

start_time = time.time()
L = 25.
H = 1.
Nx = 250
Ny = 10
rho_g = 1e-3
f = Constant((0,-rho_g))

def elasticity(E,nu,naming1,naming2, comm=mpi_comm_world()):

    mesh = RectangleMesh(comm, Point(0., 0.), Point(L, H), Nx, Ny, "crossed")

    mu = E/2/(1+nu)
    lmbda = E*nu/(1+nu)/(1-2*nu)

    def eps(v):
        return sym(grad(v))
    def sigma(v):
        return lmbda * tr(eps(v)) * Identity(2) + 2.0 * mu * eps(v)

    V = VectorFunctionSpace(mesh, 'Lagrange', degree=2)
    du = TrialFunction(V)
    u_ = TestFunction(V)
    a = inner(sigma(du), eps(u_)) * dx
    l = inner(f, u_) * dx

    def left(x, on_boundary):
        return near(x[0], 0.)

    bc = DirichletBC(V, Constant((0., 0.)), left)
    u = Function(V, name="Displacement")
    solve(a == l, u, bc)

    info('E %g nu %g -> |uh|=%g' % (E,nu, u.vector().norm('l2')))
    File(comm, '%g%g.pvd' % (naming1.index(E),naming2.index(nu))) << u

E = [Constant(1e5) , Constant(2e5),Constant(3e5)]
nu =[Constant(0.15) , Constant(0.3)]

A = E
B = nu

assert len(E) >= MPI.size(mpi_comm_world())
assert len(nu) >= MPI.size(mpi_comm_world())

# Get alpha based on global rank
my_E = E[MPI.rank(mpi_comm_world())]
my_nu = nu[MPI.rank(mpi_comm_world())]
elasticity(my_E,my_nu,A,B, mpi_comm_self())  # Do everything on THIS process

print("--- %s seconds ---" % (time.time() - start_time))

The thing is, I have 2 different parameters in my example (E and nu), and as you can see, E takes 3 values and nu takes 2 values, so there are 6 combinations of these parameters. So I executed this line in the terminal:

 mpirun -n 6 python test.py

I got this error:

File "test.py", line 50, in <module>
assert len(E) >= MPI.size(mpi_comm_world())
AssertionError: mpirun noticed that process rank 3 with PID 32645 on node alan-ThinkCentre-M900 exited on signal 6 (Aborted).

I was just wondering how I could fix it and run these 6 different combinations on 6 cores.
Thanks again for your time!

You can, for example, build the combinations globally and then unpack the individual parameters:

import itertools
from dolfin import MPI, Constant, info, mpi_comm_world

E = [Constant(1e5), Constant(2e5), Constant(3e5)]
nu = [Constant(0.15), Constant(0.3)]

# All 6 (E, nu) combinations, identical on every process
E_nu_combos = list(itertools.product(E, nu))
assert len(E_nu_combos) >= MPI.size(mpi_comm_world())

# Each process picks its own combination based on its global rank
my_E, my_nu = E_nu_combos[MPI.rank(mpi_comm_world())]

info("%g %g" % (my_E, my_nu))

Thanks again for your time. Is there any reason why the computation time increases when I increase the number of cores? For example, if I use only 1 core to run only 1 combination:

mpirun -n 1 python test.py

Here is the output:

E 100000 nu 0.15 -> |uh|=0.416805
--- 0.428194999695 seconds ---

If I use 4 cores to consider 4 combinations:

mpirun -n 4 python test.py

Here is the output:

Process 1: E 100000 nu 0.3 -> |uh|=0.212336
Process 0: E 100000 nu 0.15 -> |uh|=0.212336
Process 2: E 200000 nu 0.15 -> |uh|=0.212336
Process 3: E 200000 nu 0.3 -> |uh|=0.212336
--- 0.881356954575 seconds ---
--- 0.881036996841 seconds ---
--- 0.866196870804 seconds ---
--- 0.894992828369 seconds ---

It looks like the computation time on each processor almost doubles when going from 1 to 4 processors. Is that normal?