Multiple CPUs only for the solving part

Hi, I’d like to use multiple CPUs for solving a PDE. My code is structured like this:

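from dolfin import *

class MyPDE(object):
    def __init__(self, args):
        """PDE is defined"""

    def pre_processing(self, args):
        pass  # placeholder, body not shown

    def solving(self, args):
        pass  # placeholder, body not shown

    def post_processing(self, args):
        pass  # placeholder, body not shown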

The main code is:

from prob import MyPDE

if __name__ == "__main__":
    pde = MyPDE(args)

When I run this code with

mpirun -n NUM_THREAD python

I need to gather some variables for post_processing.

However, I’d like to use MPI only for solving the equation.
How can I do this?


You need to run more than just the actual solve in parallel. This is because the linear algebra backend (PETSc) uses distributed matrices, which in turn means that the corresponding structures have to be distributed in dolfin (for instance the mesh, function space, etc.).

What you can do is let your __init__ and solving run with NUM_THREAD processes (which they do by default when you run mpirun -n NUM_THREAD python3),
and then inside pre_processing and post_processing use

from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.rank == 0:
    # Do serial processing on only rank 0
    pass
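If the serial step on rank 0 needs values that live on the other processes, one possible pattern is to collect them first and only then branch on the rank. The following is a minimal sketch, not a definitive implementation: the function name gather_and_post_process and the averaging step are hypothetical, and it assumes the data are plain Python numbers rather than distributed dolfin objects. The comm argument is assumed to behave like an mpi4py communicator such as MPI.COMM_WORLD, using only its rank, gather and bcast members.

```python
def gather_and_post_process(comm, local_values):
    """Collect per-process values on rank 0 and post-process them there.

    `comm` is assumed to behave like an mpi4py communicator
    (for example MPI.COMM_WORLD). `local_values` is a list of numbers
    owned by the calling process. Both names are illustrative.
    """
    # Collective call: every rank has to reach this line.
    # On rank 0 the result is a list with one entry per rank,
    # on all other ranks it is None.
    gathered = comm.gather(local_values, root=0)

    result = None
    if comm.rank == 0:
        # Serial post-processing on rank 0 only (here: a plain average).
        flat = [value for chunk in gathered for value in chunk]
        result = sum(flat) / len(flat)

    # Optional: hand the result back to every rank.
    return comm.bcast(result, root=0)


# Possible usage under MPI (launched through mpirun):
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   mean = gather_and_post_process(comm, [float(comm.rank)])
```

Note that gather and bcast are collective, so they sit outside the rank check. Placing them inside the if block would leave rank 0 waiting for the other processes.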

EDIT: The references to threads are there to match the original author's wording.


A side comment here is that there is a fundamental difference between threads and processes.

When invoking mpirun -n N python3 you are running N individual processes.

Processes do not share memory and multiple processes are executed in parallel.

Threads share memory and multiple threads are employed in concurrent execution.

This is a delicate definition for which I welcome corrections, but it’s a key consideration when using FEniCS.
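The memory-sharing difference described above can be illustrated with the Python standard library alone. This is a rough sketch that is independent of FEniCS and MPI: the names counter, increment, run_in_thread and run_in_process are hypothetical, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp
import threading

# Module-level state that the worker function modifies.
counter = {"value": 0}


def increment():
    counter["value"] += 1


def run_in_thread():
    """Run increment() in a thread; the change is visible to the caller."""
    worker = threading.Thread(target=increment)
    worker.start()
    worker.join()
    return counter["value"]


def run_in_process():
    """Run increment() in a child process; the caller's copy is unchanged."""
    ctx = mp.get_context("fork")  # assumption: POSIX system
    worker = ctx.Process(target=increment)
    worker.start()
    worker.join()
    return counter["value"]
```

The thread updates the same dictionary that the caller sees, whereas the child process only updates its own copy. This is why data held by MPI processes have to be communicated explicitly.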


Thank you for your reply. So, do I need to gather before post_processing?