MUMPS in parallel slower than in serial


I am playing with the Hyperelasticity demo, using MUMPS as the linear solver on a 40x40x40 mesh. When running:
mpirun -np xxx python
with xxx being 1, the computing time (from the end of meshing until the simulation ends) is 153s. When increasing xxx to 4, 6, 8, 10, the timing goes to 175, 161, 188, 199s.

I set ghost_mode to shared_facet and use ParMETIS (I don't think these are relevant, though).

Has anyone else observed this, so that it is simply how it is, or did I miss an important setting?




I had a similar problem, which was caused by PETSc (or more specifically its MUMPS solver) using thread parallelism via OpenMP. Say your PC has n hardware threads available.
Then, when you run your program in serial, it already uses n threads (afaik this is the default).

Running mpirun with m processes makes each of them create its own n OpenMP threads, so you end up with m*n OpenMP threads in total, although your system can only run n at once. The threads then compete for the same cores, which is why you are slower.
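To make the m*n arithmetic concrete, here is a small Python sketch. The function name oversubscription is made up for illustration, and the sketch assumes every MPI process really does start one OpenMP thread per hardware thread, which is only my understanding of the default:

```python
import os


def oversubscription(mpi_processes, omp_threads_per_process, hardware_threads):
    """Return (total threads, threads per hardware thread)."""
    total = mpi_processes * omp_threads_per_process
    return total, total / hardware_threads


# Assumption: each MPI process starts one OpenMP thread per hardware thread.
n = os.cpu_count() or 1
for m in (1, 4, 6, 8, 10):
    total, ratio = oversubscription(m, n, n)
    print(f"m={m:2d}: {total} threads on {n} hardware threads ({ratio:.0f}x)")
```

A ratio above 1 means more runnable threads than the machine can execute at once. With one OpenMP thread per process the total is just m, so it stays at or below n as long as m <= n.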

You can fix this behaviour by setting the OMP_NUM_THREADS environment variable like this:

export OMP_NUM_THREADS=1

in bash before you run your program (note that there must be no spaces around the =). This makes each MPI process create only a single OpenMP thread.
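If you would rather not depend on the shell environment, something like the following at the very top of the script should have the same effect. This is a sketch, not something I have verified for every setup: it assumes the OpenMP runtime has not been loaded yet, so it has to come before importing dolfin, petsc4py, numpy or anything else that might pull it in.

```python
import os

# Must run before any import that may load an OpenMP runtime;
# otherwise the runtime may already have chosen its thread count.
os.environ["OMP_NUM_THREADS"] = "1"

# ... only now import the solver stack and run the demo ...
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

Since mpirun starts one Python interpreter per process, each process sets the variable for itself.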