Does parallelization help mumps?

Matheus-Janczkowski · January 9, 2026, 11:46pm

I’m working with hyperelasticity in dolfin. Is there any benefit in using the parallelization tools commonly used in FEniCS examples if I use MUMPS as the linear solver? Isn’t MUMPS already supposed to be parallel?

dokken · January 10, 2026, 11:13am

MUMPS is a direct solver that supports MPI parallelisation. Thus, if you change the factorization backend for the direct solver in DOLFIN/DOLFINx to MUMPS, running in parallel can yield speedups. However, this depends on the size of the problem. If the overhead of communicating between processes is more expensive than solving the system (which can happen for small systems), you might not see a speedup, or you might see a speedup going from 1 to 2 processes but not from 2 to 4.
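As a minimal sketch of what "changing the factorization backend" can look like, the PETSc options below request an LU factorization carried out by MUMPS. The option names are standard PETSc options. The variable name `mumps_petsc_options` and the helper `as_command_line` are made up for illustration, and the exact way the options are handed to DOLFIN/DOLFINx depends on the version you use.

```python
# PETSc options selecting an LU factorization performed by MUMPS.
# The variable name is illustrative; the option names are PETSc's own.
mumps_petsc_options = {
    "ksp_type": "preonly",                 # no Krylov iterations, just apply the factorization
    "pc_type": "lu",                       # direct LU used as the "preconditioner"
    "pc_factor_mat_solver_type": "mumps",  # delegate the factorization to MUMPS
}

# Assumed usage (version dependent, not executed here):
#   DOLFINx:       LinearProblem(a, L, bcs=bcs, petsc_options=mumps_petsc_options)
#                  (recent releases may require further arguments, e.g. an options prefix)
#   legacy DOLFIN: solver_parameters={"newton_solver": {"linear_solver": "mumps"}}
# The script is then started under MPI, e.g. `mpirun -n 4 python3 script.py`.


def as_command_line(options):
    """Render the options as PETSc command-line flags (hypothetical helper)."""
    return " ".join(f"-{key} {value}" for key, value in options.items())


print(as_command_line(mumps_petsc_options))
```

Whether this pays off is then a matter of timing the same problem with 1, 2 and 4 processes.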

nate · January 12, 2026, 6:00pm

Just subjective, but I’ve achieved decent parallel performance as measured by strong scaling with MUMPS going up to about 10M DoF for 2D problems. Specifically this was Stokes coupled with the heat equation.
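For readers unfamiliar with the term, strong scaling keeps the problem size fixed and varies the number of processes. The sketch below computes speedup and parallel efficiency from wall-clock times. The function name `strong_scaling` and the timings are hypothetical placeholders, not measurements from this thread.

```python
def strong_scaling(timings):
    """Return {procs: (speedup, efficiency)} relative to the smallest process count.

    timings maps a process count to the wall-clock time for the same problem.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    results = {}
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        # Efficiency of 1.0 means ideal scaling relative to the baseline run.
        efficiency = speedup * base_procs / procs
        results[procs] = (speedup, efficiency)
    return results


# Hypothetical wall-clock times in seconds for a fixed problem size.
timings = {1: 120.0, 2: 66.0, 4: 40.0, 8: 32.0}

for procs, (speedup, efficiency) in strong_scaling(timings).items():
    print(f"{procs:>2} procs: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency that drops quickly as processes are added is the usual sign that communication is starting to dominate.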

I’ve not been able to demonstrate good strong scaling for 3D problems. Perhaps others have suggestions.

See also: “How to choose the optimal solver for a PDE problem?” (post #2 by nate).

BillS · January 13, 2026, 4:42pm

I can offer some qualitative observations based on a homebrew PC cluster.

For 3D problems, I have found that if the problem fits in the memory of a single computer (i.e. no network communication between compute nodes, only between cores within one machine), I see speedup up to 4 cores using MPI. Beyond that, the speedup shows diminishing returns.

For problems that need to distribute memory over many nodes, i.e. where network communication is needed, it is best to use maybe 2 or 4 processors per node. Don’t use too many nodes for small problems. There is a balance between the speed gained by using more cores and the communication required between the nodes, which slows things down. Using bigger “chunks” reduces the communication (fewer SSH background sessions exchanging info over your network). If your simulation is broken up into many small chunks, the communication overhead begins to kill your processing performance. Experiment a bit. 2 to 3 cores on each node should be sufficient, but performance may well be quite problem dependent.
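The balance described above can be illustrated with a toy cost model. This is only a sketch under a strong assumption: compute time divides perfectly across processes while communication overhead grows linearly with the process count. Real MUMPS behaviour is more complicated, and the function names and numbers here are hypothetical.

```python
def modeled_time(work, procs, overhead_per_proc):
    """Toy model: ideal compute split plus overhead that grows with process count."""
    return work / procs + overhead_per_proc * (procs - 1)


def best_process_count(work, overhead_per_proc, candidates=(1, 2, 4, 8, 16, 32)):
    """Return the candidate process count with the smallest modeled time."""
    return min(candidates, key=lambda p: modeled_time(work, p, overhead_per_proc))


# Same assumed overhead, two problem sizes (arbitrary units).
small = best_process_count(work=10.0, overhead_per_proc=2.0)
large = best_process_count(work=1000.0, overhead_per_proc=2.0)

print(f"small problem: best with {small} processes")
print(f"large problem: best with {large} processes")
```

In this model the small problem stops benefiting after a couple of processes while the large one keeps improving for longer, which matches the advice to experiment for your own problem size.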
