Solving independent problems in parallel

Hello,

I am trying to solve independent problems in parallel. Up to 16 processes the speedup relative to solving a single problem in serial is good, but beyond that it suddenly drops dramatically. Timing the individual steps shows that the slowdown comes from slower assembly of the stiffness matrix, while the solve time stays constant. Is there a reason for this behaviour, or a fix?

Thanks,
Federico

Let me maybe ask a more specific question: the communicator of a matrix assembled with

dolfinx.fem.petsc.assemble_matrix(form, bcs)

is derived from the mesh communicator (which is MPI.COMM_SELF), or not? If not, how can this be achieved?

The matrix communicator is derived from the one in the mesh.
Without an example at hand, it is hard to pinpoint why your problem doesn’t scale beyond 16 processes.