MPI performance question

Hi all,

I’m working on integrating MPI into my code, but I don’t think I fully understand what’s going on. Below are two snippets of code with one difference: in the first snippet the problem is solved on all processes, while in the second it is solved only on the process with rank 0.

I’d expect the second case to run in a similar time to the first, since only the first process is actually solving the problem instead of all of them.

However, the second piece of code never finishes: the program hangs on the problem.solve step.

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
mesh = dolfinx.UnitSquareMesh(MPI.COMM_WORLD, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)


A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")

problem.solve()

p("Finished")

With result:

47ce1a25ceac:python3 -u /opt/project/threading_test.py 3
[Rank 1 (15.68399s)] Defining problem
[Rank 2 (15.68618s)] Defining problem
[Rank 0 (15.68862s)] Defining problem
[Rank 1 (17.02300s)] Start solving
[Rank 2 (17.02234s)] Start solving
[Rank 0 (17.02337s)] Start solving
[Rank 0 (20.09286s)] Finished
[Rank 1 (20.09350s)] Finished
[Rank 2 (20.09285s)] Finished

And the second piece of code:

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
mesh = dolfinx.UnitSquareMesh(MPI.COMM_WORLD, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)


A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")

if rank == 0:
    problem.solve()
p("Finished")

With result:

b9e6cd267292:python3 -u /opt/project/threading_test.py 3
[Rank 2 (15.59279s)] Defining problem
[Rank 1 (15.59342s)] Defining problem
[Rank 0 (15.60022s)] Defining problem
[Rank 1 (16.80396s)] Start solving
[Rank 1 (16.80400s)] Finished
[Rank 2 (16.80393s)] Start solving
[Rank 2 (16.80398s)] Finished
[Rank 0 (16.80780s)] Start solving

Note that rank 0 never gets past the "Start solving" print statement.

Can somebody give me a hint about what I’m missing in my understanding of MPI? I ask because I fear I’m missing something fundamental.

I’m using v0.3.0, if that’s relevant.

  • Wouter

Okay, for anyone visiting this page in the future:

The mistake I made was to initialize the mesh with MPI.COMM_WORLD, which (I think) distributes the mesh over all processes and makes problem.solve a collective call that every process has to enter, so rank 0 waits forever when it solves alone. Changing this to MPI.COMM_SELF fixed my problems.
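
For anyone who wants the concrete change, here is a minimal sketch of the second snippet with that fix applied (same v0.3.0 API as above, not re-tested in exactly this form); the only differences from the original are the communicator passed to UnitSquareMesh and keeping the solve behind the rank check:

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
# COMM_SELF: every rank builds its own complete copy of the mesh, so the
# assembly and solve are local to each rank and no longer collective over COMM_WORLD
mesh = dolfinx.UnitSquareMesh(MPI.COMM_SELF, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)

A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")
if rank == 0:
    # only rank 0 solves; the other ranks skip straight to the final print
    problem.solve()
p("Finished")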