MPI performance question

Hi all,

I’m working on integrating MPI into my code, but I don’t think I fully understand what’s going on. Below are two snippets of code with one difference: in the first snippet the problem is solved on all processes, while in the second it is solved only on the process with rank 0.

I’d expect the second case to run in a similar time to the first, since only the first process is actually solving the problem instead of all of them.

However, the second piece of code never finishes: the program hangs on the problem.solve step.

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
mesh = dolfinx.UnitSquareMesh(MPI.COMM_WORLD, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)


A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")

problem.solve()

p("Finished")

With result:

47ce1a25ceac:python3 -u /opt/project/threading_test.py 3
[Rank 1 (15.68399s)] Defining problem
[Rank 2 (15.68618s)] Defining problem
[Rank 0 (15.68862s)] Defining problem
[Rank 1 (17.02300s)] Start solving
[Rank 2 (17.02234s)] Start solving
[Rank 0 (17.02337s)] Start solving
[Rank 0 (20.09286s)] Finished
[Rank 1 (20.09350s)] Finished
[Rank 2 (20.09285s)] Finished

And the second piece of code:

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
mesh = dolfinx.UnitSquareMesh(MPI.COMM_WORLD, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)


A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")

if rank == 0:
    problem.solve()
p("Finished")

With result:

b9e6cd267292:python3 -u /opt/project/threading_test.py 3
[Rank 2 (15.59279s)] Defining problem
[Rank 1 (15.59342s)] Defining problem
[Rank 0 (15.60022s)] Defining problem
[Rank 1 (16.80396s)] Start solving
[Rank 1 (16.80400s)] Finished
[Rank 2 (16.80393s)] Start solving
[Rank 2 (16.80398s)] Finished
[Rank 0 (16.80780s)] Start solving

Note that rank 0 never gets past the "Start solving" print statement.

Can somebody give me a hint about what I’m missing in my understanding of MPI? I ask because I fear I’m missing something fundamental.

I’m using v0.3.0, if that’s relevant.

  • Wouter

Okay, for anyone visiting this page in the future:

The mistake I made was to initialize the mesh with MPI.COMM_WORLD, which (I think) distributes the mesh over all processes and makes problem.solve a collective call that every process has to enter, so rank 0 waits forever when it solves alone. Changing this to MPI.COMM_SELF fixed my problems.
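
For anyone who wants the concrete change, here is a minimal sketch of the second snippet with that fix applied (same v0.3.0 API as above, not re-tested in exactly this form); the only differences from the original are the communicator passed to UnitSquareMesh and keeping the solve behind the rank check:

import timeit
import dolfinx
from dolfinx import FunctionSpace, Constant, fem, Function
from mpi4py import MPI
from ufl import TrialFunction, TestFunction, inner, dx

comm = MPI.COMM_WORLD
rank = comm.rank
start_time = timeit.default_timer()


def p(msg):
    # prefix every print with the rank and the elapsed wall time
    print(f"[Rank {rank} ({timeit.default_timer() - start_time:3.5f}s)] {msg}")


N = 2 ** 11
# COMM_SELF: every rank builds its own complete copy of the mesh, so the
# assembly and solve are local to each rank and no longer collective over COMM_WORLD
mesh = dolfinx.UnitSquareMesh(MPI.COMM_SELF, N, N, dolfinx.cpp.mesh.CellType.triangle)
V = FunctionSpace(mesh, ("CG", 1))
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(mesh, 4)
uh = Function(V)

A = inner(u, v) * dx
F = inner(f, v) * dx
p("Defining problem")
problem = fem.LinearProblem(A, F, u=uh)

p("Start solving")
if rank == 0:
    # only rank 0 solves; the other ranks skip straight to the final print
    problem.solve()
p("Finished")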