Problems in running in parallel

Hi everyone,
I have experienced some issues in running parallel simulations. In particular, there is no difference in using 1 or more mpi processes.

I have downloaded the tutorial on Navier-Stokes and converted it into a .py file.

I have run three different cases:

  • 1 mpi process: takes 2.17 seconds
  • 2 mpi processes using mpirun -np 2 python takes 2.29 seconds
  • 3 mpi processes: takes 2.49 seconds

The computational time seems to be very similar, dolfinx has been installed with conda and the code has been run on MacBook Pro Intel.

I have encontoured the same problem also in other situations, is there something to “activate” to run in parallel?

Thanks in advance! :slight_smile:

First of all, check the output of python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)"
mpirun -n 2 python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)"

which should be
0 1

1 2
0 2


In general, I would say it is very suspect that this demo takes 2 seconds to run.
Please attach the actual code you are running in your python file.

I have executed the commands you suggested, for python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)" the output is 0 1, whereas for the other I have

0 1
0 1

Here’s the code for the demo (I have forgotten to mention that the final time T is 1 second):

# %% [markdown]
# # Test problem 2: Flow past a cylinder (DFG 2D-3 benchmark)
# Author: Jørgen S. Dokken
# In this section, we will turn our attention to a slightly more challenging problem: flow past a cylinder. The geometry and parameters are taken from the [DFG 2D-3 benchmark]( in FeatFlow.
# To be able to solve this problem efficiently and ensure numerical stability, we will substitute our first order backward difference scheme with a Crank-Nicholson discretization in time, and a semi-implicit Adams-Bashforth approximation of the non-linear term.
# ```{admonition} Computationally demanding demo
# This demo is computationally demanding, with a run-time up to 15 minutes, as it is using parameters from the DFG 2D-3 benchmark, which consists of 12800 time steps. It is adviced to download this demo and  not run it in a browser. This runtime of the demo can be increased by using 2 or 3 mpi processes.
# ```
# The computational geometry we would like to use is
# ![Fluid channel with a circular obstacle](turek.png)
# The kinematic velocity is given by $\nu=0.001=\frac{\mu}{\rho}$ and the inflow velocity profile is specified as
# $$
#     u(x,y,t) = \left( \frac{4Uy(0.41-y)}{0.41^2}, 0 \right)
# $$
# $$
#     U=U(t) = 1.5\sin(\pi t/8)
# $$
# which has a maximum magnitude of $1.5$ at $y=0.41/2$. We do not use any scaling for this problem since all exact parameters are known. 
# ## Mesh generation
# As in the [Deflection of a membrane](./../chapter1/membrane_code.ipynb) we use GMSH to generate the mesh. We fist create the rectangle and obstacle.

# %%
import gmsh
import os
import numpy as np
import matplotlib.pyplot as plt
import tqdm.autonotebook

from mpi4py import MPI
from petsc4py import PETSc

from dolfinx.cpp.mesh import to_type, cell_entity_type
from dolfinx.fem import (Constant, Function, FunctionSpace, 
                         assemble_scalar, dirichletbc, form, locate_dofs_topological, set_bc)
from dolfinx.fem.petsc import (apply_lifting, assemble_matrix, assemble_vector, 
                               create_vector, create_matrix, set_bc)
from dolfinx.graph import create_adjacencylist
from dolfinx.geometry import BoundingBoxTree, compute_collisions, compute_colliding_cells
from import (VTXWriter, distribute_entity_data, gmshio)
from dolfinx.mesh import create_mesh, meshtags_from_entities

from ufl import (FacetNormal, FiniteElement, Identity, Measure, TestFunction, TrialFunction, VectorElement,
                 as_vector, div, dot, ds, dx, inner, lhs, grad, nabla_grad, rhs, sym)


L = 2.2
H = 0.41
c_x = c_y =0.2
r = 0.05
gdim = 2
mesh_comm = MPI.COMM_WORLD
model_rank = 0
if mesh_comm.rank == model_rank:
    rectangle = gmsh.model.occ.addRectangle(0,0,0, L, H, tag=1)
    obstacle = gmsh.model.occ.addDisk(c_x, c_y, 0, r, r)

# %% [markdown]
# The next step is to subtract the obstacle from the channel, such that we do not mesh the interior of the circle.

# %%
if mesh_comm.rank == model_rank:
    fluid = gmsh.model.occ.cut([(gdim, rectangle)], [(gdim, obstacle)])

# %% [markdown]
# To get GMSH to mesh the fluid, we add a physical volume marker

# %%
fluid_marker = 1
if mesh_comm.rank == model_rank:
    volumes = gmsh.model.getEntities(dim=gdim)
    assert(len(volumes) == 1)
    gmsh.model.addPhysicalGroup(volumes[0][0], [volumes[0][1]], fluid_marker)
    gmsh.model.setPhysicalName(volumes[0][0], fluid_marker, "Fluid")

# %% [markdown]
# To tag the different surfaces of the mesh, we tag the inflow (left hand side) with marker 2, the outflow (right hand side) with marker 3 and the fluid walls with 4 and obstacle with 5. We will do this by compute the center of mass for each geometrical entitiy.

# %%
inlet_marker, outlet_marker, wall_marker, obstacle_marker = 2, 3, 4, 5
inflow, outflow, walls, obstacle = [], [], [], []
if mesh_comm.rank == model_rank:
    boundaries = gmsh.model.getBoundary(volumes, oriented=False)
    for boundary in boundaries:
        center_of_mass = gmsh.model.occ.getCenterOfMass(boundary[0], boundary[1])
        if np.allclose(center_of_mass, [0, H/2, 0]):
        elif np.allclose(center_of_mass, [L, H/2, 0]):
        elif np.allclose(center_of_mass, [L/2, H, 0]) or np.allclose(center_of_mass, [L/2, 0, 0]):
    gmsh.model.addPhysicalGroup(1, walls, wall_marker)
    gmsh.model.setPhysicalName(1, wall_marker, "Walls")
    gmsh.model.addPhysicalGroup(1, inflow, inlet_marker)
    gmsh.model.setPhysicalName(1, inlet_marker, "Inlet")
    gmsh.model.addPhysicalGroup(1, outflow, outlet_marker)
    gmsh.model.setPhysicalName(1, outlet_marker, "Outlet")
    gmsh.model.addPhysicalGroup(1, obstacle, obstacle_marker)
    gmsh.model.setPhysicalName(1, obstacle_marker, "Obstacle")

# %% [markdown]
# In our previous meshes, we have used uniform mesh sizes. In this example, we will have variable mesh sizes to resolve the flow solution in  the area of interest; close to the circular obstacle. To do this, we use GMSH Fields.

# %%
# Create distance field from obstacle.
# Add threshold of mesh sizes based on the distance field
# LcMax -                  /--------
#                      /
# LcMin -o---------/
#        |         |       |
#       Point    DistMin DistMax
res_min = r / 3
if mesh_comm.rank == model_rank:
    distance_field = gmsh.model.mesh.field.add("Distance")
    gmsh.model.mesh.field.setNumbers(distance_field, "EdgesList", obstacle)
    threshold_field = gmsh.model.mesh.field.add("Threshold")
    gmsh.model.mesh.field.setNumber(threshold_field, "IField", distance_field)
    gmsh.model.mesh.field.setNumber(threshold_field, "LcMin", res_min)
    gmsh.model.mesh.field.setNumber(threshold_field, "LcMax", 0.25 * H)
    gmsh.model.mesh.field.setNumber(threshold_field, "DistMin", r)
    gmsh.model.mesh.field.setNumber(threshold_field, "DistMax", 2 * H)
    min_field = gmsh.model.mesh.field.add("Min")
    gmsh.model.mesh.field.setNumbers(min_field, "FieldsList", [threshold_field])

# %% [markdown]
# ## Generating the mesh
# We are now ready to generate the mesh. However, we have to decide if our mesh should consist of triangles or quadrilaterals. In this demo, to match the DFG 2D-3 benchmark, we use second order quadrilateral elements.

# %%
if mesh_comm.rank == model_rank:
    gmsh.option.setNumber("Mesh.Algorithm", 8)
    gmsh.option.setNumber("Mesh.RecombinationAlgorithm", 2)
    gmsh.option.setNumber("Mesh.RecombineAll", 1)
    gmsh.option.setNumber("Mesh.SubdivisionAlgorithm", 1)

# %% [markdown]
# ## Loading mesh and boundary markers
# As we have generated the mesh, we now need to load the mesh and corresponding facet markers into DOLFINx.
# To load the mesh, we follow the same structure as in  [Deflection of a membrane](./../chapter1/membrane_code.ipynb), with the difference being that we will load in facet markers as well. To learn more about the specifics of the function below, see [A GMSH tutorial for DOLFINx](

# %%
mesh, _, ft = gmshio.model_to_mesh(gmsh.model, mesh_comm, model_rank, gdim=gdim) = "Facet markers"

# %% [markdown]
# ## Physical and discretization parameters
# Following the DGF-2 benchmark, we define our problem specific parameters

# %%
t = 0
T = 1                       # Final time
dt = 1/1600                 # Time step size
num_steps = int(T/dt)
k = Constant(mesh, PETSc.ScalarType(dt))        
mu = Constant(mesh, PETSc.ScalarType(0.001))  # Dynamic viscosity
rho = Constant(mesh, PETSc.ScalarType(1))     # Density

# %% [markdown]
# ```{admonition} Reduced end-time of problem
# In the current demo, we have reduced the run time to one second to make it easier to illustrate the concepts of the benchmark. By increasing the end-time `T` to 8, the runtime in a notebook is approximately 25 minutes. If you convert the notebook to a python file and use `mpirun`, you can reduce the runtime of the problem.
# ```
# ## Boundary conditions
# As we have created the mesh and relevant mesh tags, we can now specify the function spaces `V` and `Q` along with the boundary conditions. As the `ft` contains markers for facets, we use this class to find the facets for the inlet and walls.

# %%
v_cg2 = VectorElement("CG", mesh.ufl_cell(), 2)
s_cg1 = FiniteElement("CG", mesh.ufl_cell(), 1)
V = FunctionSpace(mesh, v_cg2)
Q = FunctionSpace(mesh, s_cg1)

fdim = mesh.topology.dim - 1

# Define boundary conditions
class InletVelocity():
    def __init__(self, t):
        self.t = t

    def __call__(self, x):
        values = np.zeros((gdim, x.shape[1]),dtype=PETSc.ScalarType)
        values[0] = 4 * 1.5 * np.sin(self.t * np.pi/8) * x[1] * (0.41 - x[1])/(0.41**2)
        return values

# Inlet
u_inlet = Function(V)
inlet_velocity = InletVelocity(t)
bcu_inflow = dirichletbc(u_inlet, locate_dofs_topological(V, fdim, ft.find(inlet_marker)))
# Walls
u_nonslip = np.array((0,) * mesh.geometry.dim, dtype=PETSc.ScalarType)
bcu_walls = dirichletbc(u_nonslip, locate_dofs_topological(V, fdim, ft.find(wall_marker)), V)
# Obstacle
bcu_obstacle = dirichletbc(u_nonslip, locate_dofs_topological(V, fdim, ft.find(obstacle_marker)), V)
bcu = [bcu_inflow, bcu_obstacle, bcu_walls]
# Outlet
bcp_outlet = dirichletbc(PETSc.ScalarType(0), locate_dofs_topological(Q, fdim, ft.find(outlet_marker)), Q)
bcp = [bcp_outlet]

# %% [markdown]
# ## Variational form
# As opposed to [Pouseille flow](./ns_code1.ipynb), we will use a Crank-Nicolson discretization, and an semi-implicit Adams-Bashforth approximation.
# The first step can be written as
# $$
# \rho\left(\frac{u^*- u^n}{\delta t} + \left(\frac{3}{2}u^{n} - \frac{1}{2} u^{n-1}\right)\cdot \frac{1}{2}\nabla (u^*+u^n) \right) - \frac{1}{2}\mu \Delta( u^*+ u^n )+ \nabla p^{n-1/2} = f^{n+\frac{1}{2}} \qquad \text{ in } \Omega
# $$
# $$
# u^{*}=g(\cdot, t^{n+1}) \qquad \text{ on } \partial \Omega_{D}
# $$
# $$
# \frac{1}{2}\nu \nabla (u^*+u^n) \cdot n = p^{n-\frac{1}{2}} \qquad \text{ on } \partial \Omega_{N}
# $$
# where we have used the two previous time steps in the temporal derivative for the velocity, and compute the pressure staggered in time, at the time between the previous and current solution. The second step becomes
# $$
# \nabla \phi = -\frac{\rho}{\delta t} \nabla \cdot u^* \qquad\text{in } \Omega,
# $$
# $$
# \nabla \phi \cdot n = 0 \qquad \text{on } \partial \Omega_D,
# $$
# $$
# \phi = 0 \qquad\text{on } \partial\Omega_N
# $$
# where $p^{n+\frac{1}{2}}=p^{n-\frac{1}{2}} + \phi$.
# Finally, the third step is
# $$
# \rho (u^{n+1}-u^{*}) = -\delta t \phi. 
# $$
# We start by defining all the variables used in the variational formulations.

# %%
u = TrialFunction(V)
v = TestFunction(V)
u_ = Function(V) = "u"
u_s = Function(V)
u_n = Function(V)
u_n1 = Function(V)
p = TrialFunction(Q)
q = TestFunction(Q)
p_ = Function(Q) = "p"
phi = Function(Q)

# %% [markdown]
# Next, we define the variational formulation for the first step, where we have integrated the diffusion term, as well as the pressure term by parts.

# %%
f = Constant(mesh, PETSc.ScalarType((0,0)))
F1 = rho / k * dot(u - u_n, v) * dx 
F1 += inner(dot(1.5 * u_n - 0.5 * u_n1, 0.5 * nabla_grad(u + u_n)), v) * dx
F1 += 0.5 * mu * inner(grad(u + u_n), grad(v))*dx - dot(p_, div(v))*dx
F1 += dot(f, v) * dx
a1 = form(lhs(F1))
L1 = form(rhs(F1))
A1 = create_matrix(a1)
b1 = create_vector(L1)

# %% [markdown]
# Next we define the second step

# %%
a2 = form(dot(grad(p), grad(q))*dx)
L2 = form(-rho / k * dot(div(u_s), q) * dx)
A2 = assemble_matrix(a2, bcs=bcp)
b2 = create_vector(L2)

# %% [markdown]
# We finally create the last step

# %%
a3 = form(rho * dot(u, v)*dx)
L3 = form(rho * dot(u_s, v)*dx - k * dot(nabla_grad(phi), v)*dx)
A3 = assemble_matrix(a3)
b3 = create_vector(L3)

# %% [markdown]
# As in the previous tutorials, we use PETSc as a linear algebra backend.

# %%
# Solver for step 1
solver1 = PETSc.KSP().create(mesh.comm)
pc1 = solver1.getPC()

# Solver for step 2
solver2 = PETSc.KSP().create(mesh.comm)
pc2 = solver2.getPC()

# Solver for step 3
solver3 = PETSc.KSP().create(mesh.comm)
pc3 = solver3.getPC()

# %% [markdown]
# ## Verification of the implementation compute known physical quantities
# As a further verification of our implementation, we compute the drag and lift coefficients over the obstacle, defined as
# $$
#     C_{\text{D}}(u,p,t,\partial\Omega_S) = \frac{2}{\rho L U_{mean}^2}\int_{\partial\Omega_S}\rho \nu n \cdot \nabla u_{t_S}(t)n_y -p(t)n_x~\mathrm{d} s,
# $$    
# $$ 
#     C_{\text{L}}(u,p,t,\partial\Omega_S) = -\frac{2}{\rho L U_{mean}^2}\int_{\partial\Omega_S}\rho \nu n \cdot \nabla u_{t_S}(t)n_x + p(t)n_y~\mathrm{d} s,
# $$
# where $u_{t_S}$ is the tangential velocity component at the interface of the obstacle $\partial\Omega_S$, defined as $u_{t_S}=u\cdot (n_y,-n_x)$, $U_{mean}=1$ the average inflow velocity, and $L$ the length of the channel. We use `UFL` to create the relevant integrals, and assemble them at each time step.

# %%
n = -FacetNormal(mesh) # Normal pointing out of obstacle
dObs = Measure("ds", domain=mesh, subdomain_data=ft, subdomain_id=obstacle_marker)
u_t = inner(as_vector((n[1], -n[0])), u_)
drag = form(2 / 0.1 * (mu / rho * inner(grad(u_t), n) * n[1] - p_ * n[0]) * dObs)
lift = form(-2 / 0.1 * (mu / rho * inner(grad(u_t), n) * n[0] + p_ * n[1]) * dObs)
if mesh.comm.rank == 0:
    C_D = np.zeros(num_steps, dtype=PETSc.ScalarType)
    C_L = np.zeros(num_steps, dtype=PETSc.ScalarType)
    t_u = np.zeros(num_steps, dtype=np.float64)
    t_p = np.zeros(num_steps, dtype=np.float64)

# %% [markdown]
# We will also evaluate the pressure at two points, on in front of the obstacle, $(0.15, 0.2)$, and one behind the obstacle, $(0.25, 0.2)$. To do this, we have to find which cell is containing each of the points, so that we can create a linear combination of the local basis functions and coefficients.

# %%
tree = BoundingBoxTree(mesh, mesh.geometry.dim)
points = np.array([[0.15, 0.2, 0], [0.25, 0.2, 0]])
cell_candidates = compute_collisions(tree, points)
colliding_cells = compute_colliding_cells(mesh, cell_candidates, points)
front_cells = colliding_cells.links(0)
back_cells = colliding_cells.links(1)
if mesh.comm.rank == 0:
    p_diff = np.zeros(num_steps, dtype=PETSc.ScalarType)

# %% [markdown]
# ## Solving the time-dependent problem
# ```{admonition} Stability of the Navier-Stokes equation
# Note that the current splitting scheme has to fullfil the a [Courant–Friedrichs–Lewy condition]( This limits the spatial discretization with respect to the inlet velocity and temporal discretization.
# Other temporal discretization schemes such as the second order backward difference discretization or Crank-Nicholson discretization with Adams-Bashforth linearization are better behaved than our simple backward difference scheme.
# ```
# As in the previous example, we create output files for the velocity and pressure and solve the time-dependent problem. As we are solving a time dependent problem with many time steps, we use the `tqdm`-package to visualize the progress. This package can be install with `pip3`.

# %%
vtx_u = VTXWriter(mesh.comm, "dfg2D-3-u.bp", [u_])
vtx_p = VTXWriter(mesh.comm, "dfg2D-3-p.bp", [p_])
progress = tqdm.autonotebook.tqdm(desc="Solving PDE", total=num_steps)
for i in range(num_steps):
    # Update current time step
    t += dt
    # Update inlet velocity
    inlet_velocity.t = t

    # Step 1: Tentative velocity step
    assemble_matrix(A1, a1, bcs=bcu)
    with b1.localForm() as loc:
    assemble_vector(b1, L1)
    apply_lifting(b1, [a1], [bcu])
    b1.ghostUpdate(addv=PETSc.InsertMode.ADD_VALUES, mode=PETSc.ScatterMode.REVERSE)
    set_bc(b1, bcu)
    solver1.solve(b1, u_s.vector)

    # Step 2: Pressure corrrection step
    with b2.localForm() as loc:
    assemble_vector(b2, L2)
    apply_lifting(b2, [a2], [bcp])
    b2.ghostUpdate(addv=PETSc.InsertMode.ADD_VALUES, mode=PETSc.ScatterMode.REVERSE)
    set_bc(b2, bcp)
    solver2.solve(b2, phi.vector)

    p_.vector.axpy(1, phi.vector)

    # Step 3: Velocity correction step
    with b3.localForm() as loc:
    assemble_vector(b3, L3)
    b3.ghostUpdate(addv=PETSc.InsertMode.ADD_VALUES, mode=PETSc.ScatterMode.REVERSE)
    solver3.solve(b3, u_.vector)

    # Write solutions to file

    # Update variable with solution form this time step
    with u_.vector.localForm() as loc_, u_n.vector.localForm() as loc_n, u_n1.vector.localForm() as loc_n1:

    # Compute physical quantities
    # For this to work in paralell, we gather contributions from all processors
    # to processor zero and sum the contributions. 
    drag_coeff = mesh.comm.gather(assemble_scalar(drag), root=0)
    lift_coeff = mesh.comm.gather(assemble_scalar(lift), root=0)
    p_front = None
    if len(front_cells) > 0:
        p_front = p_.eval(points[0], front_cells[:1])
    p_front = mesh.comm.gather(p_front, root=0)
    p_back = None
    if len(back_cells) > 0:
        p_back = p_.eval(points[1], back_cells[:1])
    p_back = mesh.comm.gather(p_back, root=0)
    if mesh.comm.rank == 0:
        t_u[i] = t
        t_p[i] = t - dt / 2
        C_D[i] = sum(drag_coeff)
        C_L[i] = sum(lift_coeff)
        # Choose first pressure that is found from the different processors
        for pressure in p_front:
            if pressure is not None:
                p_diff[i] = pressure[0]
        for pressure in p_back:
            if pressure is not None:
                p_diff[i] -= pressure[0]

Then there is something wrong with your installation, as this tells you that each communicator is treated as a stand alone program.

Could you state exactly how you created your environment with conda? (So that someone with a Mac can try to reproduce it).

Thanks for the answer, to create the environment I have used the following commands

conda create --name <env_name>
conda activate <env_name>
conda install -c conda-forge python=3.10 fenics-dolfinx petsc=*=real* mpich pyvista meshio

I have also tried the instructions you provide in the docs, with the same outcome.

Can you share conda list --explicit -n <env name>?

Yes, of course: here’s the response:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64

I’ve tried with your exact env list, and with your install command (the given install command conda install -c conda-forge python=3.10 fenics-dolfinx petsc=*=real* mpich pyvista meshio does not produce the env that was shared, see this comparison, but mpich and mpi4py are the same, which ought to be the only ones that matter for the basic test). And it produced the same results. So my next hunch is that there’s something in $PATH or other env that’s the cause. So I have the next questions:

  1. does it occur with a clean env (fresh install, start with just mpich, mpi4py - conda create -n test-env python=3.10 mpich mpi4py)?
  2. what is which -a mpirun? Is there more than one result?
  3. Are there any mpi environment variables set (env | grep MPI)?
1 Like

I have created a new conda env as suggeted and tested with mpirun -n 2 python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)", with the following output

0 1
0 1

thus, the same issue of before. I did suspect that there was something related to the PATH but I was unable to fix it. which -a mpirun gives:


That’s very strange indeed, but I think we’ve found the problem. You have /usr/local/bin on your PATH ahead of conda. The source of this is likely in your ~/.bashrc or ~/.bash_profile file (assuming bash, not sure where it is in zsh).

You can manually edit your PATH by running:

echo $PATH

and then

export PATH=...

without /usr/local/bin on the front to set it again.

You can also try removing the (presumably) homebrew-installed mpi, if you are not using it.

Thank you very much for the help!
I report the last outputs for completeness.

  • mpirun -n 4 python3 -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size) gives:
0 4
2 4
1 4
3 4

Moreover, the NS tutorial now takes around 40 seconds to complete 1 sec of simulation using 4 mpi processes.

Thanks again a lot for the help!