Scalar assembly of an internal surface integral misbehaves in parallel

Hello,

I have run into some weird issues when assembling a scalar:

MWE script:

from mpi4py import MPI
from dolfinx import io, fem, __version__
import ufl
if MPI.COMM_WORLD.rank == 0:
    print(__version__)

path = 'path/to/mesh.msh'
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3)
dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
scalar_value = fem.assemble_scalar(fem.form(8*dS(2)))
print(scalar_value)

When I run this with mpirun -n 1 python3 issue_dS.py, I get this output, which seems correct:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.8593967035684853

When I run this with mpirun -n 2 python3 issue_dS.py, I get this output, which is also correct:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.47619510276556726
0.38313327447585555

But when I increase the number of processes further, e.g. mpirun -n 20 python3 issue_dS.py, the program hangs in this state and never finishes:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

I do not know if this is an issue with the mesh or with dolfinx. Am I doing something wrong? The issue arises when I increase the number of processes above 3. I think it happens when the mesh is partitioned in such a way that no facets with mark 2 are present on some of the processes.

The mesh has been generated with gmsh from a STEP file. Here is a link to the mesh (58 MB).

Is this an issue in dolfinx? If so, I will report it on GitHub.

Thank you for any reply.

There are probably two different issues here:

  1. Using dS in parallel requires a specific ghost mode, GhostMode.shared_facet. See for instance dolfinx/python/demo/demo_biharmonic.py at b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106 · FEniCS/dolfinx · GitHub for how it is used in a built-in demo. io.gmshio.read_from_msh has an optional argument for that too; I'll let you check the help for its name (probably still ghost_mode).
  2. When running with mpirun -n 2 you are seeing two different values because the integral is only computed on the local part of the mesh. That is why the demos use parallel communication to collect the results across all processes, see e.g. dolfinx/python/demo/demo_lagrange_variants.py at b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106 · FEniCS/dolfinx · GitHub, and the sketch after this list.
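As a minimal sketch, assuming the domain and dS measure from your script above (the names local_value and global_value are mine), the reduction could look like this:

from mpi4py import MPI
from dolfinx import fem

# `domain` and `dS` as defined in the script above.
local_value = fem.assemble_scalar(fem.form(8*dS(2)))
# Sum the per-rank contributions to obtain the global integral.
global_value = domain.comm.allreduce(local_value, op=MPI.SUM)
if domain.comm.rank == 0:
    print(global_value)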

Thank you @francesco-ballarin for the response,

I am aware that I have to collect the local contributions if I want the “global” value. My issue was that with mpirun -n 20 I should get 20 such local contributions, but I get only 11 (and those are all zero). The rest of the ranks never return, as if they get stuck/blocked in some MPI call. I guess that the ranks that do have the facets I am trying to integrate over cannot finish for some reason.

I have tried your suggestion (I hope that I am using the current API of read_from_msh correctly), but the issue persists:

from mpi4py import MPI
from dolfinx import io, fem, __version__
import ufl
from dolfinx.mesh import GhostMode, create_cell_partitioner
if MPI.COMM_WORLD.rank == 0:
    print(__version__)

path = 'meshes/urbanek/mesh.msh'
partitioner = create_cell_partitioner(GhostMode.shared_facet)
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3, partitioner=partitioner)
dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
scalar_value = fem.assemble_scalar(fem.form(8*dS(2)))
print(f"rank = {MPI.COMM_WORLD.rank}, value = ", scalar_value)

When I run this with mpirun -n 4 or a higher number of processes, I do not get the right number of local values; the remaining ranks hang and never return:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
rank = 0, value =  0.0
rank = 2, value =  0.0

Ranks 1 and 3 do not return anything and the program keeps running forever.

By the way, when I use dS instead of dS(2) it does work and every rank returns its local value.

The result when the program contains dS instead of dS(2) is this:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
rank = 0, value =  1159.8587287839784
rank = 2, value =  1257.3998569644205
rank = 3, value =  233.38589352944297
rank = 1, value =  165.13436971572628

A temporary fix is to add

for i in range(domain.topology.dim + 1):
    domain.topology.create_entities(i)

prior to assembly.
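For reference, here is a minimal sketch of the full script with the workaround applied, assuming the same mesh path, facet tag (2) and dolfinx 0.8.0 API as in the posts above:

from mpi4py import MPI
from dolfinx import io, fem
from dolfinx.mesh import GhostMode, create_cell_partitioner
import ufl

path = 'meshes/urbanek/mesh.msh'
partitioner = create_cell_partitioner(GhostMode.shared_facet)
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3, partitioner=partitioner)

# Workaround: create all entity types so every rank goes through the same
# connectivity computations before assembly.
for i in range(domain.topology.dim + 1):
    domain.topology.create_entities(i)

dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
local_value = fem.assemble_scalar(fem.form(8*dS(2)))
# Collect the per-rank contributions into the global integral.
global_value = domain.comm.allreduce(local_value, op=MPI.SUM)
if domain.comm.rank == 0:
    print(f"global value = {global_value}")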

Issue posted at:


I see that a PR that should solve this has also been opened by you.

Thank you for looking into this, you are the best, @dokken!

Yes, it might need some TLC, but it at least fixes the problem when I run the example you posted on my system, as well as my smaller manufactured example. There are some slightly bigger issues that this points to, which also have to be addressed (but they shouldn't affect the API directly).