Scalar assembly of an internal surface integral misbehaves in parallel

Hello,

I have run into some weird issues when assembling a scalar:

MWE script:

from mpi4py import MPI
from dolfinx import io, fem, __version__
import ufl
if MPI.COMM_WORLD.rank == 0:
    print(__version__)

path = 'path/to/mesh.msh'
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3)
dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
scalar_value = fem.assemble_scalar(fem.form(8*dS(2)))
print(scalar_value)

When I run this with mpirun -n 1 python3 issue_dS.py, I get this output, which seems correct:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.8593967035684853

When I run this with mpirun -n 2 python3 issue_dS.py, I get this output, which is also correct:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.47619510276556726
0.38313327447585555

But when I increase the number of processes further, e.g. mpirun -n 20 python3 issue_dS.py, the program hangs in this state and never finishes:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

I do not know if this is an issue with the mesh or with dolfinx. Am I doing something wrong? The issue arises when I increase the number of processes above 3. I think it happens when the mesh is partitioned in such a way that no facets with mark 2 are present on some of the processes.

The mesh has been generated with gmsh from a STEP file. Here is a link to the mesh (58 MB).

Is this an issue in dolfinx? If so, I will report it on GitHub.

Thank you for any reply.

There are probably two different issues here:

  1. Using dS in parallel requires a specific ghost mode, GhostMode.shared_facet. See for instance dolfinx/python/demo/demo_biharmonic.py at b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106 · FEniCS/dolfinx · GitHub for how it is used in a built-in demo. io.gmshio.read_from_msh has an optional argument for that too; I'll let you check the help for its name (probably still ghost_mode).
  2. When running with mpirun -n 2 you are seeing two different values because the integral is only computed on the local part of the mesh. That is why the demos use parallel communication to collect the results across all processes, see e.g. dolfinx/python/demo/demo_lagrange_variants.py at b16ef8abc20d70c28d8f1850fe42a0ee0bcf5106 · FEniCS/dolfinx · GitHub, and the sketch after this list.
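As a minimal sketch, assuming the domain and dS measure from your script above (the names local_value and global_value are mine), the reduction could look like this:

from mpi4py import MPI
from dolfinx import fem

# `domain` and `dS` as defined in the script above.
local_value = fem.assemble_scalar(fem.form(8*dS(2)))
# Sum the per-rank contributions to obtain the global integral.
global_value = domain.comm.allreduce(local_value, op=MPI.SUM)
if domain.comm.rank == 0:
    print(global_value)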

Thank you @francesco-ballarin for the response,

I am aware that I have to collect the local contributions if I want the “global” value. My issue was that with mpirun -n 20 I should get 20 such local contributions, but I get only 11 (and those are all zero). The rest of the ranks never return, as if they get stuck/blocked in some MPI call. I guess that the ranks that do have the facets I am trying to integrate over cannot finish for some reason.

I have tried your suggestion (I hope that I am using the current API of read_from_msh correctly), but the issue persists:

from mpi4py import MPI
from dolfinx import io, fem, __version__
import ufl
from dolfinx.mesh import GhostMode, create_cell_partitioner
if MPI.COMM_WORLD.rank == 0:
    print(__version__)

path = 'meshes/urbanek/mesh.msh'
partitioner = create_cell_partitioner(GhostMode.shared_facet)
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3, partitioner=partitioner)
dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
scalar_value = fem.assemble_scalar(fem.form(8*dS(2)))
print(f"rank = {MPI.COMM_WORLD.rank}, value = ", scalar_value)

When I run this with mpirun -n 4 or a higher number of processes, I do not get the right number of local values; the remaining ranks hang and never return:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
rank = 0, value =  0.0
rank = 2, value =  0.0

Ranks 1 and 3 do not return anything and the program keeps running forever.

By the way, when I use dS instead of dS(2) it does work and every rank returns its local value.

The result when the program contains dS instead of dS(2) is this:

0.8.0
Info    : Reading 'meshes/urbanek/mesh.msh'...
Info    : 75 entities
Info    : 224103 nodes
Info    : 1255543 elements                                              
Info    : Done reading 'meshes/urbanek/mesh.msh'                           
rank = 0, value =  1159.8587287839784
rank = 2, value =  1257.3998569644205
rank = 3, value =  233.38589352944297
rank = 1, value =  165.13436971572628

A temporary fix is to add

for i in range(domain.topology.dim + 1):
    domain.topology.create_entities(i)

prior to assembly.
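For reference, here is a minimal sketch of the full script with the workaround applied, assuming the same mesh path, facet tag (2) and dolfinx 0.8.0 API as in the posts above:

from mpi4py import MPI
from dolfinx import io, fem
from dolfinx.mesh import GhostMode, create_cell_partitioner
import ufl

path = 'meshes/urbanek/mesh.msh'
partitioner = create_cell_partitioner(GhostMode.shared_facet)
domain, cell_tags, facet_tags = io.gmshio.read_from_msh(path, MPI.COMM_WORLD, 0, gdim=3, partitioner=partitioner)

# Workaround: create all entity types so every rank goes through the same
# connectivity computations before assembly.
for i in range(domain.topology.dim + 1):
    domain.topology.create_entities(i)

dS = ufl.Measure("dS", domain=domain, subdomain_data=facet_tags)
local_value = fem.assemble_scalar(fem.form(8*dS(2)))
# Collect the per-rank contributions into the global integral.
global_value = domain.comm.allreduce(local_value, op=MPI.SUM)
if domain.comm.rank == 0:
    print(f"global value = {global_value}")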

Issue posted at:


I see that a PR that should solve this has also been opened by you.

Thank you for looking into this, you are the best, @dokken!

Yes, it might need some TLC, but it at least fixes the problem when I run the example you posted on my system, as well as my smaller manufactured example. There are some slightly bigger issues that this points to, which also have to be addressed (but they shouldn't affect the API directly).