Regarding the issue of being unable to import grids in the presence of too many processes（2000+）

wanghao · September 13, 2025, 5:21am

Dear all,
I don’t know if you have encountered the problem of not being able to import grids when using dolfinx for large-scale operations.I tested the 1024 process and 2048 process, and found that the 1024 process can import easily, but there will be no output in the 2048 process. Through my multiple output debugging attempts, I am confident that the code did not run smoothly when importing the grid.

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "grid.xdmf", "r")  
msh = read_mesh.read_mesh()

The specific situation is that there is no output and no error is reported.

dokken · September 13, 2025, 5:59am

Could you try setting log level to debug?
dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG

You also break the command into multiple steps, see Line 94-118 in:

github.com/FEniCS/dolfinx

python/test/unit/mesh/test_mesh_partitioners.py

d6a310424


      
          # Read topology data
          with XDMFFile(MPI.COMM_WORLD, filename, "r") as file:
              cell_shape, cell_degree = file.read_cell_type()
              x = file.read_geometry_data()
              topo = file.read_topology_data()
          
          num_local_coor = x.shape[0]
          all_sizes = mpi_comm.allgather(num_local_coor)
          all_sizes.insert(0, 0)
          all_ranges = np.cumsum(all_sizes)
          
          # Testing the premise: coordinates are read contiguously in chunks
          rank = mpi_comm.rank
          assert np.all(x_global[all_ranges[rank] : all_ranges[rank + 1]] == x)
          
          domain = ufl.Mesh(element("Lagrange", cell_shape.name, cell_degree, shape=(3,)))
          
          # Partition mesh in layers, capture geometrical data and topological
          # data from outer scope
          def partitioner(*args):

This file has been truncated. show original

wanghao · September 13, 2025, 8:41am

Thank dokken,
I try setting log level to debug

comm = MPI.COMM_WORLD  
rank=MPI.COMM_WORLD.Get_rank()  
if rank==0:
    dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG)

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "big_cut.xdmf" ,"r")      
msh=read_mesh.read_mesh()     
gdim = msh.geometry.dim

There was no output and I killed it after running for 1 hour.
Could it be caused by too many processes reading the one grid file?

dokken · September 13, 2025, 9:15am

Set debugging on all processes.

Secondly, try first just reading the topology and geometry data, not actually creating the mesh (which calls a mesh partitioner), as illustrated in the referenced code above.

That should not be an issue. DOLFINx has been run with over 100 000 procs.

wanghao · September 13, 2025, 11:49am

code and logs are as follows:

comm = MPI.COMM_WORLD  
rank=MPI.COMM_WORLD.Get_rank()  

dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG)

mpi_comm = MPI.COMM_WORLD
filename = "big_cut.xdmf"

with XDMFFile(MPI.COMM_SELF, filename, "r") as file:
    x_global = file.read_geometry_data()

with XDMFFile(MPI.COMM_WORLD, filename, "r") as file:
    cell_shape, cell_degree = file.read_cell_type()
    x = file.read_geometry_data()
    topo = file.read_topology_data()

num_local_coor = x.shape[0]
all_sizes = mpi_comm.allgather(num_local_coor)
all_sizes.insert(0, 0)
all_ranges = np.cumsum(all_sizes)

rank = mpi_comm.rank
assert np.all(x_global[all_ranges[rank] : all_ranges[rank + 1]] == x)

domain = ufl.Mesh(element("Lagrange", cell_shape.name, cell_degree, shape=(3,)))

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "big_cut.xdmf" ,"r")      
msh=read_mesh.read_mesh()     
gdim = msh.geometry.dim

srun: ROUTE: split_hostlist: hl=a3105n04,b2101r5n[1-8],b3201r3n[6-8],b3201r4n1,b3205r3n7,b3208r6n[3-4],b3302r3n[7-8],b3303r3n[3-4],b3303r4n[1-4],b3309r4n8,b3309r5n[1-2],b3309r7n[2-3],e2306r2n[1-3] tree_width 0
[2025-09-13 18:11:36.581] [info] Opened HDF5 file with id "72057594037927936"
[2025-09-13 18:11:36.581] [info] Opened HDF5 file with id "72057594037927936"
......
[2025-09-13 18:11:49.867] [info] HDF5 Read data rate: 43.15997436780609 MB/s
[2025-09-13 18:11:49.868] [info] HDF5 Read data rate: 44.45196286907281 MB/s
[2025-09-13 18:11:49.868] [info] HDF5 Read data rate: 43.764222893425725 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.18377704887125 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.25279702747072 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.23919898165143 MB/s

srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 22222733 ON a3105n04 CANCELLED AT 2025-09-13T19:11:13 DUE TO TIME LIMIT ***
slurmstepd: error: *** STEP 22222733.0 ON a3105n04 CANCELLED AT 2025-09-13T19:11:13 DUE TO TIME LIMIT ***

Topology information was not successfully read.

dokken · September 14, 2025, 11:51am

Could you provide your xdmf file?
It is very weird that it is able to read the geometry, but not the topology.

wanghao · September 14, 2025, 12:58pm

https://airportal.cn/171828/Rr9NUfZLHi
If you are unable to download, please feel free to contact me at any time
171828

dokken · September 14, 2025, 1:51pm

Just looking at the size of your grid:

<?xml version="1.0"?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="3.0" xmlns:xi="https://www.w3.org/2001/XInclude">
  <Domain>
    <Grid Name="mesh" GridType="Uniform">
      <Topology TopologyType="Hexahedron" NumberOfElements="342924" NodesPerElement="8">
        <DataItem Dimensions="342924 8" NumberType="Int" Format="HDF">big_cut.h5:/Mesh/mesh/topology</DataItem>
      </Topology>
      <Geometry GeometryType="XYZ">
        <DataItem Dimensions="353080 3" Format="HDF">big_cut.h5:/Mesh/mesh/geometry</DataItem>
      </Geometry>
    </Grid>
  </Domain>
</Xdmf>

why are you trying to use more than 2000 procs to read this? There will likely be more overhead in parallel communication than the reduced size per process will give you in speedup.

I do unfortunately not have a system with 2000+ procs available, so I won’t be able to reproduce the issue.

I’m assuming something goes wrong when you are calling: dolfinx/cpp/dolfinx/io/xdmf_mesh.cpp at d6a31042464a91a2465d34c94e18b3e0d6ee9788 · FEniCS/dolfinx · GitHub
and maybe this assertion: dolfinx/cpp/dolfinx/io/xdmf_utils.h at d6a31042464a91a2465d34c94e18b3e0d6ee9788 · FEniCS/dolfinx · GitHub

How did you install dolfinx on your cluster? Is there a possibility to run in debug mode (compile c++ part of DOLFINx so that assertions are not skipped).

wanghao · September 15, 2025, 3:33am

Although the grid is not large, with only 3 * 10 ^ 5 cells, there are also 2 * 10 ^ 6 dofs defined for the second-order scalar field above. I need to solve the mixed finite element equations of (u, p, B, T) four physical fields in this grid The total number of degrees of freedom is 20 million .I want to conduct strong scalability testing.

I installed dolfinx through conda on the cluster.version is 0.9

dokken · September 15, 2025, 8:44am

2000 processes for 20 million cells seems overboard, as you are down to 10 000 cells per process.

Do you have access to an HPC system with 2000 processes that uses conda?
I would strongly suggest using Spack on HPC systems, as conda is pre-built binaries not optmized for that particular system.

wanghao · September 15, 2025, 10:36am

Dear dokken,
Thank you for your suggestion.
I only want to conduct strong scalability testing to analyze the performance of the solver, and I won’t actually use 2000+processes to run the code.
Is this problem probably not caused by Conda?

dokken · September 15, 2025, 11:50am

It could be caused by conda, I am not certain.

As I’ve mentioned above, my guess is that something might go wrong in these lines:

however, it would suprise me, as there are still plenty of cells to distribute on each process (even if it is not enough cells to be efficient).

I would strongly advice not using conda on HPC systems, as it:

Doesn’t use the system compilers to make the code as efficient as it should
One easily get into mpi mismatch issues.

nate · September 15, 2025, 3:05pm

Second to what @dokken states, using Conda for an HPC build is typically a really bad idea. To get performance from HPC clusters you really have to tailor your build to the system. This is not trivial.

Have you tried solving your problem on a simple workstation? 20M DoFs is really not enough to warrant use of 2k processes.

Topic		Replies	Views
Instability of mesh read with HPC I/O	17	607	February 26, 2024
Problems paralelizing a mesh generated using gmsh python api General mesh , dolfinx , mpi , gmsh	9	255	August 5, 2024
Missing Cells When Reading in Mesh in Parallel General mesh , dolfinx	9	131	August 9, 2024
Dolfinx petsc error for large mesh Errors	44	1580	August 31, 2023
Mesh conversion (XDMF) in a MPI environment mesh	4	1823	June 16, 2020

Regarding the issue of being unable to import grids in the presence of too many processes（2000+）

Related topics