Regarding the issue of being unable to import grids in the presence of too many processes(2000+)

Dear all,
I don’t know if you have encountered the problem of not being able to import grids when using dolfinx for large-scale operations.I tested the 1024 process and 2048 process, and found that the 1024 process can import easily, but there will be no output in the 2048 process. Through my multiple output debugging attempts, I am confident that the code did not run smoothly when importing the grid.

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "grid.xdmf", "r")  
msh = read_mesh.read_mesh()   

The specific situation is that there is no output and no error is reported.

Could you try setting log level to debug?
dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG

You also break the command into multiple steps, see Line 94-118 in:

Thank dokken,
I try setting log level to debug

comm = MPI.COMM_WORLD  
rank=MPI.COMM_WORLD.Get_rank()  
if rank==0:
    dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG)

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "big_cut.xdmf" ,"r")      
msh=read_mesh.read_mesh()     
gdim = msh.geometry.dim     

There was no output and I killed it after running for 1 hour.
Could it be caused by too many processes reading the one grid file?

Set debugging on all processes.

Secondly, try first just reading the topology and geometry data, not actually creating the mesh (which calls a mesh partitioner), as illustrated in the referenced code above.

That should not be an issue. DOLFINx has been run with over 100 000 procs.

code and logs are as follows:

comm = MPI.COMM_WORLD  
rank=MPI.COMM_WORLD.Get_rank()  

dolfinx.log.set_log_level(dolfinx.log.LogLevel.DEBUG)

mpi_comm = MPI.COMM_WORLD
filename = "big_cut.xdmf"

with XDMFFile(MPI.COMM_SELF, filename, "r") as file:
    x_global = file.read_geometry_data()

with XDMFFile(MPI.COMM_WORLD, filename, "r") as file:
    cell_shape, cell_degree = file.read_cell_type()
    x = file.read_geometry_data()
    topo = file.read_topology_data()

num_local_coor = x.shape[0]
all_sizes = mpi_comm.allgather(num_local_coor)
all_sizes.insert(0, 0)
all_ranges = np.cumsum(all_sizes)

rank = mpi_comm.rank
assert np.all(x_global[all_ranges[rank] : all_ranges[rank + 1]] == x)

domain = ufl.Mesh(element("Lagrange", cell_shape.name, cell_degree, shape=(3,)))

read_mesh = io.XDMFFile(MPI.COMM_WORLD, "big_cut.xdmf" ,"r")      
msh=read_mesh.read_mesh()     
gdim = msh.geometry.dim   
srun: ROUTE: split_hostlist: hl=a3105n04,b2101r5n[1-8],b3201r3n[6-8],b3201r4n1,b3205r3n7,b3208r6n[3-4],b3302r3n[7-8],b3303r3n[3-4],b3303r4n[1-4],b3309r4n8,b3309r5n[1-2],b3309r7n[2-3],e2306r2n[1-3] tree_width 0
[2025-09-13 18:11:36.581] [info] Opened HDF5 file with id "72057594037927936"
[2025-09-13 18:11:36.581] [info] Opened HDF5 file with id "72057594037927936"
......
[2025-09-13 18:11:49.867] [info] HDF5 Read data rate: 43.15997436780609 MB/s
[2025-09-13 18:11:49.868] [info] HDF5 Read data rate: 44.45196286907281 MB/s
[2025-09-13 18:11:49.868] [info] HDF5 Read data rate: 43.764222893425725 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.18377704887125 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.25279702747072 MB/s
[2025-09-13 18:11:49.870] [info] HDF5 Read data rate: 44.23919898165143 MB/s

srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 22222733 ON a3105n04 CANCELLED AT 2025-09-13T19:11:13 DUE TO TIME LIMIT ***
slurmstepd: error: *** STEP 22222733.0 ON a3105n04 CANCELLED AT 2025-09-13T19:11:13 DUE TO TIME LIMIT ***

Topology information was not successfully read.

Could you provide your xdmf file?
It is very weird that it is able to read the geometry, but not the topology.

https://airportal.cn/171828/Rr9NUfZLHi
If you are unable to download, please feel free to contact me at any time
171828

Just looking at the size of your grid:

<?xml version="1.0"?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="3.0" xmlns:xi="https://www.w3.org/2001/XInclude">
  <Domain>
    <Grid Name="mesh" GridType="Uniform">
      <Topology TopologyType="Hexahedron" NumberOfElements="342924" NodesPerElement="8">
        <DataItem Dimensions="342924 8" NumberType="Int" Format="HDF">big_cut.h5:/Mesh/mesh/topology</DataItem>
      </Topology>
      <Geometry GeometryType="XYZ">
        <DataItem Dimensions="353080 3" Format="HDF">big_cut.h5:/Mesh/mesh/geometry</DataItem>
      </Geometry>
    </Grid>
  </Domain>
</Xdmf>

why are you trying to use more than 2000 procs to read this? There will likely be more overhead in parallel communication than the reduced size per process will give you in speedup.

I do unfortunately not have a system with 2000+ procs available, so I won’t be able to reproduce the issue.

I’m assuming something goes wrong when you are calling: dolfinx/cpp/dolfinx/io/xdmf_mesh.cpp at d6a31042464a91a2465d34c94e18b3e0d6ee9788 · FEniCS/dolfinx · GitHub
and maybe this assertion: dolfinx/cpp/dolfinx/io/xdmf_utils.h at d6a31042464a91a2465d34c94e18b3e0d6ee9788 · FEniCS/dolfinx · GitHub

How did you install dolfinx on your cluster? Is there a possibility to run in debug mode (compile c++ part of DOLFINx so that assertions are not skipped).

Although the grid is not large, with only 3 * 10 ^ 5 cells, there are also 2 * 10 ^ 6 dofs defined for the second-order scalar field above. I need to solve the mixed finite element equations of (u, p, B, T) four physical fields in this grid The total number of degrees of freedom is 20 million .I want to conduct strong scalability testing.

I installed dolfinx through conda on the cluster.version is 0.9

2000 processes for 20 million cells seems overboard, as you are down to 10 000 cells per process.

Do you have access to an HPC system with 2000 processes that uses conda?:open_mouth:
I would strongly suggest using Spack on HPC systems, as conda is pre-built binaries not optmized for that particular system.

Dear dokken,
Thank you for your suggestion.
I only want to conduct strong scalability testing to analyze the performance of the solver, and I won’t actually use 2000+processes to run the code.
Is this problem probably not caused by Conda?

It could be caused by conda, I am not certain.

As I’ve mentioned above, my guess is that something might go wrong in these lines:

however, it would suprise me, as there are still plenty of cells to distribute on each process (even if it is not enough cells to be efficient).

I would strongly advice not using conda on HPC systems, as it:

  • Doesn’t use the system compilers to make the code as efficient as it should
  • One easily get into mpi mismatch issues.

Second to what @dokken states, using Conda for an HPC build is typically a really bad idea. To get performance from HPC clusters you really have to tailor your build to the system. This is not trivial.

Have you tried solving your problem on a simple workstation? 20M DoFs is really not enough to warrant use of 2k processes.