Trying to convert mesh from MPI.COMM_SELF to MPI.COMM_WORLD

I’m working on a simulation that needs certain mesh refinements to be done before any computations are made. With the number of cores required, performing these refinements in parallel is too slow, so I’m trying to do them in serial and then partition the mesh to benefit from multiprocessing.

Here is the code I’m currently working on. It runs fine on a single core, but throws uninformative malloc() errors when I run it with more cores.

        comm = MPI.COMM_WORLD
        serial_mesh = self.geometry.mesh

        rank = comm.rank

        if rank == 0:
            dim = serial_mesh.topology.dim
            cell_name = serial_mesh.ufl_cell().cellname()
            coords = serial_mesh.geometry.x
            # Build cell-to-vertex connectivity, then read it as an array
            serial_mesh.topology.create_connectivity(dim, 0)
            cells = serial_mesh.topology.connectivity(dim, 0).array.reshape(-1, dim + 1)
            coords_shape = coords.shape
            cells_shape = cells.shape
            
        else:
            dim = None
            cell_name = None
            coords_shape = None
            cells_shape = None
        
        # Broadcast metadata
        dim = comm.bcast(dim, root=0)
        cell_name = comm.bcast(cell_name, root=0)
        coords_shape = comm.bcast(coords_shape, root=0)
        cells_shape = comm.bcast(cells_shape, root=0)

        # Allocate buffers
        if rank != 0:
            coords = np.empty(coords_shape, dtype=np.float64)
            cells = np.empty(cells_shape, dtype=np.int32)
        
        coords = comm.bcast(coords, root=0)
        cells = comm.bcast(cells, root=0)
        coords = coords[:, :dim]
        gdim = coords.shape[1]

        # Create vector Lagrange element for coordinates
        element = basix.ufl.element("Lagrange", cell_name, 1, shape=(gdim,))

        # Wrap into coordinate element 
        coord_element = ufl.Mesh(element)
        
        # Create parallel mesh from broadcasted data
        
        print(f"{rank}: Cell_Shape {cells.shape}, Coord_Shape {coords.shape}", flush= True)
        parallel_mesh = mesh.create_mesh(comm, cells, coords, coord_element)

Here, the mesh assigned to serial_mesh is the fully refined mesh, created with MPI.COMM_SELF on rank 0.

Most of the refinement algorithm (Plaza) in DOLFINx should scale fine with an increasing number of processes. There is some overhead when doing non-uniform refinement, where one has to iterate between processes to check whether a shared edge has been refined on another process.
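In other words, you could typically just refine the already-distributed mesh directly instead of rebuilding it. A minimal sketch of uniform refinement in parallel (the exact return value of dolfinx.mesh.refine differs between DOLFINx releases, so treat this as illustrative rather than definitive):

        from mpi4py import MPI
        import dolfinx

        msh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 10, 10)
        for _ in range(3):
            # Edges must exist before refinement
            msh.topology.create_entities(1)
            refined = dolfinx.mesh.refine(msh)
            # Newer releases return (mesh, parent_cells, parent_facets)
            msh = refined[0] if isinstance(refined, tuple) else refined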

Additionally, as you haven’t provided a mesh in the code above, it is hard to say whether the issue is with the code as it is, or with serial_mesh. Could you modify your code to do the same with serial_mesh = dolfinx.mesh.create_unit_square(MPI.COMM_SELF, 10, 10)?

Please also include all import statements so that the code is fully reproducible.
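For reference, the imports your snippet appears to rely on would be roughly the following (plus whatever your refinement code needs):

        from mpi4py import MPI
        import numpy as np
        import basix.ufl
        import ufl
        import dolfinx
        from dolfinx import mesh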

Why are you broadcasting all the cells to all the processes?
This seems problematic.
See for instance:

for a tutorial on using create_mesh in parallel.
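In short: the cells and coordinates passed to create_mesh are treated as each rank’s local contribution to the global mesh, which the partitioner then redistributes, so if every rank passes the full arrays each cell ends up duplicated once per process. A rough, untested sketch of the intended pattern, reusing the variable names from your snippet (keep the metadata broadcasts, drop the bcast of the full coords/cells arrays):

        if comm.rank == 0:
            cells_in = cells      # full connectivity, shape (num_cells, nodes_per_cell)
            coords_in = coords    # full coordinates, shape (num_points, dim)
        else:
            # Other ranks contribute nothing, but the arrays still need the
            # correct number of columns (and a global-index integer type)
            cells_in = np.empty((0, cells_shape[1]), dtype=np.int64)
            coords_in = np.empty((0, dim), dtype=np.float64)

        # create_mesh partitions the rank-0 data and distributes it over comm
        parallel_mesh = mesh.create_mesh(comm, cells_in, coords_in, coord_element)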


Ahh, I see what I did wrong now. Thanks!

Adding the now-correct code to the post could help others with similar issues in the future :)