Read_meshtags causing invalid rank issues

Hi, I have a 3D mesh with mesh tags that I've converted from Gmsh. I import it as follows:

from mpi4py import MPI
from dolfinx.io import XDMFFile

print('loading mesh')
with XDMFFile(MPI.COMM_WORLD, './mesh/AoA0_v2naca0012_AR2.xdmf', 'r') as xdmf:
    mesh = xdmf.read_mesh(name='Grid')
    MPI.COMM_WORLD.Barrier()
    print('loaded mesh', flush=True)
    ct = xdmf.read_meshtags(mesh, name='Grid')
    print('loaded mesh tags', flush=True)
MPI.COMM_WORLD.Barrier()

I launch it with mpirun -n 1 python3 test.py. But for some reason the import fails on some meshes, where the only change was the value I gave to the cell sizing in gmsh.model.geo.add_point(x, y, z, sizing) for a number of defined points.
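
For context, the only thing that changes between the two meshes is that per-point sizing value. Below is a minimal 2D sketch of that kind of setup, assuming the standard gmsh Python API (where the call is gmsh.model.geo.addPoint); the geometry here is just a placeholder square, not my actual 3D case:

import gmsh

gmsh.initialize()
gmsh.model.add("sizing_sketch")

sizing = 0.05  # per-point target mesh size; this is the value that differs between the meshes

# Each point takes the target mesh size as its fourth argument
p1 = gmsh.model.geo.addPoint(0.0, 0.0, 0.0, sizing)
p2 = gmsh.model.geo.addPoint(1.0, 0.0, 0.0, sizing)
p3 = gmsh.model.geo.addPoint(1.0, 1.0, 0.0, sizing)
p4 = gmsh.model.geo.addPoint(0.0, 1.0, 0.0, sizing)

l1 = gmsh.model.geo.addLine(p1, p2)
l2 = gmsh.model.geo.addLine(p2, p3)
l3 = gmsh.model.geo.addLine(p3, p4)
l4 = gmsh.model.geo.addLine(p4, p1)
loop = gmsh.model.geo.addCurveLoop([l1, l2, l3, l4])
surf = gmsh.model.geo.addPlaneSurface([loop])

gmsh.model.geo.synchronize()
gmsh.model.addPhysicalGroup(2, [surf], tag=1)  # physical groups become the mesh tags after conversion

gmsh.model.mesh.generate(2)
gmsh.write("sizing_sketch.msh")
gmsh.finalize()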

Here I have provided the two meshes on Google Drive.

The one labelled v1 runs to completion, albeit with [WARNING] yaksa: 1 leaked handle pool objects, but the one labelled v2 produces the following output:

loading mesh
loaded mesh
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffdfdd813b3, count=1, MPI_BYTE, 1, 1, comm=0x84000001, request=0x55fb920560c4) failed
internal_Issend(78).: Invalid rank has value 1 but must be nonnegative and less than 1
Abort(943321862) on node 0 (rank 0 in comm 464): application called MPI_Abort(comm=0x84000001, 943321862) - process 0

If I don't pass flush=True to the print calls, it doesn't even print loaded mesh.

This occurs with a conda install of the dev version from early this year, which uses OpenMPI installed via conda, while the cluster provides Intel MPI (impi). I also tried a later Spack install that uses the cluster's OpenMPI, and this is its output:

MPI_ERR_RANK: invalid rank
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 5 DUP FROM 3
with errorcode 6.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
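
Since the environment mixes conda-provided and cluster-provided MPI libraries, one diagnostic that may be worth running is checking which MPI implementation mpi4py is actually linked against. A minimal check, using mpi4py's MPI.Get_library_version and MPI.get_vendor (both available in recent mpi4py releases):

from mpi4py import MPI

# Print the MPI implementation string on rank 0 only, e.g.
# "Open MPI v4.1.x ..." or "Intel(R) MPI Library ...".
if MPI.COMM_WORLD.rank == 0:
    print(MPI.Get_library_version())
    print("vendor:", MPI.get_vendor())

A mismatch between the MPI that dolfinx was built against and the mpirun used to launch the job would at least be consistent with communicator/rank errors like the ones above.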

I also sometimes see the following when trying a denser mesh with multiple processes spread over multiple nodes:

Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffca70f1883, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x5574f8faaad4) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(741995270) on node 188 (rank 188 in comm 416): application called MPI_Abort(comm=0xC4000001, 741995270) - process 188
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffd923518c3, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x563fa4f59944) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(406450950) on node 235 (rank 235 in comm 416): application called MPI_Abort(comm=0xC4000001, 406450950) - process 235
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffff129ebf3, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x55722c38b770) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(876212998) on node 242 (rank 242 in comm 416): application called MPI_Abort(comm=0xC4000001, 876212998) - process 242
Abort(413215375) on node 242 (rank 242 in comm 448): Fatal error in internal_Barrier: Other MPI error, error stack:
internal_Barrier(84).......................: MPI_Barrier(comm=0x84000007) failed
MPID_Barrier(167)..........................: 
MPIDI_Barrier_allcomm_composition_json(132): 
MPIDI_POSIX_mpi_bcast(224).................: 
MPIR_Bcast_impl(444).......................: 
MPIR_Bcast_allcomm_auto(370)...............: 
MPIR_Bcast_intra_binomial(105).............: 
MPIC_Recv(187).............................: 
MPIC_Wait(64)..............................: 
MPIR_Wait_state(886).......................: 
MPID_Progress_wait(335)....................: 
MPIDI_progress_test(158)...................: 
MPIDI_OFI_handle_cq_error(625).............: OFI poll failed (ofi_events.c:627:MPIDI_OFI_handle_cq_error:Input/output error)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 282872 RUNNING AT n688
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:14@n679] HYD_pmcd_pmip_control_cmd_cb (proxy/pmip_cb.c:480): assert (!closed) failed
[proxy:0:14@n679] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:14@n679] main (proxy/pmip.c:127): demux engine error waiting for event

Finally, I tried a Spack install of dolfinx 0.6.0 on a different cluster with Spack-installed OpenMPI, and it reports a segmentation violation. On one occasion, though, a mesh that wasn't working got past that segmentation violation, only to segfault on the read_meshtags call for the facets.

Any insight on this would be much appreciated!

Bumping this to see if anyone has any insights. I've also found a mesh that works with 1 process but not with 264 processes. Maybe it's related to the extra degrees of freedom from the ghost values? I don't think it's a memory issue, though, since I have 3.5 TB of memory shared between the nodes and the reported utilisation is often very low.
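
One experiment that might narrow down whether ghosting is involved would be to read the mesh with a different ghost mode. A sketch, assuming read_mesh accepts a ghost_mode argument as in recent dolfinx versions (worth double-checking against the installed API):

from mpi4py import MPI
from dolfinx.io import XDMFFile
from dolfinx.mesh import GhostMode

# Same read as before, but without shared-facet ghosting, to see whether
# the invalid-rank abort is tied to the ghost exchange during distribution.
with XDMFFile(MPI.COMM_WORLD, './mesh/AoA0_v2naca0012_AR2.xdmf', 'r') as xdmf:
    mesh = xdmf.read_mesh(name='Grid', ghost_mode=GhostMode.none)
    ct = xdmf.read_meshtags(mesh, name='Grid')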

Hello @hermanmak and @dokken,

I am getting the same issue while reading the meshtags, whereas there is no issue reading the .msh from Gmsh and writing it to an .xdmf file.

MWE:

from mpi4py import MPI
from petsc4py import PETSc
import numpy as np
import dolfinx
import meshio

dtype = PETSc.ScalarType  # type: ignore
comm = MPI.COMM_WORLD
rank = comm.rank

def create_mesh(mesh, cell_type, prune_z=False):
    cells = mesh.get_cells_type(cell_type)
    cell_data = mesh.get_cell_data("gmsh:physical", cell_type)
    points = mesh.points[:, :2] if prune_z else mesh.points
    out_mesh = meshio.Mesh(points=points, cells={cell_type: cells}, cell_data={"name_to_read": [cell_data.astype(np.int32)]})
    return out_mesh

if rank == 0:
    # Read in mesh
    domain = meshio.read("unsymm2.msh")
    # Create and save one file for the mesh, and one file for the facets
    tetra_mesh = create_mesh(domain, "tetra", prune_z=False)
    tri_mesh = create_mesh(domain, "triangle", prune_z=False)
    meshio.write("mesh.xdmf", tetra_mesh)
    meshio.write("mt.xdmf", tri_mesh)
MPI.COMM_WORLD.barrier()

with dolfinx.io.XDMFFile(MPI.COMM_WORLD, "mesh.xdmf", "r") as xdmf:
    domain = xdmf.read_mesh(name="Grid")
    ct = xdmf.read_meshtags(domain, name="Grid")  # cell tags
domain.topology.create_connectivity(domain.topology.dim - 1, domain.topology.dim)
with dolfinx.io.XDMFFile(MPI.COMM_WORLD, "mt.xdmf", "r") as xdmf:
    facet_tag = xdmf.read_meshtags(domain, name="Grid")  # facet tags
print(facet_tag.indices, facet_tag.values)

Error:

Invalid rank, error stack:
internal_Issend(60788): MPI_Issend(buf=0x559cd6a0f041, count=1, MPI_BYTE, 1, 1, comm=0x84000001, request=0x559cd584b5b4) failed
internal_Issend(60749): Invalid rank has value 1 but must be nonnegative and less than 1
Abort(876217094) on node 0 (rank 0 in comm 464): application called MPI_Abort(comm=0x84000001, 876217094) - process 0
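
As a cross-check, reading the .msh directly with dolfinx.io.gmshio would sidestep the meshio/XDMF conversion entirely. A minimal sketch, assuming dolfinx >= 0.6 where gmshio.read_from_msh is available:

from mpi4py import MPI
from dolfinx.io import gmshio

# Read the Gmsh file on rank 0 and distribute over the communicator;
# the cell and facet tags are built from the physical groups in the file.
domain, cell_tags, facet_tags = gmshio.read_from_msh(
    "unsymm2.msh", MPI.COMM_WORLD, rank=0, gdim=3)

print(facet_tags.indices, facet_tags.values)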

I am attaching a link where you can find the .geo and .msh files for reference.
I would greatly appreciate your help in resolving this issue.

https://drive.google.com/drive/folders/1HSfOmvsDF_G2TPDTPqPNb_kGpRn3FkuF?usp=sharing

Thanks!