Hi, I have a 3D mesh that I've converted from gmsh with mesh tags (the conversion step is sketched further below). I import it as follows:

```python
from mpi4py import MPI
from dolfinx.io import XDMFFile

print('loading mesh')
with XDMFFile(MPI.COMM_WORLD, './mesh/AoA0_v2naca0012_AR2.xdmf', 'r') as xdmf:
    mesh = xdmf.read_mesh(name='Grid')
    MPI.COMM_WORLD.Barrier()
    print('loaded mesh', flush=True)
    ct = xdmf.read_meshtags(mesh, name='Grid')
    print('loaded mesh tags', flush=True)
    MPI.COMM_WORLD.Barrier()
```
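For reference, the conversion from the gmsh `.msh` file to XDMF was done along the lines of the standard meshio recipe; this is only a rough sketch with placeholder file names, not the exact script:

```python
import meshio

def create_mesh(mesh, cell_type):
    # Extract cells of one type together with their gmsh physical tags
    cells = mesh.get_cells_type(cell_type)
    cell_data = mesh.get_cell_data("gmsh:physical", cell_type)
    return meshio.Mesh(points=mesh.points,
                       cells={cell_type: cells},
                       cell_data={"name_to_read": [cell_data]})

# Placeholder file names -- the actual v1/v2 files are named differently
msh = meshio.read("naca0012.msh")
meshio.write("AoA0_v2naca0012_AR2.xdmf", create_mesh(msh, "tetra"))
```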
I launch it with `mpirun -n 1 python3 test.py`.
But for some reason the import fails on some meshes, even though the only change between them is the value I pass as the cell sizing in `gmsh.model.geo.add_point(x, y, z, sizing)` for a number of the defined points (sketched below).
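To illustrate, the only thing that differs between the two geometries is the sizing argument passed to the point definitions; a minimal sketch with placeholder coordinates and values:

```python
import gmsh

gmsh.initialize()
gmsh.model.add("naca0012")

# The only difference between the v1 and v2 meshes is this target element
# size passed to the point definitions (placeholder value shown here)
sizing = 0.05
p = gmsh.model.geo.add_point(0.0, 0.0, 0.0, sizing)
# ... remaining points, curves, surfaces, volume and physical groups ...

gmsh.model.geo.synchronize()
gmsh.model.mesh.generate(3)
```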
I have provided the two meshes on a Google Drive. The one labelled v1 runs to completion, albeit with `[WARNING] yaksa: 1 leaked handle pool objects`, but the one labelled v2 gives the following output:
```
loading mesh
loaded mesh
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffdfdd813b3, count=1, MPI_BYTE, 1, 1, comm=0x84000001, request=0x55fb920560c4) failed
internal_Issend(78).: Invalid rank has value 1 but must be nonnegative and less than 1
Abort(943321862) on node 0 (rank 0 in comm 464): application called MPI_Abort(comm=0x84000001, 943321862) - process 0
```
If I don't put `flush=True` on the print calls, it doesn't even print `loaded mesh`.
This occurs with a conda install of the dev version from early this year, which uses OpenMPI installed via conda together with the impi installed on the cluster. I also tried a later Spack install that uses the OpenMPI installed on the cluster, and this is the output:
```
MPI_ERR_RANK: invalid rank
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 5 DUP FROM 3
with errorcode 6.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
```
I also sometimes see the following when running a denser mesh with multiple processes spread over multiple nodes:
```
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffca70f1883, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x5574f8faaad4) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(741995270) on node 188 (rank 188 in comm 416): application called MPI_Abort(comm=0xC4000001, 741995270) - process 188
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffd923518c3, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x563fa4f59944) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(406450950) on node 235 (rank 235 in comm 416): application called MPI_Abort(comm=0xC4000001, 406450950) - process 235
Invalid rank, error stack:
internal_Issend(118): MPI_Issend(buf=0x7ffff129ebf3, count=1, MPI_BYTE, 264, 1, comm=0xc4000001, request=0x55722c38b770) failed
internal_Issend(78).: Invalid rank has value 264 but must be nonnegative and less than 264
Abort(876212998) on node 242 (rank 242 in comm 416): application called MPI_Abort(comm=0xC4000001, 876212998) - process 242
Abort(413215375) on node 242 (rank 242 in comm 448): Fatal error in internal_Barrier: Other MPI error, error stack:
internal_Barrier(84).......................: MPI_Barrier(comm=0x84000007) failed
MPID_Barrier(167)..........................:
MPIDI_Barrier_allcomm_composition_json(132):
MPIDI_POSIX_mpi_bcast(224).................:
MPIR_Bcast_impl(444).......................:
MPIR_Bcast_allcomm_auto(370)...............:
MPIR_Bcast_intra_binomial(105).............:
MPIC_Recv(187).............................:
MPIC_Wait(64)..............................:
MPIR_Wait_state(886).......................:
MPID_Progress_wait(335)....................:
MPIDI_progress_test(158)...................:
MPIDI_OFI_handle_cq_error(625).............: OFI poll failed (ofi_events.c:627:MPIDI_OFI_handle_cq_error:Input/output error)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 282872 RUNNING AT n688
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:14@n679] HYD_pmcd_pmip_control_cmd_cb (proxy/pmip_cb.c:480): assert (!closed) failed
[proxy:0:14@n679] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:14@n679] main (proxy/pmip.c:127): demux engine error waiting for event
```
Finally, I tried it with a Spack install of dolfinx 0.6.0 on a different cluster, with a Spack-installed OpenMPI, and there it reports a segmentation violation. Although on one occasion a mesh that wasn't working got past that segmentation violation and then segfaulted on the `read_meshtags` call for the facet tags (that read follows the usual pattern, sketched below).
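A minimal sketch of the facet-tag read, assuming the facet tags live in a separate XDMF file (file and tag names here are placeholders):

```python
from mpi4py import MPI
from dolfinx.io import XDMFFile

with XDMFFile(MPI.COMM_WORLD, './mesh/AoA0_v2naca0012_AR2.xdmf', 'r') as xdmf:
    mesh = xdmf.read_mesh(name='Grid')
    ct = xdmf.read_meshtags(mesh, name='Grid')

# Facet-to-cell connectivity is needed before reading facet tags
mesh.topology.create_connectivity(mesh.topology.dim - 1, mesh.topology.dim)

with XDMFFile(MPI.COMM_WORLD, './mesh/facet_tags.xdmf', 'r') as xdmf:
    ft = xdmf.read_meshtags(mesh, name='Grid')
```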
Any insight on this would be much appreciated!