Hi all, I am trying to generate a mesh with DOLFINX in parallel. The MWE is as follows:

```python
from mpi4py import MPI
import dolfinx
from dolfinx.cpp.mesh import CellType

mesh = dolfinx.BoxMesh(MPI.COMM_WORLD, [[0.0, 0.0, 0.0], [200, 200, 200]],
                       [96, 96, 96], CellType.hexahedron)
```
It works normally on 14 cores or fewer, e.g. `mpirun -n 14 python3 test.py`. However, on 15 or more cores, e.g. `mpirun -n 15 python3 test.py`, it fails with the following error:
```text
Assertion failed in file src/mpi/comm/comm_rank.c at line 55: 0
Assertion failed in file src/mpi/comm/comm_rank.c at line 55: 0
Assertion failed in file src/mpi/comm/comm_rank.c at line 55: 0
/lib/x86_64-linux-gnu/libmpich.so.12(MPL_backtrace_show+0x39) [0x7f8925586069]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x29bfe8) [0x7f89254e2fe8]
/lib/x86_64-linux-gnu/libmpich.so.12(MPI_Comm_rank+0x218) [0x7f89253e3fb8]
/usr/local/petsc/linux-gnu-real-32/lib/libpetsc.so.3.16(PetscFPrintf+0x9e) [0x7f891e7ce92e]
/usr/local/petsc/linux-gnu-real-32/lib/libpetsc.so.3.16(PetscErrorPrintfDefault+0x9e) [0x7f891e89e71e]
/usr/local/petsc/linux-gnu-real-32/lib/libpetsc.so.3.16(PetscSignalHandlerDefault+0x149) [0x7f891e89f889]
/usr/local/petsc/linux-gnu-real-32/lib/libpetsc.so.3.16(+0x17fad7) [0x7f891e89fad7]
/lib/x86_64-linux-gnu/libc.so.6(+0x41040) [0x7f8926187040]
/lib/x86_64-linux-gnu/libc.so.6(+0x1831cc) [0x7f89262c91cc]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x28eb99) [0x7f89254d5b99]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x326c82) [0x7f892556dc82]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x327e97) [0x7f892556ee97]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x32552e) [0x7f892556c52e]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x31caf0) [0x7f8925563af0]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x1e2e7e) [0x7f8925429e7e]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x1e3240) [0x7f892542a240]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x27b7d1) [0x7f89254c27d1]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x217da2) [0x7f892545eda2]
/lib/x86_64-linux-gnu/libmpich.so.12(+0x15893e) [0x7f892539f93e]
/lib/x86_64-linux-gnu/libmpich.so.12(PMPI_Alltoallv+0xaf1) [0x7f89253a04b1]
/usr/local/dolfinx-real/lib/libdolfinx.so.0.3(_ZN7dolfinx5graph5build10distributeEiRKNS0_13AdjacencyListIlEERKNS2_IiEE+0x89a) [0x7f89200aeeaa]
/usr/local/dolfinx-real/lib/libdolfinx.so.0.3(_ZN7dolfinx4mesh11create_meshEiRKNS_5graph13AdjacencyListIlEERKNS_3fem17CoordinateElementERKN2xt17xtensor_containerINSA_7uvectorIdSaIdEEELm2ELNSA_11layout_typeE1ENSA_22xtensor_expression_tagEEENS0_9GhostModeERKSt8functionIFKNS2_IiEEiiiS5_SK_EE+0x15f) [0x7f8920127abf]
/usr/local/dolfinx-real/lib/libdolfinx.so.0.3(+0x10605d) [0x7f892008805d]
/usr/local/dolfinx-real/lib/libdolfinx.so.0.3(_ZN7dolfinx10generation7BoxMesh6createEiRKSt5arrayIS2_IdLm3EELm2EES2_ImLm3EENS_4mesh8CellTypeENS8_9GhostModeERKSt8functionIFKNS_5graph13AdjacencyListIiEEiiiRKNSD_IlEESA_EE+0x85) [0x7f89200884d5]
/usr/local/dolfinx-real/lib/python3.8/dist-packages/dolfinx/cpp.cpython-39-x86_64-linux-gnu.so(+0x128e94) [0x7f892032ae94]
/usr/local/dolfinx-real/lib/python3.8/dist-packages/dolfinx/cpp.cpython-39-x86_64-linux-gnu.so(+0x4e443) [0x7f8920250443]
python3() [0x54350c]
python3(_PyObject_MakeTpCall+0x39b) [0x521d6b]
python3(_PyEval_EvalFrameDefault+0x5be8) [0x51b9f8]
python3() [0x514a75]
python3(_PyFunction_Vectorcall+0x342) [0x52d302]
python3(_PyEval_EvalFrameDefault+0x559c) [0x51b3ac]
internal ABORT - process 0
```
Could anyone please tell me what is going wrong?
Another, possibly related, question: I have a code that solves a nonlinear elasticity problem. It works perfectly on a 48x48x48 mesh, but on a 96x96x96 mesh the residual is -NaN in the first Newton iteration. I have tried all the remedies mentioned on this forum, including avoiding division by zero, adding a small number inside `ufl.sqrt()`, using a random initial guess, and so on, but none of them fixed it. Could this be related to the problem above?
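For clarity, by "adding a small number in `ufl.sqrt()`" I mean a regularization like the sketch below (illustrated with plain Python floats rather than UFL expressions; the `EPS` value is just illustrative, and in the real code the same pattern is applied to the UFL expression inside the residual):

```python
import math

EPS = 1.0e-12  # small regularization constant (illustrative value)

def safe_sqrt(x):
    """sqrt shifted away from zero, so the derivative 1/(2*sqrt(x))
    stays finite as x -> 0 (a common source of NaN in Newton steps)."""
    return math.sqrt(x + EPS)

print(safe_sqrt(0.0))  # finite small value instead of a derivative blow-up at 0
```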
Thanks!