[dolfin] transfer matrix crash with petsc 3.20

The conda-forge fenics package has been updated to petsc 3.20, which seems to break the transfer matrix, as shown by this sample code:

from fenics import *

mesh = UnitSquareMesh(8,8)
V = FunctionSpace(mesh, 'CG', 1)
W = FunctionSpace(mesh, 'CG', 2)

transfer_matrix = as_backend_type(PETScDMCollection.create_transfer_matrix(V, W)).mat()
_, temp = transfer_matrix.getVecs() # fails

Originally opened on GitHub.

In particular, creating the transfer matrix at first fails with a segfault, which I ultimately tracked down to PetscInitialize not having been called. PETSc should be initialized before any PETSc objects are used, so that looks like a bug in fenics, though I’m not sure how it hasn’t come up before.

That step can be fixed easily enough with

from mpi4py import MPI
from petsc4py import PETSc

before importing fenics. However, that only gets us one step further, failing with a new error in the same step:

PetscDolfinErrorHandler: line '208', function 'PetscCommDuplicate', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/tagm.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------
MPI error 70865157 Invalid communicator, error stack:
                MPII_Comm_get_attr(85): MPI_Comm_get_attr(comm=0x84000003, keyval=0xa4400000, attribute_val=0x305e76450, flag=0x305e76444) failed
                MPII_Comm_get_attr(58): Invalid communicator
------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '51', function 'PetscHeaderCreate_Private', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/inherit.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '26', function 'PetscHeaderCreate_Function', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/sys/objects/inherit.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '101', function 'VecCreateWithLayout_Private', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/vec/vec/interface/veccreate.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

------------------------------------------------------------------------------
PetscDolfinErrorHandler: line '9468', function 'MatCreateVecs', file '/Users/runner/miniforge3/conda-bld/petsc_1696941223631/work/src/mat/interface/matrix.c',
                       : error code '98' (General MPI error), message follows:
------------------------------------------------------------------------------

which I’ve ultimately tracked down to the MPI Comm in the matrix’s PetscLayout being invalid, even though the MPI Comm in the PetscMatrix itself is still fine. I suspect it’s related to either initialization or premature finalization of something, but there are too many layers for me to tell whether fenics is doing something wrong that relied on undefined behavior, or whether there’s just a bug in petsc.

I’m not sure what has changed in petsc 3.20 that caused these failures, but it worked fine with petsc 3.19. All of the dolfin transfer matrix tests still pass, despite this.
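For anyone comparing a working 3.19 environment against a failing 3.20 one, a small check like the following may help confirm which PETSc version is loaded and whether it reports itself as initialized. This is only a sketch: `petsc_status` is a hypothetical helper name, not part of fenics or petsc4py, and it assumes petsc4py's `PETSc.Sys.getVersion()` and `PETSc.Sys.isInitialized()` behave as documented. It does not touch fenics at all, so it cannot reproduce the crash itself.

```python
import importlib.util


def petsc_status():
    """Report petsc4py availability, PETSc version, and initialization state.

    Hypothetical helper for comparing environments; not part of fenics.
    """
    status = {"available": False, "version": None, "initialized": None}

    # Return early when petsc4py is not installed in this environment.
    if importlib.util.find_spec("petsc4py") is None:
        return status

    try:
        # Importing petsc4py.PETSc is expected to initialize PETSc,
        # which is the same effect the import workaround relies on.
        from petsc4py import PETSc
    except ImportError:
        return status

    status["available"] = True
    # getVersion() returns a (major, minor, micro) tuple.
    status["version"] = PETSc.Sys.getVersion()
    status["initialized"] = PETSc.Sys.isInitialized()
    return status


if __name__ == "__main__":
    print(petsc_status())
```

Running this in both environments before importing fenics should at least make the version difference and the initialization state explicit in a bug report.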

I’ve tried to extract the transfer matrix construction into a standalone C++ program, but that version apparently lacks whatever key part triggers the crash.

More info in:
