Manual ffcx_jit compilation + assemble_matrix results in PETSC ERROR: Caught signal number 11 SEGV

Hello,

I have been working on a pipeline that combines custom kernels with a custom assembler. As part of my sanity tests, I am trying to manually compile a standard UFL formulation with ffcx_jit, then import it and assemble it. However, I am apparently doing something incorrectly, because at runtime I get the PETSc error "Caught signal number 11 SEGV". The minimal code is the following:

import numpy
import dolfinx
import ufl
from dolfinx.fem import Function, functionspace
from dolfinx.fem import (dirichletbc, locate_dofs_geometrical)
from dolfinx.fem.petsc import assemble_matrix as petsc_assemble_matrix
from dolfinx.jit import ffcx_jit
from mpi4py import MPI

mesh = dolfinx.mesh.create_box(MPI.COMM_WORLD, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [2,2,2])
V = functionspace(mesh, ("Lagrange", 1, (mesh.geometry.dim, )))

u_left = Function(V)
u_right = Function(V)

left_dofs = locate_dofs_geometrical(V, lambda x: numpy.isclose(x[0], 0.0))
right_dofs = locate_dofs_geometrical(V, lambda x: numpy.isclose(x[0], 1.0))

bcs = [dirichletbc(u_left, left_dofs), dirichletbc(u_right, right_dofs)]

# Trial and test functions
du = ufl.TrialFunction(V)
v = ufl.TestFunction(V)

# Functions
u = Function(V)

# Kinematics
d = 3
Id = ufl.Identity(d)
F = Id + ufl.grad(u)
C = F.T * F

# Invariants of deformation tensors
Ic = ufl.tr(C)
J = ufl.det(F)

# Elasticity parameters
E = 10.0
nu = 0.3
mu = E / (2 * (1 + nu))
lmbda = E * nu / ((1 + nu) * (1 - 2 * nu))

# Stored strain energy density (compressible neo-Hookean model)
psi = (mu / 2) * (Ic - 3) - mu * ufl.ln(J) + (lmbda / 2) * (J - 1) ** 2

# Total potential energy
Pi = psi * ufl.dx(mesh, metadata={"quadrature_degree": 2})

# First variation of Pi (directional derivative about u in the direction of v)
R = -ufl.derivative(Pi, u, v)

# Compute the Jacobian of the residual R
K = -ufl.derivative(R, u, du)

ufc_form_K, module_K, code_k = ffcx_jit(mesh.comm, K)
ffc_K_ptr = module_K.ffi.addressof(ufc_form_K)

# Prepare dolfinx.Form based on provided kernels
cells = numpy.arange(mesh.topology.index_map(mesh.topology.dim).size_local, dtype=numpy.int32)

ptrA = module_K.ffi.cast("intptr_t", ffc_K_ptr)
integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, ptrA, cells)]}

_a = dolfinx.cpp.fem.Form_float64([V._cpp_object, V._cpp_object], integrals_a, [], [], False, {}, None)
a = dolfinx.fem.Form(_a)

A_dolfinx_asm = petsc_assemble_matrix(a, bcs)

Thank you in advance for helping me to understand where my mistake is. 🙂


Can reproduce on main branch with the following minimal example:


from mpi4py import MPI


import numpy
import dolfinx
import ufl
from dolfinx.fem import assemble_scalar

from dolfinx.jit import ffcx_jit

mesh = dolfinx.mesh.create_unit_interval(MPI.COMM_SELF, 1)

# Scalar form: integrate 1 over the mesh
K = 1 * ufl.dx(domain=mesh)
ufc_form_K, module_K, code_k = ffcx_jit(mesh.comm, K)
ffc_K_ptr = module_K.ffi.addressof(ufc_form_K)

# Prepare dolfinx.Form based on provided kernels
cells = numpy.arange(mesh.topology.index_map(mesh.topology.dim).size_local, dtype=numpy.int32)

ptrA = module_K.ffi.cast("uintptr_t", ffc_K_ptr)
integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, ptrA, cells, numpy.empty(0, dtype=numpy.uint8))]}


_J = dolfinx.cpp.fem.Form_float64([], integrals_a, [], [], False, {}, mesh._cpp_object)
func = dolfinx.fem.Form(_J)

val = assemble_scalar(func)

Will investigate and get back to you.


The following works for me:

integrals_b = {dolfinx.fem.IntegralType.cell: [(-1, cells)]}
_a = dolfinx.cpp.fem.create_form_float64(ptrA,
        [V._cpp_object, V._cpp_object],
        {"w0": u._cpp_object},
        {},
        integrals_b,
        {},
        mesh._cpp_object)
a = dolfinx.fem.Form(_a)

Could you test it as well?
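For example, assembling as in your original snippet would be a quick check (untested sketch; the norm print is only an illustrative sanity test):

from dolfinx.fem.petsc import assemble_matrix as petsc_assemble_matrix

# Assemble the bilinear form created with create_form_float64 above
A = petsc_assemble_matrix(a, bcs)
A.assemble()
print(A.norm())  # illustrative check only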

Edit:
This is only possible on the main branch.
One thing I see as an issue is that you are not passing the coefficient u into your form.

Thank you Jorgen for having a closer look at this.

As I am currently working with version 0.8.0, I am not able to test your version immediately. I will first need to install the version from the main branch (I am relatively new to FEniCS). Of course, my preference would be to have it working in version 0.8.0…

In the original (not minimal) version I was actually passing the coefficients:

dolfinx.cpp.fem.Form_float64(
    [V._cpp_object, V._cpp_object],
    integrals_a,
    [u._cpp_object],
    [], False, {}, None)

so it does not seem to be the (fundamental) problem.

The pipeline above worked fine for me with custom JIT kernels, similar to the unit test
python/test/unit/fem/test_custom_jit_kernels.py
My goal is to have a single pipeline that works both for UFL forms and for custom kernels, so that I can validate the results and debug my custom assemblers. Therefore, the best outcome would be a solution that works the same way for both types of kernels.

I also looked at the kernel C code generated by ffcx_jit, and I could not find anything there that would explain the memory error (the loop ranges and the array accesses seemed correct).

You should be able to use something along the lines of:

integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, cells)]}
_J = dolfinx.cpp.fem.Form_float64(aq, [], [], [], integrals_a,  {}, mesh._cpp_object)

where one adapts the second list to contain the coefficients (the first list is the function spaces, the second the coefficients, the third the constants).
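For the bilinear form in your original post, I would expect this to look roughly as follows on the main branch (untested sketch, simply filling in the lists of the constructor call above; ptrA is the cffi pointer to the compiled ufcx_form, as in your earlier snippets):

# Rough sketch only: same constructor as above, with the lists filled in
integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, cells)]}
_a = dolfinx.cpp.fem.Form_float64(
    ptrA,                             # pointer to the compiled ufcx_form
    [V._cpp_object, V._cpp_object],   # function spaces
    [u._cpp_object],                  # coefficients
    [],                               # constants
    integrals_a,
    {},
    mesh._cpp_object)
a = dolfinx.fem.Form(_a)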

To me it looks like something goes wrong in the nanobind layer to C++.
I’ve made an issue at: C++ form creation causes segfault when going through form constructor · Issue #3411 · FEniCS/dolfinx · GitHub

Thank you Jorgen for proposing the alternative (working) constructor and for opening the GitHub issue.

While further investigating the problem, I discovered that the type of module_K.ffi.addressof(ufc_form_K) is cdata 'struct ufcx_form *', while in the unit test test_custom_jit_kernels.py, the return type of ffi.addressof(lib, "tabulate_tensor_poissonA") is something like cdata 'void(*)(double *, double *, double *, double *, int *, uint8_t *)'.

In order to make both approaches compatible, my workaround was to directly access the corresponding field of the struct ufcx_form *, i.e.
ffc_K_ptr = ufc_form_K.form_integrals[0].tabulate_tensor_float64
The whole minimal (working) example reads:

from mpi4py import MPI

import numpy
import dolfinx
import ufl
from dolfinx.fem import assemble_scalar

from dolfinx.jit import ffcx_jit

mesh = dolfinx.mesh.create_unit_interval(MPI.COMM_SELF, 1)

# Scalar form: integrate 1 over the mesh
K = 1 * ufl.dx(domain=mesh)
ufc_form_K, module_K, code_k = ffcx_jit(mesh.comm, K)
# ffc_K_ptr = module_K.ffi.addressof(ufc_form_K)
ffc_K_ptr = ufc_form_K.form_integrals[0].tabulate_tensor_float64

# Prepare dolfinx.Form based on provided kernels
cells = numpy.arange(mesh.topology.index_map(mesh.topology.dim).size_local, dtype=numpy.int32)

ptrA = module_K.ffi.cast("uintptr_t", ffc_K_ptr)
# integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, ptrA, cells, numpy.empty(0, dtype=numpy.uint8))]}
integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, ptrA, cells)]}

_J = dolfinx.cpp.fem.Form_float64([], integrals_a, [], [], False, {}, mesh._cpp_object)
func = dolfinx.fem.Form(_J)

val = assemble_scalar(func)

(Note the two commented-out lines that were replaced to make it work.)
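For completeness, I expect the same workaround carries over to the bilinear form from my first post roughly like this (untested sketch combining the snippets above, now also passing the coefficient u as in my earlier reply):

# Sketch only: pass the raw tabulate_tensor kernel instead of the ufcx_form pointer
ffc_K_ptr = ufc_form_K.form_integrals[0].tabulate_tensor_float64
ptrA = module_K.ffi.cast("uintptr_t", ffc_K_ptr)
integrals_a = {dolfinx.fem.IntegralType.cell: [(-1, ptrA, cells)]}
_a = dolfinx.cpp.fem.Form_float64(
    [V._cpp_object, V._cpp_object], integrals_a,
    [u._cpp_object],   # coefficient u, as in the non-minimal version
    [], False, {}, None)
a = dolfinx.fem.Form(_a)
A = petsc_assemble_matrix(a, bcs)
A.assemble()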

Is there a more proper (and safer) way to do the same (in version 0.8.0)?

Right, that is what goes wrong.

I guess it is a shame that nanobind doesn’t pick up the invalid cast.

I guess that depends on what you want to do with the forms.
Now that you know that the sanity test only requires you to pass the tabulate_tensor_float64 kernel, which makes much more sense from a “custom”-assembler perspective, I am not sure what more you would want.

I would need to see an actual use-case to understand what would be more “proper”.

Thank you for confirming. And thank you again for your support in resolving this issue.

My only hesitation is that ufc_form_K.form_integrals[0] relies heavily on the low-level representation of the data structures. But I guess there is not too much to improve in that particular case, as the example itself is a bit non-standard.

I would then use the other constructor, as it is more “high-level”, i.e. the create_form_float64 approach shown above.

True, it works for the standard petsc_assemble_matrix(...). But if I want to call the tabulate tensor itself in my custom assembler, then I need direct access to ufc_form_K.form_integrals[0].tabulate_tensor_float64. In the unit test for custom assemblers I can see:

    # Get the one and only kernel
    kernel = getattr(ufcx_form.form_integrals[0], f"tabulate_tensor_{np.dtype(dtype).name}")

which partially confirms that this is the preferred way to do it. I was just wondering whether there is anything more “high-level” that I could use for that purpose. If not, I can live with what is available 🙂
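For my own use, I will probably wrap that pattern in a small helper along these lines (just a sketch of mine; the name extract_kernel is my own and not a dolfinx API):

import numpy as np
from dolfinx.jit import ffcx_jit

def extract_kernel(comm, form, dtype=np.float64, integral_index=0):
    """Compile a UFL form with ffcx_jit and return the raw tabulate_tensor
    kernel for the requested scalar type, plus the compiled objects."""
    ufcx_form, module, code = ffcx_jit(comm, form)
    kernel = getattr(ufcx_form.form_integrals[integral_index],
                     f"tabulate_tensor_{np.dtype(dtype).name}")
    return kernel, ufcx_form, module

# e.g. kernel, ufcx_form, module = extract_kernel(mesh.comm, K)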