Malloc while assembling Jacobian in parallel

Hello everyone,
I am trying to implement a fluid-solid interaction problem using dolfinx (0.6.0) and multiphenicsx.
I used gmsh to create the mesh and converted it to xdmf using meshio. It consists of a fluid region (blue) and a solid region (red) separated by an interior boundary (with associated measure dS). It is available here.
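For context, the conversion used the usual meshio recipe, roughly along these lines (a sketch; the .msh file name and the helper function are illustrative, not my exact script):

import meshio

def extract_mesh(mesh, cell_type, prune_z=True):
    # Keep only the cells of the requested type together with their gmsh physical tags
    cells = mesh.get_cells_type(cell_type)
    cell_data = mesh.get_cell_data("gmsh:physical", cell_type)
    points = mesh.points[:, :2] if prune_z else mesh.points
    return meshio.Mesh(points=points, cells={cell_type: cells},
                       cell_data={"name_to_read": [cell_data]})

msh = meshio.read("flatplate_JFM_FSI.msh")
meshio.write("flatplate_JFM_FSI.xdmf", extract_mesh(msh, "triangle"))      # cells + subdomain tags
meshio.write("flatplate_JFM_FSI_facets.xdmf", extract_mesh(msh, "line"))   # facet tags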

I have a working version that gives the expected results when I run the script on one processor. When I try to run it on multiple procs using mpirun, however, it throws the following error (here for 2 procs), which I have been trying to work around without success:

Traceback (most recent call last):
File "/stck/ppenet/Scripts/RANS_flatplate/flatplate_small/FSI/./minimal_exemple.py", line 250, in
Traceback (most recent call last):
File "/stck/ppenet/Scripts/RANS_flatplate/flatplate_small/FSI/./minimal_exemple.py", line 250, in
J_mat.assemble()
File "PETSc/Mat.pyx", line 1189, in petsc4py.PETSc.Mat.assemble
J_mat.assemble()
File "PETSc/Mat.pyx", line 1189, in petsc4py.PETSc.Mat.assemble
petsc4py.PETSc.Error: error code 63
[0] MatAssemblyEnd() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/interface/matrix.c:5718
[0] MatAssemblyEnd_MPIAIJ() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/impls/aij/mpi/mpiaij.c:701
[0] MatSetValues_MPIAIJ() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/impls/aij/mpi/mpiaij.c:480
[0] Argument out of range
[0] New nonzero at (19639,19639) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check
petsc4py.PETSc.Error: error code 63
[1] MatAssemblyEnd() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/interface/matrix.c:5718
[1] MatAssemblyEnd_MPIAIJ() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/impls/aij/mpi/mpiaij.c:701
[1] MatSetValues_MPIAIJ() at /tmp/hmmak/spack-stage/spack-stage-petsc-3.18.3-zcibad34aiuayllssedrsj3tbf3mk4t7/spack-src/src/mat/impls/aij/mpi/mpiaij.c:480
[1] Argument out of range
[1] New nonzero at (13535,13535) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check

i.e. there seems to be a problem with the Jacobian assembly in parallel.
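Note that the MatSetOption suggested in the error message would only silence the check rather than fix the missing preallocation; from petsc4py it would be something along these lines:

# Hides the symptom only: new nonzeros trigger mallocs instead of errors
J_mat.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)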

Here is a minimal working example in which I have isolated the problem:

import numpy as np
import os, sys

from ufl import div, grad, transpose, avg, Identity, inv, det, conj # algebra
from ufl import inner, FiniteElement, VectorElement, TestFunction, TrialFunction, Measure # tools for weak formulations
from ufl import derivative, MinCellEdgeLength, sqrt, CellDiameter # misc

import dolfinx.io
from dolfinx.fem import locate_dofs_topological, dirichletbc, Function, FunctionSpace, Constant
from dolfinx.mesh import meshtags
from dolfinx.mesh import (CellType, GhostMode, create_unit_square, create_cell_partitioner, create_mesh)


from multiphenicsx.fem.petsc import create_matrix_block, create_vector_block, assemble_matrix_block, assemble_vector_block, BlockVecSubVectorWrapper
from multiphenicsx.fem import DofMapRestriction
from dolfinx.io import XDMFFile
from dolfinx import cpp

from mpi4py import MPI
from petsc4py import PETSc

# Physical parameters
Re = 3000. # Reynolds number 
Es = 1.e0 # non-dimensional Young modulus
Ms = 1.e0 # non-dimensional density ratio

mesh_name = "flatplate_JFM_FSI"

with XDMFFile(MPI.COMM_WORLD, "{:s}.xdmf".format(mesh_name), "r") as xdmf:
    mesh = xdmf.read_mesh(name="Grid", ghost_mode = cpp.mesh.GhostMode.shared_facet)
    ct = xdmf.read_meshtags(mesh, name="Grid")
mesh.topology.create_connectivity(mesh.topology.dim, mesh.topology.dim-1)

with XDMFFile(MPI.COMM_WORLD, "{:s}_facets.xdmf".format(mesh_name), "r") as xdmf:
    ft = xdmf.read_meshtags(mesh, name="Grid")

# Subdomain tags 
fluid_marker = 2
solid_marker = 10
inlet_marker = 1
outlet_marker = 4
farfield_marker = 3
wall_marker = 20
compliant_wall_marker = 200
clamp_marker = 2000

boundary_markers = [wall_marker, inlet_marker, outlet_marker, farfield_marker, clamp_marker]
fdim = mesh.topology.dim - 1

# Define boundaries and associated measures.
boundaries_entities = dict()
boundaries_values = dict()
for marker in boundary_markers:
    boundaries_entities[marker] = ft.indices[ft.values == marker]
    boundaries_values[marker] = np.full(
        boundaries_entities[marker].shape, marker, dtype=np.intc)
boundary_entities = np.hstack(list(boundaries_entities.values()))
boundary_values = np.hstack(list(boundaries_values.values()))
# Important: need to sort entities to define mesh tags
asort = np.argsort(boundary_entities)
boundary_entities = boundary_entities[asort]
boundary_values = boundary_values[asort]
boundaries = meshtags(mesh, mesh.topology.dim - 1, boundary_entities, boundary_values)
ds = Measure("ds", domain=mesh, subdomain_data=boundaries)


# Define compliant wall interface and associated measures.
interfaces_entities = dict()
interfaces_values = dict()
for marker in [compliant_wall_marker]:
    interfaces_entities[marker] = ft.indices[ft.values == marker]
    interfaces_values[marker] = np.full(
        interfaces_entities[marker].shape, marker, dtype=np.intc)
interfaces = meshtags(mesh, mesh.topology.dim - 1,
                      np.hstack(list(interfaces_entities.values())),
                      np.hstack(list(interfaces_values.values())))
dS = Measure("dS", domain=mesh, subdomain_data=interfaces)

# Define fluid and solid subdomains and associated measures
subdomains_entities = dict()
subdomains_values = dict()
for marker in [fluid_marker, solid_marker]:
    subdomains_entities[marker] = ct.indices[ct.values == marker]
    subdomains_values[marker] = np.full(
        subdomains_entities[marker].shape, marker, dtype=np.intc)
subdomains = meshtags(
    mesh, mesh.topology.dim,
    np.hstack(list(subdomains_entities.values())), np.hstack(list(subdomains_values.values())))
dx = Measure("dx", domain=mesh, subdomain_data=subdomains)

# Function spaces
V_element = VectorElement("Lagrange", mesh.ufl_cell(), 2)
Q_element = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
E_element = VectorElement("Lagrange", mesh.ufl_cell(), 2)
S_element = VectorElement("Lagrange", mesh.ufl_cell(), 2)
PS_element = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
V = FunctionSpace(mesh, V_element) # fluid velocity space
Q = FunctionSpace(mesh, Q_element) # fluid pressure space
MV = V.clone() # velocity continuity Lagrange multiplier space
E = FunctionSpace(mesh, E_element) # ALE extension displacement space
ME = E.clone() # displacement continuity Lagrange multiplier space
S = FunctionSpace(mesh, S_element) # solid displacement space
PS = FunctionSpace(mesh, PS_element) # solid pressure space

# Define restrictions

entities_fluid = subdomains.indices[subdomains.values == fluid_marker]
dofs_V_fluid = locate_dofs_topological(V, subdomains.dim, entities_fluid)
restriction_V_fluid = DofMapRestriction(V.dofmap, dofs_V_fluid)

dofs_Q_fluid = locate_dofs_topological(Q, subdomains.dim, entities_fluid)
restriction_Q_fluid = DofMapRestriction(Q.dofmap, dofs_Q_fluid)

facets_compliant_wall_MV = ft.indices[ft.values == compliant_wall_marker] 
dofs_MV_compliant_wall = locate_dofs_topological(MV, boundaries.dim, facets_compliant_wall_MV)
restriction_MV_compliant_wall = DofMapRestriction(MV.dofmap, dofs_MV_compliant_wall)

entities_E_fluid = subdomains.indices[subdomains.values == fluid_marker] 
dofs_E_fluid = locate_dofs_topological(E, subdomains.dim, entities_E_fluid)
restriction_E_fluid = DofMapRestriction(E.dofmap, dofs_E_fluid)

facets_compliant_wall_ME = ft.indices[ft.values == compliant_wall_marker] 
dofs_ME_compliant_wall = locate_dofs_topological(ME, boundaries.dim, facets_compliant_wall_ME)
restriction_ME_compliant_wall = DofMapRestriction(ME.dofmap, dofs_ME_compliant_wall)

entities_solid = subdomains.indices[subdomains.values == solid_marker] 
dofs_S_solid = locate_dofs_topological(S, subdomains.dim, entities_solid)
restriction_S_solid = DofMapRestriction(S.dofmap, dofs_S_solid)

dofs_PS_solid = locate_dofs_topological(PS, subdomains.dim, entities_solid)
restriction_PS_solid = DofMapRestriction(PS.dofmap, dofs_PS_solid)


restriction = [restriction_V_fluid, restriction_Q_fluid, restriction_MV_compliant_wall,
               restriction_E_fluid, restriction_E_fluid, restriction_ME_compliant_wall,
               restriction_S_solid, restriction_S_solid, restriction_PS_solid]


# Test and trial functions
(v, q, mv, psi, vpsi, mpsi, txS, tvS, tpS) = (
    TestFunction(V), TestFunction(Q), TestFunction(MV), TestFunction(E), TestFunction(E),
    TestFunction(ME), TestFunction(S), TestFunction(S), TestFunction(PS))
(du, dp, dlu, dxi, dvxi, dlxi, dxS, dvS, dpS) = (
    TrialFunction(V), TrialFunction(Q), TrialFunction(MV), TrialFunction(E), TrialFunction(E),
    TrialFunction(ME), TrialFunction(S), TrialFunction(S), TrialFunction(PS))
(u, p, lu, xi, vxi, lxi, xS, vS, pS) = (
    Function(V), Function(Q), Function(MV), Function(E), Function(E),
    Function(ME), Function(S), Function(S), Function(PS))

# Variational forms

def FF(xi):
    return Identity(len(xi)) + grad(xi)

def JacF(xi):
    return det(FF(xi))

def PhiF(xi):
    return JacF(xi) * inv(FF(xi))

def SigmaF(u, p, xi):
    return - p * Identity(len(u)) + 1./Re * 1./JacF(xi) * (grad(u) * PhiF(xi) + transpose(PhiF(xi)) * transpose(grad(u)))

#Residual and Jacobian variational form

Residual = [
        inner(grad(u) * PhiF(xi) * (u - vxi), v) * dx(fluid_marker) + inner(SigmaF(u, p, xi), grad(v)) * dx(fluid_marker) + inner(avg(lu), avg(v)) * dS(compliant_wall_marker),
        - div(PhiF(xi) * u) * conj(q) * dx(fluid_marker),
        inner(avg(u) - avg(vxi), avg(mv)) * dS(compliant_wall_marker),
        - inner(vxi, psi) * dx(fluid_marker),
        inner(grad(xi), grad(vpsi)) * dx(fluid_marker) - inner(avg(lxi), avg(vpsi)) * dS(compliant_wall_marker),
        inner(avg(xS) - avg(xi), avg(mpsi)) * dS(compliant_wall_marker),
        - inner(vS, txS) * dx(solid_marker),
        Es/3. * inner(grad(xS) + transpose(grad(xS)), grad(tvS)) * dx(solid_marker) - pS * conj(div(tvS)) * dx(solid_marker) - inner(avg(lu), avg(tvS)) * dS(compliant_wall_marker),
        div(xS) * conj(tpS) * dx(solid_marker)
    ]
# One-liner auto-differentiating the Residual to obtain the Jacobian blocks
Jacobian = [[derivative(Residual[i], base, direction)
             for base, direction in zip((u, p, lu, xi, vxi, lxi, xS, vS, pS),
                                        (du, dp, dlu, dxi, dvxi, dlxi, dxS, dvS, dpS))]
            for i in range(len(Residual))]
JacobianForm = dolfinx.fem.form(Jacobian)
ResidualForm = dolfinx.fem.form(Residual)


# Boundary conditions
def u_wall_eval(x):
    """Evaluate the no-slip (i.e. zero) velocity at x."""
    return np.zeros((2, x.shape[1]))

u_wall = Function(V)
u_wall.interpolate(u_wall_eval)

xE_clamp = Function(E)
xE_clamp.interpolate(lambda x: np.zeros((2, x.shape[1])))

xS_clamp = Function(S)
xS_clamp.interpolate(lambda x: np.zeros((2, x.shape[1])))

# Inlet
bfacets_V_in = ft.indices[ft.values == inlet_marker]
bdofs_V_in = locate_dofs_topological(V, mesh.topology.dim - 1, bfacets_V_in)
inlet_bc = dirichletbc(u_wall, bdofs_V_in)

# Wall (no-slip)
bfacets_V_wall = ft.indices[ft.values == wall_marker]
bdofs_V_wall = locate_dofs_topological(V, mesh.topology.dim - 1, bfacets_V_wall)
wall_bc = dirichletbc(u_wall, bdofs_V_wall)

# outlet boundary
bfacets_V_outlet = ft.indices[ft.values == outlet_marker]
bdofs_V_outlet = locate_dofs_topological((V.sub(1), V.sub(1).collapse()[0]), mesh.topology.dim - 1, bfacets_V_outlet)
outlet_bc = dirichletbc(u_wall.sub(1).collapse(), bdofs_V_outlet, V.sub(1))

# Farfield boundary
bfacets_V_farfield = ft.indices[ft.values == farfield_marker]
bdofs_V_farfield = locate_dofs_topological(V, mesh.topology.dim - 1, bfacets_V_farfield)
farfield_bc = dirichletbc(u_wall, bdofs_V_farfield)

# Extension clamped boundary
bfacets_E_clamp = ft.indices[(ft.values == wall_marker) | (ft.values == inlet_marker) | (ft.values == farfield_marker) | (ft.values == outlet_marker)]
bdofs_E_clamp = locate_dofs_topological(E, mesh.topology.dim - 1, bfacets_E_clamp)
clamp_E_bc = dirichletbc(xE_clamp, bdofs_E_clamp)

# Solid clamped boundary
bfacets_S_clamp = ft.indices[ft.values == clamp_marker]
bdofs_S_clamp = locate_dofs_topological(S, mesh.topology.dim - 1, bfacets_S_clamp)
clamp_S_bc = dirichletbc(xS_clamp, bdofs_S_clamp)

bc = [inlet_bc, wall_bc, farfield_bc, clamp_E_bc, clamp_S_bc]


# Assemble the matrix
J_mat = create_matrix_block(JacobianForm, restriction=(restriction, restriction))
assemble_matrix_block(J_mat, JacobianForm, bc, diagonal=1.e30, restriction=(restriction, restriction))

J_mat.assemble()

It runs nicely in serial but yields a malloc error when assembling the matrix in parallel. I noticed that the assembly completes in parallel without any problem if I take out the clamped boundary conditions clamp_E_bc and clamp_S_bc, but it fails otherwise. I have not figured out the reason for that and was hoping someone had an idea of what causes this.
Thanks in advance.

It would be great if you could report whether assembling with {assemble,create}_matrix_block and {assemble,create}_vector_block from dolfinx.fem.petsc (and thus dropping the restriction argument) rather than from multiphenicsx.fem.petsc works, in which case this might be a multiphenicsx bug.
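In other words, something along these lines (a sketch that keeps your forms and bcs unchanged):

import dolfinx.fem.petsc
# Unrestricted block assembly straight from dolfinx, bypassing multiphenicsx
J_mat = dolfinx.fem.petsc.create_matrix_block(JacobianForm)
dolfinx.fem.petsc.assemble_matrix_block(J_mat, JacobianForm, bc, diagonal=1.e30)
J_mat.assemble()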

Can you report the answer with:

  1. the forms as in your code above, and
  2. a modified version in which you try to add a zero * inner(trial, test) * dx term to every diagonal element of the Jacobian, where zero = dolfinx.fem.Constant(mesh, petsc4py.PETSc.ScalarType(0.0))

?

Hello Francesco, thanks for the quick answer!

I ran my scripts with your recommendations.
For 1., the assembly failed with the exact same malloc error, so it is not a problem in multiphenicsx.
For 2., the assembly completed successfully in parallel, so this resolved the problem! Here is the correction you suggested that did the trick:

trials = [du, dp, dlu, dxi, dvxi, dlxi, dxS, dvS, dpS]
tests = [v, q, mv, psi, vpsi, mpsi, txS, tvS, tpS]
zeroConstant = Constant(mesh, PETSc.ScalarType(0))
# Add a zero term to every diagonal block so that the diagonal entries needed by the
# Dirichlet BCs get preallocated; note this loop must run before dolfinx.fem.form(Jacobian)
for i in range(len(Residual)):
    Jacobian[i][i] += zeroConstant * inner(trials[i], tests[i]) * dx

Is this something I am missing, or a workaround for a known problem?
Thanks a lot!

OK, good to know it is not a multiphenicsx issue.

You can easily understand where the error comes from if you simplify your problem down to Navier-Stokes only. Imagine that you want to assign Dirichlet boundary conditions on the pressure: since the (pressure trial, pressure test) block is empty, enforcing the diagonal value will cause a malloc.
This case is handled by dolfinx with a runtime error.
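A minimal sketch of that situation (my own illustration on a unit square with Taylor-Hood spaces, not taken from your code):

import numpy as np
import ufl
from mpi4py import MPI
from dolfinx.fem import Function, FunctionSpace, dirichletbc, form, locate_dofs_geometrical
from dolfinx.fem.petsc import assemble_matrix_block
from dolfinx.mesh import create_unit_square

msh = create_unit_square(MPI.COMM_WORLD, 8, 8)
V = FunctionSpace(msh, ufl.VectorElement("Lagrange", msh.ufl_cell(), 2))
Q = FunctionSpace(msh, ufl.FiniteElement("Lagrange", msh.ufl_cell(), 1))
(du, dp, v, q) = (ufl.TrialFunction(V), ufl.TrialFunction(Q), ufl.TestFunction(V), ufl.TestFunction(Q))
a = form([[ufl.inner(ufl.grad(du), ufl.grad(v)) * ufl.dx, -dp * ufl.div(v) * ufl.dx],
          [-ufl.div(du) * q * ufl.dx, None]])  # the (pressure, pressure) block is empty

# A Dirichlet BC on the pressure asks for diagonal entries inside the empty block
pressure_bc = dirichletbc(Function(Q), locate_dofs_geometrical(Q, lambda x: np.isclose(x[0], 0.0)))
A = assemble_matrix_block(a, bcs=[pressure_bc])  # dolfinx raises the runtime error here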

Your case is a bit more complicated, in the sense that the diagonal form might be non-None, but only integrate over a narrow part of the domain (say, the diagonal form has only an integral over the interface between solid and fluid, while the boundary conditions set the value on a part of the fluid that is far away from the interface).
I often use that trick in my tutorials, especially those in the folder 06_optimal_control. If you wish not to allocate too many matrix entries, you may create a subdomain of cells (call it subdomain 99) attached to the Dirichlet boundary, and use dx(99) in the workaround. I'd only do that if profiling reveals that allocating those entries is detrimental to the performance of the code.


Thank you very much for taking the time to explain! This makes more sense to me now.