I’m computing elastic Bloch band structures using dolfinx_mpc.MultiPointConstraint for periodic boundary conditions. Each k-point builds an MPC with the appropriate Bloch phase, assembles K and M, and solves a generalized Hermitian eigenvalue problem via SLEPc with shift-invert + MUMPS.
After many k-point cycles in a single process (anywhere from ~164 to ~660 depending on configuration), the process crashes. The MPC physics is correct on individual solves: I’ve verified zero-mode counts, Hermiticity, and spectral structure. But three different reproducer configurations give three different failure signatures, all consistent with resources accumulating at the MPC+SLEPc library boundary. I’d appreciate guidance on the recommended usage pattern.
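Schematically, each k-point cycle looks like this (a sketch only; `indicator`, `relation`, and `bloch_phase` stand in for the per-lattice-vector maps and Bloch factor, and the full runnable version is in the Mode 3 reproducer further down):

```python
# Per-k-point cycle (schematic; see the full Mode 3 reproducer below)
mpc = MultiPointConstraint(V)
mpc.create_periodic_constraint_geometrical(V, indicator, relation, [], bloch_phase)
mpc.finalize()
K = assemble_matrix(k_form, mpc)   # dolfinx_mpc.assemble_matrix
M = assemble_matrix(m_form, mpc)
eps = SLEPc.EPS().create(comm)     # shift-invert + MUMPS configured via PETSc options
eps.setOperators(K, M)
eps.setProblemType(SLEPc.EPS.ProblemType.GHEP)
eps.setFromOptions()
eps.solve()
```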
Environment
DOLFINx: 0.8.0
dolfinx_mpc: 0.8.0.post1 (built from source against above)
petsc4py: 3.21.0
PETSc: complex128, 32-bit indices (linux-gnu-complex128-32)
Python: 3.10
Base image: dolfinx/dolfinx:v0.8.0
OS: Ubuntu 22.04 (container)
Three observed failure modes
Mode 1 (production code): SIGSEGV / free(): invalid pointer at cumulative solve ~164. Backtrace inside SLEPc.EPS.solve. C-level memory corruption — bypasses Python exception handling.
```
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation
free(): invalid pointer
Abort(59) on node 0 (rank 0 in comm 0)
```
Mode 2: Minimal reproducer — three sequential create_periodic_constraint_geometrical calls per iteration, SMALLEST_MAGNITUDE eigensolve (no factorization). Crashes at iter ~221 (~663 MPC init calls) with MPI communicator exhaustion:
```
MPIR_Get_contextid_sparse_group(591): Too many communicators
(0/2048 free on this process; ignore_id=0)
```
Mode 3: Same reproducer as Mode 2 but with the production SLEPc config (shift-invert + KSP/PC=preonly/lu + MUMPS, target=0). Crashes at iter 225 with MUMPS analysis workspace exhausted:
```
[0] MatLUFactorSymbolic_AIJMUMPS() at mumps.c:2358
[0] MUMPS error in analysis: INFOG(1)=-13
```
Setting `mat_mumps_icntl_14=200` (via either a code-level `setValue` or the `PETSC_OPTIONS` environment variable) has no effect: the option appears on PETSc’s “unused” list despite MUMPS clearly running.
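For reference, the two ways the option was set are roughly:

```python
# Attempt 1: code-level, before EPS.setFromOptions() is called
from petsc4py import PETSc
opts = PETSc.Options()
opts.setValue("mat_mumps_icntl_14", 200)  # equivalent to opts["mat_mumps_icntl_14"] = 200

# Attempt 2: environment variable, set before the process starts
#   export PETSC_OPTIONS="-mat_mumps_icntl_14 200"
```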
What I already tried
Explicit cleanup between iterations:
```python
eps_solver.destroy()
K.destroy()
M.destroy()
del mpc
gc.collect()
```
None of this prevents any of the three failure modes.
What I’ve ruled out for Mode 1
| Hypothesis | Status | Evidence |
|---|---|---|
| PETSc options database pollution | Ruled out | Option count stable at 13 across 220 solves |
| MUMPS workspace (Mode 1 specifically) | Ruled out | Mode 1 backtrace is inside EPS.solve, not MUMPS analysis |
| Python-level PETSc object leak | Ruled out | gc.get_objects() count stable at 3151 across 210 solves |
| pytest fixture interaction | Ruled out | Reproduces identically outside pytest |
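The two “stable count” rows above were tracked per iteration with checks along these lines (an illustrative sketch, not lifted verbatim from the production script):

```python
# Per-iteration diagnostics behind the "stable count" rows above (illustrative)
import gc
from petsc4py import PETSc

n_opts = len(PETSc.Options().getAll())  # size of the PETSc options database
n_objs = len(gc.get_objects())          # number of live Python objects
print(f"petsc_options={n_opts}, py_objects={n_objs}", flush=True)
```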
Modes 2 and 3 confirm that the accumulation is happening at the C/Fortran level; their library-level failure signatures simply surface before the process reaches whatever threshold triggers Mode 1’s lower-level SIGSEGV.
Minimal reproducer (Mode 3)
```python
import numpy as np
import gc
from mpi4py import MPI
from dolfinx import fem, mesh
from dolfinx_mpc import MultiPointConstraint, assemble_matrix
from petsc4py import PETSc
from slepc4py import SLEPc
import ufl

comm = MPI.COMM_WORLD
msh = mesh.create_unit_square(comm, 8, 8, mesh.CellType.triangle)
V = fem.functionspace(msh, ("Lagrange", 2, (2,)))

E, nu, rho = 2.5e9, 0.34, 1180.0
mu = E / (2 * (1 + nu))
lam = E * nu / ((1 + nu) * (1 - 2 * nu))

def eps_sym(u): return ufl.sym(ufl.grad(u))
def sigma(u): return lam * ufl.tr(eps_sym(u)) * ufl.Identity(2) + 2 * mu * eps_sym(u)

u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
k_form = fem.form(ufl.inner(sigma(u), eps_sym(v)) * ufl.dx)
m_form = fem.form(rho * ufl.inner(u, v) * ufl.dx)

# Production SLEPc options (shift-invert + MUMPS)
opts = PETSc.Options()
opts["eps_type"] = "krylovschur"
opts["eps_target"] = 0.0
opts["st_type"] = "sinvert"
opts["st_shift"] = 0.0
opts["st_ksp_type"] = "preonly"
opts["st_pc_type"] = "lu"
opts["st_pc_factor_mat_solver_type"] = "mumps"
opts["mat_mumps_icntl_14"] = 200


def build_mpc_and_solve(k_vec):
    mpc = MultiPointConstraint(V)

    # Three shifts matching the supercell pattern (the third is a dummy no-op here)
    def reloc_a1(x): return np.array([x[0] - 1.0, x[1], x[2]])
    def ind_a1(x): return np.isclose(x[0], 1.0) & (x[1] > 1e-10) & (x[1] < 1 - 1e-10)
    def reloc_a2(x): return np.array([x[0], x[1] - 1.0, x[2]])
    def ind_a2(x): return np.isclose(x[1], 1.0) & (x[0] > 1e-10) & (x[0] < 1 - 1e-10)
    def reloc_a3(x): return np.array([x[0] + 1.0, x[1] - 1.0, x[2]])
    def ind_a3(x): return np.zeros(x.shape[1], dtype=bool)

    phase = np.exp(1j * k_vec[0])
    mpc.create_periodic_constraint_geometrical(V, ind_a1, reloc_a1, [], phase)
    mpc.create_periodic_constraint_geometrical(V, ind_a2, reloc_a2, [], phase)
    mpc.create_periodic_constraint_geometrical(V, ind_a3, reloc_a3, [], phase)
    mpc.finalize()

    K = assemble_matrix(k_form, mpc)
    M = assemble_matrix(m_form, mpc)
    K.assemble(); M.assemble()

    eps_solver = SLEPc.EPS().create(comm)
    eps_solver.setOperators(K, M)
    eps_solver.setProblemType(SLEPc.EPS.ProblemType.GHEP)
    eps_solver.setDimensions(nev=12)
    eps_solver.setFromOptions()
    eps_solver.solve()

    nconv = eps_solver.getConverged()
    eigvals = [eps_solver.getEigenvalue(i) for i in range(min(nconv, 12))]

    eps_solver.destroy()
    K.destroy()
    M.destroy()
    del mpc
    gc.collect()
    return nconv, eigvals


for i in range(250):
    k_vec = np.array([0.1 * (i % 60), 0.0])
    nconv, eigvals = build_mpc_and_solve(k_vec)
    if i % 10 == 0:
        ev0 = eigvals[0] if eigvals else "none"
        print(f"iter {i}: nconv={nconv}, first_ev={ev0}", flush=True)

print("Completed 250 iterations — no crash")
```
This crashes at iter ~225 with the Mode 3 signature. Mode 2 is the same script with `eps_solver.setWhichEigenpairs(SLEPc.EPS.Which.SMALLEST_MAGNITUDE)` in place of the shift-invert options block; it crashes at almost the same point (iter ~221), despite doing no factorization work, with the MPI communicator signature.
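For concreteness, the Mode 2 solver setup is roughly the following (the `PETSc.Options` block above is dropped; the rest of the script is unchanged):

```python
# Mode 2 solver setup (sketch): no shift-invert, no LU/MUMPS factorization
eps_solver = SLEPc.EPS().create(comm)
eps_solver.setOperators(K, M)
eps_solver.setProblemType(SLEPc.EPS.ProblemType.GHEP)
eps_solver.setDimensions(nev=12)
eps_solver.setWhichEigenpairs(SLEPc.EPS.Which.SMALLEST_MAGNITUDE)
eps_solver.solve()
```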
Questions
- Is there a recommended usage pattern for long k-point sweeps? Specifically:
  - Should `MultiPointConstraint` be built once per mesh and reused across Bloch phases, rather than rebuilt per k-point? If so, does `dolfinx_mpc` support changing the Bloch phase after `finalize()`?
  - Is there a required cleanup sequence beyond `destroy()` / `del` / `gc.collect()` that I’m missing?
- Is the MPI communicator leak in Mode 2 a known issue in `create_periodic_constraint_geometrical`? Roughly 3 communicators per MPC construction × 663 constructions ≈ 1989, a close match to the 2048 limit.
- Could Mode 1’s SIGSEGV be the same communicator leak hitting sooner, with SLEPc’s additional internal communicators in Mode 1 bringing the process to the 2048 limit before MPC construction alone would reach it?
- Why is the MUMPS `icntl_14=200` option not propagating through my setup? I set it via both `PETSc.Options()` and the `PETSC_OPTIONS` environment variable; it appears on PETSc’s “unused database options” list at exit, yet MUMPS runs and eventually fails with INFOG(1)=-13 during analysis.
Happy to share the full production reproducer script (requires a Kekulé hexagonal supercell mesh, ~1000 DOFs) if useful. Any direction appreciated.