Discontinuities at MPI partition boundaries with manual periodic MPC on large high-order curved mesh

Hi,

I’m running a Poisson problem with periodic BCs in two directions on a high-order Lagrange space, and I’m seeing discontinuities (“lightning”) in the solution that align with MPI partition boundaries. Single-rank produces a smooth solution; running with 4 or 16 ranks introduces visible jumps. I’ve isolated the problem extensively and ruled out the standard suspects. A minimal reproducer (attached) on a unit cube does not exhibit the issue, so the question is what additional aspect of my real setup triggers it.

Setup

  • dolfinx 0.10, dolfinx_mpc 0.10.0 (contains PR #155 backsubstitution fix)
  • P4 Lagrange on a curved hex27 mesh from a turbine cascade CFD (PyFR), ~17M DOFs
  • Periodic in two translation directions (pitch + span)
  • Inlet/outlet: homogeneous Neumann
  • Single Dirichlet pin on a non-slave DOF to break the constant null space
  • Solver: dolfinx_mpc.LinearProblem with CG + bjacobi + icc

MPC construction (manual)

# slaves: locally owned dofs on x_max and y_max periodic boundaries
# masters: collected via allgather of all owned dofs near x_min OR y_min,
#          matched per-translation with KDTree
slaves_arr   = np.array(slaves_list, dtype=np.int32)   # local indices
masters_arr  = np.array(masters_list, dtype=np.int64)  # global indices
owners_arr   = np.array(owners_list, dtype=np.int32)
coeffs_arr   = np.ones(len(slaves_arr), dtype=np.float64)
offsets_arr  = np.arange(len(slaves_arr) + 1, dtype=np.int32)

mpc = dolfinx_mpc.MultiPointConstraint(V)
mpc.add_constraint(V, slaves_arr, masters_arr, coeffs_arr, owners_arr, offsets_arr)
mpc.finalize()

Master/slave detection reports max distance = 2.5e-14, 0 misses. Total 198,711 slaves after finalize.

Solve sequence (per dokken’s recommendation from thread 13241)

psi_h = problem.solve()
psi_h.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
mpc.backsubstitution(psi_h)

What I’ve verified

All checks below pass on every rank in parallel:

  1. psi_h.function_space is mpc.function_space returns True.

  2. For every local slave (all 198,711 across ranks), I compute

    max_i | psi_h[slave_i] - sum_j coeff_ij * psi_h[master_ij] |
    

    Result: 0.000e+00 on every rank. Backsubstitution is bit-exact.

  3. For 1000 ghost DOFs per rank, I compare the ghost value to the corresponding owned value on the owning rank (looked up via KDTree on allgathered owned coordinates). Result: 0.000e+00 on every rank. Ghost synchronization is bit-exact.

  4. Writing a Function(V) filled with 1.0 to a VTKFile and viewing in ParaView shows a perfectly smooth constant field — no partition-boundary artifacts in the output pipeline.

  5. Interpolating psi_h to a P1 space and writing that also shows the same discontinuities, ruling out high-order Lagrange visualization artifacts.

  6. The source term (interpolated from CFD data) is smooth in the visualization — no discontinuities. The discontinuities only appear in psi_h.

  7. The matrix diagonal is checked: slave rows have diagonal 1.000e+00 ± 1e-3 on all ranks. Non-slave diagonals vary across ranks but only in proportion to local cell size (METIS variation), which is expected.

  8. The Neumann/pin compatibility condition is enforced: ∫ f dV is computed before solve and the mean is subtracted from the source to satisfy ∫ f dV = 0 exactly. The discontinuities persist.

  9. MWE on a unit cube hex mesh with the same MPC construction logic (attached: mwe.py) produces a smooth solution at any rank count. The bug only appears in the production setup.

Symptom

The discontinuities (“lightning”) in psi_h:

  • Align with MPI partition boundaries (verified by overlaying a DG-0 rank field)
  • Are at magnitudes ~1e-4 in a field where typical values are ~1e-4 to ~1e-3 (i.e. they’re significant)
  • Become fewer/less visible with fewer ranks (n=4 vs n=16)
  • Were not present in the previous setup (Dirichlet at inlet/outlet, no MPC)

Question

Given the MWE does NOT reproduce the issue, but my production setup does, the key differences between them are:

  • Curved hex27 mesh (P2 coordinate element) from CFD data, vs. flat hex8 in the MWE
  • 17M DOFs at P4, vs. ~10k in the MWE
  • METIS partitioning of a thin, anisotropic domain (turbine blade passage), vs. cube
  • Edge DOFs: 519 DOFs lie on the intersection of pitch and span periodic boundaries (corner edges). The MWE has only a few. I tried registering these explicitly with a diagonal translation master (instead of relying on a chain through one of the single-translation masters), but this did not change the symptom.

Could anything in this list specifically interact with dolfinx_mpc in a way that breaks parallel consistency despite backsubstitution and ghost sync both being bit-exact? Or are there additional collective consistency checks I should be doing on the MultiPointConstraint itself before solving?

MWE

[Attach mwe.py — runs in seconds on a unit cube, smooth at all rank counts]

mpirun -n 4 stdbuf -oL python -u mwe.py

Any pointers welcome — happy to add additional diagnostics or share more of the production code.

Thanks!

"""
MWE: dolfinx_mpc parallel periodicity issue.

Setup:
- Unit cube, hex mesh, P4 Lagrange (high-order matters)
- Periodic in x and y (two directions, like turbine cascade pitch + span)
- Neumann at z=0 and z=1, single Dirichlet pin to remove null space
- Periodicity via MANUAL mpc.add_constraint(...) with KDTree-based master/slave
  detection (not create_periodic_constraint_topological)
- dolfinx 0.10.x, dolfinx_mpc 0.10.0

Observation:
- Single-rank run produces smooth solution
- Parallel run (n=4, n=16) shows visible discontinuities ("lightning") in psi
  that follow MPI partition boundaries when visualized in ParaView
- MPC backsubstitution verification passes:
    max |slave - sum(coeff*master)| ~= 0 (machine precision)
  so backsubstitution itself is not the issue
- The function space of the solution is mpc.function_space (correct)
- ghostUpdate + backsubstitution are called explicitly as per Dokken's
  recommendation

Question: why do the partition-boundary discontinuities appear even though the
periodicity constraints are perfectly satisfied?

Run with:
    mpirun -n 1 python mwe.py        # smooth, reference
    mpirun -n 4 python mwe.py        # discontinuities at partitions
"""
import numpy as np
from mpi4py import MPI
from petsc4py import PETSc
from scipy.spatial import KDTree
import ufl

from dolfinx import fem, mesh
from dolfinx.mesh import locate_entities_boundary
from dolfinx.io import VTKFile
import dolfinx_mpc

comm = MPI.COMM_WORLD
rank = comm.rank

# 1. mesh
N_xy, N_z = 16, 4
msh = mesh.create_box(
    comm, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [N_xy, N_xy, N_z],
    mesh.CellType.hexahedron,
)
tdim = msh.topology.dim
fdim = tdim - 1
msh.topology.create_connectivity(fdim, tdim)
msh.topology.create_connectivity(tdim, fdim)

# 2. function space
ORDER = 4
V = fem.functionspace(msh, ("Lagrange", ORDER))
if rank == 0:
    print(f"DOFs: {V.dofmap.index_map.size_global}", flush=True)
comm.Barrier()
if rank == 0: print("CP1: function space ready", flush=True)

# 3. periodic MPC setup (manual, KDTree-based)
def facets_at(axis, val):
    return locate_entities_boundary(msh, fdim, lambda x: np.isclose(x[axis], val))

pitch_slave_facets = facets_at(0, 1.0)  # x = 1 -> x = 0
span_slave_facets  = facets_at(1, 1.0)  # y = 1 -> y = 0

pitch_translation = np.array([1.0, 0.0, 0.0])
span_translation  = np.array([0.0, 1.0, 0.0])

dof_coords = V.tabulate_dof_coordinates()
imap = V.dofmap.index_map
n_local = imap.size_local
local_range_start = imap.local_range[0]

# slave dofs, locally owned only
pitch_slave_dofs = fem.locate_dofs_topological(V, fdim, pitch_slave_facets)
pitch_slave_dofs = pitch_slave_dofs[pitch_slave_dofs < n_local]

span_slave_dofs = fem.locate_dofs_topological(V, fdim, span_slave_facets)
span_slave_dofs = span_slave_dofs[span_slave_dofs < n_local]
span_slave_dofs = np.setdiff1d(span_slave_dofs, pitch_slave_dofs)

# gather all candidate master dofs (owned, near x=0 OR y=0) from all ranks
owned_coords = dof_coords[:n_local]
mask = (np.isclose(owned_coords[:, 0], 0.0, atol=1e-4)
        | np.isclose(owned_coords[:, 1], 0.0, atol=1e-4))

local_master_coords  = np.ascontiguousarray(owned_coords[mask])
local_master_globals = np.arange(n_local, dtype=np.int64)[mask] + local_range_start
local_master_owners  = np.full(mask.sum(), rank, dtype=np.int32)

if rank == 0: print(f"CP2: rank {rank} has {len(local_master_coords)} local masters, "
                    f"{len(pitch_slave_dofs)} pitch slaves, {len(span_slave_dofs)} span slaves", flush=True)
comm.Barrier()

all_master_coords  = np.vstack(comm.allgather(local_master_coords))
all_master_globals = np.concatenate(comm.allgather(local_master_globals))
all_master_owners  = np.concatenate(comm.allgather(local_master_owners))

comm.Barrier()
if rank == 0: print(f"CP3: allgather done, {len(all_master_coords)} total masters", flush=True)

master_tree = KDTree(all_master_coords)
comm.Barrier()
if rank == 0: print("CP4: kdtree built", flush=True)

slaves_list, masters_list, owners_list = [], [], []

for sl in pitch_slave_dofs:
    sc = dof_coords[sl]
    mc = sc - pitch_translation
    d, idx = master_tree.query(mc)
    if d > 1e-8:
        continue
    slaves_list.append(int(sl))
    masters_list.append(int(all_master_globals[idx]))
    owners_list.append(int(all_master_owners[idx]))

for sl in span_slave_dofs:
    sc = dof_coords[sl]
    mc = sc - span_translation
    d, idx = master_tree.query(mc)
    if d > 1e-8:
        continue
    slaves_list.append(int(sl))
    masters_list.append(int(all_master_globals[idx]))
    owners_list.append(int(all_master_owners[idx]))

slaves_arr  = np.array(slaves_list,  dtype=np.int32)
masters_arr = np.array(masters_list, dtype=np.int64)
owners_arr  = np.array(owners_list,  dtype=np.int32)
coeffs_arr  = np.ones(len(slaves_arr), dtype=np.float64)
offsets_arr = np.arange(len(slaves_arr) + 1, dtype=np.int32)

comm.Barrier()
if rank == 0: print(f"CP5: matching done, rank {rank} has {len(slaves_arr)} slave-master pairs", flush=True)

mpc = dolfinx_mpc.MultiPointConstraint(V)
comm.Barrier()
if rank == 0: print("CP6: mpc object created", flush=True)

mpc.add_constraint(V, slaves_arr, masters_arr, coeffs_arr, owners_arr, offsets_arr)
comm.Barrier()
if rank == 0: print("CP7: add_constraint done", flush=True)

mpc.finalize()
comm.Barrier()
if rank == 0: print("CP8: finalize done", flush=True)

if rank == 0:
    n_total = comm.allreduce(mpc.num_local_slaves, op=MPI.SUM)
    print(f"total MPC slaves: {n_total}", flush=True)

# 4. single pin on a non-slave dof (rank 0 only)
if rank == 0:
    slave_set = set(mpc.slaves.tolist())
    candidates = [d for d in range(n_local) if d not in slave_set]
    pin_dof = candidates[len(candidates) // 2]
    pinned = np.array([pin_dof], dtype=np.int32)
else:
    pinned = np.array([], dtype=np.int32)
bc_pin = fem.dirichletbc(np.float64(0.0), pinned, V)

# 5. Poisson with periodic source
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
x = ufl.SpatialCoordinate(msh)
f_src = ufl.sin(2 * ufl.pi * x[0]) * ufl.cos(2 * ufl.pi * x[1])

a = ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx
L = f_src * v * ufl.dx

problem = dolfinx_mpc.LinearProblem(
    a, L, mpc, bcs=[bc_pin],
    petsc_options={"ksp_type": "cg", "pc_type": "bjacobi",
                   "sub_pc_type": "icc", "ksp_rtol": 1e-10,
                   "ksp_converged_reason": None},
    petsc_options_prefix="mwe",
)
psi = problem.solve()
psi.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT,
                             mode=PETSc.ScatterMode.FORWARD)
mpc.backsubstitution(psi)

# 6. verify backsubstitution: slave == sum(coeff * master) ?
slaves = mpc.slaves
masters_adj = mpc.masters
coeffs_full, offsets_full = mpc.coefficients()

max_err = 0.0
for i in range(mpc.num_local_slaves):
    islv = int(slaves[i])
    mlist = masters_adj.links(islv)
    clist = coeffs_full[offsets_full[islv]:offsets_full[islv + 1]]
    if len(mlist) > 0:
        expected = sum(c * psi.x.array[int(m)] for c, m in zip(clist, mlist))
        max_err = max(max_err, abs(psi.x.array[islv] - expected))

max_err_g = comm.allreduce(max_err, op=MPI.MAX)
if rank == 0:
    print(f"MPC verification: max |slave - sum(coeff*master)| = {max_err_g:.3e}",
          flush=True)
    print("  (~0 => backsubstitution is correct)", flush=True)

# 7. output: psi + rank field for diagnostic
V_dg = fem.functionspace(msh, ("DG", 0))
rank_field = fem.Function(V_dg, name="rank")
rank_field.x.array[:] = float(rank)

psi.name = "psi"
with VTKFile(comm, "mwe_psi.pvd", "w") as vtk:
    vtk.write_function(psi, 0.0)
with VTKFile(comm, "mwe_rank.pvd", "w") as vtk:
    vtk.write_function(rank_field, 0.0)

if rank == 0:
    print("\nDone. Open mwe_psi.pvd and mwe_rank.pvd in ParaView.", flush=True)
    print("Expected: smooth psi. Observed (parallel): discontinuities at "
          "partition boundaries.", flush=True)

Why do you need this here? This should happen within solve:

Thank you @dokken for clarifying that the explicit ghostUpdate + backsubstitution after solve() is redundant since it’s already done internally.

I added those calls during diagnostics following advice in thread 12401 where they seemed needed in older versions. Removing them does not change the symptom — the partition-boundary discontinuities persist.

Setup recap (since this is the core question):

  • Poisson problem ∇²u = f on a curved hex27 mesh from CFD data (~17M DOFs, P4 Lagrange)
  • Two periodic translations (pitch + span), no Dirichlet boundary except a single pin
  • ~200k MPC slaves
  • 16 MPI ranks
  • Periodicity built via manual mpc.add_constraint(slaves, masters, coeffs, owners, offsets) because the slave/master pairs come from external CFD mesh data

What I tried after your reply:

I attempted to switch from manual add_constraint to the higher-level create_periodic_constraint_topological and create_periodic_constraint_geometrical, but both immediately crash with:

RuntimeError: Newton method failed to converge for non-affine geometry

This matches the issue from thread 13511 and thread 13421, where you suggested raising the pull-back Newton tolerance (PR #104). That tolerance does not seem exposed via the Python API in v0.10.0, so I’d need to patch and rebuild the C++ side — which is non-trivial in our conda-managed cluster environment.

The manual add_constraint path is therefore the only route that works for me (no crash). With that path, the solve converges, and all internal consistency checks pass on every rank in parallel:

  • u.function_space is mpc.function_spaceTrue
  • max_i | u[slave_i] - sum_j coeff_ij * u[master_ij] | = 0.000e+00
  • Ghost-owned-value consistency check (compare every ghost dof’s value to its owned counterpart on the owning rank, via KDTree on allgathered coords) → 0.000e+00 on every rank

Yet the solution still shows visible jumps along MPI partition boundaries when visualized in ParaView — confirmed to coincide with the partition seams by overlaying a DG-0 rank field, and the number of jumps scales with the rank count (2 jumps with n=2, 15 jumps with n=16).

My questions:

  1. Is there a way to expose the pull-back Newton tolerance via the Python API, so I could try create_periodic_constraint_topological on my curved mesh?
  2. Given that all the standard consistency checks pass on the manual add_constraint path, is there something subtle in the matrix assembly with that low-level API that could lead to partition-dependent results — even when the slave-master values are perfectly enforced?

Could you try to upgrade to dolfinx_mpc v0.10.5 which is on conda forge, as it exposes the tolerance option:

Update: working solution

Thanks to @dokken’s it worked now. Maybe for the others:

What fixed it

  1. Upgraded to dolfinx_mpc 0.10.5 (with openmpi 5.x). The tol parameter in create_periodic_constraint_topological was the key.

  2. Replaced manual mpc.add_constraint(...) with create_periodic_constraint_topological. With a raised tolerance of 5000 * np.finfo(np.float64).eps ≈ 1.1e-12, the Newton pull-back converges on my curved hex27 geometry where it previously crashed with “Newton method failed to converge for non-affine geometry”.

  3. Handled the corner DOFs (intersection of both periodic surfaces) with a diagonal mapping inside the first relation function, instead of trying to register them as a separate constraint:

TOL = 5000.0 * np.finfo(np.float64).eps

def pitch_slave_to_master(x):
    out_x = x.copy()
    out_x[0] -= pitch_translation[0]
    out_x[1] -= pitch_translation[1]
    out_x[2] -= pitch_translation[2]
    # edge DOFs (also on span-slave): apply span translation as well
    on_span = np.isclose(x[2], span_slave_z, atol=TOL)
    out_x[0][on_span] -= span_translation[0]
    out_x[1][on_span] -= span_translation[1]
    out_x[2][on_span] -= span_translation[2]
    return out_x

def span_slave_to_master(x):
    out_x = x.copy()
    out_x[0] -= span_translation[0]
    out_x[1] -= span_translation[1]
    out_x[2] -= span_translation[2]
    # exclude edge DOFs (handled diagonally by pitch_slave_to_master)
    on_pitch = <KDTree query returning dists < 1e-8>
    out_x[0, on_pitch] = np.nan
    out_x[1, on_pitch] = np.nan
    out_x[2, on_pitch] = np.nan
    return out_x

mpc = dolfinx_mpc.MultiPointConstraint(V)
mpc.create_periodic_constraint_topological(V, pitch_mt, PITCH_TAG, pitch_slave_to_master, [], tol=TOL)
mpc.create_periodic_constraint_topological(V, span_mt, SPAN_TAG, span_slave_to_master, [], tol=TOL)
mpc.finalize()
  1. Removed the explicit ghostUpdate + mpc.backsubstitution after solve() — those are already done internally in dolfinx_mpc.LinearProblem.solve() in v0.10.x, as @dokken pointed out.

Result

Parallel runs (16 ranks, ~17M DOFs, P4 Lagrange on curved hex27) now produce smooth solutions with no partition-boundary discontinuities and no corner artifacts.