Multi-core and single-core runs in FEniCSx produce different results

I have a program that produces different results when run on a single core versus multiple cores, with the multi-core run showing noticeably poorer convergence. I know the usual assumption is that this must be caused by a bug in my code rather than anything else. The minimal simplified code that reproduces the problem is below (dolfinx v0.7.2):

from mpi4py import MPI

import numpy
from petsc4py import PETSc

from dolfinx import mesh, fem
from dolfinx.io import XDMFFile
from dolfinx.fem.petsc import LinearProblem
import ufl
from ufl import sqrt, dot, grad, div, dx, rhs, lhs


domain = mesh.create_rectangle(MPI.COMM_WORLD, [(0,0),(12.5e-3,12.5e-3)],(50,50) )

##### Defining FunctionSpace
### Voltage
V = fem.FunctionSpace(domain, ("Lagrange", 1))
### Particle density
P = ufl.FiniteElement("Lagrange", domain.ufl_cell(), 1)
element_P = ufl.MixedElement([P, P])
N = fem.FunctionSpace(domain, element_P)

##### Create boundary condition
### anode_marker
def anode_marker(x):
    return numpy.isclose(x[1], 12.5e-3)
### cathode_marker
def cathode_marker(x):
    return numpy.isclose(x[1], 0)
### bc1
anode_boundary_entity = mesh.locate_entities_boundary(domain, 1, anode_marker)
anode_boundary_dof = fem.locate_dofs_topological(V, 1, anode_boundary_entity)
bc1_voltage = fem.dirichletbc(PETSc.ScalarType(18750), anode_boundary_dof, V)
### bc2
cathode_boundary_entity = mesh.locate_entities_boundary(domain, 1, cathode_marker)
cathode_boundary_dof = fem.locate_dofs_topological(V, 1, cathode_boundary_entity)
bc2_voltage = fem.dirichletbc(PETSc.ScalarType(0), cathode_boundary_dof, V)


##### Create initial condition
def gaussian_distribution(x, y, x0, y0, sigma, amplitude):
    return amplitude * numpy.exp(-((x - x0)**2 + (y - y0)**2) / (sigma**2))
backgroud_density = 1e13
def partical_initial_condition(x):
    return gaussian_distribution(x[0], x[1], 0, 10e-3, 0.4e-3, 5e18) + backgroud_density

partical_n = fem.Function(N)
partical_n.sub(0).interpolate(lambda x: numpy.full(x.shape[1], backgroud_density))
partical_n.sub(1).interpolate(partical_initial_condition)


##### Defining the variational problem
### problem_voltage
air_dielectric = fem.Constant(domain, PETSc.ScalarType(8.85e-12))
elementary_charge = fem.Constant(domain, PETSc.ScalarType(1.6e-19))
partical_n_electron, partical_n_positive = ufl.split(partical_n)
trial_voltage = ufl.TrialFunction(V)
test_voltage = ufl.TestFunction(V)
x = ufl.SpatialCoordinate(domain)
a_voltage = fem.form(dot(grad(trial_voltage), grad(test_voltage)) * x[0] * dx)
L_voltage = fem.form(elementary_charge / air_dielectric
                     * (partical_n_positive-partical_n_electron) * test_voltage * x[0] * dx)
voltage = fem.Function(V)
problem_voltage = LinearProblem(a_voltage, L_voltage, u=voltage,
                                bcs=[bc1_voltage, bc2_voltage],
                                petsc_options={"ksp_type": "cg",
                                               })

### problem_partical
# compute diffusion coefficient
E_mod = sqrt(dot(-grad(voltage), -grad(voltage)))
# compute particle velocity
velocity = grad(voltage)

trial_electron, trial_positive = ufl.TrialFunctions(N)
test_electron, test_positive = ufl.TestFunctions(N)
dt = fem.Constant(domain, 2e-12)

def axisymmetric_divergence(u):
    div_u = (1 / x[0]) * (x[0] * u[0]).dx(0) + u[1].dx(1)
    return div_u
F1 = (trial_electron - partical_n_electron) / dt * test_electron * x[0] * dx
F1 += axisymmetric_divergence(trial_electron * velocity) * test_electron * x[0] * dx
F2 = (trial_positive - partical_n_positive) / dt * test_positive * x[0] * dx
F = F1 + F2
a_partical = fem.form(lhs(F))
L_partical = fem.form(rhs(F))
partical = fem.Function(N)
problem_partical = LinearProblem(a_partical, L_partical, u=partical,
                                 petsc_options={"ksp_type": "gmres",
                                                "ksp_rtol": 1e-6})


xdmf_partical_electron = XDMFFile(MPI.COMM_WORLD, "result2.xdmf", "w")
xdmf_partical_electron.write_mesh(domain)

T = 0.2e-9
t = 0
i = 0
while t < T:

    # Update the electric potential from the current particle densities
    problem_voltage.solve()

    # Write the electron density at the current time
    xdmf_partical_electron.write_function(partical_n.sub(0), t)

    t = t + dt.value
    i = i + 1

    # Advance the particle densities by one time step and copy them back
    problem_partical.solve()
    partical_n.x.array[:] = partical.x.array

xdmf_partical_electron.close()

The results in the XDMF file look like this:

[image: electron density, single-core run (left) vs. 2-core run (right)]

The plot on the left is from the single-core run and the one on the right from the 2-core run. A spurious numerical artefact is clearly visible in the middle of the 2-core result.

Another observation: if I change "ksp_rtol": 1e-6 to "ksp_rtol": 1e-12, the single-core and multi-core results become consistent. So I suspect the difference is related to how KSP works rather than to my code, which I have checked several times without finding a problem.
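
For reference, this is the only change I make for the two runs to agree (just the tolerance on the particle solve):

problem_partical = LinearProblem(a_partical, L_partical, u=partical,
                                 petsc_options={"ksp_type": "gmres",
                                                "ksp_rtol": 1e-12})  # originally 1e-6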

Can anyone help? Thanks in advance!

Looks like it is due to the loose solver tolerance.

It’s usually helpful to start with a direct (LU) solver for small problems to eliminate any issues related to the iterative solver.
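
For example, a minimal sketch of what that looks like with the LinearProblem objects from your snippet (the MUMPS line matters mainly for parallel runs and assumes MUMPS is available in your PETSc build; drop it otherwise):

# Direct solve: "preonly" applies the preconditioner once, and the preconditioner is a full LU factorization
lu_options = {"ksp_type": "preonly",
              "pc_type": "lu",
              "pc_factor_mat_solver_type": "mumps"}  # assumes MUMPS is installed

problem_voltage = LinearProblem(a_voltage, L_voltage, u=voltage,
                                bcs=[bc1_voltage, bc2_voltage],
                                petsc_options=lu_options)
problem_partical = LinearProblem(a_partical, L_partical, u=partical,
                                 petsc_options=lu_options)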

Thank you for the suggestion. I tried it and found that with the LU direct solver the multi-core and single-core results are indeed consistent.

However, the code here has been greatly simplified, with very few cells and therefore very few degrees of freedom. I cannot always use a direct solver: the original code has tens of thousands of degrees of freedom, and a direct solver would be much slower than an iterative one there.

You can always set stricter tolerances for your iterative solver.
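
For instance, something like this on the particle solve (a sketch only; the right values depend on how your problem is scaled):

problem_partical = LinearProblem(a_partical, L_partical, u=partical,
                                 petsc_options={"ksp_type": "gmres",
                                                "ksp_rtol": 1e-12,   # relative residual reduction
                                                "ksp_atol": 1e-14})  # stop if the residual norm falls below this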

What Garth probably meant is that, to debug the problem, one should first fall back to a direct solver (if possible). If the issue persists with a direct solver, something is probably missing in the code; if it disappears, the code is likely fine, and one then has to investigate which iterative solver to use, what the tolerances should be, and so on.
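
A small addition that can help with that investigation (a sketch assuming the LinearProblem objects from the original post; as far as I know LinearProblem exposes the underlying PETSc KSP through its solver property):

# After a solve, query the KSP to see how the iterative solver behaved
problem_partical.solve()
ksp = problem_partical.solver
print(f"particle solve: {ksp.getIterationNumber()} iterations, "
      f"converged reason {ksp.getConvergedReason()}, "
      f"residual norm {ksp.getResidualNorm():.3e}")

Adding "ksp_converged_reason": None to petsc_options should make PETSc print similar information by itself.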

Thank you very much for the detailed explanation! It really helps clarify the approach to debugging the issue.