# Hessian derivative check failing in parallel

I believe that the issue I am facing is related to an improper use of scatter_forward as discussed here.

I am attempting to differentiate the functional J(u) = 1/2 integral (k1 + k2 u^2) grad(u)^T grad(u) dx. Do a finite difference check for consistency that is |(J(u0 + eps uhat) - J(u0)) / eps - gradJ(u_0)^T uhat| = O(eps). This seems to be done correctly in serial and parallel as indicated by my finite difference plots. I try to do the same finite difference check but for the Hessian of J and it work in serial but not in parallel. That is I check that || (gradJ(u0 + eps * uhat) - gradJ(u0)) / eps - HessianJ(u0) * uhat || = eps.

Any suggestions would be appreciated and if someone can point me to more details on what is occuring with scatter_forward and scatter_backward operations that would be greatly appreciated.

# Import FEniCSx
import dolfinx as dl
import ufl

from mpi4py import MPI
from petsc4py import PETSc

import numpy as np
import matplotlib.pyplot as plt

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Define the finite element mesh. The mesh size h is 1/nx
nx = 16
ny = nx
mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle)

# Define the finite element space V_h as the space of piecewise linear functions on the elements of the mesh.
degree = 1
Vh = dl.fem.FunctionSpace(mesh, ("CG", degree))

# define the homogeneous Dirichlet boundary condition at x = 0
facet_dim = mesh.topology.dim-1
facets_D = dl.mesh.locate_entities_boundary(mesh, dim=facet_dim, \
marker=lambda x: np.isclose(x, 0.0))
dofs_D = dl.fem.locate_dofs_topological(V=Vh, entity_dim=facet_dim, entities=facets_D)
u_bc = dl.fem.Constant(mesh, PETSc.ScalarType(0.0))
bcs = [dl.fem.dirichletbc(u_bc, dofs_D, Vh)]

# ----------- define the functional ------------------
# J(u) = \int_{\Omega} (0.5 * (k1 + k2 u^2) grad(u)^T grad(u) dx
uh  = dl.fem.Function(Vh)
k1 = dl.fem.Constant(mesh, PETSc.ScalarType(0.05))
k2 = dl.fem.Constant(mesh, PETSc.ScalarType(1.0))
J = dl.fem.form(Jform)

u0 = dl.fem.Function(Vh)
u0.interpolate(lambda x: x*(x-1)*x*(x-1))

# --- evaluate J at u = u0
uh.vector.aypx(0.0, u0.vector) # uh = u0 + 0.0 * uh
J0 = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

# evaluate the gradient of J at u = u0

# set up a random vector that obeys the Dirichlet conditions
uhat =  dl.la.create_petsc_vector(Vh.dofmap.index_map, Vh.dofmap.index_map_bs)
uhat.setRandom()
dl.fem.petsc.set_bc(uhat, bcs)

n_eps = 32
epss = 1e-2*np.power(2., -np.arange(n_eps))
for i, eps in enumerate(epss):
uh.vector.aypx(0.0, u0.vector)
uh.vector.axpy(eps, uhat) # uh = uh + eps[i]*dir
Jplus = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

if rank == 0:
plt.loglog(epss, (.5*fdgrad_residual/epss)*epss, "-.k", label=r"First Order, $\propto\varepsilon$")
plt.title("Finite difference check of the first variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{1}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()

H = dl.fem.form(Hform)

# ---- evaluate the Hessian of J at u0
uh.vector.aypx(0.0, u0.vector)
H_0 = dl.fem.petsc.assemble_matrix(H, bcs)
H_0.assemble()

# ----- compute H(u0) * udir
H_0udir   = H_0.createVecLeft()
H_0.mult(uhat, H_0udir)
H_0udir.assemble()

# -----
fdH_residuals = np.zeros(n_eps)
for i, eps in enumerate(epss):
# grad(J) evaluated at u = u0 + eps * uhat
uh.vector.aypx(0.0, u0.vector)
uh.vector.axpy(eps, uhat)

# ------- diff_grad = [ dJ(u0 + eps * uhat) - dJ(u0) ] / eps - H(u0) * uhat
# ------ || diff_grad ||_2 = O(eps)

if rank == 0:
plt.figure()
plt.loglog(epss, fdH_residuals, "-ob", label="Error Hessian")
plt.loglog(epss, (.5*fdH_residuals/epss)*epss, "-.k", label="First Order")
plt.title("Finite difference check of the second variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{2}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()


My understanding of this is that all computations in parallel with MPI_COMMWORLD must deal with ghost cells.

In a nutcase, ghost cells duplicate data, allowing each processor to access the values they need on other processors for computations such as finite differences.

This means that you cannot expect your computations to work as intended without using scatter_forward once in a while.

There’s more info (and a nice illustration) there : c - process to process computation and communication in mpi - Stack Overflow. This is not a dolfinx thing, really a parallel thing.

I think scatter_forward refers to swaps where the processor owning the original data updates the ghost cells on other processors and that’s the most useful one for usual operations.

Why is dolfinx coded this way, with code working in serial but failing in parallel ? The only reason I can see is that this updating lf the ghost cells is expensive, especially for a large number of processors, so it’s nice to have control over it I guess.

This is the exact reason for making these computations explicit.

Note that the operations in the post above are petsc4py operations.

A tutorial on mpi (in general) and in the context of dolfinx can be found at Parallel Computations with Dolfinx using MPI — MPI tutorial

It is not just Ghost cells, but ghost, vertices, facets, dofs etc.

Whenever the code above uses axpy, it should have a reverse mode communication, see for instance oasisx/fracstep.py at main · ComputationalPhysiology/oasisx · GitHub

1 Like

I would also like to ammend to my previous post with noting that vector operations can be done with less communication (But not None) if one uses the u.x which is a dolfinx::la::Vector, Where you have explicit access to ghost values, which is a bit more complicated with the PETSc vector.

Whenever the code above uses axpy, it should have a reverse mode communication

Why would axpy require a ghost update with reverse scatter with insert mode “add” (from the quoted example)? After axpy, scatter forward could be used to update the ghost values to be consistent with the owned values. But if the ghost values of the axpy’ed vectors are nonzero, scatter reverse/add would clearly lead to wrong results.
Moreover, shouldn’t this operation only ever be necessary after assemble_vector?

I agree with this, and this is not done in the original code.

Thank you @dokken, @dajuno and @hawkspar for all this information. I have been slowly parsing it.

So the complication seems to be in how to appropriately deal with ghost cells. Is it expected that the issue should go away if I do not use ghost cells as in mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle, ghost_mode = dl.cpp.mesh.GhostMode.none)?

You cannot expect to obtain the same value in serial and parallel without using ghost cells. Ghost cells are the canonical way to give to cells bordering the domain of another processor the information they need to compute derivatives at the border.

If you want to leverage parallelism without ghost cells, the only thing I can tell you is to look into MPI_COMMSELF, which is a totally different paradigm. It means that instead of splitting your mesh in different parts to solve it faster, each processor has a full mesh but solves a different problem on it. Obviously that can give rise to other problems, mostly about RAM use, but that’s an option with no ghost cells.

Ghost cells are only needed when you use interior facet integrals (the dS measure).

They are not needed in general (Lets say for a normal Poisson problem).

I have a simpler case for which I am not trying to use Dirichlet conditions at all and face the same problem.

It seems most appropriate to update the gradient as vector.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE) but I am not sure what is most appropriate for the Hessian vector product, is there something I should do to the vector before I apply it to the Hessian?

# Import FEniCSx
import dolfinx as dl
import ufl

from mpi4py import MPI
from petsc4py import PETSc

import numpy as np
import matplotlib.pyplot as plt

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Define the finite element mesh. The mesh size h is 1/nx
nx = 16
ny = nx
mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle)
# Create mesh facet entities and conncectivity between facets and cells
facet_dim = mesh.topology.dim - 1

mesh.topology.create_entities(facet_dim)
mesh.topology.create_connectivity(facet_dim, mesh.topology.dim)

# Define the finite element space V_h as the space of piecewise linear functions on the elements of the mesh.
degree = 1
Vh = dl.fem.FunctionSpace(mesh, ("CG", degree))

# ----------- define the functional ------------------
# J(u) = \int_{\Omega} (0.5 * (k1 + k2 u^2) grad(u)^T grad(u) dx
uh  = dl.fem.Function(Vh)
k1 = dl.fem.Constant(mesh, PETSc.ScalarType(0.05))
k2 = dl.fem.Constant(mesh, PETSc.ScalarType(1.0))
J = dl.fem.form(Jform)

u0 = dl.fem.Function(Vh)
u0.interpolate(lambda x: x*(x-1)*x*(x-1))

# --- evaluate J at u0
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
J0 = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

# evaluate the gradient of J at u = u0

# set up a random vector that obeys the Dirichlet conditions
uhat = dl.fem.Function(Vh)
uhat.vector.setRandom()

n_eps = 32
epss = 1e-2*np.power(2., -np.arange(n_eps))
for i, eps in enumerate(epss):
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
uh.vector.axpy(eps, uhat.vector) # uh = uh + eps[i]*dir
Jplus = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

if rank == 0:
plt.loglog(epss, (.5*fdgrad_residual/epss)*epss, "-.k", label=r"First Order, $\propto\varepsilon$")
plt.title("Finite difference check of the first variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{1}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()

# -------- determine the Hessian and evaluate it at u0
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
H = dl.fem.form(Hform)
H0 = dl.fem.petsc.assemble_matrix(H, [])
H0.assemble()

# -------- evaluate the gradient at u0

## ----- compute H(u0) * udir
H0_udir = dl.fem.Function(Vh)
H0.mult(uhat.vector, H0_udir.vector)
#H0_udir.x.scatter_forward()

## ----- finite difference check to verify Hessian
fdH_residuals = np.zeros(n_eps)

for i, eps in enumerate(epss):
# grad(J) evaluated at u = u0 + eps * uhat
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
uh.vector.axpy(eps, uhat.vector)

# diff_grad = [ (grad(u0 + eps * uhat) - grad(u0)) / eps - H(u0) * uhat]

# record finite difference approximation error

if rank == 0:
plt.figure()
plt.loglog(epss, fdH_residuals, "-ob", label="Error Hessian")
plt.loglog(epss, (.5*fdH_residuals/epss)*epss, "-.k", label="First Order")
plt.title("Finite difference check of the second variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{2}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()


So ghost cells are not needed for this problem or is there some underlying use of the dS measure with the calls to ufl.derivative?

No, ufl.derivative does not change the integration measures. You still Need Ghost updating Even if you do not have Ghost cells, as you have shared degrees of freedom at partition boundaries.

2 Likes

I’ve tried a number of additional things adding extra communication calls but it hasn’t fixed the issue.

I suppose the issue is that I still don’t see how I should treat updating the derivative and the Hessian vector products differently.

# Import FEniCSx
import dolfinx as dl
import ufl

from mpi4py import MPI
from petsc4py import PETSc

import numpy as np
import matplotlib.pyplot as plt

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Define the finite element mesh. The mesh size h is 1/nx
nx = 16
ny = nx
mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle, ghost_mode = dl.cpp.mesh.GhostMode.shared_facet)
# Create mesh facet entities and conncectivity between facets and cells
facet_dim = mesh.topology.dim - 1

mesh.topology.create_entities(facet_dim)
mesh.topology.create_connectivity(facet_dim, mesh.topology.dim)

# Define the finite element space V_h as the space of piecewise linear functions on the elements of the mesh.
degree = 1
Vh = dl.fem.FunctionSpace(mesh, ("CG", degree))

# ----------- define the functional ------------------
# J(u) = \int_{\Omega} (0.5 * (k1 + k2 u^2) grad(u)^T grad(u) dx
uh  = dl.fem.Function(Vh)
k1 = dl.fem.Constant(mesh, PETSc.ScalarType(0.05))
k2 = dl.fem.Constant(mesh, PETSc.ScalarType(1.0))
J = dl.fem.form(Jform)

u0 = dl.fem.Function(Vh)
u0.interpolate(lambda x: x*(x-1)*x*(x-1))

# --- evaluate J at u0
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
J0 = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

# evaluate the gradient of J at u = u0

# set up a random vector that obeys the Dirichlet conditions
uhat = dl.fem.Function(Vh)
uhat.vector.setRandom()

n_eps = 32
epss = 1e-2*np.power(2., -np.arange(n_eps))
for i, eps in enumerate(epss):
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
uh.vector.axpy(eps, uhat.vector) # uh = uh + eps[i]*dir
Jplus = mesh.comm.allreduce(dl.fem.assemble_scalar(J), op = MPI.SUM)

if rank == 0:
plt.loglog(epss, (.5*fdgrad_residual/epss)*epss, "-.k", label=r"First Order, $\propto\varepsilon$")
plt.title("Finite difference check of the first variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{1}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()

# -------- determine the Hessian and evaluate it at u0
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
H = dl.fem.form(Hform)
H0 = dl.fem.petsc.assemble_matrix(H, [])
H0.assemble()

# -------- evaluate the gradient at u0

## ----- compute H(u0) * udir
H0_udir = dl.fem.Function(Vh)
H0.mult(uhat.vector, H0_udir.vector)

## ----- finite difference check to verify Hessian
fdH_residuals = np.zeros(n_eps)

for i, eps in enumerate(epss):
# grad(J) evaluated at u = u0 + eps * uhat
uh.vector.zeroEntries()
uh.vector.axpy(1.0, u0.vector)
uh.vector.axpy(eps, uhat.vector)

# diff_grad = [ (grad(u0 + eps * uhat) - grad(u0)) / eps - H(u0) * uhat]

# record finite difference approximation error

if rank == 0:
plt.figure()
plt.loglog(epss, fdH_residuals, "-ob", label="Error Hessian")
plt.loglog(epss, (.5*fdH_residuals/epss)*epss, "-.k", label="First Order")
plt.title("Finite difference check of the second variation")
plt.xlabel(r"$\varepsilon$")
plt.ylabel(r"$r_{2}(\varepsilon)$, finite-difference error")
plt.legend(loc = "upper left")
plt.show()


If you could simplify your code to only contain the a vector hessian product (i.e. the operations you want to do that doesn’t work in parallel), I can have a look. Currently there are way too many operations here for me to sit down and go through them all.

# Import FEniCSx
import dolfinx as dl
import ufl

from mpi4py import MPI
from petsc4py import PETSc

import numpy as np
import matplotlib.pyplot as plt

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Define the finite element mesh. The mesh size h is 1/nx
nx = 16
ny = nx
mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle)
# Create mesh facet entities and conncectivity between facets and cells
facet_dim = mesh.topology.dim - 1

mesh.topology.create_entities(facet_dim)
mesh.topology.create_connectivity(facet_dim, mesh.topology.dim)

# Define the finite element space V_h as the space of piecewise linear functions on the elements of the mesh.
degree = 1
Vh = dl.fem.FunctionSpace(mesh, ("CG", degree))

# ----------- define the functional ------------------
# J(u) = \int_{\Omega} (0.5 * (k1 + k2 u^2) grad(u)^T grad(u) dx
uh  = dl.fem.Function(Vh)
k1 = dl.fem.Constant(mesh, PETSc.ScalarType(0.05))
k2 = dl.fem.Constant(mesh, PETSc.ScalarType(1.0))
J = dl.fem.form(Jform)

# --------- determine gradient and Hessian ---------------

H = dl.fem.form(Hform)

# -------- evaluate uh at specified function --------------
uh.interpolate(lambda x: x*(x-1)*x*(x-1))

# --------- evaluate Hessian at uh
H0 = dl.fem.petsc.assemble_matrix(H, [])
H0.assemble()

# -------- apply a random vector to the Hessian
uhat = dl.fem.Function(Vh)
H0_uhat = dl.fem.Function(Vh)
uhat.vector.setRandom()

# --------- ensure consistency of ghost values of uhat (primal vector)

# -------- compute Hessian-vector product
H0.mult(uhat.vector, H0_uhat.vector)

# --------- since Hessian maps to dual space accumulate entries as one would do for a gradient (linear form)


I’m not sure how helpful it will be to have just this code, since the issue is an inconsistency with the gradient and Hessian.

Recall that my goal is to verify |(J(x_{0}+\varepsilon \hat{x})-J(x_{0}))/\varepsilon-\nabla J\cdot\hat{x}|=\mathcal{O}(\varepsilon) and \|(\nabla J(x_{0}+\varepsilon \hat{x})-\nabla J(x_{0}))/\varepsilon-\nabla^{2} J\cdot\hat{x}\|_{2}=\mathcal{O}(\varepsilon), so unfortunately there needs to be a sweep over \varepsilon values and both Hessian \nabla^{2}J and gradient \nabla J need to be computed.

Consider the minimal following example, which gives the same vector in serial and parallel, and is cross-verified by computing the matrix-vector product with ufl:

# Import FEniCSx
import dolfinx as dl
import ufl

from mpi4py import MPI
from petsc4py import PETSc

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nx = 16
ny = nx
mesh = dl.mesh.create_unit_square(comm, nx, ny, dl.mesh.CellType.triangle)
V = dl.fem.FunctionSpace(mesh, ("Lagrange", 1))
uh = dl.fem.Function(V)
du = dl.fem.Function(V)
dv = dl.fem.Function(V)

Jh = 0.5*ufl.inner(uh,uh)*ufl.dx
dJdu = ufl.derivative(Jh, uh)
dJdudv = ufl.derivative(dJdu, uh)

H = dl.fem.form(dJdudv)

# -------- evaluate uh at specified function --------------
uh.interpolate(lambda x: x*(x-1)*x*(x-1))
du.interpolate(lambda x: x+x)
dv.interpolate(lambda x: 3*x+2*x)

# --------- evaluate Hessian at uh
H0 = dl.fem.petsc.assemble_matrix(H, [])
H0.assemble()

# -------- compute Hessian-vector product
H0_du = dl.fem.Function(V)
H0.mult(du.vector, H0_du.vector)

dJdudv_du = ufl.action(dJdudv, du)
H_du = dl.fem.Function(V)
dl.fem.petsc.assemble_vector(H_du.vector, dl.fem.form(dJdudv_du))

root@dokken-XPS-15-9560:/fenics/shared# python3 mwe123.py