Different results from NewtonSolver in parallel with MPI.COMM_WORLD

Dear all,

I’m very new to FEniCSx and MPI. I’m trying to run my code in parallel, as I’m going to use large meshes.
I broke the code down to a small example in order to compare the results for c when running on one core versus multiple cores.

Somehow, every time I run the code in parallel I (randomly) get a different result (three different possible solutions).

Maybe I’m missing some essential part… I also tried updating the ghost values and tried set_bc, with the same result.

I’d appreciate some help and advice! Thank you!

import numpy as np
from mpi4py import MPI

import dolfinx
import dolfinx.fem.petsc  # needed for fem.petsc.NonlinearProblem
import dolfinx.nls.petsc  # needed for nls.petsc.NewtonSolver
import ufl
from dolfinx import mesh, fem, nls, default_scalar_type
from dolfinx.fem import functionspace


#create mesh / create_unit_cube: cubic mesh with corners [0,0,0] and [1,1,1]
domain = mesh.create_unit_cube(MPI.COMM_WORLD, 20, 20, 20, mesh.CellType.hexahedron,
                               ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
tdim = domain.topology.dim
fdim = tdim - 1
domain.topology.create_connectivity(fdim, tdim)

#define function space
V = functionspace(domain, ("Lagrange", 1))

#concentration c, porosity eps
eps = fem.Function(V)  #design variable
phi = ufl.TestFunction(V)

eps.x.array[:] = 0.5

##Dirichlet 
def on_boundary(x):
    return np.isclose(x[0], 0)
dofs_L = fem.locate_dofs_geometrical(V , on_boundary)
bc_L = fem.dirichletbc(default_scalar_type(1), dofs_L, V) ##Dirichlet = 1

#Parameter
D0 = 1.0
k0 = -1.0
eta2 = 2.0
gamma = 2.0
Rfl = 0.0072

for loop in range(0, 11):
    c = fem.Function(V)
    f = fem.Constant(domain, default_scalar_type(0))
    R = ufl.inner(D0*eps**eta2*ufl.grad(c), ufl.grad(phi))*ufl.dx \
        - ufl.inner(k0*(1 - eps)**gamma*c, phi)*ufl.dx - ufl.inner(f, phi)*ufl.ds

    problem_u = fem.petsc.NonlinearProblem(R, c, bcs=[bc_L])

    solver = nls.petsc.NewtonSolver(domain.comm, problem_u)

    # NewtonSolver.solve returns (number of iterations, converged flag)
    n_its, converged = solver.solve(c)

    c.x.scatter_forward()

    np.save('c', c.x.array)  # note: each rank only saves its locally stored dofs

If you are referring to getting different solutions in parallel, this is not surprising: when the code is executed with MPI, the mesh is distributed, which in turn means that the solution is distributed across the requested processes. No single process stores the full solution.
Some time ago I posted how to gather solutions: Gather solutions in parallel in FEniCSX - #2 by dokken
and an explanation of parallel distribution at
Parallel Computations with Dolfinx using MPI — MPI tutorial
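
For reference, the gathering pattern from that post looks roughly like this (a sketch, assuming c is the fem.Function from your example above; the gathered values end up in the global dof ordering, not the serial one):

imap = c.function_space.dofmap.index_map
bs = c.function_space.dofmap.index_map_bs
local_range = np.asarray(imap.local_range, dtype=np.int32) * bs
size_global = imap.size_global * bs

# Gather the ownership ranges and the locally owned dof values on rank 0
ranges = domain.comm.gather(local_range, root=0)
data = domain.comm.gather(c.x.array[:imap.size_local * bs], root=0)

if domain.comm.rank == 0:
    global_array = np.zeros(size_global)
    for r, d in zip(ranges, data):
        global_array[r[0]:r[1]] = d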

Dear dokken,

Many thanks!
I implemented the gathering and it seems to work well!
I’m still reading through your posts to understand MPI better, but I have another question. So, I have to gather the solution on rank 0. Now, if I want to use this gathered solution as input to another PDE in the same iteration loop, how could I implement that?
Or can I let the whole code run in parallel and gather the final results?
Thank you!

I would never advise gathering results in the way you suggested above (writing out the degrees of freedom as an array).
There are specific output formats (.pvd, .xdmf, .bp) that handle the parallel writing of data related to meshes/functions.

What do you need the dof values for? They need context (their degree of freedom, corresponding cell, etc.) to be of any use.
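
For illustration, a minimal sketch of writing with one of those formats, here XDMF (assuming a function eps defined on domain; every rank participates in the write and the file can be opened in ParaView):

from dolfinx import io

with io.XDMFFile(domain.comm, "eps.xdmf", "w") as xdmf:
    xdmf.write_mesh(domain)
    xdmf.write_function(eps)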


Thank you again!
What I actually would like to do is topology optimization, in particular:

  1. solve the equation for c (above)
  2. add the values for c into a sensitivity equation and solve it
  3. use MMA / GCMMA to update a design variable

And ideally do all of that on multiple cores using MPI.
When I use COMM_SELF it all works fine (since the whole mesh is processed on every core?). But when I run it with COMM_WORLD, I get odd solutions, which I think comes from inserting local/partial results instead of the gathered ones into the sensitivity and update equations.
Could you elaborate on the specific output formats, please?

Appreciate your help very much! Thank you!

You should be able to run this in parallel, see for instance

or

http://jsdokken.com/FEniCS-workshop/src/applications/optimal_control.html#the-gradient-of-the-functional

Thank you!

So, if I understand correctly, I have to take care of using petsc assembly and updating ghosts in the right places. As the mesh is divided for all equations used in the code, the calculations should be correct (but distributed). Then, at the end, I have to gather the solutions needed as input for the next iteration (as in the case of c above).

Yes. The forward and adjoint computations can be done with a nonlinear or linear solver in DOLFINx. However, when computing the sensitivity, you have to accumulate contributions across processes.
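
To make "accumulate contributions" concrete, a sketch with fem.petsc.assemble_vector (here dJde_form is a hypothetical name for the compiled fem.form of the sensitivity form):

import dolfinx.fem.petsc
from petsc4py import PETSc

dJde_vec = dolfinx.fem.petsc.assemble_vector(dJde_form)
# Add the contributions other processes computed for dofs owned by this rank
dJde_vec.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)
# Then push the (now complete) owner values back out to the ghost entries
dJde_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)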

Many thanks!

I have another question…
So, if I use an Expression and interpolate, something like this:

dJde = dolfinx.fem.Expression(-(k0*gamma*((1-eps)**(gamma-1))*c+ufl.dot(D0*eta2*(eps**(eta2-1))*ufl.grad(c),ufl.grad(lamda))+ufl.inner(k0*gamma*((1-eps)**(gamma-1))*c,lamda)), V.element.interpolation_points(),MPI.COMM_WORLD)
sens = fem.Function(V)
sens.interpolate(dJde)

how can I deal with that in parallel?

And for the iteration loop…
Let’s say I let the whole code run in parallel, and in every iteration I update the equations with new values for some equation parameters. Do I have to take care of gathering the results (parameters such as c and the sensitivity) and put them into the equations in the next iteration step after collecting them from all processes? Or will all iterations run in divided parts on multiple cores, so that I can simply collect my design variable at the end of all iterations?

Thank you!

Interpolation runs in parallel. Nothing is required as long as you interpolate over the whole domain.
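
For completeness, the interpolation pattern in a slightly cleaned-up form (a sketch, where dJde_ufl stands for the UFL expression from your snippet; the explicit scatter_forward is an addition to make the ghost values consistent if you read them afterwards):

dJde = dolfinx.fem.Expression(dJde_ufl, V.element.interpolation_points())
sens = fem.Function(V)
sens.interpolate(dJde)    # each process interpolates on its own cells
sens.x.scatter_forward()  # update ghost entries from their owning ranks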

This depends on how you set up your problem, and you would have to supply the code of what you want to do for anyone to give you advice.

Thank you very much!

So, in detail, I want to run the following code in parallel. It works well when running on 1 processor, but the solution gets odd when running in parallel.

import numpy as np
import pandas as pd
from mpi4py import MPI

import dolfinx
import dolfinx.fem.petsc  # needed for fem.petsc.NonlinearProblem / LinearProblem
import dolfinx.nls.petsc  # needed for nls.petsc.NewtonSolver
import ufl
import pyvista
from dolfinx import mesh, fem, nls, plot, default_scalar_type
from dolfinx.fem import functionspace
from mmapy import mmasub
from petsc4py import PETSc

#create mesh / create_unit_cube: cubic mesh with corners [0,0,0] and [1,1,1]
domain = mesh.create_unit_cube(MPI.COMM_WORLD, 20, 20, 20, mesh.CellType.hexahedron, ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
tdim = domain.topology.dim
fdim = tdim - 1
domain.topology.create_connectivity(fdim, tdim)

#define function space
V = functionspace(domain, ("Lagrange", 1))

#concentration c, porosity eps
c = fem.Function(V)
eps = fem.Function(V) #design variable
phi = ufl.TestFunction(V)

eps.x.array[:] = 0.5 #uD = fem.Function(V)

#Dirichlet
def on_boundary(x):
    return np.isclose(x[0], 0)
dofs_L = fem.locate_dofs_geometrical(V , on_boundary)
bc_L = fem.dirichletbc(default_scalar_type(1), dofs_L, V) ##Dirichlet = 1

#Parameter
D0 = 1.0
k0 = -1.0
eta2 = 2.0
gamma = 2.0
Rfl = 0.0072
#Rfl = 0.0106
#Rfl = 0.0018

xval = eps.x.array[np.newaxis].copy().T
xold1 = xval.copy()
xold2 = xval.copy()
move = 0.2

m = 1  # number of constraints
n = eps.x.array.size
xmin = np.zeros((n,1))
xmax = np.ones((n,1)) 
low = np.zeros((n,1))
upp = np.ones((n,1))

c2=1000*np.ones((m,1))
d=np.zeros((m,1))
a0=1.0
a=np.zeros((m,1))
volfrac = 0.6

data = pd.DataFrame(columns=['i','aveps','eps','c','J','sen','a','xmin','xmax','low','upp','f0val','df0dx','fval','dfdx'])
glob_J = pd.DataFrame(columns=['glob_J'])
loop=1
change=1

#while change>0.00001 and loop<maxiter:
for loop in range(0,51):
    f = fem.Constant(domain, default_scalar_type(0))
    R = ufl.inner(D0*eps**eta2*ufl.grad(c), ufl.grad(phi))*ufl.dx - ufl.inner(k0*(1-eps)**gamma*c,phi)*ufl.dx - ufl.inner(f,phi) *ufl.ds 
    
    #Forming and solving the nonlinear problem
    problem_u = fem.petsc.NonlinearProblem(R, c, bcs=[bc_L])
    solver = nls.petsc.NewtonSolver(MPI.COMM_WORLD,problem_u)
    solver.rtol = 1e-16
    n_its, converged = solver.solve(c)  # returns (iteration count, converged flag)

    #solving objective function
    J = -ufl.inner((k0*(1-eps)**gamma),c)*ufl.dx #Objective function
    J_form = fem.form(J)
    J_old = fem.assemble_scalar(J_form)
    global_J = domain.comm.allreduce(J_old, op=MPI.SUM)

    #solving the adjoint vector
    lhs = ufl.adjoint(ufl.derivative(R,c))
    rhs = -ufl.derivative(J,c) 
    def on_boundary_1(x):
        return np.isclose(x[0], 0)
        
    dofs_L_adj = fem.locate_dofs_geometrical(V , on_boundary_1)
    bc_L_adj = fem.dirichletbc(default_scalar_type(0), dofs_L_adj, V) ##Dirichlet = 1
    bcs_adjoint = [bc_L_adj]

    #adjoint problem
    problem_adj = fem.petsc.LinearProblem(lhs, rhs, bcs_adjoint, petsc_options={"ksp_type": "gmres", "pc_type": "lu","pc_factor_mat_solver_type": "mumps"})
    lamda = problem_adj.solve()
        
    ##### CALCULATE SENSITIVITY #####
    dJde = dolfinx.fem.Expression(-(k0*gamma*((1-eps)**(gamma-1))*c+ufl.dot(D0*eta2*(eps**(eta2-1))*ufl.grad(c),ufl.grad(lamda))+ufl.inner(k0*gamma*((1-eps)**(gamma-1))*c,lamda)), V.element.interpolation_points(),domain.comm)
    sens = fem.Function(V)
    sens.interpolate(dJde)

    # Sensitivities
    df0dx=sens.x.array[np.newaxis].copy().T

    voltot = 1*ufl.dx(domain=domain)
    voltot_form = fem.form(voltot)
    voltot_val = fem.assemble_scalar(voltot_form)
    constraint = eps*1*ufl.dx
    constraint_form = fem.form(constraint)
    constraint_val = fem.assemble_scalar(constraint_form)
    h_c = constraint_val/(volfrac*voltot_val)-1
    
    #MMA optimization
    f0val= np.array([[J_old]])
    fval = h_c

    constraint_der = ufl.derivative(constraint,eps)
    constraint_der_form = fem.form(constraint_der)
    dfdxval = fem.assemble_vector(constraint_der_form)
    dfdxval2 = dfdxval.array[np.newaxis]/(volfrac*voltot_val)
    dfdx = dfdxval2
    
    xmma,ymma,zmma,lam,xsi,eta,mu,zet,s,low,upp = mmasub(m,n,loop,xval,xmin,xmax,xold1,xold2,f0val,df0dx,fval,dfdx,low,upp,a0,a,c2,d,move)
    xold2=xold1.copy()
    xold1=xval.copy()
    change=np.max(abs(xval-xmma))
    xval=xmma.copy()
    eps.x.array[:] = xmma.flatten()

    # determine new value for the design variables 
    
    eps1 = fem.Function(V)
    phi_3 = ufl.TestFunction(V)
    filter = ufl.inner((Rfl**2)*ufl.grad(eps1), ufl.grad(phi_3))*ufl.dx + ufl.inner(eps1,phi_3)*ufl.dx - ufl.inner(eps,phi_3)*ufl.dx
    filter_eps = fem.petsc.NonlinearProblem(filter, eps1)
    solver = nls.petsc.NewtonSolver(MPI.COMM_WORLD,filter_eps)
    solver.rtol = 1e-16
    solver.max_it = 10000
    solver.solve(eps1)
    eps.interpolate(eps1)
    
    eps.x.array[eps.x.array > 1] = 1
    eps.x.array[eps.x.array < 0] = 0
    
    if domain.comm.rank == 0:
        # Mesh on one proc
        mesh0 = mesh.create_unit_cube(MPI.COMM_SELF, 20, 20, 20, mesh.CellType.hexahedron, ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
        V0 = functionspace(mesh0, ("Lagrange", 1))
        x0 = V0.tabulate_dof_coordinates()
        u0 = fem.Function(V0)
    
    #eps.x.scatter_forward()
    #c.x.scatter_forward()
    #sens.x.scatter_forward()
    #sens.array.scatter.forward()
    
    imap2 = eps.function_space.dofmap.index_map
    imapc = c.function_space.dofmap.index_map
    imapsens = sens.function_space.dofmap.index_map

    local_range2 = np.asarray(imap2.local_range, dtype=np.int32) * \
    eps.function_space.dofmap.index_map_bs
    size_global2 = imap2.size_global * eps.function_space.dofmap.index_map_bs

    local_rangec = np.asarray(imapc.local_range, dtype=np.int32) * \
    c.function_space.dofmap.index_map_bs
    size_globalc = imapc.size_global * c.function_space.dofmap.index_map_bs

    local_rangesens = np.asarray(imapsens.local_range, dtype=np.int32) * \
    sens.function_space.dofmap.index_map_bs
    size_globalsens = imapsens.size_global * sens.function_space.dofmap.index_map_bs
    
    ranges2 = domain.comm.gather(local_range2, root=0)
    data2 = domain.comm.gather(eps.x.petsc_vec.array, root=0)

    rangesc = domain.comm.gather(local_rangec, root=0)
    datac = domain.comm.gather(c.x.petsc_vec.array, root=0)

    rangessens = domain.comm.gather(local_rangesens, root=0)
    datasens = domain.comm.gather(sens.x.petsc_vec.array, root=0)
    
    x2 = V.tabulate_dof_coordinates()[:imap2.size_local]
    x2_glob = domain.comm.gather(x2, root=0)

    xc = V.tabulate_dof_coordinates()[:imapc.size_local]
    xc_glob = domain.comm.gather(xc, root=0)

    xsens = V.tabulate_dof_coordinates()[:imapsens.size_local]
    xsens_glob = domain.comm.gather(xsens, root=0)

    if domain.comm.rank == 0:

        global_array2 = np.zeros(size_global2)
        for r2, d2 in zip(ranges2, data2):
            global_array2[r2[0]:r2[1]] = d2

        global_arrayc = np.zeros(size_globalc)
        for rc, dc in zip(rangesc, datac):
            global_arrayc[rc[0]:rc[1]] = dc

        global_arraysens = np.zeros(size_globalsens)
        for rsens, dsens in zip(rangessens, datasens):
            global_arraysens[rsens[0]:rsens[1]] = dsens
            
        # Create array with all coordinates
        global_x2 = np.zeros((size_global2, 3))
        for r2, x2_ in zip(ranges2, x2_glob):
            global_x2[r2[0]:r2[1], :] = x2_

        global_xc = np.zeros((size_globalc, 3))
        for rc, xc_ in zip(rangesc, xc_glob):
            global_xc[rc[0]:rc[1], :] = xc_

        global_xsens = np.zeros((size_globalsens, 3))
        for rsens, xsens_ in zip(rangessens, xsens_glob):
            global_xsens[rsens[0]:rsens[1], :] = xsens_
        
        serial_to_global2 = []
        for coord in x0:
            serial_to_global2.append(np.abs(global_x2-coord).sum(axis=1).argmin())

        serial_to_globalc = []
        for coord in x0:
            serial_to_globalc.append(np.abs(global_xc-coord).sum(axis=1).argmin())

        serial_to_globalsens = []
        for coord in x0:
            serial_to_globalsens.append(np.abs(global_xsens-coord).sum(axis=1).argmin())
            
        # Create sorted array from
        u_from_global2 = np.zeros(global_array2.shape)
        u_from_globalc = np.zeros(global_arrayc.shape)
        u_from_globalsens = np.zeros(global_arraysens.shape)
        for serial, glob in enumerate(serial_to_global2):
            u_from_global2[serial] = global_array2[glob]
        for serialc, globc in enumerate(serial_to_globalc):
            u_from_globalc[serialc] = global_arrayc[globc]
        for serialsens, globsens in enumerate(serial_to_globalsens):
            u_from_globalsens[serialsens] = global_arraysens[globsens]

        i_d = loop
        eps_mean = np.mean(u_from_global2)
        eps_d = u_from_global2.copy()
        c_d = u_from_globalc.copy()
        sen_d = u_from_globalsens.copy()
        glob_J.loc[loop] = [global_J]
        data.loc[loop] = [i_d,eps_mean,eps_d,c_d,global_J,sen_d,a,xmin,xmax,low,upp,f0val,df0dx,fval,dfdx]
    
    # Create local VTK mesh data structures
    topology, cell_types, geometry = plot.vtk_mesh(V)
    num_cells_local = domain.topology.index_map(domain.topology.dim).size_local
    num_dofs_local = V.dofmap.index_map.size_local * V.dofmap.index_map_bs
    
    num_dofs_per_cell = topology[0]  # first entry of each VTK cell block is the number of dofs per cell
    topology_dofs = (np.arange(len(topology)) % (num_dofs_per_cell+1)) != 0

    # Map to global dof indices
    global_dofs = V.dofmap.index_map.local_to_global(topology[topology_dofs].copy())
    # Overwrite topology
    topology[topology_dofs] = global_dofs
    # Choose root
    root = 0

    # Gather data
    global_topology = domain.comm.gather(topology[:(num_dofs_per_cell+1)*num_cells_local], root=root)
    global_geometry = domain.comm.gather(geometry[:V.dofmap.index_map.size_local,:], root=root)
    global_ct = domain.comm.gather(cell_types[:num_cells_local])
    global_vals = domain.comm.gather(eps.x.array[:num_dofs_local])
    
    if domain.comm.rank == root and loop%10 == 0:
        # Stack data
        root_geom = np.vstack(global_geometry)
        root_top = np.concatenate(global_topology)
        root_ct = np.concatenate(global_ct)
        root_vals = np.concatenate(global_vals)

        # Plot as in serial
        pyvista.OFF_SCREEN = True
        grid = pyvista.UnstructuredGrid(root_top, root_ct, root_geom)
        grid.point_data["eps"] = root_vals
        grid.set_active_scalars("eps")
        u_plotter = pyvista.Plotter()
        u_plotter.add_mesh(grid, show_edges=False)
        #u_plotter.view_xy()
        u_plotter.screenshot("figure.png")
        u_plotter.save_graphic('3Dtpp3it_' + str(loop) + ".pdf")

The idea is to calculate the concentration c in each iteration step (fem.petsc.NonlinearProblem(R, c, bcs=[bc_L])), which is then used to calculate the sensitivity (the lamda solve from fem.petsc.LinearProblem and interpolation of the Expression dJde). Then the design variable eps is updated by MMA and a filter is applied. The same is repeated in the following iteration steps.

In order to have the gathered values after each iteration, the values are collected at the end of the loop and saved in a DataFrame. In addition, the solution is plotted every 10th iteration.

Thank you!

You are not summing up these two values over all the processes, as you did in

In general, this code is fairly long, and you should work your way backwards, checking that intermediate variables are consistent before adding more complexity.
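
For concreteness, a sketch of such an accumulation, mirroring what you already do for global_J (assuming the two values in question are voltot_val and constraint_val):

voltot_val = domain.comm.allreduce(fem.assemble_scalar(voltot_form), op=MPI.SUM)
constraint_val = domain.comm.allreduce(fem.assemble_scalar(constraint_form), op=MPI.SUM)
h_c = constraint_val/(volfrac*voltot_val) - 1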

Thank you!

Yes, I’m not summing up the two values, as I’m not interested in them (in the end, the summed-up values just go into the DataFrame).

I tried to remove complexity (e.g. looking only at the sensitivity), but could not find the problem.

It seems that in later iterations the sensitivity values differ from the ones obtained when running on one core. In order to see what happens in later iterations, the whole code has to run, since the updated values have to go into the sensitivity equation and the update function.
I was just wondering whether the updates after the solving steps are correct and whether the plotting actually works.

I also wonder whether I have to use fem.petsc.assemble_vector here (the length of the vector seems to change compared to fem.assemble_vector, also after updating ghosts):

constraint_der = ufl.derivative(constraint,eps)
constraint_der_form = fem.form(constraint_der)
dfdxval = fem.assemble_vector(constraint_der_form)
dfdxval2 = dfdxval.array[np.newaxis]/(volfrac*voltot_val)
dfdx = dfdxval2
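
For comparison, this is what I would try with the PETSc-based assembly and ghost accumulation (if I understand correctly, the length difference is expected: the array of the dolfinx.la.Vector returned by fem.assemble_vector also contains the ghost entries, whereas the .array of the PETSc Vec only exposes the locally owned entries):

from petsc4py import PETSc

dfdx_vec = fem.petsc.assemble_vector(constraint_der_form)
# Accumulate the contributions other processes computed for locally owned dofs
dfdx_vec.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)
dfdx = dfdx_vec.array[np.newaxis]/(volfrac*voltot_val)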

I’m still trying to solve the issue. Here is a minimal example where the sensitivities are calculated.
Then the solution is gathered in order to compare the results.

import numpy as np
from mpi4py import MPI

import dolfinx
import dolfinx.fem.petsc  # needed for fem.petsc.NonlinearProblem / LinearProblem
import dolfinx.nls.petsc  # needed for nls.petsc.NewtonSolver
import ufl
from dolfinx import mesh, fem, nls, default_scalar_type
from dolfinx.fem import functionspace

#create mesh / create_unit_cube: cubic mesh with corners [0,0,0] and [1,1,1]
domain = mesh.create_unit_cube(MPI.COMM_WORLD, 3, 3, 3, mesh.CellType.hexahedron, ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
#domain = mesh.create_unit_cube(MPI.COMM_WORLD, 20, 20, 20, mesh.CellType.hexahedron, ghost_mode=dolfinx.cpp.mesh.GhostMode.none)
tdim = domain.topology.dim
fdim = tdim - 1
#domain.topology.create_entities(fdim)
#domain.topology.create_connectivity(fdim, tdim)

#define function space
V = functionspace(domain, ("Lagrange", 1))

#concentration c, porosity eps
c = fem.Function(V)
eps = fem.Function(V) #design variable
phi = ufl.TestFunction(V)

eps.x.array[:] = 0.5 #uD = fem.Function(V)

#Parameter
D0 = 1.0
k0 = -1.0
eta2 = 2.0
gamma = 2.0

for loop in range(0,31):
    f = fem.Constant(domain, default_scalar_type(0))
    R = ufl.inner(D0*eps**eta2*ufl.grad(c), ufl.grad(phi))*ufl.dx - ufl.inner(k0*(1-eps)**gamma*c,phi)*ufl.dx - ufl.inner(f,phi)*ufl.ds 
    
    def on_boundary(x):
        return np.isclose(x[0], 0)
    dofs_L = fem.locate_dofs_geometrical(V , on_boundary)
    bc_L = fem.dirichletbc(default_scalar_type(1), dofs_L, V) 

    #Forming and solving the nonlinear problem
    problem_u = fem.petsc.NonlinearProblem(R, c, bcs=[bc_L])
    solver = nls.petsc.NewtonSolver(MPI.COMM_WORLD,problem_u)
    solver.rtol = 1e-16
    solver.solve(c)

    #solving objective function
    J = -ufl.inner((k0*(1-eps)**gamma),c)*ufl.dx #Objective function
    J_form = fem.form(J)
    J_old = fem.assemble_scalar(J_form)

    #solving the adjoint vector
    lhs = ufl.adjoint(ufl.derivative(R,c))
    rhs = -ufl.derivative(J,c) 
    def on_boundary_1(x):
        return np.isclose(x[0], 0)
    
    dofs_L_adj = fem.locate_dofs_geometrical(V , on_boundary_1)
    bc_L_adj = fem.dirichletbc(default_scalar_type(0), dofs_L_adj, V) ##Dirichlet = 1
    bcs_adjoint = [bc_L_adj]

    #adjoint problem
    problem_adj = fem.petsc.LinearProblem(lhs, rhs, bcs_adjoint, petsc_options={"ksp_type": "gmres", "pc_type": "lu","pc_factor_mat_solver_type": "mumps"})

    lamda = problem_adj.solve()
        
    ##### CALCULATE SENSITIVITY #####
    dJde = dolfinx.fem.Expression(-(k0*gamma*((1-eps)**(gamma-1))*c+ufl.dot(D0*eta2*(eps**(eta2-1))*ufl.grad(c),ufl.grad(lamda))+ufl.inner(k0*gamma*((1-eps)**(gamma-1))*c,lamda)), V.element.interpolation_points(),MPI.COMM_WORLD)
    sens = fem.Function(V)
    sens.interpolate(dJde)

    if domain.comm.rank == 0:
        # Mesh on one proc
        mesh0 = mesh.create_unit_cube(MPI.COMM_SELF, 3, 3, 3, mesh.CellType.hexahedron, ghost_mode=dolfinx.cpp.mesh.GhostMode.shared_facet)
        V0 = functionspace(mesh0, ("Lagrange", 1))
        x0 = V0.tabulate_dof_coordinates()
        u0 = fem.Function(V0)
    
    imapsens = sens.function_space.dofmap.index_map
    local_rangesens = np.asarray(imapsens.local_range, dtype=np.int32) * \
    sens.function_space.dofmap.index_map_bs
    size_globalsens = imapsens.size_global * sens.function_space.dofmap.index_map_bs

    rangessens = domain.comm.gather(local_rangesens, root=0)
    datasens = domain.comm.gather(sens.x.petsc_vec.array, root=0)

    xsens = V.tabulate_dof_coordinates()[:imapsens.size_local]
    xsens_glob = domain.comm.gather(xsens, root=0)

    if domain.comm.rank == 0:

        global_arraysens = np.zeros(size_globalsens)
        for rsens, dsens in zip(rangessens, datasens):
            global_arraysens[rsens[0]:rsens[1]] = dsens

        global_xsens = np.zeros((size_globalsens, 3))
        for rsens, xsens_ in zip(rangessens, xsens_glob):
            global_xsens[rsens[0]:rsens[1], :] = xsens_

        serial_to_globalsens = []
        for coord in x0:
            serial_to_globalsens.append(np.abs(global_xsens-coord).sum(axis=1).argmin())
            
        # Create sorted array from
        u_from_globalsens = np.zeros(global_arraysens.shape)

        for serialsens, globsens in enumerate(serial_to_globalsens):
            u_from_globalsens[serialsens] = global_arraysens[globsens]

        sens_d = u_from_globalsens

        np.save('sens_d4',sens_d)

Now, when I run the code on 1 core and on 4 cores and compare the sensitivity values, some of the values differ, e.g. the second one (which is not the case for c, J, or lamda).
Is there anything wrong with the interpolation?

The output arrays are:
for -n 1

array([0.62576823, 0.51721359, 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.62576823, 0.51721359, 0.4534996 , 0.4534996 ,
       0.4534996 , 0.4534996 , 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.62576823, 0.51721359, 0.62576823, 0.51721359,
       0.40495807, 0.40495807, 0.40495807, 0.40495807, 0.4534996 ,
       0.4534996 , 0.4534996 , 0.4534996 , 0.62576823, 0.51721359,
       0.62576823, 0.51721359, 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.62576823, 0.51721359, 0.40495807, 0.40495807,
       0.40495807, 0.40495807, 0.4534996 , 0.4534996 , 0.4534996 ,
       0.4534996 , 0.4534996 , 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.40495807, 0.40495807, 0.40495807, 0.40495807,
       0.40495807, 0.4534996 , 0.4534996 , 0.62576823, 0.51721359,
       0.40495807, 0.40495807, 0.4534996 , 0.40495807])

for -n 4

array([0.62576823, 0.25951964, 0.62576823, 0.51721359, 0.62576823,
       0.25951964, 0.62576823, 0.51721359, 0.34898583, 0.34898583,
       0.34898583, 0.4534996 , 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.62576823, 0.51721359, 0.62576823, 0.25951964,
       0.40495807, 0.40495807, 0.40495807, 0.40495807, 0.4534996 ,
       0.4534996 , 0.34898583, 0.4534996 , 0.62576823, 0.51721359,
       0.62576823, 0.51721359, 0.62576823, 0.51721359, 0.62576823,
       0.51721359, 0.62576823, 0.25951964, 0.40495807, 0.40495807,
       0.40495807, 0.40495807, 0.4534996 , 0.4534996 , 0.4534996 ,
       0.34898583, 0.4534996 , 0.62576823, 0.25951964, 0.62576823,
       0.25951964, 0.40495807, 0.40495807, 0.40495807, 0.40495807,
       0.40495807, 0.34898583, 0.4534996 , 0.62576823, 0.51721359,
       0.40495807, 0.40495807, 0.34898583, 0.40495807])

Many thanks!