Gather solutions in parallel in FEniCSX

Basically, I want to do the same thing mentioned in Gather solution in parallel - #2 by dajuno, but in FEniCSX. That is, I want to gather the results from every processor on processor 0. Specifically, I want to obtain the same result as I would get without parallel computation.

Taking the Poisson problem from Implementation — FEniCSx tutorial (jorgensd.github.io) as an example, we can print the result with

print(u_vertex_values.T)

as

[[3.   2.62 2.63 3.01 2.28 2.29 2.32 2.66 3.04 1.98 1.99 2.02 2.07 2.37
  2.71 3.09 1.72 1.73 1.76 1.81 1.88 2.14 2.44 2.78 3.16 1.5  1.51 1.54
  1.59 1.66 1.75 1.97 2.23 2.53 2.87 3.25 1.32 1.33 1.36 1.41 1.48 1.57
  1.68 1.86 2.08 2.34 2.64 2.98 3.36 1.18 1.19 1.22 1.27 1.34 1.43 1.54
  1.67 1.81 1.99 2.21 2.47 2.77 3.11 3.49 1.08 1.09 1.12 1.17 1.24 1.33
  1.44 1.57 1.72 1.82 1.96 2.14 2.36 2.62 2.92 3.26 3.64 1.02 1.03 1.06
  1.11 1.18 1.27 1.38 1.51 1.66 1.83 1.89 1.99 2.13 2.31 2.53 2.79 3.09
  3.43 3.81 1.   1.01 1.04 1.09 1.16 1.25 1.36 1.49 1.64 1.81 2.   2.02
  2.08 2.18 2.32 2.5  2.72 2.98 3.28 3.62 4.  ]]

However, when we compute it on 2 processors, we get two separate results:

[[2.   1.81 1.83 2.02 1.64 1.66 1.72 1.89 2.08 1.49 1.51 1.57 1.67 1.82
  1.99 2.18 1.36 1.38 1.44 1.54 1.68 1.81 1.96 2.13 2.32 1.25 1.27 1.33
  1.43 1.57 1.16 1.18 1.24 1.34 1.48 1.09 1.11 1.17 1.27 1.41 1.04 1.06
  1.12 1.22 1.36 1.54 1.01 1.03 1.09 1.19 1.33 1.   1.02 1.08 1.18 1.32
  2.5  2.31 2.14 1.99 1.86 1.75 1.66 1.59 1.51 1.5  1.72 1.73 1.76 1.81
  1.88 1.97 2.08 2.21 2.36 2.53 2.72]]

[[2.5  2.53 2.72 2.31 2.36 2.62 2.79 2.98 2.14 2.21 2.47 2.77 2.92 3.09
  3.28 1.99 2.08 2.34 2.64 2.98 3.11 3.26 3.43 3.62 1.86 1.97 2.23 2.53
  2.87 3.25 3.36 3.49 3.64 3.81 4.   1.75 1.88 2.14 2.44 2.78 3.16 1.66
  1.81 2.07 2.37 2.71 3.09 1.59 1.76 2.02 2.32 2.66 3.04 1.73 1.99 2.29
  2.63 3.01 1.51 1.72 1.98 2.28 2.62 3.   1.5  1.54 1.32 1.33 1.36 1.41
  1.48 1.57 1.68 1.81 1.96 2.13 2.32]]

Even if we use MPI.COMM_WORLD.gather(), we just obtain a simple concatenation of these two results. So is it possible to recover the result we would get without parallel computation?
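
Roughly, what I tried is the following (just a sketch; u_vertex_values is the array from the tutorial script, assumed to be available on every process):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# gather() only returns the list of per-process local arrays; concatenating
# them gives the values in rank order, not in the serial ordering
gathered = comm.gather(u_vertex_values, root=0)
if comm.rank == 0:
    print(np.concatenate(gathered).T)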

Thanks!

The two separate results you get above are the local data (you can find their ordering by communicating the local range, as done in the code below).

I'm not sure why you would want to order the data as in serial, as the ordering of the degrees of freedom is different (due to the mesh partitioning). However, you can obtain your desired array on a single process with:

import dolfinx
import numpy
import ufl
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank


# Parallel mesh
mesh = dolfinx.UnitSquareMesh(
    MPI.COMM_WORLD, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
V = dolfinx.FunctionSpace(mesh, ("CG", 1))


def func(x):
    return x[0] + 3 * x[1]


if rank == 0:
    # Mesh on one proc
    mesh0 = dolfinx.UnitSquareMesh(
        MPI.COMM_SELF, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
    V0 = dolfinx.FunctionSpace(mesh0, ("CG", 1))
    x0 = V0.tabulate_dof_coordinates()
    u0 = dolfinx.Function(V0)
    u0.interpolate(func)
    # Reference solution
    print(f"Serial interpolation {u0.vector.array}")

# Parallel interpolation
u = dolfinx.Function(V)
u.interpolate(func)
dolfinx.cpp.la.scatter_forward(u.x)

# Get local ranges and global size of array
imap = u.function_space.dofmap.index_map
local_range = numpy.asarray(imap.local_range, dtype=numpy.int32) * \
    u.function_space.dofmap.index_map_bs
size_global = imap.size_global * u.function_space.dofmap.index_map_bs

# Communicate ranges and local data
ranges = comm.gather(local_range, root=0)
data = comm.gather(u.vector.array, root=0)

# Communicate local dof coordinates
x = V.tabulate_dof_coordinates()[:imap.size_local]
x_glob = comm.gather(x, root=0)

if comm.rank == 0:
    # Create array with all values (received)
    global_array = numpy.zeros(size_global)
    for r, d in zip(ranges, data):
        global_array[r[0]:r[1]] = d

    # Create array with all coordinates
    global_x = numpy.zeros((size_global, 3))
    for r, x_ in zip(ranges, x_glob):
        global_x[r[0]:r[1], :] = x_

    serial_to_global = []
    for coord in x0:
        serial_to_global.append(numpy.abs(global_x-coord).sum(axis=1).argmin())

    # Create array ordered as in the serial case
    u_from_global = numpy.zeros(global_array.shape)
    for serial, glob in enumerate(serial_to_global):
        u_from_global[serial] = global_array[glob]
    print(f"Gathered from parallel: {u_from_global}")
    assert(numpy.allclose(u_from_global, u0.vector.array))

Note, however, that the serial mesh/function space has to be defined on rank 0 to obtain that mapping.


Thank you, Dokken! Basically, I am doing research on topology optimization, so I need the data in the serial ordering to update the design variables with some optimization algorithms.

This code works very well for scalar field problems, but how should I modify it so that it is applicable to vector field problems (such as the displacement vector in an elasticity problem)?

Then you need to use the fact that tabulate_dof_coordinates is blocked. That means that for a 3D vector space the first coordinate corresponds to dofs 0, 1 and 2. You could also simply use subspaces and do this per component.
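
Here is a rough, untested sketch of that blocked approach for a 2D vector space, adapting the scalar example above (the vector expression in func is just a placeholder): tabulate_dof_coordinates returns one coordinate per block (node), so the coordinate ranges are gathered per block while the value ranges are gathered per dof, and all bs components of a node are copied at once.

import dolfinx
import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(
    MPI.COMM_WORLD, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
V = dolfinx.VectorFunctionSpace(mesh, ("CG", 1))


def func(x):
    # Placeholder vector-valued expression
    return numpy.vstack((x[0] + 3 * x[1], 2 * x[0] - x[1]))


u = dolfinx.Function(V)
u.interpolate(func)

imap = V.dofmap.index_map
bs = V.dofmap.index_map_bs  # block size: 2 here, 3 for a 3D displacement

# Per-block ranges for the coordinates, per-dof ranges for the values
block_range = numpy.asarray(imap.local_range, dtype=numpy.int32)
dof_range = block_range * bs
x = V.tabulate_dof_coordinates()[:imap.size_local]

block_ranges = comm.gather(block_range, root=0)
dof_ranges = comm.gather(dof_range, root=0)
coords = comm.gather(x, root=0)
data = comm.gather(u.vector.array, root=0)

if comm.rank == 0:
    # Serial reference, needed to define the target ordering
    mesh0 = dolfinx.UnitSquareMesh(
        MPI.COMM_SELF, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
    V0 = dolfinx.VectorFunctionSpace(mesh0, ("CG", 1))
    u0 = dolfinx.Function(V0)
    u0.interpolate(func)
    x0 = V0.tabulate_dof_coordinates()

    global_array = numpy.zeros(imap.size_global * bs)
    global_x = numpy.zeros((imap.size_global, 3))
    for r, d in zip(dof_ranges, data):
        global_array[r[0]:r[1]] = d
    for r, c in zip(block_ranges, coords):
        global_x[r[0]:r[1], :] = c

    # Match the node coordinates, then copy all bs components of each node
    u_from_global = numpy.zeros(imap.size_global * bs)
    for sb, coord in enumerate(x0):
        gb = numpy.abs(global_x - coord).sum(axis=1).argmin()
        u_from_global[sb * bs:(sb + 1) * bs] = global_array[gb * bs:(gb + 1) * bs]
    print(f"Gathered from parallel: {u_from_global}")
    assert numpy.allclose(u_from_global, u0.vector.array)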


Hi dokken,

I've got a similar issue.

Here is some sample code:

dV = FunctionSpace(mesh, "DG", 0)
A = Function(dV)

I am running a time-dependent problem, so I need the value of A.vector()[2] for the next time step. I can directly print out the [2] value in serial, but as you can see, if I run the code on 2 processors, the ordering of the dofs is different. Is there a simple way to find the value at the same index ([2] in serial) when I run the code in parallel?

Thanks!

I don’t really want to recommend trying to get the serial ordering from the parallel data.
However, here is a minimal working example (for spaces with block size 1, i.e. not vector function spaces):

import dolfinx
from mpi4py import MPI
import numpy as np


def f(x):
    return x[0] + 3 * x[1]


N = 4
P = 3
mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, N, N)

# Original (input) cell index of each local cell
pci = mesh.topology.original_cell_index
V = dolfinx.fem.FunctionSpace(mesh, ("Lagrange", P))
u = dolfinx.fem.Function(V)
u.interpolate(f)
# Flattened cell-to-dof connectivity on this process
dofmap = V.dofmap.list.array

root = 0
# Gather the original cell indices, dofmaps and dof values on the root process
ci = mesh.comm.gather(pci, root=root)
dofmaps = mesh.comm.gather(dofmap, root=root)
x = mesh.comm.gather(u.x.array, root=root)
if MPI.COMM_WORLD.rank == root:
    # Serial reference mesh and map from original cell index to serial cell index
    serial_mesh = dolfinx.mesh.create_unit_square(MPI.COMM_SELF, N, N)
    inverse_map = np.argsort(serial_mesh.topology.original_cell_index)
    Vs = dolfinx.fem.FunctionSpace(serial_mesh, ("Lagrange", P))
    us = dolfinx.fem.Function(Vs)
    us.interpolate(f)
    us_array = us.x.array

    dofmap_s = Vs.dofmap.list
    num_dofs = len(Vs.dofmap.list.links(0))
    # For each cell from each process, find the corresponding serial cell and
    # compare the parallel dof values against the serial ones
    for rank, (dofm, cell_indices, x_p) in enumerate(zip(dofmaps, ci, x)):
        print(f"DATA from rank {rank}:")
        for i, cell in enumerate(cell_indices):
            serial_cell = inverse_map[cell]
            serial_dofs = dofmap_s.links(serial_cell)
            parallel_dofs = dofm[num_dofs*i:num_dofs*(i+1)]
            for j in range(num_dofs):
                print(x_p[parallel_dofs[j]], us_array[serial_dofs[j]])
                assert np.isclose(x_p[parallel_dofs[j]],
                                  us_array[serial_dofs[j]])
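
For the DG-0 case in your question specifically, each cell carries exactly one dof, so a single serial index can be looked up through original_cell_index instead of remapping the whole array. Here is a minimal, untested sketch along those lines (the interpolated expression and serial_index are placeholders):

import dolfinx
import numpy as np
from mpi4py import MPI

N = 4
serial_index = 2  # the index used in a serial run, i.e. A.vector()[2]
root = 0

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, N, N)
V = dolfinx.fem.FunctionSpace(mesh, ("DG", 0))
A = dolfinx.fem.Function(V)
A.interpolate(lambda x: x[0] + 3 * x[1])  # placeholder data

if mesh.comm.rank == root:
    # Translate the serial dof index into an original (input) cell index,
    # using a serial mesh on the root process only
    serial_mesh = dolfinx.mesh.create_unit_square(MPI.COMM_SELF, N, N)
    Vs = dolfinx.fem.FunctionSpace(serial_mesh, ("DG", 0))
    tdim = serial_mesh.topology.dim
    num_cells_s = serial_mesh.topology.index_map(tdim).size_local
    serial_cell = next(c for c in range(num_cells_s)
                       if Vs.dofmap.cell_dofs(c)[0] == serial_index)
    original_cell = serial_mesh.topology.original_cell_index[serial_cell]
else:
    original_cell = None
original_cell = mesh.comm.bcast(original_cell, root=root)

# The process owning the cell with this original index extracts the value
num_owned = mesh.topology.index_map(mesh.topology.dim).size_local
matches = np.flatnonzero(
    mesh.topology.original_cell_index[:num_owned] == original_cell)
value = A.x.array[V.dofmap.cell_dofs(matches[0])[0]] if len(matches) > 0 else None

values = mesh.comm.gather(value, root=root)
if mesh.comm.rank == root:
    value = next(v for v in values if v is not None)
    print(f"Value at serial index {serial_index}: {value}")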

Thank you very much!