Gather solutions in parallel in FEniCSX

Basically, I want to do the same thing mentioned in Gather solution in parallel - #2 by dajuno, but in FEniCSX. That is, I want to gather the results from every processor on processor 0. Specifically, I want to obtain the same result as I would get without parallel computation.

Taking the Poisson problem from Implementation — FEniCSx tutorial (jorgensd.github.io) as an example, we can print the result with

print(u_vertex_values.T)

as

[[3.   2.62 2.63 3.01 2.28 2.29 2.32 2.66 3.04 1.98 1.99 2.02 2.07 2.37
  2.71 3.09 1.72 1.73 1.76 1.81 1.88 2.14 2.44 2.78 3.16 1.5  1.51 1.54
  1.59 1.66 1.75 1.97 2.23 2.53 2.87 3.25 1.32 1.33 1.36 1.41 1.48 1.57
  1.68 1.86 2.08 2.34 2.64 2.98 3.36 1.18 1.19 1.22 1.27 1.34 1.43 1.54
  1.67 1.81 1.99 2.21 2.47 2.77 3.11 3.49 1.08 1.09 1.12 1.17 1.24 1.33
  1.44 1.57 1.72 1.82 1.96 2.14 2.36 2.62 2.92 3.26 3.64 1.02 1.03 1.06
  1.11 1.18 1.27 1.38 1.51 1.66 1.83 1.89 1.99 2.13 2.31 2.53 2.79 3.09
  3.43 3.81 1.   1.01 1.04 1.09 1.16 1.25 1.36 1.49 1.64 1.81 2.   2.02
  2.08 2.18 2.32 2.5  2.72 2.98 3.28 3.62 4.  ]]

However, when we compute it on 2 processors, we get two separate results:

[[2.   1.81 1.83 2.02 1.64 1.66 1.72 1.89 2.08 1.49 1.51 1.57 1.67 1.82
  1.99 2.18 1.36 1.38 1.44 1.54 1.68 1.81 1.96 2.13 2.32 1.25 1.27 1.33
  1.43 1.57 1.16 1.18 1.24 1.34 1.48 1.09 1.11 1.17 1.27 1.41 1.04 1.06
  1.12 1.22 1.36 1.54 1.01 1.03 1.09 1.19 1.33 1.   1.02 1.08 1.18 1.32
  2.5  2.31 2.14 1.99 1.86 1.75 1.66 1.59 1.51 1.5  1.72 1.73 1.76 1.81
  1.88 1.97 2.08 2.21 2.36 2.53 2.72]]

[[2.5  2.53 2.72 2.31 2.36 2.62 2.79 2.98 2.14 2.21 2.47 2.77 2.92 3.09
  3.28 1.99 2.08 2.34 2.64 2.98 3.11 3.26 3.43 3.62 1.86 1.97 2.23 2.53
  2.87 3.25 3.36 3.49 3.64 3.81 4.   1.75 1.88 2.14 2.44 2.78 3.16 1.66
  1.81 2.07 2.37 2.71 3.09 1.59 1.76 2.02 2.32 2.66 3.04 1.73 1.99 2.29
  2.63 3.01 1.51 1.72 1.98 2.28 2.62 3.   1.5  1.54 1.32 1.33 1.36 1.41
  1.48 1.57 1.68 1.81 1.96 2.13 2.32]]

Even if we use MPI.COMM_WORLD.gather(), we just obtain a simple concatenation of these two results. So is it possible to recover the result we would get without parallel computation?
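
Roughly, what I tried is the following (just a sketch; u_vertex_values is the array from the tutorial script, assumed to be available on every process):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# gather() only returns the list of per-process local arrays; concatenating
# them gives the values in rank order, not in the serial ordering
gathered = comm.gather(u_vertex_values, root=0)
if comm.rank == 0:
    print(np.concatenate(gathered).T)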

Thanks!

The two separate results you get above are the local data (you can find their ordering by communicating the local range, as done in the code below).

I'm not sure why you would want to order the data as in serial, as the ordering of the degrees of freedom is different (due to the mesh partitioning). However, you can obtain your desired array on a single process with:

import dolfinx
import numpy
import ufl
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank


# Parallel mesh
mesh = dolfinx.UnitSquareMesh(
    MPI.COMM_WORLD, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
V = dolfinx.FunctionSpace(mesh, ("CG", 1))


def func(x):
    return x[0] + 3 * x[1]


if rank == 0:
    # Mesh on one proc
    mesh0 = dolfinx.UnitSquareMesh(
        MPI.COMM_SELF, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
    V0 = dolfinx.FunctionSpace(mesh0, ("CG", 1))
    x0 = V0.tabulate_dof_coordinates()
    u0 = dolfinx.Function(V0)
    u0.interpolate(func)
    # Reference solution
    print(f"Serial interpolation {u0.vector.array}")

# Parallel interpolation
u = dolfinx.Function(V)
u.interpolate(func)
dolfinx.cpp.la.scatter_forward(u.x)

# Get local ranges and global size of array
imap = u.function_space.dofmap.index_map
local_range = numpy.asarray(imap.local_range, dtype=numpy.int32) * \
    u.function_space.dofmap.index_map_bs
size_global = imap.size_global * u.function_space.dofmap.index_map_bs

# Communicate ranges and local data
ranges = comm.gather(local_range, root=0)
data = comm.gather(u.vector.array, root=0)

# Communicate local dof coordinates
x = V.tabulate_dof_coordinates()[:imap.size_local]
x_glob = comm.gather(x, root=0)

if comm.rank == 0:
    # Create array with all values (received)
    global_array = numpy.zeros(size_global)
    for r, d in zip(ranges, data):
        global_array[r[0]:r[1]] = d

    # Create array with all coordinates
    global_x = numpy.zeros((size_global, 3))
    for r, x_ in zip(ranges, x_glob):
        global_x[r[0]:r[1], :] = x_

    serial_to_global = []
    for coord in x0:
        serial_to_global.append(numpy.abs(global_x-coord).sum(axis=1).argmin())

    # Create array ordered as in the serial case
    u_from_global = numpy.zeros(global_array.shape)
    for serial, glob in enumerate(serial_to_global):
        u_from_global[serial] = global_array[glob]
    print(f"Gathered from parallel: {u_from_global}")
    assert(numpy.allclose(u_from_global, u0.vector.array))

Note, however, that the serial mesh/function space has to be defined on rank 0 to obtain that mapping.


Thank you, Dokken! Basically, I am doing research on topology optimization, so I need the data in the serial ordering to update the design variables with some optimization algorithms.

This code works very well for scalar field problems, but how should I modify it so that it is applicable to vector field problems (such as the displacement vector in an elasticity problem)?

Then you need to use the fact that tabulate_dof_coordinates is blocked. That means that for a 3D vector space the first coordinate corresponds to dofs 0, 1 and 2. You could also simply use subspaces and do this per component.
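
Here is a rough, untested sketch of that blocked approach for a 2D vector space, adapting the scalar example above (the vector expression in func is just a placeholder): tabulate_dof_coordinates returns one coordinate per block (node), so the coordinate ranges are gathered per block while the value ranges are gathered per dof, and all bs components of a node are copied at once.

import dolfinx
import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD

mesh = dolfinx.UnitSquareMesh(
    MPI.COMM_WORLD, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
V = dolfinx.VectorFunctionSpace(mesh, ("CG", 1))


def func(x):
    # Placeholder vector-valued expression
    return numpy.vstack((x[0] + 3 * x[1], 2 * x[0] - x[1]))


u = dolfinx.Function(V)
u.interpolate(func)

imap = V.dofmap.index_map
bs = V.dofmap.index_map_bs  # block size: 2 here, 3 for a 3D displacement

# Per-block ranges for the coordinates, per-dof ranges for the values
block_range = numpy.asarray(imap.local_range, dtype=numpy.int32)
dof_range = block_range * bs
x = V.tabulate_dof_coordinates()[:imap.size_local]

block_ranges = comm.gather(block_range, root=0)
dof_ranges = comm.gather(dof_range, root=0)
coords = comm.gather(x, root=0)
data = comm.gather(u.vector.array, root=0)

if comm.rank == 0:
    # Serial reference, needed to define the target ordering
    mesh0 = dolfinx.UnitSquareMesh(
        MPI.COMM_SELF, 3, 2, dolfinx.cpp.mesh.CellType.quadrilateral)
    V0 = dolfinx.VectorFunctionSpace(mesh0, ("CG", 1))
    u0 = dolfinx.Function(V0)
    u0.interpolate(func)
    x0 = V0.tabulate_dof_coordinates()

    global_array = numpy.zeros(imap.size_global * bs)
    global_x = numpy.zeros((imap.size_global, 3))
    for r, d in zip(dof_ranges, data):
        global_array[r[0]:r[1]] = d
    for r, c in zip(block_ranges, coords):
        global_x[r[0]:r[1], :] = c

    # Match the node coordinates, then copy all bs components of each node
    u_from_global = numpy.zeros(imap.size_global * bs)
    for sb, coord in enumerate(x0):
        gb = numpy.abs(global_x - coord).sum(axis=1).argmin()
        u_from_global[sb * bs:(sb + 1) * bs] = global_array[gb * bs:(gb + 1) * bs]
    print(f"Gathered from parallel: {u_from_global}")
    assert numpy.allclose(u_from_global, u0.vector.array)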


Hi dokken,

I've got a similar issue.

Here is some sample code:

dV = FunctionSpace(mesh, "DG", 0)
A = Function(dV)

I am running a time-dependent problem, so I need the value of A.vector()[2] for the next time step. I can directly print out the [2] value in serial, but as you can see, if I run the code on 2 processors, the ordering of the dofs is different. Is there a simple way to find the value at the same index ([2] in serial) when I run the code in parallel?

Thanks!

I don’t really want to recommend trying to get the serial ordering from the parallel data.
However, here is a minimal working example (for spaces with block size 1, i.e. not vector function spaces):

import dolfinx
from mpi4py import MPI
import numpy as np


def f(x):
    return x[0] + 3 * x[1]


N = 4
P = 3
mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, N, N)

# Original (input) cell index of each local cell
pci = mesh.topology.original_cell_index
V = dolfinx.fem.FunctionSpace(mesh, ("Lagrange", P))
u = dolfinx.fem.Function(V)
u.interpolate(f)
# Flattened cell-to-dof connectivity on this process
dofmap = V.dofmap.list.array

root = 0
# Gather the original cell indices, dofmaps and dof values on the root process
ci = mesh.comm.gather(pci, root=root)
dofmaps = mesh.comm.gather(dofmap, root=root)
x = mesh.comm.gather(u.x.array, root=root)
if MPI.COMM_WORLD.rank == root:
    # Serial reference mesh and map from original cell index to serial cell index
    serial_mesh = dolfinx.mesh.create_unit_square(MPI.COMM_SELF, N, N)
    inverse_map = np.argsort(serial_mesh.topology.original_cell_index)
    Vs = dolfinx.fem.FunctionSpace(serial_mesh, ("Lagrange", P))
    us = dolfinx.fem.Function(Vs)
    us.interpolate(f)
    us_array = us.x.array

    dofmap_s = Vs.dofmap.list
    num_dofs = len(Vs.dofmap.list.links(0))
    # For each cell from each process, find the corresponding serial cell and
    # compare the parallel dof values against the serial ones
    for rank, (dofm, cell_indices, x_p) in enumerate(zip(dofmaps, ci, x)):
        print(f"DATA from rank {rank}:")
        for i, cell in enumerate(cell_indices):
            serial_cell = inverse_map[cell]
            serial_dofs = dofmap_s.links(serial_cell)
            parallel_dofs = dofm[num_dofs*i:num_dofs*(i+1)]
            for j in range(num_dofs):
                print(x_p[parallel_dofs[j]], us_array[serial_dofs[j]])
                assert np.isclose(x_p[parallel_dofs[j]],
                                  us_array[serial_dofs[j]])
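
For the DG-0 case in your question specifically, each cell carries exactly one dof, so a single serial index can be looked up through original_cell_index instead of remapping the whole array. Here is a minimal, untested sketch along those lines (the interpolated expression and serial_index are placeholders):

import dolfinx
import numpy as np
from mpi4py import MPI

N = 4
serial_index = 2  # the index used in a serial run, i.e. A.vector()[2]
root = 0

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, N, N)
V = dolfinx.fem.FunctionSpace(mesh, ("DG", 0))
A = dolfinx.fem.Function(V)
A.interpolate(lambda x: x[0] + 3 * x[1])  # placeholder data

if mesh.comm.rank == root:
    # Translate the serial dof index into an original (input) cell index,
    # using a serial mesh on the root process only
    serial_mesh = dolfinx.mesh.create_unit_square(MPI.COMM_SELF, N, N)
    Vs = dolfinx.fem.FunctionSpace(serial_mesh, ("DG", 0))
    tdim = serial_mesh.topology.dim
    num_cells_s = serial_mesh.topology.index_map(tdim).size_local
    serial_cell = next(c for c in range(num_cells_s)
                       if Vs.dofmap.cell_dofs(c)[0] == serial_index)
    original_cell = serial_mesh.topology.original_cell_index[serial_cell]
else:
    original_cell = None
original_cell = mesh.comm.bcast(original_cell, root=root)

# The process owning the cell with this original index extracts the value
num_owned = mesh.topology.index_map(mesh.topology.dim).size_local
matches = np.flatnonzero(
    mesh.topology.original_cell_index[:num_owned] == original_cell)
value = A.x.array[V.dofmap.cell_dofs(matches[0])[0]] if len(matches) > 0 else None

values = mesh.comm.gather(value, root=root)
if mesh.comm.rank == root:
    value = next(v for v in values if v is not None)
    print(f"Value at serial index {serial_index}: {value}")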

Thank you very much!