Distributing from processor 0 to multiple processors with Dolfinx

Hello,

I have seen the reply on how to gather solutions from every processor on processor 0 in the thread Gather solutions in parallel in FEniCSX.

In the code below I would like to do the opposite: distribute from processor 0 to multiple processors.

I have used the linear elasticity tutorial from the FEniCSx documentation. The part I added is a "density_0" field defined on processor 0, by which I multiply the bilinear form a. I want to distribute this field from processor 0 to the other processors through "density". I know that the loop marked "THIS PART IS WRONG" below is not the way to do it; I only kept it so that the code runs and gives an idea of what I am trying to achieve.

Thanks!

import numpy as np
import ufl
from dolfinx import fem, mesh
from dolfinx.fem.petsc import LinearProblem
from mpi4py import MPI
from petsc4py.PETSc import ScalarType

############################################################################
# INITIALIZATION
############################################################################
# initialization for parallel computing 
comm = MPI.COMM_WORLD
num_used_processors = comm.Get_size()
current_processor = comm.Get_rank()
############################################################################
#FINITE ELEMENT SET UP W/ FENICSX
############################################################################
# Dimensions
nelx = 20
nely = 10
nelz = 10
# Properties (scaled variables, as in the linear elasticity tutorial)
L = nelx  # beam length, assumed equal to the box length
W = nely  # beam width, assumed equal to the box width
mu = 1
rho = 1
delta = W/L
gamma = 0.4*delta**2
beta = 1.25
lambda_ = beta
g = gamma
# creation of the mesh
domain = mesh.create_box(
    MPI.COMM_WORLD,
    points=((0, 0, 0), (nelx, nely, nelz)),
    n=(nelx, nely, nelz),
    cell_type=mesh.CellType.hexahedron)
V = fem.VectorFunctionSpace(domain, ("CG", 1))
D = fem.FunctionSpace(domain, ("DG", 0))
density = fem.Function(D)
#Boundary conditions
def clamped_boundary(x):
    return np.isclose(x[0], 0)

fdim = domain.topology.dim - 1
boundary_facets = mesh.locate_entities_boundary(domain, fdim, clamped_boundary)

u_D = np.array([0,0,0], dtype=ScalarType)
bc = fem.dirichletbc(u_D, fem.locate_dofs_topological(V, fdim, boundary_facets), V)

T = fem.Constant(domain, ScalarType((0, 0, 0)))
ds = ufl.Measure("ds", domain=domain)

# serial mesh/function space on processor 0 only
if current_processor == 0:
    domain0 = mesh.create_box(
        MPI.COMM_SELF,  # COMM_SELF so the mesh lives on processor 0 only
        points=((0, 0, 0), (nelx, nely, nelz)),
        n=(nelx, nely, nelz),
        cell_type=mesh.CellType.hexahedron)
    D0 = fem.FunctionSpace(domain0, ("DG", 0))
    density_0 = fem.Function(D0)
    # Generate a random density field on the serial mesh
    density_0.x.array[:] = np.random.rand(len(density_0.x.array))

# Assign the density field from a single processor to multiple processors
# (THIS PART IS WRONG: density_0 only exists on processor 0 and the cell
# ordering/size differs between the serial and the distributed mesh)
if current_processor == 0:
    density.x.array[:] = density_0.x.array[:len(density.x.array)]


def epsilon(u):
    return ufl.sym(ufl.grad(u)) # Equivalent to 0.5*(ufl.nabla_grad(u) + ufl.nabla_grad(u).T)
def sigma(u):
    return lambda_ * ufl.nabla_div(u) * ufl.Identity(len(u)) + 2*mu*epsilon(u)

u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
f = fem.Constant(domain, ScalarType((0, 0, -rho*g)))
a = ufl.inner(density*sigma(u), epsilon(v)) * ufl.dx
L = ufl.dot(f, v) * ufl.dx + ufl.dot(T, v) * ds
problem = LinearProblem(a, L, bcs=[bc], petsc_options={"ksp_type": "preonly", "pc_type": "lu"})
uh = problem.solve()

Why do you want to do this? It is a highly inefficient way of assigning data. Could you provide the context where this is actually needed?

The post gives you the "reverse map" of what you want, so you can use it to create the inverse communicator and do the scattering.
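
For concreteness, here is a minimal sketch of that scattering (my own illustration, not the code from the linked post). It assumes the serial numpy vector on processor 0 is ordered by the original (input) cell numbering of the same create_box mesh, and it reuses comm, domain, density and density_0 from the code above:

# Original (serial) cell index of each locally owned cell -- the "reverse map"
tdim = domain.topology.dim
num_local = domain.topology.index_map(tdim).size_local
original = domain.topology.original_cell_index[:num_local]

# Processor 0 collects the serial indices every rank needs ...
all_indices = comm.gather(original, root=0)

if comm.rank == 0:
    serial_density = density_0.x.array  # the serial numpy vector
    send = [serial_density[idx] for idx in all_indices]
else:
    send = None

# ... and scatters each rank its own chunk, already in local cell order
local_values = comm.scatter(send, root=0)

# A DG-0 dof lives on the cell with the same local index, so assign directly
density.x.array[:num_local] = local_values
density.x.scatter_forward()

The pickled gather/scatter is only for readability; for large meshes one would precompute counts and displacements and use comm.Scatterv on a flat array instead.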

Hi Dokken,

Thanks a lot for the fast reply! We are doing topology optimization using the level set method. The densities are received on a single processor as a numpy vector from an external C++ level set optimization code that is wrapped in Python. I don't have any experience with parallelization and was wondering whether there is a standard way to do this scattering of the numpy vector to multiple processors in dolfinx?

As you are not familiar with MPI, I would suggest you use my checkpointing implementation:

It can read and write data with different MPI communicators, for instance MPI.COMM_SELF and MPI.COMM_WORLD; see the tests.
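
If it helps to see the two-communicator pattern with plain dolfinx first, a rough sketch is below: the built-in XDMFFile can already write a mesh on MPI.COMM_SELF and read it back distributed over MPI.COMM_WORLD. Note that this only covers the mesh; checkpointing the function data on top of it is what the checkpointing implementation above provides.

from mpi4py import MPI
from dolfinx import mesh
from dolfinx.io import XDMFFile

comm = MPI.COMM_WORLD

# Processor 0 builds and writes a mesh that it owns alone (COMM_SELF) ...
if comm.rank == 0:
    serial_mesh = mesh.create_box(
        MPI.COMM_SELF,
        points=((0, 0, 0), (20, 10, 10)),
        n=(20, 10, 10),
        cell_type=mesh.CellType.hexahedron)
    with XDMFFile(MPI.COMM_SELF, "serial_mesh.xdmf", "w") as xdmf:
        xdmf.write_mesh(serial_mesh)
comm.Barrier()

# ... and all processors read it back, distributed over COMM_WORLD
with XDMFFile(comm, "serial_mesh.xdmf", "r") as xdmf:
    distributed_mesh = xdmf.read_mesh(name="mesh")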

Thanks a lot! I will look into it.