How to manually assemble a loading vector element-wise in parallel?

Hello,

I am trying to write a custom element-wise assembly routine for the loading vector that runs in parallel, and I am having trouble combining the contributions from the individual MPI ranks. The idea is to do an element-wise assembly over the local elements on each rank and then sum the values at the dofs that are shared between ranks.

Is there a built-in FEniCS method to assemble the vector this way, or do I have to use mpi4py to perform the MPI communication manually? Also, is there a way to do the same for the stiffness matrix?

Below is an MWE of the problem, meant to be run with 2 MPI processes.

import numpy as np
import fenics as fe

# Init mesh and function space
mesh = fe.IntervalMesh(4,0,1)
rank = fe.MPI.rank(fe.MPI.comm_world)
V = fe.FunctionSpace(mesh, "Lagrange", 1)
dm = V.dofmap()
w = fe.TestFunction(V)

# Running with 2 processors:
# rank 0 owns global indices [0,1]
# rank 1 owns global indices [2,3,4]
# where dof index 2 is shared between rank 0 and rank 1
# Init vector f with room for the owned dofs plus the unowned (ghost) dofs on this MPI rank:
f = np.zeros( len(dm.dofs()) + len(dm.local_to_global_unowned()) )

# Manually assemble vector f in an element-wise manner on local mpi rank:
for cell in fe.cells(mesh):
    cell_idx = dm.cell_dofs(cell.index())  # local dof indices of this cell
    f[cell_idx] += [1, 1]  # test assembly
# rank 0: f = [1,2,1]
# rank 1: f = [1,2,1]

# How to do this part?
# I want to assemble across all ranks by adding the values on the shared dof nodes
# and sending the sums back to the appropriate ranks.
# NOTE: assemble_across_ranks is a placeholder for the missing step, not an existing function.
f_assembled = assemble_across_ranks(f)

# Add f_assembled back to loading vector b
b = fe.assemble(fe.Constant(0)*w*fe.dx)
b.add_local(f_assembled)

# Running with 2 processors, I am expecting:
# rank 0: b.get_local() = [1,2] 
# rank 1: b.get_local() = [2,2,1]
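
To make the reduction I am after concrete, here is a minimal serial NumPy sketch that mimics the two ranks above without any MPI or FEniCS calls. The function name `sum_shared_contributions` and the hard-coded local-to-global maps are my own illustrative assumptions (they are not FEniCS API), and I am assuming the owned dofs come first in each rank's local numbering, followed by the unowned ones.

```python
import numpy as np


def sum_shared_contributions(local_vectors, local_to_global, n_owned, n_global):
    """Sum per-rank local vectors into one global vector, then return,
    for each rank, only the entries that rank owns.

    Hypothetical helper for illustration; it emulates in serial what the
    parallel communication step would have to achieve.
    """
    global_sum = np.zeros(n_global)
    for f_local, l2g in zip(local_vectors, local_to_global):
        # np.add.at accumulates correctly even if an index appears more than once
        np.add.at(global_sum, l2g, f_local)
    # Owned dofs are assumed to be the first n entries of each local numbering
    return [global_sum[l2g[:n]] for l2g, n in zip(local_to_global, n_owned)]


# Emulated data for the 4-cell interval mesh split over 2 ranks (assumed layout)
local_to_global = [np.array([0, 1, 2]),   # rank 0: owns 0,1; dof 2 is unowned
                   np.array([2, 3, 4])]   # rank 1: owns 2,3,4
n_owned = [2, 3]
local_vectors = [np.array([1.0, 2.0, 1.0]),  # f on rank 0 after the cell loop
                 np.array([1.0, 2.0, 1.0])]  # f on rank 1 after the cell loop

owned_values = sum_shared_contributions(local_vectors, local_to_global, n_owned, 5)
for r, values in enumerate(owned_values):
    print(f"rank {r}: {values.tolist()}")
# rank 0: [1.0, 2.0]
# rank 1: [2.0, 2.0, 1.0]
```

In the real parallel run each rank only has its own `f`, so the loop over ranks would have to be replaced by actual communication of the unowned entries to their owning rank, which is exactly the part I do not know how to do.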

Thank you very much,
Chayut