Is it possible to assemble in parallel without partitioning?

Hello, I am curious whether it is possible to parallelize assembly/solving (using the PETSc backend) in FEniCS without partitioning the mesh. The tl;dr is that I am trying to parallelize a mixed-domain system, but there is a bug in the assembly code when the mesh is partitioned - I have tried to tackle this, but I'm realizing it is a bit beyond my depth.

The workaround I was thinking of is this: since the problem is already compartmentalized by the mixed domains (i.e. it has a block Jacobian and a block residual vector), and each block is constructed one at a time, it seems feasible to send the assembly of different blocks to different processors.

This is kind of what I am thinking of (slightly pseudo-code) but I’m not sure how to actually implement it:

import dolfin as d
from petsc4py import PETSc

# I believe I need to specify the MPI communicator here to ensure it doesn't partition?
# Building the mesh on comm_self (rather than the default comm_world) should give every rank its own full copy.
mesh = d.Mesh(d.MPI.comm_self)
with d.HDF5File(mesh.mpi_comm(), 'my_mesh.h5', 'r') as f:
    f.read(mesh, '/mesh', False)        # '/mesh' = whatever the dataset is named in the file

V = d.FunctionSpace(mesh, ...)
u = d.Function(V)                       # if the mesh is partitioned then this will also be partitioned
v = d.TestFunction(V)
F = u*v*d.dx + ...                      # my monolithic residual form
J = d.derivative(F, u)                  # Jacobian

Fblocks = get_blocks(F)                 # (pseudo-code) residual form separated by domain (j)
Jblocks = get_blocks(J)                 # (pseudo-code) Jacobian separated by F domain (i), u domain (j), and domain of integration (k); Jij is the sum of Jijk over all k

# =============== Serial version of mixed assembly + SNES code ===============
pu = d.as_backend_type(u.vector()).vec()  # petsc4py Vec for u
pF = pu.duplicate()                       # petsc4py Vec for F
pJ = PETSc.Mat().createNest(...)          # petsc4py nest matrix for J

class SNESProblem:
    # callback signatures expected by petsc4py's SNES
    def assemble_F(self, snes, x, F_vec):
        for Fj in Fblocks:
            d.assemble_mixed(Fj, tensor=d.PETScVector(F_vec))
    def assemble_J(self, snes, x, J_mat, P_mat):
        for Jijk in Jblocks:
            d.assemble_mixed(Jijk, tensor=d.PETScMatrix(J_mat))

problem = SNESProblem()
snes = PETSc.SNES().create(mesh.mpi_comm())
snes.setFunction(problem.assemble_F, pF)
snes.setJacobian(problem.assemble_J, pJ)
snes.solve(None, pu)

# =============== Parallel version of mixed assembly + SNES code??? ===============
rank = d.MPI.comm_world.rank              # each rank handles only its own block(s)

class SNESProblem:
    def assemble_F(self, snes, x, F_vec):
        d.assemble_mixed(Fblocks[rank], tensor=d.PETScVector(F_vec))
    def assemble_J(self, snes, x, J_mat, P_mat):
        for Jrk in Jblocks[rank]:
            d.assemble_mixed(Jrk, tensor=d.PETScMatrix(J_mat))

problem = SNESProblem()
snes = PETSc.SNES().create(d.MPI.comm_world)
snes.setFunction(problem.assemble_F, pF)
snes.setJacobian(problem.assemble_J, pJ)
snes.solve(None, pu)
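
For the nest objects themselves, I'm assuming they would be built roughly like this with petsc4py's createNest (J00, J01, ... and F0, F1 are hypothetical per-block sub-matrices/sub-vectors coming out of the per-block assembly above - just my guess at how the pieces fit together):

# Toy 2x2 illustration; J00 etc. are hypothetical pre-assembled blocks
pJ = PETSc.Mat().createNest([[J00, J01],
                             [J10, J11]])
pF = PETSc.Vec().createNest([F0, F1])   # matching nest residual vector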

Does something like this make sense, or would the cost of communicating outweigh any benefits? I have a lot to learn regarding MPI, but I've read that you can use one-sided communication to create effective shared memory - could each of the other CPUs use this to access the assembly instructions? Or is it possible to generate all the FFC files on the root process and then tell each processor to use a specific one? It seems like parallelizing over block matrices could easily lead to imbalanced loads and inefficient memory sharing, but I'm guessing something like this would still be better than serial on large problems.
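
To make the one-sided/shared-memory idea concrete, this is the kind of thing I had in mind with mpi4py's shared-memory windows (just a toy sketch, not tied to FEniCS - the array stands in for whatever shared data the other ranks would read):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)  # ranks on the same node
itemsize = MPI.DOUBLE.Get_size()
n = 1000                                                # toy array size
size = n * itemsize if comm.rank == 0 else 0            # rank 0 owns the allocation
win = MPI.Win.Allocate_shared(size, itemsize, comm=comm)
buf, itemsize = win.Shared_query(0)                     # every rank maps rank 0's memory
shared = np.ndarray(buffer=buf, dtype='d', shape=(n,))
if comm.rank == 0:
    shared[:] = 1.0                                     # e.g. write the shared data once
comm.Barrier()
# all ranks can now read the same array without explicit sends/receives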

I am curious whether it is common to use mesh-partitioning-free parallel assembly like this, or whether it is just too inefficient. Once the assembly is complete, it seems like SNES should be able to solve in parallel. If anyone has experience with this or could point me to an example, I'd really appreciate it - thank you!

BTW: I wanted to keep this example to a minimum, so I replaced some calls with pseudo-code, but I'm happy to elaborate. I have working code for solving mixed-dimensional non-linear problems using SNES, but there are a lot of details that might be unnecessary.