# Parallelization with multiple solves

Dear all,

I am trying to solve an optimization problem of the type:

$$\min_{a}\; J(a, u_1, u_2, \ldots, u_N) = \sum_{i=1}^{N} J_i(a, u_i)$$

where $a$ is a field and each $J_i$ comes from solving an independent PDE for $u_i$.
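To make the separable structure concrete, here is a toy scalar stand-in (pure Python, no FEniCS; the "state solves" and objectives are hypothetical placeholders): because each $J_i$ depends on the shared control only through its own state, both the total objective and its gradient are plain sums over the independent terms, which is exactly what makes solving the $N$ PDEs in parallel attractive.

```python
# Toy stand-in for the separable problem J(a) = sum_i J_i(a, u_i).
# u_i and J_i below are hypothetical placeholders, not the actual PDE problem.

def u_i(a, i):
    # stand-in "state solve": u_i is fully determined by the control a
    return (i + 1) * a

def J_i(a, i):
    # per-term objective, quadratic in the state
    u = u_i(a, i)
    return 0.5 * (u - i) ** 2

def dJ_i(a, i):
    # analytic gradient of J_i w.r.t. a via the chain rule
    u = u_i(a, i)
    return (u - i) * (i + 1)

a, N = 0.7, 4
total = sum(J_i(a, i) for i in range(N))   # the summed functional
grad = sum(dJ_i(a, i) for i in range(N))   # gradient = sum of per-term gradients

# sanity check against a central finite difference of the summed functional
h = 1e-6
fd = (sum(J_i(a + h, i) for i in range(N))
      - sum(J_i(a - h, i) for i in range(N))) / (2 * h)
print(abs(grad - fd) < 1e-6)  # → True
```

The independent terms could in principle be evaluated on separate processes and the partial results summed afterwards; the question below is how to do that so the dolfin-adjoint tape still sees all of them.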

The ideal approach would be to solve these PDEs in parallel, but I am having trouble making dolfin-adjoint generate the tape correctly.

Here is my attempt at producing a minimal working example:

""" minimal working example: figuring out parallelism in dolfin-adjoint """

from fenics import *

comm = MPI.comm_world

mesh = UnitSquareMesh(MPI.comm_self, 64, 64)
V = FunctionSpace(mesh, 'CG', 1)
v = TestFunction(V)
bc = DirichletBC(V, 0.0, "on_boundary")
d = Expression("2*pow(pi, i)*sin(pi*x[i])",i=0, degree=3)
f = interpolate(Expression('x[0]+x[1]', degree=1), V)

def forward(g, i):
u = Function(V)
solve(F == 0, u, bc)
d.i=i
J = assemble((0.5*inner(u - d, u - d))*dx + 1e-6*f**2*dx)
return J

if comm.size == 1:
J = forward(f, 0) + forward(f, 1)

else:
if MPI.rank(comm)==0:
Jp = forward(f, 0)
elif MPI.rank(comm)==1:
Jp = forward(f, 1)

data = comm.gather(Jp, root=0)
if MPI.rank(comm)==0:
J = sum(data)

tape = get_working_tape()
tape.visualise()

Looking at the generated tape, with one processor I get:

[tape visualisation with 1 processor]

With two processors, however, the main problem seems to be that only the contribution from one of the processors is "visible" to the tape:

[tape visualisation with 2 processors]

So how should I proceed in order to run the optimization with many calls to `solve()` in parallel?

Thank you for your consideration.