Hello,

I’d like to run my optimisation problem, which is based on the time-distributed control demo, in parallel. I modified the demo to use multiple PDEs (of the same type but with different coefficients), which are solved in a for-loop:

```
for i in range(PDEs_numb):
    solve(a[i] == L[i], u[i])
```

I’d like to run this for-loop in parallel. As I understand it, one option is MPI. I tested my code without modifications and got the following results for 30 PDEs with a single optimisation iteration allowed, restarting the Docker container each time:

- Without MPI: 86 seconds
- MPI, 2 cores: 95 seconds
- MPI, 4 cores: 105 seconds
- MPI, 10 cores: 150 seconds

So it didn’t get faster.

The runtime without MPI would be acceptable if I didn’t need to consider many more PDEs (300 or even 1000).

I have a few questions:

- From other posts I can see that it’s always being solved in parallel, even without MPI. Is that really so?
- Can MPI handle this for-loop in parallel? And if it can, how can I make it work?
- What else would you advise to speed up this optimisation in terms of MPI?
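
For reference, the kind of splitting I have in mind is a simple round-robin partition of the loop indices across MPI ranks. This is only a sketch: `rank` and `size` would come from mpi4py’s `MPI.COMM_WORLD` (`Get_rank()` / `Get_size()`), here they are plain integers, and `my_indices` is just a helper name I made up:

```python
# Sketch: assign each MPI rank a subset of the independent PDE solves.
# With mpi4py one would use rank = MPI.COMM_WORLD.Get_rank() and
# size = MPI.COMM_WORLD.Get_size(); here they are plain ints.
def my_indices(rank, size, n_pdes):
    """Round-robin partition: rank r gets indices r, r+size, r+2*size, ..."""
    return list(range(rank, n_pdes, size))

# Example: 30 PDEs split over 4 ranks
for r in range(4):
    print(r, my_indices(r, 4, 30))
```

The inner loop would then become `for i in my_indices(rank, size, PDEs_numb): solve(a[i] == L[i], u_0[i])`, and I assume the per-PDE contributions to the functional would have to be summed across ranks afterwards (e.g. with an allreduce) — but I’m not sure whether this interacts correctly with fenics_adjoint’s taping.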

Thanks in advance!

My MWE:

```
from fenics import *
from fenics_adjoint import *
from collections import OrderedDict
import numpy as np
import time

set_log_active(False)

data = Expression("16*x[0]*(x[0]-1)*x[1]*(x[1]-1)*sin(pi*t)", t=0, degree=4)
nu = Constant(1e-5)
PDEs_numb = 30

# 30 samples
kappa = np.array([1.38584602, 0.74351727, 0.73387536, 0.98770648, 1.27195437,
                  0.70672424, 0.75884729, 1.20655541, 0.87772363, 1.10871134,
                  0.83404077, 0.70594961, 1.40019136, 0.89766789, 0.99397856,
                  1.00917521, 1.20975742, 1.27970173, 1.26007072, 1.19429403,
                  0.69197102, 0.77906883, 0.85317978, 0.70669097, 1.22831455,
                  0.87615114, 1.03976916, 0.83006619, 0.82888496, 1.33441591])

mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, "CG", 1)

dt = Constant(0.1)
T = 2

ctrls = OrderedDict()
t = float(dt)
while t <= T:
    ctrls[t] = Function(V)
    t += float(dt)

def solve_heat(ctrls):
    u = []
    v = []
    u_0 = []
    for i in range(PDEs_numb):
        u.append(TrialFunction(V))
        v.append(TestFunction(V))
        u_0.append(Function(V, name="solution"))
    f = Function(V, name="source")
    d = Function(V, name="data")

    F = []
    for i in range(PDEs_numb):
        F.append(((u[i] - u_0[i]) / dt * v[i]
                  + kappa[i] * nu * inner(grad(u[i]), grad(v[i]))
                  - f * v[i]) * dx)

    a = []
    L = []
    for i in range(PDEs_numb):
        a.append(lhs(F[i]))
        L.append(rhs(F[i]))

    bc = DirichletBC(V, 0, "on_boundary")

    t = float(dt)
    j = 0
    for i in range(PDEs_numb):
        j += 0.5 * float(dt) * assemble((u_0[i] - d) ** 2 * dx)
    while t <= T:
        f.assign(ctrls[t])
        data.t = t
        d.assign(interpolate(data, V))
        for i in range(PDEs_numb):
            solve(a[i] == L[i], u_0[i])
        for i in range(PDEs_numb):
            j += 0.5 * float(dt) * assemble((u_0[i] - d) ** 2 * dx)
        t += float(dt)
    return u_0, d, j

u, d, j = solve_heat(ctrls)

alpha = Constant(1e-1)
regularisation = alpha / 2 * sum([1 / dt * (fb - fa) ** 2 * dx for fb, fa in
                                  zip(list(ctrls.values())[1:], list(ctrls.values())[:-1])])
J = j + assemble(regularisation)

m = [Control(c) for c in ctrls.values()]

start_time = time.time()
rf = ReducedFunctional(J, m)
opt_ctrls = minimize(rf, options={"maxiter": 1})
total_time = time.time() - start_time

with open("Runtime_parallel_test.txt", "w") as out:
    out.write(str(total_time))
    out.write(" seconds")
```