Pyadjoint crash when using project function

Dear all,

I've been trying to do parameter optimization with pyadjoint for a fairly heavy bio-poromechanical system, and everything is fine until an error occurs in the project function. Depending on the version of the code, the error doesn't occur at the same iteration (638, 1015, 1048, 2035, …). It seems that the simpler the code, the more iterations complete before the crash.

I’m on Ubuntu 18.04.2 LTS and I’m using FEniCS and Pyadjoint on Docker.

The error is the following; it occurs at iteration 2035 (about 30 seconds of running the MWE):

File "1flu_1sol_dolfin.py", line 81, in <module>
p_P1 = project(_p, L)
File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/projection.py", line 24, in project
block = ProjectBlock(args[0], args[1], output, bcs, **sb_kwargs)
File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/projection.py", line 45, in __init__
super(ProjectBlock, self).__init__(a == L, output, bcs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/solving.py", line 76, in __init__
self._init_dependencies(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/solving.py", line 145, in _init_dependencies
self.add_dependency(c.block_variable, no_duplicates=True)
File "/usr/local/lib/python3.6/dist-packages/pyadjoint/block.py", line 50, in add_dependency
dep.will_add_as_dependency()
File "/usr/local/lib/python3.6/dist-packages/pyadjoint/block_variable.py", line 63, in will_add_as_dependency
self.save_output(overwrite=overwrite)
File "/usr/local/lib/python3.6/dist-packages/pyadjoint/tape.py", line 46, in wrapper
return function(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pyadjoint/block_variable.py", line 51, in save_output
self._checkpoint = self.output._ad_create_checkpoint()
File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/types/function.py", line 142, in _ad_create_checkpoint
deepcopy=True)
File "/usr/local/lib/python3.6/dist-packages/dolfin/function/function.py", line 526, in sub
self.cpp_object().sub(i),
RuntimeError: *** Error: Duplication of MPI communicator failed (MPI_Comm_dup

I've tried to make a minimal working example, but it's still about 80 lines long. It's the classical Terzaghi consolidation problem of poromechanics.

It's a mixed-formulation problem that is well known and benchmarked, so I'm fairly sure the error isn't a mathematical issue. Here is the code:

from dolfin import *
from dolfin_adjoint import *
from mshr import *
from ufl import nabla_div

# Terzaghi's problem
T, num_steps = 100.0, 10000                        
dt = T / (1.0*num_steps)
rho_l, mu_l, k  = 1000, 1e-2, 1.8e-15 
E, nu = 5000, 0.4
lambda_, mu =(E*nu)/((1+nu)*(1-2*nu)), E/(2*(1+nu))
Kf, Ks, poro=2.2e9, 1e10, 0.2
S=(poro/Kf)+(1-poro)/Ks

# Create mesh and define expression
mesh=RectangleMesh(Point(0.0,0.0), Point(1e-5, 1e-4),8,16,'crossed')
top =  CompiledSubDomain("near(x[1], side)", side = 1e-4)
bottom =  CompiledSubDomain("near(x[1], side)", side = 0.0)
left_right = CompiledSubDomain("(near(x[0], 0.0) )|| (near(x[0], 1e-5) )")

boundary = MeshFunction('size_t', mesh, mesh.topology().dim()-1)
boundary.set_all(1)
top.mark(boundary, 2)
ds = Measure('ds', domain = mesh, subdomain_data = boundary)

# Load on the top boundary
T = Expression(('0.0', '-100'), degree=1)  # note: reuses the name T; the final time above is only needed to compute dt

#Define Mixed Space (R2,R) -> (u,p)
V = VectorElement("CG", mesh.ufl_cell(), 2)
W = FiniteElement("CG", mesh.ufl_cell(), 1)
L = dolfin.FunctionSpace(mesh,W)
MS = dolfin.FunctionSpace(mesh, MixedElement([V,W]))

# Define boundary condition
c = Expression(("0.0"), degree=1)
bcu1 = DirichletBC(MS.sub(0).sub(0), c, left_right)  
bcu2 = DirichletBC(MS.sub(0).sub(1), c, bottom)      
bcp = DirichletBC(MS.sub(1), c, top)
bc=[bcu1,bcu2,bcp]

# Define strain
def epsilon(u):
    return 0.5*(nabla_grad(u) + nabla_grad(u).T)

# Define variational problem and initial condition
X0 = Function(MS)
B = TestFunction(MS)

e_u0 = Expression(('0.0', '0.0'), degree=1)
e_p0 = Expression('100.0', degree=1)

u0 = interpolate(e_u0, MS.sub(0).collapse())
p0 = interpolate(e_p0, MS.sub(1).collapse())

Xn = Function(MS)
assign(Xn, [u0, p0])

(u,p)=split(X0)
(u_n,p_n)=split(Xn)
(v,q)=split(B)

F = (1/dt)*nabla_div(u-u_n)*q*dx + (k/(mu_l))*dot(grad(p),grad(q))*dx
F += inner(2*mu*epsilon(u),epsilon(v))*dx + lambda_*nabla_div(u)*nabla_div(v)*dx -p*nabla_div(v)*dx- dot(T,v)*ds(2)

t = 0
for n in range(num_steps):
	print('n=',n)
	t += dt
	
	solve(F == 0, X0, bc)   
	    
	(_u,_p)=X0.split()

	# the error occurs here
	p_P1 = project(_p, L)

	assign(Xn,X0)

If the lines:

from dolfin import *
from dolfin_adjoint import *

are replaced by

from fenics import *

all 10,000 iterations complete.

Do you have any lead on how to avoid this error? Is it a known bug?

Thank you for your time

I was running into a similar problem (the run always stopped after ~400 time steps). My code worked with fenics / fenics-adjoint version 2017.2.0, but not with 2018.1.0. I just saw your post, and removing "from dolfin_adjoint import *" solved the problem; the run then reached the required 1000 time steps. I'm not sure what this means for optimization, but I'll post a general question tomorrow. Thanks for the post!

Slightly extended error:
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/variational_solver.py", line 46, in solve
**self._ad_kwargs)
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/solving.py", line 106, in __init__
self.add_dependency(bc.block_variable, no_duplicates=True)
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/block.py", line 34, in add_dependency
dep.will_add_as_dependency()
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/block_variable.py", line 61, in will_add_as_dependency
overwrite = self.output._ad_will_add_as_dependency()
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/overloaded_type.py", line 332, in _ad_will_add_as_dependency
self._ad_annotate_block()
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/overloaded_type.py", line 349, in _ad_annotate_block
block = self.block_class(*self._ad_args, **self._ad_kwargs)
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/types/dirichletbc.py", line 90, in __init__
self.collapsed_space = self.function_space.collapse()
File "/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/dolfin/function/functionspace.py", line 195, in collapse
cpp_space, dofs = self._cpp_object.collapse()
RuntimeError: *** Error: Duplication of MPI communicator failed (MPI_Comm_dup

Hi, this problem occurs because each Function.sub(deepcopy=True) call creates a new function space (which in turn uses one MPI communicator). Normally DOLFIN frees these communicators when the subfunctions are freed from memory (as would typically happen in a for-loop), but in order to track the inputs/outputs of operations, dolfin-adjoint keeps them in memory.

Now in this case, you don’t have a deepcopy=True split, but dolfin-adjoint stores the values as deepcopies internally. However, we believe the annotation should still work without these deepcopies, so I have now pushed a commit to the master branch of dolfin-adjoint that should fix the issue in your posted code.

This can be installed using

pip install git+https://bitbucket.org/dolfin-adjoint/pyadjoint.git@master
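
For anyone curious about the mechanism, here is a small illustrative sketch (not a fix): it deliberately keeps deepcopied subfunctions alive, which is essentially what the internal checkpoints were doing, and eventually exhausts the available MPI communicators. The mixed space here is just a placeholder.

from dolfin import *

mesh = UnitSquareMesh(8, 8)
V = VectorElement("CG", mesh.ufl_cell(), 2)
Q = FiniteElement("CG", mesh.ufl_cell(), 1)
W = FunctionSpace(mesh, MixedElement([V, Q]))
w = Function(W)

kept = []  # keeping references prevents DOLFIN from freeing the communicators
for i in range(5000):
    # each deepcopy sub collapses a new FunctionSpace, which duplicates
    # one MPI communicator
    kept.append(w.sub(1, deepcopy=True))
# eventually: RuntimeError: *** Error: Duplication of MPI communicator failed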

The error vanished! Many thanks!

Dear all,

I have the same problem: I need to run the same PDE (Poisson) several times, and after about 200 steps I get the following error:

Error: Duplication of MPI communicator failed (MPI_Comm_dup

The problem here is that I need the dolfin-adjoint library (from dolfin_adjoint import *), since I have to compute the gradient of the objective function. I'm using fenics / dolfin-adjoint version 2018.1.0 on a virtual Ubuntu machine. I would appreciate it if you could guide me on how to get rid of the deepcopies.

I have attached the part of code that I need to run multiple times here:

from dolfin import *
from dolfin_adjoint import *

# mesh, k and var are defined elsewhere in my code
V = FunctionSpace(mesh, "CG", 1)
W = FunctionSpace(mesh, "DG", 0)
f = interpolate(Expression("100", degree=0), W)
u = Function(V, name='State')
v = TestFunction(V)

F = (exp(k)*inner(grad(u), grad(v)) - f * v) * dx
bc = DirichletBC(V, 0.0, "on_boundary")
solve(F == 0, u, bc)

alpha = Constant(1e-4)  # var is some observation
J = assemble((0.5 * inner(u - var, u - var)) * dx + alpha / 2 * inner(grad(k), grad(k)) * dx)
control = Control(k)
Gradient = compute_gradient(J, control)

Dear @Minakari,

Please take some time to carefully format your question such that code snippets and mathematical formulae are correctly presented.

Furthermore, construct a minimal working example as described here.

The more succinct your question, the more likely it is that people will be able to offer help.

Hi @dokken,

apparently I had a similar MPI communicator issue when I was testing different optimizers for my optimization problem. What I found is that it is fine to use built-in functions such as Minimize() or MinimizationProblem(), but I got the MPI communicator issue when I calculated the gradient manually and provided it to an in-house optimizer.

So here I try a very simple MWE to show the error, just repeating the Stokes demo with dolfin_adjoint imported:

from dolfin import *
from dolfin_adjoint import *

# Load mesh and subdomains
mesh = Mesh("../dolfin_fine.xml.gz")
sub_domains = MeshFunction("size_t", mesh, "../dolfin_fine_subdomains.xml.gz")

# Define function spaces
P2 = VectorElement("Lagrange", mesh.ufl_cell(), 2)
P1 = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
TH = P2 * P1
W = FunctionSpace(mesh, TH)

# No-slip boundary condition for velocity
# x1 = 0, x1 = 1 and around the dolphin
noslip = Constant((0, 0))
bc0 = DirichletBC(W.sub(0), noslip, sub_domains, 0)

# Inflow boundary condition for velocity
# x0 = 1
inflow = Expression(("-sin(x[1]*pi)", "0.0"), degree=2)
bc1 = DirichletBC(W.sub(0), inflow, sub_domains, 1)

# Collect boundary conditions
bcs = [bc0, bc1]


def solve_stokes():
    # Define variational problem
    (u, p) = TrialFunctions(W)
    (v, q) = TestFunctions(W)
    f = Constant((0, 0))
    a = (inner(grad(u), grad(v)) - div(v)*p + q*div(u))*dx
    L = inner(f, v)*dx

    # Compute solution
    w = Function(W)
    solve(a == L, w, bcs)

    # Split the mixed solution using deepcopy
    # (needed for further computation on coefficient vector)
    (u, p) = w.split(True)

    print("Norm of velocity coefficient vector: %.15g" % u.vector().norm("l2"))
    print("Norm of pressure coefficient vector: %.15g" % p.vector().norm("l2"))

if __name__ == "__main__":
    for i in range(2000):
        print("================= i = %s =======================" % i)
        solve_stokes()

then the error reads

...
================= i = 1019 =======================
Solving linear variational problem.
Traceback (most recent call last):
  File "MPI_comm_issue.py", line 50, in <module>
    solve_stokes()
  File "MPI_comm_issue.py", line 38, in solve_stokes
    solve(a == L, w, bcs)
  File "/usr/local/lib/python3.6/dist-packages/fenics_adjoint/solving.py", line 31, in solve
    output = backend.solve(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/dolfin/fem/solving.py", line 220, in solve
    _solve_varproblem(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/dolfin/fem/solving.py", line 247, in _solve_varproblem
    solver.solve()
RuntimeError: *** Error: Duplication of MPI communicator failed (MPI_Comm_dup

I tried what @sebastkm suggested here, but installing the updated dolfin-adjoint from either Bitbucket or GitHub did not solve this MPI communicator problem.

If the snippet you wrote is placed inside a for-loop, then you will run into the same MPI communicator issue you mention. You should try to reuse your function spaces (V and W), and also your expression (or replace it with Constant(100)).

I tested this with quay.io/fenicsproject/stable:latest and the latest dolfin-adjoint, and I did not get an error even after 2000 iterations.
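
For illustration, here is a rough sketch of that restructuring applied to the earlier Poisson snippet. The mesh and the control k are hypothetical placeholders, and only the solve is shown inside the loop; if you also compute a gradient every iteration, the tape handling discussed further down the thread matters as well.

from dolfin import *
from dolfin_adjoint import *

# Build everything that does not change between runs once, outside the loop
mesh = UnitSquareMesh(32, 32)           # placeholder mesh
V = FunctionSpace(mesh, "CG", 1)
W = FunctionSpace(mesh, "DG", 0)
f = Constant(100.0)                     # instead of re-interpolating an Expression
bc = DirichletBC(V, 0.0, "on_boundary")
k = Function(W)                         # placeholder control

def solve_state(k):
    # reuses V, f and bc from the enclosing scope
    u = Function(V, name="State")
    v = TestFunction(V)
    F = (exp(k) * inner(grad(u), grad(v)) - f * v) * dx
    solve(F == 0, u, bc)
    return u

for i in range(1000):
    u = solve_state(k)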

So you mean create the Docker container from the fenics image and install (pip?) the latest dolfin-adjoint inside the container? Thanks a lot for trying it out. Apparently I was using the dolfin-adjoint image directly. I will give it a try as well.

Indeed, when I follow your approach, the MPI communicator error no longer appears. However, the latest dolfin-adjoint has an issue when I create the mesh with RectangleMesh.create():

from dolfin import *
from dolfin_adjoint import *

mesh = RectangleMesh.create([Point(0.0, 0.0), Point(1.0,1.0)], [30, 30], CellType.Type.triangle)
Q = FunctionSpace(mesh, "CG", 1)  

J = assemble(interpolate(Constant(1.0), Q)*dx)

This MWE gives the error below only when dolfin_adjoint is also imported:

Traceback (most recent call last):
  File "MPI_comm_issue_2.py", line 8, in <module>
    J = assemble(interpolate(Constant(1.0), Q)*dx)
  File "/home/fenics/.local/lib/python3.6/site-packages/fenics_adjoint/assembly.py", line 24, in assemble
    block = AssembleBlock(form)
  File "/home/fenics/.local/lib/python3.6/site-packages/fenics_adjoint/assembly.py", line 72, in __init__
    self.add_dependency(mesh)
  File "/home/fenics/.local/lib/python3.6/site-packages/pyadjoint/block.py", line 51, in add_dependency
    dep._ad_will_add_as_dependency()
AttributeError: 'dolfin.cpp.mesh.Mesh' object has no attribute '_ad_will_add_as_dependency'

You aren’t using the latest dolfin-adjoint there. Perhaps that is the latest available from PyPI. You can download the latest sources from: https://github.com/dolfin-adjoint/pyadjoint

There, RectangleMesh.create should be overloaded. Also, if you have this problem with a mesh, you can always do mesh = Mesh(mesh) to manually overload the mesh.
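
For example, applied to the MWE above, the manual overload could look like this sketch:

from dolfin import *
from dolfin_adjoint import *

mesh = RectangleMesh.create([Point(0.0, 0.0), Point(1.0, 1.0)], [30, 30],
                            CellType.Type.triangle)
mesh = Mesh(mesh)  # wrap in the overloaded Mesh so pyadjoint can track it

Q = FunctionSpace(mesh, "CG", 1)
J = assemble(interpolate(Constant(1.0), Q)*dx)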


I see, thanks a lot for your correction and the overload technique!

Here I post the bug in my code that led to the MPI communicator issue. I started the optimization without using ReducedFunctional(), and after calculating the gradient I FORGOT to clear the tape, so after enough optimization iterations the MPI communicators ran out and the error was reported.

So, if the gradient is calculated manually, adding set_working_tape(Tape()) in the gradient-calculation function solves the problem.
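
For the record, a minimal sketch of what this can look like in a manual gradient routine (the Poisson setup here is a hypothetical stand-in for the real problem):

from dolfin import *
from dolfin_adjoint import *

mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)
W = FunctionSpace(mesh, "DG", 0)
f = Constant(100.0)
bc = DirichletBC(V, 0.0, "on_boundary")

def gradient(k):
    # start from a fresh tape so blocks (and the MPI communicators they hold)
    # from previous iterations can be released
    set_working_tape(Tape())
    u = Function(V, name="State")
    v = TestFunction(V)
    solve((exp(k) * inner(grad(u), grad(v)) - f * v) * dx == 0, u, bc)
    J = assemble(0.5 * u * u * dx)
    return compute_gradient(J, Control(k))

k = Function(W)
for i in range(1000):
    g = gradient(k)   # called by the in-house optimizer in practice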


Dear Stephane,
I used your example program, but the pressure and displacement do not change. Could you please tell me where the problem is? Thank you.

Sorry for the delay, Xiaodong Zhang. I'm not sure I understand your question; the aim of this code is to compare the FE solution with the analytical solution of Terzaghi's problem, and I can assure you that the displacement and pressure are changing.
Maybe it's a matter of storage? Add this:
vtkfile_u = File('Terzaghi/u.pvd')
vtkfile_p = File('Terzaghi/p.pvd')
and inside the for-loop, after each iteration:
vtkfile_u << (_u, t)
vtkfile_p << (_p, t)
I hope I haven't completely missed something…

OK. Thank you for your reply. I will have a try.

Thank you for your information! Does this solve your problems completely?

I use ReducedFunctional() and compute the derivative, then I call set_working_tape(Tape()) in every loop. Though it helps a lot (it lets the run do many more loops), the problem still seems to appear if the run goes on long enough. The function space is not the same each time, so I define a new function space in every loop. (I use the latest dolfin-adjoint.)

Besides the tape, is there anything else that contributes to MPI communicators running out?

Dear Stephane, could you please tell me where these two equations come from?

F = (1/dt)*nabla_div(u-u_n)*q*dx + (k/(mu_l))*dot(grad(p),grad(q))*dx + (S/dt)*(p-p_n)*q*dx
F += inner(2*mu*epsilon(u),epsilon(v))*dx + lambda_*nabla_div(u)*nabla_div(v)*dx - p*nabla_div(v)*dx - dot(T,v)*ds(2)

Thank you in advance.

My pleasure! These two lines are the weak formulation of Terzaghi's consolidation problem. It is a classical problem; I'm sure you will find the strong form (the physical equations) in any poromechanics textbook.
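
For reference, here is my reading of the strong form that those two lines discretize (a sketch only, with the Biot coupling coefficient taken as 1, as the code suggests):

$$-\nabla\cdot\bigl(2\mu\,\varepsilon(\mathbf{u}) + \lambda\,(\nabla\cdot\mathbf{u})\,\mathbf{I}\bigr) + \nabla p = \mathbf{0}$$

$$S\,\frac{\partial p}{\partial t} + \frac{\partial}{\partial t}(\nabla\cdot\mathbf{u}) - \nabla\cdot\Bigl(\frac{k}{\mu_l}\,\nabla p\Bigr) = 0$$

Testing the first equation with v and the second with q, integrating by parts, applying the traction T on ds(2), and discretizing the time derivatives with backward Euler (hence the 1/dt factors) gives exactly the two F lines quoted above.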