Pyadjoint crash when using project function

Dear all,

I’ve tried to make parameters optimization with pyadjoint for a, quite heavy, bio-poromechanical system, and everything is fine until an error occurs with the project function. Depending on the version of the code, the error doesn’t occurs at the same iteration (638, 1015, 1048, 2035, …). It seems that simpler is the code, greater the number of iterations done.

I’m on Ubuntu 18.04.2 LTS and I’m using FEniCS and Pyadjoint on Docker.

The error is the following, occurs at iteration 2035 - takes 30 seconds running with the MWE - :

File “1flu_1sol_dolfin.py”, line 81, in <module>
p_P1 = project(_p, L)
File “/usr/local/lib/python3.6/dist-packages/fenics_adjoint/projection.py”, line 24, in project
block = ProjectBlock(args[0], args[1], output, bcs, **sb_kwargs)
File “/usr/local/lib/python3.6/dist-packages/fenics_adjoint/projection.py”, line 45, in init
super(ProjectBlock, self).init(a == L, output, bcs, *args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/fenics_adjoint/solving.py”, line 76, in init
self._init_dependencies(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/fenics_adjoint/solving.py”, line 145, in _init_dependencies
self.add_dependency(c.block_variable, no_duplicates=True)
File “/usr/local/lib/python3.6/dist-packages/pyadjoint/block.py”, line 50, in add_dependency
dep.will_add_as_dependency()
File “/usr/local/lib/python3.6/dist-packages/pyadjoint/block_variable.py”, line 63, in will_add_as_dependency
self.save_output(overwrite=overwrite)
File “/usr/local/lib/python3.6/dist-packages/pyadjoint/tape.py”, line 46, in wrapper
return function(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/pyadjoint/block_variable.py”, line 51, in save_output
self._checkpoint = self.output._ad_create_checkpoint()
File “/usr/local/lib/python3.6/dist-packages/fenics_adjoint/types/function.py”, line 142, in _ad_create_checkpoint
deepcopy=True)
File “/usr/local/lib/python3.6/dist-packages/dolfin/function/function.py”, line 526, in sub
self.cpp_object().sub(i),
RuntimeError: *** Error: Duplication of MPI communicator failed (MPI_Comm_dup

I’ve tried to make a minimal working example, but it’s 80 lines long. It’s the classical Terzaghi consolidation problem of poromechanics.

It’s a mixed formulation problem, known and benchmarked. It’s almost sure that the error isn’t a math problem. Here is the code:

from dolfin import *
from dolfin_adjoint import *
from mshr import *
from ufl import nabla_div

# Terzaghi's problem
T, num_steps = 100.0, 10000                        
dt = T / (1.0*num_steps)
rho_l, mu_l, k  = 1000, 1e-2, 1.8e-15 
E, nu = 5000, 0.4
lambda_, mu =(E*nu)/((1+nu)*(1-2*nu)), E/(2*(1+nu))
Kf, Ks, poro=2.2e9, 1e10, 0.2
S=(poro/Kf)+(1-poro)/Ks

# Create mesh and define expression
mesh=RectangleMesh(Point(0.0,0.0), Point(1e-5, 1e-4),8,16,'crossed')
top =  CompiledSubDomain("near(x[1], side)", side = 1e-4)
bottom =  CompiledSubDomain("near(x[1], side)", side = 0.0)
left_right = CompiledSubDomain("(near(x[0], 0.0) )|| (near(x[0], 1e-5) )")

boundary = MeshFunction('size_t', mesh, mesh.topology().dim()-1)
boundary.set_all(1)
top.mark(boundary, 2)
ds = Measure('ds', domain = mesh, subdomain_data = boundary)

# Load on the top boundary
T  = Expression(('0.0', '-100'),degree=1)  

#Define Mixed Space (R2,R) -> (u,p)
V = VectorElement("CG", mesh.ufl_cell(), 2)
W = FiniteElement("CG", mesh.ufl_cell(), 1)
L = dolfin.FunctionSpace(mesh,W)
MS = dolfin.FunctionSpace(mesh, MixedElement([V,W]))

# Define boundary condition
c = Expression(("0.0"), degree=1)
bcu1 = DirichletBC(MS.sub(0).sub(0), c, left_right)  
bcu2 = DirichletBC(MS.sub(0).sub(1), c, bottom)      
bcp = DirichletBC(MS.sub(1), c, top)
bc=[bcu1,bcu2,bcp]

# Define strain
def epsilon(u):
    return 0.5*(nabla_grad(u) + nabla_grad(u).T)

# Define variational problem and initial condition
X0 = Function(MS)
B = TestFunction(MS)

e_u0 = Expression(('0.0', '0.0'), degree=1)
e_p0 = Expression('100.0', degree=1)

u0 = interpolate(e_u0, MS.sub(0).collapse())
p0 = interpolate(e_p0, MS.sub(1).collapse())

Xn = Function(MS)
assign(Xn, [u0, p0])

(u,p)=split(X0)
(u_n,p_n)=split(Xn)
(v,q)=split(B)

F = (1/dt)*nabla_div(u-u_n)*q*dx + (k/(mu_l))*dot(grad(p),grad(q))*dx
F += inner(2*mu*epsilon(u),epsilon(v))*dx + lambda_*nabla_div(u)*nabla_div(v)*dx -p*nabla_div(v)*dx- dot(T,v)*ds(2)

t = 0
for n in range(num_steps):
	print('n=',n)
	t += dt
	
	solve(F == 0, X0, bc)   
	    
	(_u,_p)=X0.split()

	# the error occurs here
	p_P1 = project(_p, L)

	assign(Xn,X0)

If the lines:

from dolfin import *
from dolfin_adjoint import *

are replaced by

from fenics import *

the 10,000 iterations are done.

Have you a lead to avoid this error ? Is it an identified bug ?

Thank you for your time

Was running into a similar problem (always stopped running after ~400 time steps). My code would work for fenics / fenics-adjoint version 2017.2.0, but not for 2018.1.0. I just saw your post, and removing " from dolfin-adjoint import * " solved the problem and ran until it reached the required 1000 time steps. I’m not sure what this means for optimization, but I’ll post a general question tomorrow. Thanks for the post!

Slightly extended error:
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/variational_solver.py”, line 46, in solve
**self._ad_kwargs)
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/solving.py”, line 106, in init
self.add_dependency(bc.block_variable, no_duplicates=True)
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/block.py”, line 34, in add_dependency
dep.will_add_as_dependency()
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/block_variable.py”, line 61, in will_add_as_dependency
overwrite = self.output._ad_will_add_as_dependency()
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/overloaded_type.py”, line 332, in _ad_will_add_as_dependency
self._ad_annotate_block()
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/pyadjoint/overloaded_type.py”, line 349, in _ad_annotate_block
block = self.block_class(*self._ad_args, **self._ad_kwargs)
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/fenics_adjoint/types/dirichletbc.py”, line 90, in init
self.collapsed_space = self.function_space.collapse()
File “/home/ayonge3/anaconda3/envs/fenics_2018/lib/python3.7/site-packages/dolfin/function/functionspace.py”, line 195, in collapse
cpp_space, dofs = self._cpp_object.collapse()
RuntimeError: *** Error: Duplication of MPI communicator failed (MPI_Comm_dup

Hi, this problem occurs because each Function.sub(deepcopy=True) call creates a new function space (which in turn uses 1 MPI communicator). Normally DOLFIN will free these communicators when the subfunctions are freed from memory (which would be typical for a for-loop), but in order to track inputs/outputs of operations dolfin-adjoint keeps these in memory.

Now in this case, you don’t have a deepcopy=True split, but dolfin-adjoint stores the values as deepcopies internally. However, we believe the annotation should still work without these deepcopies, so I have now pushed a commit to the master branch of dolfin-adjoint that should fix the issue in your posted code.

This can be installed using

pip install git+https://bitbucket.org/dolfin-adjoint/pyadjoint.git@master
1 Like

The error vanished! Many thanks!