Large mesh error: PETSc error 76 in adjoint calculation

Good morning,

I have attached a minimal running example for which I get the following error:

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics-support@googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a minimal running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to successfully call PETSc function 'KSPSolve'.
*** Reason:  PETSc error code is: 76 (Error in external library).
*** Where:   This error was encountered inside /Users/travis/miniconda3/conda-bld/fenics_1520063526556/work/dolfin-2017.2.0.post0/dolfin/la/PETScKrylovSolver.cpp.
*** Process: 0
***
*** DOLFIN version: 2017.2.0
*** Git changeset:
*** -------------------------------------------------------------------------

The error only appears when I increase the number of cells above 30,000.

The error occurs specifically during the calculation of the gradient dJdX5, i.e. the compute_gradient call at the end of the script.

Could you please help me with this error? I would be very grateful.

Thanks

from dolfin import *
from dolfin_adjoint import *

eps = 1e-4
E, nu = 209*10**9, 0.3                        # Young's modulus, Poisson's ratio
mu = E/(2.0*(1.0 + nu))                       # Shear modulus
lmbda = E*nu/((1.0 + nu)*(1.0 - 2.0*nu))      # Lame's first parameter

def epsilon(u):
    return as_tensor(0.5*(nabla_grad(u) + nabla_grad(u).T))

def sigma(u):
    return lmbda*tr(nabla_grad(u))*Identity(3) + 2*mu*epsilon(u)  # Stress

def simp(X5):
    return eps + (1 - eps)*X5**pp

# Create mesh and define function space
mesh = BoxMesh(Point(0, 0, 0), Point(3, 1, 1), 90, 30, 30)

A = FunctionSpace(mesh, "CG", 1)
P = VectorFunctionSpace(mesh, "CG", 1)
pp = 3  # SIMP penalization exponent

class PosX(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[0],3)
    
class NegX(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[0],0)
    
class PosY(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[1],1)
    
class NegY(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[1],0)

class PosZ(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[2],1)
    
class NegZ(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and near(x[2],0)

PosX = PosX()
NegX = NegX()
PosY = PosY()
NegY = NegY()
PosZ = PosZ()
NegZ = NegZ()

materials = MeshFunction('size_t', mesh, 2)
materials.set_all(0)
PosX.mark(materials, 1)
NegX.mark(materials, 2)
PosY.mark(materials, 3)
NegY.mark(materials, 4)
PosZ.mark(materials, 5)
NegZ.mark(materials, 6)
ds = Measure("ds")(subdomain_data=materials)

bc1 = DirichletBC(P.sub(0), 0, materials, 2)
bc2 = DirichletBC(P.sub(1), 0, materials, 2)
bc3 = DirichletBC(P.sub(2), 0, materials, 2)
bc = [bc1,bc2,bc3]
T = Constant((0,-1e3,0))

X5 = interpolate(Constant(0.2),A)

u = TrialFunction(P)
v = TestFunction(P)
a = simp(X5)*inner(sigma(u), grad(v))*dx
L = dot(T,v)*ds(3)
u = Function(P)
solve(a == L, u, bc, solver_parameters={"linear_solver": "mumps"})

J = assemble(inner(T, u)*ds(3))
X5_control = Control(X5)
dJdX5 = compute_gradient(J,X5_control)

I sometimes get PETSc error 76 when my machine runs out of memory.
Have you checked if this is the case?
It would make sense if it only happens for a large number of cells.


Dear @Dave ,

This is a known dolfin-adjoint issue. The adjoint system does not use mumps and therefore runs out of memory. We are in the process of fixing this, and a branch and corresponding pull request have been created: https://bitbucket.org/dolfin-adjoint/pyadjoint/pull-requests/85 .
Could you try this branch and see if your problem is resolved?

Dear @Dave,
This is now fixed in the development version of pyadjoint.

Hi dokken,

Before going any further, let me first thank you for your time and good contributions.

I'm a bit of a newbie in this field and now have the same problem that Dave mentioned. I've installed FEniCS on Ubuntu and everything else seems to be fine.

Regarding your solution, I don't know how to add the development version of pyadjoint to my FEniCS installation. Is there a simple command I can run to get this package upgraded?

Any hint or clue will be much appreciated.

Hi @MortezaAram,
The easiest way to install the latest version of pyadjoint is to run:
pip3 install git+https://github.com/dolfin-adjoint/pyadjoint.git --upgrade --user
If you would like access to the source code, I suggest cloning the repository and installing it locally.


Hi again,

Thanks a lot for your quick response; I appreciate it.
I've managed to get pyadjoint upgraded, but then I got the following error again when I ran my code:

> solve(a == L, T, Outer_bc, solver_parameters = {'linear_solver':'umfpack'})
> *** -------------------------------------------------------------------------
> *** Error: Unable to successfully call PETSc function 'KSPSolve'.
> *** Reason: PETSc error code is: 76 (Error in external library).

I have two models: one is small and simple, and I got very good results with it; the other uses the same methodology but is much bigger and more complicated. The error appears when I try to solve the second one.

I also tried other solvers such as gmres with different preconditioners, as suggested in this forum and elsewhere, but they did not converge and I got the following error:

> Solution failed to converge in 0 iteration …

Finally, I tried mumps, which ran through my whole code without producing any error; however, it turned out to give a bunch of NaNs in the resulting vtk file.

I don't know what else to try. Have you run into the same problem, or do you have any idea that could help resolve this issue?

Could you make a minimal example? With only half of the error message it is not possible for me to tell why this doesn't work.

Hi!

Yes, sure. And thanks for your time and attention.

The following is the code, for which I need to provide some explanation. It's for heat transfer in the human body, and the human tissue properties (Rho_value, Cp_value, and Kapa_value) come from a mat-file, along with a source term called PLD; so I haven't attached the details of importing them here. The only thing worth mentioning is that the source term (f) is built for every node in the mesh from the corresponding value of PLD (see the section 'Impressed Source Definition from PLD on FEM nodes!'). Can this definition of the f term turn the problem into an ill-posed one? I did exactly the same thing with a smaller model and it worked perfectly fine, but I don't know why it is not working here.

from fenics import *
import scipy.io as sio
import numpy as np

#%% Importing extracted thermal properties
Cb = 3617 #specific heat (Heat Capacity) of blood (J/Kg/'c)
Wb = 0.5 #local perfusion rate (kg/m3/s), assumed to be the same for all types of tissues
Tb = 37   #Arterial blood temperature
Bolus_Temp = Constant(20)
Body_Temp = Constant(37) #Normal body temperature which is used as the initial condition throughout the body.
PwrNormFac = 1;          

Rho_value = []    #Density (Kg/m3)
Cp_value = []     #Heat Capacity (J/Kg/'c)
Kapa_value = []    #Thermal Conductivity (W/m/'c)
    
#%% Time Settings
TFrame = 3600 #60 min
Num_Steps = 60
dt = TFrame / Num_Steps #For every minute

#%% Mesh conversion using meshio package and xdmf-h5 formats!
import meshio
msh = meshio.read("GeoGmsh.msh")

meshio.write("GeoDuke.xdmf", meshio.Mesh(points=msh.points, cells={"tetra": msh.cells["tetra"]}))
meshio.write("cf.xdmf", meshio.Mesh(points=msh.points, cells={"tetra": msh.cells["tetra"]},\
                                                              cell_data={"tetra": {"name_to_read":msh.cell_data["tetra"]["gmsh:physical"]}}))#Physical
meshio.write("mf.xdmf", meshio.Mesh(points=msh.points, cells={"triangle": msh.cells["triangle"]},\
                                                              cell_data={"triangle": {"name_to_read": msh.cell_data["triangle"]["gmsh:physical"]}})) #Facet

mesh = Mesh()
with XDMFFile("GeoDuke.xdmf") as infile:
    infile.read(mesh)

mvc = MeshValueCollection("size_t",mesh,3)
with XDMFFile("cf.xdmf") as infile:
    infile.read(mvc, "name_to_read")
Markers = cpp.mesh.MeshFunctionSizet(mesh,mvc)

mvc = MeshValueCollection("size_t",mesh,2)
with XDMFFile("mf.xdmf") as infile:
    infile.read(mvc, "name_to_read")
BCs = cpp.mesh.MeshFunctionSizet(mesh,mvc)

dx = Measure("dx", domain=mesh, subdomain_data=Markers)

#%% Boundary Condition and initial value
V = FunctionSpace(mesh, 'CG', 1)
Outer_bc = DirichletBC(V, Bolus_Temp, BCs, 88)   #88 is the skin facet 
T_n = interpolate(Body_Temp, V)                  #or project(Body_Temp, V)

#%% Define Variational Problem
T = TrialFunction(V)
w = TestFunction(V)

#%% Impressed Source Definition from PLD on FEM nodes!
PLDonFEM = sio.loadmat('PLD_On_FEM_Nodes_650MHz.mat')
PLD = PLDonFEM['PLDExportList'][0]/PwrNormFac    

FTermSpace = FunctionSpace(mesh, 'CG', 1)  # so that dofs are only in mesh vertices
f = Function(FTermSpace)
f.vector()[:] = PLD[dof_to_vertex_map(FTermSpace)]

#%% Subdomains' property assignment
V0 = FunctionSpace(mesh, 'DG', 0)
Kapa = Function(V0)
Cp = Function(V0)
Rho = Function(V0)

for n, cell_no in enumerate(Markers.array()):
    Subdomain_Tag = int(cell_no - 1)    
    Kapa.vector()[n] = Kapa_value[Subdomain_Tag]
    Cp.vector()[n] = Cp_value[Subdomain_Tag]
    Rho.vector()[n] = Rho_value[Subdomain_Tag]

MeshExport = File('Duke/mesh.pvd')
MeshExport << mesh
ThermalCond = File('Duke/Kapa.pvd')
ThermalCond << Kapa

#%% Symbolic Bioheat equation
#f = Constant(0)
F = T*w*dx + (Kapa*dt/(Cp*Rho))*dot(grad(T), grad(w))*dx - (T_n + (dt/(Cp*Rho))*f - (Cb*Wb*dt/(Cp*Rho))*(T - Tb))*w*dx
a, L = lhs(F), rhs(F)

# Create VTK file for saving solution
vtkfile = File('Duke/Temperature.pvd')

# Export back to Matlab
hdfHandle = HDF5File(mesh.mpi_comm(), 'Duke/FinalTemperature.h5', 'w')
hdfHandle.write(mesh, '/mesh')

# Time-stepping
T = Function(V)
t = 0
for n in range(Num_Steps):

    # Update current time
    t += dt

    # Compute solution
    solve(a == L, T, Outer_bc, solver_parameters = {'linear_solver':'umfpack'}) # To solve the runtime and memory problem when we have a densely meshed structure.

    
    # Save to file and plot solution
    vtkfile << (T, t)
    
    # Save in h5 format
    Name = '/Temperature/%d_min'%n
    hdfHandle.write(T, Name)

    # Update previous solution
    T_n.assign(T)

This is still a quite advanced example, but here are some pointers on how to debug it:

  1. Remove the time loop: does your code run for just one solve, and are the results reasonable?
     If not, there is clearly something wrong with either the variational form or the handling of the input parameters. A minimal sanity check is sketched below.
  2. Decrease dt so that solving the problem once becomes essentially equivalent to a projection, and see if the results make sense.
  3. I also note that you are not using dolfin_adjoint in this example, so the error usually indicates that you are running out of memory on your computer. Therefore, you could also try running the program in parallel (mpirun -n 2 python3 program.py) and check whether the error message disappears.
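A minimal version of that first check might look like the sketch below; it reuses a, L, Outer_bc, and V from your script, and only adds numpy to inspect the solution vector:

import numpy as np

# Single solve, no time loop, reusing a, L, Outer_bc and V from the script above.
T_test = Function(V)
solve(a == L, T_test, Outer_bc, solver_parameters={'linear_solver': 'umfpack'})

# Inspect the locally owned part of the solution vector for obvious problems.
vals = T_test.vector().get_local()
print("min/max temperature:", vals.min(), vals.max())
print("contains NaN:", np.isnan(vals).any())

If this single solve already produces NaNs or wildly wrong values, the issue is in the form or the input data rather than in the time stepping.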

Hi dokken,

Thanks a lot for your pointers.

I tried steps 1 and 2, and neither of them turned out to be the cause of the problem. To solve the runtime problem I need to make the code run in parallel, as you suggested in the third step, but that doesn't seem easy or trivial, because I'm importing the mesh and I want to define the source term on each and every node based on an array imported from a mat-file. Moreover, I have different material properties to assign to different subdomains.

I searched the forum a lot and found some posts about how to run a program with MPI, but some of them seem to be outdated and no longer work with the latest version of FEniCS. Honestly, I don't know how to split the imported mesh, source term, and subdomains among several cores and then stitch them back together once the problem is finally solved.

If I send you a version of the code, would you be able to do me a favor and take care of parallelizing it?

Your time and help are much appreciated.

So what happens when you call mpirun -n 3 python3 instead of python3?

Since your conversion from the mat-file to a Function runs in serial, I suggest saving the result with XDMFFile.write_checkpoint, which can be reloaded in parallel with XDMFFile.read_checkpoint; see Access function in time dependent XDMFFile.
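As a rough sketch of that workflow (reusing f, FTermSpace, and mesh from your script; the file name "source.xdmf" and the label "Source" are just placeholders), the serial preprocessing script would end with:

# Serial preprocessing (plain python3): store f so it can be re-read in parallel.
with XDMFFile("source.xdmf") as outfile:
    outfile.write_checkpoint(f, "Source", 0)

and the parallel script would read it back with:

# Parallel run (mpirun -n N python3 ...): each process reads its own part.
f = Function(FTermSpace)
with XDMFFile(mesh.mpi_comm(), "source.xdmf") as infile:
    infile.read_checkpoint(f, "Source", 0)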

In reference to your question, I got the following error, which I think has something to do with reading the function from the mat-file; I will follow the link you sent about XDMFFile.

But this isn't the only issue: when I remove the complicated source term and use a constant term (like f = Constant(10)), I get other types of errors related to parallelizing the mesh. For the xml mesh type I got the following error:

And for the xdmf mesh type coming from the meshio package, I got this:

Besides the source term, do I also need to take care of the mesh and the property assignment in a parallel run, or will they be handled automatically?

So the best thing is to move all IO handling to XDMF, as it naturally works in parallel. With respect to the XDMF error you got, was the XDMF file opened in any other software?

There are always a few things that need to be thought through in parallel, especially assigning to PETSc vectors. However, partitioning and distribution of the mesh and the degrees of freedom happen automatically.
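For example, one fairly parallel-safe way to assign the cell-wise (DG0) properties is to fill a local array through the dofmap and then set it in one go. This is only a sketch, assuming mesh, Markers, V0, Kapa, and Kapa_value as in your script:

dofmap = V0.dofmap()
local_vals = Kapa.vector().get_local()      # values owned by this process
cell_tags = Markers.array()                 # cell markers local to this process

for cell in cells(mesh):
    tag = int(cell_tags[cell.index()]) - 1          # subdomain tag -> list index
    local_dof = dofmap.cell_dofs(cell.index())[0]   # the single DG0 dof of this cell
    local_vals[local_dof] = Kapa_value[tag]

Kapa.vector().set_local(local_vals)
Kapa.vector().apply("insert")               # finalize the assignment across processes

The same pattern applies to Cp, Rho, and Wb.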

Hi Dokken,

Thanks a lot for your helpful tips; they helped me a lot and I'm grateful.
So, as you suggested, I separated the I/O part and saved the inputs in XDMF format. With respect to your previous question about having the XDMF file open somewhere else, I believe the error was caused by running the XDMF-writing part in parallel, because only one process could do that at a time and the file would be read-only to the other cores. So, to solve the problem, I had to split the program into two parts and run the first part in serial just to save the inputs. Then, using mpirun -n 4 python3, I ran the parallel code.

I managed to run the whole parallel program without any error, but the final output isn't what I expect: I've already run exactly the same model in serial and got reasonable results, but this one isn't even close. I suspect I missed something in the parallel part or did something wrong. Can you please take a look at the following two snippets, serial and parallel, to see if there is a blunder or something else causing the problem?

Serial Part:

from fenics import *
import scipy.io as sio
import numpy as np

#%% Constants
Kapa_value = [0.5, 0.55]  #Thermal Conductivity (W/m/'c), [Muscle, Tumor] (see 'combine_raw' m-file)
Cp_value = [3421, 3500]   #Heat Capacity (J/Kg/'c), [Muscle, Tumor]
Cb = 3617                 #specific heat (Heat Capacity) of blood (J/Kg/'c)
Wb_value = [0.8, 0.5]     #local perfusion rate (kg/m3/s), [Muscle, Tumor]
Tb = 37                   #Arterial blood temperature
Rho_value = [1090, 1040]  #Density of Muscle (Kg/m3)
PwrNormFac = 400          #makeshift (needs to be defined carefully in Matlab)

#%% Modern Mesh conversion using meshio package and xdmf-h5 formats!
import meshio
msh = meshio.read("GeoGmsh_cylinder.msh")

meshio.write("GeoCylinder.xdmf", meshio.Mesh(points=msh.points, cells={"tetra": msh.cells["tetra"]}))
meshio.write("cf.xdmf", meshio.Mesh(points=msh.points, cells={"tetra": msh.cells["tetra"]},
             cell_data={"tetra": {"name_to_read": msh.cell_data["tetra"]["gmsh:physical"]}}))  #Physical
meshio.write("mf.xdmf", meshio.Mesh(points=msh.points, cells={"triangle": msh.cells["triangle"]},
             cell_data={"triangle": {"name_to_read": msh.cell_data["triangle"]["gmsh:physical"]}}))  #Facet

mesh = Mesh()
with XDMFFile("GeoCylinder.xdmf") as infile:
    infile.read(mesh)

mvc = MeshValueCollection("size_t", mesh, 3)
with XDMFFile(mesh.mpi_comm(), "cf.xdmf") as infile:
    infile.read(mvc, "name_to_read")
Markers = cpp.mesh.MeshFunctionSizet(mesh, mvc)

#%% Impressed Source Definition from PLD on FEM nodes!
PLDonFEM = sio.loadmat('PLD_On_FEM_Nodes_cylinder_650MHz.mat')
PLD = PLDonFEM['PLDExportList'][0]/PwrNormFac

FTermSpace = FunctionSpace(mesh, 'CG', 1)  # so that dofs are only in mesh vertices
f = Function(FTermSpace)
f.vector()[:] = PLD[dof_to_vertex_map(FTermSpace)]

# Write in XDMF format so that it can be retrieved in parallel
PLDXDMF = XDMFFile('Q.xdmf')
PLDXDMF.write_checkpoint(f, 'Source', 0)  #0: for the 1st frequency (in a multi-frequency paradigm >> 1: 2nd freq, ...)

#%% Subdomains' property assignment
V0 = FunctionSpace(mesh, 'DG', 0)
Kapa = Function(V0)
Cp = Function(V0)
Wb = Function(V0)
Rho = Function(V0)

for n, cell_no in enumerate(Markers.array()):
    Subdomain_Tag = int(cell_no - 1)
    Kapa.vector()[n] = Kapa_value[Subdomain_Tag]
    Cp.vector()[n] = Cp_value[Subdomain_Tag]
    Wb.vector()[n] = Wb_value[Subdomain_Tag]
    Rho.vector()[n] = Rho_value[Subdomain_Tag]

# Save them for visualization purposes
MeshExport = File('Cylinder/mesh.pvd')
MeshExport << mesh
ThermalCond = File('Cylinder/Kapa.pvd')
ThermalCond << Kapa

# Save them for parallelization purposes
KapaXDMF = XDMFFile('Kapa.xdmf')
KapaXDMF.write_checkpoint(Kapa, 'K', 0)

CpXDMF = XDMFFile('Cp.xdmf')
CpXDMF.write_checkpoint(Cp, 'C', 0)

WbXDMF = XDMFFile('Wb.xdmf')
WbXDMF.write_checkpoint(Wb, 'W', 0)

RhoXDMF = XDMFFile('Rho.xdmf')
RhoXDMF.write_checkpoint(Rho, 'R', 0)

Parallel Part:

from fenics import *
import scipy.io as sio
import numpy as np

#%%
Cb = 3617 #specific heat (Heat Capacity) of blood (J/Kg/'c)
Tb = 37 #Arterial blood temperature
Bolus_Temp = Constant(20)
Body_Temp = Constant(37) #Normal body temperature which is used as the initial condition throughout the body.

#%% Time Settings
TFrame = 3600 #60 min
Num_Steps = 60
dt = TFrame / Num_Steps #For every minute

#%% Modern Mesh conversion
mesh = Mesh()
with XDMFFile(mesh.mpi_comm(), "GeoCylinder.xdmf") as infile:
    infile.read(mesh)

mvc = MeshValueCollection("size_t", mesh, 3)
with XDMFFile(mesh.mpi_comm(), "cf.xdmf") as infile:
    infile.read(mvc, "name_to_read")
Markers = cpp.mesh.MeshFunctionSizet(mesh, mvc)

mvc = MeshValueCollection("size_t", mesh, 2)
with XDMFFile(mesh.mpi_comm(), "mf.xdmf") as infile:
    infile.read(mvc, "name_to_read")
BCs = cpp.mesh.MeshFunctionSizet(mesh, mvc)

dx = Measure("dx", domain=mesh, subdomain_data=Markers)

#%% Boundary Condition and initial value
V = FunctionSpace(mesh, 'CG', 2)
Outer_bc = DirichletBC(V, Bolus_Temp, BCs, 3)  #DomainBoundary())
T_n = interpolate(Body_Temp, V)                #or project(Body_Temp, V)

#%% Define Variational Problem
T = TrialFunction(V)
w = TestFunction(V)

#%% Impressed Source Definition from PLD on FEM nodes!
FTermSpace = FunctionSpace(mesh, 'CG', 1)  # so that dofs are only in mesh vertices
f = Function(FTermSpace)

# Read the process-specific portion of the source back from the XDMF file stored by the serial part.
with XDMFFile(mesh.mpi_comm(), 'Q.xdmf') as file:
    file.read_checkpoint(f, 'Source', 0)

#%% Subdomains' property assignment
V0 = FunctionSpace(mesh, 'DG', 0)
Kapa = Function(V0)
Cp = Function(V0)
Wb = Function(V0)
Rho = Function(V0)

# Read the subdomain XDMF files stored by the serial part for this process.
with XDMFFile(mesh.mpi_comm(), 'Kapa.xdmf') as file:
    file.read_checkpoint(Kapa, 'K', 0)

with XDMFFile(mesh.mpi_comm(), 'Cp.xdmf') as file:
    file.read_checkpoint(Cp, 'C', 0)

with XDMFFile(mesh.mpi_comm(), 'Wb.xdmf') as file:
    file.read_checkpoint(Wb, 'W', 0)

with XDMFFile(mesh.mpi_comm(), 'Rho.xdmf') as file:
    file.read_checkpoint(Rho, 'R', 0)

#%% Symbolic Bioheat equation
F = T*w*dx + (Kapa*dt/(Cp*Rho))*dot(grad(T), grad(w))*dx - (T_n + (dt/(Cp*Rho))*f - (Cb*Wb*dt/(Cp*Rho))*(T - Tb))*w*dx
a, L = lhs(F), rhs(F)

# Create VTK file for saving solution
vtkfile = File('Cylinder/Temperature.pvd')

# Export back to Matlab
hdfHandle = HDF5File(mesh.mpi_comm(), 'Cylinder/FinalTemperature.h5', 'w')
hdfHandle.write(mesh, '/mesh')

# Time-stepping
T = Function(V)
t = 0
for n in range(Num_Steps):

    # Update current time
    t += dt

    # Compute solution
    solve(a == L, T, Outer_bc)

    # Save to file and plot solution
    vtkfile << (T, t)

    # Save in h5 format
    Name = '/Temperature/%d_min' % n
    hdfHandle.write(T, Name)

    # Update previous solution
    T_n.assign(T)

By the way, I still have some doubts about where to include mesh.mpi_comm() and where to omit it in the parallel part. Could the problem possibly come from there?

Thanks a lot again!

I would use XDMF for saving the output as well, as pvd is not optimal for parallel writing. Other than that, I can't see anything wrong after a quick glance.
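For example, a sketch of your time loop with XDMF output instead of pvd (the file name here is just a placeholder):

xdmf_out = XDMFFile(mesh.mpi_comm(), "Cylinder/Temperature_out.xdmf")
xdmf_out.parameters["flush_output"] = True          # write every step to disk immediately
xdmf_out.parameters["functions_share_mesh"] = True  # all time steps share one mesh

T = Function(V)
t = 0
for n in range(Num_Steps):
    t += dt
    solve(a == L, T, Outer_bc)
    xdmf_out.write(T, t)    # replaces vtkfile << (T, t)
    T_n.assign(T)
xdmf_out.close()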

I see.
Alright, I’ll double check my inputs and use xdmf for saving outputs.
Thanks a lot, you helped a lot. I appreciate it.

Hi again,

I hope I didn’t go too far by asking too many questions.

First of all, almost all of the previous problems have been solved with a lot of help from you, but there is still a small issue that has been bothering me for a while.

I get the final distribution and it's what I expect, but it doesn't look very nice in ParaView because of the visible edges that appear in the visualization (see the following sketch). I know I can import the results into Matlab and do some smoothing and interpolation there, but is there any other way to get rid of these edges in FEniCS itself?


I should also mention that the mesh comes from another piece of software, and I have no control over its resolution. So right now I define the function space as V = FunctionSpace(mesh, 'CG', 2). Can I make the final result smoother if I change the element type from CG to something else? What about the order?

You could use the refine command in dolfin to refine all cells of the mesh (mesh = refine(mesh)), or increase the polynomial order even further.
Note that to visualize higher-order function spaces (higher than CG1) you have to use XDMFFile and write_checkpoint, as it supports arbitrary-order Lagrangian elements, while the standard XDMFFile.write and File.write only store data at the vertices. On the usage of write_checkpoint, see Loading xdmf data back in.
I would start by writing the solution with write_checkpoint and see if it improves. If not, refine the mesh.
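For the CG2 temperature, writing each time step with write_checkpoint could look roughly like this. This is only a sketch: the append argument and XDMFFile.Encoding.HDF5 are assumed to be available in your dolfin version, and the file name is a placeholder:

xdmf_hi = XDMFFile(mesh.mpi_comm(), "Cylinder/Temperature_checkpoint.xdmf")

T = Function(V)
t = 0
for n in range(Num_Steps):
    t += dt
    solve(a == L, T, Outer_bc)
    # append=True after the first step keeps all time steps in one file
    xdmf_hi.write_checkpoint(T, "Temperature", t, XDMFFile.Encoding.HDF5, append=(n > 0))
    T_n.assign(T)
xdmf_hi.close()

ParaView can then read the .xdmf file and display the higher-order data rather than only the vertex values.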

You were right, as always 🙂
Now with XDMF, it looks much better. Thanks a lot man, you’re really helpful.