Hello,
I posted yesterday regarding mixed assembly failing on a particular mesh I had. I originally thought the problem was something like normal orientations or degenerate elements, but after some more digging I realized it is likely in how the mesh is partitioned when calling mpirun (I am using the latest docker container [quay.io/fenicsproject/dev], which uses the SCOTCH partitioner).
I was able to replicate the assembly errors I was getting on my previous mesh using pure FEniCS code: we take a unit cube mesh, stretch it a bit, and define a subvolume and a subsurface closer to one edge. When I run this code with mpirun -np n python3 code.py,
n = 1, 2, or 3 runs fine, n = 4 or 5 causes it to hang until I force quit, and n = 6 leads to a segfault.
MWE:
# mixed assembly mwe
import dolfin as d
# Create a mesh
mesh = d.UnitCubeMesh(40,40,40)
z_slice = 0.5 # value of z we use to demarcate the different subvolumes
mesh.coordinates()[:,2] *= 4 # something about partitioning leads to hanging
mesh.init()
# define mesh functions
mf3 = d.MeshFunction("size_t", mesh, 3, 0)
mf2 = d.MeshFunction("size_t", mesh, 2, 0)
for c in d.cells(mesh):
    mf3[c] = (c.midpoint().z() < z_slice + d.DOLFIN_EPS) * 1
for f in d.facets(mesh):
    mf2[f] = (z_slice - d.DOLFIN_EPS <= f.midpoint().z() <= z_slice + d.DOLFIN_EPS) * 1
# Create submeshes using MeshView
mesh0 = d.MeshView.create(mf3, 1)
mesh_ = d.MeshView.create(mf2, 1)
# build mesh mapping
mesh_.build_mapping(mesh0)
# function space and function
V0 = d.FunctionSpace(mesh0, "CG", 1)
u0 = d.Function(V0)
# measures
dx = d.Measure('dx', domain=mesh)
dx0 = d.Measure('dx', domain=mesh0)
dx_ = d.Measure('dx', domain=mesh_)
# Give u some value
u0.assign(d.Constant(4))
# Show that the submeshes indeed are non-empty
print(d.assemble(1*dx)) # volume of full mesh = 4
print(d.assemble(1*dx0)) # 0.5
print(d.assemble(1*dx_)) # 1
# Try to assemble a function integrated over domain
print(d.assemble(u0*dx0)) # 4*0.5 = 2
# !!! THIS FAILS IN PARALLEL !!!
# for me, mpirun -np n python3 where n=[1,2,3] works fine, but n=[4,5] hangs and n=6 segfaults
print(d.assemble(u0*dx_)) # 4*1 = 4
I’ve tried tracking down this issue in assembling.py, and the specific part that causes it to break is the actual assembly of the dolfin-wrapped form (i.e. assembler.assemble(tensor, dolfin_form)).
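In case it is useful, this is roughly how I have been reproducing that low-level call outside of assemble(). It is just a sketch based on my reading of assembling.py in the dev container, so the helper names here (_create_dolfin_form, cpp.la.Scalar, cpp.fem.Assembler) are assumptions about that version and may differ elsewhere; it also assumes the variables from the MWE above (u0, dx_) are already defined:
# sketch: mirror the assemble() call path from assembling.py for the scalar form u0*dx_
from dolfin.fem.assembling import _create_dolfin_form
dolfin_form = _create_dolfin_form(u0*dx_, None)           # wrap the UFL form as a dolfin Form
tensor = d.cpp.la.Scalar(dolfin_form.mesh().mpi_comm())   # rank-0 tensor for a scalar functional
assembler = d.cpp.fem.Assembler()
assembler.assemble(tensor, dolfin_form)                   # this is the call that hangs/segfaults for me
print(tensor.get_scalar_value())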
I plotted the coloring based on rank and the partitioning looks normal, which leads me to believe it fails when not every processor has a copy of the function being assembled. Here I show the coloring for n=3 and n=4 (n=3 works and n>=4 fails); a sketch of how I generate this coloring is below the figures. Notice that with n=3 all three colors touch our sub-surface (the inner surface between the two subvolumes), while with n=4 there is one color that does not touch the sub-surface (cyan; apologies, it is a little difficult to see next to the green), and the assembly subsequently fails.
n=3
n=4
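For reference, this is roughly how I generate the rank coloring above (a minimal sketch; the file name partitioning.xdmf is just an example, and I color by the cell function value in the viewer):
# sketch: tag every cell with the MPI rank that owns it, to visualize the partitioning
import dolfin as d
mesh = d.UnitCubeMesh(40,40,40)
mesh.coordinates()[:,2] *= 4
rank = d.MPI.rank(mesh.mpi_comm())
# cell function whose local values are the owning process rank
partition = d.MeshFunction("size_t", mesh, mesh.topology().dim(), rank)
d.XDMFFile(mesh.mpi_comm(), "partitioning.xdmf").write(partition)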
Any help would be really appreciated! Being able to solve mixed-dimensional systems in parallel is extremely important for some of our projects :). Thank you so much!