Function evaluation hangs in parallel (sometimes)

Hi all,

I’m experiencing an issue when trying to evaluate a function in parallel.
Now I know this has been treated in previous topics (Problem with evaluation at a point in parallel - #6 by MiroK , MPI - Gather and distribute a function to all processes (That I can evaluate at a point) - #3 by timwong …), but what I’m trying to understand here is why the following MWE works if we mark all the facets of the mesh, yet hangs if we mark only a subdomain.

mwe.py

import fenics as f


mesh = f.UnitSquareMesh(10, 10)
surface_markers = f.MeshFunction("size_t", mesh, 1, 0)  # initialises all facets at zero

boundary_tag = 1

boundary = f.CompiledSubDomain("on_boundary && near(x[0], 0, tol)", tol=1e-14)
# boundary = f.CompiledSubDomain("on_boundary")  # works fine with this!

boundary.mark(surface_markers, boundary_tag)


V = f.FunctionSpace(mesh, "CG", 1)

u = f.Function(V)
v = f.TestFunction(V)

T = f.interpolate(f.Constant(1), V)


F = f.inner(f.grad(u), f.grad(v))*f.dx

class CustomExpression(f.UserExpression):
    def __init__(self, function, **kwargs):
        super().__init__(**kwargs)
        self.function = function

    def eval(self, value, x):
        value[0] = 2*self.function(x)  # this line hangs in parallel

    def value_shape(self):
        return ()

expr = CustomExpression(T)
bc = f.DirichletBC(V, expr, surface_markers, boundary_tag)

f.solve(F==0, u, bcs=bc)

To reproduce the issue, run mpirun -np 2 python3 mwe.py.

I wanted to know if there was a workaround. The function given in Problem with evaluation at a point in parallel - #6 by MiroK doesn’t seem to work.

Thanks!


Very interesting indeed! Add the following line

mesh.bounding_box_tree()

anywhere before the first evaluation of expr, that is, before

f.solve(F==0, u, bcs=bc)

in your case, and you will be fine.
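Applied to your mwe.py, the end of the script would then read like this (only the last lines are shown, everything above is unchanged):

expr = CustomExpression(T)
bc = f.DirichletBC(V, expr, surface_markers, boundary_tag)

# Build the bounding box tree collectively on every rank before the first
# (possibly rank-local) evaluation of expr takes place inside the solve.
mesh.bounding_box_tree()

f.solve(F==0, u, bcs=bc)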

For developers

The bug can be reproduced with the following minimal code

import dolfin as df
mymesh = df.UnitSquareMesh(10, 10)
V = df.FunctionSpace(mymesh, "CG", 2)
u = df.Function(V)

# mymesh.bounding_box_tree()  # hangs under MPI without this line

for xy in mymesh.coordinates():
    if abs(xy[0]) < 1e-12:  # works fine without this if statement
        u(xy)

If you run this code with mpirun -np 2 python3 filename.py, the evaluation hangs forever. If I interrupt the process with Ctrl+c, I get the following message:

^C[warn] Epoll ADD(4) on fd 33 failed.  Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor

Notice that the if statement is essential for reproducing the error! Depending on the condition, the script either hangs or it does not. Namely, it hangs with any analogue of near(x[0], 0.0), but if I replace the line

    if abs(xy[0]) < 1e-12: 

with the following analogue of on_boundary:

    if min(abs(xy[0]), abs(xy[0] - 1), abs(xy[1]), abs(xy[1] - 1)) < 1e-12:

the code executes just fine. Perhaps it has something to do with the timing between the different processes? Nevertheless, calling mesh.bounding_box_tree() solves the problem.

This usually happens if you are assembling something in parallel where one of the processes has no local information (such as a cell, a facet, etc.).

In your case, this is because one process ends up having no coordinates satisfying your condition, and thus it hangs. This happens because, to evaluate a function at a point, one needs to find which cells contain the point (or have it on their boundary), and for that the mesh bounding box tree has to be computed, which uses an MPI all-to-all. If only some processes take part in that collective call, the others wait forever.
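If you do need the value of a function at a point on every process, one option is to build the tree collectively first, let the rank(s) that actually own the point do the evaluation, and then reduce the result over the communicator. A rough, untested sketch for a scalar function (the helper name peval and the choice of MPI.MIN as the reduction are mine, not a dolfin API):

import numpy as np
from mpi4py import MPI
import dolfin as df

def peval(f, x):
    # Build the bounding box tree on every rank up front, so that no rank
    # gets stuck later inside the collective tree construction.
    f.function_space().mesh().bounding_box_tree()
    try:
        y_local = f(x)  # raises RuntimeError on ranks that do not own x
    except RuntimeError:
        y_local = np.inf
    # MPI.MIN returns the value from a rank that actually found the point.
    return MPI.COMM_WORLD.allreduce(y_local, op=MPI.MIN)

mesh = df.UnitSquareMesh(10, 10)
V = df.FunctionSpace(mesh, "CG", 1)
T = df.interpolate(df.Expression("x[0] + x[1]", degree=1), V)
print(peval(T, (0.0, 0.5)))  # should print 0.5 on every rank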


Thanks for the explanation.
So is calling

mesh.bounding_box_tree()

enough to work around this issue?

In most situations yes.


Awesome, thanks.

Is there any reason why this is not the default behaviour? Is computing the bounding_box_tree particularly expensive?

Will this be a bottleneck for large meshes?
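For what it’s worth, I guess the cost could simply be measured on a representative mesh, something along these lines (the mesh size here is picked arbitrarily):

import time
import dolfin as df

mesh = df.UnitCubeMesh(50, 50, 50)

start = time.perf_counter()
mesh.bounding_box_tree()  # the call we are timing
print(f"bounding box tree built in {time.perf_counter() - start:.3f} s")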

Thanks @Lukas-Babor for finding this fix!

A lot of legacy dolfin wasn’t designed with parallel computing in mind; parallel support has been patched in over time, resulting in issues such as these.

Issues such as these were one of the main motivations for DOLFINx, as it has a more explicit input-output strategy, making it clear when MPI communication is required.
