I am running a parallel MPI simulation, where each rank loops over its own local list of boundary facets and performs a geometric computation per facet (including a ConvexHull).
I observe that once one rank finishes processing all of its local boundary facets, the other ranks nearly stop making progress: they still have facets to process, but their execution becomes much slower after the first rank finishes. In the output below, rank 2 finishes its 119 facets and reaches scatter_forward while ranks 1 and 3 are at facet 119 of 120 and rank 0 is at facet 119 of 163. I would appreciate any help or pointers to related resources. The output around the point where this happens is:
[rank 0] starting facet 1415 (117/163), ndofs=6
[rank 3] starting facet 1502 (117/120), ndofs=6
[rank 2] starting facet 1553 (117/119), ndofs=9
[rank 1] starting facet 1492 (117/120), ndofs=8
[rank 3] starting facet 1505 (118/120), ndofs=6
[rank 0] starting facet 1419 (118/163), ndofs=6
[rank 2] starting facet 1556 (118/119), ndofs=12
[rank 1] starting facet 1493 (118/120), ndofs=13
[rank 0] starting facet 1422 (119/163), ndofs=6
[rank 3] starting facet 1508 (119/120), ndofs=6
[rank 2] starting facet 1574 (119/119), ndofs=8
[rank 1] starting facet 1500 (119/120), ndofs=13
[rank 2] finished ALL facets, now leaving facet loop
[rank 2] leaving areatotal loop, areatotal=0.09688432202898835
areatotal 2
[rank 2] before scatter_forward
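
To narrow this down, I could time each facet on every rank and compare the durations before and after rank 2 leaves its loop. Below is a minimal sketch of such a timing wrapper, not my actual code: `timed_facet_loop` and `process_facet` are hypothetical names, and `process_facet` stands in for my per-facet geometric computation (including the ConvexHull). In the real run, `rank` would come from the MPI communicator (for example `MPI.COMM_WORLD.rank` with mpi4py).

```python
import time


def timed_facet_loop(facets, process_facet, rank=0, log=print):
    """Call process_facet on each facet and log the wall time per facet.

    facets        -- list of local facet indices (hypothetical input)
    process_facet -- placeholder for the per-facet computation
    rank          -- MPI rank used only to label the log lines
    log           -- callable that receives each log message
    Returns the list of per-facet durations in seconds.
    """
    durations = []
    total = len(facets)
    for i, facet in enumerate(facets, start=1):
        t0 = time.perf_counter()
        process_facet(facet)
        dt = time.perf_counter() - t0
        durations.append(dt)
        log(f"[rank {rank}] facet {facet} ({i}/{total}) took {dt:.6f} s")
    return durations
```

If the per-facet times on ranks 0, 1 and 3 jump only after rank 2 reaches scatter_forward, that would suggest the waiting rank is competing with them for resources rather than the facets themselves being more expensive; this is an assumption I have not verified yet.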