Can't run Navier-Stokes solution in parallel

Thanks @volkerk.
I implemented the modification you suggested, but it did not fix the problem. I kept it anyway, since yours is the right way to perform that operation, so thanks for that.

After much testing, it turns out that the problem arises before that part of the code is ever reached: it happens before the loop even starts, in the following lines:

# Calculate dt as per CFL (with CFLmax = 0.5)
deltaX = mesh.hmin()  # minimum mesh size
dt = 0.5*deltaX**2/mu
dt = 1E-2*dt  # just for testing
# number of time steps to reach the target final time
num_steps = int(T/dt) + 1

The problem is that mesh.hmin() may return a different value in each process, depending on how the mesh is partitioned among them: each process reports the minimum cell size of its own portion of the mesh. I verified that the values differ by adding a simple print statement after that calculation.
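To see why this matters downstream, here is a plain-Python illustration (not FEniCS code; the per-rank hmin values are hypothetical) of how slightly different local hmin values can push int(T/dt) + 1 to different integers on different ranks:

```python
# Hypothetical per-rank values of mesh.hmin() after partitioning.
hmins = {0: 0.100, 1: 0.095}
mu = 1.0
T = 0.08

# Reproduce the dt / num_steps computation from the snippet above,
# independently on each "rank".
num_steps = {}
for rank, h in hmins.items():
    dt = 0.5 * h**2 / mu
    dt = 1E-2 * dt                 # same testing scale factor as above
    num_steps[rank] = int(T / dt) + 1

# The two ranks disagree on how many steps the time loop will run.
print(num_steps)
```

Because each rank derives dt from its own local hmin, the truncation in int(T/dt) can land on different integers, which is exactly the num_steps mismatch described below.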

As a result, the value of num_steps can differ between processes, depending on the target final time T and on how it and the local hmin fall with respect to the truncating operation int(T/dt) + 1. I found myself in a situation where 3 of the 4 processes had num_steps equal to 8, while one process had 9. I believe this caused a hang/deadlock in interprocess communication: the process running the extra step (process 2 in my case) was waiting for a send/receive that the other processes, having already exited the loop, would never match.

At least that is my interpretation of what was happening. If anyone else has a different or better explanation, I would like to learn it. Otherwise, this should be marked as solved.

Thanks to anyone who read these posts and gave it a thought. Apologies for wasting your time on what should have been an easy-to-spot bug in the code.
