I am trying to run the demo code for the incompressible Navier-Stokes equations from Bitbucket in parallel with MPI.
However, according to the timing summary, there is no improvement in computational time as the number of processors increases. It seems as if the code is simply executed several times independently. The timing summary is shown in the figure below.
FEniCS is installed via Docker and is the latest stable version. My computer has 8 cores in total. Did I misuse the MPI command, or do I need to add some extra code to parallelize the demo?
Could you please point me to some resources or tips that would help me solve this problem?
The reason the code does not speed up is that the problem is very small (1000 DOFs in the velocity space and 100 in the pressure space). Running code in parallel pays off for large problems, where the mesh is partitioned and distributed over several processes. For a problem as small as this, the communication overhead costs roughly as much time as the partitioning saves.
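To see just how small the problem is, and how it is split across processes, you can print the global DOF counts and each rank's local share. Here is a minimal sketch, assuming the legacy DOLFIN Python API (as shipped in recent FEniCS Docker images) and Taylor-Hood spaces; the mesh and space definitions below are illustrative, not copied from the demo itself:

from dolfin import *

# Build spaces comparable to the demo (Taylor-Hood assumed here)
mesh = UnitSquareMesh(16, 16)
V = VectorFunctionSpace(mesh, "P", 2)   # velocity
Q = FunctionSpace(mesh, "P", 1)         # pressure

# Global problem size (identical on every rank)
if MPI.rank(MPI.comm_world) == 0:
    print("velocity dofs:", V.dim(), "pressure dofs:", Q.dim())

# Dofs owned by this rank -- a rough measure of the local work
first, last = V.dofmap().ownership_range()
print("rank", MPI.rank(MPI.comm_world), "owns", last - first, "velocity dofs")

Run with mpirun and more processes, and you will see the per-rank share shrink accordingly.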
This can be illustrated by refining the mesh:
for i in range(2):
    mesh = refine(mesh)
which will yield the following output:
fenics@3d1c51f37d8c:/root/shared/navier-stokes$ time sudo mpirun -n 1 python3 demo_navier-stokes.py
15170 1937
real 0m21.718s
user 1m20.987s
sys 4m13.590s
fenics@3d1c51f37d8c:/root/shared/navier-stokes$ time sudo mpirun -n 2 python3 demo_navier-stokes.py
real 0m10.502s
user 0m20.699s
sys 0m1.968s
fenics@3d1c51f37d8c:/root/shared/navier-stokes$ time sudo mpirun -n 4 python3 demo_navier-stokes.py
real 0m7.383s
user 0m28.660s
sys 0m2.509s
As you can observe here, going from one to two processes gives a significant speedup. However, going from 2 to 4 processes, the runtime decreases further but is not halved: as the number of dofs per process shrinks, communication takes up a larger share of the total time.
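Plugging the wall-clock ("real") times above into the usual definitions of speedup (T1/Tn) and parallel efficiency (T1/(n*Tn)) makes this drop-off explicit; a quick sketch using the numbers from the runs above:

# "real" times in seconds, taken from the timings above
timings = {1: 21.718, 2: 10.502, 4: 7.383}
t1 = timings[1]
for n, t in sorted(timings.items()):
    print(f"n={n}: speedup={t1 / t:.2f}, efficiency={t1 / (n * t):.0%}")

The efficiency is close to 100% at two processes but drops to roughly 75% at four, which is exactly the communication overhead described above.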