MPI rank always returns 0

abelthayil · September 24, 2020, 5:14pm

Hello,
I am running a piece of fenics code on a cluster that has it’s own mpi installations. When I run a command like
mpirun -n 4 python fenics_code.py
I inevitably run many of the functions 4 times, even though I specify these parts within a piece of code such as
if(rank ==0):

On looking closer, I found that rank always returns 0. Can someone help me diagnose the problem, and correct it. I find the solution in [Why do I get comm.rank == 0 everytime when using mpirun?] is not relevant to me because I am not using (at least explicitly) docker.
Thanks!

EDIT:
Minimal example:

from fenics import *
mesh = UnitSquareMesh(10,10)
comm = mesh.mpi_comm()
print('comm.size =', comm.size)
print('comm.rank=', comm.rank)

The above code when launched using slurm (sbatch) returns
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0

which is wrong. When I run this locally on my system, it gives me the correct expected output.

dokken · September 24, 2020, 5:34pm

Please supply a minimal working example, such that others can try to reproduce your error and help you.

abelthayil · September 25, 2020, 3:46pm

I’ve put it up in the original question. Hope it helps.

dokken · September 25, 2020, 4:38pm

As this works for you locally (it also works for me locally using docker), it is clearly something with the installation on the cluster.
Are you sure dolfin has been installed correctly on the cluster?

jjw · September 30, 2020, 5:54pm

Can you post your slurm job script. Do you request:
#SBATCH -N 1 #Nodes
#SBATCH -n 4 #numTasks

?

abelthayil · October 5, 2020, 11:58am

Thank you for your responses. I do use
#SBATCH -N 1 #Nodes
#SBATCH -n 4 #numTasks

My first test was to run which mpirun in the slurm jobscript. It was pointing to the correct installation of mpi installed with the fenics environment on anaconda. Nonetheless, the error I received was Error: node list format not recognized. Try using '-hosts=<hostnames>’. The solution involved using

mpirun -n 4 -hosts $SLURM_JOB_NODELIST mpi_test.py

Does this improve my performance on actual computations? Not really but that’s a whole other question, I think.

Topic		Replies	Views
Why do I get comm.rank == 0 everytime when using mpirun?	4	1394	November 4, 2019
Getting MPI rank as 0 when running in parallel locally installation	1	663	October 2, 2020
MPI in fenics 2019.1.0 docker container	2	1550	June 9, 2019
MPI FEniCS Hangs on File output for Rank 0 General	1	428	December 8, 2022
MPI communicator in Fenics 2018.1 mesh	5	3506	September 9, 2019

MPI rank always returns 0

Related topics