I am running a piece of FEniCS code on a cluster that has its own MPI installations. When I run a command like mpirun -n 4 python fenics_code.py
many of the functions inevitably run 4 times, even though I guard those parts with a check such as if rank == 0:
On looking closer, I found that rank always returns 0. Can someone help me diagnose and correct the problem? The solution in [Why do I get comm.rank == 0 everytime when using mpirun?] is not relevant to me because I am not using Docker (at least not explicitly).
Minimal example:
from fenics import *
mesh = UnitSquareMesh(10,10)
comm = mesh.mpi_comm()
print('comm.size =', comm.size)
print('comm.rank=', comm.rank)
The above code, when launched using Slurm (sbatch), returns
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0
comm.size = 1
comm.rank= 0
which is wrong. When I run this locally on my system, it gives me the correct expected output.
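One way to narrow this down is to compare what the launcher tells each process with what the communicator reports. The sketch below is only a diagnostic idea, not a fix: the helper name launcher_view is made up for illustration, and it assumes the launcher is Open MPI, an MPICH-style (PMI) launcher, or Slurm's srun, which export the environment variables listed. If the launcher reports, say, rank 2 of 4 while the communicator still reports rank 0 of size 1, the Python MPI stack is probably linked against a different MPI than the mpirun that started the job.

```python
import os

# Rank/size variables exported by common launchers.
# Note: Slurm's values are only per-process when the process was started by srun;
# under a plain sbatch script they describe the batch step itself.
LAUNCHER_VARS = [
    ("Open MPI", "OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"),
    ("MPICH/PMI", "PMI_RANK", "PMI_SIZE"),
    ("Slurm", "SLURM_PROCID", "SLURM_NTASKS"),
]


def launcher_view(env=None):
    """Return (launcher, rank, size) for every launcher whose variables are set."""
    env = os.environ if env is None else env
    found = []
    for name, rank_var, size_var in LAUNCHER_VARS:
        if rank_var in env and size_var in env:
            found.append((name, int(env[rank_var]), int(env[size_var])))
    return found


if __name__ == "__main__":
    print("launcher view:", launcher_view())
    try:
        from mpi4py import MPI  # optional; only used for the comparison
        print("mpi4py view: rank", MPI.COMM_WORLD.rank, "of", MPI.COMM_WORLD.size)
    except ImportError:
        print("mpi4py is not importable in this environment")
```

Running this script with the same mpirun -n 4 python ... line as the failing job shows, per process, whether the two views agree.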
As this works for you locally (it also works for me locally using Docker), clearly something is wrong with the installation on the cluster.
Are you sure dolfin has been installed correctly on the cluster?
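A rough way to check this is to print which mpirun is first on the PATH and which MPI library mpi4py was built against, and see whether they belong together. This is a sketch under assumptions: the helper name mpi_build_info is hypothetical, and it assumes mpi4py is the MPI binding used by the dolfin installation in question.

```python
import shutil
import sys


def mpi_build_info():
    """Collect hints about which MPI the launcher and the Python stack use."""
    info = {
        "python_prefix": sys.prefix,
        # Path of the first mpirun on PATH, or None if there is none.
        "mpirun": shutil.which("mpirun"),
    }
    try:
        from mpi4py import MPI
        # The version string names the MPI implementation mpi4py is linked to.
        info["mpi4py_library"] = MPI.Get_library_version().strip()
    except Exception as exc:  # mpi4py missing, or its MPI library cannot be loaded
        info["mpi4py_library"] = "unavailable: %s" % exc
    return info


if __name__ == "__main__":
    for key, value in mpi_build_info().items():
        print(key, "=", value)
```

If the mpirun path lies outside the Python prefix, or the library string names a different MPI implementation than the one mpirun belongs to, each process may start as its own single-rank job, which would match the size 1, rank 0 output above.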
Thank you for your responses. I do use
#SBATCH -N 1 #Nodes
#SBATCH -n 4 #numTasks
My first test was to run which mpirun in the Slurm job script. It pointed to the correct MPI installation, the one installed with the FEniCS environment in Anaconda. Nonetheless, the error I received was Error: node list format not recognized. Try using '-hosts=<hostnames>'. The solution involved using