I am trying to run my job on multiple nodes, but when I look at the output it seems the job is not running in parallel.
This is my job submission script:
#!/bin/bash
#SBATCH --job-name=gather
#SBATCH -N 4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --switches=1
#SBATCH --time=14-00:00:00
#SBATCH --partition=normal
module load python-3.9.6-gcc-8.4.1-2yf35k6
echo "Job $SLURM_JOB_ID running on SLURM NODELIST: $SLURM_NODELIST"
mpirun -n 4 singularity exec /project/user/fenics/fenics.simg python3 ./run_test.py
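For comparison, here is a variant of the last line that hands task placement to SLURM instead of mpirun. This is only a sketch: the `--mpi=pmi2` plugin choice is an assumption, and the right value depends on how the site's SLURM and the MPI inside the container were built.

```shell
# Let srun launch one task per allocated node (4 tasks total, per the
# #SBATCH directives above) and wire them into the container's MPI via PMI.
srun --mpi=pmi2 singularity exec /project/user/fenics/fenics.simg python3 ./run_test.py
```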
The output looks like this:
Job 30456205 running on SLURM NODELIST: rome[067-070]
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Solving linear variational problem.
Solving linear variational problem.
Solving linear variational problem.
Solving linear variational problem.
It seems like the mesh is not being distributed across the nodes. If I instead use

singularity exec -e /project/user/fenics/fenics.simg mpirun -n 4 python3 ./run_test.py > output_test

the mesh does get distributed, but now the job runs on only one node: the other nodes, although shown as assigned, remain idle.
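To pin down where each rank actually runs, a small check like the following could be dropped into the script (a hypothetical helper, not part of run_test.py; the environment variable names cover Open MPI's mpirun and SLURM's srun, and fall back to "0" outside any launcher):

```python
import os
import socket

# Report which rank this process thinks it is and which host it landed on.
# Open MPI's mpirun sets OMPI_COMM_WORLD_RANK; srun sets SLURM_PROCID.
rank = os.environ.get("OMPI_COMW_WORLD_RANK".replace("COMW", "COMM"),
                      os.environ.get("SLURM_PROCID", "0"))
host = socket.gethostname()
print(f"rank {rank} on host {host}")
```

If every line reports rank 0 (four independent serial runs, as in the first output above) or every line reports the same host (all ranks piled onto one node, as in the second case), the launcher and the MPI inside the container are not coordinating.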