Multi-node job does not run in parallel

I am trying to run my job on multiple nodes, but when I look at the output it seems the job is not running in parallel.

This is my job submission script:

#!/bin/bash
#SBATCH --job-name=gather
#SBATCH -N 4                    # 4 nodes
#SBATCH --ntasks-per-node=1     # 1 MPI rank per node
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --switches=1            # place all nodes on a single switch
#SBATCH --time=14-00:00:00
#SBATCH --partition=normal

module load python-3.9.6-gcc-8.4.1-2yf35k6

echo "Job $SLURM_JOB_ID running on SLURM NODELIST: $SLURM_NODELIST"

mpirun -n 4 singularity exec /project/user/fenics/fenics.simg python3 ./run_test.py  

I am getting output like this:

Job 30456205 running on SLURM NODELIST: rome[067-070]
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Rank 0: 8128311 vertices (local)
Solving linear variational problem.
Solving linear variational problem.
Solving linear variational problem.
Solving linear variational problem.

It seems like it is not distributing the mesh across the nodes: every process reports rank 0, so each copy appears to be running as an independent serial job rather than as one rank of a shared MPI communicator. If I instead launch MPI inside the container, with

singularity exec -e /project/user/fenics/fenics.simg mpirun -n 4 python3 ./run_test.py > output_test

then the mesh does get distributed, but the job runs on only one node. The other nodes, although shown as allocated, remain idle.
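
To separate the launch problem from FEniCS itself, a minimal rank check can be swapped in for run_test.py. This is just a sketch (rank_check.py is a hypothetical name, and it assumes mpi4py is installed inside the fenics.simg image):

# rank_check.py -- minimal MPI sanity check (hypothetical helper;
# assumes mpi4py is available inside the container image)
from mpi4py import MPI

comm = MPI.COMM_WORLD
# A working multi-node launch should print four distinct ranks on four
# different hosts; a broken one prints "rank 0 of 1" from every copy.
print(f"rank {comm.Get_rank()} of {comm.Get_size()} on {MPI.Get_processor_name()}")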