srun instead of mpirun for legacy FEniCS

Howdy Folks,

I have a legacy FEniCS code that runs properly when launched with mpirun -n $SLURM_NTASKS python main.py.

However, the HPC I predominantly use has 16-32 CPUs per node, and for the high-throughput cases I am running with my FEniCS script (call it main.py) it is more efficient to use, say, 4 or 8 cores per case rather than the whole node. So I am trying to pack several of these runs onto a node at once, both to speed up my analysis and to use the HPC efficiently.

Following various sources, including this forum as well as suggestions from ChatGPT, Stack Overflow, Stack Exchange, etc., I have put together a SLURM submit script that looks like this:

#!/bin/bash
#SBATCH -A 11111
#SBATCH -N 1
# ... other #SBATCH directives ...

for i in {1..10}
do
   srun -c 4 -n 4 --exclusive ./run.sh $i > output.log &
done
wait
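
For clarity, here is the loop written the way I intend each step to behave, with one log file per case (the output_$i.log naming is just illustrative on my part; the srun flags are the same as above):

for i in {1..10}
do
   # one backgrounded job step per case, each with its own log file
   # so the runs do not overwrite each other's output
   srun -c 4 -n 4 --exclusive ./run.sh $i > output_$i.log &
done
wait   # block until every backgrounded step has finished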

The run.sh script looks like this:

#!/bin/bash

cd $i
mpirun -np 4 python main.py
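
For reference, this is the behaviour I intend run.sh to have, spelled out with the case index taken from the first command-line argument (using $1 here is my assumption about how the value arrives from srun, since the loop variable itself is not exported into the step):

#!/bin/bash
# run.sh: invoked once per case by srun, with the case index as the first argument
CASE_DIR=$1

# enter the per-case directory and run the legacy FEniCS script on 4 MPI ranks
cd "$CASE_DIR" || exit 1
mpirun -np 4 python main.py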

However, what typically happens is either (a) the job just hangs and nothing is calculated, or (b) only the first directory has its analysis completed.

Which brings me to my questions:

  1. First, is srun even compatible with legacy FEniCS? When I use srun -c 2 python main.py, it looks like only one core is actually active inside the FEniCS script (a quick rank check is sketched below this list).

  2. Has anyone had success with a setup like this, i.e. running multiple FEniCS jobs simultaneously on the same node(s)?
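
For what it's worth, a sanity check along these lines (assuming mpi4py is available, which legacy FEniCS builds normally use underneath) should show how many MPI ranks srun actually starts:

# print the rank and communicator size seen by each launched task;
# if every line reads "0 of 1", the tasks are not forming a single MPI job
srun -n 4 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.rank, 'of', c.size)"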

Appreciate any help.

P.S. I have talked with my HPC admin team, and they don't have anything set up on the server that would restrict running multiple parallel jobs simultaneously.

Sorry, that's very specific to your HPC system. If your HPC admins can't help, it's unlikely we can.