I’m using FEniCS (2019.1.0) installed through Anaconda (2019.10), and I run my case with the mpirun that comes with the conda environment. It runs fine on my local machine, and it will run on the cluster my lab uses (which uses the SLURM scheduler). I also never ran into the issues described here or here. The problem is that the job runs on one node even when I request more than one (currently I’m just trying to use two). The reason I think this is that seff [jobid] shows the job being killed for running out of memory while only about half of the allocated memory was used, which, with two nodes allocated, is roughly one node’s worth of RAM.
Is there something special I need to be doing, or am I maybe just not searching for the right thing?
Thanks.
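For reference, here is a small diagnostic I could add near the top of parallel.py (assuming mpi4py and the Python standard library are available in the fenicsproject environment) to check how the ranks are actually distributed across nodes:
# Diagnostic sketch: print each rank's hostname and, on rank 0, how many
# distinct nodes the ranks actually landed on (assumes mpi4py is installed)
from mpi4py import MPI as pyMPI
import socket

comm = pyMPI.COMM_WORLD
hostname = socket.gethostname()
print("rank %d of %d running on %s" % (comm.rank, comm.size, hostname))

# Gather every rank's hostname on rank 0 and count the unique ones
hostnames = comm.gather(hostname, root=0)
if comm.rank == 0:
    print("ranks are spread over %d distinct node(s)" % len(set(hostnames)))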
FEniCS application:
from __future__ import print_function
# Environment setup
import os
# FEniCS imports
from fenics import *
from ufl import nabla_div
from dolfin import *
#from mshr import *
import matplotlib.pyplot as plt
import numpy
# Define communicator and rank for this process, in case we are running with MPI
mpi_comm = MPI.comm_world
mpi_rank = mpi_comm.Get_rank()
set_log_level(LogLevel.INFO)
# Variables
mu = 1 # Lame coefficient
rho = 1
beta = 1
lambda_ = beta # Lame coefficient
tol = 1E-14 # Boundary condition cutoff
# Read the mesh
mesh = Mesh(mpi_comm)
with XDMFFile(mpi_comm, "mesh.xdmf") as infile:
    infile.read(mesh)
# Define a mesh value collection of dim = 2 and read the boundary (facet) markers
mvc = MeshValueCollection("size_t", mesh, 2)
with XDMFFile("mf.xdmf") as infile:
infile.read(mvc, "name_to_read")
mf = cpp.mesh.MeshFunctionSizet(mesh, mvc)
# Define function space
V = VectorFunctionSpace(mesh, 'P', 1)
# Define the forcing function
el = V.ufl_element()
f = Expression(("0.", "0.", "500*t*x[0]*x[1]*x[2]"), t=0, element = el)
# Define boundary condition
def clamped_boundary(x, on_boundary):
    return on_boundary and x[2] < tol
bc = DirichletBC(V, Constant((0, 0, 0)), clamped_boundary)
# Define strain and stress
def epsilon(u):
    return 0.5*(nabla_grad(u) + nabla_grad(u).T)
def sigma(u):
    return lambda_*nabla_div(u)*Identity(d) + 2*mu*epsilon(u)
# Define a and L and assemble matrix A
u = TrialFunction(V)
d = u.geometric_dimension() # space dimension
v = TestFunction(V)
Tr = Constant((0, 0, 0))
a = inner(sigma(u), epsilon(v))*dx
L = dot(f, v)*dx + dot(Tr, v)*ds
print("Assembling matrix.")
A = assemble(a)
# Create XDMF files for visualization output
xdmffile_u = XDMFFile(mpi_comm, 'results/displacement.xdmf')
u = Function(V)
# Time-stepping (T, dt, t, and i shown with placeholder values here)
T = 1.0
dt = 0.01
t = 0.0
i = 0
print("Starting time loop.")
while t <= T:
    f.t = t              # Update forcing function
    b = assemble(L)      # Reassemble the right-hand side vector
    bc.apply(A, b)       # Apply BCs to the updated system
    # Compute solution
    solve(A, u.vector(), b, "bicgstab")
    # Save solution to file (XDMF/HDF5)
    u.rename("u", "displacement")
    xdmffile_u.write(u, t)
    # Update time trackers
    i += 1
    t += dt
    if mpi_rank == 0:
        print("We just solved for time t=", t - dt)
SLURM submission script:
#!/bin/bash
#SBATCH -p debug_queue
#SBATCH -N 2
#SBATCH --ntasks-per-node=44
#SBATCH --time=0-04:00:00
#SBATCH --job-name=fenics
module load anaconda/2019.10
conda init bash
source ~/.bashrc
conda activate fenicsproject
mpirun -n 88 python parallel.py
What seff is returning:
State: FAILED (exit code 137)
Nodes: 2
Cores per node: 44
CPU Utilized: 04:07:45
CPU Efficiency: 81.21% of 05:05:04 core-walltime
Job Wall-clock time: 00:03:28
Memory Utilized: 494.49 GB
Memory Efficiency: 49.16% of 1005.84 GB
Edits: added a link and the seff output box.