I’m using FEniCS (2019.1.0) installed through Anaconda (2019.10), and I run my case with the mpirun that comes with the conda environment. It runs fine on my local machine, and it will run on the cluster my lab uses (which uses the SLURM scheduler). I also never ran into the issues described here or here. The problem is that the job runs on one node even when I request more than one (currently I’m just trying to use two). The reason I think this is that seff [jobid] shows the job being killed for running out of memory while only about half of the allocated memory was used, which, with two nodes allocated, is roughly one node’s worth of RAM.
Is there something special I need to be doing, or am I maybe just not searching for the right thing?
Thanks.
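For reference, here is a small diagnostic I could add near the top of parallel.py (assuming mpi4py and the Python standard library are available in the fenicsproject environment) to check how the ranks are actually distributed across nodes:
# Diagnostic sketch: print each rank's hostname and, on rank 0, how many
# distinct nodes the ranks actually landed on (assumes mpi4py is installed)
from mpi4py import MPI as pyMPI
import socket

comm = pyMPI.COMM_WORLD
hostname = socket.gethostname()
print("rank %d of %d running on %s" % (comm.rank, comm.size, hostname))

# Gather every rank's hostname on rank 0 and count the unique ones
hostnames = comm.gather(hostname, root=0)
if comm.rank == 0:
    print("ranks are spread over %d distinct node(s)" % len(set(hostnames)))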
FEniCS application:
from __future__ import print_function
# Environment setup
import os
# FEniCS imports
from fenics import *
from ufl import nabla_div
from dolfin import *
#from mshr import *
import matplotlib.pyplot as plt
import numpy
# Define communicator and rank for this process, in case we are running with MPI
mpi_comm = MPI.comm_world
mpi_rank = mpi_comm.Get_rank()
set_log_level(LogLevel.INFO)
# Variables
mu = 1 # Lame coefficient
rho = 1
beta = 1
lambda_ = beta # Lame coefficient
tol = 1E-14 # Boundary condition cutoff
# Read the mesh
mesh = Mesh(mpi_comm)
with XDMFFile(mpi_comm, "mesh.xdmf") as infile:
    infile.read(mesh)
# Define a mesh value collection of dim = 2 and read the boundary (facet) markers
mvc = MeshValueCollection("size_t", mesh, 2)
with XDMFFile("mf.xdmf") as infile:
infile.read(mvc, "name_to_read")
mf = cpp.mesh.MeshFunctionSizet(mesh, mvc)
# Define function space
V = VectorFunctionSpace(mesh, 'P', 1)
# Define the forcing function
el = V.ufl_element()
f = Expression(("0.", "0.", "500*t*x[0]*x[1]*x[2]"), t=0, element = el)
# Define boundary condition
def clamped_boundary(x, on_boundary):
    return on_boundary and x[2] < tol
bc = DirichletBC(V, Constant((0, 0, 0)), clamped_boundary)
# Define strain and stress
def epsilon(u):
    return 0.5*(nabla_grad(u) + nabla_grad(u).T)
def sigma(u):
    return lambda_*nabla_div(u)*Identity(d) + 2*mu*epsilon(u)
# Define a and L and assemble matrix A
u = TrialFunction(V)
d = u.geometric_dimension() # space dimension
v = TestFunction(V)
Tr = Constant((0, 0, 0))
a = inner(sigma(u), epsilon(v))*dx
L = dot(f, v)*dx + dot(Tr, v)*ds
print("Assembling matrix.")
A = assemble(a)
# Create XDMF files for visualization output
xdmffile_u = XDMFFile(mpi_comm, 'results/displacement.xdmf')
u = Function(V)
# Time-stepping (T, dt, t, and i shown with placeholder values here)
T = 1.0
dt = 0.01
t = 0.0
i = 0
print("Starting time loop.")
while t <= T:
    f.t = t              # Update forcing function
    b = assemble(L)      # Reassemble the right-hand side vector
    bc.apply(A, b)       # Apply BCs to the updated system
    # Compute solution
    solve(A, u.vector(), b, "bicgstab")
    # Save solution to file (XDMF/HDF5)
    u.rename("u", "displacement")
    xdmffile_u.write(u, t)
    # Update time trackers
    i += 1
    t += dt
    if mpi_rank == 0:
        print("We just solved for time t=", t - dt)
SLURM submission script:
#!/bin/bash
#SBATCH -p debug_queue
#SBATCH -N 2
#SBATCH --ntasks-per-node=44
#SBATCH --time=0-04:00:00
#SBATCH --job-name=fenics
module load anaconda/2019.10
conda init bash
source ~/.bashrc
conda activate fenicsproject
mpirun -n 88 python parallel.py
What seff is returning:
State: FAILED (exit code 137)
Nodes: 2
Cores per node: 44
CPU Utilized: 04:07:45
CPU Efficiency: 81.21% of 05:05:04 core-walltime
Job Wall-clock time: 00:03:28
Memory Utilized: 494.49 GB
Memory Efficiency: 49.16% of 1005.84 GB
Edits: added a link and the seff output box.