FEniCS & Singularity - saving data with mpirun

Hi everyone,

I am very new to Singularity. I am trying to build a Singularity container to run FEniCS on HPC clusters. This generally works, but I have problems writing data when running with mpirun -n <NUMBER_OF_RANKS>.

The Singularity recipe I wrote is simply:

Bootstrap: docker
From: quay.io/fenicsproject/stable:current
%post
    apt-get -y update
    apt-get -y install python3 python3-pip
    python3 -m pip install --force-reinstall numpy 

I also tried to build a singularity image from https://github.com/bhaveshshrimali/singularityFEniCS/blob/master/fenics.recipe:
sudo singularity build fenics.sif fenics.recipe

This works just fine:
singularity exec fenics.sif python3 demo_cahn-hilliard.py
But this produces the following error:
mpirun -n 4 singularity exec fenics.sif python3 demo_cahn-hilliard.py

HDF5-DIAG: Error detected in HDF5 (1.10.4) MPI-process 0:
  #000: ../../../src/H5F.c line 444 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5Fint.c line 1364 in H5F__create(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: ../../../src/H5Fint.c line 1615 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #003: ../../../src/H5FD.c line 1640 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #004: ../../../src/H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
(The same HDF5 error block is repeated two more times.)

The following is a failing example, adapted from the demo_cahn-hilliard.py demo from FEniCS. If I leave out the data-writing part, this demo works with mpirun:

import random
from dolfin import *

# Class representing the initial conditions
class InitialConditions(UserExpression):
    def __init__(self, **kwargs):
        random.seed(2 + MPI.rank(MPI.comm_world))
        super().__init__(**kwargs)
    def eval(self, values, x):
        values[0] = 0.63 + 0.02*(0.5 - random.random())
        values[1] = 0.0
    def value_shape(self):
        return (2,)

# Class for interfacing with the Newton solver
class CahnHilliardEquation(NonlinearProblem):
    def __init__(self, a, L):
        NonlinearProblem.__init__(self)
        self.L = L
        self.a = a
    def F(self, b, x):
        assemble(self.L, tensor=b)
    def J(self, A, x):
        assemble(self.a, tensor=A)

# Model parameters
lmbda  = 1.0e-02  # surface parameter
dt     = 5.0e-06  # time step
theta  = 0.5      # time stepping family, e.g. theta=1 -> backward Euler, theta=0.5 -> Crank-Nicolson

# Form compiler options
parameters["form_compiler"]["optimize"]     = True
parameters["form_compiler"]["cpp_optimize"] = True

# Create mesh and build function space
mesh = UnitSquareMesh.create(96, 96, CellType.Type.quadrilateral)
P1 = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
ME = FunctionSpace(mesh, P1*P1)

# Define trial and test functions
du    = TrialFunction(ME)
q, v  = TestFunctions(ME)

# Define functions
u   = Function(ME)  # current solution
u0  = Function(ME)  # solution from previous converged step

# Split mixed functions
dc, dmu = split(du)
c,  mu  = split(u)
c0, mu0 = split(u0)

# Create initial conditions and interpolate
u_init = InitialConditions(degree=1)
u.interpolate(u_init)
u0.interpolate(u_init)

# Compute the chemical potential df/dc
c = variable(c)
f    = 100*c**2*(1-c)**2
dfdc = diff(f, c)

# mu_(n+theta)
mu_mid = (1.0-theta)*mu0 + theta*mu

# Weak statement of the equations
L0 = c*q*dx - c0*q*dx + dt*dot(grad(mu_mid), grad(q))*dx
L1 = mu*v*dx - dfdc*v*dx - lmbda*dot(grad(c), grad(v))*dx
L = L0 + L1

# Compute directional derivative about u in the direction of du (Jacobian)
a = derivative(L, u, du)

# Create nonlinear problem and Newton solver
problem = CahnHilliardEquation(a, L)
solver = NewtonSolver()
solver.parameters["linear_solver"] = "lu"
solver.parameters["convergence_criterion"] = "incremental"
solver.parameters["relative_tolerance"] = 1e-6

# Output file
file = File("output/output.pvd", "compressed")
HDF5 = HDF5File(MPI.comm_world,"output/output.hdf5",'w')
if MPI.rank(MPI.comm_world) == 0:
    vtkfile_phi = File("output/output.pvd", "compressed")

# Step in time
t = 0.0
T = 5*dt
while (t < T):
    t += dt
    u0.vector()[:] = u.vector()
    solver.solve(problem, u.vector())
    file << (u.split()[0], t)
    HDF5.write(u.split()[0], "fun",t)
    if MPI.rank(MPI.comm_world) == 0:
        file << (u.split()[0], t)

Does anyone know what I should do?
Thanks

Hi,
I cannot reproduce this in a docker container (built from Docker Hub) off of which the above singularity image is also built. It runs fine.

I will run this on singularity too just in case, but my guess is that it won’t be different. Do you have write access to the folder you are trying to write this file to?

Also, the recommended way to save files when writing in parallel with multiple processes is to use the XDMF format and the corresponding XDMFFile class, namely:

file = XDMFFile(comm, "trial.xdmf")
file.parameters["flush_output"] = True
file.parameters["functions_share_mesh"] = True
file.parameters["rewrite_function_mesh"] = False

The above should also allow you to write multiple functions to the same file for visualization.
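For instance, a rough sketch of what the output part of your script could look like (u and t are assumed to be the mixed solution and the time from your loop above; the names are just placeholders):

from dolfin import *

comm = MPI.comm_world
xdmf = XDMFFile(comm, "output/output.xdmf")
xdmf.parameters["flush_output"] = True
xdmf.parameters["functions_share_mesh"] = True
xdmf.parameters["rewrite_function_mesh"] = False

# inside the time loop, after solver.solve(...):
c_out, mu_out = u.split()   # the two components of the mixed solution
c_out.rename("c", "c")      # give each function a name for visualization
mu_out.rename("mu", "mu")
xdmf.write(c_out, t)        # every rank calls write; no MPI.rank guard needed
xdmf.write(mu_out, t)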

The error is most likely due to the fact that the folder “output” does not exist in the singularity container.
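If that is the case, one option (just a sketch, untested inside your image) is to create the directory from the script itself, on one rank only, before any file is opened:

import os
from dolfin import MPI

comm = MPI.comm_world
if MPI.rank(comm) == 0:
    os.makedirs("output", exist_ok=True)  # create the folder once, on rank 0
MPI.barrier(comm)  # wait until it exists before the other ranks open files there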


I will check whether I get the same error within a Docker container - in some other test I did not have problems with Docker, but I have not tried this particular case. For the cluster I am currently using, I cannot use Docker containers.

Yes, I have write access to the folder I am saving my file to - writing works without problems without mpirun.

I tried adding the output folder to the Singularity container, but I still get the same error:

Bootstrap: docker
From: quay.io/fenicsproject/stable:current
%files
    output 
%post
    apt-get -y update
    apt-get -y install python3 python3-pip
    python3 -m pip install --force-reinstall numpy

I had no issue running the code after removing your two MPI.rank if tests (and saving from every process), using:

Bootstrap: docker
From: quay.io/fenicsproject/stable:current
%files
    output 
%post
    apt-get -y update
    apt-get -y install python3 python3-pip
    python3 -m pip install --force-reinstall numpy
and then:

mkdir -p output
sudo singularity build dolfin.simg dolfin_singularity
mpirun -n 4 singularity exec dolfin.simg python3 demo.py

where demo.py is:

import random
from dolfin import *

# Class representing the initial conditions
class InitialConditions(UserExpression):
    def __init__(self, **kwargs):
        random.seed(2 + MPI.rank(MPI.comm_world))
        super().__init__(**kwargs)
    def eval(self, values, x):
        values[0] = 0.63 + 0.02*(0.5 - random.random())
        values[1] = 0.0
    def value_shape(self):
        return (2,)

# Class for interfacing with the Newton solver
class CahnHilliardEquation(NonlinearProblem):
    def __init__(self, a, L):
        NonlinearProblem.__init__(self)
        self.L = L
        self.a = a
    def F(self, b, x):
        assemble(self.L, tensor=b)
    def J(self, A, x):
        assemble(self.a, tensor=A)

# Model parameters
lmbda  = 1.0e-02  # surface parameter
dt     = 5.0e-06  # time step
theta  = 0.5      # time stepping family, e.g. theta=1 -> backward Euler, theta=0.5 -> Crank-Nicolson

# Form compiler options
parameters["form_compiler"]["optimize"]     = True
parameters["form_compiler"]["cpp_optimize"] = True

# Create mesh and build function space
mesh = UnitSquareMesh.create(96, 96, CellType.Type.quadrilateral)
P1 = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
ME = FunctionSpace(mesh, P1*P1)

# Define trial and test functions
du    = TrialFunction(ME)
q, v  = TestFunctions(ME)
# Define functions
u   = Function(ME)  # current solution
u0  = Function(ME)  # solution from previous converged step

# Split mixed functions
dc, dmu = split(du)
c,  mu  = split(u)
c0, mu0 = split(u0)

# Create initial conditions and interpolate
u_init = InitialConditions(degree=1)
u.interpolate(u_init)
u0.interpolate(u_init)

# Compute the chemical potential df/dc
c = variable(c)
f    = 100*c**2*(1-c)**2
dfdc = diff(f, c)

# mu_(n+theta)
mu_mid = (1.0-theta)*mu0 + theta*mu

# Weak statement of the equations
L0 = c*q*dx - c0*q*dx + dt*dot(grad(mu_mid), grad(q))*dx
L1 = mu*v*dx - dfdc*v*dx - lmbda*dot(grad(c), grad(v))*dx
L = L0 + L1

# Compute directional derivative about u in the direction of du (Jacobian)
a = derivative(L, u, du)

# Create nonlinear problem and Newton solver
problem = CahnHilliardEquation(a, L)
solver = NewtonSolver()
solver.parameters["linear_solver"] = "lu"
solver.parameters["convergence_criterion"] = "incremental"
solver.parameters["relative_tolerance"] = 1e-6

# Output file
file = File("output/output.pvd", "compressed")
HDF5 = HDF5File(MPI.comm_world,"output/output.hdf5",'w')
vtkfile_phi = File("output/output.pvd", "compressed")

# Step in time
t = 0.0
T = 5*dt
while (t < T):
    t += dt
    u0.vector()[:] = u.vector()
    solver.solve(problem, u.vector())
    file << (u.split()[0], t)
    HDF5.write(u.split()[0], "fun",t)
    file << (u.split()[0], t)

@dokken I tried exactly that, but I still get the same error message; I do not understand it. I tried on the cluster and on my own computer.

But running mpirun inside the container worked, so I am using that as a workaround:
singularity exec fenics.sif mpirun -n 4 python3 demo_cahn-hilliard.py

I tried to get it working, but I still get the same error. As soon as I start more than one process, access to the HDF5File is not possible; even creating the file results in an error. Going through different forums, I found the suggestion that setting the following environment variables might help, but it did not work:
OMP_NUM_THREADS=1
HDF5_USE_FILE_LOCKING=FALSE
MKL_NUM_THREADS=1
NUMEXPR_NUM_THREADS=1
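For completeness, one way to set them is directly in the Python script, before dolfin is imported, so that they are in place before the HDF5 library touches any file; this did not change anything for me either:

import os

# must be set before dolfin (and hence HDF5) is loaded
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

from dolfin import *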
If anyone has any suggestions, I would be very happy.

EDIT:

mpirun -n 4 singularity exec dolfin.simg python3 demo.py

is not working. Running mpirun inside the Singularity container works, but that approach does not work for job submission on the cluster.

Unless I am missing something obvious, it is likely to be an issue specific to the cluster where you are trying to run this. I am assuming you are using a job scheduler like PBS/SLURM to submit jobs on your cluster and that you are loading the appropriate modules (an MPI module in this case). Because

mpiexec -n 8 singularity exec <>.simg python3 demo.py

should work out of the box. And given that something like

singularity exec <>.simg python3 demo.py

works fine, you may want to look at the MPI installation on your cluster. You may also want to take a look at Singularity and MPI applications — Singularity container 3.3 documentation alongside seeking help from someone at your cluster’s help desk.

Thanks for your answer.
I had the problem when running MPI outside of the Singularity container on both my own computer and the cluster. On my own computer, adding MPI support inside the container solved the problem, just as shown in: Singularity and MPI applications — Singularity container 3.3 documentation.
As for the cluster: yes, I am using a job scheduler like PBS/SLURM. A simple installation of MPI did not work out, but I am in contact with the cluster’s help desk.
