Error in FEniCS installation in a Singularity container

I am trying to build a Singularity container with FEniCS installed, specifically for deployment on an HPC cluster here. Although Docker images for FEniCS already exist, I wanted to configure the container with several other programs that I need in addition to FEniCS.

The complete recipe file (the equivalent of a Dockerfile) is here.

I am running the same commands as I would on an Ubuntu 18.04 system (I have already installed and tested FEniCS with these commands and it works fine):

    add-apt-repository ppa:fenics-packages/fenics
    apt-get -y update
    apt-get -y install --no-install-recommends fenics

but when I try to import DOLFIN I get the following error:

The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:

  plm_rsh_agent: ssh : rsh

Please either unset the parameter, or check that the path is correct
--------------------------------------------------------------------------
[golubh1.campuscluster.illinois.edu:45463] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
[golubh1.campuscluster.illinois.edu:45463] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value Unable to start a daemon on the local node (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Unable to start a daemon on the local node" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)

Any pointers to what could potentially be going wrong, and possible fixes?

Fixed by directly importing one of the Docker images here.
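In practice that just means pulling an existing Docker image straight into a Singularity image, e.g. (using the quay.io image that also comes up later in this thread):

    singularity pull --name fenics.simg docker://quay.io/fenicsproject/dev:latest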


The link you provided no longer works.

Thanks for pointing it out. Singularity Hub has been archived. You can follow the build recipe from there and modify it to pull DOLFIN instead of DOLFINx. Something along these lines should work:

Bootstrap: docker
From: bhaveshshrimali/dolfin_superlu

%post
    # extra system tools and Python packages on top of the DOLFIN base image
    apt-get -y update
    apt-get -y install software-properties-common ffmpeg curl wget build-essential
    pip3 install --force-reinstall matplotlib pandas pyamg ffmpeg-python openpyxl vedo
    ldconfig

%runscript
    # drop into an interactive shell when the container is run
    exec /bin/bash -i
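To build an image from this definition file (assuming it is saved as, say, fenics.def):

    sudo singularity build fenics.simg fenics.def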

Or, if you don’t need the additional packages, simply pulling DOLFIN’s Docker image should also work:

singularity pull --name fenics.simg docker://quay.io/fenicsproject/dev:latest

I followed the command (singularity pull --name fenics.simg docker://quay.io/fenicsproject/dev:latest) to install FEniCS on an HPC cluster. But when I tried to run a test, I got the following error message:

    ERROR: could not import mpi4py!
    Traceback (most recent call last):
      File "heat_class.py", line 2, in <module>
        from fenics import *
      File "/usr/local/lib/python3.6/dist-packages/fenics/__init__.py", line 7, in <module>
        from dolfin import *
      File "/usr/local/lib/python3.6/dist-packages/dolfin/__init__.py", line 144, in <module>
        from .fem.assembling import (assemble, assemble_system, assemble_multimesh, assemble_mixed,
      File "/usr/local/lib/python3.6/dist-packages/dolfin/fem/assembling.py", line 34, in <module>
        from dolfin.fem.form import Form
      File "/usr/local/lib/python3.6/dist-packages/dolfin/fem/form.py", line 12, in <module>
        from dolfin.jit.jit import dolfin_pc, ffc_jit
      File "/usr/local/lib/python3.6/dist-packages/dolfin/jit/jit.py", line 121, in <module>
        def compile_class(cpp_data, mpi_comm=MPI.comm_world):
    RuntimeError: Error when importing mpi4py
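As a narrower test, the failing import can be reproduced on its own (image name taken from the pull command above):

    singularity exec fenics.simg python3 -c "from mpi4py import MPI"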

Do you have any suggestion on how to fix this?
Thanks!

The image on quay.io is a bit outdated, e.g. it ships an old Python version.

You may want to use one of the following instead:

    ghcr.io/scientificcomputing/fenics
    numericalpdes/base_images:fenics

I haven’t tested them with Singularity, but I expect them to work.


Thanks for the quick response! I tried the following command:
singularity pull --name fenics.simg docker://ghcr.io/scientificcomputing/fenics

I got the following error:
FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://ghcr.io/scientificcomputing/fenics: Error reading manifest latest in ghcr.io/scientificcomputing/fenics: manifest unknown.

Then I tried the second one, with singularity pull --name fenics.simg docker://numericalpdes/base_images:fenics. The error message says:

FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: Error initializing source oci:/home/dli292/.singularity/cache/blob:02e639b5e8e21bafdbdf6684b83ed8f924f2e32d1445eb1e77105d0771a4649b: Error writing blob: write /home/dli292/.singularity/cache/blob/oci-put-blob469916469: disk quota exceeded.

Could you try to use: ghcr.io/scientificcomputing/fenics:2023-11-15?


FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://ghcr.io/scientificcomputing/fenics: Error reading manifest latest in ghcr.io/scientificcomputing/fenics: manifest unknown.

Maybe docker:// is expecting to pull the image from Docker Hub rather than the GitHub container registry? Have a look at the Singularity documentation to check if that is the case, and how to change the container registry.
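For reference, pinning the explicit tag suggested above sidesteps the missing latest manifest:

    singularity pull --name fenics.simg docker://ghcr.io/scientificcomputing/fenics:2023-11-15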

FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: Error initializing source oci:/home/dli292/.singularity/cache/blob:02e639b5e8e21bafdbdf6684b83ed8f924f2e32d1445eb1e77105d0771a4649b: Error writing blob: write /home/dli292/.singularity/cache/blob/oci-put-blob469916469: disk quota exceeded.

That is clearly unrelated: you have run out of disk space in your home directory. Either clean up old attempts in the Singularity cache, or ask your admin for more space.
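If it is the cache, something along these lines should free up space (the scratch path is just an example):

    # remove cached layers and downloaded images
    singularity cache clean
    # or relocate the cache to a larger filesystem before pulling
    export SINGULARITY_CACHEDIR=/scratch/$USER/singularity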


Thanks for replying. Yes, this indeed fixed the manifest unknown issue. When I tried to run a test, I still got the error “could not import mpi4py!”.

Thank you for the response. I tried what Dokken suggested and that fixed the first issue, but I still got the original mpi4py issue when I ran a test. Then I tried the second image, numericalpdes/base_images:fenics, which got rid of the mpi4py issue. When I tried to run a test, the issue now looks like this:

I am afraid I will not be able to help much further, because I don’t know how Singularity works and I have never used it. Still, looking at the picture it seems to me that MPI is getting loaded from the existing environment under /mnt/gpfs3_amd/... rather than from the Docker image.
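If that is what is happening, it may be worth launching the container with a clean environment so that host MPI modules cannot leak in (just a guess on my part; --cleanenv is a standard Singularity flag):

    # start a shell without inheriting the host environment variables
    singularity shell --cleanenv fenics.simg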


Thank you so much. You already helped a lot! I will reach out to the HPC support center for further help.

Hi,
When I last checked, the image bhaveshshrimali/dolfin_superlu was working. Could you check by pulling it?

singularity pull --name fenics.simg docker://bhaveshshrimali/dolfin_superlu:latest


Yes, it works now! Thank you so much!

Hi,
Do you have an image for DOLFINx? Thanks.

For DOLFINx, you should try the standard images:
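For example, pulling the stable image from Docker Hub:

    singularity pull dolfinx.sif docker://dolfinx/dolfinx:stable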

I’m running into another issue using Singularity containers. Running the commands:

singularity pull dolfinx.sif docker://dolfinx/dolfinx:stable
singularity shell dolfinx.sif
ipython3
from dolfinx import mesh as dmesh, fem
from mpi4py import MPI
mesh = dmesh.create_unit_cube(MPI.COMM_WORLD, 2, 2, 2)
V = fem.FunctionSpace(mesh, ("CG", 1))

I get the error:

VerificationError: CompileError: command 'icc' failed: No such file or directory

It’s unclear to me how I fix the error. Any help is appreciated. Thanks.

icc is the Intel C compiler. The error is saying that it needs that compiler to be installed in your Singularity environment.

The Singularity container set the environment variable $CC to icc. The command "export CC=gcc" fixed the problem. Thanks.
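For completeness, the same override can be applied when launching the container, assuming a Singularity version recent enough to have the --env flag:

    # set CC for the whole session instead of exporting it inside the shell
    singularity shell --env CC=gcc dolfinx.sif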