Reaction-diffusion equation: error when calling dolfinx.fem.petsc.LinearProblem

Hi all,

I’m trying to solve a 2D reaction-diffusion equation using a DOLFINx Singularity container built from the image pulled from docker://dolfinx/dolfinx:stable. I get a long error message from the line where I call dolfinx.fem.petsc.LinearProblem. My guess is that there is a mistake in the way I define the weak form.

After time discretization, the (simplified) problem for the unknown \theta has the following weak form:

$$\int_D\left[\frac{1}{h}(\theta-\theta_0)\,u+\nabla\theta\cdot\nabla u-g\,u\right]\mathrm{d}A=0\qquad\forall u,$$

where g is a known expression in terms of the functions \theta_0 and \alpha_0, namely,

$$g = \exp\!\left(-\frac{1}{\theta_0+273.15}\right)(1-\alpha_0)\,\alpha_0.$$

The functions \theta_0 and \alpha_0 are known from the previous time step, and homogeneous (zero) Neumann boundary conditions are assumed on the boundary of the unit square D.

Here is my MWE:

from mpi4py import MPI
import dolfinx, dolfinx.fem.petsc, basix.ufl, ufl

mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 96, 96, dolfinx.mesh.CellType.triangle)
dx = ufl.Measure("dx", mesh)
P1 = basix.ufl.element("Lagrange", mesh.basix_cell(), 1)
V = dolfinx.fem.functionspace(mesh, P1)
theta = ufl.TrialFunction(V)
u  = ufl.TestFunction(V)
theta0, alpha0 = dolfinx.fem.Function(V), dolfinx.fem.Function(V)  # fields known from the previous time step

g = ufl.exp(-1.0/(theta0+273.15))*alpha0*(1-alpha0)  # reaction term g(theta0, alpha0)
h = 10.1  # time-step size
bcs = []  # only zero Neumann BCs, so no Dirichlet conditions
F = ( (theta-theta0)/h*u + ufl.dot(ufl.grad(theta),ufl.grad(u)) - g*u) * dx  # residual; split into lhs/rhs below
problem = dolfinx.fem.petsc.LinearProblem(ufl.lhs(F), ufl.rhs(F), bcs=bcs, petsc_options={"ksp_type": "preonly", "pc_type": "lu", "pc_factor_mat_solver_type": "mumps"})
print('Done!')

Any idea of what I’m missing?

Thanks a lot for your help!

I can’t reproduce any error message with your code.
Please provide the full error message you are getting.

Thank you for your prompt reply, Dr. Dokken. The script itself does run fine, so the error must be caused by other factors. This is the error message I get:

Traceback (most recent call last):
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
    subprocess.check_call(cmd, env=_inject_macos_ver(env))
  File "/usr/lib/python3.12/subprocess.py", line 408, in check_call
    retcode = call(*popenargs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.12/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/unixccompiler.py", line 200, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/ccompiler.py", line 1045, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/spawn.py", line 72, in spawn
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 48, in _build
    dist.run_command('build_ext')
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/dist.py", line 950, in run_command
    super().run_command(command)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
    cmd_obj.run()
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 98, in run
    _build_ext.run(self)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
    self.build_extensions()
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 476, in build_extensions
    self._build_extensions_serial()
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 502, in _build_extensions_serial
    self.build_extension(ext)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 263, in build_extension
    _build_ext.build_extension(self, ext)
  File "/dolfinx-env/lib/python3.12/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
    super(build_ext, self).build_extension(ext)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 557, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/ccompiler.py", line 606, in compile
    self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
  File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/unixccompiler.py", line 202, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/DOLFINx/script.py", line 164, in <module>
    problem = LinearProblem(lhs(F), rhs(F), bcs=bcs,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/petsc.py", line 789, in __init__
    self._L = _create_form(
              ^^^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 337, in form
    return _create_form(form)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 331, in _create_form
    return _form(form)
           ^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 254, in _form
    ufcx_form, module, code = jit.ffcx_jit(
                              ^^^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/jit.py", line 62, in mpi_jit
    return local_jit(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/jit.py", line 212, in ffcx_jit
    r = ffcx.codegeneration.jit.compile_forms([ufl_object], options=p_ffcx, **p_jit)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 225, in compile_forms
    raise e
  File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 205, in compile_forms
    impl = _compile_objects(
           ^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 380, in _compile_objects
    ffibuilder.compile(tmpdir=cache_dir, verbose=True, debug=cffi_debug)
  File "/dolfinx-env/lib/python3.12/site-packages/cffi/api.py", line 727, in compile
    return recompile(self, module_name, source, tmpdir=tmpdir,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/cffi/recompiler.py", line 1581, in recompile
    outputfilename = ffiplatform.compile('.', ext,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 20, in compile
    outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 54, in _build
    raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.VerificationError: CompileError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory
Exception ignored in: <function LinearProblem.__del__ at 0x1555440e84a0>
Traceback (most recent call last):
  File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/petsc.py", line 831, in __del__
    self._solver.destroy()
    ^^^^^^^^^^^^
AttributeError: 'LinearProblem' object has no attribute '_solver'

As I mentioned, I run the script with Singularity on a cluster managed by Slurm. I’ve been trying different things on the login node, and it looks like the error message is related to the MPI modules that I load in order to run the script in parallel.

Let’s assume the script runs fine, either with the simple command singularity exec $HOME/images/dolfinx.sif python3 -u script.py, or in parallel with mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py. Then, if I change something in the weak form, I get the error message reported above. For instance, this happens if I change the value of a parameter, say h from h = 10.1 to h = 0.1, or the order of the terms in the F expression. These small changes do not alter the mathematical problem, so I would not expect them to cause an error. Moreover, in the MWE no problem is actually solved; I am simply constructing a dolfinx.fem.petsc.LinearProblem.

The way I’ve been able to work around the problem is by unloading/loading modules. For example, I noticed that the following procedure always works:

Step 0. I get the error when I run
$ module load gpu/0.17.3b intel/19.1.3.304/vecir2b intel-mpi/2019.10.317/uwgziob singularitypro/3.11
$ mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py
on the login node.

Step 1. I run
$ module purge
$ module load singularitypro/3.11
$ singularity exec $HOME/images/dolfinx.sif python3 -u script.py
and the script is executed with no error message.

Step 2. I run
$ module purge
$ module load slurm gpu/0.17.3b intel/19.1.3.304/vecir2b intel-mpi/2019.10.317/uwgziob singularitypro/3.11
$ mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py
(similar to Step 0) and the script is executed with no error message.

At that point I’m also able to submit a batch job containing a command as in Step 2 (a sketch of such a script is given below). Note that the number of processes does not matter: replacing mpirun -n 2 with mpirun -n 1 or any other count gives the same behavior.
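For completeness, this is roughly what such a batch script looks like on my side. It is only a sketch: the job name, wall time and partition are placeholders, and the module lines are the ones from Step 2.

#!/bin/bash
#SBATCH --job-name=dolfinx-rd        # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks=2                   # matches mpirun -n 2
#SBATCH --time=00:30:00              # placeholder wall-time limit
#SBATCH --partition=compute          # placeholder: use your cluster's partition name

# same module environment as in Step 2
module purge
module load slurm gpu/0.17.3b intel/19.1.3.304/vecir2b intel-mpi/2019.10.317/uwgziob singularitypro/3.11

# run the containerized script in parallel
mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py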

I haven’t been able to really pinpoint the source of the problem, mainly because of my limited experience with HPC and MPI. Any clue as to what is happening?

This seems like a compiler mismatch issue when generating code for new variational forms.
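Note that FFCx caches the generated C code (by default, if I remember correctly, under ~/.cache/fenics), so an unchanged form is picked up from the cache and never recompiled; only a modified form triggers a fresh JIT compilation, which is why the error only appears after you edit the weak form. To reproduce it deterministically, you can clear that cache first, e.g.

$ rm -rf ~/.cache/fenics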

This seems to point to the dolfinx-env within the docker/singularity container.

This, for some reason, looks for icc within a Spack installation.

Mixing Spack and docker/singularity environments might make it very hard to run your code.

As you have seen from your tests, you require a correct installation of intel-mpi and the matching compilers.
In general, one should try to use the same gcc/MPI implementation on the HPC system as the one used in the docker/singularity image.
The docker image uses MPICH, ref:

mpirun --version
HYDRA build details:
    Version:                                 4.2.2
    Release Date:                            Wed Jul  3 09:16:22 AM CDT 2024
    CC:                              gcc      
    Configure options:                       '--disable-option-checking' '--prefix=NONE' '--with-hwloc=embedded' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=' 'CPPFLAGS= -DNETMOD_INLINE=__netmod_inline_ofi__ -I/tmp/mpich-4.2.2/src/mpl/include -I/tmp/mpich-4.2.2/modules/json-c -I/tmp/mpich-4.2.2/modules/hwloc/include -D_REENTRANT -I/tmp/mpich-4.2.2/src/mpi/romio/include -I/tmp/mpich-4.2.2/src/pmi/include -I/tmp/mpich-4.2.2/modules/yaksa/src/frontend/include -I/tmp/mpich-4.2.2/modules/libfabric/include'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

which you should then also use on your HPC system.
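As a quick check (a sketch, assuming a standard shell is available inside the image), you can compare the MPI stack and the C compiler picked up on the host and inside the container:

$ mpirun --version                                                # MPI on the host, after module load
$ singularity exec $HOME/images/dolfinx.sif mpirun --version      # MPI inside the container
$ singularity exec $HOME/images/dolfinx.sif bash -c 'echo "CC=$CC"; which cc gcc icc'

If CC inside the container points at the host's icc (as in your traceback), that would suggest the host environment is leaking into the container.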