Thank you for your prompt reply, Dr. Dokken. The script itself does run fine, so the error message reported below must be caused by other factors:
Traceback (most recent call last):
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
subprocess.check_call(cmd, env=_inject_macos_ver(env))
File "/usr/lib/python3.12/subprocess.py", line 408, in check_call
retcode = call(*popenargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 389, in call
with Popen(*popenargs, **kwargs) as p:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.12/subprocess.py", line 1955, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/unixccompiler.py", line 200, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/ccompiler.py", line 1045, in spawn
spawn(cmd, dry_run=self.dry_run, **kwargs)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/spawn.py", line 72, in spawn
raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 48, in _build
dist.run_command('build_ext')
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/dist.py", line 950, in run_command
super().run_command(command)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
cmd_obj.run()
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 98, in run
_build_ext.run(self)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 476, in build_extensions
self._build_extensions_serial()
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 502, in _build_extensions_serial
self.build_extension(ext)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 263, in build_extension
_build_ext.build_extension(self, ext)
File "/dolfinx-env/lib/python3.12/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
super(build_ext, self).build_extension(ext)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 557, in build_extension
objects = self.compiler.compile(
^^^^^^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/ccompiler.py", line 606, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/dolfinx-env/lib/python3.12/site-packages/setuptools/_distutils/unixccompiler.py", line 202, in _compile
raise CompileError(msg)
distutils.errors.CompileError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/DOLFINx/script.py", line 164, in <module>
problem = LinearProblem(lhs(F), rhs(F), bcs=bcs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/petsc.py", line 789, in __init__
self._L = _create_form(
^^^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 337, in form
return _create_form(form)
^^^^^^^^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 331, in _create_form
return _form(form)
^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/forms.py", line 254, in _form
ufcx_form, module, code = jit.ffcx_jit(
^^^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/jit.py", line 62, in mpi_jit
return local_jit(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/jit.py", line 212, in ffcx_jit
r = ffcx.codegeneration.jit.compile_forms([ufl_object], options=p_ffcx, **p_jit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 225, in compile_forms
raise e
File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 205, in compile_forms
impl = _compile_objects(
^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/ffcx/codegeneration/jit.py", line 380, in _compile_objects
ffibuilder.compile(tmpdir=cache_dir, verbose=True, debug=cffi_debug)
File "/dolfinx-env/lib/python3.12/site-packages/cffi/api.py", line 727, in compile
return recompile(self, module_name, source, tmpdir=tmpdir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/cffi/recompiler.py", line 1581, in recompile
outputfilename = ffiplatform.compile('.', ext,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 20, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dolfinx-env/lib/python3.12/site-packages/cffi/ffiplatform.py", line 54, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.VerificationError: CompileError: command '/cm/shared/apps/spack/0.17.3/gpu/b/opt/spack/linux-rocky8-skylake_avx512/gcc-8.5.0/intel-19.1.3.304-vecir2bnonslbnjniwcxx6n5vfyeg4yf/bin/icc' failed: No such file or directory
Exception ignored in: <function LinearProblem.__del__ at 0x1555440e84a0>
Traceback (most recent call last):
File "/usr/local/dolfinx-real/lib/python3.12/dist-packages/dolfinx/fem/petsc.py", line 831, in __del__
self._solver.destroy()
^^^^^^^^^^^^
AttributeError: 'LinearProblem' object has no attribute '_solver'
As I mentioned, I run the script with Singularity on a cluster managed by Slurm. I’ve been trying different things on the login node, and it looks like the error message is related to the MPI modules that I load in order to run the script in parallel.
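In case it helps, here is a small check I can run inside the container to see which C compiler the JIT machinery would pick up. This is only a diagnostic sketch, assuming (perhaps wrongly) that a CC environment variable inherited from the host modules is what points the build at the Intel icc shown in the traceback:

```python
# Diagnostic sketch (run inside the container): print which C compiler
# setuptools/cffi would use for the FFCx JIT. The assumption is that CC,
# inherited from the host "module load" environment, may point at icc.
import os
import sysconfig

print("CC from the environment:", os.environ.get("CC"))
print("CC from sysconfig:      ", sysconfig.get_config_var("CC"))
```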
Let’s assume the script runs fine, either with the simple command
singularity exec $HOME/images/dolfinx.sif python3 -u script.py
or in parallel with
mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py
Then, if I change something in the weak form, I get the error message reported above. For instance, that happens if I change the value of a parameter, say h from h = 10.1 to h = 0.1, or the order of the terms in the F expression. These little changes do not alter the structure of the weak form, so they should not yield any error message. Plus, in the MWE no problem is actually being solved; I’m simply defining a dolfinx.fem.petsc.LinearProblem.
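For context, the structure of my MWE is roughly the sketch below. The mesh, function space, and form are placeholders rather than my actual script.py; only the h parameter and the LinearProblem call mirror what I described above:

```python
# Rough sketch of the MWE structure (placeholder mesh/space/form, not the
# actual script.py). No solve() is called; only the LinearProblem is defined.
from mpi4py import MPI
import ufl
from dolfinx import default_scalar_type, fem, mesh
from dolfinx.fem.petsc import LinearProblem

domain = mesh.create_unit_square(MPI.COMM_WORLD, 8, 8)
V = fem.functionspace(domain, ("Lagrange", 1))
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
f = fem.Constant(domain, default_scalar_type(1.0))

h = 10.1  # changing this to 0.1 is enough to trigger the error above
F = h * ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx - ufl.inner(f, v) * ufl.dx

bcs = []
# Defining the problem already invokes the FFCx JIT, which is where it fails.
problem = LinearProblem(ufl.lhs(F), ufl.rhs(F), bcs=bcs)
```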
The way I’ve been able to work around the problem is by unloading/loading modules. For example, I noticed that the following procedure always works:
Step 0. I get the error when I run
$ module load gpu/0.17.3b intel/19.1.3.304/vecir2b intel-mpi/2019.10.317/uwgziob singularitypro/3.11
$ mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py
on the login node.
Step 1. I run
$ module purge
$ module load singularitypro/3.11
$ singularity exec $HOME/images/dolfinx.sif python3 -u script.py
and the script is executed with no error message.
Step 2. I run
$ module purge
$ module load slurm gpu/0.17.3b intel/19.1.3.304/vecir2b intel-mpi/2019.10.317/uwgziob singularitypro/3.11
$ mpirun -n 2 singularity exec $HOME/images/dolfinx.sif python3 -u script.py
(similar to Step 0, but with the slurm module added) and the script is executed with no error message.
At that point I’m also able to submit a batch job containing a command as in Step 2. Note that the number of processes does not matter: we could replace mpirun -n 2 with mpirun -n 1 or any other number and get the same behavior.
I haven’t been able to really pinpoint the source of the problem, mainly because of my limited experience with HPC and MPI. Any clue as to what is happening?