Hello,
I’m trying to run my code on a cluster (Piz Daint at CSCS) using a Docker container. To use multiple nodes, I need a Docker image with a version of MPI that is ABI-compatible with the one on the cluster. At the moment I’m using an image based on docker:latest and I’m running into trouble.
They suggested that I use MPICH v3.1.4. Is the latest DOLFINx version still compatible with this version of MPICH? If so, where can I find information on building a Docker image with MPICH v3.1.4 and the latest version of DOLFINx?
Thanks
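One way to see which MPI ABI an image actually provides is to look at the soname of the MPI library its binaries link against. As far as I know, MPICH releases from 3.1 onward (and vendor MPIs that follow the MPICH ABI initiative) expose `libmpi.so.12`, while Open MPI uses a different soname. The helper below is a minimal sketch under that assumption; `mpi_abi_family` is a hypothetical name, not part of any tool.

```shell
#!/bin/sh
# Classify an MPI shared-library soname (hypothetical helper).
# Assumption: MPICH-ABI-compatible libraries are named libmpi.so.12[.x.y].
mpi_abi_family() {
  case "$1" in
    libmpi.so.12|libmpi.so.12.*) echo "mpich-abi" ;;
    libmpi.so.*)                 echo "other-mpi" ;;
    *)                           echo "not-mpi" ;;
  esac
}

# Example: classify two sonames.
mpi_abi_family libmpi.so.12
mpi_abi_family libmpi.so.40
```

Inside the container, the sonames in use can be listed with `ldd` on your own MPI program, filtered with `grep libmpi`.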
dokken
August 7, 2023, 4:15pm
2
Hi dokken, thanks for the reply!
I tried to build the test-env image with the following command:
docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=haswell" --build-arg MPICH_VERSION=3.4.3 --tag dolfinx_dev_env_mpich_v3.4.3_for_pizdaint .
but I get an error when compiling ADIOS2:
…
…
15.29 [82/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/ip_config.c.o
15.69 [83/352] Building C object thirdparty/ffs/ffs/CMakeFiles/ffs.dir/cod.tab.c.o
15.77 [84/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/cmfabric.c.o
15.79 [85/352] Linking C shared library lib/libadios2_ffs.so.2.0.0
15.82 [86/352] Creating library symlink lib/libadios2_ffs.so.2 lib/libadios2_ffs.so
15.83 [87/352] Linking C shared module thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so
15.83 FAILED: thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so
15.83 : && /usr/bin/cc -fPIC -w -Wall -O3 -DNDEBUG -shared -o thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/cmfabric.c.o thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/ip_config.c.o -Wl,-rpath,/tmp/build-dir/lib: lib/libadios2_atl.so.2.2.1 -lfabric && :
15.83 /usr/bin/ld: cannot find -lfabric: No such file or directory
15.83 collect2: error: ld returned 1 exit status
15.90 [88/352] Generating cm_interface.c, revp.c, revpath.h
15.90 done
16.02 [89/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmzplenet.dir/cmzplenet.c.o
16.43 [90/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/core/IOHDF5.cpp.o
16.64 [91/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/engine/hdf5/HDF5WriterP.cpp.o
21.96 [92/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/engine/hdf5/HDF5ReaderP.cpp.o
21.96 ninja: build stopped: subcommand failed.
------
Dockerfile.test-env:182
--------------------
181 | # Install ADIOS2 (Python interface in /usr/local/lib), same as GMSH
182 | >>> RUN wget -nc --quiet https://github.com/ornladios/ADIOS2/archive/v${ADIOS2_VERSION}.tar.gz -O adios2-v${ADIOS2_VERSION}.tar.gz &&
183 | >>> mkdir -p adios2-v${ADIOS2_VERSION} &&
184 | >>> tar -xf adios2-v${ADIOS2_VERSION}.tar.gz -C adios2-v${ADIOS2_VERSION} --strip-components 1 &&
185 | >>> cmake -G Ninja -DADIOS2_USE_HDF5=on -DCMAKE_INSTALL_PYTHONDIR=/usr/local/lib/ -DADIOS2_USE_Fortran=off -DBUILD_TESTING=off -DADIOS2_BUILD_EXAMPLES=off -DADIOS2_USE_ZeroMQ=off -B build-dir -S ./adios2-v${ADIOS2_VERSION} &&
186 | >>> cmake --build build-dir &&
187 | >>> cmake --install build-dir &&
188 | >>> rm -rf /tmp/*
189 |
--------------------
ERROR: failed to solve: process "/bin/sh -c wget -nc --quiet https://github.com/ornladios/ADIOS2/archive/v${ADIOS2_VERSION}.tar.gz -O adios2-v${ADIOS2_VERSION}.tar.gz && mkdir -p adios2-v${ADIOS2_VERSION} && tar -xf adios2-v${ADIOS2_VERSION}.tar.gz -C adios2-v${ADIOS2_VERSION} --strip-components 1 && cmake -G Ninja -DADIOS2_USE_HDF5=on -DCMAKE_INSTALL_PYTHONDIR=/usr/local/lib/ -DADIOS2_USE_Fortran=off -DBUILD_TESTING=off -DADIOS2_BUILD_EXAMPLES=off -DADIOS2_USE_ZeroMQ=off -B build-dir -S ./adios2-v${ADIOS2_VERSION} && cmake --build build-dir && cmake --install build-dir && rm -rf /tmp/*" did not complete successfully: exit code: 1
You can also see the full terminal output here if needed.
Do you know how to solve this? Should I maybe change the version of ADIOS2?
Thanks
dokken
August 8, 2023, 10:04am
4
Does it get past this point if you don’t specify the MPICH version?
Yes, I tried to build with
docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=native" --tag dolfinx_dev_env_mpich_v4.1.2_for_pizdaint .
and everything works. However, with
docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=native" --build-arg MPICH_VERSION=3.4.3 --tag dolfinx_dev_env_mpich_v3.4.3_for_mac .
it fails again. So the problem isn’t -march=haswell but the MPICH version.
dokken
August 8, 2023, 10:23am
6
Try removing adios2 (it is an optional dependency of dolfinx).
Hi dokken,
Unfortunately, I cannot remove ADIOS2 since I use it. However, I was able to fix the problem by installing these libraries in the Docker test-env image:
libpsm2-dev, libpsm-infinipath1-dev, librdmacm-dev, libfabric-dev, libibverbs-dev.
Then I build the end-user image and finally build another image in which I install adios4dolfinx. Everything works well on my Mac. However, when I try to run something in parallel on the cluster, I get some errors and the code crashes. Do you recognize these errors?
Thanks
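The fix described above can be sketched as a shell step for the test-env image. This assumes a Debian/Ubuntu base where `apt-get` is available; the package list is the one from the post, while `run`, `DRY_RUN`, and `linker_sees` are hypothetical helper names. By default the script only prints the commands.

```shell
#!/bin/sh
# Packages named in the post; they provide the fabric libraries that the
# ADIOS2 build tried to link against (-lfabric).
FABRIC_PKGS="libpsm2-dev libpsm-infinipath1-dev librdmacm-dev libfabric-dev libibverbs-dev"

# Print commands unless DRY_RUN=0 (hypothetical safety switch).
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rough check: succeeds if the dynamic linker cache lists lib<name>.so*.
linker_sees() {
  ldconfig -p 2>/dev/null | grep -q "lib$1\.so"
}

run apt-get update
# Word splitting of $FABRIC_PKGS is intentional: one argument per package.
run apt-get install -y --no-install-recommends $FABRIC_PKGS

if linker_sees fabric; then
  echo "libfabric found in linker cache"
else
  echo "libfabric not found in linker cache"
fi
```

Running it with `DRY_RUN=0` as root inside the image performs the installation; in a Dockerfile the same two `apt-get` commands would go in a `RUN` step before the ADIOS2 build.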
garth
August 16, 2023, 9:42am
8
I would strongly recommend using Spack for installing on HPC systems.
Yeah, good idea! I managed to install via Spack.
Thanks!
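For readers taking the same route, a Spack-based install might look like the sketch below. The package names `fenics-dolfinx` and `py-fenics-dolfinx` and the `+adios2` variant are given as I recall them from the DOLFINx README, so treat them as assumptions and confirm with `spack info fenics-dolfinx`. The environment name `fenicsx-env` and the `run`/`DRY_RUN` helpers are hypothetical. The script prints the commands by default.

```shell
#!/bin/sh
ENV_NAME="fenicsx-env"   # hypothetical environment name

# Print commands unless DRY_RUN=0 (hypothetical safety switch).
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || exit 1
  fi
}

# Create a named environment, add the specs, then build everything.
run spack env create "$ENV_NAME"
run spack -e "$ENV_NAME" add fenics-dolfinx+adios2 py-fenics-dolfinx
run spack -e "$ENV_NAME" install
```

On a system such as Piz Daint, the site's MPI would usually be registered with Spack as an external package so that the build links against it; the details are site-specific.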