MPICH version compatibility

Hello,
I’m trying to run my code on a cluster (Piz Daint at CSCS) using a Docker container. To use multiple nodes, I would need a Docker image with an MPI version that is ABI compatible with the MPI on the cluster. At the moment I’m using a Docker image based on docker:latest and I’m having trouble.
They suggested that I use MPICH v3.1.4. Is the latest DOLFINx version still compatible with this version of MPICH? If so, where can I find information on building a Docker image with MPICH v3.1.4 and the latest version of DOLFINx?

Thanks :)

You would have to build the DOLFINx test-env image yourself, changing the MPICH version build argument in the Dockerfile: https://github.com/FEniCS/dolfinx/blob/main/docker/Dockerfile.test-env#L27
Then base the end-user image on the image you built:
https://github.com/FEniCS/dolfinx/tree/main/docker#dockerfileend-user
https://github.com/FEniCS/dolfinx/blob/main/docker/Dockerfile.end-user#L43
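The two steps above can be sketched as a small shell script. This is a hedged sketch, not the official procedure. It assumes that Dockerfile.end-user takes its base image through a build argument named BASEIMAGE (the linked line is the place to check), that it has a build target named "dolfinx", and that the build context "." contains the dolfinx/ checkout plus any other sources the end-user Dockerfile copies in. The tags and the run/build_dolfinx_images helpers are hypothetical names. By default it only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Hedged sketch: build the DOLFINx dev-env image against a chosen MPICH
# version, then build the end-user image on top of it.
# Assumptions (check the Dockerfiles): Dockerfile.end-user accepts its base
# image via a build argument named BASEIMAGE and has a target named "dolfinx".
set -euo pipefail

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_dolfinx_images() {
  local mpich_version="$1"
  # Hypothetical tag names, chosen only for this example.
  local env_tag="dolfinx_dev_env_mpich_v${mpich_version}"
  local user_tag="dolfinx_end_user_mpich_v${mpich_version}"

  # Step 1: test/dev environment (MPICH, PETSc, ADIOS2, ...) with the MPICH override.
  run docker build --target dev-env \
    --file dolfinx/docker/Dockerfile.test-env \
    --build-arg MPICH_VERSION="${mpich_version}" \
    --tag "${env_tag}" .

  # Step 2: end-user image (DOLFINx itself) based on the image from step 1.
  run docker build --target dolfinx \
    --file dolfinx/docker/Dockerfile.end-user \
    --build-arg BASEIMAGE="${env_tag}" \
    --tag "${user_tag}" .
}

# MPICH version from the first argument; defaults to the version CSCS suggested.
build_dolfinx_images "${1:-3.1.4}"
```

Run it once as-is to review the two commands, then with DRY_RUN=0 to execute them.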

I would really recommend using MPICH 4, as it has been out for 1.5 years (see the "MPICH 4.0 released" announcement on the MPICH website).

Hi dokken, thanks for the reply!
I tried to build the test-env image with the following command:

docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=haswell" --build-arg MPICH_VERSION=3.4.3 --tag dolfinx_dev_env_mpich_v3.4.3_for_pizdaint .

but I get an error when compiling ADIOS2:



15.29 [82/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/ip_config.c.o
15.69 [83/352] Building C object thirdparty/ffs/ffs/CMakeFiles/ffs.dir/cod.tab.c.o
15.77 [84/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/cmfabric.c.o
15.79 [85/352] Linking C shared library lib/libadios2_ffs.so.2.0.0
15.82 [86/352] Creating library symlink lib/libadios2_ffs.so.2 lib/libadios2_ffs.so
15.83 [87/352] Linking C shared module thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so
15.83 FAILED: thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so
15.83 : && /usr/bin/cc -fPIC -w -Wall -O3 -DNDEBUG -shared -o thirdparty/EVPath/EVPath/lib/libadios2_cmfabric.so thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/cmfabric.c.o thirdparty/EVPath/EVPath/CMakeFiles/cmfabric.dir/ip_config.c.o -Wl,-rpath,/tmp/build-dir/lib: lib/libadios2_atl.so.2.2.1 -lfabric && :
15.83 /usr/bin/ld: cannot find -lfabric: No such file or directory
15.83 collect2: error: ld returned 1 exit status
15.90 [88/352] Generating cm_interface.c, revp.c, revpath.h
15.90 done
16.02 [89/352] Building C object thirdparty/EVPath/EVPath/CMakeFiles/cmzplenet.dir/cmzplenet.c.o
16.43 [90/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/core/IOHDF5.cpp.o
16.64 [91/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/engine/hdf5/HDF5WriterP.cpp.o
21.96 [92/352] Building CXX object source/adios2/CMakeFiles/adios2_hdf5.dir/engine/hdf5/HDF5ReaderP.cpp.o
21.96 ninja: build stopped: subcommand failed.
------
Dockerfile.test-env:182
--------------------
181 | # Install ADIOS2 (Python interface in /usr/local/lib), same as GMSH
182 | >>> RUN wget -nc --quiet https://github.com/ornladios/ADIOS2/archive/v${ADIOS2_VERSION}.tar.gz -O adios2-v${ADIOS2_VERSION}.tar.gz && \
183 | >>> mkdir -p adios2-v${ADIOS2_VERSION} && \
184 | >>> tar -xf adios2-v${ADIOS2_VERSION}.tar.gz -C adios2-v${ADIOS2_VERSION} --strip-components 1 && \
185 | >>> cmake -G Ninja -DADIOS2_USE_HDF5=on -DCMAKE_INSTALL_PYTHONDIR=/usr/local/lib/ -DADIOS2_USE_Fortran=off -DBUILD_TESTING=off -DADIOS2_BUILD_EXAMPLES=off -DADIOS2_USE_ZeroMQ=off -B build-dir -S ./adios2-v${ADIOS2_VERSION} && \
186 | >>> cmake --build build-dir && \
187 | >>> cmake --install build-dir && \
188 | >>> rm -rf /tmp/*
189 |
--------------------
ERROR: failed to solve: process "/bin/sh -c wget -nc --quiet https://github.com/ornladios/ADIOS2/archive/v${ADIOS2_VERSION}.tar.gz -O adios2-v${ADIOS2_VERSION}.tar.gz && mkdir -p adios2-v${ADIOS2_VERSION} && tar -xf adios2-v${ADIOS2_VERSION}.tar.gz -C adios2-v${ADIOS2_VERSION} --strip-components 1 && cmake -G Ninja -DADIOS2_USE_HDF5=on -DCMAKE_INSTALL_PYTHONDIR=/usr/local/lib/ -DADIOS2_USE_Fortran=off -DBUILD_TESTING=off -DADIOS2_BUILD_EXAMPLES=off -DADIOS2_USE_ZeroMQ=off -B build-dir -S ./adios2-v${ADIOS2_VERSION} && cmake --build build-dir && cmake --install build-dir && rm -rf /tmp/*" did not complete successfully: exit code: 1

You can also see the full terminal output here if needed.

Do you know how to solve this? Should I perhaps change the ADIOS2 version?
Thanks :)

Does it get past this point if you don't specify the MPICH version?

Yes, I tried building with

docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=native" --tag dolfinx_dev_env_mpich_v4.1.2_for_pizdaint .

and everything works. However, with

docker build --target dev-env --file dolfinx/docker/Dockerfile.test-env --build-arg PETSC_SLEPC_OPTFLAGS="-O2 -march=native" --build-arg MPICH_VERSION=3.4.3 --tag dolfinx_dev_env_mpich_v3.4.3_for_mac .

it fails again, so the problem is the MPICH version, not -march=haswell.

Try removing ADIOS2 (it is an optional dependency of DOLFINx).

Hi dokken,
Unfortunately I cannot remove ADIOS2, since I use it. However, I was able to fix the problem by installing these packages in the test-env Docker image:
libpsm2-dev, libpsm-infinipath1-dev, librdmacm-dev, libfabric-dev, libibverbs-dev.
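In Dockerfile terms, this fix amounts to installing those packages before the ADIOS2 step, for example by appending them to an existing apt-get install list in Dockerfile.test-env or through an extra RUN line (assuming the Ubuntu base image that file uses). The sketch below shows the commands such a RUN line would execute; the run/install_fabric_dev_packages helpers and the DRY_RUN switch are illustrative additions, not part of the DOLFINx Dockerfile. Of the listed packages, libfabric-dev is the one that provides the library behind the "cannot find -lfabric" link error in the log.

```shell
#!/usr/bin/env bash
# Hedged sketch: install the extra fabric/verbs development packages listed
# above so that ADIOS2's bundled EVPath cmfabric module can link with -lfabric.
# Intended to run as root inside the Ubuntu-based test-env build, before ADIOS2.
set -euo pipefail

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Package list taken from the post above.
FABRIC_DEV_PACKAGES=(
  libpsm2-dev
  libpsm-infinipath1-dev
  librdmacm-dev
  libfabric-dev
  libibverbs-dev
)

install_fabric_dev_packages() {
  run apt-get -qq update
  # --no-install-recommends keeps the image smaller.
  run apt-get -y install --no-install-recommends "${FABRIC_DEV_PACKAGES[@]}"
}

install_fabric_dev_packages
```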

Then I build the end-user image and, finally, another image in which I install adios4dolfinx. Everything works well on my Mac. However, when I run something in parallel on the cluster, I get errors and the code crashes. Do you recognize these errors?

Thanks :)

I would strongly recommend using Spack for installing on HPC systems.
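A minimal Spack sketch, assuming a working Spack checkout is on PATH. py-fenics-dolfinx and adios2 are Spack package names, but the +python variant, the environment name dolfinx-env and the helper functions are assumptions for illustration. On a Cray system such as Piz Daint, the vendor MPI would normally be registered as an external package in Spack's packages.yaml, which this sketch does not configure.

```shell
#!/usr/bin/env bash
# Hedged sketch: install DOLFINx (Python interface) and ADIOS2 with Spack
# inside a named Spack environment. MPI/external configuration is not covered.
set -euo pipefail

# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

SPACK_ENV_NAME="dolfinx-env"   # hypothetical environment name

install_with_spack() {
  run spack env create "${SPACK_ENV_NAME}"
  # "spack -e ENV" runs a command inside that environment without activating it.
  run spack -e "${SPACK_ENV_NAME}" add py-fenics-dolfinx "adios2+python"
  run spack -e "${SPACK_ENV_NAME}" concretize
  run spack -e "${SPACK_ENV_NAME}" install
}

install_with_spack
```

Checking the output of the concretize step before installing is a cheap way to confirm which MPI Spack picked.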

Good idea! I managed to install it via Spack :)
Thanks!