The program is slower on an HPC computer

I’ve been running simulations with a very fine mesh, resulting in a large number of degrees of freedom (approximately 2048 × 2048). On an HPC supercomputer, utilizing multiple cores accelerates the program compared to using a single core. However, on my laptop, using multiple cores often leads to slower performance, likely due to limited storage capacity.

The issue arises when I use a single core on the HPC system: it’s significantly slower than a single core on my laptop. Consequently, even with multi-core usage, the HPC system doesn’t outperform my laptop, diminishing its advantage.

Is there anything specific I should consider when running DOLFINx codes on an HPC system, such as particular compiler optimizations or configurations? Has anyone experienced similar issues or have suggestions to improve performance? Thank you in advance.

Additional Information:

  • HPC System: DOLFINx installed using Spack.
  • Laptop: DOLFINx installed using Conda.

Without a timing breakdown of your code it is hard to give guidance, so please time the different parts of the code (a sketch is given after this list), such as:

  • Reading in meshes and mesh data
  • Assembling the system matrices/vectors
  • Solving the (non)linear problem
  • Outputting data
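
For instance, a minimal sketch of such a breakdown (a toy Poisson problem written against the DOLFINx 0.8/0.9 Python API; the mesh size, solver options and output file name below are placeholders, not anything about your actual code) could look like:

import time

from mpi4py import MPI
import numpy as np
import ufl
from dolfinx import fem, io, mesh
from dolfinx.fem.petsc import LinearProblem

comm = MPI.COMM_WORLD


def tic():
    # synchronise all ranks so the timings are comparable in parallel
    comm.Barrier()
    return time.perf_counter()


def toc(t0, label):
    comm.Barrier()
    if comm.rank == 0:
        print(f"{label:16s}: {time.perf_counter() - t0:.3f} s")


t = tic()
domain = mesh.create_unit_square(comm, 512, 512)  # replace with your real mesh
toc(t, "mesh creation")

t = tic()
V = fem.functionspace(domain, ("Lagrange", 1))
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
a = ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx
L = v * ufl.dx
facets = mesh.locate_entities_boundary(
    domain, domain.topology.dim - 1, lambda x: np.full(x.shape[1], True))
dofs = fem.locate_dofs_topological(V, domain.topology.dim - 1, facets)
bc = fem.dirichletbc(fem.Function(V), dofs)  # homogeneous Dirichlet BC
problem = LinearProblem(a, L, bcs=[bc],
                        petsc_options={"ksp_type": "cg", "pc_type": "hypre"})
toc(t, "problem setup")

t = tic()
uh = problem.solve()  # LinearProblem assembles and solves here
toc(t, "assembly+solve")

t = tic()
with io.XDMFFile(comm, "u.xdmf", "w") as xdmf:
    xdmf.write_mesh(domain)
    xdmf.write_function(uh)
toc(t, "output")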

Additionally, what solver options are you using, and what compilation flags did you use with Spack? The docs suggest -O3 (spack add fenics-dolfinx+adios2 py-fenics-dolfinx cflags="-O3" fflags="-O3").

Also, setting OMP_NUM_THREADS=1 might help.
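
If you want to enforce this from inside the Python script rather than in the shell or job script, one possible sketch (just one way of doing it, assuming the extra threads come from OpenMP/OpenBLAS) is:

import os

# Pin threaded libraries to one thread per MPI rank; this must happen before
# numpy/PETSc/DOLFINx are imported, and is equivalent to `export OMP_NUM_THREADS=1`.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS has its own thread-count variable

import numpy as np  # heavy imports only after the environment is set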

I actually used exactly the same code on both devices, so I would expect there to be no difference (maybe I'm wrong). But I will check the compilation flags. Thanks!

The point is that by finding out what part is slower on the HPC system, we can give you more exact guidance as to what might be wrong/not working as expected.

Alongside this, check that you have an optimised implementation of the BLAS library installed. It would be strange if the HPC system isn’t already using optimised BLAS, but if it did happen to be using generic libblas then you would expect it to run more slowly.
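
A quick way to check this from Python (this only reports which BLAS/LAPACK NumPy was built against; the BLAS that PETSc links to is listed in PETSc's own configure options) is:

import numpy as np

# prints the BLAS/LAPACK libraries NumPy was configured with
np.show_config()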

I attempted to add checkpoints to monitor each step and identify the source of the problem; however, it appears that almost every step is slower.

Previously, I encountered an issue with installing DOLFINx using Spack:

I followed your advice to uninstall PETSc and reinstall an older version, which resolved the previous issue. However, could this change be contributing to the current performance slowdown? Given that my code extensively utilizes PETSc matrices and vectors, I’m concerned about potential impacts.

Without having your code it is very hard to give guidance.
If PETSc has been compiled and configured with different flags, that could explain a slowdown in:

  • Assembly of matrices and vectors (the insertion of a local tensor into the global tensor)
  • Solving the linear system.

It should not affect:

  • Creating or reading in a mesh (except if you use scotch compiled with PETSc for partitioning)
  • Setting up the variational form
  • Outputting.
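
If it is unclear whether assembly or the solve dominates, one optional trick is to wrap those parts of your script in named PETSc log stages, so that -log_view reports them separately on both machines. A rough sketch (the stage bodies are placeholders for your own assembly and solve calls):

import sys

import petsc4py
petsc4py.init(sys.argv + ["-log_view"])
from petsc4py import PETSc

# named stages appear as separate sections in the -log_view summary
assembly_stage = PETSc.Log.Stage("Assembly")
solve_stage = PETSc.Log.Stage("Solve")

assembly_stage.push()
# ... assemble your PETSc matrices/vectors here ...
assembly_stage.pop()

solve_stage.push()
# ... run your (non)linear solver here ...
solve_stage.pop()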

Could you report the output of:

import petsc4py
petsc4py.init(["-log_view"])
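# the performance summary is printed when PETSc finalizes, i.e. at interpreter exit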

on both of your systems?
For instance, on my system this yields:

****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

mwe_petsc_info.py on a linux-gnu-real64-32 named dokken-XPS-9320 with 1 process, by Unknown on Thu Mar  6 14:17:15 2025
Using Petsc Release Version 3.22.0, unknown 

                         Max       Max/Min     Avg       Total
Time (sec):           1.230e-02     1.000   1.230e-02
Objects:              0.000e+00     0.000   0.000e+00
Flops:                0.000e+00     0.000   0.000e+00  0.000e+00
Flops/sec:            0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.2291e-02 100.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

------------------------------------------------------------------------------------------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

========================================================================================================================
Average time to get PetscTime(): 2.06e-08
#PETSc Option Table entries:
-log_view # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: PETSC_ARCH=linux-gnu-real64-32 --COPTFLAGS=-O2 --CXXOPTFLAGS=-O2 --FOPTFLAGS=-O2 --with-64-bit-indices=no --with-debugging=no --with-fortran-bindings=no --with-shared-libraries --download-hypre --download-metis --download-mumps --download-ptscotch --download-scalapack --download-spai --download-suitesparse --download-superlu --download-superlu_dist --with-scalar-type=real --with-precision=double
-----------------------------------------
Libraries compiled on 2024-10-10 19:48:50 on buildkitsandbox 
Machine characteristics: Linux-6.5.0-1025-azure-x86_64-with-glibc2.39
Using PETSc directory: /usr/local/petsc
Using PETSc arch: linux-gnu-real64-32
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O2  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O2    
-----------------------------------------

Using include paths: -I/usr/local/petsc/include -I/usr/local/petsc/linux-gnu-real64-32/include -I/usr/local/petsc/linux-gnu-real64-32/include/suitesparse
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/usr/local/petsc/linux-gnu-real64-32/lib -L/usr/local/petsc/linux-gnu-real64-32/lib -lpetsc -Wl,-rpath,/usr/local/petsc/linux-gnu-real64-32/lib -L/usr/local/petsc/linux-gnu-real64-32/lib -Wl,-rpath,/usr/local/lib -L/usr/local/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/13 -L/usr/lib/gcc/x86_64-linux-gnu/13 -lHYPRE -lspqr -lumfpack -lklu -lcholmod -lamd -lsuperlu_dist -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu -lspai -llapack -lblas -lptesmumps -lptscotchparmetisv3 -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lmetis -lm -lX11 -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -lrt -lquadmath
-----------------------------------------

Sure, the output for the HPC computer is:

****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

test.py on a  named r483.uppmax.uu.se with 1 process, by junjiew on Thu Mar  6 16:23:51 2025
Using Petsc Release Version 3.21.5, Aug 30, 2024 

                         Max       Max/Min     Avg       Total
Time (sec):           6.948e-02     1.000   6.948e-02
Objects:              0.000e+00     0.000   0.000e+00
Flops:                0.000e+00     0.000   0.000e+00  0.000e+00
Flops/sec:            0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.9436e-02  99.9%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

------------------------------------------------------------------------------------------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

========================================================================================================================
Average time to get PetscTime(): 7.34e-08
#PETSc Option Table entries:
-log_view # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/petsc-3.21.5-nglhlmkq4llxdus4xx6p6sx4xcglwlmr --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 --with-make-exec=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/gmake-4.4.1-3l5yebgcxnrjvrr5pibooq2nyke7kcas/bin/make --with-cc=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpicc --with-cxx=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpic++ --with-fc=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-openmp=0 --with-64-bit-indices=0 --with-blaslapack-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openblas-0.3.29-kwz7fqvr666cayjkkxvctrprcxsvrtrt/lib/libopenblas.so --with-x=0 --with-sycl=0 --with-clanguage=C --with-cuda=0 --with-hip=0 --with-metis=1 --with-metis-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/metis-5.1.0-hmgtnncwrhgoksg2lglwyk7l64wwx2lt/include --with-metis-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/metis-5.1.0-hmgtnncwrhgoksg2lglwyk7l64wwx2lt/lib/libmetis.so --with-hypre=1 --with-hypre-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hypre-2.32.0-bch6gz4ivayrak6rsd7i5bhj3kv5c2vx/include --with-hypre-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hypre-2.32.0-bch6gz4ivayrak6rsd7i5bhj3kv5c2vx/lib/libHYPRE.so --with-parmetis=1 --with-parmetis-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/parmetis-4.0.3-rwix5srh25wakwkw7kaem2457sy2zvyy/include --with-parmetis-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/parmetis-4.0.3-rwix5srh25wakwkw7kaem2457sy2zvyy/lib/libparmetis.so --with-kokkos=0 --with-kokkos-kernels=0 --with-superlu_dist=1 --with-superlu_dist-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/superlu-dist-9.1.0-56acdnubdnu4iu4jd4xlod4h7ojjtw65/include --with-superlu_dist-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/superlu-dist-9.1.0-56acdnubdnu4iu4jd4xlod4h7ojjtw65/lib/libsuperlu_dist.so --with-ptscotch=0 --with-suitesparse=0 --with-hdf5=1 --with-hdf5-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hdf5-1.14.5-maj7n7is5dpjxv7oevpjdxef7qgk7bdb/include --with-hdf5-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hdf5-1.14.5-maj7n7is5dpjxv7oevpjdxef7qgk7bdb/lib/libhdf5.so --with-zlib=1 --with-zlib-include=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/zlib-ng-2.2.3-evl6ilxed7yf754yh5v2sw32auqvwxol/include --with-zlib-lib=/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/zlib-ng-2.2.3-evl6ilxed7yf754yh5v2sw32auqvwxol/lib/libz.so --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-hwloc=0 
--with-libjpeg=0 --with-scalapack=0 --with-strumpack=0 --with-mmg=0 --with-parmmg=0 --with-tetgen=0 --with-zoltan=0
-----------------------------------------
Libraries compiled on 2025-02-10 19:48:39 on rackham2.uppmax.uu.se 
Machine characteristics: Linux-3.10.0-1160.119.1.el7.x86_64-x86_64-with-glibc2.17
Using PETSc directory: /crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/petsc-3.21.5-nglhlmkq4llxdus4xx6p6sx4xcglwlmr
Using PETSc arch: 
-----------------------------------------

Using C compiler: /crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g -O  
Using Fortran compiler: /crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpif90  -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O    
-----------------------------------------

Using include paths: -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/petsc-3.21.5-nglhlmkq4llxdus4xx6p6sx4xcglwlmr/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hypre-2.32.0-bch6gz4ivayrak6rsd7i5bhj3kv5c2vx/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/superlu-dist-9.1.0-56acdnubdnu4iu4jd4xlod4h7ojjtw65/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/parmetis-4.0.3-rwix5srh25wakwkw7kaem2457sy2zvyy/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/metis-5.1.0-hmgtnncwrhgoksg2lglwyk7l64wwx2lt/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hdf5-1.14.5-maj7n7is5dpjxv7oevpjdxef7qgk7bdb/include -I/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/zlib-ng-2.2.3-evl6ilxed7yf754yh5v2sw32auqvwxol/include
-----------------------------------------

Using C linker: /crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpicc
Using Fortran linker: /crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/bin/mpif90
Using libraries: -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/petsc-3.21.5-nglhlmkq4llxdus4xx6p6sx4xcglwlmr/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/petsc-3.21.5-nglhlmkq4llxdus4xx6p6sx4xcglwlmr/lib -lpetsc -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hypre-2.32.0-bch6gz4ivayrak6rsd7i5bhj3kv5c2vx/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hypre-2.32.0-bch6gz4ivayrak6rsd7i5bhj3kv5c2vx/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/superlu-dist-9.1.0-56acdnubdnu4iu4jd4xlod4h7ojjtw65/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/superlu-dist-9.1.0-56acdnubdnu4iu4jd4xlod4h7ojjtw65/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openblas-0.3.29-kwz7fqvr666cayjkkxvctrprcxsvrtrt/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openblas-0.3.29-kwz7fqvr666cayjkkxvctrprcxsvrtrt/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/parmetis-4.0.3-rwix5srh25wakwkw7kaem2457sy2zvyy/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/parmetis-4.0.3-rwix5srh25wakwkw7kaem2457sy2zvyy/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/metis-5.1.0-hmgtnncwrhgoksg2lglwyk7l64wwx2lt/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/metis-5.1.0-hmgtnncwrhgoksg2lglwyk7l64wwx2lt/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hdf5-1.14.5-maj7n7is5dpjxv7oevpjdxef7qgk7bdb/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/hdf5-1.14.5-maj7n7is5dpjxv7oevpjdxef7qgk7bdb/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/zlib-ng-2.2.3-evl6ilxed7yf754yh5v2sw32auqvwxol/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/zlib-ng-2.2.3-evl6ilxed7yf754yh5v2sw32auqvwxol/lib -Wl,-rpath,/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/lib -L/crex/proj/uppstore2017025/spack/opt/spack/linux-centos7-broadwell/gcc-13.2.0/openmpi-5.0.6-jgsbbownkeaqqb7oxpd2mqke7kn3jbmp/lib -Wl,-rpath,/sw/comp/gcc/13.2.0_rackham/lib/gcc/x86_64-pc-linux-gnu/13.2.0 -L/sw/comp/gcc/13.2.0_rackham/lib/gcc/x86_64-pc-linux-gnu/13.2.0 -Wl,-rpath,/sw/comp/gcc/13.2.0_rackham/lib64 -L/sw/comp/gcc/13.2.0_rackham/lib64 -Wl,-rpath,/sw/comp/gcc/13.2.0_rackham/lib -L/sw/comp/gcc/13.2.0_rackham/lib -lHYPRE -lsuperlu_dist -lopenblas -lparmetis -lmetis -lhdf5 -lm -lz -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -lquadmath -ldl

and the output for the laptop is:

****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

test.py on a  named Precision-3660 with 1 processor, by junjie Thu Mar  6 16:26:32 2025
Using Petsc Release Version 3.20.5, Feb 27, 2024 

                         Max       Max/Min     Avg       Total
Time (sec):           3.919e-02     1.000   3.919e-02
Objects:              0.000e+00     0.000   0.000e+00
Flops:                0.000e+00     0.000   0.000e+00  0.000e+00
Flops/sec:            0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 3.9182e-02 100.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

------------------------------------------------------------------------------------------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

========================================================================================================================
Average time to get PetscTime(): 2.87e-08
#PETSc Option Table entries:
-log_view # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: AR=${PREFIX}/bin/x86_64-conda-linux-gnu-ar CC=mpicc CXX=mpicxx FC=mpifort CFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/junjie/anaconda3/envs/fenicsx/include  " CPPFLAGS="-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/junjie/anaconda3/envs/fenicsx/include" CXXFLAGS="-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/junjie/anaconda3/envs/fenicsx/include  " FFLAGS="-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/junjie/anaconda3/envs/fenicsx/include   -Wl,--no-as-needed" LDFLAGS="-pthread -fopenmp -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/home/junjie/anaconda3/envs/fenicsx/lib -Wl,-rpath-link,/home/junjie/anaconda3/envs/fenicsx/lib -L/home/junjie/anaconda3/envs/fenicsx/lib -Wl,-rpath-link,/home/junjie/anaconda3/envs/fenicsx/lib" LIBS="-lmpifort -lgfortran" --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --with-blas-lib=libblas.so --with-lapack-lib=liblapack.so --with-yaml=1 --with-hdf5=1 --with-fftw=1 --with-hwloc=0 --with-hypre=1 --with-metis=1 --with-mpi=1 --with-mumps=1 --with-parmetis=1 --with-pthread=1 --with-ptscotch=1 --with-shared-libraries --with-ssl=0 --with-scalapack=1 --with-superlu=1 --with-superlu_dist=1 --with-superlu_dist-include=/home/junjie/anaconda3/envs/fenicsx/include/superlu-dist --with-superlu_dist-lib=-lsuperlu_dist --with-suitesparse=1 --with-x=0 --with-scalar-type=real --prefix=/home/junjie/anaconda3/envs/fenicsx
-----------------------------------------
Libraries compiled on 2024-02-27 14:24:58 on 8da39d698ceb 
Machine characteristics: Linux-6.5.0-1015-azure-x86_64-with-glibc2.17
Using PETSc directory: /home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
Using PETSc arch: 
-----------------------------------------

Using C compiler: mpicc -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include -O3 -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include 
Using Fortran compiler: mpifort -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include   -Wl,--no-as-needed -O3    -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include
-----------------------------------------

Using include paths: -I/home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include -I/home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/include/superlu-dist
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpifort
Using libraries: -Wl,-rpath,/home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib -L/home/conda/feedstock_root/build_artifacts/petsc_1709043572817/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/lib -lpetsc -lHYPRE -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu -lsuperlu_dist -lfftw3_mpi -lfftw3 -llapack -lblas -lptesmumps -lptscotchparmetisv3 -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lparmetis -lmetis -lhdf5_hl -lhdf5 -lm -lyaml -lquadmath -ldl -lmpifort -lgfortran
-----------------------------------------

There are definitely different optimization levels at play here. Your laptop build of PETSc was configured with --COPTFLAGS=-O3 and compiled with -O2/-O3 (see its "Using C compiler" line above), which is not present in the HPC build, where the compilers are only invoked with -g -O.


So does this have something to do with the way I installed it? I just followed the instructions for installing with Spack; maybe I loaded the wrong compiler?

Could you please show which instructions you followed?

This one:

Please consider the instructions on the main branch: GitHub - FEniCS/dolfinx: Next generation FEniCS problem solving environment