DOLFINx PETSc error for large mesh

I am solving a problem in FEniCSx and get the error below when I use a large mesh.

Traceback (most recent call last):
  File "/storage/home/coda1/0/ssb/transport.py", line 100, in <module>
    model = fem.petsc.LinearProblem(a, L, bcs=[left_bc, right_bc], petsc_options=options)
  File "/storage/home/hcoda1/0/molel3/.conda/envs/fenicsx-env/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 562, in __init__
    self._A = create_matrix(self._a)
  File "/storage/home/hcoda1/0/molel3/.conda/envs/fenicsx-env/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 128, in create_matrix
    return _cpp.fem.petsc.create_matrix(a)
ValueError: vector::reserve

I do not obtain the error when I rerun the same problem with the same geometry but a coarser mesh. The mesh that causes the error is at least 1.5GB.

I appreciate any tips to resolve this.


I have the same issue when solving a Poisson problem with ‘cg’ and ‘ilu’. Any updates on this?
Thanks

Without a reproducible example, it is not easy to give any guidance as to what goes wrong in your or @leshinka’s code. One possibility is to use PETSc with 64-bit integers (but it is hard to say without an example).
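A quick way to check whether an installed petsc4py/PETSc build already uses 64-bit indices is to look at PETSc.IntType (a small sketch, not from the original posts):

from petsc4py import PETSc
import numpy as np

# PETSc.IntType is the NumPy dtype of PetscInt; int64 indicates a 64-bit-index build
print(PETSc.IntType)
print(np.dtype(PETSc.IntType).itemsize * 8, "bit indices")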


@dokken Thank you for the tip. Let me try that. Unfortunately the mesh was too big to conveniently attach.

Add it to a Google Drive or similar service.
How many cells are there in your mesh?

I would guess that you need to compile PETSc with 64-bit indices.

I’m using PETSc installed via Anaconda, and it seems to be a 64-bit build:
linux-64/petsc-3.17.4-real_h4502189_101.tar.bz2
Do I have to compile it myself instead?

I don’t think conda supports 64-bit indices, as it has to be configured in the build script with --with-64bit-indices, which I cannot find in:

Ref (With-64bit-indices · Search · GitLab,
With-64-bit-indices · Search · GitLab)

Maybe @minrk can comment on this? 🙂

My initial error was from a conda environment. I also have a separate Spack installation that uses 64-bit integers, and with it I obtain a similar error, only much earlier in the code, while loading the mesh (traceback below).
I uploaded the mesh files here: Sign in to your account

Traceback (most recent call last):
  File "/storage/coda1/p/0/shared/leshinka/ssb/transport.py", line 65, in <module>
    domain = infile3.read_mesh(cpp.mesh.GhostMode.none, 'Grid')
  File "/storage/coda1/p/0/shared/leshinka/spack/var/spack/environments/fenicsx-env/.spack-env/view/lib/python3.10/site-packages/dolfinx/io/utils.py", line 167, in read_mesh
    mesh = _cpp.mesh.create_mesh(self.comm(), _cpp.graph.AdjacencyList_int64(cells),
ValueError: vector::reserve

Thus, to reproduce this latest error:

from dolfinx import cpp, io
from mpi4py import MPI

comm = MPI.COMM_WORLD
with io.XDMFFile(comm, "tetr.xdmf", "r") as infile3:
    domain = infile3.read_mesh(cpp.mesh.GhostMode.none, 'Grid')
    ct = infile3.read_meshtags(domain, name="Grid")

How many processes are you running this code on?
As your mesh has 270 million cells (and 43 million nodes), you would need to use quite a lot of processes (I do not have a system available that can run this at the moment). I would expect that between 30 and 80 processes are needed for sensible performance.
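If it helps, the global mesh size can be printed directly from the reproduction snippet above (a small sketch reusing the domain and comm variables from that code):

# global number of cells and vertices, summed over all MPI ranks
tdim = domain.topology.dim
num_cells = domain.topology.index_map(tdim).size_global
num_vertices = domain.topology.index_map(0).size_global
if comm.rank == 0:
    print(f"{num_cells} cells, {num_vertices} vertices")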

I don’t have the answer for the 64-bit indices, but I asked.


Unfortunately, using PETSc with 64-bit integers didn’t solve the problem. Any updates from you guys?
Thanks

None from my end. I am going to try to adaptively refine my mesh to make sure I have just the amount of resolution I need.

I’m already doing that; I refine the mesh locally. Having a threshold on the number of mesh elements is a serious limitation.
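For reference, a rough sketch of what local refinement can look like in DOLFINx (my illustration on a toy mesh, not the actual code used here; the refine() signature may differ between DOLFINx versions):

from mpi4py import MPI
from dolfinx import mesh

msh = mesh.create_unit_cube(MPI.COMM_WORLD, 20, 20, 20)
msh.topology.create_entities(1)  # edges must exist before refinement
# mark edges only in the region that needs more resolution, e.g. near x = 0
edges = mesh.locate_entities(msh, 1, lambda x: x[0] < 0.2)
refined_mesh = mesh.refine(msh, edges)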


DOLFINx has been executed with over 200 billion cells with reasonable scaling, so I can’t really comment any further on this issue, as you haven’t replied to my previous questions:

One would also need more information about the amount of memory on your system.

In my case I was running with an allocation of one node, 384GB of RAM, and 1.6TB of storage.

How many processes did you use with mpirun?

I did not use mpirun. Instead, I used sbatch to submit a batch Slurm job on a Slurm cluster, with the necessary allocations of memory (384GB) and nodes (1).

You should still use mpirun even if you use slurm/sbatch. Otherwise you are not exploiting any of the parallelism of DOLFINx.
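A one-line check of how many ranks the script actually sees makes this easy to verify (a small sketch, not taken from your script):

from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.rank == 0:
    # prints 1 if the script was launched without mpirun/srun and therefore runs serially
    print(f"Running on {comm.size} MPI process(es)")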

I don’t have an sbatch script at hand (I’m on my phone), but there are online examples: Slurm MPI examples | www.hpc2n.umu.se


Perhaps that makes the difference. Our HPC vendors discourage the use of mpirun or mpiexec in favor of srun/sbatch without giving much reason why (I did not get this error while using mpirun/mpiexec for bigger meshes in a different problem).

Most finite element libraries leverage MPI for parallelism.
We do scaling tests for DOLFINx on a nightly basis, with weak scaling results at https://fenics.github.io/performance-test-results/

As I mentioned above, you can still use srun/sbatch with MPI; one does not exclude the other.
