DOLFINx PETSc error for large mesh

I am solving a problem in FEniCSx and get the error below when I use a large mesh.

Traceback (most recent call last):
  File "/storage/home/coda1/0/ssb/transport.py", line 100, in <module>
    model = fem.petsc.LinearProblem(a, L, bcs=[left_bc, right_bc], petsc_options=options)
  File "/storage/home/hcoda1/0/molel3/.conda/envs/fenicsx-env/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 562, in __init__
    self._A = create_matrix(self._a)
  File "/storage/home/hcoda1/0/molel3/.conda/envs/fenicsx-env/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 128, in create_matrix
    return _cpp.fem.petsc.create_matrix(a)
ValueError: vector::reserve

I do not obtain the error when I rerun the same problem with the same geometry but a coarser mesh. The mesh that causes the error is at least 1.5GB.

I appreciate any tips to resolve this.


I have the same issue when solving a Poisson problem with ‘cg’ and ‘ilu’. Any updates on this?
Thanks

Without a reproducible example, it is not easy to give any guidance as to what goes wrong in your or @leshinka’s code. One possibility is to use PETSc with 64-bit integers (but it is hard to say without an example).
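A quick way to check whether an installed petsc4py/PETSc build already uses 64-bit indices is to look at PETSc.IntType (a small sketch, not from the original posts):

from petsc4py import PETSc
import numpy as np

# PETSc.IntType is the NumPy dtype of PetscInt; int64 indicates a 64-bit-index build
print(PETSc.IntType)
print(np.dtype(PETSc.IntType).itemsize * 8, "bit indices")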


@dokken Thank you for the tip. Let me try that. Unfortunately the mesh was too big to conveniently attach.

Add it to a Google Drive or similar service.
How many cells are there in your mesh?

I would guess that you need to compile PETSc with 64-bit indices.

I’m using PETSc installed via Anaconda, and it seems to be a 64-bit build:
linux-64/petsc-3.17.4-real_h4502189_101.tar.bz2
Do I have to compile it myself instead?

I don’t think conda supports 64-bit indices, as it has to be configured in the build script with --with-64bit-indices, which I cannot find in:

Ref (With-64bit-indices · Search · GitLab,
With-64-bit-indices · Search · GitLab)

Maybe @minrk can comment on this? 🙂

My initial error was from a conda environment. I also have a separate Spack installation that uses 64-bit integers, and with it I obtain a similar error, only much earlier in the code, while loading the mesh (traceback below).
I uploaded the mesh files here: Sign in to your account

Traceback (most recent call last):
  File "/storage/coda1/p/0/shared/leshinka/ssb/transport.py", line 65, in <module>
    domain = infile3.read_mesh(cpp.mesh.GhostMode.none, 'Grid')
  File "/storage/coda1/p/0/shared/leshinka/spack/var/spack/environments/fenicsx-env/.spack-env/view/lib/python3.10/site-packages/dolfinx/io/utils.py", line 167, in read_mesh
    mesh = _cpp.mesh.create_mesh(self.comm(), _cpp.graph.AdjacencyList_int64(cells),
ValueError: vector::reserve

Thus, to reproduce this latest error:

from dolfinx import cpp, io
from mpi4py import MPI

comm = MPI.COMM_WORLD
with io.XDMFFile(comm, "tetr.xdmf", "r") as infile3:
    domain = infile3.read_mesh(cpp.mesh.GhostMode.none, 'Grid')
    ct = infile3.read_meshtags(domain, name="Grid")

How many processes are you running this code on?
As your mesh has 270 million cells (and 43 million nodes), you would need to use quite a lot of processes (I do not have a system available that can run this at the moment). I would expect that between 30 and 80 processes are needed for sensible performance.
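If it helps, the global mesh size can be printed directly from the reproduction snippet above (a small sketch reusing the domain and comm variables from that code):

# global number of cells and vertices, summed over all MPI ranks
tdim = domain.topology.dim
num_cells = domain.topology.index_map(tdim).size_global
num_vertices = domain.topology.index_map(0).size_global
if comm.rank == 0:
    print(f"{num_cells} cells, {num_vertices} vertices")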

I don’t have the answer for the 64-bit indices, but I asked.


Unfortunately, using PETSc with 64-bit integers didn’t solve the problem. Any updates from you guys?
Thanks

None from my end. I am going to try to adaptively refine my mesh to make sure I have just the amount of resolution I need.

I’m already doing that; I refine the mesh locally. Having a threshold on the number of mesh elements is a serious limitation.
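For reference, a rough sketch of what local refinement can look like in DOLFINx (my illustration on a toy mesh, not the actual code used here; the refine() signature may differ between DOLFINx versions):

from mpi4py import MPI
from dolfinx import mesh

msh = mesh.create_unit_cube(MPI.COMM_WORLD, 20, 20, 20)
msh.topology.create_entities(1)  # edges must exist before refinement
# mark edges only in the region that needs more resolution, e.g. near x = 0
edges = mesh.locate_entities(msh, 1, lambda x: x[0] < 0.2)
refined_mesh = mesh.refine(msh, edges)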


DOLFINx has been executed with over 200 billion cells with reasonable scaling, so I can’t really comment any further on this issue, as you haven’t replied to my previous questions:

One would also need more information about the amount of memory on your system.

In my case I was running with an allocation of one node, 384GB of RAM, and 1.6TB of storage.

How many processes did you use with mpirun?

I did not use mpirun. Instead, I used sbatch to submit a batch Slurm job on a Slurm cluster, with the necessary allocations of memory (384GB) and nodes (1).

You should still use mpirun even if you use slurm/sbatch. Otherwise you are not exploiting any of the parallelism of DOLFINx.
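A one-line check of how many ranks the script actually sees makes this easy to verify (a small sketch, not taken from your script):

from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.rank == 0:
    # prints 1 if the script was launched without mpirun/srun and therefore runs serially
    print(f"Running on {comm.size} MPI process(es)")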

I don’t have an sbatch script at hand (I’m on my phone), but there are online examples: Slurm MPI examples | www.hpc2n.umu.se


Perhaps that makes the difference. Our HPC vendors discourage the use of mpirun or mpiexec in favor of srun/sbatch without giving much reason why (I did not get this error while using mpirun/mpiexec for bigger meshes in a different problem).

Most finite element libraries leverage MPI for parallelism.
We do scaling tests for DOLFINx on a nightly basis, with weak scaling results at https://fenics.github.io/performance-test-results/

As I mentioned above, you can still use srun/sbatch with MPI; one does not exclude the other.
