Dolfinx petsc error for large mesh

Thank you, I will revert to using mpirun/mpiexec and see if that helps with the problem, while I clarify the reasons for the advisory at https://docs.pace.gatech.edu/phoenix_cluster/slurm_conversion_phnx/.

The webpage says:

  • srun is the standard SLURM command to start an MPI program. It automatically uses the allocated job resources: nodelist, tasks, logical cores per task. Do not use mpirun.

This means that srun is equivalent to using mpirun, but you need to set the number of processes you want to use on each node (the number of tasks); see for instance https://hpcrcf.atlassian.net/wiki/spaces/TCP/pages/7287338/How-to+Submit+a+MPI+Job

#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
This will result in an MPI job with 84 processes, with 28 processes on each of 3 nodes. In this case, we’ve chosen 28 tasks per node with the knowledge that each node has 28 CPU cores. Although it’s not necessary to use all cores on a node, this is often efficient, since more communication between processes happens on the same node.
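For my own reference, a minimal job script along those lines would look roughly like the sketch below; the environment-setup line is a placeholder for however FEniCSx is activated on the cluster, and the node/task counts are just the ones from the quoted example.

#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28

# Placeholder: load or activate the FEniCSx installation here (module load / spack env activate ...)

# srun inherits the node and task layout from the #SBATCH directives above
srun python3 conductivity.py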

Thank you for pointing me to the resource. Will update if I get resolution.

Any updates, @leshinka? Were you able to solve the issue? Thanks

Sorry for the belated reply. I was on break. I made changes to my scripts according to the suggestions by @dokken, but that did not solve the issue. I found alternative ways to do the analysis that I wanted without relying on the large mesh, so I have stalled further work on the problem until I have more bandwidth/inspiration. I built my mesh in gmsh before converting to xdmf format. Is it possible that the mesh construction process could have some bearing on the issue we face?


I wouldn't think so. I would suggest first checking how many cells, dofs, etc. there are per process. If this number is too large (e.g. if you run everything on a single process), you cannot create the matrix with a standard PETSc installation.
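For example, something like the following rough sketch would print the per-rank counts (assuming domain is your mesh and V is your function space):

tdim = domain.topology.dim
num_cells_local = domain.topology.index_map(tdim).size_local
num_dofs_local = V.dofmap.index_map.size_local * V.dofmap.index_map_bs
# Each rank reports the cells and dofs it owns
print(f"rank {domain.comm.rank}: {num_cells_local} owned cells, {num_dofs_local} owned dofs")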


Indeed, I have been running everything on one process (specifying one node), since that simplified the downstream post-processing of the results (everything is assembled on one process, and one report is logged). Is there a way to constrain all the post-processing to a single process? For example, will nesting the post-processing in the following (where domain is the mesh object)

if domain.comm.rank == 0:
    insulated_area = fem.assemble_scalar(fem.form(1 * ds(markers.insulated)))

give accurate results if the original problem is run on several processes?

A trivial MPI allreduce or reduce does the trick, as done in
https://jsdokken.com/dolfinx-tutorial/chapter2/heat_code.html?highlight=allreduce#verifying-the-numerical-solution
i.e.

insulated_area = fem.form(1 * ds(markers.insulated))
# Each rank integrates over its locally owned facets
local_area = fem.assemble_scalar(insulated_area)
# Sum the local contributions; every rank receives the global value
global_area = domain.comm.allreduce(local_area, op=MPI.SUM)
if domain.comm.rank == 0:
    print(global_area)

or

insulated_area = fem.form(1 * ds(markers.insulated))
local_area = fem.assemble_scalar(insulated_area)
# Sum the local contributions; only rank 0 receives the global value
global_area = domain.comm.reduce(local_area, op=MPI.SUM, root=0)
if domain.comm.rank == 0:
    print(global_area)

The latter (reduce) only creates the summed value on rank 0; the other ranks receive None.


Consider a tutorial on the use of MPI, e.g.:

https://mpi4py.readthedocs.io/en/stable/tutorial.html


You can also consider Introduction to MPI — MPI tutorial, which is tailored towards dolfinx.

Thank you @dokken and @nate !

I think my issue might be related to memory limitations that don't raise an OOM error. I have run the same problem for a range of mesh sizes (4 problems, to be exact), and the RAM usage per node is just shy of 384 GB for the second-largest mesh I am working with, which has 46 million tetrahedral elements. I used 20 nodes, each with 384 GB, for each problem, and I only get the vector::reserve error for the problem with 57 million tetrahedral elements. The accompanying facets number less than a million triangle elements, so they don't make much of a difference. I don't have access to nodes with more RAM, so I cannot falsify the theory.

<Xdmf Version="3.0">
  <Domain>
    <Grid Name="Grid">
      <Geometry GeometryType="XYZ">
        <DataItem DataType="Float" Dimensions="9261442 3" Format="HDF" Precision="8">tetr.h5:/data0</DataItem>
      </Geometry>
      <Topology TopologyType="Tetrahedron" NumberOfElements="56817384" NodesPerElement="4">
        <DataItem DataType="Int" Dimensions="56817384 4" Format="HDF" Precision="8">tetr.h5:/data1</DataItem>
      </Topology>
      <Attribute Name="name_to_read" AttributeType="Scalar" Center="Cell">
        <DataItem DataType="Int" Dimensions="56817384" Format="HDF" Precision="8">tetr.h5:/data2</DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>
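For what it's worth, next time I run the job I could log the per-rank peak memory with a sketch like the one below (assuming a Linux cluster, where ru_maxrss is reported in kB):

import resource

from mpi4py import MPI

comm = MPI.COMM_WORLD
# Peak resident set size of this rank, converted from kB to GiB
peak_gib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
# Report the most heavily loaded rank
max_gib = comm.allreduce(peak_gib, op=MPI.MAX)
if comm.rank == 0:
    print(f"Peak RSS on the most loaded rank: {max_gib:.1f} GiB")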

@fluid what’s the context for your problem?

Assuming the mesh and matrices fit into memory, which linear solution method are you employing?

It is unclear to me whether you are still running things in serial, or whether you now run in parallel.
Are you using MPI.COMM_WORLD when declaring/loading your mesh?

Yes, I am using MPI.COMM_WORLD when declaring/loading the mesh. I have grabbed relevant parts showing mesh loading:

# Imports needed by this snippet
from mpi4py import MPI
from dolfinx import cpp, io, mesh

# logger, tetr_mesh_path, tria_mesh_path and markers are defined elsewhere in the script
comm = MPI.COMM_WORLD
logger.debug("Loading tetrahedra (dim = 3) mesh..")
with io.XDMFFile(comm, tetr_mesh_path, "r") as infile3:
    domain = infile3.read_mesh(cpp.mesh.GhostMode.none, 'Grid')
    ct = infile3.read_meshtags(domain, name="Grid")
domain.topology.create_connectivity(domain.topology.dim, domain.topology.dim - 1)
with io.XDMFFile(comm, tria_mesh_path, "r") as infile2:
    ft = infile2.read_meshtags(domain, name="Grid")
left_boundary = ft.find(markers.left_cc)
right_boundary = ft.find(markers.right_cc)
meshtags = mesh.meshtags(domain, 2, ft.indices, ft.values)

I am using fem.petsc.LinearProblem with the following Krylov solver options:

options = {
    "ksp_type": "gmres",
    "pc_type": "hypre",
    "ksp_rtol": 1.0e-12
}
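For context, these options are passed to the solver like this, where a, L, left_bc and right_bc are the forms and Dirichlet conditions defined earlier in my script:

model = fem.petsc.LinearProblem(a, L, bcs=[left_bc, right_bc], petsc_options=options)
uh = model.solve()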

Do you have a backtrace showing where in the code the error occurs? Is it in assembly, in the solve, or somewhere else?

The error occurs during model assembly, when PETSc tries to create the matrix. Beyond this traceback, I don't know how to obtain any additional trace:

Traceback (most recent call last):
  File "/storage/coda1/p-tf74/0/shared/leshinka/ssb/conductivity.py", line 130, in <module>
    model = fem.petsc.LinearProblem(a, L, bcs=[left_bc, right_bc], petsc_options=options)
  File "/storage/coda1/p-tf74/0/shared/leshinka/spack/var/spack/environments/fenicsx-env/.spack-env/view/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 567, in __init__
    self._A = create_matrix(self._a)
  File "/storage/coda1/p-tf74/0/shared/leshinka/spack/var/spack/environments/fenicsx-env/.spack-env/view/lib/python3.10/site-packages/dolfinx/fem/petsc.py", line 131, in create_matrix
    return _cpp.fem.petsc.create_matrix(a)
ValueError: vector::reserve

Is this what you are referring to?

This indicates you cannot fit your matrix into memory. How many degrees of freedom are in your global problem? Or can you say roughly how many degrees of freedom per process you're trying to solve for?

In the global problem, there are on the order of 28 million DOFs. @nate, per your comment, and given that a problem with slightly fewer DOFs nearly exhausted the maximum memory per node, it may simply mean that I cannot run this problem with the memory allocated: Dolfinx petsc error for large mesh - #32 by leshinka. I can rerun the problem to get the DOF count per process, but that would take a bit more time.

28e6 / N_proc will give you a sufficient estimate of the number of DoFs per process, where N_proc is the number of processes.
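For example, if you ran on 20 nodes with (say) 24 ranks per node, that is 480 processes and roughly 28e6 / 480 ≈ 58,000 DoFs per process; the ranks-per-node figure here is only an illustration, so substitute whatever your job actually requests.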

Another factor to consider is the density of the matrix. This is indicated, for example, by the polynomial degree and element type of the FE space you’re employing. Naturally, the more dense the matrix, the more memory required.
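Once you have a case that does assemble (e.g. one of your smaller meshes), PETSc itself can report the storage it uses. A rough petsc4py sketch, where model is the LinearProblem from your traceback:

from mpi4py import MPI

A = model.A  # the assembled PETSc matrix held by the LinearProblem
info = A.getInfo()  # dict with nonzero counts and memory figures (see petsc4py's Mat.getInfo)
print(f"rank {MPI.COMM_WORLD.rank}: nz_allocated={info['nz_allocated']:.3e}, memory={info['memory']:.3e} bytes")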