Write and read back mesh in parallel with same meshtags

Hello,

I recently moved from dolfinx 0.6.0 to dolfinx 0.7.3, installing adios4dolfinx (ADIOS4DOLFINx, a framework for checkpointing in DOLFINx) via conda.

In the following example, I create a mesh via GMSH and define cell and surface tags.
I then import everything into dolfinx using io.gmshio.model_to_mesh.
Finally, I save the mesh, cell tags and facet tags to an XDMF file.
I run the script in parallel, for instance by using mpirun -n 5.

My problem:
I would like to read back the same mesh and meshtags, distributed in the same fashion among the 5 ranks. It worked out of the box with dolfinx 0.6.0, but it does not seem to be the case with dolfinx 0.7.3.
In the following example, the boundary dofs distributed among the 5 ranks are different when:

  • I save the mesh and then compute boundary_dofs (save = True)

  • I load mesh and meshtags from the XDMF file and then compute the boundary_dofs (save = False)

I tried to do a similar thing using adios4dolfinx, but I ran into the same issue.
Could you please help me?

Many thanks

import gmsh
from dolfinx import io, fem
import os, shutil, sys
import petsc4py
from mpi4py import MPI

petsc4py.init(sys.argv)
comm = MPI.COMM_WORLD

dim = 3
volume_tag = 1
surface_tag = 10
save = True
test_dir = os.path.join(os.getcwd(),'test')

if save:
    # Save mesh and meshtags
    if comm.rank == 0:
        print('Saving ...', flush=True)
        # Only rank 0 prepares the output directory, to avoid filesystem races in parallel
        if os.path.isdir(test_dir):
            shutil.rmtree(test_dir)
        os.makedirs(test_dir)
    comm.barrier()

    # Create mesh
    gmsh.initialize()
    gmsh.option.setNumber("General.Terminal", 0)
    model = gmsh.model()
    model.add("Domain")
    model.occ.addBox(0, 0, 0, 8, 8, 8)
    model.occ.synchronize()
    model.addPhysicalGroup(dim, [1], volume_tag)
    model.setPhysicalName(dim, volume_tag, "Domain volume")
    boundary_dimtags = model.getBoundary([(dim, 1)], oriented=False)
    model.addPhysicalGroup(dim-1, [dt[1] for dt in boundary_dimtags], tag=surface_tag)
    model.setPhysicalName(dim-1, surface_tag, "Domain surface")
    gmsh.option.setNumber("Mesh.CharacteristicLengthMax", 8/13)
    model.occ.synchronize()
    model.mesh.generate(dim)
    domain, cell_tags, facet_tags = io.gmshio.model_to_mesh(model, comm, rank=0, gdim=dim)

    with io.XDMFFile(comm, test_dir+'/domain.xdmf', "w") as xdmf:
        xdmf.write_mesh(domain)
        cell_tags.name = "cell_tags"
        facet_tags.name = "facet_tags"
        xdmf.write_meshtags(cell_tags, domain.geometry)
        xdmf.write_meshtags(facet_tags, domain.geometry)
else:
    # Load mesh and meshtags
    if comm.rank == 0:
        print('Loading ...', flush=True)
    with io.XDMFFile(comm, test_dir+'/domain.xdmf', "r") as xdmf:
        domain = xdmf.read_mesh()
        tdim = domain.topology.dim
        fdim = tdim - 1
        domain.topology.create_connectivity(fdim, tdim)
        cell_tags = xdmf.read_meshtags(domain, name="cell_tags")
        facet_tags = xdmf.read_meshtags(domain, name="facet_tags")
        assert cell_tags.name == "cell_tags"
        assert facet_tags.name == "facet_tags"

# Compute boundary_dofs
Vp = fem.FunctionSpace(domain, ("CG", 1))
boundary_dofs = fem.locate_dofs_topological(Vp, domain.topology.dim-1, facet_tags.find(surface_tag))
# boundary_dofs is different when the mesh is computed versus loaded
print('Rank = {}, boundary_dofs = {}\n\n'.format(comm.rank, boundary_dofs))

For the main branch of adios4dolfinx there is support for storing data in the original order; see
http://jsdokken.com/adios4dolfinx/docs/partitioned_mesh.html
and/or
http://jsdokken.com/adios4dolfinx/docs/original_checkpoint.html
depending on the application.

Thanks Dokken,

however, the function adios4dolfinx.write_mesh does not seem to have the keyword store_partition_info:

import adios4dolfinx as a4dx
a4dx.write_mesh(domain, test_dir+'/domain.bp', store_partition_info=True)
TypeError: write_mesh() got an unexpected keyword argument 'store_partition_info'

The function write_mesh in checkpointing.py is defined as

write_mesh(mesh: dolfinx.mesh.Mesh, filename: Path, engine: str = "BP4")

This is with adios4dolfinx installed via conda (v0.7.3).
When I try to install the main branch I get:

INFO: pip is looking at multiple versions of adios4dolfinx to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement fenics-dolfinx>=0.8.0.dev0 (from adios4dolfinx) (from versions: none)
ERROR: No matching distribution found for fenics-dolfinx>=0.8.0.dev0

It is on the main branch of adios4dolfinx.
You would need to use the main branch of DOLFINx as well, as I do not plan to backport these functionalities to 0.7.x.
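
For reference, the intended main-branch usage looks roughly like this (a sketch based on the pages linked above; the exact argument names and order may differ between versions, so treat them as assumptions rather than the definitive API):

import adios4dolfinx as a4dx
# Assumed main-branch keyword (see the linked docs): store the partition
# information alongside the mesh so that a later run with the same number of
# ranks can reuse it instead of repartitioning.
a4dx.write_mesh(domain, test_dir + '/domain.bp', store_partition_info=True)
# Reading the mesh back onto the stored partition is described in the
# "partitioned mesh" page linked above.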

Thank you Dokken,

I am having trouble installing dolfinx 0.8.0 from source due to conflicting dependencies.
When do you think adios4dolfinx + dolfinx 0.8.0 will be available in conda?

Meanwhile, is there any way to achieve the same thing by playing with the XDMF files, as in the MWE above?

thanks

I would just use the docker image from the dolfinx/dolfinx package on GitHub.

I would never assume that the boundary dofs are the same across runs. This is very fragile, as the distribution of ghosts might differ depending on the MPI communication. Could you explain why you would need this? As the function above computes the dofs, there shouldn't be a need to assume the exact same order for another run.

thank you for your answer.

During a first run, I create a mesh and compute a boundary element matrix with a given number of processors. Calculations are long, so I would like to restart the simulation using checkpoints.
In the second run, I would like to simply load the BEM matrix, which requires each processor to have the same DoFs as in the first run.
This was working just fine in dolfinx 0.6.0; did something change in dolfinx 0.7.3?
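
For concreteness, the restart step I have in mind is roughly the following (just a sketch: it assumes the assembled BEM matrix is a PETSc Mat called A, uses a PETSc binary viewer, and only makes sense if the DoF distribution is identical in both runs):

from petsc4py import PETSc

# First run: dump the assembled BEM matrix A (a PETSc Mat) to disk.
viewer = PETSc.Viewer().createBinary(test_dir + '/bem_matrix.dat', 'w', comm=comm)
A.view(viewer)
viewer.destroy()

# Restart run: load it back; the parallel row distribution must then match
# the DoF layout of this run, which is exactly what I am trying to guarantee.
viewer = PETSc.Viewer().createBinary(test_dir + '/bem_matrix.dat', 'r', comm=comm)
A = PETSc.Mat().load(viewer)
viewer.destroy()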

I installed docker by using pip.
When I try to pull the main branch of dolfinx, I get the error message:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I think that was just working by chance; it is not something we plan on supporting, as the distribution of dofs is up to the partitioner (ParMETIS, SCOTCH or KaHIP), and they are not always consistent across runs. Secondly, the ghost ordering might change from run to run, as ghosts are assigned based on the current communicator, and I do not think the ghost ordering in the index map is the same from run to run.
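
You can see the layout on each rank by inspecting the index map of the dofmap; a small sketch reusing Vp and comm from your MWE:

# Owned dofs come first in the local numbering; `ghosts` holds the global
# indices of the ghost dofs, and their order is exactly what may change
# between runs.
imap = Vp.dofmap.index_map
print(f"rank {comm.rank}: owned dofs = {imap.size_local}, ghosts = {imap.ghosts}", flush=True)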

oh, I see. Strange that it was working no matter the number of procs I used though. Good to know!

Would adios4dolfinx.write_mesh() with store_partition_info=True solve my problem, or can you already foresee problems with ghost nodes and so on? Would it be better to recompute the BEM matrix at every run?

I think you would get problems with the ordering of ghosts (even if the cells are on the same process).

I would recompute the BEM matrix at each run.

thank you for your replies! Have a nice evening

Hi Dokken,

could you please explain what the advantage of using adios4dolfinx.write_mesh() with store_partition_info=True is, if it does not ensure that the mesh and all the DoFs (including ghosts) are saved consistently between runs? What exactly does it do, and what is its intended purpose?

Also, is there any documentation on how dolfinx distributes DoFs (including ghosts) among processors at every run?

There are a variety of applications where one would like to load a mesh and apply some initial conditions to the exact same nodes, without interpolating the entire field each time (with BEM it's not even possible), so it seems strange to me that this is not possible.

thanks again

It is useful if you want to avoid repartitioning the mesh (as this can be somewhat expensive).

If I had to enforce the DoF ordering etc., it would require storing a lot more data.

You would have to read the source code, and probably also the topology creation code.

It is unclear to me how you want to store this data. As far as I can tell from your code, you have the underlying assumption of a P1-space, where degrees of freedom are associated with the vertices. In such a case it is possible to build a map from the global input vertices to the degrees of freedom on your process. However, note that this is not possible for most spaces.
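
For a P1 space, a minimal sketch of such a map (assuming dolfinx 0.7.x, a scalar P1 Lagrange space on a first-order mesh, and that the per-cell ordering of geometry nodes matches the per-cell ordering of the P1 dofs; the helper name is just for illustration):

import numpy as np

def p1_dof_to_input_vertex(V):
    # For each dof owned or ghosted on this process, return the global index of
    # the corresponding vertex in the original (input) mesh numbering.
    mesh = V.mesh
    tdim = mesh.topology.dim
    cmap = mesh.topology.index_map(tdim)
    num_cells = cmap.size_local + cmap.num_ghosts
    imap = V.dofmap.index_map
    dof_to_input = np.full(imap.size_local + imap.num_ghosts, -1, dtype=np.int64)
    x_dofmap = mesh.geometry.dofmap  # (num_cells, nodes_per_cell) array in 0.7.x
    input_idx = np.asarray(mesh.geometry.input_global_indices)
    for c in range(num_cells):
        # Assumption: for P1 on a first-order mesh, the i-th cell dof sits on
        # the i-th geometry node of the cell.
        dof_to_input[V.dofmap.cell_dofs(c)] = input_idx[x_dofmap[c]]
    return dof_to_input

Such a map could then be used to reorder saved vertex data to the current run's dof layout, but, as noted above, this only works for spaces whose dofs coincide with mesh vertices.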