Parallel function saving and reading it back using PETSc or adios4dolfinx

Hi there,

I have a high order function solution (with MPI) as a PETSc vector and I would like to save it and read it back for postprocessing using single process. I have tried using adios4dolfinx but I could not make it working.

I prepared a MWE to show my attempt below;

The below code nicely saves the dolfinx function as a PETSc vector.

import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace,
                         locate_dofs_geometrical)
from dolfinx.io import VTXWriter
from dolfinx.mesh import create_unit_square, locate_entities
from ufl import VectorElement, FiniteElement
from mpi4py import MPI
from petsc4py import PETSc

mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

def Omega_0(x):
    return x[1] <= 0.3

def Omega_1(x):
    return x[1] >= 0.7

kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)

@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled

dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1

name = 'test'

#save petsc vector
viewer = PETSc.Viewer().createBinary(name+".dat", mode=PETSc.Viewer.Mode.WRITE, comm=MPI.COMM_WORLD)
kappa.vector.view(viewer)

with VTXWriter(mesh.comm, name+".bp", [kappa], engine="BP4") as vtx:
        vtx.write(0.0)

Now, I read that PETSc vector with the code below;

from dolfinx.fem import (Function, FunctionSpace)
from dolfinx.io import VTXWriter
from dolfinx.mesh import create_unit_square
from ufl import FiniteElement
from mpi4py import MPI
from petsc4py import PETSc

mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

kappa = Function(Q)

# read the petsc vector back
name = 'test'
viewer = PETSc.Viewer().createBinary(name+".dat", mode=PETSc.Viewer.Mode.READ, comm=MPI.COMM_WORLD)
loaded_vector = PETSc.Vec().load(viewer)

kappa.vector.setArray(loaded_vector.array)

# test the reading 
fname = 'test_petsc'
with VTXWriter(mesh.comm, fname+".bp", [kappa], engine="BP4") as vtx:
        vtx.write(0.0)

When I run the first code with 1 proc and read the PETSc vector with the second code using 1 proc, it gives the correct result;

However, when I run the first code in parallel using 2 procs and read the PETSc vector with 1 proc, it gives corrupted function;

I am aware that dof sharing over multiple processes is the problem, but is there anyone who can provide a clue, or different solution strategy? My aim is to save the result of heavy simulation in parallel and read it back to postprocess it using single proc.

Thank you for the answers in advance!

As the developer of adios4dolfinx, I am curious as to what you could make work with it?

You cannot simply store the solution vector without the corresponding mesh data (dofmap), as when your mesh is read it, it is partitioned, and cells (and in turn dof maps) are redistributed.

Could you please provide a code which reads the test.bp file using adios4dolfinx? I will run it in my computer and will say the error I am facing with.

You cannot use the test.bp file, as it doesn’t contain sufficient amount of data to read in.
You need to use adios4dolfinx to both write and read the checkpoint.
For more information as to why, see:
http://jsdokken.com/checkpointing-presentation/#/
and

for why one cannot use VTXWriter for checkpointing.

OK, then could you provide a MWE to write and read the data with adios4dolfinx?

Tons of examples in the tests: adios4dolfinx/tests/test_checkpointing.py at main · jorgensd/adios4dolfinx · GitHub

It is Also used in rbnicsx
https://github.com/search?q=repo%3ARBniCS%2FRBniCSx%20adios4dolfinx&type=code

1 Like

Thank you for sharing the link for RBniCSx.

When I use the code you linked, it again gives the corrupted function. MWEs are;

import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace, locate_dofs_geometrical)
from dolfinx.mesh import create_unit_square, locate_entities
from ufl import FiniteElement
from mpi4py import MPI

mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

def Omega_0(x):
    return x[1] <= 0.3

def Omega_1(x):
    return x[1] >= 0.7

kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)

@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled

dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1
kappa.x.scatter_forward()

import adios4dolfinx
import os
import pathlib

import adios4dolfinx
import dolfinx.fem
import dolfinx.io

def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)

    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)

    # Export for checkpointing
    adios4dolfinx.write_function(function, pathlib.Path(checkpointing_directory), "bp4")

export_function(kappa, "adios_dir", "test_export")

for exporting the function. That generates “test_export.bp” which is;

Then when I try to read this with;

from dolfinx.fem import (Function, FunctionSpace)
from dolfinx.mesh import create_unit_square
from ufl import FiniteElement
from mpi4py import MPI

mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

kappa = Function(Q)

import os
import pathlib

import adios4dolfinx
import dolfinx.fem

def import_function(
    function_space: dolfinx.fem.FunctionSpace, directory: str, filename: str
) -> dolfinx.fem.Function:

    function = dolfinx.fem.Function(function_space)
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    adios4dolfinx.read_function(function, pathlib.Path(checkpointing_directory), "bp4")
    return function

def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)

    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)

    # Export for checkpointing
    adios4dolfinx.write_function(function, pathlib.Path(checkpointing_directory), "bp4")

from MWE_adios import export_function

kappa = import_function(Q, "adios_dir","test_export")

name = 'test_import'

export_function(kappa, "adios_dir", name)

then export it back for testing, it gives;

Do they work in your installation nicely?

You should read the mesh from the checkpoint as well, not just the function data, as they are dependent on eachother.

What do you exactly mean by that?

Also i did observe very weird behaviour. I did try writing/reading the mesh using adios4dolfinx with the functions here.

For writing the data;

import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace, locate_dofs_geometrical)
from dolfinx.mesh import create_unit_square, locate_entities
from ufl import FiniteElement
from mpi4py import MPI
import dolfinx.fem
import dolfinx.io
import adios4dolfinx
import os
import pathlib



mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
adios4dolfinx.write_mesh(mesh, "mesh")

v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

def Omega_0(x):
    return x[1] <= 0.3

def Omega_1(x):
    return x[1] >= 0.7

kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)

@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled

dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1
kappa.x.scatter_forward()

def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)

    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)

    # Export for checkpointing
    adios4dolfinx.write_function(function, pathlib.Path(checkpointing_directory), "bp4")

export_function(kappa, "adios_dir", "test_export")

nicely writes the data serial/paralel. Then when I read mesh and function with;

from dolfinx.fem import (Function, FunctionSpace)
from dolfinx.mesh import create_unit_square
from ufl import FiniteElement
from mpi4py import MPI
import adios4dolfinx
# mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
import dolfinx

mesh = adios4dolfinx.read_mesh(MPI.COMM_WORLD,"mesh", engine='BP4', ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 1)
Q = FunctionSpace(mesh, v_cg)

kappa = Function(Q)

import os
import pathlib

import adios4dolfinx
import dolfinx.fem

def import_function(
    function_space: dolfinx.fem.FunctionSpace, directory: str, filename: str
) -> dolfinx.fem.Function:

    function = dolfinx.fem.Function(function_space)
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    adios4dolfinx.read_function(function, pathlib.Path(checkpointing_directory), "bp4")
    return function

def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(directory, filename + ".bp", ".checkpoint")
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)

    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)

    # Export for checkpointing
    adios4dolfinx.write_function(function, pathlib.Path(checkpointing_directory), "bp4")

from MWE_adios import export_function

kappa = import_function(Q, "adios_dir","test_export")

name = 'test_import'

export_function(kappa, "adios_dir", name)

When I run mpirun -np 1 python3 adios_read.py for the first time, I get;

Then when I run mpirun -np 1 python3 adios_read.py again without doing anything, I get successful import;

I am confused at this point. To read the function nicely, do I need to run reading script twice?

Also when I try to export/import degree 2 elements, I get an error during import;

Traceback (most recent call last):
  File "adios_read.py", line 48, in <module>
    kappa = import_function(Q, "adios_dir","test_export")
  File "adios_read.py", line 28, in import_function
    adios4dolfinx.read_function(function, pathlib.Path(checkpointing_directory), "bp4")
  File "/home/ee331/Dev/Venvs/v072complex/lib/python3.8/dist-packages/adios4dolfinx/checkpointing.py", line 248, in read_function
    owned_values = send_dofmap_and_recv_values(
  File "/home/ee331/Dev/Venvs/v072complex/lib/python3.8/dist-packages/adios4dolfinx/comm_helpers.py", line 113, in send_dofmap_and_recv_values
    values_to_distribute[i] = values[dofmap_offsets[l_cell] + pos]
IndexError: index 601 is out of bounds for axis 0 with size 600

The issues are your paths. You are storing the checkpoint inside the other BP file. Please change the codes to:

import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace, locate_dofs_geometrical)
from dolfinx.mesh import create_unit_square, locate_entities
from ufl import FiniteElement
from mpi4py import MPI
import dolfinx.fem
import dolfinx.io
import adios4dolfinx
import os
import pathlib


def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(
        directory, filename + "_checkpoint.bp")
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)
    print(visualization_directory)
    print(checkpointing_directory)
    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)
    # Export for checkpointing
    adios4dolfinx.write_function(
        function, pathlib.Path(checkpointing_directory), "bp4")


mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
adios4dolfinx.write_mesh(mesh, "mesh")

v_cg = FiniteElement("CG", mesh.ufl_cell(), 2)
Q = FunctionSpace(mesh, v_cg)


def Omega_0(x):
    return x[1] <= 0.3


def Omega_1(x):
    return x[1] >= 0.7


kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)


@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled


dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1
kappa.x.scatter_forward()

export_function(kappa, "adios_dir", "test_export")

and

import dolfinx.fem
import pathlib
import os
from dolfinx.fem import (Function, FunctionSpace)
from dolfinx.mesh import create_unit_square
from ufl import FiniteElement
from mpi4py import MPI
import adios4dolfinx
# mesh = create_unit_square(MPI.COMM_WORLD, 10, 10)
import dolfinx


def import_function(
    function_space: dolfinx.fem.FunctionSpace, directory: str, filename: str
) -> dolfinx.fem.Function:

    function = dolfinx.fem.Function(function_space)
    checkpointing_directory = os.path.join(
        directory, filename + ".bp")
    adios4dolfinx.read_function(
        function, pathlib.Path(checkpointing_directory), "bp4")
    return function


def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(
        directory, filename + "_checkpoint.bp")
    os.makedirs(checkpointing_directory, exist_ok=True)

    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)

    # Export for checkpointing
    adios4dolfinx.write_function(
        function, pathlib.Path(checkpointing_directory), "bp4")


mesh = adios4dolfinx.read_mesh(
    MPI.COMM_WORLD, "mesh", engine='BP4', ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 2)
Q = FunctionSpace(mesh, v_cg)

kappa = Function(Q)


kappa = import_function(Q, "adios_dir", "test_export_checkpoint")

name = 'test_import'

export_function(kappa, "adios_dir", name)

This is because you should remove the previous attempts to write to file, adios2 isn’t always good at clearing out all the data.

I think @Ekrem_Ekici copied the use of the .checkpoint hidden folder from the way I use adios4dolfinx in RBniCSx. @dokken can you comment on why having the checkpoint in a subfolder would be causing issues? My CI is running correctly, and I think I had manually checked the results once with paraview.

This is just from what I observed by running the code.

I dont know all the internals of how adios read its folder structure, But I imagine having nested files might cause issues.

@francesco-ballarin @dokken I got a more clean test here. First, I just generate the mesh with single proc;

# adios_mesh.py
from dolfinx.mesh import create_unit_square
import dolfinx
import adios4dolfinx
from mpi4py import MPI
import shutil

mesh = create_unit_square(MPI.COMM_WORLD, 30, 30, ghost_mode=dolfinx.mesh.GhostMode.shared_facet)
shutil.rmtree("mesh", ignore_errors=True)
adios4dolfinx.write_mesh(mesh, "mesh")

Then using this mesh, I make a function and export it with;

# adios_write.py
import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace, locate_dofs_geometrical)
from dolfinx.mesh import locate_entities
from ufl import FiniteElement
from mpi4py import MPI
import dolfinx.fem
import dolfinx.io
import adios4dolfinx
import os
import pathlib
import shutil

def export_function(function: dolfinx.fem.Function, directory: str, filename: str) -> None:

    comm = function.function_space.mesh.comm
    visualization_directory = os.path.join(directory, filename + ".bp")
    checkpointing_directory = os.path.join(
        directory, filename + "_checkpoint.bp")

    shutil.rmtree(visualization_directory, ignore_errors=True)
    shutil.rmtree(checkpointing_directory, ignore_errors=True)
    os.makedirs(visualization_directory, exist_ok=True)
    os.makedirs(checkpointing_directory, exist_ok=True)
    print(visualization_directory)
    print(checkpointing_directory)
    # Export for visualization
    with dolfinx.io.VTXWriter(comm, visualization_directory, function, "bp4") as vtx_file:
        vtx_file.write(0)
    # Export for checkpointing
    adios4dolfinx.write_function(
        function, pathlib.Path(checkpointing_directory), "bp4")

mesh = adios4dolfinx.read_mesh(MPI.COMM_WORLD, "mesh", engine='BP4', ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 2)
Q = FunctionSpace(mesh, v_cg)

def Omega_0(x):
    return x[1] <= 0.3

def Omega_1(x):
    return x[1] >= 0.7

kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)

@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled

dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1

export_function(kappa, "adios_dir", "test_export")

Lastly, I test the import of exported function with;

# assert.py
import numba
import numpy as np
from dolfinx.fem import (Function, FunctionSpace, locate_dofs_geometrical)
from dolfinx.mesh import create_unit_square, locate_entities
from ufl import FiniteElement
from mpi4py import MPI
import dolfinx.fem
import dolfinx.io
import adios4dolfinx
import os
import pathlib

mesh = adios4dolfinx.read_mesh(MPI.COMM_WORLD, "mesh", engine='BP4', ghost_mode=dolfinx.mesh.GhostMode.shared_facet)

v_cg = FiniteElement("CG", mesh.ufl_cell(), 2)
Q = FunctionSpace(mesh, v_cg)

def Omega_0(x):
    return x[1] <= 0.3

def Omega_1(x):
    return x[1] >= 0.7

kappa = Function(Q)
cells_0 = locate_entities(mesh, mesh.topology.dim, Omega_0)
cells_1 = locate_entities(mesh, mesh.topology.dim, Omega_1)

@numba.njit
def unroll_dofmap(dofs, bs):
    dofs_unrolled = np.zeros(bs*len(dofs), dtype=np.int32)
    for i, dof in enumerate(dofs):
        for b in range(bs):
            dofs_unrolled[i*bs+b] = dof*bs+b
    return dofs_unrolled

dofs_0 = locate_dofs_geometrical(Q, Omega_0)
dofs_1 = locate_dofs_geometrical(Q, Omega_1)
dofmap_bs = Q.dofmap.bs
kappa.x.array[unroll_dofmap(dofs_0, dofmap_bs)] = 1
kappa.x.array[unroll_dofmap(dofs_1, dofmap_bs)] = 0.1

def import_function(function_space: dolfinx.fem.FunctionSpace, directory: str, filename: str) -> dolfinx.fem.Function:

    function = dolfinx.fem.Function(function_space)
    checkpointing_directory = os.path.join(directory, filename + ".bp")
    adios4dolfinx.read_function(function, pathlib.Path(checkpointing_directory), "bp4")
    return function

kappa_import = import_function(Q, "adios_dir", "test_export_checkpoint")

print("original", np.count_nonzero(kappa.vector.array))
print("imported", np.count_nonzero(kappa_import.vector.array))

assert np.allclose(kappa.vector.array, kappa_import.vector.array)

For mpirun -np 1 python3 adios_write.py && python3 assert.py, it passes;

original 2318
imported 2318

For mpirun -np 2 python3 adios_write.py && python3 assert.py, it fails with;

original 2318
imported 2324
Traceback (most recent call last):
  File "assert.py", line 55, in <module>
    assert np.allclose(kappa.vector.array, kappa_import.vector.array)
AssertionError

For mpirun -np 3 python3 adios_write.py && python3 assert.py, it fails with;

original 2318
imported 2307
Traceback (most recent call last):
  File "assert.py", line 55, in <module>
    assert np.allclose(kappa.vector.array, kappa_import.vector.array)
AssertionError

For mpirun -np 4 python3 adios_write.py && python3 assert.py, it fails with;

original 2318
imported 2258
Traceback (most recent call last):
  File "assert.py", line 55, in <module>
    assert np.allclose(kappa.vector.array, kappa_import.vector.array)
AssertionError

and so on.

There is something wrong with exporting in parallel but I could not figured out why.

The mesh has to be associated with the checkpoint file. You are currently not doing that.
Note that I just made a release (v0.7.2 as of 10 minutes ago) supporting multi-function and time-stepping in checkpoints.

I will try to illustrate how I would use adios4dolfinx on a slightly smaller example, using ipythonparallel to use different processes to read and write.
See: Demo of capabilities · Issue #54 · jorgensd/adios4dolfinx · GitHub
or

"""Usage of ADIOS4DOLFINx
Author: Jørgen S. Dokken
SPDX-License-Identifier: MIT
"""

import ipyparallel as ipp
from pathlib import Path
from typing import Tuple, Callable
import numpy as np


def f(x):
    return x[0]**2+x[1]


checkpoint_0 = Path("checkpoint_0.bp")
checkpoint_1 = Path("checkpoint_1.bp")

el = ("Lagrange", 3)


def create_mesh(path: Path, N: int):
    """
    Given a path and a number of elements in each direction, create a mesh and write it to file
    """
    from mpi4py import MPI
    import dolfinx
    import adios4dolfinx
    mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, N, N)
    adios4dolfinx.write_mesh(mesh, path, engine="BP4")
    print(
        f"Mesh written to {path} on comm {mesh.comm.rank+1}/{mesh.comm.size}", flush=True)


cluster = ipp.Cluster(engines="mpi", n=3)
rc = cluster.start_and_connect_sync()

# This has started to processes that can execute code with a MPI communicator.
# We create a mesh and write a checkpoint with 3 processors

query = rc[:].apply_async(create_mesh, checkpoint_0, 10)
query.wait()
assert query.successful(), query.error
for msg in query.stdout:
    print(msg)
cluster.stop_cluster_sync()


# Next we create a new cluster with 2 processors and read the checkpoint

def read_mesh_and_create_function(path: Path, path_out: Path, el: Tuple[str, str], f: Callable[[np.ndarray], np.ndarray]):
    """
    Given a mesh input file, read the mesh and create a function and write it to a new file.
    The function is put in the function space defined by el and f is the function to interpolate
    """
    from mpi4py import MPI
    import dolfinx
    import adios4dolfinx
    mesh = adios4dolfinx.read_mesh(
        MPI.COMM_WORLD, path, engine="BP4", ghost_mode=dolfinx.mesh.GhostMode.shared_facet)
    print(
        f"Mesh read from {path} on comm {mesh.comm.rank+1}/{mesh.comm.size}", flush=True)
    V = dolfinx.fem.functionspace(mesh, el)
    u = dolfinx.fem.Function(V)
    u.interpolate(f)
    adios4dolfinx.write_mesh(mesh, path_out, engine="BP4")
    adios4dolfinx.write_function(u, path_out, engine="BP4")
    print(
        f"Function written to {path_out} on comm {mesh.comm.rank+1}/{mesh.comm.size}", flush=True)


cluster_2 = ipp.Cluster(engines="mpi", n=2)
rc_2 = cluster_2.start_and_connect_sync()

query_2 = rc_2[:].apply_async(
    read_mesh_and_create_function, checkpoint_0, checkpoint_1, el, f)
query_2.wait()
assert query_2.successful(), query_2.error
for msg in query_2.stdout:
    print(msg)
cluster_2.stop_cluster_sync()


def read_and_compare(path, el: Tuple[str, str], f: Callable[[np.ndarray], np.ndarray]):
    """
    Read in function from file and compare with analytical input solution
    """
    from mpi4py import MPI
    import dolfinx
    import adios4dolfinx
    import numpy as np
    mesh = adios4dolfinx.read_mesh(
        MPI.COMM_WORLD, path, engine="BP4", ghost_mode=dolfinx.mesh.GhostMode.shared_facet)
    print(
        f"Mesh read from {path} on comm {mesh.comm.rank+1}/{mesh.comm.size}", flush=True)
    V = dolfinx.fem.functionspace(mesh, el)
    u_ex = dolfinx.fem.Function(V)
    u_ex.interpolate(f)

    u = dolfinx.fem.Function(V)
    adios4dolfinx.read_function(u, path, engine="BP4")
    print(
        f"Function read from {path} on comm {mesh.comm.rank+1}/{mesh.comm.size}")
    np.testing.assert_allclose(u.x.array, u_ex.x.array, atol=1e-15)
    print(f"Assertion passed on comm {mesh.comm.rank+1}/{mesh.comm.size}")


# Finally, we read the mesh and function on 4 processors and compare it with the analytical solution
cluster_3 = ipp.Cluster(engines="mpi", n=4)
rc_2 = cluster_3.start_and_connect_sync()

query_3 = rc_2[:].apply_async(read_and_compare, checkpoint_1, el, f)
query_3.wait()
assert query_3.successful(), query_3.error
for msg in query_3.stdout:
    print(msg)
cluster_3.stop_cluster_sync()

Returning

Mesh written to checkpoint_0.bp on comm 1/3

Mesh written to checkpoint_0.bp on comm 2/3

Mesh written to checkpoint_0.bp on comm 3/3

Mesh read from checkpoint_0.bp on comm 1/2
Function written to checkpoint_1.bp on comm 1/2

Mesh read from checkpoint_0.bp on comm 2/2
Function written to checkpoint_1.bp on comm 2/2

Mesh read from checkpoint_1.bp on comm 1/4
Function read from checkpoint_1.bp on comm 1/4
Assertion passed on comm 1/4

Mesh read from checkpoint_1.bp on comm 2/4
Function read from checkpoint_1.bp on comm 2/4
Assertion passed on comm 2/4

Mesh read from checkpoint_1.bp on comm 3/4
Function read from checkpoint_1.bp on comm 3/4
Assertion passed on comm 3/4

Mesh read from checkpoint_1.bp on comm 4/4
Function read from checkpoint_1.bp on comm 4/4
Assertion passed on comm 4/4

Thank you, your demo file gives the error below;

Mesh written to checkpoint_0.bp on comm 1/3

Mesh written to checkpoint_0.bp on comm 2/3

Mesh written to checkpoint_0.bp on comm 3/3

Mesh read from checkpoint_0.bp on comm 1/2
Function written to checkpoint_1.bp on comm 1/2

Mesh read from checkpoint_0.bp on comm 2/2
Function written to checkpoint_1.bp on comm 2/2

Traceback (most recent call last):
  File "demo.py", line 114, in <module>
    assert query_3.successful(), query_3.error
AssertionError: [<RemoteError[ ]:Exception(MPI_ERR_TRUNCATE: message truncated)>, <RemoteError[ ]:Exception(MPI_ERR_TRUNCATE: message truncated)>, <RemoteError[ ]:Exception(MPI_ERR_TRUNCATE: message truncated)>, <RemoteError[ ]:Exception(MPI_ERR_TRUNCATE: message truncated)>]
Stopping cluster <Cluster(cluster_id='1707322345-ftlf', profile='default', controller=<running>, engine_sets=['1707322346'])>

I am not sure if my mpi4py configuration is right for adios4dolfinx. Is there any mpi version requirement in adios4dolfinx?

Is it possible that you are on a machine (or cloud instance, or whatever) with less than 4 cores?

I do have 16 cores in my system.

@darcy:~$ nproc
16

Do my tests pass in your system @francesco-ballarin ?

Because I use your code from RBniCSx.

I run the latest code that @dokken shared, and it’s working on my system.

I’ll update RBniCSx tomorrow, because when he wrote

The mesh has to be associated with the checkpoint file. You are currently not doing that.

that was actually my fault :wink:

Could you try to change the number of processes in the latter cluster to 1,2 or 3?
Ie change n in