I/O from XDMF/HDF5 files in dolfin-x

Dear community,

I’ve been searching a while for how to write and read a function from XDMF/HDF5 files in dolfin-x, but haven’t found a suitable example.

Consider the following MWE which first writes a function to a file and then (fails) to read back in the function from the file:

from dolfinx import *
from dolfinx.io import XDMFFile
from dolfinx.cpp.mesh import CellType
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

mesh = BoxMesh(comm, [np.array([0.0, 0.0, 0.0]),np.array([2.0, 1.0, 1.0])], [12, 12, 12],
               CellType.tetrahedron, dolfinx.cpp.mesh.GhostMode.none)

V = VectorFunctionSpace(mesh, ("DG", 0))
f = Function(V, name="fiber")

# write function to file
with XDMFFile(comm,'tmp.xdmf','w') as ofile:
    ofile.write_mesh(mesh)
    ofile.write_function(f)

encoding=XDMFFile.Encoding.HDF5

f2 = Function(V)

# read function from file
with XDMFFile(comm,'tmp.xdmf','r',encoding=encoding) as ifile:
    #f2 = ifile.read_function(V,"fiber")
    ifile.read_checkpoint(f2,"fiber",1)

The writing of the function and the mesh work, but the reading back in to another function fails. The read_checkpoint and the read_function seem to have been removed in dolfin-x? Is there any alternative?

What I could do is reading with h5py:

import h5py
with h5py.File('tmp.h5', 'r') as ifile:
    y = ifile['/Function/fiber/0']

however the output y is not a function and would have to be transformed/copied to a function (which could lack in parallel).

Hope someone has a nicer solution!

Best,
Marc

The write_checkpoint/read_checkpoint functions are under development, and will be added to dolfinx eventually.

Oh ok, thanks for the info. Is there any other chance of reading a function from a XDMF/HDF5 file? I do not need it as checkpoint, I just have a preprocessing step where I save a vector function to a file and want to read that back in in order to assign this vector to every element.

I haven’t tested that, but it should be possible.
To assign arrays in parallel, you can find the ownership range of the PETSc vector (u.vector), and use u.vector.setvalues local to assign the array.

I will let you know once checkpointing is back in fenics-x

Ok, I tried:

V_fiber = VectorFunctionSpace(mesh, ("DG", 0))
f0 = Function(V_fiber)

with h5py.File('tmp.h5', 'r') as ifile:
    y = ifile['/Function/fiber/0']
    
    f0.vector[:] = y

and it seemed to have worked in parallel, too! :slight_smile:

So guess I solved my problem.

Bw,
Marc

1 Like

Unfortunately, after a closer look, my solution did not work in parallel… :-/ I set up a new minimal example that reads in a txt file with vector values and stores them into a function. Writing the function to a file and visualizing it in Paraview yields different results serial vs. parallel.

Concretely, I’m creating some random vector data in a txt file and then want to assign it to a vector in a discontinuous function space.

from dolfinx import *
from dolfinx.io import XDMFFile
from dolfinx.cpp.mesh import CellType
from mpi4py import MPI
import numpy as np
from numpy import loadtxt
from ufl import sin, cos

comm = MPI.COMM_WORLD

mode = 'parallel' # serial, parallel

mesh = BoxMesh(comm, [np.array([0.0, 0.0, 0.0]),np.array([2.0, 1.0, 1.0])], [1, 1, 1],
               CellType.tetrahedron, dolfinx.cpp.mesh.GhostMode.none)

V = VectorFunctionSpace(mesh, ("DG", 0))

# generate data
filename = 'data.txt'
fl = open(filename, 'wt')

for i in range(int(V.dim()/3)):
    x, y, z = sin(i), cos(i), 1.0
    fl.write(str(x) + " " + str(y) + " " + str(z) + "\n")
fl.close()

# load data - V.dim() x 3 array
data = loadtxt('data.txt')

# roll out to 3*V.dim() x 1 array
data_flat = data.flatten()   

# function to be filled with data
f0 = Function(V, name="fiber")

for i in range(len(data_flat)):
    if i >= f0.vector.getOwnershipRange()[0] and i < f0.vector.getOwnershipRange()[-1]:
        index_loc = i-f0.vector.getOwnershipRange()[0]
        f0.vector.setValuesLocal(index_loc,data_flat[i])
        print("i = %i  rank = %i  val = %.4f" % (i,comm.rank,f0.vector[i]))

#f0.vector[:] = data_flat

with XDMFFile(comm,'tmp_'+mode+'.xdmf','w') as outfile:
    outfile.write_mesh(mesh)
    outfile.write_function(f0)

The print of array index, rank and function value is in my view consistent serial vs. parallel, but the field in Paraview looks significantly different, so I assume the write_function is causing the problem.
The commented alternative line f0.vector[:] = data_flat to assign the vector yields the same results: right in serial, wrong in parallel.

Does anyone have an idea what goes wrong here?

Thx!
Marc

As the DG field is interpolated to a CG 1 function space when visualized in paraview, it does not surprise me that the visualization of it differs in serial and parallel, as the values at the vertices are not unique. We are working on reintroducing proper DG visualization again.

Yes, but I experienced in a more complex example that it’s not only the visualization, but indeed the results that differ. I’m using this vector f0 in an anisotropic constitutive law, and depending on the number of processes, the outcome is different. :confused:

So what you could do, is save the cell center of each cell in front of the values in your text file(V.tabulate_dof_coordinates() should supply these). Then , when you then read it in, loop over all cells to and find the line with the matching cell midpoint, which should correspond to the index in the DG function array

Thanks for the hint. I tried this out, but since I then have to loop over cells and inside this loop again over my txt file, this gets infeasible for large problems.
I guess the only solution would be to re-implement the read_function function in order to read from the XDMF file.

2 Likes

Any updates on this? Did you managed to reimplement it? I’m facing a similar issue right now.

Hi Tomaubier,

I’ve implemented a read function that reads a nodal vector field f, provided together with the nodal coordinates x, y, z in a text file with the order

fx1 fy1 fz1 x1 y1 z1
fx2 fy2 fz2 x2 y2 z2
...
fxn fyn fzn xn yn zn

with the following function:

def readfunction(self, f, V, datafile, normalize=False):
    
    # block size of vector
    bs = f.vector.getBlockSize()
    
    # load data and coordinates
    data = np.loadtxt(datafile,usecols=(np.arange(0,bs)),ndmin=2)
    coords = np.loadtxt(datafile,usecols=(-3,-2,-1)) # last three always are the coordinates
    
    # new node coordinates (dofs might be re-ordered in parallel)
    # in case of DG fields, these are the Gauss point coordinates
    co = V.tabulate_dof_coordinates()

    # index map
    im = V.dofmap.index_map.global_indices()

    tol = 1.0e-8
    tolerance = int(-np.log10(tol))

    # since in parallel, the ordering of the dof ids might change, so we have to find the
    # mapping between original and new id via the coordinates
    ci = 0
    for i in im:
        
        ind = np.where((np.round(coords,tolerance) == np.round(co[ci],tolerance)).all(axis=1))[0]
        
        # only write if we've found the index
        if len(ind):
            
            if normalize:
                norm_sq = 0.
                for j in range(bs):
                    norm_sq += data[ind[0],j]**2.
                norm = np.sqrt(norm_sq)
            else:
                norm = 1.
            
            for j in range(bs):
                f.vector[bs*i+j] = data[ind[0],j] / norm
        
        ci+=1

    f.vector.assemble()
    
    # update ghosts
    f.vector.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

That works for me very efficiently in parallel also for large problems.

For other things such as restarting a simulation, I’m using PETSc viewer to write and read vectors, e.g.

viewer = PETSc.Viewer().createMPIIO('name.dat', 'w', self.comm)
variable.vector.view(viewer)

and

viewer = PETSc.Viewer().createMPIIO('name.dat', 'r', self.comm)
variable.vector.load(viewer)
variable.vector.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

Hope this helps!

Best,
Marc

3 Likes

Awesome! thanks for your responsiveness!

I’d like to follow up on that, because this feature seems to already exist in PETSc.

According to petsc4py documentation there exists two ways to create a Viewer that interacts with hdf5 (here with the createHDF5 method and here). This probably leads to the relevant PETSc handle (with its example).

However, any attempt to use these features inside the dolfinx docker container fail. Namely :

> sudo docker run -itv ~/my_folder:/home/shared -w /home/shared/ --rm dolfinx/dolfinx
root@whatever: python3 -c "from petsc4py import PETSc; PETSc.Viewer().createHDF5('dummy.h5')"

Produces an error :

Traceceback (most recent call last):
  File "<string>", line 1, in <module>
  File "PETSc/Viewer.pyx", line 182, in petsc4py.PETSc.Viewer.createHDF5
petsc4py.PETSc.Error: error code 86
[0] PetscViewerSetType() at /usr/local/petsc/src/sys/classes/viewer/interface/viewreg.c:442
[0] Unknown type. Check for miss-spelling or missing package: https://petsc.org/release/install/install/#external-packages
[0] Unknown PetscViewer type given: hdf5

I’ve emailed petsc-users about this, and it seems that PETSc has to be configured with the --with-hdf5 flag in order to take advantage of those features. I’ve tried to run the following commands inside the docker container :

root@whatever: cd /usr/local/petsc/
root@whatever: ./configure --with-hdf5 --with-petsc4py --force
root@whatever: make all

This takes time but produces same traceback as above. I’m trying to change the dockerfile to have a dolfinx image with hdf5 now, adding the proper flags in the PETSc section.

My motivation is that this seems the simplest way to obtain parallel reading and writing of files that can be understood by dolfinx in either real or complex mode, and my use case requires me to shift between the two.

In this case, the elegant viewer solution proposed by @marchirschvogel fails. I don’t know how to write a parallel code that would write the file in the correct format for readFunction.

If other people have run into this any insights are welcome.

Ok I’ve given up some time ago on making HDF5 work, MPIIO binary files is good enough for me using :

viewer = PETSc.Viewer().createMPIIO('name.dat', 'w', self.comm)
variable.vector.view(viewer)

And :

viewer = PETSc.Viewer().createMPIIO('name.dat', 'r', self.comm)
variable.vector.load(viewer)
variable.vector.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

These snippets worked for me until dolfinx 0.5.0. I can’t seem to read the files right, it always produces garbage. It seems to me that there is no read/write_checkpoint in dolfinx.io.XDMFFile yet, so I’m curious as to how to save a Function object to read later (in parallel).

Surely I’m not the only one with this issue ?

For what it’s worth, the h5py package should in principle be able to read HDF5 files in parallel, if it’s been built using libhdf5-parallel.

Yes but there are multiple downsides to this approach, it requires an additional package, it’s not so straightforward to build in the container, and the only big positive thing that I can see is that it could result in a file that’s not processor-number dependent, whereas my current solution fails if a 10 proc files is read by 4 procs.

Building h5py in a docker container (based of dolfinx/dolfinx) is straightforward with the command:

HDF5_MPI="ON" HDF5_DIR="/usr/local" pip3 install --no-cache-dir --no-binary=h5py h5py --upgrade
1 Like

I’m building h5py as suggested, then trying a naive approach to save my files :

viewer = pet.Viewer().createHDF5("file.h5", 'w', comm)
function.vector.view(viewer)

However, this fails and produces :

petsc4py.PETSc.Error: error code 86
[9] PetscViewerSetType() at /usr/local/petsc/src/sys/classes/viewer/interface/viewreg.c:435
[9] Unknown type. Check for miss-spelling or missing package: https://petsc.org/release/install/install/#external-packages
[9] Unknown PetscViewer type given: hdf5

I’m going to reach out to the PETSc community about this. My pydoc shows that I’m using existing functions.

Overall though, I’m surprised it’s such a pain to read and write results. Someone’s bound to have had this issue before - there’s a need for a read/write_checkpoint-like…

It sounds like your PETSc was not configured to use hdf5. Can you confirm if it was? e.g. look for references in

$ objdump -p /usr/lib/x86_64-linux-gnu/libpetsc.so | grep hdf5

(or the equivalent path on your system)

The PETSc build needs to be configured to include hdf5, e.g. configured with --with-hdf5-include=$(HDF5_INCLUDE_DIR) --with-hdf5-lib="$(HDF5_LIBS)"