How to efficiently interface FEniCSx with PyG

I am using FEniCSx to generate a PyG graph dataset. My aim is to train a graph neural network on cylinder-flow or fluidic-pinball data.
I am having trouble efficiently converting FEniCSx simulation results to the required data structure.

My FEniCSx implementation outputs simulation results as a dolfinx.fem.Function, and I must convert its vector output to the following format:

  • data.x: Node feature matrix with shape [num_nodes, num_node_features] (node features will be x and y velocity and pressure)
  • data.edge_index: Graph connectivity (pairs of connected nodes) in COO format with shape [2, num_edges] and type torch.long
  • data.edge_attr: Edge feature matrix with shape [num_edges, num_edge_features]
  • data.y: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, *] or graph-level targets of shape [1, *]
  • data.pos: Node position matrix with shape [num_nodes, num_dimensions]

I have a working implementation, but I find it very naive and it is very slow for larger meshes:

I) After computing the FEniCSx solution stored in up_temp,

I run this code to aggregate the node features (u_x, u_y, p). I only store results for P1 nodes; velocity at the second-order (P2) nodes is not stored, which lets me avoid handling missing data.

(u_f, p_f) = (up_temp.sub(0).collapse(), up_temp.sub(1).collapse())
##################################
pyvista_cells, cell_types, coordinates = dolfinx.plot.create_vtk_mesh(u_f.function_space)
values = u_f.x.array.reshape(coordinates.shape[0], u_f.function_space.dofmap.index_map_bs)

#################################
  
pyvista_cells, cell_types, coordinates_p = dolfinx.plot.create_vtk_mesh(p_f.function_space)
values_p = p_f.x.array.reshape(coordinates_p.shape[0], p_f.function_space.dofmap.index_map_bs)

features_oseen = np.concatenate((coordinates_p[:, :2], values_p), axis=1)
  
u_proj = np.zeros((coordinates_p.shape[0],2))
  
for i in range(coordinates.shape[0]):
    for j in range(coordinates_p.shape[0]) :
        if coordinates[i,0] == coordinates_p[j,0] and coordinates[i,1] == coordinates_p[j,1]:
            u_proj[j,:] = values[i,:]
  
feature = np.concatenate((features_oseen, u_proj), axis=1)

The feature matrix holds (2d coordinates, 2d velocity and pressure) for each node.
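As an aside on the coordinate-matching double loop above: a hashed lookup over rounded coordinates brings the cost down from O(N·M) to roughly O(N + M). A minimal sketch with small made-up arrays standing in for `coordinates`, `values`, and `coordinates_p` from the post (the rounding tolerance is an assumption, since exact coordinate equality between two function spaces is not guaranteed in general):

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the post: velocity-space node
# coordinates, velocity values at those nodes, and P1 (pressure) coordinates.
coordinates = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]])
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
coordinates_p = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])

# Build a lookup from rounded (x, y) to the row index in the velocity arrays;
# rounding guards against tiny floating-point differences between the spaces.
lookup = {tuple(np.round(c, 10)): i for i, c in enumerate(coordinates[:, :2])}

# For every P1 node, fetch the matching velocity row in O(1) instead of
# scanning all velocity nodes.
idx = np.array([lookup[tuple(np.round(c, 10))] for c in coordinates_p[:, :2]])
u_proj = values[idx]
```

Each `u_proj` row is the velocity at the corresponding P1 node, so the concatenation step stays unchanged.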

II) Having our nodes and node features, I create the adjacency list in the following way:

    connect = np.copy(mesh.topology.connectivity(2,0).array)

    connect = connect.reshape((int(connect.shape[0]/3),3)) # this gives me an array of triangles : [[node1,node2,node3],[....]....]

    edges = np.zeros((2, 0)) # start empty; np.zeros((2, 1)) would leave a spurious (0, 0) edge

    for i in connect : # loop over triangles
        for j in i :  # loop over nodes of given triangle
            for k in i :  # loop over nodes of given triangle

                if k != j : # do not add self connections

                    # a convoluted way to detect if the current edge has already been stored (because triangles share edges)

                    temp_edge = np.zeros((2,1))
                    temp_edge[0] = j
                    temp_edge[1] = k

                    temp_list = np.argwhere((edges[0,:]-temp_edge[0])==0) 

                    flag = 0

                    for l in temp_list :
                        if edges[1,int(l)] == temp_edge[1] :

                            flag += 1

                    if flag == 0:

                        edges = np.concatenate((edges,temp_edge), axis = 1)

    #################################
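The edge-building triple loop above can also be vectorized: collect all three edges of every triangle, sort each pair so both orientations collapse to one row, and deduplicate with np.unique. A sketch with a hypothetical two-triangle connectivity array in place of the real one:

```python
import numpy as np

# Hypothetical (n_triangles, 3) connectivity; in the post this comes from
# mesh.topology.connectivity(2, 0) reshaped to triangles.
connect = np.array([[0, 1, 2], [1, 2, 3]])

# Each triangle contributes its three edges.
tri_edges = np.concatenate([connect[:, [0, 1]],
                            connect[:, [1, 2]],
                            connect[:, [0, 2]]], axis=0)

# Sort each edge so (a, b) and (b, a) become the same row, then deduplicate.
undirected = np.unique(np.sort(tri_edges, axis=1), axis=0)

# PyG expects a directed [2, num_edges] array, so append the reversed edges.
edges = np.concatenate([undirected, undirected[:, ::-1]], axis=0).T
```

The real `connect` array can be dropped in unchanged; no Python-level loops remain.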

Does anyone know a better way to perform these two tasks? Is there a cleverer way, or even better, are there built-in dolfinx / gmsh functions that can do this?

Thanks for your help.

Nicolas

If all your loops use NumPy data structures, I would suggest using Numba to speed them up.

Another related question: how should FEniCSx interface with PyG in parallel (running with multiple MPI processes)?

I think you can create the edge adjacency list in a simpler way, since dolfinx has a built-in function to access the node connectivity:

mesh.topology.create_connectivity(1, 0)
node_adjacency_list = mesh.topology.connectivity(1, 0)
edge_index = np.reshape(node_adjacency_list.array, (-1, 2))
edge_index = torch.from_numpy(edge_index.astype(np.int64)).t().contiguous()
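One caveat worth noting (my addition, not part of the reply above): mesh.topology.connectivity(1, 0) lists each mesh edge once, while PyG treats edge_index as directed, so for an undirected graph you would typically append the reversed edges as well. A minimal numpy sketch with a made-up three-edge list:

```python
import numpy as np

# Hypothetical [2, num_edges] edge list, one column per mesh edge, as the
# connectivity-based snippet would produce before conversion to torch.
edge_index = np.array([[0, 1], [1, 2], [0, 2]]).T

# Append the row-swapped copy so every edge appears in both directions.
edge_index = np.concatenate([edge_index, edge_index[::-1]], axis=1)
```

After this step the array can be converted to a torch.long tensor exactly as in the snippet above.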

As for your first code block, I don’t understand why you need to do all this. Why not simply use up_temp.x.array.reshape(...) as your feature vector? For GNNs you’re going to use a nodal discretization anyway, so there shouldn’t be any shape incompatibility.

Should it even matter? Using MPI clusters for deep learning is quite esoteric; most research targets multi-GPU rigs. In my opinion, MPI FEM outputs should be accumulated on a single node and passed to the GPU.


Thank you both for your help. I ended up running my previous code once to see which node is linked to which DOFs; this mapping is then stored in a list for easier access after initialisation.

The parallel question is interesting; for the moment I am concentrating on having working code before improving its speed.