Multiprocess function mapping (or any fast alternative method)

Hello,

I have a function u, which is a solution of a PDE on a 3d mesh.
I can obtain the value of this function (scalar field) by doing

local_val = u(x,y,z)
print(local_val)

and this works just as expected.

Now I would like to map this function u over a long list of coordinates, coord_list.
However,

from multiprocessing import Pool
coord_list = [[0,0,0],[0,1,0]] # a very long list of coordinates to be evaluated at
pool = Pool()
outputs = pool.map(u,coord_list)
pool.close()
pool.join()

raises a pickling error (the FEniCS function object is unpicklable).
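The failure is easy to reproduce without FEniCS. In this minimal sketch, a lambda stands in for the unpicklable function object; like a dolfin Function, it cannot be serialized, which is exactly what pool.map requires:

```python
import pickle

# Stand-in for the FEniCS function: lambdas, like dolfin Function
# objects, cannot be serialized by pickle, so pool.map rejects them.
u_standin = lambda p: p[0] + p[1] + p[2]

try:
    pickle.dumps(u_standin)
    pickled_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_ok = False

print(pickled_ok)  # False: a lambda is not picklable
```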

I can do the same thing by iterating through coord_list in a for loop, but that is extremely slow and runs on a single core, which wastes CPU time.
What I'd like to get is a NumPy array containing the local values of u evaluated at each coordinate. Is there a good method for this?

To make your approach with the multiprocessing module work, consider:

from dolfin import *
from multiprocessing import Pool
import numpy as np
import time

def u_eval(x):
    # Module-level wrapper is picklable; the workers look up u globally
    return u(x)

if __name__ == '__main__':
    mesh = UnitCubeMesh(50,50,50)
    
    u = interpolate(Expression('x[0]*x[1]*x[2]', degree = 3), FunctionSpace(mesh, 'CG', 3))
    coord_list = list(mesh.coordinates())

    start = time.time()
    pool = Pool(4)
    outputs_1 = pool.map(u_eval,coord_list)
    pool.close()
    pool.join()
    print('Time multiprocessing:', time.time()-start)
    
    start = time.time()
    outputs_2 = [u(coord) for coord in coord_list]
    print('Time list comprehension:', time.time() - start)
    
    
    start = time.time()
    outputs_3 = list()
    for coord in coord_list:
        outputs_3.append(u(coord))
    print('Time loop:', time.time() - start)
    
    print(np.linalg.norm(np.array(outputs_1)-np.array(outputs_2)))
    print(np.linalg.norm(np.array(outputs_2)-np.array(outputs_3)))

Note that if the points are defined by the mesh coordinates, a far faster method for CG/DG spaces is to access the function vector directly:

from dolfin import *
import numpy as np
import time


if __name__ == '__main__':
    mesh = UnitCubeMesh(50,50,50)
    V = FunctionSpace(mesh, 'CG', 3)
    u = interpolate(Expression('x[0]*x[1]*x[2]', degree = 3),V)
    mesh_coords = mesh.coordinates()

    start = time.time()
    x = V.tabulate_dof_coordinates()
    u_arr = u.vector().get_local()
    # x already holds the dof coordinates, so reuse it here
    dof_vertex_indices = np.flatnonzero(np.isin(x, mesh_coords).all(axis=1))
    # Copy the dof values at the mesh vertices into a contiguous array
    vals = np.zeros(len(dof_vertex_indices))
    for i, j in enumerate(dof_vertex_indices):
        vals[i] = u_arr[j]
    end = time.time()
    print("Time dof coords: {0:.2f}".format(end-start))

    # Use same ordering of coordinates for all methods
    coords = x[dof_vertex_indices]

    start = time.time()
    outputs_2 = [u(coord) for coord in coords]
    end = time.time()
    print('Time list comprehension:', end - start)


    start = time.time()
    outputs_3 = list()
    for coord in coords:
        outputs_3.append(u(coord))
    print('Time loop:', time.time() - start)


    print(np.linalg.norm( np.array(outputs_2)-vals))
    print(np.linalg.norm(np.array(outputs_2)-np.array(outputs_3)))
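As an aside, the element-by-element copy loop over dof_vertex_indices can be written as a single NumPy fancy-indexing gather with identical results. A toy sketch with made-up stand-ins for the function vector and index map:

```python
import numpy as np

# Toy stand-ins for u.vector().get_local() and the vertex-dof index map
u_arr = np.array([0.0, 1.5, 3.0, 4.5, 6.0])
dof_vertex_indices = np.array([4, 0, 2])

# One vectorized gather replaces the for-loop copy
vals = u_arr[dof_vertex_indices]
print(vals)  # 6.0, 0.0, 3.0
```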

This approach can be used either in serial or in parallel with MPI (mpirun -n 4 python3 script.py).

Note that to make the comparison fair, I've included all vector operations on the dof coordinates (the runtime of the evaluation loop itself is negligible; it is tabulating the dof coordinates that is costly).
Running the script in serial yields:

Time dof coords: 1.77
Time multiprocessing: 2.0806162357330322
Time list comprehension: 3.4157373905181885
Time loop: 3.3714652061462402

Running it in parallel on 4 processors, excluding multiprocessing, yields:

Time dof coords: 0.67
Time list comprehension: 0.8781118392944336
Time loop: 0.7395052909851074
Time dof coords: 0.70
Time list comprehension: 0.8745849132537842
Time loop: 0.7665245532989502
Time dof coords: 0.68
Time list comprehension: 0.8844482898712158
Time loop: 0.7806322574615479

Time dof coords: 0.70
Time list comprehension: 0.9165513515472412
Time loop: 0.9084732532501221

Note that the runtime of every method is reduced, as the mesh coordinates are distributed over the individual processors.


Yesterday I tried with a helper function (exactly as you suggested), but I got an error, so I assumed it doesn't work with FEniCS for some reason. Today I restarted the server and it worked.

Now I cannot reproduce the issue I had. That is interesting in itself, but in any case it works and I'm happy with it 🙂 Thank you!

While this is not exactly what I'm trying to do (meshing grids and evaluating coordinates differ in my case; I should have mentioned that), it is definitely something I will need soon. Thanks for providing the knowledge.