Multiprocess function mapping (or any fast alternative method)

Hello,

I have a function u, which is a solution of a PDE on a 3d mesh.
I can obtain the value of this function (scalar field) by doing

local_val = u(x,y,z)
print(local_val)

and this works just as expected.

Now I would like to map this function u over a long list of coordinates, coord_list.
However,

from multiprocessing import Pool
coord_list = [[0,0,0],[0,1,0]] # a very long list of coordinates to be evaluated at
pool = Pool()
outputs = pool.map(u,coord_list)
pool.close()
pool.join()

raises a pickling error (the FEniCS function object is unpicklable).
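The failure is easy to reproduce without FEniCS. In this minimal sketch, a lambda stands in for the unpicklable function object; like a dolfin Function, it cannot be serialized, which is exactly what pool.map requires:

```python
import pickle

# Stand-in for the FEniCS function: lambdas, like dolfin Function
# objects, cannot be serialized by pickle, so pool.map rejects them.
u_standin = lambda p: p[0] + p[1] + p[2]

try:
    pickle.dumps(u_standin)
    pickled_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickled_ok = False

print(pickled_ok)  # False: a lambda is not picklable
```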

I can do the same thing by iterating through coord_list in a for loop, but that is extremely slow and runs on a single core, which wastes CPU time.
What I'd like to get is a NumPy array containing the local values of u evaluated at each coordinate. Is there a good method for this?

To make your approach with the multiprocessing module work, consider:

from dolfin import *
from multiprocessing import Pool
import numpy as np
import time

def u_eval(x):
    # Module-level wrapper is picklable; the workers look up u globally
    return u(x)

if __name__ == '__main__':
    mesh = UnitCubeMesh(50,50,50)
    
    u = interpolate(Expression('x[0]*x[1]*x[2]', degree = 3), FunctionSpace(mesh, 'CG', 3))
    coord_list = list(mesh.coordinates())

    start = time.time()
    pool = Pool(4)
    outputs_1 = pool.map(u_eval,coord_list)
    pool.close()
    pool.join()
    print('Time multiprocessing:', time.time()-start)
    
    start = time.time()
    outputs_2 = [u(coord) for coord in coord_list]
    print('Time list comprehension:', time.time() - start)
    
    
    start = time.time()
    outputs_3 = list()
    for coord in coord_list:
        outputs_3.append(u(coord))
    print('Time loop:', time.time() - start)
    
    print(np.linalg.norm(np.array(outputs_1)-np.array(outputs_2)))
    print(np.linalg.norm(np.array(outputs_2)-np.array(outputs_3)))

Note that if the points are defined by the mesh coordinates, a far faster method for CG/DG spaces is to access the function vector directly:

from dolfin import *
import numpy as np
import time


if __name__ == '__main__':
    mesh = UnitCubeMesh(50,50,50)
    V = FunctionSpace(mesh, 'CG', 3)
    u = interpolate(Expression('x[0]*x[1]*x[2]', degree = 3),V)
    mesh_coords = mesh.coordinates()

    start = time.time()
    x = V.tabulate_dof_coordinates()
    u_arr = u.vector().get_local()
    # x already holds the dof coordinates, so reuse it here
    dof_vertex_indices = np.flatnonzero(np.isin(x, mesh_coords).all(axis=1))
    # Copy the dof values at the mesh vertices into a contiguous array
    vals = np.zeros(len(dof_vertex_indices))
    for i, j in enumerate(dof_vertex_indices):
        vals[i] = u_arr[j]
    end = time.time()
    print("Time dof coords: {0:.2f}".format(end-start))

    # Use same ordering of coordinates for all methods
    coords = x[dof_vertex_indices]

    start = time.time()
    outputs_2 = [u(coord) for coord in coords]
    end = time.time()
    print('Time list comprehension:', end - start)


    start = time.time()
    outputs_3 = list()
    for coord in coords:
        outputs_3.append(u(coord))
    print('Time loop:', time.time() - start)


    print(np.linalg.norm( np.array(outputs_2)-vals))
    print(np.linalg.norm(np.array(outputs_2)-np.array(outputs_3)))
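As an aside, the element-by-element copy loop over dof_vertex_indices can be written as a single NumPy fancy-indexing gather with identical results. A toy sketch with made-up stand-ins for the function vector and index map:

```python
import numpy as np

# Toy stand-ins for u.vector().get_local() and the vertex-dof index map
u_arr = np.array([0.0, 1.5, 3.0, 4.5, 6.0])
dof_vertex_indices = np.array([4, 0, 2])

# One vectorized gather replaces the for-loop copy
vals = u_arr[dof_vertex_indices]
print(vals)  # 6.0, 0.0, 3.0
```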

This approach can be used either in serial or in parallel with MPI (mpirun -n 4 python3 script.py).

Note that to make the comparison fair, I've included all vector operations on the dof coordinates (the runtime of the evaluation loop itself is negligible; it is tabulating the dof coordinates that is costly).
Running the script in serial yields:

Time dof coords: 1.77
Time multiprocessing: 2.0806162357330322
Time list comprehension: 3.4157373905181885
Time loop: 3.3714652061462402

Running it in parallel on 4 processors, excluding multiprocessing, yields:

Time dof coords: 0.67
Time list comprehension: 0.8781118392944336
Time loop: 0.7395052909851074
Time dof coords: 0.70
Time list comprehension: 0.8745849132537842
Time loop: 0.7665245532989502
Time dof coords: 0.68
Time list comprehension: 0.8844482898712158
Time loop: 0.7806322574615479

Time dof coords: 0.70
Time list comprehension: 0.9165513515472412
Time loop: 0.9084732532501221

Note that the runtime of every method is reduced, as the mesh coordinates are distributed over the individual processors.


Yesterday I tried with a helper function (exactly as you suggested), but I got an error, so I assumed it doesn't work with FEniCS for some reason. Today I restarted the server and it worked.

Now I cannot reproduce the issue I had. That is interesting in itself, but in any case it works and I'm happy with it 🙂 Thank you!

While this is not exactly what I'm trying to do (meshing grids and evaluating coordinates differ in my case; I should have mentioned that), it is definitely something I will need soon. Thanks for providing the knowledge.