I figured this out and listed here to help others with the same need.

One thing I noticed is that,

`mesh.coordinates()`

will return coordinates in coordinates order, and

`from fenics import dof_to_vertex_map`

this method, when called with P1 function space as parameter will return a map from degree of freedoms to vertex order (coordinates order). say we have the map, d2v.

Thus,

F1 = FunctionSpace(mesh, ‘P’, 1)

dof_map = F1.dofmap()

first_dof, last_dof = dof_map.ownership_range()

local_counts = last_dof - first_dof

coord_dof = mesh.coordinates()[d2v[:local_counts]]

we get coordinates in the order of dofs, which is exactly the order of,

`u.vector()`

The reason I slice d2v to :local_counts (and thus coordiantes) is that it contain more than we need, I don’t quite figure out why here. Maybe there is some overlap between different partitions.

Then we use MPI to gather all coordinates, we sort it to get a map that transfer dofs to vertex. Let’s say we have gathered coordinates,

`coord_all`

we can get the order by,

ind = np.lexsort((coord_all[:, 0], coord_all[:, 1]))

Finally, we can gather the solution as a numpy array then change it to the coordinates order simply by,

u.vector().get_local()

u_all[ind]

To scatter back to each processor, simply reverse ind,

ind_inv[ind] = np.arange(0, len(ind), dtype=int)

scatter_buf = f_all[ind_inv]