How to reduce the numbers of interpolation

Hello everyone. Suppose I have some time dependent functions u(t), v(t), w(t) in function space V. Now I want to obtain the function \int_{t=0}^{T}\nabla u(t)\cdot \nabla w(t)\,dt and \int_{t=0}^{T}v(t) \nabla u(t)\cdot \nabla w(t)\,dt. Generally I will use the time discretization form, that is \sum_{i=1}^{N}dt*\nabla u(t_i)\cdot \nabla w(t_i) and \sum_{i=1}^{N}dt*v(t_i)\nabla u(t_i)\cdot \nabla w(t_i). I will use the interpolation method such as:

grad_u = ufl.grad(u)
grad_w = ufl.grad(w)
expr1 = ufl.inner(grad_u, grad_w)
expr2 =  v*ufl.inner(grad_u, grad_w)
for i in range(N):
      u.x.array[:] = U[i,:]
      w.x.array[:] = W[i,:]
      v.x.array[:]  = V[i,:]
      k1.interpolate(fem.Expression(expr1, V.element.interpolation_points()))
      k2.interpolate(fem.Expression(expr2, V.element.interpolation_points()))

However, I find that interpolation is really time-costing, so I wonder if there is a way such that I only need to do one interpolation instead of doing it inside the time loop.

In your code snippet, k1 and k2 are not actually used. How do they show in the remainder of your code?

They are used in my later code. In fact, this step is to calculate the gradient of two functions. I will do something like u_1=u_1+k_1 to obtain new u_1.

The interpolations of k are the expensive bits right? I doubt that action itself can be sped up much (though I’d happily be corrected), so you’d have to think about smart ways of avoiding that operation all together. Without knowing exactly how you end up using k it is a bit difficult to offer advice.

From your second comment it seems like you could just add inner(grad(u), grad(w) ) to u in whatever operation you perform next.

Thanks for your advice! I think this might be difficult because of the time integration. I wonder if I can define a function space \tilde{V}, where the functions \nabla u\cdot \nabla v lie in?

That could be a good idea. Not sure if it ends up being much faster, but you could definitely try that. From a consistency level that might certainly be better than interpolating on the V interpolation points: assuming V is C0, then grad(u) and grad(v) would be discontinuous from element to element. So on the element boundary interpolation points grad(u) would be multi-valued.

If V is C0 and P1, then would would need a DG0 space for k1 and DG1 for k2.

In legacy FEniCS one could implement a per-element projection for DG spaces, which was much faster than the global one. There are some code snippets around. Not sure if that is still needed for FEniCSx

1 Like

That depends on the space of u and v. Say u is a first order Lagrange function. The the gradient is a piecewise constantfunction (vector components). The dot product with a v from the same space yields that $\nabla u \cdot \nabla v is in piecewise constant function.

Please note that one thing that would speed up your code is to move fem.Expression(expr1/2, V.element.interpolation_points()) outside the loop, as you will then avoid cache lookup.

I am suprised that the interpolation of this expression is that costly. Could you provide a reproducible example?

Of course yes. Here is my code.

from dolfinx import mesh, fem
from dolfinx.mesh import create_rectangle
import ufl
import numpy as np
import time

Lx,Ly =400, 400
nx, ny =400, 400
domain=create_rectangle(MPI.COMM_WORLD, [np.array([0, 0]), np.array([Lx, Ly])],
                               [nx, ny],  mesh.CellType.triangle)
V = fem.functionspace(domain, ("Lagrange", 1))
U_now = fem.Function(V)
u_n = fem.Function(V)
uh, w = fem.Function(V), fem.Function(V)
grad_U_now = ufl.grad(U_now)
grad_u_n = ufl.grad(u_n)
expr1 = ufl.inner(grad_U_now, grad_u_n)
U_now.x.array[:]=np.ones(401*401)
u_n.x.array[:]=np.ones(401*401)
tstart=time.time()
w.interpolate(fem.Expression(expr1, V.element.interpolation_points()))
tend=time.time()
print(tend-tstart)

The interpolation will cost me 0.03 seconds per step. So when the time loop is 1000 steps, the total cost for interpolation is at least 30s.

As I stated, if you split interpolate into two pieces, compile (once), then interpolate within the loop, you can save some time.

from mpi4py import MPI
from dolfinx import mesh, fem
from dolfinx.mesh import create_rectangle
import ufl
import numpy as np
import time

Lx, Ly = 400, 400
nx, ny = 400, 400
domain = create_rectangle(MPI.COMM_WORLD, [np.array([0, 0]), np.array([Lx, Ly])],
                          [nx, ny],  mesh.CellType.triangle)
V = fem.functionspace(domain, ("Lagrange", 1))
U_now = fem.Function(V)
u_n = fem.Function(V)
uh, w = fem.Function(V), fem.Function(V)
grad_U_now = ufl.grad(U_now)
grad_u_n = ufl.grad(u_n)
expr1 = ufl.inner(grad_U_now, grad_u_n)
U_now.x.array[:] = np.ones(401*401)
u_n.x.array[:] = np.ones(401*401)
tstart = time.perf_counter()
w.interpolate(fem.Expression(expr1, V.element.interpolation_points()))
tend = time.perf_counter()
print(f"Combined approach: {tend-tstart:.5e}")

tstart = time.perf_counter()
compile_expr = fem.Expression(expr1, V.element.interpolation_points())
tend = time.perf_counter()
print(f"Compile expr (one time): {tend-tstart:.5e}")

tstart = time.perf_counter()
w.interpolate(compile_expr)
tend = time.perf_counter()
print(f"Interpolate: {tend-tstart:.5e}")
Combined approach: 1.95868e-02
Compile expr (one time): 1.04306e-03
Interpolate: 1.66111e-02

I guess interpolation is not the bottle-neck of your problem, as the solve action of your problem must surely be more expensive than this?

1 Like

Well, yes and no. The time I solve the convection-diffusion equation is 0.2s per step, but half of the step is used for interpolation, which is 0.1s since I have two functions to interpolate. The split method can help, but it seems that the major cost still remains. Is this an essential computational cost?

I think another way is to remove the interpolation outside the loop. As I mentioned before, is it possible to make the function w=\nabla u \cdot \nabla v in a proper function space? So that inside the loop, the only thing I need to do is the summation of some numpy arrays. Or is it possible to add the fem.Expression in some way?