Speeding up time-dependent diffusion equation

Hi all,

I am working on solving a diffusion equation with a time-dependent diffusion tensor. For that, I am following the Diffusion of a Gaussian function demo, as well as the discussion in the similar thread Time dependent diffusion constant in dolfinx. However, because my diffusion coefficient is a tensor, I cannot use the simple trick from that post (D_.value[0] = 0.1*D) to reassign the value of the diffusion coefficient, and instead have to rebuild the tensor completely. As a result, the re-assembly of the A matrix takes a very long time at each time step, and I was wondering whether there is a way to speed things up.

The example code is below (only very minor edits to the original Diffusion of a Gaussian function demo). This example still runs reasonably fast, although noticeably slower than the original, but the same approach causes an extremely large slowdown in the more complex problem that I am actually trying to solve.

import numpy as np

from mpi4py import MPI
from petsc4py import PETSc

from dolfinx import fem, mesh, io, plot

# Define temporal parameters
t = 0 # Start time
T = 1.0 # Final time
num_steps = 50
dt = T / num_steps # time step size

# Define mesh
nx, ny = 50, 50
domain = mesh.create_rectangle(MPI.COMM_WORLD, [np.array([-2, -2]), np.array([2, 2])],
                               [nx, ny], mesh.CellType.triangle)
V = fem.FunctionSpace(domain, ("CG", 1))


# Create initial condition
def initial_condition(x, a=5):
    return np.exp(-a*(x[0]**2+x[1]**2))
u_n = fem.Function(V)
u_n.name = "u_n"
u_n.interpolate(initial_condition)

# Create boundary condition
fdim = domain.topology.dim - 1
boundary_facets = mesh.locate_entities_boundary(
    domain, fdim, lambda x: np.full(x.shape[1], True, dtype=bool))
bc = fem.dirichletbc(PETSc.ScalarType(0), fem.locate_dofs_topological(V, fdim, boundary_facets), V)



uh = fem.Function(V)
uh.name = "uh"
uh.interpolate(initial_condition)

import ufl
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
f = fem.Constant(domain, PETSc.ScalarType(0))

# Constant (time-independent) anisotropic diffusion tensor
D = ufl.as_tensor(((1, 0), (0, 2)))
a = u * v * ufl.dx + dt * ufl.inner(D * ufl.grad(u), ufl.grad(v)) * ufl.dx
L = (u_n + dt * f) * v * ufl.dx

bilinear_form = fem.form(a)
linear_form = fem.form(L)

A = fem.petsc.assemble_matrix(bilinear_form, bcs=[bc])
A.assemble()
b = fem.petsc.create_vector(linear_form)

solver = PETSc.KSP().create(domain.comm)
solver.setOperators(A)
solver.setType(PETSc.KSP.Type.PREONLY)
solver.getPC().setType(PETSc.PC.Type.LU)

for i in range(num_steps):
    t += dt

    print(t)
    # The time-dependent diffusion tensor is rebuilt symbolically, so the form
    # is re-compiled (fem.form) and the matrix re-assembled at every time step
    # -- this is the expensive part
    D = ufl.as_tensor(((1 - t, 0), (0, 2 - t)))
    a = u * v * ufl.dx + dt * ufl.inner(D * ufl.grad(u), ufl.grad(v)) * ufl.dx
    print('----> Assembling')
    A = fem.petsc.assemble_matrix(fem.form(a), bcs=[bc])
    A.assemble()
    solver.setOperators(A)

    # Update the right hand side reusing the initial vector
    with b.localForm() as loc_b:
        loc_b.set(0)
    fem.petsc.assemble_vector(b, linear_form)

    # Apply Dirichlet boundary condition to the vector
    fem.petsc.apply_lifting(b, [bilinear_form], [[bc]])
    b.ghostUpdate(addv=PETSc.InsertMode.ADD_VALUES, mode=PETSc.ScatterMode.REVERSE)
    fem.petsc.set_bc(b, [bc])

    # Solve linear problem
    solver.solve(b, uh.vector)
    uh.x.scatter_forward()

    # Update solution at previous time step (u_n)
    u_n.x.array[:] = uh.x.array


Constants can have the shape of a tensor.
See for instance https://github.com/FEniCS/dolfinx/blob/deac4f1cba124f09302f8bb501777f3e25a827c4/python/test/unit/fem/test_constant.py#L54
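
For the code above, that would mean something like this (a minimal sketch, reusing domain, u, v, dt and t from the original script; the form can then be compiled once, outside the time loop):

# Tensor-valued Constant: the entries can be updated in place without
# triggering a new form compilation
D_ = fem.Constant(domain, np.array([[1.0, 0.0], [0.0, 2.0]], dtype=PETSc.ScalarType))
a = u * v * ufl.dx + dt * ufl.inner(D_ * ufl.grad(u), ufl.grad(v)) * ufl.dx
bilinear_form = fem.form(a)  # compiled once

# Inside the time loop, only the numerical values change:
D_.value[0, 0] = 1 - t
D_.value[1, 1] = 2 - t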

@dokken thanks.
But what should I do in the more general case, where the elements of the D tensor are expressions (e.g., depend on the spatial coordinates)?

You can make a tensor with ufl.as_tensor that is a mixture of ufl.SpatialCoordinate and constants (say, for time).
For instance, you could do something like

x = ufl.SpatialCoordinate(domain)
a = dolfinx.fem.Constant(domain, 0.)
t = dolfinx.fem.Constant(domain, 0.1)
D = ufl.as_tensor(((a*x[0]-t, 0),(x[1], t)))

and update a and t in your time loop.
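
For example, inside the time loop only the values need to be assigned (the compiled form is untouched):

a.value = 2.0           # assign a new value to the constant a
t.value = t.value + dt  # advance the time constant (dt from your script)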


Yup, that’s what I am doing in the code above, and it works perfectly fine. However, I was wondering whether there is a way to speed up the assembly of the matrix at each time step after D is updated. The command

A = fem.petsc.assemble_matrix(fem.form(a), bcs=[bc])
A.assemble()

takes very long and slows the computation down significantly. Any ideas?

One immediate optimisation is to move fem.form(a) outside of the loop. Each call looks up the cache on your file system and potentially compiles a new form, which is very expensive and not its intended use.

This also means you should refactor constant values (e.g. t) into dolfinx.fem.Constant objects and update those. Then you don’t have to recompile the form at every time step.
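
A rough sketch of that refactoring for the script above (t_c is just a hypothetical name for the time Constant; the form is compiled once and the matrix storage is reused):

t_c = fem.Constant(domain, PETSc.ScalarType(0.0))
D = ufl.as_tensor(((1 - t_c, 0), (0, 2 - t_c)))
a = u * v * ufl.dx + dt * ufl.inner(D * ufl.grad(u), ufl.grad(v)) * ufl.dx
bilinear_form = fem.form(a)              # JIT-compiled once, outside the loop

A = fem.petsc.create_matrix(bilinear_form)
for i in range(num_steps):
    t_c.value += dt
    A.zeroEntries()                      # reuse the existing matrix storage
    fem.petsc.assemble_matrix(A, bilinear_form, bcs=[bc])
    A.assemble()
    solver.setOperators(A)
    # ... assemble b, solve and update u_n as before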

Further optimisation will likely involve splitting your stiffness matrix into a linear combination of time-independent matrices scaled by the updated constants. You then don’t have to assemble the FE matrix at each time step, only form a linear combination of pre-assembled matrices, which I believe scales well.
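
For the D used in the example above this splitting is straightforward, since D(t) = diag(1, 2) - t*I. A rough sketch (reusing the names from the original script; Dirichlet conditions need extra care here):

# Three time-independent forms: mass, stiffness with diag(1, 2), stiffness with I
m_form = fem.form(u * v * ufl.dx)
k0_form = fem.form(ufl.inner(ufl.as_tensor(((1, 0), (0, 2))) * ufl.grad(u),
                             ufl.grad(v)) * ufl.dx)
k1_form = fem.form(ufl.inner(ufl.grad(u), ufl.grad(v)) * ufl.dx)

M = fem.petsc.assemble_matrix(m_form)
M.assemble()
K0 = fem.petsc.assemble_matrix(k0_form)
K0.assemble()
K1 = fem.petsc.assemble_matrix(k1_form)
K1.assemble()

# Each time step: A(t) = M + dt*K0 - dt*t*K1, a cheap combination of
# pre-assembled matrices that all share the same sparsity pattern
A = M.copy()
A.axpy(dt, K0, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
A.axpy(-dt * t, K1, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
# The Dirichlet rows still have to be enforced on the combined matrix,
# e.g. with A.zeroRowsLocal on the constrained dofs.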

You can then try to accommodate these matrix updates in the LU factorisation as well. If you can reuse the LU factorisation, or compose it from a linear combination as with the matrix operators, you should see a big performance leap.
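
One way to reuse the factorisation (only a sketch; whether it is effective depends on how quickly D changes) is to keep the LU of the matrix at a reference time as a preconditioner for an iterative solver, so the updated operator only enters the cheap Krylov iterations:

solver = PETSc.KSP().create(domain.comm)
solver.setType(PETSc.KSP.Type.GMRES)
solver.getPC().setType(PETSc.PC.Type.LU)
solver.getPC().setReusePreconditioner(True)  # keep the first LU factorisation

# inside the time loop, after updating A:
solver.setOperators(A)      # GMRES iterates with the current A ...
solver.solve(b, uh.vector)  # ... preconditioned by the old factorisation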


Hello Nate! I think I now have the same problem. The heat equation I want to solve has a time-dependent diffusion coefficient, something like \kappa(x,t)=e^{-\alpha t}\kappa_1+(1-e^{-\alpha t})\kappa_2. In this case, how can I take the form outside the time loop? Here is my code:

for n in range(num_steps):
    f.x.array[:] = f_source[n, :]
    kappa.x.array[:] = ks[:] + np.exp(-alpha * (n + 1) * dt) * (k0[:] - ks[:])
    F = u * v * ufl.dx + dt * kappa * ufl.dot(ufl.grad(u), ufl.grad(v)) * ufl.dx \
        - (u_n + dt * f) * v * ufl.dx
    a = fem.form(ufl.lhs(F))
    L = fem.form(ufl.rhs(F))
    A = assemble_matrix(a)
    A.assemble()

See for instance: Efficient usage of the Unified Form Language — FEniCS Workshop
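
Since kappa and f are fem.Functions, updating their .x.array does not change the symbolic form, so a rough sketch of the refactoring could look like this (keeping the names from your snippet):

# Compiled once, outside the time loop
F = u * v * ufl.dx + dt * kappa * ufl.dot(ufl.grad(u), ufl.grad(v)) * ufl.dx \
    - (u_n + dt * f) * v * ufl.dx
a = fem.form(ufl.lhs(F))
L = fem.form(ufl.rhs(F))
A = fem.petsc.create_matrix(a)

for n in range(num_steps):
    # Only the coefficient values change; the compiled forms are reused
    f.x.array[:] = f_source[n, :]
    kappa.x.array[:] = ks[:] + np.exp(-alpha * (n + 1) * dt) * (k0[:] - ks[:])
    A.zeroEntries()
    fem.petsc.assemble_matrix(A, a)  # pass bcs=[...] here if you have any
    A.assemble()
    # ... assemble the right-hand side, solve and update u_n as usual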

Thanks! I have solved the problem using the discussion above. I think the example given in the workshop may have the same issue as the old version 0.8 when using u.vector. In version 0.9, using u.x.petsc_vec avoids it.

What do you mean by bug here?

Well, in version 0.8, when I run the command “python xx.py”, it uses all available cores on my server, but it seems that all the cores are doing the same thing. After the changes in version 0.9, it only uses one core and is faster than before.

Try setting the environment variable OMP_NUM_THREADS=1 prior to calling your script.
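
If it is more convenient, the variable can also be set from inside the script, as long as this happens before the numerical libraries are imported (a sketch; whether it is needed at all depends on how your stack was installed):

import os
os.environ["OMP_NUM_THREADS"] = "1"  # must be set before importing numpy/petsc/dolfinx

from mpi4py import MPI
from petsc4py import PETSc
from dolfinx import fem, mesh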

Yes, I have tried that before and it worked. But I am confused, since in version 0.9 there is no such problem. I will add that setting to my script.

This depends on how dependencies have been installed.