PETSc NonlinearProblem running slow

Hi, I have adapted the code from demo_elastodynamics.py in order to make a similar solver in dolfinx. However, it runs much slower than the legacy version.

In the current state of the code (see below), it takes 40 secs to run for a mesh with 500 cells and 50 timesteps. This is worrying because A) it is much slower than the legacy version of effectively the same code and B) eventually I want to multiply both of these parameters by 100.

I am new to FEniCS/dolfinx, but I assume a better problem/solver configuration is the solution? Any help on this would be appreciated. :slightly_smiling_face:

import numpy as np
import matplotlib.pyplot as plt
from mpi4py import MPI
from petsc4py import PETSc
from dolfinx import fem, mesh, default_scalar_type
from dolfinx.io import XDMFFile
from dolfinx import *
import ufl
import dolfinx.fem.petsc, dolfinx.nls.petsc
import time
start_time = time.time()

# Create mesh and function space
L = 1.0  # Length of the rod
nx, ny, nz = 20, 5, 5  # Number of elements in x, y, and z directions
domain = mesh.create_box(MPI.COMM_WORLD, [np.array([0, 0, 0]), np.array([L, 0.1, 0.1])], [nx, ny, nz], cell_type=mesh.CellType.hexahedron)
V = fem.VectorFunctionSpace(domain, ("CG", 1))

fdim = domain.topology.dim - 1

def clamped_boundary(x):
    return np.isclose(x[0], 0) #|np.isclose(x[0], L)

boundary_facets = mesh.locate_entities_boundary(domain, fdim, clamped_boundary)

u_D = np.array([0, 0, 0], dtype=default_scalar_type)
bc = fem.dirichletbc(u_D, fem.locate_dofs_topological(V, fdim, boundary_facets), V)

# Elasticity constants
E = 1.0e6
nu = 0.3
mu = E / (2.0 * (1 + nu))
lmbda = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))

# Time integration parameters
am = 0.2
af = 0.4
g = 0.5 + af - am
alpha_m = fem.Constant(domain, am)
alpha_f = fem.Constant(domain, af)
gamma = fem.Constant(domain, g)
beta = fem.Constant(domain, (g + 0.5)**2 / 4.0)

rho = 1.0
eta_m = 0.0
eta_k = 0.0

T = 1.0
Nsteps = 50
dt = fem.Constant(domain, T / Nsteps)

# Functions and fields
u_ = ufl.TestFunction(V)
u = fem.Function(V, name='Displacement')
u_old = fem.Function(V)
v_old = fem.Function(V)
a_old = fem.Function(V)

# External load
B = fem.Constant(domain, PETSc.ScalarType((0.0, 0.0, -1.0)))  # Applied load in the z-direction

# Measures
dx = ufl.Measure("dx", domain=domain)

# GA method functions
def sigma(r):
    return lmbda*ufl.nabla_div(r)*ufl.Identity(len(r)) + 2*mu*ufl.sym(ufl.grad(r))
    
def m(u,u_):
    return rho*ufl.inner(u,u_)*dx

def k(u,u_):
    return ufl.inner(sigma(u),ufl.grad(u_))*dx

def c(u,u_):
    return eta_m*m(u,u_) + eta_k*k(u,u_)

#update variables
def update_a(u, u_old, v_old, a_old, ufl_=True):
    if ufl_:
        dt_ = dt
        beta_ = beta
    else:
        dt_ = dt
        beta_ = beta.value
    return (u-u_old-dt_*v_old)/beta_/dt_**2 - (1-2*beta_)/2/beta_*a_old


def update_v(a,u_old,v_old,a_old,ufl_=True):
    if ufl_:
        dt_ = dt
        gamma_ = gamma
    else:
        dt_ = dt
        gamma_ = gamma.value
    return v_old + dt_*((1-gamma_)*a_old + gamma_*a)

def update_fields(u,u_old,v_old,a_old):
    u_vec,u0_vec = u.x.array[:],u_old.x.array[:]
    v0_vec,a0_vec = v_old.x.array[:],a_old.x.array[:]
    
    a_vec = update_a(u_vec,u0_vec,v0_vec,a0_vec,ufl_=False)
    v_vec = update_v(a_vec,u0_vec,v0_vec,a0_vec,ufl_=False)
    v_old.x.array[:] = v_vec
    a_old.x.array[:] = a_vec
    u_old.x.array[:] = u_vec
    
def avg(x_old,x_new,alpha):
    return alpha*x_old + (1-alpha)*x_new

a_new = update_a(u,u_old,v_old,a_old,ufl_=True)
v_new = update_v(a_new,u_old,v_old,a_old,ufl_=True)

# Problem setup
res =  m(avg(a_old,a_new,alpha_m),u_) + c(avg(v_old,v_new,alpha_f),u_) + k(avg(u_old,u,alpha_f),u_) - rho*ufl.inner(B,u_)*dx  #residual to solve
problem = fem.petsc.NonlinearProblem(res,u,[bc])

# Solver setup
solver = nls.petsc.NewtonSolver(MPI.COMM_WORLD, problem)
solver.atol = 1e-8
solver.rtol = 1e-8
solver.convergence_criterion = "incremental"

# Quantities to track
times = np.linspace(0, T, Nsteps + 1)

# Create file
with XDMFFile(domain.comm, "rod_dbc0.xdmf", "w") as xdmf_file:
    xdmf_file.write_mesh(domain)
    xdmf_file.write_function(u, 0)

# Write function at each time step
for i in range(Nsteps):
    t = times[i + 1]
    print("Time = %s" % t )
    num_its, converged = solver.solve(u)
    assert converged
    u.x.scatter_forward()

    # Update old fields with new quantities
    update_fields(u, u_old, v_old, a_old)

    # Write the function to the XDMF file
    xdmf_file.write_function(u, t)

print("--- %s seconds runtime ---" % (time.time() - start_time))

You should also add the code of the legacy implementation you adapted from the old demo.

I am assuming you are not running the file you linked, because

  • in demo_elastodynamics:
E  = 1000.0
nu = 0.3
  • in your snippet:
E = 1.0e6
nu = 0.3

Comparing the runtimes with material properties so different seems unfair to me.

The stress tensor (sigma) is linear in E, so I don’t think the magnitude of this constant should account for the large runtime difference I am seeing.

You have also not specified any solver options, and the defaults might differ between DOLFIN (legacy) and DOLFINx.
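
For reference, in DOLFINx the Krylov solver sitting inside the NewtonSolver can be tuned through PETSc options. A minimal sketch of the mechanism (the cg/gamg choices below are only an illustration, not a recommendation for this problem):

ksp = solver.krylov_solver
opts = PETSc.Options()
prefix = ksp.getOptionsPrefix()
opts[f"{prefix}ksp_type"] = "cg"    # example Krylov method
opts[f"{prefix}pc_type"] = "gamg"   # example preconditioner
ksp.setFromOptions()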

Note that almost all of the time is spent inside your function update_fields.
This is because of how you create a_vec and v_vec, which are not sensible objects to assign to a function.
Instead of working on the array data, you are assigning UFL objects to a numpy array: dt is still a fem.Constant in the ufl_=False branch, so the elementwise arithmetic builds UFL expressions instead of floats.
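
A rough illustration of the difference, reusing dt, u and v_old from your snippet (just a diagnostic, not something to add to the code):

bad = u.x.array - dt * v_old.x.array            # dt is a fem.Constant -> dtype=object, every entry a UFL expression
good = u.x.array - float(dt) * v_old.x.array    # plain float64 numpy arithmetic
print(bad.dtype, good.dtype)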

There are various things that you need to address carefully here (specifically, casting all of the constants to floats).
The following code takes 2.4 seconds to run:

import numpy as np
import matplotlib.pyplot as plt
from mpi4py import MPI
from petsc4py import PETSc
from dolfinx import fem, mesh, default_scalar_type
from dolfinx.io import XDMFFile
from dolfinx import *
import ufl
import dolfinx.fem.petsc
import dolfinx.nls.petsc
import time
start_time = time.time()

# Create mesh and function space
L = 1.0  # Length of the rod
nx, ny, nz = 20, 5, 5  # Number of elements in x, y, and z directions
domain = mesh.create_box(MPI.COMM_WORLD, [np.array([0, 0, 0]), np.array(
    [L, 0.1, 0.1])], [nx, ny, nz], cell_type=mesh.CellType.hexahedron)
V = fem.VectorFunctionSpace(domain, ("CG", 1))

fdim = domain.topology.dim - 1


def clamped_boundary(x):
    return np.isclose(x[0], 0)  # |np.isclose(x[0], L)


boundary_facets = mesh.locate_entities_boundary(domain, fdim, clamped_boundary)

u_D = np.array([0, 0, 0], dtype=default_scalar_type)
bc = fem.dirichletbc(u_D, fem.locate_dofs_topological(
    V, fdim, boundary_facets), V)

# Elasticity constants
E = 1.0e6
nu = 0.3
mu = E / (2.0 * (1 + nu))
lmbda = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))

# Time integration parameters
am = 0.2
af = 0.4
g = 0.5 + af - am
alpha_m = fem.Constant(domain, am)
alpha_f = fem.Constant(domain, af)
gamma = fem.Constant(domain, g)
beta = fem.Constant(domain, (g + 0.5)**2 / 4.0)

rho = 1.0
eta_m = 0.0
eta_k = 0.0

T = 1.0
Nsteps = 50
dt = fem.Constant(domain, T / Nsteps)

# Functions and fields
u_ = ufl.TestFunction(V)
u = fem.Function(V, name='Displacement')
u_old = fem.Function(V)
v_old = fem.Function(V)
a_old = fem.Function(V)

# External load
B = fem.Constant(domain, PETSc.ScalarType((0.0, 0.0, -1.0))
                 )  # Applied load in the z-direction

# Measures
dx = ufl.Measure("dx", domain=domain)

# GA method functions


def sigma(r):
    return lmbda*ufl.nabla_div(r)*ufl.Identity(len(r)) + 2*mu*ufl.sym(ufl.grad(r))


def m(u, u_):
    return rho*ufl.inner(u, u_)*dx


def k(u, u_):
    return ufl.inner(sigma(u), ufl.grad(u_))*dx


def c(u, u_):
    return eta_m*m(u, u_) + eta_k*k(u, u_)

# update variables


def update_a(u, u_old, v_old, a_old, ufl_=True):
    if ufl_:
        dt_ = dt
        beta_ = beta
        return (u-u_old-dt_*v_old)/beta_/dt_**2 - (1-2*beta_)/2/beta_*a_old

    else:
        dt_ = dt
        beta_ = beta.value
        return (u-u_old-float(dt_)*v_old)/float(beta_)/float(dt_)**2 - (1-2*float(beta_))/2/float(beta_)*a_old


def update_v(a, u_old, v_old, a_old, ufl_=True):
    if ufl_:
        return v_old + dt*((1-gamma)*a_old + gamma*a)
    else:
        return v_old + float(dt) * ((1 - float(gamma)) * a_old + float(gamma)*a)


def update_fields(u, u_old, v_old, a_old):
    u_vec, u0_vec = u.x.array[:], u_old.x.array[:]
    v0_vec, a0_vec = v_old.x.array[:], a_old.x.array[:]

    a_vec = update_a(u_vec, u0_vec, v0_vec, a0_vec, ufl_=False)
    v_vec = update_v(a_vec, u0_vec, v0_vec, a0_vec, ufl_=False)

    v_old.x.array[:] = v_vec
    a_old.x.array[:] = a_vec
    u_old.x.array[:] = u_vec


def avg(x_old, x_new, alpha):
    return alpha*x_old + (1-alpha)*x_new


a_new = update_a(u, u_old, v_old, a_old, ufl_=True)
v_new = update_v(a_new, u_old, v_old, a_old, ufl_=True)

# Problem setup
res = m(avg(a_old, a_new, alpha_m), u_) + c(avg(v_old, v_new, alpha_f), u_) + \
    k(avg(u_old, u, alpha_f), u_) - rho * \
    ufl.inner(B, u_)*dx  # residual to solve
problem = fem.petsc.NonlinearProblem(res, u, [bc])

# Solver setup
solver = nls.petsc.NewtonSolver(MPI.COMM_WORLD, problem)
solver.atol = 1e-8
solver.rtol = 1e-8
solver.convergence_criterion = "incremental"

# Quantities to track
times = np.linspace(0, T, Nsteps + 1)

# Create file
with XDMFFile(domain.comm, "rod_dbc0.xdmf", "w") as xdmf_file:
    xdmf_file.write_mesh(domain)
    xdmf_file.write_function(u, 0)

solve_time = 0
# Write function at each time step
for i in range(Nsteps):
    t = times[i + 1]
    print("Time = %s" % t)

    s = time.perf_counter()
    num_its, converged = solver.solve(u)
    e = time.perf_counter()
    solve_time += e-s
    assert converged
    u.x.scatter_forward()

    # Update old fields with new quantities
    update_fields(u, u_old, v_old, a_old)

    # Write the function to the XDMF file
    xdmf_file.write_function(u, t)

print("--- %s seconds runtime ---" % (time.time() - start_time))
print(solve_time)

Also note that the mesh uses hexahedra in your DOLFINx code, while the legacy code uses tetrahedra.


Thanks for this @dokken - this is really helpful. I will look into the solver options.

Just to clarify: with the fem.Constants cast to floats, the update functions now work on plain numpy arrays, which is the more sensible approach?

Also, my equation can be split into a bilinear and a linear form - would petsc.LinearProblem be expected to give better performance?

Yes. That is what you want to do. Doing the arithmetic with ufl objects turns the array into an array of ufl objects.

It all boils down to what options you set for the Krylov solver of the linearized problem, as it is the same problem that is formed inside the Newton solver.
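
If you do go the LinearProblem route, a rough sketch reusing the forms and names from the code above (the du/a_form/L_form names and the direct LU options are only illustrative):

du = ufl.TrialFunction(V)
a_new_lin = update_a(du, u_old, v_old, a_old, ufl_=True)
v_new_lin = update_v(a_new_lin, u_old, v_old, a_old, ufl_=True)
res_lin = m(avg(a_old, a_new_lin, alpha_m), u_) + c(avg(v_old, v_new_lin, alpha_f), u_) \
    + k(avg(u_old, du, alpha_f), u_) - rho*ufl.inner(B, u_)*dx
a_form, L_form = ufl.lhs(res_lin), ufl.rhs(res_lin)   # split into bilinear and linear parts
lin_problem = fem.petsc.LinearProblem(a_form, L_form, bcs=[bc], u=u,
                                      petsc_options={"ksp_type": "preonly", "pc_type": "lu"})
# In the time loop, lin_problem.solve() writes the result into u,
# so the rest of the loop (scatter_forward, update_fields) stays the same.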

Understood. Thanks again for the help!