How to further speed up the calculation in dolfinx?

I am trying to run the default tutorial "The equations of linear elasticity" from the FEniCS-X tutorial in the Docker container, and I want to measure the computational time of the solver using the time package.

import time

start_time = time.time()
# Set up and solve the linear elasticity problem with a direct LU solver
problem = dolfinx.fem.LinearProblem(a, L, bcs=[bc], petsc_options={"ksp_type": "preonly", "pc_type": "lu"})
uh = problem.solve()
print('Time = %.3f (s)' % (time.time() - start_time))

I run this code by using the command:

mpirun --allow-run-as-root -n 2 python3 Demo_LinearElasticity.py

However, the computational time increases with the number of processes, and it seems the OS just runs the same serial code several times instead of computing in parallel. I think dolfinx should be faster, so maybe I ran it in an improper way.
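As a quick sanity check, one way to see whether the run is really partitioned across MPI ranks (rather than duplicated) is to print the rank, the communicator size, and the number of locally owned cells; this is only a sketch and assumes mesh is the mesh object from the tutorial script:

from mpi4py import MPI
comm = MPI.COMM_WORLD
# With a partitioned mesh, each rank should own only a fraction of the cells
num_local_cells = mesh.topology.index_map(mesh.topology.dim).size_local
print(f"rank {comm.rank} of {comm.size}: {num_local_cells} local cells")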

root@08637eaf5b16:/shared# mpirun --allow-run-as-root -n 2 python3 Demo_LinearElasticity.py
Time = 0.383 (s)
Time = 0.383 (s)

root@08637eaf5b16:/shared# mpirun --allow-run-as-root -n 4 python3 Demo_LinearElasticity.py
Time = 0.397 (s)
Time = 0.402 (s)
Time = 0.433 (s)
Time = 0.453 (s)

root@08637eaf5b16:/shared# mpirun --allow-run-as-root -n 8 python3 Demo_LinearElasticity.py
Time = 0.536 (s)
Time = 0.534 (s)
Time = 0.530 (s)
Time = 0.528 (s)
Time = 0.531 (s)
Time = 0.554 (s)
Time = 0.544 (s)
Time = 0.553 (s)

I cannot reproduce your issue with a docker container (dolfinx/dolfinx) with the following script:

L = 1
W = 0.2
mu = 1
rho = 1
delta = W/L
gamma = 0.4*delta**2
beta = 1.25
lambda_ = beta
g = gamma

import dolfinx
import numpy as np
from mpi4py import MPI
from dolfinx.cpp.mesh import CellType
import time

mesh = dolfinx.BoxMesh(MPI.COMM_WORLD, [np.array([0,0,0]), np.array([L, W, W])], [20,6,6], cell_type=CellType.hexahedron)
V = dolfinx.VectorFunctionSpace(mesh, ("CG", 1))


def clamped_boundary(x):
    return np.isclose(x[0], 0)

fdim = mesh.topology.dim - 1
boundary_facets = dolfinx.mesh.locate_entities_boundary(mesh, fdim, clamped_boundary)

u_D = dolfinx.Function(V)
with u_D.vector.localForm() as loc:
    loc.set(0)
bc = dolfinx.DirichletBC(u_D, dolfinx.fem.locate_dofs_topological(V, fdim, boundary_facets))

T = dolfinx.Constant(mesh, (0, 0, 0))

import ufl
ds = ufl.Measure("ds", domain=mesh)

def epsilon(u):
    return ufl.sym(ufl.grad(u)) # Equivalent to 0.5*(ufl.nabla_grad(u) + ufl.nabla_grad(u).T)
def sigma(u):
    return lambda_ * ufl.nabla_div(u) * ufl.Identity(u.geometric_dimension()) + 2*mu*epsilon(u)

u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
f = dolfinx.Constant(mesh, (0, 0, -rho*g))
a = ufl.inner(sigma(u), epsilon(v)) * ufl.dx
L = ufl.dot(f, v) * ufl.dx + ufl.dot(T, v) * ds

problem = dolfinx.fem.LinearProblem(a, L, bcs=[bc], petsc_options={"ksp_type": "preonly", "pc_type": "lu"})
start = time.time()
uh = problem.solve()
end = time.time()
print(f'{MPI.COMM_WORLD.rank}: Time = {end-start:.3f} (s)')

and output:

root@feb3f65a1cf3:/home/shared# mpirun -n 1 python3 linearelasticity_code.py 
0: Time = 0.315 (s)
root@feb3f65a1cf3:/home/shared# mpirun -n 2 python3 linearelasticity_code.py 
0: Time = 0.115 (s)
1: Time = 0.115 (s)
root@feb3f65a1cf3:/home/shared# mpirun -n 3 python3 linearelasticity_code.py 
0: Time = 0.094 (s)
1: Time = 0.094 (s)
2: Time = 0.094 (s)
root@feb3f65a1cf3:/home/shared# mpirun -n 4 python3 linearelasticity_code.py 
0: Time = 0.081 (s)
1: Time = 0.081 (s)
2: Time = 0.081 (s)
3: Time = 0.081 (s)
root@feb3f65a1cf3:/home/shared# mpirun -n 8 python3 linearelasticity_code.py 
0: Time = 0.060 (s)
1: Time = 0.060 (s)
2: Time = 0.060 (s)
3: Time = 0.060 (s)
4: Time = 0.060 (s)
5: Time = 0.060 (s)
6: Time = 0.060 (s)
7: Time = 0.060 (s)

Thanks, Dokken. I deleted my old container and started a new one, and now it works. However, my improvement (from 0.36 s to 0.14 s) is not as large as yours (from 0.3 s to 0.06 s); maybe that is due to differences between our computers.

I'm using a desktop computer with 64 GB of RAM and 16 cores. For further speedups, I would suggest changing from a direct to an iterative solver (especially if you increase the number of dofs), for example as sketched below.
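A minimal sketch of that change (assuming the LinearProblem setup from the scripts above; the tolerance is just an illustrative value): replace the LU options with a conjugate gradient solver preconditioned by PETSc's algebraic multigrid (gamg), which typically scales better across MPI ranks than a direct factorization.

problem = dolfinx.fem.LinearProblem(
    a, L, bcs=[bc],
    petsc_options={
        "ksp_type": "cg",    # Krylov solver: conjugate gradient (the elasticity operator is SPD)
        "pc_type": "gamg",   # preconditioner: smoothed-aggregation algebraic multigrid
        "ksp_rtol": 1e-8,    # relative residual tolerance (illustrative value)
    },
)
uh = problem.solve()

For elasticity, gamg usually benefits from being given the rigid-body modes as a near-nullspace, but the plain options above are already enough to see the difference between the direct and iterative solves.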


Hello,

I seem to be having the same problem with the same example. I am running FEniCSx 0.9.0 under Anaconda 3. The operating system is Ubuntu 24.04, and the desktop computer has 16 cores and 32 GB of RAM.

In order to mimic the previous posts, I created the following minimal code (cut down from linearelasticity_code.py and adapted to version 0.9.0):

from dolfinx import mesh, fem, default_scalar_type
from dolfinx.fem.petsc import LinearProblem
from mpi4py import MPI
import ufl
import numpy as np
import time
L = 1
W = 0.2
mu = 1
rho = 1
delta = W / L
gamma = 0.4 * delta**2
beta = 1.25
lambda_ = beta
g = gamma

domain = mesh.create_box(MPI.COMM_WORLD, [np.array([0, 0, 0]), np.array([L, W, W])],
                         [20, 6, 6], cell_type=mesh.CellType.hexahedron)
V = fem.functionspace(domain, ("Lagrange", 1, (domain.geometry.dim, )))


def clamped_boundary(x):
    return np.isclose(x[0], 0)


fdim = domain.topology.dim - 1
boundary_facets = mesh.locate_entities_boundary(domain, fdim, clamped_boundary)

u_D = np.array([0, 0, 0], dtype=default_scalar_type)
bc = fem.dirichletbc(u_D, fem.locate_dofs_topological(V, fdim, boundary_facets), V)

T = fem.Constant(domain, default_scalar_type((0, 0, 0)))

ds = ufl.Measure("ds", domain=domain)

def epsilon(u):
    return ufl.sym(ufl.grad(u))  # Equivalent to 0.5*(ufl.nabla_grad(u) + ufl.nabla_grad(u).T)

def sigma(u):
    return lambda_ * ufl.nabla_div(u) * ufl.Identity(len(u)) + 2 * mu * epsilon(u)

u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
f = fem.Constant(domain, default_scalar_type((0, 0, -rho * g)))
a = ufl.inner(sigma(u), epsilon(v)) * ufl.dx
L = ufl.dot(f, v) * ufl.dx + ufl.dot(T, v) * ds


problem = LinearProblem(a, L, bcs=[bc], petsc_options={"ksp_type": "preonly", "pc_type": "lu"})
start_time = time.time()
uh = problem.solve()
end_time = time.time()
print('Time = %.5f (s)' %(end_time-start_time))

My results are as follows:

(fenicsx-env) peter:fenicsx % mpirun -n 1 python lin_elas.py             
Time = 0.18598 (s)
(fenicsx-env) peter:fenicsx % mpirun -n 2 python lin_elas.py
Time = 0.21867 (s)
Time = 0.21867 (s)
(fenicsx-env) peter:fenicsx % mpirun -n 4 python lin_elas.py
Time = 0.34490 (s)
Time = 0.34489 (s)
Time = 0.34490 (s)
Time = 0.34489 (s)
(fenicsx-env) peter:fenicsx % mpirun -n 8 python lin_elas.py
Time = 0.51458 (s)
Time = 0.51459 (s)
Time = 0.51552 (s)
Time = 0.51552 (s)
Time = 0.51551 (s)
Time = 0.51550 (s)
Time = 0.51551 (s)
Time = 0.51552 (s)

I assume that I must have missed something somewhere. I did run the program in a new terminal with a fresh start of conda; the only thing I haven't tried is rebooting the computer.

Any thoughts would be much appreciated.

Thank you,

Peter.