Solving 3D Stokes problem is slower on dolfinx than dolfin

Lingyue_Shen · July 4, 2024, 7:04am

Hi,
I am trying to solve a 3D stokes problem using P1-P1 mixed element with a pressure stabilisation. Mumps direct solver is used. Here is the demo of the code:

import os
os.environ['OMP_NUM_THREADS'] = '1'
from mpi4py import MPI
comm = MPI.COMM_WORLD
import numpy as np
import time
import ufl,basix
from dolfinx import fem, io, mesh
from ufl import grad, div, nabla_grad, dx, inner, dot
from dolfinx.fem.petsc import LinearProblem

# build mesh
msh = mesh.create_box(MPI.COMM_WORLD,
                    [np.array([0, 0, 0]), np.array([2, 1, 0.3])],
                    [100, 50, 15], cell_type=mesh.CellType.tetrahedron)

# define element
Pv         = basix.ufl.element("CG", msh.basix_cell(), 1, shape=(msh.geometry.dim,))
P1         = basix.ufl.element("CG", msh.basix_cell(), 1)
TH         = basix.ufl.mixed_element([Pv, P1])

# create functionspace
W          = fem.functionspace(msh, TH)
V          = fem.functionspace(msh, Pv)

# define functions
dU         = ufl.TrialFunction(W)
U          = fem.Function(W)
velocity   = fem.Function(V)
f          = fem.Function(V)

# set boundary conditions
tdim = msh.topology.dim
fdim = tdim-1
msh.topology.create_connectivity(fdim, tdim)

def LEFT(xx):
    return np.isclose(xx[0],0)
def RIGHT(xx):
    return np.isclose(xx[0],2)

boundary_surface  = mesh.exterior_facet_indices(msh.topology)
left_surface      = mesh.locate_entities_boundary(msh, fdim, LEFT)
right_surface     = mesh.locate_entities_boundary(msh, fdim, RIGHT)
side              = np.concatenate((left_surface,right_surface))
other_surface     = np.setdiff1d(boundary_surface,side)

W_v_sub,_     = W.sub(0).collapse()
V_in          = fem.Function(W_v_sub)
V_noslip      = fem.Function(W_v_sub)
V_noslip.x.array[:]=0
V_in.sub(0).interpolate(lambda x: 1*4*x[1]*(1-x[1])/(1)**2 * 4*x[2]*(0.3-x[2])/(0.3)**2 )

bcu      = fem.dirichletbc(V_in, fem.locate_dofs_topological([W.sub(0),W.sub(0).collapse()[0]], fdim, left_surface), W.sub(0))
bcu1     = fem.dirichletbc(V_in, fem.locate_dofs_topological([W.sub(0),W.sub(0).collapse()[0]], fdim, right_surface), W.sub(0))
bcu2     = fem.dirichletbc(V_noslip, fem.locate_dofs_topological([W.sub(0),W.sub(0).collapse()[0]], fdim, other_surface), W.sub(0))
bcs      = [bcu,bcu1,bcu2]

# define the linear problem
h           = 2*ufl.Circumradius(msh)
v, q        = ufl.TestFunctions(W)
u, p        = ufl.split(U)
du, dp      = ufl.split(dU)

F_a = (
-inner(dp,div(v))*dx
+inner(  2*ufl.sym(nabla_grad(du)), ufl.sym(nabla_grad(v)))*dx
+inner(div(du),q)*dx
+0.01*h*h*inner(grad(dp),grad(q))*dx
)
F_L = (
inner(f,v)*dx
)

problem_U = LinearProblem(F_a, F_L, bcs=bcs, petsc_options={"ksp_type": "preonly", "pc_type": "lu", "pc_factor_mat_solver_type" : "mumps"})
 
# solve
ta=time.time()
Uh = problem_U.solve() 
tb=time.time()

if MPI.COMM_WORLD.rank == 0:
    print("Time: ",tb-ta)

This code takes about 16s with mpirun -n 24. However, when I calculate the same problem with dolfin it takes only about 8s. Here is the dolfin code:

import os
os.environ['OMP_NUM_THREADS'] = '1'
from dolfin import *
import time
from mpi4py import MPI as mpi
LIP = LagrangeInterpolator       
parameters["std_out_all_processes"] = False

# mesh      
mesh = BoxMesh(Point(0.0, 0.0, 0.0), Point(2.0, 1.0, 0.3), 100, 50, 15)

# define element
Pv     = VectorElement("CG", mesh.ufl_cell(), 1)  
P1     = FiniteElement("CG", mesh.ufl_cell(), 1)
TH     = MixedElement([Pv, P1])

# create function space and function
W      = FunctionSpace(mesh,TH)
U      = Function(W)
U0     = Function(W)
dU     = TrialFunction(W)

# set boudnary condition
def LEFT(x, on_boundary):
    tol = 1E-14
    return on_boundary and (x[0]==0)
def RIGHT(x, on_boundary):
    tol = 1E-14
    return on_boundary and (x[0]==2)
def SIDE(x, on_boundary):
    tol = 1E-14
    return on_boundary and (x[0]>0  + 1e-10) and (x[1]<2 - 1e-10)

exp_in = Expression(('1*4*x[1]*(1-x[1])/1 * 4*x[2]*(0.3-x[2])/0.09', '0.0', '0.0'), degree=2)
f      = Expression(('0.0', '0.0', '0.0'), degree=2)
V_in   = Function(FunctionSpace(mesh,Pv))
LIP.interpolate(V_in, exp_in)

bcu    = DirichletBC(W.sub(0), (0,0,0), SIDE)
bcu_1  = DirichletBC(W.sub(0), V_in, LEFT)
bcu_2  = DirichletBC(W.sub(0), V_in, RIGHT)
bcs    = [bcu,bcu_1,bcu_2]

# define linear problem
h=Circumradius(mesh)
v, q = TestFunctions(W)
u, p = split(U)
u0, p0 = split(U0)
du, dp = split(dU)

a=(
-inner(dp,div(v))*dx
+inner((grad(du)+grad(du).T), (0.5*grad(v)+0.5*grad(v).T))*dx
+inner(div(du),q)*dx
+0.01*h*h*inner(grad(dp),grad(q))*dx
)
L=inner(f,v)*dx

tic=time.time()
solve(a == L, U0, bcs)
# solver.solve(problem,U.vector())
info('Timing: Main newton solve {:4.2e} s'.format(time.time() - tic))

I also run these codes on a refined mesh. The time is 80s(dolfinx) vs 40s(dolfin). Still twice as slower.

Further, I tried 2D mesh with millions of points. The result doesn’t differs that much, but dolfinx is still around 10% ~ 15% slower.

I am wondering why could this happen? Is this related to the matrix assemble or the different setup of mumps solver? And what can I do to speed up dolfinx.

francesco-ballarin · July 4, 2024, 7:33pm

Question #1 on my mind: are you 100% sure that the comparison fair? For instance, is the same version of PETSc being used, and all its dependencies (e.g., mumps)?

Lingyue_Shen · July 5, 2024, 2:37am

I am not sure with setups since the code are copied from the demos. I didn’t make any change myself. is there any demos on checking and changing the parameters?

As for the versin of PETSC, is it possible to load the same version on dolfin and dolfinx?

My dolfin version is 2019.1 and the dolfinx version is 0.8.0

francesco-ballarin · July 5, 2024, 9:28am

How did you install dolfin and dolfinx, and on which OS?

Lingyue_Shen · July 5, 2024, 9:50am

I am using Linux.
DOLFINx installed with conda install -c conda-forge fenics-dolfinx=0.8.0 mpich
DOLFIN installed with conda install -c conda-forge fenics

dokken · July 5, 2024, 11:58am

I cannot reproduce this behavior using 5 processes on my laptop.
With ghcr.io/scientificcomputing/fenics:2023-11-15 the legacy script takes

Process 0: Timing: Main newton solve 4.66e+01 s

while on ghcr.io/fenics/dolfinx/dolfinx:v0.8.0 it takes

Time:  33.32949423789978

Please also note that you have not explicitly specified mumps on the legacy code, and rely on it being enforced implicitly.
You can ensure that you are using the same solver in legacy with:

solve(a == L, U0, bcs, solver_parameters={"linear_solver": "mumps"})

Also, note that you are only prescribing velocity conditions to all boundaries, and pressure is determined up to a constant. You should enforce the appropriate null-space in both legacy fenics and DOLFINx to ensure that the direct solver can handle it.

Lingyue_Shen · July 8, 2024, 3:25am

Thanks，I would try your advise. Maybe I make some mistake on the setups.

Topic		Replies	Views
Different results in serial and parallel run dolfinx dolfinx	23	2620	October 26, 2020
Speeding up Complex Electromagnetic simulations with dolfinx dolfinx	17	1851	July 28, 2021
Steady Stokes equation (3D) - dolfinx General	5	1744	November 9, 2022
Solve linear system in Mixed finite element method General	6	599	March 1, 2023
Super slow code (3D elasticity problem)	6	2867	July 11, 2019

Solving 3D Stokes problem is slower on dolfinx than dolfin

Related topics