I want to solve the Poisson equation repeatedly for a varying charge distribution. I know that this topic has already been discussed (see e.g. Efficiently solving Poisson eqn repeatedly - dolfinx - FEniCS Project), but the information given there either does not work for me or is already implemented in my code. I still have the feeling that my code is not optimal and can be optimized further.
Please consider the following MWE, which is the fastest code I was able to produce. Its mesh size (750’000 cells) is similar to the mesh sizes in my application (600’000 to 6’000’000 cells).
import ufl
import numpy as np
from petsc4py import PETSc
from mpi4py import MPI
from dolfinx import fem, mesh, io, plot, default_scalar_type
from dolfinx.fem.petsc import assemble_vector, assemble_matrix, create_vector, apply_lifting, set_bc
from dolfinx.mesh import create_unit_square, locate_entities, create_unit_cube, exterior_facet_indices, CellType
import time
# Create mesh
dfx_mesh = create_unit_cube(MPI.COMM_WORLD, 50, 50, 50, CellType.tetrahedron)
c_to_v = dfx_mesh.topology.connectivity(3,0)
number_cells = c_to_v.num_nodes
print("Mesh with "+str(number_cells)+" cells initialized.")
start = time.time()
# Initialize function spaces
function_space = fem.functionspace(dfx_mesh, ("Lagrange", 1))
charge_density_space = fem.functionspace(dfx_mesh, ("DG", 0))
# Create boundary condition
boundary_value = 0
uD = default_scalar_type(boundary_value)
tdim = dfx_mesh.topology.dim
fdim = tdim - 1
dfx_mesh.topology.create_entities(fdim)
dfx_mesh.topology.create_connectivity(fdim, tdim)
boundary_facets = mesh.exterior_facet_indices(dfx_mesh.topology)
boundary_dofs = fem.locate_dofs_topological(function_space, fdim, boundary_facets)
our_bcs = [fem.dirichletbc(uD, boundary_dofs, function_space)]
#Create test and trial function
phi = ufl.TrialFunction(function_space)
v = ufl.TestFunction(function_space)
# Define solution variable
PHI = fem.Function(function_space)
PHI.name = "PHI"
# Create charge density
rho = fem.Function(charge_density_space)
rho.name = "rho"
# Set initial charge distribution
rho.x.array[:] = 0.0
# Create forms
a = ufl.dot(ufl.grad(phi), ufl.grad(v))*ufl.dx
L = rho*v*ufl.dx
# Compile forms
bilinear_form = fem.form(a)
linear_form = fem.form(L)
# Assemble matrices and vectors
A = assemble_matrix(bilinear_form, bcs=our_bcs)
A.assemble()
b = create_vector(linear_form)
assemble_vector(b, linear_form)
# Define solver
solver = PETSc.KSP().create(dfx_mesh.comm)
solver.setOperators(A)
solver.setType(PETSc.KSP.Type.PREONLY)
solver.getPC().setType(PETSc.PC.Type.CHOLESKY)
solver.getPC().setFactorSolverType("cholmod")
# Apply Dirichlet boundary condition to the vector
apply_lifting(b, [bilinear_form], [our_bcs])
set_bc(b, our_bcs)
# Solve the linear problem once, so that later calls are faster.
solver.solve(b, PHI.x.petsc_vec)
solver.getPC().setReusePreconditioner(True)
end = time.time()
print("Poisson solver initialization time: ", str(end-start)+"s")
def solve():
    '''
    Solve-function to be called repeatedly.
    '''
    start = time.time()
    b = create_vector(linear_form)
    assemble_vector(b, linear_form)
    # Apply Dirichlet boundary condition to the vector
    apply_lifting(b, [bilinear_form], [our_bcs])
    set_bc(b, our_bcs)
    # Solve linear problem
    solver.solve(b, PHI.x.petsc_vec)
    end = time.time()
    print("Poisson solver solution time: ", str(end-start)+"s")
# Place a point charge in a random cell and solve again
rho.x.array[:] = 0.0
cell_index = np.random.randint(0, number_cells)
rho.x.array[cell_index] = 100
solve()
rho.x.array[:] = 0.0
cell_index = np.random.randint(0, number_cells)
rho.x.array[cell_index] = 15
solve()
rho.x.array[:] = 0.0
cell_index = np.random.randint(0, number_cells)
rho.x.array[cell_index] = 36
solve()
This code gives the following output:
Mesh with 750000 cells initialized.
Poisson solver initialization time: 7.316346883773804s
Poisson solver solution time: 0.14932680130004883s
Poisson solver solution time: 0.12883353233337402s
Poisson solver solution time: 0.1407771110534668s
Edit/NB: I have also experimented with cholmod as the factorization package, which is a little faster in the context of my MWE (the initial factorization in particular is significantly faster), although it is somewhat slower in my original application:
Mesh with 750000 cells initialized.
Poisson solver initialization time: 2.8728795051574707s
Poisson solver solution time: 0.09317612648010254s
Poisson solver solution time: 0.13088488578796387s
Poisson solver solution time: 0.12652373313903809s
I compiled PETSc with:
./configure \
--COPTFLAGS="-O2" \
--CXXOPTFLAGS="-O2" \
--FOPTFLAGS="-O2" \
--with-64-bit-indices=no \
--with-debugging=no \
--with-fortran-bindings=no \
--with-shared-libraries \
--with-cuda \
--download-hypre \
--download-metis \
--download-mumps-avoid-mpi-in-place \
--download-mumps \
--download-ptscotch \
--download-scalapack \
--download-spai \
--download-suitesparse \
--download-superlu \
--download-cusparse \
--with-scalar-type=real \
--with-precision=double \
Unfortunately, I was not able to test superlu_dist, since its installation failed in my case.
My general question is: How can I speed up this code further? A speedup of at least a factor of 3 would be necessary for my application. Any help is appreciated!
More detailed questions in this context are:
1.) How can I enable GPU acceleration?
Using
./configure [...] --with-cuda [...]
when installing PETSc did not significantly alter the performance of the individual calls to solve(). Is there a not-too-complicated way to benefit from GPU acceleration within the individual solve() steps? What I have in mind is sketched below.
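This is an untested sketch of the direction I am thinking of: the names solve_gpu, A_gpu, x_gpu and b_gpu are mine, the conversion to aijcusparse is just my guess at how to get the operator onto the GPU, and CG + Jacobi is only a placeholder, since as far as I understand the CHOLMOD factorization stays on the CPU:
# Untested sketch: copy the assembled operator into cuSPARSE storage and
# solve with a GPU-friendly Krylov method instead of the CPU factorization.
A_gpu = A.convert("aijcusparse", PETSc.Mat())   # GPU copy of A
x_gpu, b_gpu = A_gpu.createVecs()               # vectors of matching (CUDA) type

gpu_solver = PETSc.KSP().create(dfx_mesh.comm)
gpu_solver.setOperators(A_gpu)
gpu_solver.setType(PETSc.KSP.Type.CG)
gpu_solver.getPC().setType(PETSc.PC.Type.JACOBI)
gpu_solver.setTolerances(rtol=1e-8)

def solve_gpu():
    b_gpu.setArray(b.getArray())                # copy the assembled RHS over
    gpu_solver.solve(b_gpu, x_gpu)
    PHI.x.petsc_vec.setArray(x_gpu.getArray())  # copy the solution back
    PHI.x.scatter_forward()
Would something along these lines be the right direction, or is there a simpler way?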
2.) Is rebuilding the vector within solve() necessary?
In Efficiently solving Poisson eqn repeatedly - #3 by dokken it is claimed that it should not be necessary to recreate/reassemble the vector b within solve(). Unfortunately, my test cases fail if I do not recreate and reassemble b within solve() – i.e. the Poisson solver gives wrong results, although it does not crash. At the same time, profiling shows that recreating and reassembling b is a very time-consuming step. The reuse pattern as I understand it is sketched below.
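For concreteness, this is roughly the reuse pattern I understand from the linked thread and the dolfinx tutorials; the name solve_reuse is mine, and the zeroing via localForm and the ghostUpdate are my additions, so maybe I am missing a step here that explains the wrong results:
def solve_reuse():
    # Zero the existing vector instead of recreating it; without this,
    # assemble_vector would add the new values on top of the old ones.
    with b.localForm() as b_local:
        b_local.set(0.0)
    assemble_vector(b, linear_form)
    apply_lifting(b, [bilinear_form], [our_bcs])
    b.ghostUpdate(addv=PETSc.InsertMode.ADD_VALUES, mode=PETSc.ScatterMode.REVERSE)
    set_bc(b, our_bcs)
    solver.solve(b, PHI.x.petsc_vec)
    PHI.x.scatter_forward()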
3.) Do you know an adequate iterative method?
I have already played around with iterative methods, but although they sped up the preconditioner setup, the individual calls to solve() did not become significantly faster either. Do you perhaps have an idea for a suitable iterative method (together with an associated preconditioner) that I should try? One setup I could imagine is sketched below.
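For concreteness, one setup I could imagine is CG preconditioned with hypre/BoomerAMG (hypre is already part of my configure line above); the name it_solver and the tolerance are just examples:
it_solver = PETSc.KSP().create(dfx_mesh.comm)
it_solver.setOperators(A)
it_solver.setType(PETSc.KSP.Type.CG)
it_solver.getPC().setType(PETSc.PC.Type.HYPRE)
it_solver.getPC().setHYPREType("boomeramg")
it_solver.setTolerances(rtol=1e-8)
# A does not change between calls, so the AMG hierarchy can be reused:
it_solver.getPC().setReusePreconditioner(True)
Is something like this expected to beat the Cholesky factorization at this problem size, or is a direct solver simply the better choice here?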
4.) Do you have more ideas for compiler flags?
I have already tested compiler flags for the compilation of the linear and bilinear forms along the lines of JIT options and visualization using Pandas — FEniCSx tutorial, but this only caused a marginal speed-up. Are there further compiler flags that I should try? What I tested is sketched below.
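What I tested is roughly the following, following the linked tutorial; the specific flags are just examples:
# Pass extra compiler flags to the JIT compilation of the forms
jit_options = {"cffi_extra_compile_args": ["-Ofast", "-march=native"],
               "cffi_libraries": ["m"]}
bilinear_form = fem.form(a, jit_options=jit_options)
linear_form = fem.form(L, jit_options=jit_options)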
Thank you very much for your help!
P.S. Funnily enough, the performance got much better when I accidentally forgot to set the preconditioner via
solver.getPC().setType(PETSc.PC.Type.CHOLESKY).
The solution time then speeds up by a factor of 3, but the Poisson solver again gives wrong results in my test cases.