Running the slightly adapted code (with timing printouts):
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:light
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.16.5
#   kernelspec:
#     display_name: Python 3 (ipykernel)
#     language: python
#     name: python3
# ---
# # Implementation
#
# Author: Jørgen S. Dokken
#
# ## Test problem
# To solve a test problem, we need to choose the right hand side $f$, the coefficient $q(u)$, and the boundary $u_D$. Previously, we have worked with manufactured solutions that can be reproduced without approximation errors. This is more difficult in nonlinear problems, and the algebra is more tedious. However, we will utilize the UFL differentiation capabilities to obtain a manufactured solution.
#
# For this problem, we will choose $q(u) = 1 + u^2$ and define a two dimensional manufactured solution that is linear in $x$ and $y$:
# +
import ufl
import numpy
from mpi4py import MPI
from petsc4py import PETSc
from dolfinx import mesh, fem, log
from dolfinx.fem.petsc import NonlinearProblem
from dolfinx.nls.petsc import NewtonSolver
from time import perf_counter
start = perf_counter()
def q(u):
    return 1 + u**2
N = 1000
domain = mesh.create_unit_square(MPI.COMM_WORLD, N, N)
x = ufl.SpatialCoordinate(domain)
u_ufl = 1 + x[0] + 2 * x[1]
f = -ufl.div(q(u_ufl) * ufl.grad(u_ufl))
# -
# Note that since `x` is a 2D vector, the first component (index 0) represents $x$, while the second component (index 1) represents $y$. The resulting function `f` can be directly used in variational formulations in DOLFINx.
#
# As we now have defined our source term and an exact solution, we can create the appropriate function space and boundary conditions.
# Note that as we have already defined the exact solution, we only have to convert it to a Python function that can be evaluated during interpolation. We do this by employing the Python `eval` function on the string representation of the UFL expression.
V = fem.functionspace(domain, ("Lagrange", 1))
def u_exact(x):
    return eval(str(u_ufl))
u_D = fem.Function(V)
u_D.interpolate(u_exact)
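# Equivalently, the interpolation could be given a lambda directly, bypassing `eval`
# (a minimal sketch, kept as a comment so the timed script is unchanged):
#
#     u_D.interpolate(lambda x: 1 + x[0] + 2 * x[1])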
fdim = domain.topology.dim - 1
boundary_facets = mesh.locate_entities_boundary(
    domain, fdim, lambda x: numpy.full(x.shape[1], True, dtype=bool)
)
bc = fem.dirichletbc(u_D, fem.locate_dofs_topological(V, fdim, boundary_facets))
# We are now ready to define the variational formulation. Note that as the problem is nonlinear, we have to replace the `TrialFunction` with a `Function`, which serves as the unknown of our problem.
uh = fem.Function(V)
v = ufl.TestFunction(V)
F = q(uh) * ufl.dot(ufl.grad(uh), ufl.grad(v)) * ufl.dx - f * v * ufl.dx
# ## Newton's method
# The next step is to define the nonlinear problem. As it is nonlinear, we will use [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method).
# For details about how to implement a Newton solver, see [Custom Newton solvers](../chapter4/newton-solver.ipynb).
# Newton's method requires methods for evaluating the residual `F` (including application of boundary conditions), as well as a method for computing the Jacobian matrix. DOLFINx provides the class `NonlinearProblem` that implements these methods. In addition to the boundary conditions, you can supply the variational form of the Jacobian (computed by UFL differentiation if not supplied), as well as form-compiler and JIT parameters; see the [JIT parameters section](../chapter4/compiler_parameters.ipynb).
problem = NonlinearProblem(F, uh, bcs=[bc])
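# A minimal sketch (kept as a comment so the timed run is unchanged) of supplying the
# Jacobian explicitly via UFL differentiation, assuming the optional `J` keyword of
# `NonlinearProblem`; this is the form DOLFINx derives itself when `J` is not given:
#
#     J = ufl.derivative(F, uh)
#     problem = NonlinearProblem(F, uh, bcs=[bc], J=J)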
# Next, we use the DOLFINx Newton solver. We can set the convergence criteria for the solver by changing the absolute tolerance (`atol`), relative tolerance (`rtol`) or the convergence criterion (`residual` or `incremental`).
solver = NewtonSolver(MPI.COMM_WORLD, problem)
solver.convergence_criterion = "incremental"
solver.rtol = 1e-6
solver.report = True
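# Other tolerances can be set in the same way (illustrative values, kept as comments so
# they do not affect the run below):
#
#     solver.atol = 1e-10
#     solver.max_it = 50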
# We can modify the linear solver in each Newton iteration by accessing the underlying `PETSc` object.
ksp = solver.krylov_solver
opts = PETSc.Options()
option_prefix = ksp.getOptionsPrefix()
opts[f"{option_prefix}ksp_type"] = "gmres"
opts[f"{option_prefix}ksp_rtol"] = 1.0e-8
opts[f"{option_prefix}pc_type"] = "hypre"
opts[f"{option_prefix}pc_hypre_type"] = "boomeramg"
opts[f"{option_prefix}pc_hypre_boomeramg_max_iter"] = 1
opts[f"{option_prefix}pc_hypre_boomeramg_cycle_type"] = "v"
ksp.setFromOptions()
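# As an alternative (not used for the timings reported below), one could switch to a
# direct factorization, assuming a MUMPS-enabled PETSc as in the environment export below:
#
#     opts[f"{option_prefix}ksp_type"] = "preonly"
#     opts[f"{option_prefix}pc_type"] = "lu"
#     opts[f"{option_prefix}pc_factor_mat_solver_type"] = "mumps"
#     ksp.setFromOptions()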
# We are now ready to solve the non-linear problem. We assert that the solver has converged and print the number of iterations.
log.set_log_level(log.LogLevel.INFO)
n, converged = solver.solve(uh)
assert converged
print(f"Number of interations: {n:d}")
# We observe that the solver converges after $8$ iterations.
# If we think of the problem in terms of finite differences on a uniform mesh, $\mathcal{P}_1$ elements mimic standard second-order finite differences, which compute the derivative of a linear or quadratic function exactly. Here $\nabla u$ is a constant vector, which is multiplied by $1+u^2$, giving a second-order polynomial in $x$ and $y$, which the finite difference operator would compute exactly. We can therefore, even with $\mathcal{P}_1$ elements, expect the manufactured solution to be reproduced by the numerical method. However, if we had chosen a nonlinearity such as $1+u^4$, this would not be the case, and we would need to verify convergence rates.
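# A minimal sketch of such a convergence check (kept as a comment; it assumes the solve
# and L2-error computation below are wrapped in a hypothetical helper `solve_and_error(N)`
# returning the L2-error on an N x N mesh):
#
#     errors = [solve_and_error(N) for N in (8, 16, 32, 64)]
#     rates = [numpy.log(errors[i - 1] / errors[i]) / numpy.log(2) for i in range(1, len(errors))]
#     print(rates)  # expect rates approaching 2 for P1 elements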
# +
# Compute L2 error and error at nodes
V_ex = fem.functionspace(domain, ("Lagrange", 2))
u_ex = fem.Function(V_ex)
u_ex.interpolate(u_exact)
error_local = fem.assemble_scalar(fem.form((uh - u_ex) ** 2 * ufl.dx))
error_L2 = numpy.sqrt(domain.comm.allreduce(error_local, op=MPI.SUM))
if domain.comm.rank == 0:
    print(f"L2-error: {error_L2:.2e}")
# Compute values at mesh vertices
error_max = domain.comm.allreduce(
    numpy.max(numpy.abs(uh.x.array - u_D.x.array)), op=MPI.MAX
)
if domain.comm.rank == 0:
    print(f"Error_max: {error_max:.2e}")
end = perf_counter()
if MPI.COMM_WORLD.rank == 0:
    print(f"{MPI.COMM_WORLD.size} -Time: {end - start:.2f}")
Executed on Ubuntu 22.04 with conda-forge packages (Python 3.12), with the following environment export:
name: test_scaling
channels:
- conda-forge
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- attr=2.5.1=h166bdaf_1
- binutils_impl_linux-64=2.43=h4bf12b8_2
- binutils_linux-64=2.43=h4852527_2
- blis=0.9.0=h4ab18f5_2
- bzip2=1.0.8=h4bc722e_7
- c-ares=1.34.4=hb9d3cd8_0
- c-blosc2=2.15.2=h3122c55_1
- ca-certificates=2024.12.14=hbcca054_0
- cffi=1.16.0=py312hf06ca03_0
- fenics-basix=0.9.0=py312h9c9c0ab_2
- fenics-basix-nanobind-abi=0.2.1.13=h6c05e69_2
- fenics-dolfinx=0.9.0=py312hef1a67e_108
- fenics-ffcx=0.9.0=pyh2e48890_0
- fenics-libbasix=0.9.0=h7cb7ce6_2
- fenics-libdolfinx=0.9.0=hb85e8c2_108
- fenics-ufcx=0.9.0=hb7f7608_0
- fenics-ufl=2024.2.0=pyhd8ed1ab_1
- fftw=3.3.10=mpi_mpich_hbcf76dd_10
- fmt=11.0.2=h434a139_0
- gcc_impl_linux-64=13.3.0=hfea6d02_1
- gcc_linux-64=13.3.0=hc28eda2_7
- hdf5=1.14.3=mpi_mpich_h7f58efa_9
- hypre=2.32.0=mpi_mpich_h2e71eac_1
- icu=75.1=he02047a_0
- kahip=3.18=h7d9e1f9_0
- kernel-headers_linux-64=3.10.0=he073ed8_18
- keyutils=1.6.1=h166bdaf_0
- krb5=1.21.3=h659f571_0
- ld_impl_linux-64=2.43=h712a8e2_2
- libadios2=2.10.2=mpi_mpich_hd47ee72_1
- libaec=1.1.3=h59595ed_0
- libamd=3.3.3=ss783_h889e182
- libblas=3.9.0=26_linux64_blis
- libboost=1.86.0=h6c02f8c_3
- libboost-devel=1.86.0=h1a2810e_3
- libboost-headers=1.86.0=ha770c72_3
- libbtf=2.3.2=ss783_h2377355
- libcamd=3.3.3=ss783_h2377355
- libcap=2.71=h39aace5_0
- libcblas=3.9.0=26_linux64_blis
- libccolamd=3.3.4=ss783_h2377355
- libcholmod=5.3.0=ss783_h3fa60b6
- libcolamd=3.3.4=ss783_h2377355
- libcurl=8.11.1=h332b0f4_0
- libedit=3.1.20240808=pl5321h7949ede_0
- libev=4.33=hd590300_2
- libexpat=2.6.4=h5888daf_0
- libfabric=2.0.0=ha770c72_1
- libfabric1=2.0.0=h14e6f36_1
- libffi=3.4.2=h7f98852_5
- libgcc=14.2.0=h77fa898_1
- libgcc-devel_linux-64=13.3.0=h84ea5a7_101
- libgcc-ng=14.2.0=h69a702a_1
- libgcrypt-lib=1.11.0=hb9d3cd8_2
- libgfortran=14.2.0=h69a702a_1
- libgfortran-ng=14.2.0=h69a702a_1
- libgfortran5=14.2.0=hd5240d6_1
- libgomp=14.2.0=h77fa898_1
- libgpg-error=1.51=hbd13f7d_1
- libhwloc=2.11.2=default_h0d58e46_1001
- libiconv=1.17=hd590300_2
- libklu=2.3.5=ss783_hfbdfdfc
- liblapack=3.9.0=8_h3b12eaf_netlib
- liblzma=5.6.3=hb9d3cd8_1
- libnghttp2=1.64.0=h161d5f1_0
- libnl=3.11.0=hb9d3cd8_0
- libnsl=2.0.1=hd590300_0
- libpng=1.6.45=h943b412_0
- libptscotch=7.0.6=h4c3caac_1
- libsanitizer=13.3.0=heb74ff8_1
- libscotch=7.0.6=hea33c07_1
- libsodium=1.0.20=h4ab18f5_0
- libspqr=4.3.4=ss783_hae1ff0d
- libsqlite=3.48.0=hee588c1_1
- libssh2=1.11.1=hf672d98_0
- libstdcxx=14.2.0=hc0a3c3a_1
- libstdcxx-ng=14.2.0=h4852527_1
- libsuitesparseconfig=7.8.3=ss783_h83006af
- libsystemd0=257.2=h3dc2cb9_0
- libudev1=257.2=h9a4d06a_0
- libumfpack=6.3.5=ss783_hd4f9ce1
- libuuid=2.38.1=h0b41bf4_0
- libxcrypt=4.4.36=hd590300_1
- libxml2=2.13.5=h8d12d68_1
- libzlib=1.3.1=hb9d3cd8_2
- lz4-c=1.10.0=h5888daf_1
- metis=5.1.0=hd0bcaf9_1007
- mpi=1.0.1=mpich
- mpi4py=4.0.1=py312h0a6c937_1
- mpich=4.2.3=h1a8bee6_104
- mumps-include=5.7.3=ha770c72_6
- mumps-mpi=5.7.3=h2e1f7a5_6
- ncurses=6.5=h2d0b736_2
- numpy=2.2.2=py312h72c5963_0
- openssl=3.4.0=h7b32b05_1
- parmetis=4.0.3=hc7bef4e_1007
- petsc=3.22.2=real_h91a077e_103
- petsc4py=3.22.2=py312h84d7c54_0
- pip=24.3.1=pyh8b19718_2
- pkg-config=0.29.2=h4bc722e_1009
- pugixml=1.14=h59595ed_0
- pycparser=2.22=pyh29332c3_1
- python=3.12.8=h9e4cc4f_1_cpython
- python_abi=3.12=5_cp312
- rdma-core=55.0=h5888daf_0
- readline=8.2=h8228510_1
- scalapack=2.2.0=h7e29ba8_4
- setuptools=75.8.0=pyhff2d567_0
- slepc=3.22.2=real_h754b140_300
- slepc4py=3.22.2=py312h377abe1_0
- spdlog=1.14.1=hed91bc2_1
- superlu=5.2.2=h00795ac_0
- superlu_dist=9.1.0=h0804ebd_0
- sysroot_linux-64=2.17=h0157908_18
- tk=8.6.13=noxft_h4845f30_101
- tzdata=2025a=h78e105d_0
- ucx=1.18.0=h53fb5aa_0
- wheel=0.45.1=pyhd8ed1ab_1
- yaml=0.2.5=h7f98852_2
- zeromq=4.3.5=h3b0a872_7
- zfp=0.5.5=h9c3ff4c_8
- zlib-ng=2.2.3=h7955e40_0
- zstd=1.5.6=ha6fb4c9_0
This yields:
# Serial
L2-error: 6.67e-16
Error_max: 4.44e-15
1 -Time: 16.94
# 2 proc
L2-error: 6.05e-16
Error_max: 4.88e-15
2 -Time: 12.98
# 4 proc
L2-error: 6.51e-16
Error_max: 5.33e-15
4 -Time: 9.46
# 8 proc
Number of iterations: 8
L2-error: 6.00e-16
Error_max: 4.88e-15
8 -Time: 10.16
which clearly indicates that, for this problem size, there is a sweet spot in the partitioning at around 4 processes. I've already commented on this in several other places (including recently in MPI acceleration with FEniCSx - #3 by dokken).
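For reference, the speedup and parallel efficiency implied by the timings above (plain arithmetic on the reported numbers, nothing FEniCSx-specific):

timings = {1: 16.94, 2: 12.98, 4: 9.46, 8: 10.16}  # seconds, from the runs above
for p, t in timings.items():
    speedup = timings[1] / t
    print(f"{p} proc: speedup {speedup:.2f}, efficiency {speedup / p:.2f}")

which gives speedups of roughly 1.3, 1.8 and 1.7 at 2, 4 and 8 processes: the efficiency drops steadily, and the wall time actually increases again beyond 4 processes for this problem size.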