What's the best way to assemble a form?

Right now I have code (using dolfinx from Docker) that goes like:

form = ...
M = dolfinx.fem.assemble_matrix(form)

This works (the syntax dolfinx.fem.petsc.assemble_matrix from the tutorial doesn’t), but it appears to produce suboptimal matrices. For instance, I have a form that is essentially an integration:

N_form = ufl.inner(u,v)*ufl.SpatialCoordinate(mesh)[1]**2*ufl.dx
N = dolfinx.fem.assemble_matrix(N_form)

I expect the resulting matrix to be a sparse PETSc matrix that is in essence diagonal. Instead, it takes as much memory as a more involved non-local form, and has as many nonzero entries.

There’s clearly something I’m doing wrong here. My use case involves matrices with dimensions around 1M, so I would like to leverage the power of PETSc and create diagonal and symmetric matrices, maybe using MatOption. I would also like to preallocate memory for speed as described here. I want to avoid ever assembling suboptimal matrices in the first place.

Can I do that using petsc4py and dolfinx? If so, how?

This is not a full answer, but I’ll mention that you actually shouldn’t expect N to be diagonal, except in special situations, e.g., if u and v are in a degree-0 DG space, or if they are in a degree-1 CG space and the form is integrated with a vertex quadrature rule.
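A minimal sketch of the degree-0 DG case (assuming the DOLFINx >= 0.4 API used further down in this thread): every cell carries a single dof, so the mass-type matrix really is diagonal. The degree-1 CG case would additionally need a vertex quadrature rule, which is not shown here.

import dolfinx
import dolfinx.fem.petsc
import ufl
from mpi4py import MPI

mesh = dolfinx.mesh.create_unit_cube(MPI.COMM_WORLD, 5, 5, 5)
V = dolfinx.fem.FunctionSpace(mesh, ("DG", 0))  # one dof per cell
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
x = ufl.SpatialCoordinate(mesh)
N_form = dolfinx.fem.form(ufl.inner(u, v) * x[1]**2 * ufl.dx)
N = dolfinx.fem.petsc.assemble_matrix(N_form)
N.assemble()
# In serial, nz_used equals the number of dofs: one nonzero per row, i.e. a diagonal matrix
print(N.getInfo()["nz_used"], V.dofmap.index_map.size_local)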


Any matrix generated by a DOLFINx form over a dx integral will have a specific preallocated sparsity pattern: https://github.com/FEniCS/dolfinx/blob/main/cpp/dolfinx/fem/sparsitybuild.cpp#L18-L31
which pre-allocates memory for potential non-zero entries for any dof sharing the same cell as the current dof. This means that the matrix only has local entries; there are no nonlocal entries here (there might be off-diagonal ghost entries when running in parallel).

Without a minimal example illustrating how many non-zero entries you expect and how many you get, it is hard to give further guidance.

As the matrix returned by DOLFINx is a PETSc matrix, you can use any PETSc option to modify its behaviour (see also the short sketch after the figures below).
Here is a minimal code illustrating how you can split the DOLFINx code into several steps, inspecting what you send to the matrix creator:

import time

import dolfinx
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse
import ufl
from mpi4py import MPI
from petsc4py import PETSc

N = 10
degree = 2
mesh = dolfinx.mesh.create_unit_cube(MPI.COMM_WORLD, N, N, N)
V = dolfinx.fem.FunctionSpace(mesh, ("CG", degree))
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
a = ufl.inner(u, v) * ufl.SpatialCoordinate(mesh)[1]**2 * ufl.dx
jit_parameters = {"cffi_extra_compile_args": ["-Ofast", "-march=native"],
                  "cffi_libraries": ["m"]}
form_a = dolfinx.fem.form(a, jit_params=jit_parameters)
sp = dolfinx.fem.create_sparsity_pattern(form_a)
sp.assemble()
print(f"Number of non-zeros in sparsity pattern: {sp.num_nonzeros}")
print(f"Num dofs: {V.dofmap.index_map.size_local * V.dofmap.index_map_bs}")
print(f"Entries per dof {sp.num_nonzeros/(V.dofmap.index_map.size_local * V.dofmap.index_map_bs)}")
print(f"Dofs per cell {V.element.space_dimension}")
print("-" * 25)

for option in [0, 1]:
    A = dolfinx.cpp.la.petsc.create_matrix(mesh.comm, sp)
    A.setOption(A.Option.IGNORE_ZERO_ENTRIES, option)

    start = time.perf_counter()
    dolfinx.fem.petsc.assemble_matrix(A, form_a)
    A.assemble()
    end = time.perf_counter()
    print(f"PETSc ignore zero:{option} Assembly time: {end-start}")

    b, c = A.createVecs()
    start = time.perf_counter()
    A.mult(c, b)
    end = time.perf_counter()
    print(f"PETSc ignore zero:{option} MatVec time: {end-start}")
    print("-" * 25)
    # Plot sparsity pattern
    global_indices = np.asarray(V.dofmap.index_map.local_to_global(
        np.arange(V.dofmap.index_map.size_local + V.dofmap.index_map.num_ghosts, dtype=np.int32)), dtype=PETSc.IntType)
    sort_index = np.argsort(global_indices)
    is_A = PETSc.IS().createGeneral(global_indices[sort_index])
    A_loc = A.createSubMatrices(is_A)[0]
    ai, aj, av = A_loc.getValuesCSR()
    A_csr = scipy.sparse.csr_matrix((av, aj, ai))
    plt.spy(A_csr)
    plt.grid()
    plt.savefig(f"local_sparsity_{option}_{MPI.COMM_WORLD.rank}.png")

    A.destroy()

This returns the following pattern in serial:
[figure: local_sparsity_0_0.png]
and the following two in parallel:
[figure: local_sparsity_1_0.png]
[figure: local_sparsity_1_1.png]
where the off-diagonal entries are those on the process boundary that need to be communicated.
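Regarding the MatOption part of the original question, here is a short sketch (reusing mesh, sp and form_a from the snippet above) of asserting symmetry through petsc4py before assembling. Note that MAT_SYMMETRIC is a hint that solvers may exploit, not a different storage format; halving the storage would require a symmetric matrix type such as SBAIJ.

from petsc4py import PETSc

# Reuses `mesh`, `sp` and `form_a` defined in the snippet above
A = dolfinx.cpp.la.petsc.create_matrix(mesh.comm, sp)
A.setOption(PETSc.Mat.Option.SYMMETRIC, True)         # assert the assembled matrix is symmetric
A.setOption(PETSc.Mat.Option.SYMMETRY_ETERNAL, True)  # keep the flag through later assemblies
dolfinx.fem.petsc.assemble_matrix(A, form_a)
A.assemble()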

I would also like to note that to speed up assembly, you can set the jit_parameters, as explained here.


@kamensky would you care to expand? Do you have a resource on hand detailing the various integration patterns for dolfinx? Is it because a Form object is always evaluated at vertices, whereas a Function object can have values between vertices in the case of higher-order elements?

@dokken, thanks for your very complete answer. I think your code snippet will be more valuable to me than anything else, since it will allow me to check sparsity patterns for all my various matrices. I’m afraid though that as provided, your code doesn’t run on my dolfinx container.

I changed dolfinx.fem.form to dolfinx.fem.form.Form and its jit_params argument to jit_parameters. I’m still getting an error:

Traceback (most recent call last):
  File "xxx.py", line xxx, in xxx
    sp = dfx.fem.create_sparsity_pattern(form_a)
  File "/usr/local/dolfinx-complex/lib/python3.8/dist-packages/dolfinx/fem/__init__.py", line 35, in create_sparsity_pattern
    topology = a.mesh.topology
AttributeError: 'Form' object has no attribute 'mesh'

OK, this is my current code, with the bugs from the version difference removed.

import time
import dolfinx as dfx
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse
import ufl
from mpi4py.MPI import COMM_WORLD
from petsc4py import PETSc as pet

N = 10
degree = 2
mesh = dfx.UnitCubeMesh(COMM_WORLD, N, N, N)
V = dfx.FunctionSpace(mesh, ("CG", degree))
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)
a = ufl.inner(u, v) * ufl.SpatialCoordinate(mesh)[1]**2 * ufl.dx
jit_parameters = {"cffi_extra_compile_args": ["-Ofast", "-march=native"],
				"cffi_libraries": ["m"]}
form_a = dfx.fem.Form(a, jit_parameters=jit_parameters)
sp = dfx.fem.create_sparsity_pattern(form_a._cpp_object)
sp.assemble()
print(f"Number of non-zeros in sparsity pattern: {sp.num_nonzeros()}")
print(f"Num dofs: {V.dofmap.index_map.size_local * V.dofmap.index_map_bs}")
print(f"Entries per dof {sp.num_nonzeros()/(V.dofmap.index_map.size_local * V.dofmap.index_map_bs)}")
print(f"Dofs per cell {V.element.space_dimension()}")
print("-" * 25)

for option in [0, 1]:
	A = dfx.cpp.la.create_matrix(COMM_WORLD, sp)
	A.setOption(A.Option.IGNORE_ZERO_ENTRIES, option)

	start = time.perf_counter()
	dfx.fem.assemble_matrix(A, form_a)
	A.assemble()
	end = time.perf_counter()
	print(f"PETSc ignore zero:{option} Assembly time: {end-start}")

	b, c = A.createVecs()
	start = time.perf_counter()
	A.mult(c, b)
	end = time.perf_counter()
	print(f"PETSc ignore zero:{option} MatVec time: {end-start}")
	print("-" * 25)
	# Plot sparsity pattern
	global_indices = np.asarray(V.dofmap.index_map.local_to_global(
		np.arange(V.dofmap.index_map.size_local + V.dofmap.index_map.num_ghosts, dtype=np.int32)), dtype=pet.IntType)
	sort_index = np.argsort(global_indices)
	is_A = pet.IS().createGeneral(global_indices[sort_index])
	A_loc = A.createSubMatrices(is_A)[0]
	ai, aj, av = A_loc.getValuesCSR()
	A_csr = scipy.sparse.csr_matrix((av, aj, ai))
	plt.spy(A_csr)
	plt.grid()
	plt.savefig(f"local_sparsity_{option}_{COMM_WORLD.rank}.png")

	A.destroy()

This one does not throw an error, but it hangs when option == 1.

What version of DOLFINx are you using?
I’m using code compatible with DOLFINx >= 0.4.0.

From the look of your initial snippet, it seems like you are using 0.3.0 or older.

I do not have time to go back to an older version of DOLFINx to figure out what is wrong (as it works with later versions)…

The point stands regardless of whether option == 0 or option == 1. The matrices are sparse, they have preallocated memory, and they have a diagonal structure for local entries (and off-diagonal entries for dofs owned by other processes).

My version is 0.3.1.0. I’ll try and figure things out using the kindly provided snippet, but I also have limited patience for replacing dfx.UnitCubeMesh with dfx.mesh.create_unit_cube, or dfx.fem.Form with dfx.fem.form, in hundreds of lines of code to keep up with the version change…

You’re right, there’s something to learn here regardless of the option. I was simply curious; I didn’t expect it to hang…

Do you know where it hangs? Does it hang when assembling the matrix or when you assemble the sparsity pattern?

It hangs at the spy step. Apparently I have a very dense 1M x 1M matrix. I’ll try to find a smaller problem.

That is not surprising, as plotting a 1M x 1M matrix with matplotlib would be an issue.
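For matrices of that size, one alternative sketch (reusing the assembled PETSc matrix A from the snippets above) is to query PETSc’s MatInfo instead of plotting the pattern:

info = A.getInfo()  # dict with entries such as 'nz_allocated', 'nz_used' and 'memory'
nrows, ncols = A.getSize()
print(f"Global size: {nrows} x {ncols}")
print(f"Nonzeros used: {info['nz_used']}, allocated: {info['nz_allocated']}")
print(f"Average nonzeros per row: {info['nz_used'] / nrows}")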

I know the above code was written for the unit cube mesh and a simple square form, but I was already trying it out on my use case on my end.

I can reproduce your results with the 0.3-compatible code above though, so that’s progress. I guess I’ll be moving on from there.

Many thanks! It was instructive to see that it’s indeed possible to leverage PETSc in dolfinx, and I had completely missed the jit_params option.