Assemble bilinear form from a product of linear forms

I want to solve a PDE, with the following variational formulation

a(u, v) + b_1(u) b_2(v) = L(v) \quad \quad \forall v \in V,

where a(u,v) = \int_\Omega \nabla u \cdot \nabla v dx, and

b_1(u) = \int_{\gamma} u ds\\ b_2(v) = \int_\Omega \ell v dx,

where \ell is some known function and \gamma is a marked interior boundary.

How can one assemble the corresponding matrix in an efficient way?

I know how to assemble the vectors corresponding to b_1 and b_2, but not how to create the matrix for their outer product.

I would suggest an approach like what is suggested in: https://link.springer.com/content/pdf/10.1007/s00366-025-02107-1.pdf
Below I have made an alternate approach, based of the comments in:
[petsc-users] Outer Product of two vectors

# Usage of PETSc Python-context to assemble to outer product of two vectors matrix free.
# Author: Jørgen S. Dokken
# SPDX-license License-Identifier: MIT
from mpi4py import MPI
from petsc4py import PETSc
import dolfinx.fem.petsc
import ufl
import numpy as np


class CustomOperator:
    def __init__(self, form1, form2):
        self.form1 = dolfinx.fem.form(form1)
        self.form2 = dolfinx.fem.form(form2)
        self.b1 = dolfinx.fem.petsc.create_vector(self.form1)
        self.b2 = dolfinx.fem.petsc.create_vector(self.form2)

        self._assemble_vectors()

    def _assemble_vectors(self):
        with self.b1.localForm() as loc1, self.b2.localForm() as loc2:
            loc1.set(0)
            loc2.set(0)
        dolfinx.fem.petsc.assemble_vector(self.b1, self.form1)
        self.b1.ghostUpdate(PETSc.InsertMode.ADD, PETSc.ScatterMode.REVERSE)
        self.b1.ghostUpdate(PETSc.InsertMode.INSERT, PETSc.ScatterMode.FORWARD)

        dolfinx.fem.petsc.assemble_vector(self.b2, self.form2)
        self.b2.ghostUpdate(PETSc.InsertMode.ADD, PETSc.ScatterMode.REVERSE)
        self.b2.ghostUpdate(PETSc.InsertMode.INSERT, PETSc.ScatterMode.FORWARD)

    def mult(self, mat, X, Y):
        # Reassemble vectors, in case of coefficients or other time dependecy
        self._assemble_vectors()

        # outer(b1, b2) X = dot(b2, X) b1
        val = self.b2.dot(X)
        self.b1.copy(Y)
        Y.scale(val)

    def getDiagonal(self, mat, vec):
        vec.pointwiseMult(self.b1, self.b2)


mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 15, 7)

V = dolfinx.fem.functionspace(mesh, ("Lagrange", 2))
u = ufl.TrialFunction(V)
v = ufl.TestFunction(V)

x = ufl.SpatialCoordinate(mesh)
form1 = dolfinx.fem.form(u * ufl.ds)
form2 = dolfinx.fem.form(v * ufl.dx)

assembler = CustomOperator(form1, form2)


sp = dolfinx.cpp.la.SparsityPattern(
    mesh.comm,
    [V.dofmap.index_map, V.dofmap.index_map],
    [V.dofmap.index_map_bs, V.dofmap.index_map_bs],
)
sp.finalize()
A = dolfinx.cpp.la.petsc.create_matrix(mesh.comm, sp, "python")
A.setPythonContext(assembler)

# x = ufl.SpatialCoordinate(mesh)
# f = x[0] * v * ufl.dx

# # Set up matrix free solver
# ksp = PETSc.KSP().create(mesh.comm)
# ksp.setOperators(A)
# ksp.setType("cg")
# ksp.getPC().setType("jacobi")
# ksp.setErrorIfNotConverged(True)

# uh = dolfinx.fem.Function(V)
# b = dolfinx.fem.petsc.assemble_vector(dolfinx.fem.form(f))
# b.ghostUpdate(PETSc.InsertMode.ADD, PETSc.ScatterMode.REVERSE)
# ksp.solve(b, uh.x.petsc_vec)
# print(uh.x.array)


#  Verification
c = dolfinx.fem.petsc.create_vector(dolfinx.fem.form(form1))
with c.localForm() as loc:
    loc.set(1)
d = dolfinx.fem.petsc.create_vector(dolfinx.fem.form(form1))
A.mult(c, d)


A_numpy = np.outer(assembler.b1.array, assembler.b2.array)
d_numpy = np.dot(A_numpy, c.array)
np.testing.assert_allclose(d.array, d_numpy)
print(d.array, d_numpy)

Note that with such an approach, one cannot use a direct solver, as the full dense matrix is newer actually formed.

OK, thanks! The assembly works fine. However, I now have some follow-up questions:

  1. I still have the problem, that the final matrix is of the form A + B_1 B_2^\top, where now we can assemble B := B_1 B_2^\top. However, how can we do the sum now? I tried A.axpy(1.0, B), but this throws the following error:
    A.axpy(1.0, B)
  File "petsc4py/PETSc/Mat.pyx", line 4174, in petsc4py.PETSc.Mat.axpy
petsc4py.PETSc.Error: error code 56
[0] MatAXPY() at /Users/runner/miniforge3/conda-bld/bld/rattler-build_petsc_1741089228/work/src/mat/utils/axpy.c:133
[0] MatAXPY_BasicWithTypeCompare() at /Users/runner/miniforge3/conda-bld/bld/rattler-build_petsc_1741089228/work/src/mat/utils/axpy.c:77
[0] MatAXPY_Basic() at /Users/runner/miniforge3/conda-bld/bld/rattler-build_petsc_1741089228/work/src/mat/utils/axpy.c:238
[0] MatAXPY_Basic_Preallocate() at /Users/runner/miniforge3/conda-bld/bld/rattler-build_petsc_1741089228/work/src/mat/utils/axpy.c:172
[0] MatGetRow() at /Users/runner/miniforge3/conda-bld/bld/rattler-build_petsc_1741089228/work/src/mat/interface/matrix.c:578
[0] No support for this operation for this object type
[0] No method getrow for Mat of type python
  1. How can I apply Dirichlet BCs to the final system? Does anything change due to the new assembler?
  2. Both vectors B_1 and B_2 are very sparse due to the underlying mathematics. Is there any way to take this into account?

Dear Philipp,
My guess for your question 1 about implementing A+BB^T is to use the matrix shell technique again as Dokken did to implement the CustomOperator . At initialization of CustomOperator, you pass your A as an argument and save it into the custom operator and as you implement mult method and getDiagonal method, you modify them slightly to take into account the effect of A. I think the reason why A.axpy(1.0,B) failed for you is because the second part B is not assembled as a csr matrix but is implemented implicitly, it is dense, in fact, if you really assemble every element.
for 2 I think you should do something like apply lifting, https://docs.fenicsproject.org/dolfinx/main/python/generated/dolfinx.fem.petsc.html#dolfinx.fem.petsc.apply_lifting so that you enforce the boundary conditions. But I’m not exactly sure …
for 3 I think you don’t really need to worry because if you solve the system with Krylov space method any way the bottle neck of each iteration will be the multiplication A*x and improving the logic of BB^T with sparsity doesnot seem to help a lot.

Otherwise you can also solve (A+BB^T)x=l without introducing matrix shell as Dokken did, which was very elegant anyway. If you use the [Sherman–Morrison formula - Wikipedia](Sherman-Morrison Formula), the inverse of the matrix A+uv^T is
\begin{equation} A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u} \end{equation}
so what you should do is to do two linear solves Ay=u and Ax=l and the solution to your system will be \begin{equation} (A+uv^T)z = l\iff z = x-\frac{v^Tx}{1+v^Ty}y \end{equation}. Doing so you only need to use A which you have already. The price to pay is that now you need to do two linear solves.

I hope this can help.

I am attaching a work in progress on matrix free assemblers:

which shows how to handle DirichletBCs.

Note that the system is matrix-free, in the sense that A is never assembled.
If you combine this new code with the previous one I supplied, you can add the mat-vec product of each these assembly operations into a single vector.

As I stated in the first post, if you know that it is super-sparse, you could go down the route of:

(look for flame matrix).

Note that the matrix-free approach only stores 2xnum_dofs arrays in memory to represent the system, rather than the dense operator A that is the result of the outer product of b1 and b2.