C++ assemble_matrix_block - size mismatch w.r.t. Python

Good morning everyone,

1. I am attempting to translate multiphenicsx.fem.petsc.create_matrix_block to C++. Like the Python version, the function takes the Jacobian (as a nested list of forms) and a tuple of multiphenicsx DofMapRestriction objects. I am aware that both routes essentially end up in the same multiphenicsx and dolfinx C++ source code, which is why I am even more lost and therefore asking for help.

Mat _create_matrix_block(
    const std::vector<std::vector<std::shared_ptr<dolfinx::fem::Form<PetscScalar>>>>& J,
    const std::array<std::vector<std::shared_ptr<multiphenicsx::fem::DofMapRestriction>>, 2>&
        restriction_J)

which in the end calls

return multiphenicsx::fem::petsc::create_matrix_block(
        J_raw, index_maps, index_maps_bs, dofmaps_list, dofmaps_bounds);

2. Now consider this minimal Python example:

import dolfinx as dfx
import multiphenicsx.fem
from mpi4py import MPI
from petsc4py.PETSc import ScalarType
import numpy as np
import ufl
from solver import NonlinearBlockProblem, NewtonBlockSolver

mesh = dfx.mesh.create_unit_interval(MPI.COMM_WORLD, 100)
boundaries = [(1, lambda x: np.isclose(x[0], 0)),
              (2, lambda x: np.isclose(x[0], 1)),]

left_facets = dfx.mesh.locate_entities(mesh, 0, boundaries[0][1])
right_facets = dfx.mesh.locate_entities(mesh, 0, boundaries[1][1])

facet_indices, facet_markers = [], []
fdim = mesh.topology.dim - 1
for (marker, locator) in boundaries:
    facets = dfx.mesh.locate_entities(mesh, fdim, locator)
    facet_indices.append(facets)
    facet_markers.append(np.full(len(facets), marker))
facet_indices = np.array(np.hstack(facet_indices), dtype=np.int32)
facet_markers = np.array(np.hstack(facet_markers), dtype=np.int32)
sorted_facets = np.argsort(facet_indices)
facet_tag = dfx.mesh.meshtags(
    mesh, fdim, facet_indices[sorted_facets], facet_markers[sorted_facets])

PHI_E = dfx.fem.FunctionSpace(mesh, ('CG', 1))
LM_IAPP = PHI_E.clone()
phi_e, lm_iapp = dfx.fem.Function(PHI_E), dfx.fem.Function(LM_IAPP)
dphi_e, dlm_iapp = ufl.TrialFunction(PHI_E), ufl.TrialFunction(LM_IAPP)
v_phi, v_lm_iapp = ufl.TestFunction(PHI_E), ufl.TestFunction(LM_IAPP)

dx = ufl.Measure('dx', mesh)
ds = ufl.Measure('ds', mesh, subdomain_data=facet_tag)

I_app = dfx.fem.Constant(mesh, ScalarType(10))
V_app = dfx.fem.Constant(mesh, ScalarType(1))
switch = dfx.fem.Constant(mesh, ScalarType(0))

residuals = [
    ufl.inner(ufl.grad(phi_e), ufl.grad(v_phi)) * dx - lm_iapp * v_phi * ds(2),
    ((1 - switch) * (I_app - lm_iapp) * v_lm_iapp * ds(2)
     + switch * (V_app - phi_e) * v_lm_iapp * ds(2))
]
jacobian = [[ufl.derivative(r_i, u_i, du_i)
             for u_i, du_i in zip([phi_e, lm_iapp], [dphi_e, dlm_iapp])] for r_i in residuals]
bc = dfx.fem.dirichletbc(dfx.fem.Constant(mesh, ScalarType(0)),
                         dfx.fem.locate_dofs_topological(PHI_E, 0, left_facets), PHI_E)
restriction=[
        multiphenicsx.fem.DofMapRestriction(
            PHI_E.dofmap, np.arange(0, (PHI_E.dofmap.index_map.size_local
                                        + PHI_E.dofmap.index_map.num_ghosts))),
        multiphenicsx.fem.DofMapRestriction(
            LM_IAPP.dofmap, dfx.fem.locate_dofs_topological(LM_IAPP, 0, right_facets))
    ]

3. It is important to note that a class taking the residuals, bcs, Jacobian, functions, etc. is bound with pybind11. For all dolfinx objects, their cpp_object is passed, with restriction_J being the tuple (restriction, restriction). The key point is that _create_matrix_block is isolated, so I hope the example is transparent.

So the arguments for

 Mat create_matrix_block(
    const std::vector<std::vector<const dolfinx::fem::Form<PetscScalar, T>*>>& a,
    std::array<std::vector<std::reference_wrapper<const dolfinx::common::IndexMap>>, 2> index_maps,
    const std::array<std::vector<int>, 2> index_maps_bs,
    std::array<std::vector<std::span<const std::int32_t>>, 2> dofmaps_list,
    std::array<std::vector<std::span<const std::size_t>>, 2> dofmaps_bounds,
    const std::string& matrix_type = std::string()), 

in this example problem are:
i) In C++:

[image not available]

ii) In Python:

[image not available]

From what I can see, the only difference is the data types.

4. multiphenicsx::fem::petsc::create_matrix_block calls dolfinx::la::petsc::create_matrix, which returns a PETSc matrix.

However,

Mat A = dolfinx::la::petsc::create_matrix(mesh->comm(), pattern, matrix_type);
PetscInt m, n, M, N;
MatGetLocalSize(A, &m, &n);
MatGetSize(A, &M, &N);

the local and global sizes are both 202 (the code is running in serial), whereas in Python they are both 101.

I found, while assembling the matrix, that the first 101 values were identical, but the matrix generated purely in C++ was four times the size.
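One way I would narrow this down (a sketch only; check_block_sizes is a hypothetical helper, and index_maps / index_maps_bs are the containers built in the appendix function below) is to compare the size PETSc reports against the size implied by the row index maps that were actually handed to create_matrix_block:

#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

#include <petscmat.h>

#include <dolfinx/common/IndexMap.h>

// Hypothetical helper: compare the local row count PETSc reports for A with the
// row count implied by the row index maps / block sizes passed to
// create_matrix_block. A mismatch here points at the index maps rather than PETSc.
void check_block_sizes(
    Mat A,
    const std::vector<std::reference_wrapper<const dolfinx::common::IndexMap>>& row_maps,
    const std::vector<int>& row_bs)
{
    PetscInt m, n;
    MatGetLocalSize(A, &m, &n);

    std::int64_t expected = 0;
    for (std::size_t f = 0; f < row_maps.size(); ++f)
        expected += std::int64_t(row_bs[f]) * row_maps[f].get().size_local();

    std::cout << "PETSc local rows: " << m
              << ", rows implied by the index maps: " << expected << std::endl;
}

Called as check_block_sizes(A, index_maps[0], index_maps_bs[0]) right after the matrix is created, this should show immediately whether the restricted or the unrestricted index maps ended up in index_maps.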

Appendix:

Mat _create_matrix_block(
    const std::vector<std::vector<std::shared_ptr<dolfinx::fem::Form<PetscScalar>>>>& J,
    const std::array<std::vector<std::shared_ptr<multiphenicsx::fem::DofMapRestriction>>, 2>&
        restriction_J)
{
    std::vector<std::vector<dfx_space>> function_spaces = _get_block_function_spaces(J);
    int rows = static_cast<int>(function_spaces[0].size());
    int columns = static_cast<int>(function_spaces[1].size());

    std::shared_ptr<const dolfinx::mesh::Mesh<T>> mesh = nullptr;
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < columns; j++)
        {
            auto J_ij = J[i][j];
            if (J_ij)
            {
                mesh = J_ij->mesh();
                break;
            }
        }
        if (mesh)
            break;
    }

    // Asserts & checks
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < columns; j++)
        {
            auto J_ij = J[i][j];
            if (J_ij)
            {
                assert(J_ij->mesh() == mesh);
            }
        }
    }
    for (int i = 0; i < 2; i++)
    {
        for (const auto& function_space: function_spaces[i])
        {
            assert(function_space->mesh() == mesh);
        }
    }

    // Get index_maps, index_maps_bs, dofmaps_list, dofmaps_bounds
    std::array<std::vector<std::reference_wrapper<const dfx_indexmap>>, 2> index_maps;
    std::array<std::vector<int>, 2> index_maps_bs;
    std::array<std::vector<std::span<const std::int32_t>>, 2> dofmaps_list;
    std::array<std::vector<std::span<const std::size_t>>, 2> dofmaps_bounds;

    // Only restricted case considered
    assert(static_cast<int>(restriction_J[0].size()) == rows);
    assert(static_cast<int>(restriction_J[1].size()) == columns);
    std::array<int, 2> dims = {rows, columns};
    for (int i = 0; i < 2; ++i)
    {
        for (int j = 0; j < dims[i]; ++j)
        {
            assert(function_spaces[i][j]->mesh() == mesh);
            const auto& J_ij = restriction_J[i][j];
            index_maps[i].push_back(std::cref(*(J_ij->dofmap()->index_map)));
            index_maps_bs[i].push_back(J_ij->index_map_bs());
            dofmaps_list[i].push_back(J_ij->map().first);
            dofmaps_bounds[i].push_back(J_ij->map().second);
        }
    }

    // Convert shared_ptr to raw pointers (needed for create_matrix_block)
    std::vector<std::vector<const dolfinx::fem::Form<PetscScalar, PetscReal>*>> J_raw(J.size());
    for (size_t i = 0; i < J.size(); ++i)
    {
        J_raw[i].resize(J[i].size());
        for (size_t j = 0; j < J[i].size(); ++j)
        {
            auto J_ij = J[i][j].get();
            if (J_ij)
            {
                J_raw[i][j] = J_ij;
            }
        }
    }
    // ///
    // // Print inputs
    // std::cout << "J_raw:\n";
    // for (const auto& row : J_raw) {
    //     for (const auto& elem : row) {
    //         std::cout << elem << " ";
    //     }
    //     std::cout << "\n";
    // }

    // std::cout << "index_maps:\n";
    // for (const auto& idx_map_list : index_maps) {
    //     for (const auto& idx_map : idx_map_list) {
    //         std::cout << idx_map.get().size_local() << " ";
    //     }
    //     std::cout << "\n";
    // }

    // std::cout << "index_maps_bs:\n";
    // for (const auto& bs_list : index_maps_bs) {
    //     for (const auto& bs : bs_list) {
    //         std::cout << bs << " ";
    //     }
    //     std::cout << "\n";
    // }

    // std::cout << "dofmaps_list:\n";
    // for (const auto& dofmap_list : dofmaps_list) {
    //     for (const auto& dofmap : dofmap_list) {
    //         for (const auto& val : dofmap) {
    //             std::cout << val << " ";
    //         }
    //         std::cout << "\n";
    //     }
    //     std::cout << "\n";
    // }

    // std::cout << "dofmaps_bounds:\n";
    // for (const auto& dofmap_bound_list : dofmaps_bounds) {
    //     for (const auto& dofmap_bound : dofmap_bound_list) {
    //         for (const auto& val : dofmap_bound) {
    //             std::cout << val << " ";
    //         }
    //         std::cout << "\n";
    //     }
    //     std::cout << "\n";
    // }
    // ////

    /// @param[in] a A bilinear form
    /// @param[in] index_maps A pair of index maps. Row index map is given by index_maps[0], column
    /// index map is given by index_maps[1].
    /// @param[in] index_maps_bs A pair of int, representing the block size of index_maps.
    /// @param[in] dofmaps_list An array of spans containing the dofmaps list. Row dofmap is given by
    /// dofmaps[0], while column dofmap is given by dofmaps[1].
    /// @param[in] dofmaps_bounds An array of spans containing the dofmaps cell bounds.
    /// @param[in] matrix_type = std::string(), auto decides depending on if size_local == size_global
    return multiphenicsx::fem::petsc::create_matrix_block(
        J_raw, index_maps, index_maps_bs, dofmaps_list, dofmaps_bounds, std::string());
}

edit 1: image upload error
edit 2: correct _assemble_matrix_block in appendix
edit 3: expected output from assemble_matrix_block

note: with jacobian and restriction defined as in the original minimal example above,

J = dfx.fem.form(jacobian)
A = multiphenicsx.fem.petsc.create_matrix_block(
    J, restriction=(restriction, restriction))
 
# View matrix of zeroes            
A.zeroEntries()
A.assemble()
A.view()

where,

Mat Object: 1 MPI process
  type: seqaij
row 0:
row 1:
row 2:
row 3:
row 4:
...
row 99:
row 100:
row 101:

Update 1:

I traced the creation of sparsity->index_map(dim) in multiphenicsx::fem::petsc::create_matrix_block, instrumenting it at "computing the offsets for the fields", just before the SparsityPattern is created:

std::array<std::vector<std::pair<
               std::reference_wrapper<const dolfinx::common::IndexMap>, int>>,
           2>
    maps_and_bs;
for (std::size_t d = 0; d < 2; ++d)
{
  for (std::size_t f = 0; f < index_maps[d].size(); ++f)
  {
    maps_and_bs[d].emplace_back(index_maps[d][f], index_maps_bs[d][f]);
  }
}

const auto [rank_offset0, local_offset0, ghosts_new0, owners0]
    = common::stack_index_maps(maps_and_bs[0]);
const auto [rank_offset1, local_offset1, ghosts_new1, owners1]
    = common::stack_index_maps(maps_and_bs[1]);

std::cout << "local_offset0:\n";
for (const auto& val : local_offset0)
  std::cout << val << " ";
std::cout << "\n";

std::cout << "local_offset1:\n";
for (const auto& val : local_offset1)
  std::cout << val << " ";
std::cout << "\n";

We get the following offsets:

[image not available]
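For what it's worth, assuming local_offset follows the usual prefix-sum convention (first entry 0, last entry the total stacked local size), the totals can be printed directly next to the per-field offsets:

// Sketch: the last entry of the prefix-sum offsets is the stacked local size
// (rows from local_offset0, columns from local_offset1).
std::cout << "stacked local rows: " << local_offset0.back()
          << ", stacked local cols: " << local_offset1.back() << "\n";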

Update 2:

I think the error comes from how I'm handling J_raw. I tried creating a matrix block with no restrictions using

dolfinx::fem::petsc::create_matrix_block(const std::vector<std::vector<const Form<PetscScalar, T>*>>& a, const std::string& type = std::string())

with J_raw as my only input, extracted as follows:

1. Python setup

J: List[List[ufl.Form]]
J = dfx.fem.form(J)
J_cpp = [[None if form is None else form._cpp_object for form in forms] for forms in J]

where,

In [1]: J_cpp
Out[1]: 
[[<dolfinx.cpp.fem.Form_float64 object at 0x7f0c3aadf670>, 
<dolfinx.cpp.fem.Form_float64 object at 0x7f0c3aadfef0>], 
[<dolfinx.cpp.fem.Form_float64 object at 0x7f0c3ac7c4f0>, 
<dolfinx.cpp.fem.Form_float64 object at 0x7f0c3ac7caf0>]]

2. C++/Python interface: a pybind11-bound class takes J_cpp in as

std::vector<std::vector<std::shared_ptr<dolfinx::fem::Form<PetscScalar>>>> J

3. The assemble_matrix_block step in the constructor:

// Convert shared_ptr to raw pointers (needed for create_matrix_block)
std::vector<std::vector<const dolfinx::fem::Form<PetscScalar, PetscReal>*>> J_raw(J.size());
for (size_t i = 0; i < J.size(); ++i)
{
    J_raw[i].resize(J[i].size());
    for (size_t j = 0; j < J[i].size(); ++j)
    {
        auto J_ij = J[i][j].get();
        if (J_ij)
        {
            J_raw[i][j] = J_ij;
        }
    }
}
// Print a matrix of zeros
Mat A = dolfinx::fem::petsc::create_matrix_block(J_raw);
MatZeroEntries(A);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
MatView(A, PETSC_VIEWER_STDOUT_WORLD);

where,

J_raw:
0x55a5126f6450 0x55a5125c7890 
0x55a512199250 0x55a512158890

finally, the PETSc matrix output is:

Mat Object: 1 MPI process
  type: seqaij
row 0:
row 1:
row 2:
row 3:
row 4:
row 5:
...
row 199:
row 200:
row 201:

i.e. same output dimensions as the restricted case.
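As a cross-check (a sketch; J is the nested shared_ptr form container from step 2, and both diagonal blocks are assumed to be non-null, as in this example), the unrestricted local size can be recomputed straight from the forms' function spaces:

// Sketch: the unrestricted local row count is the sum over the row blocks of
// bs * size_local of each test space's dofmap index map.
PetscInt m, n;
MatGetLocalSize(A, &m, &n);

std::int64_t expected = 0;
for (std::size_t i = 0; i < J.size(); ++i)
{
    auto V = J[i][i]->function_spaces()[0]; // test space of the i-th diagonal block
    expected += std::int64_t(V->dofmap()->index_map_bs())
                * V->dofmap()->index_map->size_local();
}
std::cout << "local rows: " << m
          << ", expected without restriction: " << expected << std::endl;

For two CG(1) spaces on the 100-cell interval this gives 101 + 101 = 202 in serial, matching the output above.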

Since the memory addresses are different, I was wondering if I'm doing something wrong in the steps up to and including defining J_raw.
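If the J_raw conversion itself is in doubt, a small diagnostic (sketch only) could be dropped in right after the conversion loop:

// Sketch: every entry of J_raw that is meant to hold a bilinear form should be
// non-null and have rank 2.
for (std::size_t i = 0; i < J_raw.size(); ++i)
{
    for (std::size_t j = 0; j < J_raw[i].size(); ++j)
    {
        if (J_raw[i][j])
            std::cout << "J_raw[" << i << "][" << j << "]: rank "
                      << J_raw[i][j]->rank() << "\n";
        else
            std::cout << "J_raw[" << i << "][" << j << "]: null\n";
    }
}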

edit 1: styling

Can you clarify why you want to rewrite assemble_matrix_block in C++?
Is it because you only use C++ (and never Python), or for some other reason? My understanding is that it is for "some other reason" (otherwise you wouldn't be trying to write pybind11 wrappers for it), and I'd like to understand what that reason is.

FYI, multiphenicsx doesn’t have that function in C++ because dolfinx doesn’t either.

I am considering developing a NewtonBlockSolver class in C++ for a large, time-dependent problem with many degrees of freedom (3D mesh). The goal is to determine whether this implementation can significantly reduce simulation time, and hence whether it is worth pursuing.

After fixing the bug in create_matrix_block, the next step will be to benchmark assemble_matrix_block and the other related functions. This could help identify potential improvements in simulation timings, e.g. on a less complex case (a timing sketch follows the table below).

| Operation        | reps | wall avg | wall tot   | usr avg  | usr tot     | sys avg  | sys tot     |
|------------------|------|----------|------------|----------|-------------|----------|-------------|
| Build Residual   | 2634 | 0.159364 | 419.764652 | 1.825991 | 4809.660000 | 2.106879 | 5312.151000 |
| Build Jacobian   | 758  | 0.361796 | 274.242119 | 2.101412 | 1592.870000 | 2.106187 | 1596.490000 |
| Update solutions | 2993 | 0.020143 | 60.288914  | 0.297511 | 890.450000  | 0.336462 | 1007.030000 |
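For the C++ side of that benchmark, a minimal timing sketch (plain std::chrono; assemble_block is a placeholder for whichever assembly routine is under test):

#include <chrono>

// Sketch: time one repetition of an assembly routine and return the wall time in
// seconds; accumulating these would give the "wall avg" / "wall tot" columns above.
template <typename F>
double time_once(F&& assemble_block)
{
    auto t0 = std::chrono::steady_clock::now();
    assemble_block();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}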

If I were you, I'd rather use SNES, which is shown e.g. in tutorial 02 of multiphenicsx. Unless I see strong evidence that SNES is actually the bottleneck, I don't see how re-inventing the wheel (in this case, the wheel being the nonlinear solver) is a good use of your time.