Issue with the size of a sparse or dense matrix constructed from assemble_matrix with MPI

Hello All,

I am trying to solve the linear system Ax = b, where the matrix A (denoted Kf here) is obtained from the bilinear part of the variational formulation, and the vector b (denoted phirhs_petsc here) has been transformed to PETSc format.

Initialization

from mpi4py import MPI
from petsc4py import PETSc
import numpy as np
from dolfinx.la import InsertMode
from dolfinx.fem import (Function, FunctionSpace, form)
from dolfinx.fem.petsc import (apply_lifting, assemble_matrix, assemble_vector, set_bc )
from dolfinx.mesh import (CellType, GhostMode, create_box, create_rectangle, create_unit_square, locate_entities_boundary, meshtags)
from ufl import dx, grad, dot, inner, SpatialCoordinate, Measure, inv
from ufl import TrialFunction, TestFunction
from scipy.sparse import csr_matrix,lil_matrix

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")

dtype = PETSc.ScalarType  # type: ignore
comm = MPI.COMM_WORLD
rank = comm.rank

# Mesh (Either regular or Gmesh)
domain = create_unit_square(MPI.COMM_WORLD, 2, 2, ghost_mode=GhostMode.shared_facet)

tdim = domain.topology.dim
# Assuming `mesh` is the mesh created in the code
num_elements = domain.topology.index_map(tdim).size_local
num_nodes = domain.topology.index_map(0).size_local
# Summing up the counts from all processes in MPI_COMM_WORLD
num_elements_global = MPI.COMM_WORLD.allreduce(num_elements, op=MPI.SUM)
num_nodes_global = MPI.COMM_WORLD.allreduce(num_nodes, op=MPI.SUM)
num_cells = domain.topology.index_map(tdim).size_local + domain.topology.index_map(tdim).num_ghosts
num_vertices = domain.topology.index_map(0).size_local + domain.topology.index_map(0).num_ghosts


#Fespace
Vh1 = FunctionSpace(domain, ("CG", 1))
Vh1dc = FunctionSpace(domain, ("DG", 1))
Vh0 = FunctionSpace(domain, ("DG", 0))


Transforming the local numpy array to a PETSc vector

num_dofs_local = Vh1.dofmap.index_map.size_local
phirhs = Trf*rhoO
phirhsNO = Function(Vh1)
phirhsNO = phirhs[:num_dofs_local]

# Convert numpy array to PETSc vector
phirhs_petsc = PETSc.Vec().createWithArray(phirhsNO)
mpi_print(f"Size of phirhs vector: {len(phirhs_petsc.array)}")

Steps to validate that the size of the Kf matrix is locally compatible with the phirhs_petsc vector:

uf, vf = TrialFunction(Vh1), TestFunction(Vh1)
a = form(inner(grad(uf), grad(vf)) * dx + uf * vf * dx)
bc = []
Kf = assemble_matrix(a, bc)
Kf.assemble()

# Construct CSR matrix from PETSc matrix
assert isinstance(Kf, PETSc.Mat)
ai, aj, av = Kf.getValuesCSR()
Asp = csr_matrix((av, aj, ai))
mpi_print(f"Size of Kf in sparse matrix form: {Asp.shape}")

To test the size of Kf, I converted it to a sparse matrix as in Transform dolfinx assemble matrix to numpy array.

Results from the previous two snippets on 2 processes:

Rank 0: Size of phirhs vector: 4
Rank 0: Size of Kf in sparse matrix form: (4, 8)
Rank 1: Size of phirhs vector: 5
Rank 1: Size of Kf in sparse matrix form: (5, 9)

And when I convert Kf to a dense matrix to validate the size, it gets spread over the global mesh when I run with MPI:

# Construct dense matrix from PETSc matrix
C = Kf.convert('dense')
C.getDenseArray()
mpi_print(f"Size of Kf in dense matrix form: {C.size}")

Result from 2 processors

Rank 0: Size of phirhs vector: 4
Rank 0: Size of Kf in sparse matrix form: (4, 8)
Rank 0: Size of Kf in dense matrix form: (9, 9)
Rank 1: Size of phirhs vector: 5
Rank 1: Size of Kf in sparse matrix form: (5, 9)
Rank 1: Size of Kf in dense matrix form: (9, 9)
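
For reference, the distributed layout behind these numbers can also be inspected directly with standard petsc4py queries; here is a short sketch using the objects defined above (not part of my original script):

# Inspect the parallel layout of the assembled matrix and the RHS vector
mpi_print(f"Global size of Kf: {Kf.getSize()}")        # (global rows, global cols)
mpi_print(f"Local size of Kf: {Kf.getLocalSize()}")    # (owned rows, owned cols)
mpi_print(f"Row ownership range of Kf: {Kf.getOwnershipRange()}")
mpi_print(f"Local size of phirhs_petsc: {phirhs_petsc.getLocalSize()}")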

How can I make sure the matrix and vector are compatible for matrix-vector multiplication with MPI? I am getting satisfactory results in sequential runs.
Thanks

Just convert Asp to a numpy array (using Asp.toarray(), if memory serves).

Furthermore, note that:

  1. you can carry out most of the algebraic operations in PETSc, without any need to convert the matrix/vector to scipy/numpy (see the sketch after this list)
  2. if you don’t want to assemble a PETSc matrix/vector, you may as well import assemble_matrix/assemble_vector from dolfinx.fem instead of dolfinx.fem.petsc
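
For instance, here is a minimal sketch of point 1: a distributed matrix-vector product done entirely in PETSc, assuming A is an assembled PETSc Mat and b a compatible PETSc Vec (placeholder names, assembled as in your snippets):

# Matrix-vector product in PETSc, no scipy/numpy round-trip
y = A.createVecLeft()   # vector with the row layout of A
A.mult(b, y)            # y = A @ b; PETSc handles the parallel layout
print(f"Rank {MPI.COMM_WORLD.rank}: local size of y = {y.getLocalSize()}")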

Thanks @francesco-ballarin

That is exactly what I was trying to do before, until I found that my results for the output vector were off with MPI compared to the sequential run.

MWE for algebraic operation

num_dofs_local = Vh1.dofmap.index_map.size_local

phirhs = Trf*rhoO
phirhsNO = Function(Vh1)
phirhsNO = phirhs[:num_dofs_local]

# Convert numpy array to PETSc vector
phirhs_petsc = PETSc.Vec().createWithArray(phirhsNO)

uf, vf = TrialFunction(Vh1), TestFunction(Vh1)
a = form(inner(grad(uf), grad(vf)) * dx + uf * vf * dx)
bc = []
Kf = assemble_matrix(a, bc)
Kf.assemble()
apply_lifting(phirhs_petsc, [a], [bc])

solver = PETSc.KSP().create(domain.comm)
solver.setOperators(Kf)
solver.setType(PETSc.KSP.Type.PREONLY)
solver.getPC().setType(PETSc.PC.Type.LU)

opvec = Function(Vh1)
solver.solve(phirhs_petsc, opvec.vector)
opvec.x.scatter_forward()
opvecO = opvec.x.array
mpi_print(f"Size of output vector: {len(opvecO)}")
mpi_print(f"Range of output vector:{max(opvecO) , min(opvecO)}")

Using Sequential

Rank 0: Size of output vector: 9
Rank 0: Range of output vector:(0.6000000000000011, 0.6000000000000009)

Using MPI (2 Procs)

Rank 0: Size of output vector: 8
Rank 0: Range of output vector:(0.5777647955458147, 0.5711207707117077)
Rank 1: Size of output vector: 8
Rank 1: Range of output vector:(0.5777647955458147, 0.5711207707117077)

The size of the output vector is understandable, since it is owned + ghost dofs; I am more concerned about the range.
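
For a like-for-like comparison of the range between serial and parallel runs, here is a small sketch (not in my original script) that reduces over the owned dofs only, using the variables from the MWE above:

owned = opvec.x.array[:num_dofs_local]                 # owned dofs only, no ghosts
gmax = domain.comm.allreduce(owned.max(), op=MPI.MAX)  # global max over all ranks
gmin = domain.comm.allreduce(owned.min(), op=MPI.MIN)  # global min over all ranks
mpi_print(f"Global range of output vector: {gmax}, {gmin}")
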
What am I doing wrong here?

Your code is not complete. Could you please provide a complete demo to show your attempt? And what is your expectation?


@chenyongxin

Well, the complete code was a little long to put here, but since it seems difficult to understand my query from the MWE, I will post the complete code and try to re-explain what I am trying to achieve.

Basically, I want to perform this simple operation using MPI:

[screenshot of the operation: multiplying the (n × m) matrix Tf by the vector rho]

n = # of nodes, m = # of columns (one per element),
where rho is initialized on the elements, and
Tf contains the basis functions of discontinuous Galerkin elements of order 1 (DG1), arranged in sparse matrix form of size (n × m).


from mpi4py import MPI
from petsc4py import PETSc
import numpy as np
from dolfinx.la import InsertMode
from dolfinx.fem import (Function, FunctionSpace, form)
from dolfinx.fem.petsc import (apply_lifting, assemble_matrix, assemble_vector, set_bc )
# from dolfinx.fem import assemble_matrix, assemble_vector
from dolfinx.mesh import (CellType, GhostMode, create_box, create_rectangle, create_unit_square, locate_entities_boundary, meshtags)
from ufl import dx, grad, dot, inner, SpatialCoordinate, Measure, inv
from ufl import TrialFunction, TestFunction
from scipy.sparse import csr_matrix,lil_matrix

def mpi_print(s):
    print(f"Rank {comm.rank}: {s}")
def mpi_print2(s1,s2):
    print(f"Rank {comm.rank}: {s1}, {s2}")

dtype = PETSc.ScalarType  # type: ignore
comm = MPI.COMM_WORLD
rank = comm.rank

# Mesh (Either regular or Gmesh)
domain = create_unit_square(MPI.COMM_WORLD, 2, 2, ghost_mode=GhostMode.shared_facet)

tdim = domain.topology.dim
# Assuming `mesh` is the mesh created in the code
num_elements = domain.topology.index_map(tdim).size_local
num_nodes = domain.topology.index_map(0).size_local
# Summing up the counts from all processes in MPI_COMM_WORLD
num_elements_global = MPI.COMM_WORLD.allreduce(num_elements, op=MPI.SUM)
num_nodes_global = MPI.COMM_WORLD.allreduce(num_nodes, op=MPI.SUM)
num_cells = domain.topology.index_map(tdim).size_local + domain.topology.index_map(tdim).num_ghosts
num_vertices = domain.topology.index_map(0).size_local + domain.topology.index_map(0).num_ghosts


#Fespace
Vh1 = FunctionSpace(domain, ("CG", 1))
Vh1dc = FunctionSpace(domain, ("DG", 1))
Vh0 = FunctionSpace(domain, ("DG", 0))

def NO2overlapfe(rho):
    rho.x.scatter_reverse(InsertMode.add)
    rho.x.scatter_forward()
    rhoO = rho.x.array
    return rhoO

def NO2overlapvector(b_area,cell_area):
    assemble_vector(b_area.vector, cell_area)
    b_area.x.scatter_reverse(InsertMode.add)
    b_area.x.scatter_forward()
    areaK = b_area.x.array
    return areaK

# volume of the element
varea = TestFunction(Vh0)
cell_area = (form(varea*dx))
b_area = Function(Vh0)
areaK = NO2overlapvector(b_area,cell_area)

rho = Function(Vh0)
rho.vector.setArray(0.6)
rhoO = NO2overlapfe(rho)

etaf = TestFunction(Vh1dc)
Rhdc = (form(etaf*dx))
etaf_a = Function(Vh1dc)
RhdcO = NO2overlapvector(etaf_a,Rhdc)

# Mesh Connectivity
c2v = domain.topology.connectivity(domain.topology.dim, 0)
# Initialize sparse matrices
Trf = lil_matrix((num_vertices, num_cells))

for cell in range(num_cells):
    vertices = c2v.links(cell)
    dofs = Vh1dc.dofmap.cell_dofs(cell)
    for i, vertex in enumerate(vertices):
        Trf[vertex, cell] = RhdcO[dofs[i]]
Trf = Trf.tocsr()

Further, the Kf matrix is obtained from the weak formulation:

num_dofs_local = Vh1.dofmap.index_map.size_local
phirhs = Trf*rhoO
phirhsNO = Function(Vh1)
phirhsNO = phirhs[:num_dofs_local]

# Convert numpy array to PETSc vector
phirhs_petsc = PETSc.Vec().createWithArray(phirhsNO)

uf, vf = TrialFunction(Vh1), TestFunction(Vh1)
a = form(inner(grad(uf), grad(vf)) * dx + uf * vf * dx)
bc = []
Kf = assemble_matrix(a, bc)
Kf.assemble()
apply_lifting(phirhs_petsc, [a], [bc])

solver = PETSc.KSP().create(domain.comm)
solver.setOperators(Kf)
solver.setType(PETSc.KSP.Type.PREONLY)
solver.getPC().setType(PETSc.PC.Type.LU)

opvec = Function(Vh1)
solver.solve(phirhs_petsc, opvec.vector)
opvec.x.scatter_forward()
opvecO = opvec.x.array
opvecNO = opvecO[:num_dofs_local]
mpi_print(f"Size of output vector: {len(opvecNO)}")
mpi_print(f"Range of output vector:{max(opvecNO) , min(opvecNO)}")

A few of the variable names used in the code carry the suffix O or NO: O denotes overlapping values (owned + ghost dofs), while NO denotes non-overlapping values (owned dofs only), consistent with the NO2overlap helper functions above.
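
As a minimal illustration of this convention, here is a sketch for a generic Function in Vh1 (not taken from the code above):

u = Function(Vh1)
n_owned = Vh1.dofmap.index_map.size_local
uO = u.x.array              # overlapping: owned + ghost values
uNO = u.x.array[:n_owned]   # non-overlapping: owned values only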

Comparison of the resulting output vectors in serial and with MPI.
Serial:

Rank 0: Size of output vector: 9
Rank 0: Range of output vector:(0.6000000000000011, 0.6000000000000009)

MPI 2 procs:

Rank 0: Size of output vector: 4
Rank 0: Range of output vector:(0.5777647955458146, 0.5757310754894136)
Rank 1: Size of output vector: 5
Rank 1: Range of output vector:(0.5777647955458147, 0.5711207707117077)

I am still a little lost with your question. Let me guess: you want to say that when you run it in parallel with 2 processes, the value range of the partitioned vector opvecO is different from that of a serial run, as quoted above?

If that is your confusion, just check the RHS vector phirhs in serial vs. in parallel:

num_dofs_local = Vh1.dofmap.index_map.size_local
phirhs = Trf*rhoO
mpi_print(f"phirhs: {np.sort(phirhs)}")

and it gives the following with a 2-process run

Rank 0: phirhs: [0.025 0.025 0.025 0.05  0.05  0.075 0.075 0.125]
Rank 1: phirhs: [0.025 0.025 0.025 0.05  0.05  0.075 0.075 0.125]

and it gives the following with a serial run

Rank 0: phirhs: [0.025 0.025 0.05  0.05  0.075 0.075 0.075 0.075 0.15 ]

So it means your problem is actually that phirhs is computed differently by the matrix-vector multiplication, because the matrix Trf is built locally rather than globally in parallel.

Please consider the suggestion from @francesco-ballarin and build your matrix Trf directly in PETSc format instead of going through numpy/scipy conversions.
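
For example, here is a rough (untested) sketch of creating Trf directly as a distributed PETSc matrix, with rows following the Vh1 dof layout and columns following the Vh0 (cell) layout; the actual filling of the entries is omitted:

rows = Vh1.dofmap.index_map   # CG1 (vertex) dofs
cols = Vh0.dofmap.index_map   # DG0 dofs, one per cell
Trf = PETSc.Mat().createAIJ(
    ((rows.size_local, rows.size_global),
     (cols.size_local, cols.size_global)),
    comm=domain.comm)
Trf.setUp()
Trf.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)  # allow filling without exact preallocation
# ... set the entries of the owned rows here, using global indices obtained
# from rows.local_to_global(...) and cols.local_to_global(...) ...
Trf.assemble()

# The product then has a consistent parallel layout handled by PETSc
phirhs_petsc = Trf.createVecLeft()
Trf.mult(rho.vector, phirhs_petsc)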
