Does dolfinx have its own performance analysis tool?

Dear all,
Does dolfinx have its own performance analysis tool?
How can I find out how much memory a matrix or vector uses?
I tried to get the memory usage through petsc4py, but it failed.

There is a “performance test mini app”, which is typically used to check that DOLFINx performs well on HPC systems: FEniCS/performance-test on GitHub (Mini App for FEniCSx performance testing).

However, as far as I’m aware, there are no performance analysis tools designed specifically for memory usage in DOLFINx. In my own work, I use the memory analysis tools offered by the HPC system I’m running on.

You can get quite a lot of information through petsc4py.
For instance with:

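from mpi4py import MPI
import petsc4py

# PETSc options have to be passed before PETSc is initialized,
# i.e. before dolfinx.fem.petsc (and with it petsc4py.PETSc) is imported
petsc4py.init(["-memory_view"])

import dolfinx.fem.petsc
import ufl

mesh = dolfinx.mesh.create_unit_cube(MPI.COMM_WORLD, 50, 50, 50)

V = dolfinx.fem.functionspace(mesh, ("Lagrange", 2))
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)

# Mass matrix
a = u * v * ufl.dx

A = dolfinx.fem.petsc.assemble_matrix(dolfinx.fem.form(a))
A.assemble()
# Matrix statistics summed over all processes (printed by every process)
print(A.getInfo(A.InfoType.GLOBAL_SUM))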

Run on 3 MPI processes, this gives you (every process prints the same global summary):

{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory:        total 1.1909e+09 max 4.3148e+08 min 3.4740e+08
Current process memory:                                  total 1.1884e+09 max 4.3065e+08 min 3.4644e+08
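
Note that the 'memory' entry in the getInfo output above is reported as 0.0, so if you want a number for the matrix alone, a rough estimate can be made from 'nz_used'. The following is only a back-of-the-envelope sketch, not something PETSc or DOLFINx provides: it assumes CSR-like (AIJ) storage with 8-byte scalars and 4-byte integer indices (which the "real64-32" in the PETSc arch name suggests), and it assumes 101^3 rows for the P2 space on this mesh. The function name estimate_csr_bytes is hypothetical.

```python
def estimate_csr_bytes(nnz, n_rows, scalar_bytes=8, index_bytes=4):
    """Rough size of a CSR matrix: values + column indices + row pointers.

    Assumption: plain CSR layout. PETSc's parallel AIJ format stores a
    diagonal and an off-diagonal block per process plus extra bookkeeping,
    so the real footprint will be somewhat larger than this.
    """
    values = nnz * scalar_bytes            # one scalar per stored entry
    col_indices = nnz * index_bytes        # one column index per stored entry
    row_pointers = (n_rows + 1) * index_bytes
    return values + col_indices + row_pointers


# 'nz_used' taken from the getInfo output above
nz_used = 29096201
# Assumed number of rows (degrees of freedom) for P2 on the 50x50x50 cube
n_rows = 101**3

matrix_bytes = estimate_csr_bytes(nz_used, n_rows)
print(f"Estimated matrix storage: {matrix_bytes / 1e6:.1f} MB")

# Total current process memory reported by -memory_view above
total_process_bytes = 1.1884e9
print(f"Share of total process memory: {matrix_bytes / total_process_bytes:.0%}")
```

Under these assumptions the matrix accounts for roughly a third of the total process memory reported by -memory_view, with the rest going to the mesh, the function space, the Python runtime and so on.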

Adding “-log_view” to the list of options passed to petsc4py.init yields

****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

mwe_2.py on a linux-gnu-real64-32 named dokken-XPS-9320 with 3 processes, by Unknown on Fri Feb 28 16:54:19 2025
Using Petsc Release Version 3.22.0, unknown 

                         Max       Max/Min     Avg       Total
Time (sec):           2.274e+00     1.000   2.274e+00
Objects:              0.000e+00     0.000   0.000e+00
Flops:                1.538e+07     1.031   1.515e+07  4.546e+07
Flops/sec:            6.766e+06     1.031   6.665e+06  1.999e+07
MPI Msg Count:        8.000e+00     1.000   8.000e+00  2.400e+01
MPI Msg Len (bytes):  2.470e+06     1.301   2.702e+05  6.484e+06
MPI Reductions:       2.200e+01     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.2735e+00 100.0%  4.5458e+07 100.0%  2.400e+01 100.0%  2.702e+05      100.0%  5.000e+00  22.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          3 1.0 5.5104e-02 46.1 0.00e+00 0.0 1.2e+01 4.0e+00 0.0e+00  1  0 50  0  0   1  0 50  0  0     0
BuildTwoSidedF         2 1.0 5.4999e-02 46.1 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00  1  0 50 95  0   1  0 50 95  0     0
SFSetGraph             1 1.0 1.8521e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 3.0662e-04 1.2 0.00e+00 0.0 1.2e+01 2.5e+04 0.0e+00  0  0 50  5  0   0  0 50  5  0     0
MatAssemblyBegin       2 1.0 6.6959e-02 4.8 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00  1  0 50 95  0   1  0 50 95  0     0
MatAssemblyEnd         2 1.0 2.2859e-02 1.1 7.32e+04 1.3 1.2e+01 2.5e+04 1.0e+00  1  0 50  5  5   1  0 50  5 20     8
------------------------------------------------------------------------------------------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Index Set     2              2
   IS L to G Mapping     1              0
   Star Forest Graph     1              0
              Vector     2              1
              Matrix     3              0
========================================================================================================================
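
If you want to sort the event table above rather than read it by eye, a small script along these lines can help. This is only a sketch that assumes the whitespace-separated column layout shown above (event name, count, count ratio, max time, time ratio, ...); the helper parse_event_line is hypothetical, and PETSc's columns can run together for very large numbers, in which case a simple split like this would break.

```python
# Event rows copied from the -log_view output above
LOG_EVENTS = """\
BuildTwoSided          3 1.0 5.5104e-02 46.1 0.00e+00 0.0 1.2e+01 4.0e+00 0.0e+00  1  0 50  0  0   1  0 50  0  0     0
BuildTwoSidedF         2 1.0 5.4999e-02 46.1 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00  1  0 50 95  0   1  0 50 95  0     0
SFSetGraph             1 1.0 1.8521e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 3.0662e-04 1.2 0.00e+00 0.0 1.2e+01 2.5e+04 0.0e+00  0  0 50  5  0   0  0 50  5  0     0
MatAssemblyBegin       2 1.0 6.6959e-02 4.8 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00  1  0 50 95  0   1  0 50 95  0     0
MatAssemblyEnd         2 1.0 2.2859e-02 1.1 7.32e+04 1.3 1.2e+01 2.5e+04 1.0e+00  1  0 50  5  5   1  0 50  5 20     8
"""


def parse_event_line(line):
    """Extract name, call count, max time and max/min time ratio from one row."""
    fields = line.split()
    return {
        "event": fields[0],
        "count": int(fields[1]),
        "max_time": float(fields[3]),    # maximum time over all processes (s)
        "time_ratio": float(fields[4]),  # max time / min time over processes
    }


events = [parse_event_line(line) for line in LOG_EVENTS.splitlines() if line.strip()]

# Most expensive events first
events.sort(key=lambda e: e["max_time"], reverse=True)

for e in events:
    print(f"{e['event']:<18} {e['max_time']:.4e} s  (max/min ratio {e['time_ratio']})")
```

A large max/min time ratio, such as the 46.1 for BuildTwoSided here, usually means that some processes spent most of that event waiting for others, so it is more a sign of imbalance or synchronization than of expensive computation.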