Dear all,
Does DOLFINx have its own performance analysis tool?
How can I find the memory usage of a matrix or vector?
I tried to use petsc4py to get the memory usage, but it failed.
There is a “performance test mini app” that is helpful for checking that DOLFINx performs well on HPC systems: GitHub - FEniCS/performance-test: Mini App for FEniCSx performance testing.
However, as far as I’m aware, there are no memory-analysis tools designed specifically for DOLFINx. In my own work I use the memory-analysis tools offered by the HPC system I’m running on.
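If you just want a rough number without any external tooling, you can also sample the process memory from Python itself with the standard library. A minimal sketch (ru_maxrss is the peak resident set size, reported in kibibytes on Linux and in bytes on macOS):

import resource
from mpi4py import MPI

# Peak resident set size of this process so far
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Sum over all ranks to get a job-wide total (allreduce defaults to SUM)
total_kib = MPI.COMM_WORLD.allreduce(peak_kib)
if MPI.COMM_WORLD.rank == 0:
    print(f"Peak memory over all ranks: {total_kib / 1024**2:.2f} GiB")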
You can get quite a lot of information through petsc4py, for instance with:
from mpi4py import MPI
import petsc4py

# Options must be passed before PETSc is initialized, which happens
# when dolfinx.fem.petsc is imported below
petsc4py.init(["-memory_view"])

import dolfinx.fem.petsc
import ufl

mesh = dolfinx.mesh.create_unit_cube(MPI.COMM_WORLD, 50, 50, 50)
V = dolfinx.fem.functionspace(mesh, ("Lagrange", 2))
u, v = ufl.TrialFunction(V), ufl.TestFunction(V)
a = u * v * ufl.dx

# Assemble the mass matrix and query its storage statistics,
# summed over all processes
A = dolfinx.fem.petsc.assemble_matrix(dolfinx.fem.form(a))
A.assemble()
print(A.getInfo(A.InfoType.GLOBAL_SUM))
Running on three processes, this gives you:
{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
{'block_size': 1.0, 'nz_allocated': 29096201.0, 'nz_used': 29096201.0, 'nz_unneeded': 0.0, 'memory': 0.0, 'assemblies': 1.0, 'mallocs': 0.0, 'fill_ratio_given': 0.0, 'fill_ratio_needed': 0.0, 'factor_mallocs': 0.0}
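Note that the same global sum is printed once per rank, which is why the line appears three times above. getInfo with GLOBAL_SUM is collective, so it has to be called on every rank, but you can restrict the printing to a single rank:

info = A.getInfo(A.InfoType.GLOBAL_SUM)  # collective: call on all ranks
if mesh.comm.rank == 0:
    print(info)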
Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.1909e+09 max 4.3148e+08 min 3.4740e+08
Current process memory: total 1.1884e+09 max 4.3065e+08 min 3.4644e+08
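The matrix info above covers the matrix part of your question. As far as I know there is no direct getInfo equivalent for vectors, but you can estimate their storage from the local array sizes. A minimal sketch, continuing the example above:

# Create a vector compatible with A and estimate its storage
b = A.createVecLeft()
local_bytes = b.getLocalSize() * b.getArray().itemsize
total_bytes = mesh.comm.allreduce(local_bytes)
if mesh.comm.rank == 0:
    print(f"Vector storage over all ranks: {total_bytes / 1e6:.1f} MB")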
Adding “-log_view” yields:
****************************************************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
mwe_2.py on a linux-gnu-real64-32 named dokken-XPS-9320 with 3 processes, by Unknown on Fri Feb 28 16:54:19 2025
Using Petsc Release Version 3.22.0, unknown
                         Max       Max/Min     Avg       Total
Time (sec):           2.274e+00     1.000   2.274e+00
Objects:              0.000e+00     0.000   0.000e+00
Flops:                1.538e+07     1.031   1.515e+07  4.546e+07
Flops/sec:            6.766e+06     1.031   6.665e+06  1.999e+07
MPI Msg Count:        8.000e+00     1.000   8.000e+00  2.400e+01
MPI Msg Len (bytes):  2.470e+06     1.301   2.702e+05  6.484e+06
MPI Reductions:       2.200e+01     1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.2735e+00 100.0%  4.5458e+07 100.0%  2.400e+01 100.0%  2.702e+05      100.0%  5.000e+00  22.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 3 1.0 5.5104e-02 46.1 0.00e+00 0.0 1.2e+01 4.0e+00 0.0e+00 1 0 50 0 0 1 0 50 0 0 0
BuildTwoSidedF 2 1.0 5.4999e-02 46.1 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00 1 0 50 95 0 1 0 50 95 0 0
SFSetGraph 1 1.0 1.8521e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 1 1.0 3.0662e-04 1.2 0.00e+00 0.0 1.2e+01 2.5e+04 0.0e+00 0 0 50 5 0 0 0 50 5 0 0
MatAssemblyBegin 2 1.0 6.6959e-02 4.8 0.00e+00 0.0 1.2e+01 5.2e+05 0.0e+00 1 0 50 95 0 1 0 50 95 0 0
MatAssemblyEnd 2 1.0 2.2859e-02 1.1 7.32e+04 1.3 1.2e+01 2.5e+04 1.0e+00 1 0 50 5 5 1 0 50 5 20 8
------------------------------------------------------------------------------------------------------------------------
Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Index Set     2              2
   IS L to G Mapping     1              0
   Star Forest Graph     1              0
              Vector     2              1
              Matrix     3              0
========================================================================================================================
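As the output above mentions, you can also split the log into your own stages with PetscLogStagePush()/PetscLogStagePop(), exposed in petsc4py as PETSc.Log.Stage. A minimal sketch (assuming logging is active, e.g. through “-log_view”):

from petsc4py import PETSc

# Work between push() and pop() is reported under its own
# "Assembly" stage in the -log_view summary
stage = PETSc.Log.Stage("Assembly")
stage.push()
A = dolfinx.fem.petsc.assemble_matrix(dolfinx.fem.form(a))
A.assemble()
stage.pop()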