Hello,
I’m using FEniCS/performance-test (the FEniCSx performance-testing mini app on GitHub) for the performance tests in my Google Summer of Code project.
I’ve been trying to make sense of the timings table that is output. For example,
[MPI_MAX] Summary of timings | reps wall avg wall tot
-------------------------------------------------------------------------------------------------------
Build BoxMesh (tetrahedra) | 1 1.900000 1.900000
Build dofmap data | 2 0.060000 0.120000
Build sparsity | 1 0.050000 0.050000
Compute connectivity 2-0 | 1 0.010000 0.010000
Compute dof reordering map | 2 0.005000 0.010000
Compute entities of dim = 2 | 1 0.270000 0.270000
Compute graph partition (SCOTCH) | 1 0.030000 0.030000
Compute local part of mesh dual graph | 2 0.470000 0.940000
Compute local-to-local map | 1 0.010000 0.010000
Compute non-local part of mesh dual graph | 1 0.000000 0.000000
Compute-local-to-global links for global/local adjacency list | 1 0.010000 0.010000
Distribute AdjacencyList nodes to destination ranks | 1 0.070000 0.070000
Distribute row-wise data (scalable) | 1 0.010000 0.010000
GPS: create_level_structure | 2 0.015000 0.030000
Gibbs-Poole-Stockmeyer ordering | 1 0.090000 0.090000
Init MPI | 1 0.080000 0.080000
Init PETSc | 1 0.000000 0.000000
Init dofmap from element dofmap | 2 0.050000 0.100000
Init logging | 1 0.000000 0.000000
PETSc Krylov solver | 1 0.960000 0.960000
SCOTCH: call SCOTCH_dgraphBuild | 1 0.000000 0.000000
SCOTCH: call SCOTCH_dgraphPart | 1 0.020000 0.020000
SparsityPattern::finalize | 1 0.080000 0.080000
Topology: create | 1 0.490000 0.490000
Topology: determine shared index ownership | 1 0.010000 0.010000
Topology: determine vertex ownership groups (owned, undetermined, unowned) | 1 0.080000 0.080000
ZZZ Assemble | 1 0.820000 0.820000
ZZZ Assemble matrix | 1 0.340000 0.340000
ZZZ Assemble vector | 1 0.090000 0.090000
ZZZ Create Mesh | 1 1.900000 1.900000
ZZZ Create RHS function | 1 0.240000 0.240000
ZZZ Create boundary conditions | 1 0.010000 0.010000
ZZZ Create facets and facet->cell connectivity | 1 0.280000 0.280000
ZZZ FunctionSpace | 1 0.060000 0.060000
ZZZ Solve | 1 0.960000 0.960000
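I know that summing the "wall tot" column is wrong, because the sum is much larger than the run time I measured by hand (this is a single-core run). Some rows also appear to measure the same work twice: for example, "Build BoxMesh (tetrahedra)" and "ZZZ Create Mesh" both report 1.900000, and "PETSc Krylov solver" and "ZZZ Solve" both report 0.960000.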
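To make the double counting concrete, here is a minimal Python sketch using a few rows copied from the table. It rests on an assumption I have not confirmed: that the "ZZZ" rows are outer stage timers and the other rows are nested inside them. The variable names are mine and are not part of performance-test.

```python
# Rows copied from the table above: (name, reps, wall avg, wall tot).
rows = [
    ("Build BoxMesh (tetrahedra)", 1, 1.90, 1.90),
    ("Compute local part of mesh dual graph", 2, 0.47, 0.94),
    ("Topology: create", 1, 0.49, 0.49),
    ("PETSc Krylov solver", 1, 0.96, 0.96),
    ("ZZZ Create Mesh", 1, 1.90, 1.90),
    ("ZZZ Solve", 1, 0.96, 0.96),
]

# Observation from the table: "wall tot" equals reps * "wall avg" on each row.
for name, reps, avg, tot in rows:
    assert abs(reps * avg - tot) < 1e-9, name

# Naive sum over every row (what I did at first).
naive_total = sum(tot for _, _, _, tot in rows)

# ASSUMPTION (unconfirmed): only the "ZZZ" rows are outer stage timers,
# so summing them alone avoids counting nested timers twice.
stage_total = sum(tot for name, _, _, tot in rows if name.startswith("ZZZ "))

print(f"naive sum of selected rows: {naive_total:.2f} s")
print(f"sum of 'ZZZ' rows only:     {stage_total:.2f} s")
```

If that assumption holds, the naive sum of these six rows is well over twice the sum of the two "ZZZ" rows, which would explain why the column total overshoots my hand-measured run time.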
I would appreciate it if someone could explain precisely what these numbers mean. I could then open a PR adding this information to the documentation of performance-test on GitHub.