I was trying to test the weak scalability of a parallel computation, using the conjugate gradient method as the linear solver and hypre_amg as the preconditioner, on a simple 2D Poisson equation.
What I did was solve the Poisson equation on a 2500 × 2500 unit mesh using 1 core, then on a 5000 × 5000 mesh using 4 cores, then on a 10000 × 10000 mesh using 16 cores. I only measure the time taken to solve Ax = b in each case. Here is the link to the code.
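Roughly, the setup looks like the following minimal sketch (using the legacy DOLFIN Python API; the mesh size and tolerance are placeholders, and this is not the exact linked code):

```python
from dolfin import *
import time

# Minimal weak-scaling sketch (legacy DOLFIN API assumed; not the exact linked code).
# n = 2500 on 1 core, 5000 on 4 cores, 10000 on 16 cores, so DOFs per core stay fixed.
n = 2500

mesh = UnitSquareMesh(n, n)
V = FunctionSpace(mesh, "P", 1)

u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = Constant(1.0) * v * dx
bc = DirichletBC(V, Constant(0.0), "on_boundary")

A, b = assemble_system(a, L, bc)

solver = KrylovSolver("cg", "hypre_amg")
solver.set_operator(A)
solver.parameters["relative_tolerance"] = 1e-8

uh = Function(V)

# Time only the Ax = b solve, as described above.
MPI.barrier(MPI.comm_world)
t0 = time.perf_counter()
num_iterations = solver.solve(uh.vector(), b)
MPI.barrier(MPI.comm_world)
elapsed = time.perf_counter() - t0

if MPI.rank(MPI.comm_world) == 0:
    print("iterations: %d, solve time: %.3f s" % (num_iterations, elapsed))
```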
Issue 1: Since I am using hypre_amg and the problem size grows in proportion to the number of cores, I was expecting the three cases to take a similar amount of time. However, this is not the case.
Issue 2: I also tried solving on the 5000 × 5000 unit mesh using only one core, to compare with using 4 cores, and found that the number of linear iterations is not the same in the two cases. Is this usual? Shouldn't the number of iterations be the same with hypre_amg in the two cases?
Could anyone give me some hints please? Since I am not very familiar with parallel computing, please point out if I said something incorrect in the question.
Hypre AMG should not be treated as a black box; it has various options that can be tweaked. Since the iteration count is not constant across the cases, the solve time will not be constant either, which is the source of your first issue.
You should also note that multigrid solvers should be supplied with a near-nullspace. See for instance:
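For example, here is a rough sketch of overriding a few BoomerAMG defaults through PETSc options (legacy DOLFIN assumed; the option values below are only illustrative, not tuned recommendations, following the pattern where global PETScOptions set before the solver is created are picked up):

```python
from dolfin import PETScOptions, PETScKrylovSolver

# Illustrative BoomerAMG options (standard PETSc/hypre option names); the values
# are placeholders to show the mechanism, not tuned recommendations.
PETScOptions.set("pc_hypre_boomeramg_strong_threshold", 0.5)
PETScOptions.set("pc_hypre_boomeramg_coarsen_type", "HMIS")
PETScOptions.set("pc_hypre_boomeramg_agg_nl", 2)

# Options set before the solver is created should be picked up when PETSc
# configures the preconditioner.
solver = PETScKrylovSolver("cg", "hypre_amg")
solver.parameters["monitor_convergence"] = True
# solver.set_operator(A); solver.solve(u.vector(), b) as usual
```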
for a setup that has a constant number of iterations with hypre-amg.
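For a scalar Poisson problem the near-nullspace is just the constant vector. A hedged sketch of attaching it, assuming legacy DOLFIN with the PETSc backend (variable names here are illustrative), could look like:

```python
from dolfin import *

# Sketch (assumption: legacy DOLFIN with the PETSc backend). Attach the constant
# vector, which spans the near-nullspace of the Laplacian, to the operator so
# that the AMG preconditioner can make use of it.
mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, "P", 1)
u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx
L = Constant(1.0) * v * dx
bc = DirichletBC(V, Constant(0.0), "on_boundary")
A, b = assemble_system(a, L, bc)

# Build the near-nullspace basis (normalized constant mode)
constant_mode = interpolate(Constant(1.0), V).vector()
constant_mode *= 1.0 / constant_mode.norm("l2")
near_nullspace = VectorSpaceBasis([constant_mode])
as_backend_type(A).set_near_nullspace(near_nullspace)

solver = PETScKrylovSolver("cg", "hypre_amg")
solver.set_operator(A)
uh = Function(V)
solver.solve(uh.vector(), b)
```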
Nope. But I went ahead and tested the weak and strong scalability of the hypre_amg preconditioner in FEniCS anyway. The speedup is around 2x when you quadruple the number of cores, if I remember correctly. So it still speeds up the program, although not at the optimal rate I was expecting.
Perhaps it is because we are using it as a black box, as mentioned in the next reply by Jorgen, so the optimal speedup is not achieved. But I am not familiar with PETSc, which I guess allows more control over the solvers.
The performance tests for dolfin and dolfin-x provide a suite for benchmarking your hardware. I’ve used these to consistently demonstrate good scaling. Hopefully they can help you pin down where your system, compilation or formulation is experiencing a bottleneck.
If possible, I recommend you benchmark with a native compilation against your system's MPI before testing with Python or a container.
Thanks Nate. I have tested the FEniCS setup on our university cluster, and the scaling of the performance test is less than ideal. I tried to look up representative performance figures from here for comparison; however, the page was empty.
Is there an option to use Singularity to reproduce an ideal setup?
Thanks for the link! Glad to know that dolfin-x is working well.
I ran the performance test with dolfin to test weak scaling of a Poisson equation. The partitioning scheme used was ParMETIS (as SCOTCH wasn't available). The test was compiled natively.
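For reference, switching the partitioner in legacy DOLFIN goes through the global parameter system (a small sketch; the parameter name assumes the legacy API):

```python
from dolfin import parameters, UnitSquareMesh

# Select the mesh partitioner before any mesh is created/distributed.
# "SCOTCH" is the default; "ParMETIS" is used here since SCOTCH was unavailable.
parameters["mesh_partitioner"] = "ParMETIS"

mesh = UnitSquareMesh(100, 100)  # partitioned with ParMETIS when run under MPI
```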