I want to solve a linear elasticity problem with 38,000 dofs in parallel. I noticed that the dofs are not distributed uniformly across the processes. I am using the following user-defined solver:
from dolfin import *

class Problem(NonlinearProblem):
    def __init__(self, J, F, bcs):
        # Store the Jacobian form, residual form, and boundary conditions.
        self.bilinear_form = J
        self.linear_form = F
        self.bcs = bcs
        NonlinearProblem.__init__(self)

    def F(self, b, x):
        # Assemble the residual and apply the boundary conditions,
        # using the current solution vector x.
        assemble(self.linear_form, tensor=b)
        for bc in self.bcs:
            bc.apply(b, x)

    def J(self, A, x):
        # Assemble the Jacobian and apply the boundary conditions.
        assemble(self.bilinear_form, tensor=A)
        for bc in self.bcs:
            bc.apply(A)

class CustomSolver1(NewtonSolver):
    def __init__(self):
        # "mesh" is assumed to be defined in the enclosing scope.
        NewtonSolver.__init__(self, mesh.mpi_comm(),
                              PETScKrylovSolver(), PETScFactory.instance())

    def solver_setup(self, A, P, problem, iteration):
        # Configure the Krylov solver: GMRES with a Hypre Euclid (ILU)
        # preconditioner, printing the residual at every iteration.
        self.linear_solver().set_operator(A)
        PETScOptions.set("ksp_type", "gmres")
        PETScOptions.set("ksp_monitor")
        PETScOptions.set("pc_type", "hypre")
        PETScOptions.set("pc_hypre_type", "euclid")
        self.linear_solver().set_from_options()
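For context, the solver is driven roughly like this (a sketch; u, J, F, and bcs are defined earlier in my script from the elasticity forms):

problem = Problem(J, F, bcs)
solver = CustomSolver1()
solver.solve(problem, u.vector())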
First of all, is it even possible to force the code to distribute the dofs equally? If so, how? Would it improve performance? I really appreciate your feedback on this.
If my questions above don’t make sense, maybe you can help me figure this out another way: how can I set the same options that I use in CustomSolver1 above for the weakly scaling elasticity problem given here: https://github.com/FEniCS/performance-test?
If you inspect the code of the performance test, you will observe that it uses the number of dofs per node to determine how big the mesh should be (by counting vertices).
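Schematically, the idea is something like the following (an illustrative Python sketch, not the actual C++ code from the repository):

# Illustrative only -- the performance test does this in C++.
# A vector P1 space on UnitCubeMesh(n, n, n) has 3 * (n + 1)**3 dofs,
# so n can be chosen to match a target global dof count.
target_dofs = 38000
n = int(round((target_dofs / 3.0) ** (1.0 / 3.0))) - 1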
The dof distribution is determined by the graph partitioner responsible for partitioning the mesh. DOLFIN uses ParMETIS under the hood.
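If your DOLFIN build has more than one partitioner available, you can switch between them through the global parameter system, e.g. (assuming a build with ParMETIS support):

from dolfin import parameters

# Select the graph partitioner used when the mesh is distributed;
# "SCOTCH" is the other common choice. Set this before creating the mesh.
parameters["mesh_partitioner"] = "ParMETIS"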
When you are saying that the distribution is not uniform, how skewed is it?
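You can check the dof balance directly by printing each process’s ownership range, for instance with a minimal sketch like this:

from dolfin import *

mesh = UnitCubeMesh(20, 20, 20)  # stand-in for your actual mesh
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Each process owns a contiguous block of the global dof numbering;
# the width of that block is its local share.
r0, r1 = V.dofmap().ownership_range()
print("Rank %d owns %d dofs" % (MPI.rank(mesh.mpi_comm()), r1 - r0))

Run it with e.g. mpirun -np 4 python3 script.py and compare the counts across the ranks.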
Thank you, Dokken. I am sorry, I am a bit dense when it comes to C++, but with your explanation I understand that piece now. What I mean by not uniform is that some processes take up to 190 seconds to finish while others finish in as little as 30 seconds. The only thing I can think of that could cause this is the size of the problem on each process.
You need to show a specific example, and explain how you obtained the timings of when the processes finish.
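For per-process timings, something along these lines (a sketch using DOLFIN’s Timer, with solver, problem, u, and mesh as in your code) is more reliable than watching when processes exit:

from dolfin import Timer, MPI

t = Timer("newton solve")
solver.solve(problem, u.vector())
wall = t.stop()  # stop() returns the elapsed wall time in seconds
print("Rank %d: solve took %.1f s" % (MPI.rank(mesh.mpi_comm()), wall))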
It is possible to get good scaling with DOLFIN, but you need to be careful if you are using the basic commands and still expect optimal scaling.
38,000 dofs is not a large number of dofs, and the parallel speedup will not be significant if you distribute them over a large number of processes, as inter-process communication will cost more time than you gain in assembly.
Oh, I see now. I would love to share my code, but it is a mess right now and not easy to reduce to a minimal example. However, what you are saying about communication makes sense; I was not thinking about that. Just one last question: let’s say my problem is a larger version of this: https://fenicsproject.org/pub/tutorial/html/._ftut1010.html
Do you think ‘gmres’ with ‘hypre_euclid’ is a good combination for solving this in parallel?