I want to solve a linear elasticity problem with 38,000 dofs in parallel. I noticed that the dofs are not distributed uniformly across the processes. I am using the following user-defined solver:
from dolfin import *

class Problem(NonlinearProblem):
    def __init__(self, J, F, bcs):
        # Store the Jacobian form, residual form, and boundary conditions.
        self.bilinear_form = J
        self.linear_form = F
        self.bcs = bcs
        NonlinearProblem.__init__(self)

    def F(self, b, x):
        # Assemble the residual and apply the boundary conditions,
        # using the current solution vector x.
        assemble(self.linear_form, tensor=b)
        for bc in self.bcs:
            bc.apply(b, x)

    def J(self, A, x):
        # Assemble the Jacobian and apply the boundary conditions.
        assemble(self.bilinear_form, tensor=A)
        for bc in self.bcs:
            bc.apply(A)

class CustomSolver1(NewtonSolver):
    def __init__(self):
        # "mesh" is assumed to be defined in the enclosing scope.
        NewtonSolver.__init__(self, mesh.mpi_comm(),
                              PETScKrylovSolver(), PETScFactory.instance())

    def solver_setup(self, A, P, problem, iteration):
        # Configure the Krylov solver: GMRES with a Hypre Euclid (ILU)
        # preconditioner, printing the residual at every iteration.
        self.linear_solver().set_operator(A)
        PETScOptions.set("ksp_type", "gmres")
        PETScOptions.set("ksp_monitor")
        PETScOptions.set("pc_type", "hypre")
        PETScOptions.set("pc_hypre_type", "euclid")
        self.linear_solver().set_from_options()
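For context, the solver is driven roughly like this (a sketch; u, J, F, and bcs are defined earlier in my script from the elasticity forms):

problem = Problem(J, F, bcs)
solver = CustomSolver1()
solver.solve(problem, u.vector())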
First of all, is it even possible to force the code to distribute the dofs equally? If so, how? Would it improve performance? I really appreciate your feedback on this.
If my questions above don’t make sense, maybe you can help me figure this out another way: how can I set the same options that I use in CustomSolver1 above for the weakly scaling elasticity problem given here: https://github.com/FEniCS/performance-test?
If you inspect the code of the performance test, you will observe that it uses the number of dofs per node to determine how big the mesh should be (by counting vertices).
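Schematically, the idea is something like the following (an illustrative Python sketch, not the actual C++ code from the repository):

# Illustrative only -- the performance test does this in C++.
# A vector P1 space on UnitCubeMesh(n, n, n) has 3 * (n + 1)**3 dofs,
# so n can be chosen to match a target global dof count.
target_dofs = 38000
n = int(round((target_dofs / 3.0) ** (1.0 / 3.0))) - 1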
The dof distribution is determined by the graph partitioner responsible for partitioning the mesh. DOLFIN uses ParMETIS under the hood.
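If your DOLFIN build has more than one partitioner available, you can switch between them through the global parameter system, e.g. (assuming a build with ParMETIS support):

from dolfin import parameters

# Select the graph partitioner used when the mesh is distributed;
# "SCOTCH" is the other common choice. Set this before creating the mesh.
parameters["mesh_partitioner"] = "ParMETIS"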
When you are saying that the distribution is not uniform, how skewed is it?
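You can check the dof balance directly by printing each process’s ownership range, for instance with a minimal sketch like this:

from dolfin import *

mesh = UnitCubeMesh(20, 20, 20)  # stand-in for your actual mesh
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Each process owns a contiguous block of the global dof numbering;
# the width of that block is its local share.
r0, r1 = V.dofmap().ownership_range()
print("Rank %d owns %d dofs" % (MPI.rank(mesh.mpi_comm()), r1 - r0))

Run it with e.g. mpirun -np 4 python3 script.py and compare the counts across the ranks.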
Thank you, Dokken. I am sorry, I am a bit dense when it comes to C++, but with your explanation I understand that piece now. What I mean by not uniform is that some processes take up to 190 seconds to finish while others finish in as little as 30 seconds. The only thing I can think of that could cause this is the size of the problem on each process.
You need to show a specific example, and explain how you obtained the timings of when the processes finish.
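For per-process timings, something along these lines (a sketch using DOLFIN’s Timer, with solver, problem, u, and mesh as in your code) is more reliable than watching when processes exit:

from dolfin import Timer, MPI

t = Timer("newton solve")
solver.solve(problem, u.vector())
wall = t.stop()  # stop() returns the elapsed wall time in seconds
print("Rank %d: solve took %.1f s" % (MPI.rank(mesh.mpi_comm()), wall))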
It is possible to get good scaling with DOLFIN, but you need to be careful if you are using the basic commands and still expect optimal scaling.
38,000 dofs is not a large number of dofs, and the parallel speedup will not be significant if you distribute them over a large number of processes, as inter-process communication will cost more time than you gain in assembly.
Oh, I see now. I would love to share my code, but it is a mess right now and not easy to reduce to a minimal example. However, what you are saying about communication makes sense; I was not thinking about that. Just one last question: let’s say my problem is a larger version of this: https://fenicsproject.org/pub/tutorial/html/._ftut1010.html
Do you think ‘gmres’ with ‘hypre_euclid’ is a good combination for solving this in parallel?