Running in parallel slower than serial?

Hi everyone,

I have developed a code for thrombosis modeling that involves many advection-diffusion-reaction (ADR) equations under the influence of flow. For the Navier-Stokes calculations I use the Oasis solver; the rest of the model is implemented in a problem-specific function. To solve the ADR equations I use NonlinearVariationalSolver objects. I tried to speed up my simulations by running them in parallel, but they slow down instead: the computer I use has 8 cores, yet the simulation time becomes longer as I add cores. I run the simulations with:

mpirun -n 8 problem=thrombosis_model

Does anyone know what could cause the simulations to slow down when they are run in parallel?
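To put numbers on "slower as I add cores", it can help to tabulate speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by process count) for a few runs. Below is a minimal sketch in plain Python. The function name `scaling_table` is hypothetical, and the timings in the example are placeholders, not measurements from this thread.

```python
def scaling_table(wall_times):
    """Return (processes, speedup, efficiency) rows.

    wall_times maps the number of MPI processes to the measured
    wall-clock time in seconds and must contain an entry for 1.
    """
    t_serial = wall_times[1]
    rows = []
    for nproc in sorted(wall_times):
        speedup = t_serial / wall_times[nproc]
        rows.append((nproc, speedup, speedup / nproc))
    return rows


if __name__ == "__main__":
    # Placeholder timings (seconds); replace with your own measurements.
    example = {1: 100.0, 2: 60.0, 4: 40.0, 8: 50.0}
    for nproc, speedup, efficiency in scaling_table(example):
        print(f"{nproc:2d} procs  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

A speedup below 1 means the parallel run really is slower than the serial one. An efficiency that falls quickly as processes are added usually points to overhead (communication, oversubscribed cores, or a poorly scaling linear solver) rather than to the physics itself.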

Maybe take a look at this thread:

I added the line ‘export OMP_NUM_THREADS=1’ to my .bashrc. The parallel simulations are indeed no longer much slower than the serial ones, but they are certainly not much faster either. Any idea what the reason for this might be?
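One thing worth double-checking is that the setting actually reaches the processes started by mpirun, since .bashrc is not necessarily read by every shell. If each of the 8 MPI processes still spawns several OpenMP threads, the 8 cores end up oversubscribed. A minimal sketch, assuming you can run a few lines of Python at the start of your script (the helper name `omp_threads` is hypothetical):

```python
import os


def omp_threads(env=None):
    """Return OMP_NUM_THREADS as an int, or None if it is unset or not a number."""
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS")
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


if __name__ == "__main__":
    # Each MPI process prints its own view of the variable.
    print("OMP_NUM_THREADS =", omp_threads())
```

If this prints None or a value larger than 1 when launched through mpirun, the export is not taking effect for the parallel run.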

Consider using an iterative solver with a preconditioner when running in parallel for better results, if you are not doing so already. You can change the default direct LU solver inside NonlinearVariationalSolver like this:

solver = NonlinearVariationalSolver(problem)
solver.parameters["newton_solver"]["linear_solver"] = "gmres"
solver.parameters["newton_solver"]["preconditioner"] = "hypre_euclid"

I already defined my solver as follows:
# Define the Jacobian
J_P = derivative(F_P, cp, dcp)

# Create the Newton solver
problem = NonlinearVariationalProblem(F_P, cp, bccp, J_P)
pap_solver = NonlinearVariationalSolver(problem)
solverType = 'newton'
#solverType = 'snes' # PETSc SNES has some more options for line search etc.
prm = pap_solver.parameters
prm['nonlinear_solver'] = solverType
# Parameters of the selected nonlinear solver ('newton_solver' or 'snes_solver')
nparam = prm[solverType+'_solver']
nparam['linear_solver'] = 'gmres'
nparam['preconditioner'] = 'jacobi'

However, the equations I’m solving with it are 4 coupled ADR equations that depend on each other through the reaction term. Could this be the reason that parallelization doesn’t speed them up?

Try locating the part of the problem that consumes the most computational time using:


You can also solve a simpler nonlinear benchmark problem (such as nonlinear Poisson) to make sure the slowdown is not caused by the available cores/threads (although @kamensky covered this).
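If you want a tool-independent way to see where the time goes, you can wrap each stage of a time step in a wall-clock timer and compare the totals between the serial and parallel runs. This is a minimal sketch in plain Python, not a FEniCS or Oasis feature. The `timed` and `report` helpers and the stage labels are hypothetical, and the loops in the example only stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled stage.
timings = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def report():
    """Return (label, seconds) pairs, most expensive stage first."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Stand-in workloads; in a real script these would be e.g. the
    # flow solve and each ADR solve inside the time loop.
    with timed("stage A"):
        sum(i * i for i in range(200000))
    with timed("stage B"):
        sum(i * i for i in range(20000))
    for label, seconds in report():
        print(f"{label:10s} {seconds:.4f} s")
```

Under mpirun every process prints its own table, so it is usually enough to look at the output of one process, or to print only from rank 0 if you have the rank available.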