Continuing the discussion from Parallel slower runtime:
Hello everyone,
I am running some code using dolfinx and am trying to figure out how it scales.
To test this, I am using the code from Parallel slower runtime - #2 by dokken
in a freshly built Anaconda environment, set up as proposed at https://fenicsproject.org/download/
With nothing else changed, the code gets significantly slower the more MPI processes I use.
Here is the adapted code:
import numpy as np
from dolfinx import cpp, mesh
from mpi4py import MPI
import time
# Box dimensions and number of cells per direction
L, W, H = 10, 10, 10
NX, NY, NZ = 100, 100, 100

# Mesh creation (only this call is timed)
points = [np.array([0, 0, 0]), np.array([L, W, H])]
start = time.perf_counter()
domain = mesh.create_box(
    MPI.COMM_WORLD,
    points,
    [NX, NY, NZ],
    cell_type=cpp.mesh.CellType.hexahedron,
    ghost_mode=mesh.GhostMode.shared_facet,
)
end = time.perf_counter()

# Report owned and ghost cell counts on each rank
dim = domain.topology.dim
imap = domain.topology.index_map(dim)
num_cells = imap.size_local
ghost_cells = imap.num_ghosts
print(f"MESH CREATED {domain.comm.rank}, num owned cells {num_cells} num ghost cells {ghost_cells} time : {end-start}")
And these are the timing results:
mpirun -n 1 python parallelTest.py
MESH CREATED 0, num owned cells 1000000 num ghost cells 0 time : 4.300701394677162
mpirun -n 2 python parallelTest.py
MESH CREATED 0, num owned cells 500000 num ghost cells 10000 time : 39.404316327068955
MESH CREATED 1, num owned cells 500000 num ghost cells 10000 time : 39.40367920091376
mpirun -n 4 python parallelTest.py
MESH CREATED 3, num owned cells 251000 num ghost cells 10170 time : 75.6433406448923
MESH CREATED 2, num owned cells 250539 num ghost cells 10257 time : 75.64872138295323
MESH CREATED 0, num owned cells 249461 num ghost cells 10040 time : 75.64881594991311
MESH CREATED 1, num owned cells 249000 num ghost cells 10130 time : 75.64861655794084
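To put a number on the slowdown, here is a minimal sketch that turns the timings above into speedup and parallel efficiency. It uses only the standard library; the helper name `scaling_table` is hypothetical, and the times are the slowest rank of each run, rounded from the output above.

```python
# Wall-clock times in seconds for mesh creation, keyed by number of MPI processes.
# Values are the slowest rank of each run above, rounded to four decimals.
timings = {1: 4.3007, 2: 39.4043, 4: 75.6488}


def scaling_table(timings):
    """Return (n_procs, speedup, efficiency) tuples relative to the smallest run.

    speedup    = T(base) / T(n)
    efficiency = speedup / (n / base), where 1.0 would be ideal strong scaling.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        efficiency = speedup / (n / base_n)
        rows.append((n, speedup, efficiency))
    return rows


for n, speedup, efficiency in scaling_table(timings):
    print(f"n={n}: speedup {speedup:.3f}, efficiency {efficiency:.3f}")
```

A speedup below 1 means the parallel run is slower than the serial one. With these timings, that is the case for both 2 and 4 processes.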
I am a bit unsure how to solve that problem and would be super grateful for advice!