I have a general question about mpi comm with dolfinx and integrated results. In the tutorial on custom newton solvers, the L2-errors are calculated with the mpicomm.allreduce like this L2_error.append(np.sqrt(mesh.comm.allreduce(dolfinx.fem.assemble_scalar(error), op=MPI.SUM))) while the flow past a cylinder tutorial calculates lift and drag with mpicomm.gather with a manual sum afterwards like this
Which method is better? If it depends, which one is better if I wanted to, for example, save C_D and C_L to a text file like this:
# write out data
if mesh.comm.rank == 0:
dataOut = np.stack((C_D, C_L), axis=-1)
with open("datafile.txt", "w") as txt_file:
for line in dataOut:
txt_file.write(" ".join(line) + "\n")
If you are using a lot of ranks, and you are only interested in writing data out on one process, using reduce with a root or gather with a root then sum is most sensible (as you use less memory).
It all depends on the MPI implementation (and the number of processes).
You can play around with the performance of this with:
from mpi4py import MPI
import time
comm = MPI.COMM_WORLD
if comm.rank == 0:
a = 5
else:
a = 10
comm.Barrier()
start = time.perf_counter()
t = comm.reduce(a, op=MPI.SUM, root=0)
comm.Barrier()
end = time.perf_counter()
print(f"Reduce to one rank {comm.rank=}, {t=} {end-start:.2e}")
comm.Barrier()
start = time.perf_counter()
s = comm.allreduce(a, op=MPI.SUM)
comm.Barrier()
end = time.perf_counter()
print(f"Allreduce {comm.rank=}, {s=} {end-start:.2e}")
comm.Barrier()
start = time.perf_counter()
v = comm.gather(a, root=0)
if comm.rank == 0:
v = sum(v)
comm.Barrier()
end = time.perf_counter()
print(f"Gather+sum {comm.rank=}, {v=} {end-start:.2e}")