How to concatenate output when parallelizing

Use MPI.allreduce, as shown in post #2 by dokken in the thread "Attempt to find max value on mesh in parallel".