Parallelization without domain decomposition

Let us assume we have a vector of length 10*N.
Each vector entry is calculated by computing an integral over the whole domain.

To speed things up, the program should run on 10 cores, with N integrations on each core. How would one go about doing this?

The issue with a regular mpirun is that it employs domain decomposition, whereas each core needs access to the whole domain in order to calculate its integrals.

Choose your MPI communicators appropriately, i.e. don't use COMM_WORLD when you should perhaps use COMM_SELF or a split communicator.
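
One way to read this advice, as a rough sketch: every rank builds the full domain on its own (on a COMM_SELF-style communicator, so nothing gets decomposed), and COMM_WORLD is only used to decide which vector entries a rank owns and to collect the results afterwards. The helper `entries_for_rank` below is hypothetical and only shows the index split; `rank` and `size` are passed in explicitly so the snippet runs without MPI. With mpi4py they would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.

```python
def entries_for_rank(rank, size, n_entries):
    """Return the vector indices a given rank is responsible for.

    Hypothetical helper: contiguous blocks, with any remainder
    spread over the first ranks.
    """
    base, extra = divmod(n_entries, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Example: 10 ranks and a vector of length 10*N with N = 3.
N = 3
blocks = [list(entries_for_rank(r, 10, 10 * N)) for r in range(10)]
```

Each rank would then loop over its own block, integrate over the whole (undecomposed) domain for every index in it, and the blocks could be combined at the end with a gather on COMM_WORLD.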


Do what @nate says, or try the multiprocessing module from Python's standard library.

https://docs.python.org/3/library/multiprocessing.html
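
A minimal sketch of the multiprocessing route, assuming each vector entry can be computed by a plain Python function. The integrand here (sin(i*x)**2 on [0, pi], midpoint rule) is only a placeholder and not from the question; `integrate_entry` and `compute_vector` are made-up names. Each worker is a separate process, so any domain object it builds is whole and private to that worker.

```python
import math
from multiprocessing import Pool


def integrate_entry(i, n_points=1000):
    """Placeholder for one vector entry: the integral of sin(i*x)**2
    over the whole domain [0, pi], using the midpoint rule."""
    h = math.pi / n_points
    return h * sum(math.sin(i * (k + 0.5) * h) ** 2 for k in range(n_points))


def compute_vector(n_entries, n_workers=10):
    """Distribute the entries over n_workers processes."""
    with Pool(processes=n_workers) as pool:
        # map() preserves order, so result[i] belongs to entry i.
        return pool.map(integrate_entry, range(n_entries))


if __name__ == "__main__":
    N = 2
    vector = compute_vector(10 * N)  # 10*N entries, roughly N per worker
    print(len(vector))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes re-import the main module.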