Let us assume we have a vector of length 10*N.
Each vector entry is calculated by computing an integral over the whole domain.
To speed things up, the program is supposed to run on 10 cores, so that each core performs N of the integrations. How would one go about doing this?
The issue is that the usual way of running such a program with mpirun relies on domain decomposition, where each core only holds part of the domain. Here, however, each core needs access to the whole domain in order to calculate its integrals.
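
To make the intended layout concrete, here is a minimal serial sketch of what I have in mind: the vector indices are split across the cores, not the domain, and every "core" integrates over the full domain. The integrand, the interval [0, pi], the trapezoidal rule, the value N = 3 and all function names are placeholders I made up for illustration. The loop over ranks only stands in for the 10 processes; in an actual MPI run I would expect each process to obtain its rank from the MPI library (for example `comm.Get_rank()` in mpi4py) and the blocks to be collected with a gather, but that part is an assumption and is not shown here.

```python
import math

N_CORES = 10  # number of cores, as in the question
N = 3         # hypothetical number of integrals per core


def integrand(x, k):
    # Hypothetical integrand for vector entry k; stands in for the real one.
    return math.cos(k * x)


def integrate_whole_domain(k, a=0.0, b=math.pi, n=1000):
    # Composite trapezoidal rule over the WHOLE domain [a, b].
    h = (b - a) / n
    total = 0.5 * (integrand(a, k) + integrand(b, k))
    for i in range(1, n):
        total += integrand(a + i * h, k)
    return total * h


def entries_for_rank(rank, n_per_rank=N):
    # Contiguous block of vector indices owned by this rank.
    return range(rank * n_per_rank, (rank + 1) * n_per_rank)


def compute_block(rank):
    # Each rank computes only its own N entries, each over the full domain.
    return [integrate_whole_domain(k) for k in entries_for_rank(rank)]


# Serial stand-in for the 10 ranks: concatenate the blocks in rank order.
vector = []
for rank in range(N_CORES):
    vector.extend(compute_block(rank))
```

In this layout the only thing that differs between cores is the index range returned by `entries_for_rank`; the domain itself is never partitioned.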