I am using FEniCS to solve a 3D linear wave propagation problem in the frequency domain with the direct solver “MUMPS”.
Because the system is large (~7M unknowns), I ran it in parallel on 4 nodes with 56 cores per node (224 cores in total), launched with the command “ibrun”. The run took 5 h 25 min and used about 630 GB of memory.
However, when I used 112 cores on a single node, the same problem took only about 28 min 43 s.
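To put the two timings side by side, here is a quick calculation of the ratio. It is only a sketch based on the wall-clock times quoted above; the variable names are just for illustration.

```python
# Wall-clock times reported above, converted to seconds.
multi_node_s = 5 * 3600 + 25 * 60   # 4 nodes x 56 cores: 5 h 25 min
single_node_s = 28 * 60 + 43        # 1 node x 112 cores: 28 min 43 s

# Ratio of multi-node time to single-node time.
slowdown = multi_node_s / single_node_s
print(f"multi-node / single-node time ratio: {slowdown:.1f}")  # prints 11.3
```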
So the multi-node run was more than 10 times slower than the single-node run, even though it used twice as many cores. Could anyone give me any suggestions? Thank you!