Different results between repeated runs in parallel

I am solving a linear elasticity problem in every loop of an optimization procedure. Even tiny differences can grow into noticeably different results after many loops, which makes it difficult to reproduce optimization results.

Running mpirun -np 1 python3 demo_elasticity.py gives identical results between repeated runs, as confirmed by the output:

Solution vector norm: 0.05007291838351104
Solution vector norm: 0.05007291838351104

Naively, I would have expected the results to also be identical between repeated runs in parallel. However, mpirun -np 3 python3 demo_elasticity.py shows tiny differences (in the last digit) between repeated runs, as confirmed by the output:

Solution vector norm: 0.05007291839575202
Solution vector norm: 0.05007291839575208

What is the reason for this non-deterministic behaviour in parallel, and is there anything that can be done about it?

The difference is less than 1e-16, i.e. on the order of machine precision. It is numerical noise that arises because floating-point addition is not associative: when contributions are accumulated from different processes, the order of the additions can change between runs, and with it the last digits of the result.
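
As a minimal illustration (plain Python, independent of FEniCSx/PETSc), summing the same numbers in a different order already changes the last bits of the result, which is essentially what a parallel reduction over several ranks does:

```python
# Floating-point addition is not associative: the same values summed in a
# different order can differ in the last bits, which is effectively what a
# parallel reduction over several MPI ranks does.
import random

random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

s1 = sum(vals)           # one summation order
random.shuffle(vals)
s2 = sum(vals)           # same values, different order

print(s1, s2, s1 - s2)   # the difference is typically O(1e-16) relative to s1
```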

Also note that the elasticity demo uses an iterative solver with far lower precision (rtol=1e-8), and the GAMG preconditioner depends on the mesh partitioning.
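
If you want to reduce the run-to-run spread, you can tighten the Krylov tolerance through the PETSc options. Below is a self-contained petsc4py sketch (not the demo itself; a 1D Laplacian stands in for the elasticity operator) showing where rtol and the GAMG preconditioner are configured. Even with a tight tolerance the last digits can still differ between parallel runs, because the reduction order still varies.

```python
# Self-contained petsc4py sketch: a 1D Laplacian stands in for the elasticity
# operator. Shows where the Krylov tolerance and GAMG preconditioner are set.
# Run with e.g. mpirun -np 3.
from mpi4py import MPI
from petsc4py import PETSc

n = 1000  # global problem size, arbitrary for illustration

A = PETSc.Mat().createAIJ([n, n], comm=MPI.COMM_WORLD)
A.setUp()
# Allow insertion without exact preallocation (fine for a small example).
A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)

rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()
b.set(1.0)
x = A.createVecRight()

ksp = PETSc.KSP().create(comm=MPI.COMM_WORLD)
ksp.setOperators(A)
ksp.setType("cg")
ksp.getPC().setType("gamg")      # partitioning-dependent, like in the demo
ksp.setTolerances(rtol=1e-12)    # tighter than the demo's rtol=1e-8
ksp.solve(b, x)

unorm = x.norm()                 # collective reduction over all ranks
if MPI.COMM_WORLD.rank == 0:
    print("Solution vector norm:", unorm)
```

A direct solver (e.g. pc_type lu) would take the iterative tolerance out of the picture entirely, but the parallel factorization and the reductions are still partitioning-dependent, so bitwise reproducibility across runs is not guaranteed either way.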
