OK, problem solved: it was not a problem with my code but with my computer. I added export OMP_NUM_THREADS=1
to my .bashrc, as in this post. Now it works fine, with the best performance at 8 processors, and it is definitely faster than in serial.
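For anyone who finds this later, here is a minimal sketch of the change, assuming a bash shell. My understanding (an assumption on my part, not something I verified) is that each process was otherwise starting several OpenMP threads of its own, so the processes were competing for the same cores.

```shell
# Line added to ~/.bashrc: limit OpenMP-threaded libraries to one thread
# per process (assumption: the slowdown came from too many threads per process).
export OMP_NUM_THREADS=1

# Check the value in the current shell; prints 1 once the export has run.
echo "$OMP_NUM_THREADS"
```

The setting only takes effect in new shells, or after running source ~/.bashrc in the current one.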
Thanks again for your help and quick responses. I am really grateful!