How do I set the number of threads available to each job when running with MPI?

Hi everyone!

I’m new here, but I have been using FEniCS since 2017. I recently bought a Core i9-9900K with 8 cores (16 threads) to speed up my simulations, and I have run into a very annoying issue. Running the ft01_poisson.py demo with UnitSquareMesh(1100, 1100) via

python3 ft01_poisson.py

I see a %CPU of about 1000% (all threads are being used), but the total run time is 20 seconds. On the other hand, if I restrict the same code to a single thread (CPU 0) with the taskset command, that is,

taskset -c 0 python3 ft01_poisson.py

it runs with a total time of 15 seconds.

So I conclude that using only one thread is faster than making all threads available and using the full power of the CPU.

My first question: does anyone know how to make FEniCS/Python use only one thread by default?

The main and worst problem appears when I try to parallelize my job with mpirun, for example by splitting it into two processes with

mpirun -n 2 python3 ft01_poisson.py

In this case all threads are available to each of the resulting processes, which makes the simulation very slow. I believe it would be much faster if each process had only one thread available instead of all of them. I hope my problem is clear: each process uses all 16 threads and the simulation becomes very slow, and unfortunately I don’t know how to restrict each process to a single thread. I tried to fix this with

taskset -c 0,1 mpirun -n 2 python3 ft01_poisson.py

but the system ignores taskset and all threads are still available to each process at the same time.

As a result, all my tests with mpirun are very slow, even though they run without errors.

I was wondering if someone could help me solve this problem.

Thanks in advance,
Igor

Hi,

Sorry if you have already fixed this problem, and sorry for bringing up this old post, but I was having a similar issue.

In my case the high thread usage happens during the dfn.solve call. When you don’t specify a linear solver, i.e.

solve(a == L, u, bcs)

the solver seems to use as many threads as you have cores, even when running in serial.

After specifying a particular linear solver, e.g.

solve(a == L, u, bcs, solver_parameters={'linear_solver': 'petsc'})

only one thread is used.

I’m not sure why this happens.
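For reference, here is a minimal sketch of the kind of script I mean, roughly following the ft01_poisson demo (the mesh size and the choice of 'petsc' are just what I happened to use):

from fenics import *

# Poisson problem along the lines of the ft01_poisson demo
mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, 'P', 1)

u_D = Expression('1 + x[0]*x[0] + 2*x[1]*x[1]', degree=2)
bc = DirichletBC(V, u_D, 'on_boundary')

u = TrialFunction(V)
v = TestFunction(V)
f = Constant(-6.0)
a = dot(grad(u), grad(v))*dx
L = f*v*dx

u = Function(V)

# Without solver_parameters the default direct solver spawns many threads here;
# naming a solver explicitly keeps the run on a single thread in my tests.
solve(a == L, u, bc, solver_parameters={'linear_solver': 'petsc'})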

Have you looked at this thread? Mumps in parallel slower than in serial.

I tried the solution in that thread, setting OMP_NUM_THREADS=1, but it didn’t do anything in my case.

In my case the high thread usage happens when running in serial. If I don’t specify a linear solver in the dfn.solve call, it uses a large number of threads, but if I specify one, e.g. 'petsc' or 'cg', it uses only one thread, which fixes the problem and makes the run faster. It’s kind of weird, because I thought 'petsc' was the default.

I think Igor is having the same issue with running python3 ft01_poisson.py.

Setting the OpenMP number-of-threads variable (OMP_NUM_THREADS) should do the trick, as this is the only parallelization available here. Maybe it is overridden again later on?
Not sure whether this helps, but the default solver is ‘umfpack’, which is parallel. Another parallel solver is ‘mumps’, so you could check whether this behavior also happens with that one. I would guess that the other solvers are serial, which is why you do not see this behavior with them.
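One way to rule out the variable being overridden is to set it inside the script itself, before the first import of fenics/dolfin. This is just a sketch and assumes the extra threads come from the OpenMP/BLAS layer used by the default direct solver:

import os

# Must be set before fenics/dolfin (and its BLAS/OpenMP dependencies) are
# imported, otherwise the thread count may already have been fixed.
os.environ['OMP_NUM_THREADS'] = '1'

from fenics import *

# ... set up a, L, u, bcs as usual ...

# To test the suggestion above, a parallel direct solver such as 'mumps'
# can also be requested explicitly:
# solve(a == L, u, bcs, solver_parameters={'linear_solver': 'mumps'})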

Oh, I see. I thought the default solver was 'petsc', not 'umfpack'. I guess that explains why it was using so many threads. Thanks!