Hey,
I had a similar problem, which is caused by PETSc (or more specifically its MUMPS solver) using thread parallelism via OpenMP. Say your machine has n threads available.
When you run your program serially, it already uses all n threads (as far as I know, this is OpenMP's default).
Running it with mpirun and m ranks then lets each rank spawn another n OpenMP threads, so you end up with m*n OpenMP threads in total, although your system can only support n. That oversubscription is why it gets slower.
You can fix this behaviour by setting the OMP_NUM_THREADS environment variable like this:
export OMP_NUM_THREADS=1
in the bash before you run your program. This makes each MPI rank create only a single OpenMP thread.
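A minimal sketch of what the full invocation could look like (the binary name ./my_solver and the rank count 4 are hypothetical placeholders for your own program):

```shell
# Cap each MPI rank at one OpenMP thread, so 4 ranks use 4 threads total
# instead of 4*n.
export OMP_NUM_THREADS=1
echo "$OMP_NUM_THREADS"

# Then launch as usual, e.g.:
# mpirun -np 4 ./my_solver
```

If you only want the setting for a single run, you can also prefix the command instead of exporting: `OMP_NUM_THREADS=1 mpirun -np 4 ./my_solver`.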