You need to run more than just the actual solve in parallel. This is because the linear algebra backend (PETSc) uses distributed matrices, which in turn means that these structures have to be distributed in dolfin as well (for instance the mesh, the function space, etc.).
What you can do is let your __init__ and solve run with NUM_THREAD processes (which they do by default when you run mpirun -n NUM_THREAD python3 main.py), and then internally in pre_processing and post_processing use
from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.rank == 0:
    # Do serial processing on only rank 0
    ...
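To make the structure concrete, here is a minimal sketch of how the rank-0 guard could fit into a solver class. The class and the method names (pre_processing, solve, post_processing) are assumptions based on your description, not dolfin API, and the dolfin objects are only placeholders.

from mpi4py import MPI
from dolfin import UnitSquareMesh, FunctionSpace, Function

class Solver:
    def __init__(self):
        # Runs on every rank: dolfin/PETSc distribute the mesh,
        # the function space and the underlying matrices automatically.
        self.mesh = UnitSquareMesh(32, 32)
        self.V = FunctionSpace(self.mesh, "CG", 1)
        self.u = Function(self.V)

    def pre_processing(self):
        comm = MPI.COMM_WORLD
        if comm.rank == 0:
            # Serial work (e.g. reading/writing files) on rank 0 only
            ...
        comm.barrier()  # let the other ranks wait until rank 0 is done

    def solve(self):
        # Runs on every rank: assembly and the PETSc solve are parallel
        ...

    def post_processing(self):
        comm = MPI.COMM_WORLD
        if comm.rank == 0:
            # Serial post-processing on rank 0 only
            ...
        comm.barrier()

if __name__ == "__main__":
    solver = Solver()
    solver.pre_processing()
    solver.solve()
    solver.post_processing()

Running this with mpirun -n NUM_THREAD python3 main.py executes __init__ and solve on all NUM_THREAD processes, while the guarded blocks only execute on rank 0.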
EDIT: the references to threads (NUM_THREAD) are kept to match the original author's syntax; note that mpirun launches processes, not threads.