Navier-Stokes equation with a spatially varying vector body force

You have not customized the solver options at all.
That is the first place to start.
I would start by using KSP type "preonly" with an LU preconditioner (mumps or superlu_dist for the factorisation), and then check whether you get any speedup when going from 1 to 2 to 4 processes.
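As a rough sketch of what those settings look like as PETSc options: the option names `ksp_type`, `pc_type` and `pc_factor_mat_solver_type` are standard PETSc options, while the helper functions `direct_solver_options`, `as_command_line` and `speedup` are hypothetical names made up for this illustration. How you hand the options to your solver depends on your code. In DOLFINx a dictionary like this is commonly passed as the `petsc_options` argument of `dolfinx.fem.petsc.LinearProblem`, but I have not seen your script, so treat that as an assumption.

```python
# Sketch only: helper names are hypothetical, the option keys are PETSc options.

def direct_solver_options(factor_backend="mumps"):
    """Return PETSc options for one LU solve with no Krylov iterations."""
    if factor_backend not in ("mumps", "superlu_dist"):
        raise ValueError(f"unexpected factorisation backend: {factor_backend}")
    return {
        "ksp_type": "preonly",  # apply the preconditioner once, no iterations
        "pc_type": "lu",        # the preconditioner is a full LU factorisation
        "pc_factor_mat_solver_type": factor_backend,  # parallel direct solver
    }


def as_command_line(options):
    """Format an options dictionary as PETSc command-line flags."""
    return " ".join(f"-{key} {value}" for key, value in options.items())


def speedup(t_serial, t_parallel):
    """Speedup of a parallel run relative to the 1-process run."""
    return t_serial / t_parallel


if __name__ == "__main__":
    print(as_command_line(direct_solver_options("mumps")))
```

To do the scaling check, time the solve on 1, 2 and 4 processes (for example by launching the same script through `mpirun -n 2`) and compare the wall-clock times with `speedup`. If the speedup stays close to 1, the solver settings are probably not the bottleneck.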

See, for instance, the thread "Dolfinx seems much slower than dolfin in solving nonlinear mechanics" (post #3 by dokken).