Different results for the area of a deformed mesh in parallel

I see. I thought the result > 1 came from adding the contributions of the ghost cells several times in parallel. The real problem in my code was the wrong mapping between the mesh and the function. So I replaced that part with the following, and now I get 1 with DOLFINx 0.6.0:

Some follow-up questions:
(1) VTXWriter vs. XDMFFile: why does XDMFFile still give a correct visualization even though the mapping is actually incorrect in parallel, whereas VTXWriter reveals the problem?
(2) Is the area of 1 that my original code gives with `mpirun -n 1` or `mpirun -n 2` just a coincidence? The wrong mapping should be independent of the number of processes.
(3) I see that you use `sub_to_parent = sub_to_parent // ndims` because I used a (vector) Function whose DOFs are ordered by blocks. What if I use a MixedElement and FunctionSpace instead of VectorFunctionSpace here? In most cases, the DOFs of a mixed element are ordered as well.
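To make the ghost-cell hypothesis concrete, here is a toy model in plain Python (no MPI, no DOLFINx). The partition, cell areas and ownership flags are all hypothetical: a unit square split into four cells of area 0.25, distributed over two "ranks" that each own two cells and hold one ghost copy of a neighbour's cell. Summing owned cells only and then reducing gives 1; also summing ghosts overcounts. On a single process there are no ghosts, so both sums coincide, which is one way an error can stay hidden in serial. In DOLFINx the usual pattern is, as far as I know, `mesh.comm.allreduce(fem.assemble_scalar(fem.form(...)), op=MPI.SUM)`, where `assemble_scalar` integrates over the locally owned cells only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalCell:
    area: float
    owned: bool  # False for a ghost copy of a cell owned by another rank


def local_area(cells: List[LocalCell], include_ghosts: bool) -> float:
    """Sum cell areas stored on one rank, optionally counting ghost cells too."""
    return sum(c.area for c in cells if c.owned or include_ghosts)


def global_area(ranks: List[List[LocalCell]], include_ghosts: bool) -> float:
    """Emulate an MPI allreduce (SUM) of the per-rank local areas."""
    return sum(local_area(cells, include_ghosts) for cells in ranks)


# Hypothetical 2-rank partition: each rank owns 2 cells and ghosts 1 neighbour cell
rank0 = [LocalCell(0.25, True), LocalCell(0.25, True), LocalCell(0.25, False)]
rank1 = [LocalCell(0.25, True), LocalCell(0.25, True), LocalCell(0.25, False)]
two_ranks = [rank0, rank1]

# Serial run: one rank owns all 4 cells and has no ghosts
one_rank = [[LocalCell(0.25, True) for _ in range(4)]]

print(global_area(two_ranks, include_ghosts=False))  # 1.0
print(global_area(two_ranks, include_ghosts=True))   # 1.5 (ghosts counted twice)
print(global_area(one_rank, include_ghosts=True))    # 1.0 (no ghosts in serial)
```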
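The `// ndims` step relies on a blocked ("interleaved") layout: assuming block size `bs`, the DOF of component `c` at node `b` has unrolled index `b * bs + c`, so integer division recovers the node and the remainder recovers the component. The sketch below shows that arithmetic with hypothetical helper names (`unroll`, `block_of`, `comp_of` are not DOLFINx API). For a non-blocked mixed space this formula would presumably not apply; there, I believe the sub-space-to-parent DOF map is normally obtained from `V.sub(i).collapse()`, which returns the collapsed space together with the parent DOF indices.

```python
def unroll(block: int, comp: int, bs: int) -> int:
    """Unrolled (scalar) DOF index of component `comp` at node/block `block`."""
    return block * bs + comp


def block_of(dof: int, bs: int) -> int:
    """Node/block index that an unrolled DOF belongs to."""
    return dof // bs


def comp_of(dof: int, bs: int) -> int:
    """Component index of an unrolled DOF within its block."""
    return dof % bs


ndims = 2  # block size of a 2D vector function (hypothetical)

# Unrolled DOFs of a vector function at nodes 0, 1, 2
dofs = [unroll(b, c, ndims) for b in range(3) for c in range(ndims)]
print(dofs)                                # [0, 1, 2, 3, 4, 5]
print([block_of(d, ndims) for d in dofs])  # [0, 0, 1, 1, 2, 2]
print([comp_of(d, ndims) for d in dofs])   # [0, 1, 0, 1, 0, 1]

# Hypothetical unrolled DOFs selected for nodes 1 and 2; `// ndims` maps them to nodes
sub_to_parent = [2, 3, 4, 5]
print([d // ndims for d in sub_to_parent])  # [1, 1, 2, 2]
```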