Suppose I have a linear PDE L(u) = b, and suppose that I discretize it to A(p)u=b(p), where p is the coordinates of the mesh. I hope to get the gradient of some functional F(u) with respect to p.

I do not know how to use control function for the coordinates of the mesh in dolfin-adjoint to calculate the mesh node sensitivity. If I directly use SpatialCoordinate(mesh) and ALE.move(Function), it seems that it calculates the shape derivative. However, I prefer to know what the impact it has on the FEM solver.

I can not be sure whether it is the same as the shape derivative. I have seen tutorials about shape optimization in dolfin-adjoint documents and discussions in

However, I am still a little confused. It seems that it mainly concerns the shape of boundaries and is not the same as mesh node sensitivity (but what the shape derivative output means for the interior mesh nodes in dolfin-adjoint?)

Dolfin-adjoint compute the shape senstivity for all nodes in the mesh. However, as shape derivatives has their main contribution on the exterior boundaries (or internal interfaces between different materials), its main contribution will be on the boundary. This follows from Hadamardâ€™s theorem in shape optimization.
However, dolfin-adjoint used the Â«method of mappingsÂ», Which computes the senstivity at every mesh node, Since it is discretly consistent (changing the position of an interior node in a continuous setting does not change the continuous solution; But in an discrete setting (using meshes, the functional changes slightly).
I would suggest reading through my preprint if you are still confused: https://arxiv.org/pdf/2001.10058.pdf

Thank you for your quick response and help! Your words are really inspiring and I try to read the paper and part of the UFL paper, but I think I still could not understand some parts well due to lacking some backgrounds, ( especially not very familiar with material derivative and Gateaux directional derivative,) so I hope to ask some quick naive questions and check some points.

Firstly, I see that the interior mesh node movements are to adapt to the change of the discretization solutions and retain mesh quality now. If some of my intended behavior is only to move the interior nodes, the paper says it calculates the mapping about the whole domain deformation, at this time, is it just \delta\theta_i only acted on the interior nodes and |DT| (the determinant of jacobian of the deformation function) equal to 1? (At this time, for \tau_{\rho}^{i}(x)=x+\rho \delta \theta_{i}(x), \delta\theta_i acts on boundary mesh is 0 )

Besides, it also seems if I do a similar to other mesh sensitivity paper, derive the gradient to coordinates directly ( like Anisotropic mesh adaption based on a posteriori estimates and optimisation of node positions or others), the direct gradient of functional to all the mesh coordinates \frac{\partial{J}}{\partial \mathrm{s}} (s: coordinates) could be seen as a small move mapping along x and y separately, too. As a result, they should be consistent with the general shape derivative. Is it right?

Secondly, I have one question about the code like

S = VectorFunctionSpace(mesh, "CG", 1)
s = Function(S)
ALE.move(mesh, s)
J = some functional
Jhat = ReducedFunctional(J, Control(s))
dJds = Jhat.derivative()

Could I interpret as ALE.move seems to make deformation \delta\theta along x and y directional, and dJds return me the gradient then?

The shape sensitivity can is the limit of the difference of perturbing each mesh node in x and y direction separately and the unperturbed mesh.

You can compute the sensitivity for the full mesh, and afterwards set the gradient at the boundary nodes to 0, or implement a mesh deformation scheme that takes this into account (see the stokes example in the paper i referenced above)