Hi,
I am using boundary conditions that make the \int_\Gamma v p \vec{n} boundary term disappear, so I would expect that using \int \nabla p \cdot v and  \int p \cdot \text{div} (v) would be the same in the weak formulation for p being a TrialFunction( P) and v being a TestFunction(V). However, the Newton iterations converge for the second one, but do not converge for the first version.
I would like to understand what is the difference in fenics between
p*div(v) * dx
and
inner(grad( p),v) * dx,
Could someone please shed a light on the difference from fenicsâ€™ perspective?
Thank you so much!
Difference between  p*div(v) and + inner(grad(p),v)
If youâ€™re using CG spaces for both p and \mathbf{v}, and the boundary term is indeed zero, these should be equivalent. However, some possible issues I can think of are:

If the space for p is discontinuous, then
inner(grad(p),v)*dx
is missing the distributional components of \nabla p on interior facets, because thedx
measure only integrates over cell interiors. You really need to do integrationbyparts on each cell individually, but, in the case of CG spaces, the boundary terms disappear on interior facets. 
The boundary condition that would lead to \int_\Gamma p\mathbf{v}\cdot\mathbf{n} = 0 on the entire boundary leaves a hydrostatic mode in the pressure, i.e., p is only defined up to a constant. Thus, unless this mode is eliminated by, e.g., using a global Lagrange multiplier to enforce \int_\Omega p = 0 or applying a Dirichlet BC for the pressure at one node, the discrete problem is illposed, which will sometimes cause solvers to fail.
Thank you so much for considering and responding to my question.

I am using Lagrange spaces, with degrees 1 and 2. As far as I know these should theoretically result in continuous spaces.

I donâ€™t fully understand why you are saying the problem becomes illposed for the hydrostatic case, even though you are probably right. If p is only defined up to a constant, i.e. is a function of only on variable, its gradient does not vanish, it just becomes a constant, so it still provides information. Maybe you are referring to the fact that through applying the gradient, we lose the information on the â€ś1â€ť part of the â€ś1, x, yâ€ť basis generating the P space. And thatâ€™s true, but that happens every time the moment we apply a derivative on a function. Are you saying that we only have information about the derivative of p, and not p itself, thus the problem is illposed because the Jacobian has certain empty line / column?
Thank you again for replying.
What I mean by â€śonly defined up to a constantâ€ť is: If (\mathbf{u},p) is a solution to the problem, then (\mathbf{u},p+C) where C\in\mathbb{R} (i.e., a constant) is also a solution, so the solution is not unique, and the problem is therefore illposed. A similar phenomenon occurs, e.g., in the Poisson equation with pure Neumann BCs, where some constraint must be added to choose one of the infinitelymany solutions.