GMRES for 3D Navier-Stokes

Hello,
I'm using FEniCSx 0.9 in a conda environment to solve the Navier-Stokes equations in 3D. To converge towards a stationary state, I use a coupled approach with a Newton method and iterate over time (this approach allows me to use large CFL numbers). In 2D this works very well with MUMPS, but the computational cost becomes too high in 3D.
So I am trying to use a GMRES iterative solver instead of LU (MUMPS). I've tested it in 2D on a very simple case of laminar channel flow, but I'm running into problems.
If I don't specify a preconditioner, an LU factorization is used by default: the KSP converges in 2 iterations and so does the Newton solver (for a single time step), but in 3D the computational-cost problem returns.
If I set pc_type = "none" or "jacobi", the KSP solver needs several thousand iterations; the relative norm of the Newton solver drops to 10^-6 and then decreases very slowly (so it takes a long time to reach 10^-8, for example). With hypre AMG, the residuals drop to 10^-4 and then stagnate.
With asm, sor, and bjacobi I only get NaN.

Am I missing something about the use of GMRES?
Thank you very much for your time.

If I don't specify a preconditioner, an LU factorization is used by default: the KSP converges in 2 iterations and so does the Newton solver (for a single time step), but in 3D the computational-cost problem returns.

Without remembering the exact specifics of the input arguments here, this sounds like you’re still just using a direct solver (LU) in this case.

Without knowing anything about the problem you're trying to solve, it's hard to give detailed advice, other than to refer you to the scientific literature to see what people in published articles do to converge their simulations. How many degrees of freedom do your 2D and 3D simulations have, how are you initializing your solver (i.e. which initial conditions are you using), what does your fluid domain look like, and what kind of turbulence model are you using?

Thank you for your reply. My goal is to do optimization, which is why I want fast convergence to a stationary state. My computational domain is a rectangle (2D) or a box/cylinder (3D) with an inlet and an outlet on the side faces; the rest is wall. In general, my simulations will resemble flows around obstacles placed in the domain by the optimization. I use the Spalart-Allmaras turbulence model in order to do RANS simulations.
I've seen in the literature that SSOR is used with GMRES, but I can't find any way of using it in FEniCSx. ILU is also recommended but is not available in parallel in FEniCSx. I'm also looking into FGMRES to see if it can help me.

So you’re solving a saddle-point problem? Look into Schur-complement preconditioners, in PETSc this is done with so-called fieldsplits.

This is a pretty good read on the topic: https://academic.oup.com/book/27915?login=false
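Schematically, the kind of PETSc options involved look something like the following (just a sketch of the option names to search for; you still have to tell the preconditioner which dofs belong to which field, e.g. via index sets or a DM, and the values below are placeholders, not a recommendation):

from petsc4py import PETSc

opts = PETSc.Options()
opts["ksp_type"] = "gmres"                 # outer Krylov solver
opts["pc_type"] = "fieldsplit"             # split into velocity/pressure blocks
opts["pc_fieldsplit_type"] = "schur"       # Schur-complement factorization
opts["pc_fieldsplit_schur_fact_type"] = "full"
opts["pc_fieldsplit_schur_precondition"] = "selfp"  # one simple Schur approximation
# Sub-solvers for the velocity block (0) and the Schur complement (1)
opts["fieldsplit_0_ksp_type"] = "preonly"
opts["fieldsplit_0_pc_type"] = "hypre"
opts["fieldsplit_1_ksp_type"] = "preonly"
opts["fieldsplit_1_pc_type"] = "jacobi"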

Thank you very much for your reply.
After some further research I found quite a few articles/topics that talked about this fieldsplit method. I also took a look at the fenicsx_pctools project: Rafinex-external-RIFLE / FEniCSx-pctools · GitLab.
However, a few concepts are still unclear to me, notably the notion of a saddle-point problem. If I've understood correctly, this concerns matrices with a zero lower-right block. But with my MixedElement formulation for velocity and pressure, using the form

I_ = Identity(mesh.topology.dim)
T = -p * I_ + (mu_ + mu_T) * (grad(v) + transpose(grad(v)))
F_stdy = -div(v) * w_p * dx + inner(grad(w_v), T) * dx + rho_ * inner(grad(v) * v, w_v) * dx

and visualizing the Jacobian with

import matplotlib.pyplot as plt
Jac = derivative(F_stdy, uh)
J_mat = dolfinx.fem.assemble_matrix(form(Jac), bcs)
J_ = J_mat.to_scipy(ghosted=False)
nnz = J_.nnz
fig, ax = plt.subplots(figsize=(8, 8))
ax.spy(J_, markersize=0.1)
plt.show()

I obtain a matrix whose nonzero entries are clustered around the diagonal, with no visible block structure.

However, in the Navier-Stokes demo of fenicsx_pc_tools, MixedElement is not used and the Jacobian is created manually with

F0 = inner((v-v_prev)/k, v_te)*dx + inner(dot(grad(v), v), v_te) * dx + nu * inner(grad(v), grad(v_te)) * dx - p * div(v_te) * dx
F1 = -div(v) * p_te * dx

a00 = ufl.derivative(F0, v)
a01 = ufl.derivative(F0, p)
a10 = ufl.derivative(F1, v)
a11 = None

F_ufl = [F0, F1]
J_ufl = [[a00, a01], [a10, a11]]

And in this case I get the typical Jacobian of a saddle-point problem.

Am I therefore obliged to use a formulation without MixedElement, with block-assembled matrices (as in fenicsx_pc_tools), in order to use fieldsplit with Schur, possibly via fenicsx_pc_tools?
Thanks a lot!

The default type of a PETSc KSP solver is not an LU solver. However, if you are using `dolfinx.nls.petsc.NewtonSolver`, it defaults to LU: dolfinx/cpp/dolfinx/nls/NewtonSolver.cpp at v0.9.0.post1 · FEniCS/dolfinx · GitHub

To try to break down your statement:

What are the exact options you have set?
Did you set "ksp_type": "gmres"? And what did you set "pc_type" to?

As you haven’t provided any of your code it is rather hard to give sound advice (maybe you haven’t configured everything correctly).
What is the output of newton_solver.krylov_solver.view() after a single time step, for each of the configurations you tried?
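For reference, the usual way to set and inspect these options on a NewtonSolver is roughly as follows (a minimal sketch against the v0.9 API; solver is your NewtonSolver and the option values below are placeholders):

from petsc4py import PETSc

ksp = solver.krylov_solver              # the PETSc KSP used inside the NewtonSolver
opts = PETSc.Options()
prefix = ksp.getOptionsPrefix()
opts[f"{prefix}ksp_type"] = "gmres"
opts[f"{prefix}pc_type"] = "hypre"      # placeholder: whichever preconditioner you test
ksp.setFromOptions()

# ... after solver.solve(uh) ...
ksp.view()                              # prints the KSP/PC configuration actually used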

Please format this code over multiple lines, so that it is readable.

Secondly, you can think about this kind of block structure at the function-space level.
You will have a velocity function space (V) and a pressure space (Q), for which you can set up your Jacobian.
In general you will have a block structure as

\begin{pmatrix} J_{u, u} & J_{u, p}\\ J_{p,u} & J_{p,p} \end{pmatrix} \begin{pmatrix} \delta u\\ \delta p \end{pmatrix}= -\begin{pmatrix} F_u(u_k, p_k, v)\\ F_p(u_k, p_k, q) \end{pmatrix}

where I have split the residual form of F into a part with the test function v (F_u) and one with the test function q (F_p).

Here, J_{u, p} is \frac{\partial F_u}{\partial p}.

A mixed element intertwines all the degrees of freedom (for data locality), meaning that it is harder to extract the appropriate sub-spaces. (That is why you don't see this block structure clearly: the dofs might be ordered along the lines of (ux_0, uy_0, p_0, ux_1, uy_1, p_1, …).)
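If you want to stay with the MixedElement, you can in principle still recover the blocks yourself by collapsing the sub-spaces and handing the corresponding index sets to PETSc, roughly along these lines (an untested sketch; W is the mixed space and ksp the Krylov solver you configure):

import numpy as np
from petsc4py import PETSc

# Local dof indices of each sub-space within the mixed space W
_, vel_dofs = W.sub(0).collapse()
_, pre_dofs = W.sub(1).collapse()

imap = W.dofmap.index_map
# Keep only owned dofs and convert them to global indices
owned_u = np.setdiff1d(imap.local_to_global(np.asarray(vel_dofs, dtype=np.int32)), imap.ghosts)
owned_p = np.setdiff1d(imap.local_to_global(np.asarray(pre_dofs, dtype=np.int32)), imap.ghosts)

is_u = PETSc.IS().createGeneral(owned_u.astype(np.int32), comm=W.mesh.comm)
is_p = PETSc.IS().createGeneral(owned_p.astype(np.int32), comm=W.mesh.comm)

pc = ksp.getPC()
pc.setType("fieldsplit")
pc.setFieldSplitIS(("u", is_u), ("p", is_p))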

You do not need to use fenicsx_pc_tools to do a field split (however, it offers some nice wrappers). You can consider:

as an example.

This will be massively simplified in the v0.10 release, with:


Thank you for your answers, I really appreciate it.

When I tried GMRES, I used "ksp_type": "gmres". Then I tried several preconditioners via "pc_type". With "pc_type" = "jacobi" or "pc_type" = "none", GMRES converged in several thousand iterations; however, Newton's residuals stopped falling after 10^-6.
With “pc_type”=“asm” or “bjacobi” or “sor”, I only get NaN.
With "hypre" and "pc_hypre_type" = "boomeramg", the GMRES residuals remain constant.

I've now understood that I need to use a fieldsplit preconditioner, especially with the Schur complement, if I want to achieve rapid convergence.
However, if I don't use fenicsx_pc_tools, is the fieldsplit preconditioner compatible with a MixedElement formulation, even though the Jacobian doesn't appear in the form you explained? Or is it necessary to construct the problem with nest/block matrices?

As I am not the author of fenicsx_pc_tools (@jackhale ping), I am not sure whether it does the automatic setup for nest/block systems if you give it a MixedElement. Theoretically one could set this up manually.

The fenicsx-pctools examples are based around block-assembled systems (and potentially also nest, with adaptation); MixedElement with monolithic assembly won't work, or at least it would require a lot of work.

There is a basic setup here:

https://gitlab.com/rafinex-external-rifle/fenicsx-pctools/-/blob/main/demo/navier-stokes-pcd/demo_navier-stokes-pcd.py?ref_type=heads

This could be improved with the incoming PR: SNES solver interface for standard, block and nest systems by jorgensd · Pull Request #3648 · FEniCS/dolfinx · GitHub

However, that example still uses LU decomposition internally so it will not scale well.

A proper setup for Navier-Stokes + temperature with no LU decomposition, which we tested in 3D on up to 8192 MPI ranks, is shown here:

https://gitlab.com/rafinex-external-rifle/fenicsx-pctools/-/blob/main/examples/rayleigh-benard-convection/test_rayleigh_benard.py?ref_type=heads#L120

We should probably work this into a more advanced Navier-Stokes demo rather than leaving it inside this test, where the temperature block is an additional complication.


First of all, thank you very much, I’ve made a lot of progress.
After reading more papers on Navier-Stokes preconditioning, I wrote a code inspired by the Navier-Stokes and Rayleigh-Bénard demos from fenicsx_pc_tools. I got very good scalability when increasing the number of processors and the number of dofs of the problem.
However, I have a new problem: the Reynolds number.
As I increase the Reynolds number, FGMRES needs many more iterations (a sign that the preconditioning is becoming less and less effective?). To test this, I set up a basic 2D and 3D case of flow in a rectangular pipe. For Re = 100, FGMRES needs at most a few hundred iterations. But for Re = 500, for example, even 5000 iterations aren't enough to reach the relative tolerance of 10^-2, so the computation time becomes very long.
I use a Newton method with pseudo time steps to aid convergence. I've noticed that FGMRES converges quickly in the first Newton steps, but it gets harder once the Newton residual has dropped by 2-4 orders of magnitude. I've tried playing with the Schur parameters (upper, lower, etc.) and with the FGMRES settings, but without success.

Do I need to use something other than PCD to converge at higher Reynolds numbers?

Here’s a section of my code (fairly close to the demos for simplicity’s sake) which contains the main parts. I’ve removed the parts relating to the SNES solver, as they’re identical to the demos.

mesh, cell_tags, facet_tags = read_from_msh(msh_file, MPI.COMM_WORLD, gdim=2)
facet_tags.name = "facet_tags"
tags = {}
tags["inlet"] = 1
tags["outlet"] = 2
tags["walls"] = 3

V_element = basix.ufl.element("CG", mesh.basix_cell(), 2, shape = (mesh.topology.dim,))
P_element = basix.ufl.element("CG", mesh.basix_cell(), 1)
A_element = basix.ufl.element("CG", mesh.basix_cell(), 1)
V_v = functionspace(mesh, V_element)
V_p = functionspace(mesh, P_element)

v = Function(V_v, name="v")
v_prev = Function(V_v, name="v_prev")
p = Function(V_p, name="p")
v_te = TestFunction(V_v)
p_te = TestFunction(V_p)

#################### Weak Form ######################
F0 = inner((v-v_prev)/k, v_te)*dx
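# Convection term linearized about v_prev: grad(v)*v_prev + grad(v_prev)*v - grad(v_prev)*v_prev (Newton linearization of grad(v)*v)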
F0 += (inner(grad(v)* v_prev, v_te) + inner(grad(v_prev)* v, v_te) - inner(grad(v_prev)* v_prev, v_te)) * dx
#F0 = inner(dot(grad(v), v), v_te) *dx
F0 += nu_ * inner(grad(v), grad(v_te)) * dx - p * div(v_te) * dx
#F0 += inner(k(alpha) * v, v_te)*dx
F1 = -div(v) * p_te * dx

a00 = derivative(F0, v)
a01 = derivative(F0, p)
a10 = derivative(F1, v)
a11 = None

F_ufl = [F0, F1]
J_ufl = [[a00, a01], [a10, a11]]
F_dfx = fem.form(F_ufl)
J_dfx = fem.form(J_ufl)

########### BC ##########################
def v_inflow_eval(x):
    values = np.zeros((mesh.geometry.dim, x.shape[1]))
    #values[0] = 4.0 * x[1] * (1.0 - x[1])
    values[0] = v_max_inlet
    values[1] = np.zeros(x.shape[1])
    if mesh.geometry.dim==3:
        values[2] = np.zeros(x.shape[1])
    return values
v_wall = Function(V_v)
v_inflow = Function(V_v)
v_inflow.interpolate(v_inflow_eval)

fdim = mesh.topology.dim - 1

wall_dofs_v = locate_dofs_topological(V_v, fdim, facet_tags.find(tags["walls"]))
inlet_dofs_v = locate_dofs_topological(V_v, fdim, facet_tags.find(tags["inlet"]))
bcs = [dirichletbc(v_inflow, inlet_dofs_v), dirichletbc(v_wall, wall_dofs_v)]

pdeproblem = PDEProblem(F_dfx, J_dfx, [v, p], bcs)

########################## PCD bc ###################
# Localize boundary dofs for pressure
inlet_dofs_p = locate_dofs_topological(V_p, fdim, facet_tags.find(tags["inlet"]))
outlet_dofs_p = locate_dofs_topological(V_p, fdim, facet_tags.find(tags["outlet"]))
pcd_type = "PCDPC_vY"  # pick one of the two versions specified in the dictionary below
bcs_pcd = {
    "PCDPC_vX": [dirichletbc(Function(V_p), inlet_dofs_p)],
    "PCDPC_vY": [dirichletbc(Function(V_p), outlet_dofs_p)],
}[pcd_type]
ds_in = ufl.Measure("ds", domain=mesh, subdomain_data=facet_tags, subdomain_id=tags["inlet"])
appctx = {"nu": nu_, "v": v, "bcs_pcd": bcs_pcd, "ds_in": ds_in}  # application context

########################## Solver setup ###################
opts["ksp_type"] = "fgmres" 
opts["ksp_max_it"] = 5000
opts["ksp_converged_reason"] = None
opts["ksp_rtol"] = 1e-2
opts["ksp_gmres_modifiedgramschmidt"] = None
opts["ksp_gmres_restart"] = 150
opts["ksp_pc_side"] = "right"

opts["pc_type"] = "python"
opts["pc_python_type"] = "fenicsx_pctools.WrappedPC"
opts.prefixPush("wrapped_")
opts["pc_type"] = "fieldsplit"
opts["pc_fieldsplit_type"] = "schur"
opts["pc_fieldsplit_schur_fact_type"] = "lower"
opts["pc_fieldsplit_schur_precondition"] = "user"
opts["pc_fieldsplit_0_fields"] = 0  # velocity
opts["pc_fieldsplit_1_fields"] = 1  # pressure

# For simplicity, we use a direct solver to invert the velocity block:
opts["fieldsplit_0_ksp_type"] = "preonly"
opts["fieldsplit_0_pc_type"] = "python"
opts["fieldsplit_0_pc_python_type"] = "fenicsx_pctools.WrappedPC"
opts["fieldsplit_0_wrapped_pc_type"] = "hypre"
for key, val in hypre_common_settings.items():
    opts[f"fieldsplit_0_wrapped_{key}"] = val

# -- PCD on pressure block
opts["fieldsplit_1_ksp_type"] = "preonly"
opts["fieldsplit_1_pc_type"] = "python"
opts["fieldsplit_1_pc_python_type"] = "fenicsx_pctools.PCDPC_vY"
# ---- SOR on mass matrix
opts["fieldsplit_1_pcd_Mp_ksp_type"] = "richardson"
opts["fieldsplit_1_pcd_Mp_ksp_max_it"] = 2
opts["fieldsplit_1_pcd_Mp_pc_type"] = "sor"
# ---- HYPRE on stiffness matrix
opts["fieldsplit_1_pcd_Ap_ksp_type"] = "preonly"
opts["fieldsplit_1_pcd_Ap_pc_type"] = "hypre"
for key, val in hypre_common_settings.items():
    opts[f"fieldsplit_1_pcd_Ap_{key}"] = val

opts.prefixPop()  # wrapped_
opts.prefixPop()  # ns_

ite = 1
res_0 = 1
for i in range(1000):
    t0 = time.time()

    v_prev.x.array[:] = v.x.array[:]
    v_prev.x.scatter_forward()

    solver.solve(None, x0)
    vec_to_functions(x0, pdeproblem.solution_vars)
    v, p = pdeproblem.solution_vars

    v.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
    v.x.scatter_forward()

    if stop_global:
        break
    ite += 1

    t1 = time.time()
    if mesh_comm.rank==0:
        print("Elapsed time: ", t1-t0)

Thank you very much!

Great that you got it (partially) working and thanks for sharing your implementation here.

In short, the PCD preconditioner is not "Reynolds-number robust": the number of iterations increases as the Reynolds number increases. I am not up to date with the literature on possible improvements to PCD in this direction; what I do know, however, is that the paper [2004.09398] A Reynolds-robust preconditioner for the Scott-Vogelius discretization of the stationary incompressible Navier-Stokes equations shows Reynolds-number robustness and has an implementation in Firedrake.

Edit: To clarify further, DOLFINx does not yet have an implementation of the "F-cycle on augmented momentum block" from Figure 7 of that paper. We are currently working on general low-level functionality for geometric multigrid, and also on assembling operators on non-standard patches, but it is some way off.

Very well, thank you very much for the information.
So I am limited by the preconditioning available in FEniCSx, but my topology optimization code is written in FEniCSx; I will think about using Firedrake later.
For the moment, how would you tackle my problem? Use a splitting approach to solve Navier-Stokes? But then I might be limited by the CFL condition (as in the cylinder benchmark tutorial). I could also implement SUPG/GLS stabilization to see whether that lets me converge at higher Reynolds numbers. Or use a Picard-type approach instead of Newton?
The only missing step in my optimizer is this 3D solver.
Thanks again for your time.

Hello,
After exploring the idea of approximating the Schur complement with an augmented Lagrangian method, I came across a grad-div implementation in FreeFEM (GitHub - prj-/moulin2019al: Augmented Lagrangian Preconditioner for Hydrodynamic Stability Analysis). I have the impression that this can be converted to FEniCSx. In fact, using a multiplicative fieldsplit we obtain the lower block-triangular part of the saddle-point matrix. Then we set S_p^{-1} = -(\gamma + Re^{-1}) M_p^{-1} in the (null) pressure block to obtain our preconditioner (using PCFieldSplitGetSubKSP and KSPSetOperators).
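Written out explicitly, the preconditioner I am trying to build (as I understand the approach) is the lower block-triangular factor

\begin{pmatrix} A_\gamma & 0\\ B & \hat{S} \end{pmatrix}, \qquad \hat{S}^{-1} = -\left(\gamma + Re^{-1}\right) M_p^{-1}

where A_\gamma is the grad-div-augmented velocity block, B the divergence block, and M_p the pressure mass matrix.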

However, I get the error

0 SNES Function norm 1.142225359199e+01
Linear ns_ solve did not converge due to DIVERGED_PC_FAILED iterations 0
PC failed due to SUBPC_ERROR

I've tried replacing the velocity and pressure block solvers with direct LU solvers, but nothing changes, even if I remove the Sp matrix. I think it's coming from the multiplicative PC? I'd like to point out that with the Schur + selfp method, or with a direct LU without GMRES, the code works just fine.
Do you have any idea where this is coming from?

Here is a simplified, runnable version of my code:

from mpi4py import MPI
import dolfinx
import time
from ufl import split, TestFunctions, TestFunction, derivative, adjoint, action, Identity, grad, transpose, inner, dx, replace, div, Measure, TrialFunction, sqrt, dot
import ufl
import basix.ufl
from petsc4py import PETSc
from petsc4py.PETSc import Options
from dolfinx.fem.petsc import NonlinearProblem, apply_lifting, assemble_vector, set_bc, assemble_matrix, create_matrix, create_vector
from dolfinx.fem import Function, functionspace, Constant, dirichletbc, locate_dofs_topological, locate_dofs_geometrical, form, assemble_scalar
from dolfinx.mesh import create_box, create_rectangle, locate_entities_boundary, CellType
from dolfinx.nls.petsc import NewtonSolver
from dolfinx import mesh
import numpy as np
from dolfinx.io.gmshio import model_to_mesh
import scipy
import gmsh
comm = MPI.COMM_WORLD

gmsh.initialize()

tags = {}
tags["inlet"] = 1
tags["outlet"] = 2
tags["wall"] = 3

model = gmsh.model
model_rank = 0
comm = MPI.COMM_WORLD
if comm.rank == model_rank:
    factory = model.occ
    s1 = factory.addRectangle(-1, 0, 0, 1, 1)
    s2 = factory.addRectangle(0, -1, 0, 7.5, 2)
    s3 = 3
    factory.fuse([(2, s1)], [(2, s2)], tag=s3)
    factory.synchronize()
    ov = model.getBoundary([(2, s3)])
    l1, l2, l3, l4, l5, l6 = (val[1] for val in ov)  # counterclockwise (l6 = inlet)

    # Tag interior
    model.addPhysicalGroup(2, [s3], tag=0)

    # Tag boundaries
    model.addPhysicalGroup(1, [l6], tag=tags["inlet"])
    model.setPhysicalName(1, tags["inlet"], "inlet")
    model.addPhysicalGroup(1, [l4], tag=tags["outlet"])
    model.setPhysicalName(1, tags["outlet"], "outlet")
    model.addPhysicalGroup(1, [l1, l2, l3, l5], tag=tags["wall"])
    model.setPhysicalName(1, tags["wall"], "wall")

    # Set uniform mesh size at all points
    mesh_size = 0.025
    model.mesh.setSize(model.getEntities(0), mesh_size)

    # Generate 2D mesh
    model.mesh.generate(dim=2)


# Convert Gmsh mesh to DOLFINx
mesh, _, facet_tags = model_to_mesh(model, MPI.COMM_WORLD, rank=model_rank, gdim=2)
facet_tags.name = "facet_tags"
gmsh.finalize()

solver = "non lu"
precond = "mal"

x = ufl.SpatialCoordinate(mesh)
fdim = mesh.topology.dim-1
metadata = {"quadrature_degree": 4}
dx = Measure("dx", domain=mesh, metadata=metadata)

num_cells = mesh.topology.index_map(mesh.topology.dim).size_local + mesh.topology.index_map(mesh.topology.dim).num_ghosts
cell_indices = np.arange(num_cells, dtype=np.int32)
h_values = mesh.h(dim=mesh.topology.dim, entities=cell_indices); h_avg = np.mean(h_values)

v_max_inlet = Constant(mesh, PETSc.ScalarType(1))
cfl = 3
dt = cfl * h_avg / v_max_inlet
k = Constant(mesh, PETSc.ScalarType(dt))

Re = 100
rho_ = Constant(mesh, PETSc.ScalarType(1))
mu_ = Constant(mesh, PETSc.ScalarType(rho_*v_max_inlet*2/Re))
nu_ = Constant(mesh, PETSc.ScalarType(mu_/rho_))

V_element = basix.ufl.element("CG", mesh.basix_cell(), 2, shape = (mesh.topology.dim,))
P_element = basix.ufl.element("CG", mesh.basix_cell(), 1)
U_element = basix.ufl.mixed_element([V_element,P_element])
U = functionspace(mesh, U_element); U0 = U.sub(0);P0 = U.sub(1) 

w_v, w_p = TestFunctions(U)
uh = Function(U)
v, p = split(uh)
uh_prev = Function(U)
v_prev, p_prev = split(uh_prev)

V_v = U.sub(0).collapse()[0]

def v_inflow_eval(x):
    values = np.zeros((2, x.shape[1]))
    values[0] = 4.0 * x[1] * (1.0 - x[1])
    values[1] = np.zeros(x.shape[1])
    return values

def v0_expr(x):
    values = np.zeros((mesh.topology.dim, x.shape[1]), dtype=PETSc.ScalarType)
    values[0] = v_max_inlet  # v_x = v_max_inlet, v_y = 0, v_z = 0
    return values

v_wall = Function(V_v)
v_wall.x.array[:] = 0.0
v_inflow = Function(V_v)
v_inflow.interpolate(v_inflow_eval)

# Localize boundary dofs for velocity
fdim = mesh.topology.dim - 1
wall_dofs_v = locate_dofs_topological((U0,V_v), fdim, facet_tags.find(tags["wall"]))
inlet_dofs_v = locate_dofs_topological((U0,V_v), fdim, facet_tags.find(tags["inlet"]))
bcs = [dirichletbc(v_inflow, inlet_dofs_v, U0), dirichletbc(v_wall, wall_dofs_v, U0)]

#uh.sub(0).interpolate(v0_expr)
#uh.sub(1).interpolate(lambda x: np.zeros(x.shape[1]))

F_PV = (-div(v) * w_p * dx + nu_ * inner(grad(v),grad(w_v)) * dx + inner(grad(v)*v, w_v) * dx - p * div(w_v) * dx)
if precond == "mal":
    gamma_ = 1.0
    F_PV += - gamma_*div(v)*div(w_v) *dx


class NonlinearPDE_SNESProblem:
    def __init__(self, F, u, bc):
        V = u.function_space
        du = TrialFunction(V)
        self.L = form(F)
        self.a = form(derivative(F, u, du))
        self.bc = bc
        self._F, self._J = None, None
        self.u = u

    def F(self, snes, x, F):
        """Assemble residual vector."""

        x.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
        x.copy(self.u.x.petsc_vec)
        self.u.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

        with F.localForm() as f_local:
            f_local.set(0.0)
        assemble_vector(F, self.L)
        apply_lifting(F, [self.a], bcs=[self.bc], x0=[x], alpha=-1.0)
        F.ghostUpdate(addv=PETSc.InsertMode.ADD, mode=PETSc.ScatterMode.REVERSE)
        set_bc(F, self.bc, x, -1.0)

    def J(self, snes, x, J, P):
        """Assemble Jacobian matrix."""

        x.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
        x.copy(self.u.x.petsc_vec)
        self.u.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

        J.zeroEntries()
        assemble_matrix(J, self.a, bcs=self.bc)
        J.assemble()


#Snes solver
problem = NonlinearPDE_SNESProblem(F_PV, uh, bcs)
J = create_matrix(problem.a)
b = create_vector(problem.L)

# Create Newton solver and solve
snes = PETSc.SNES().create()
snes.setFunction(problem.F, b)
snes.setJacobian(problem.J, J)
problem_prefix = "ns_"
snes.setOptionsPrefix(problem_prefix)

opts = PETSc.Options()
opts.prefixPush(problem_prefix)
opts["snes_type"] = "newtonls"
opts['snes_linesearch_type'] = 'bt'
opts['snes_linesearch_order']= 2
opts['snes_monitor'] = None 

if solver == "Direct":
    snes.getKSP().setType("preonly")
    snes.getKSP().getPC().setType("lu")
    snes.getKSP().getPC().setFactorSolverType("mumps")
    opts.prefixPop()  # ns_
    snes.setFromOptions()
else :

    U_v, map_u = U.sub(0).collapse()  # velocity sub-space + dof indices
    U_p, map_p = U.sub(1).collapse()  # pressure sub-space + dof indices

    U_indexmap = U.dofmap.index_map
    localdofs_u = np.setdiff1d( U_indexmap.local_to_global(np.array(map_u)), U_indexmap.ghosts ) # getting the global velocity dofs that live on this core
    localdofs_p = np.setdiff1d( U_indexmap.local_to_global(np.array(map_p)), U_indexmap.ghosts ) # getting the global pressure dofs that live on this core

    snes.getKSP().setType("fgmres")
    snes.getKSP().getPC().setType("fieldsplit")
    opts["ksp_max_it"] = 2000
    opts["ksp_converged_reason"] = None
    opts["ksp_rtol"] = 1e-3
    #opts["ksp_monitor"] = None
    opts["ksp_gmres_restart"] = 200
    opts["ksp_pc_side"] = "right"

    ISu = PETSc.IS().createGeneral(np.array(localdofs_u,dtype=np.int32), comm=mesh.comm)
    ISp = PETSc.IS().createGeneral(np.array(localdofs_p,dtype=np.int32), comm=mesh.comm)
    snes.getKSP().getPC().setFieldSplitIS(("u", ISu))
    snes.getKSP().getPC().setFieldSplitIS(("p", ISp))

    if precond == "Simple":

        opts["pc_fieldsplit_type"] = "schur"
        opts["pc_fieldsplit_schur_fact_type"]= "full"
        opts["pc_fieldsplit_schur_precondition"]= "selfp"

        opts["fieldsplit_u_ksp_type"]= "preonly"
        opts["fieldsplit_u_pc_type"]= "hypre"
        opts["fieldsplit_u_pc_hypre_type"]= "boomeramg"
        opts["fieldsplit_u_pc_hypre_boomeramg_no_CF"]= None
        opts["fieldsplit_u_pc_hypre_boomeramg_coarsen_type"]= "HMIS"
        opts["fieldsplit_u_pc_hypre_boomeramg_interp_type"]= "ext+i"
        opts["fieldsplit_u_pc_hypre_boomeramg_P_max"]= 4
        opts["fieldsplit_u_pc_hypre_boomeramg_agg_nl"]= 1
        opts["fieldsplit_u_pc_hypre_boomeramg_agg_num_paths"]= 2

        opts["fieldsplit_p_ksp_type"]= "preonly"
        opts["fieldsplit_p_pc_type"]= "jacobi"

        opts.prefixPop()  # ns_
        snes.setFromOptions()
    
    if precond == "mal":
        opts["pc_fieldsplit_type"] = "multiplicative"

        opts["fieldsplit_u_ksp_type"] = "gmres"
        opts["fieldsplit_u_ksp_pc_side"] = "right"
        opts["fieldsplit_u_ksp_rtol"] = "1e-1"
        opts["fieldsplit_u_ksp_gmres_restart"] = 50
        opts["fieldsplit_u_pc_type"] = "asm"
        opts["fieldsplit_u_pc_asm_overlap"] = "1"
        opts["fieldsplit_u_sub_pc_type"] = "lu"
        opts["fieldsplit_u_sub_pc_factor_mat_solver_type"] = "mumps"
        #opts["fieldsplit_u_ksp_type"] = "preonly"
        #opts["fieldsplit_u_pc_type"] = "lu"
        #opts["fieldsplit_u_pc_factor_mat_solver_type"] = "mumps"

        opts["fieldsplit_p_ksp_type"] = "cg"
        opts["fieldsplit_p_ksp_max_it"] = "5"
        opts["fieldsplit_p_pc_type"] = "jacobi"
        #opts["fieldsplit_p_ksp_type"] = "preonly"
        #opts["fieldsplit_p_pc_type"] = "lu"
        #opts["fieldsplit_p_pc_factor_mat_solver_type"] = "mumps"

        opts.prefixPop()
        snes.setFromOptions()
        #snes.setUp()                        # builds the SNES and the main KSP
        #snes.getKSP().setUp()               # builds the global KSP
        #snes.getKSP().getPC().setUp()       # builds the FieldSplit PC
        subksp = snes.getKSP().getPC().getFieldSplitSubKSP()
        vel_ksp, pres_ksp = subksp

        q_p, r_p = ufl.TrialFunction(U_p), ufl.TestFunction(U_p)
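        # Approximate Schur complement: S_hat = -1/(gamma_ + 1/Re) * M_p (pressure mass matrix), set as the pressure-block operator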
        S_form = form (-1/(gamma_ + 1/Re) * q_p*r_p *dx)
        S = assemble_matrix(S_form)
        S.assemble()
        pres_ksp.setOperators(S)
        snes.getKSP().getPC().view()


x = uh.x.petsc_vec.copy()
x.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)

t0 = time.time()

snes.solve(None, x)
t1 = time.time()
if comm.rank ==0:
    print("Elapsed time: ", t1-t0)
x.copy(uh.x.petsc_vec)
uh.x.petsc_vec.ghostUpdate(addv=PETSc.InsertMode.INSERT, mode=PETSc.ScatterMode.FORWARD)
uh.x.scatter_forward()
snes.getKSP().getPC().view()