How to get more information from solver failure

Hi,
I’m working on converting some Fenics code into Fenicsx and wanted to see if I could get some help to better understand why I’m getting an error message.

Right now I’m getting the error message:

Failed to successfully call PETSc function 'KSPSolve'. PETSc error code is: 76, Error in external library.

I’m curious how I could get more insight on the error / what the error actually is. I’ve read that it often means I’m running out of memory and to use a ksp solver, however that also has a problem where the error goes to inf.

For context the error happens on roughly timestep 270.

I’m currently using the newton solver with the ksp options:

opts[f"{option_prefix}ksp_type"] = "gmres"
opts[f"{option_prefix}pc_factor_mat_solver_type"] = "mumps"
opts[f"{option_prefix}ksp_max_it"] = 50 

If anyone is curious (though I don’t expect anyone to read it) here is the current thermo gel code:

And here is the fenics code:

Hi Jorge,

I’ve read that it often means I’m running out of memory

This should be easy to check, by opening your task manager or otherwise monitoring your memory usage in a terminal while running your script. Try running you Jupyter notebook locally, if you haven’t been doing that already.

however that also has a problem where the error goes to inf.

As in, the solution diverges? It could just be that the “external library error” that you’re encountering is just a divide-by-zero or other floating-point error that happens once your numerical solution tends to infinity. This should be easy to check by just printing the norm of your numerical solution every time step.

Try running the same simulation for only 260-265-ish time steps to see whether the solution still makes sense, or whether you’re already seeing unreasonably large numbers. Try running the same problem with a finer mesh or smaller time step, to see whether that impacts things. If that doesn’t give you any insights I’d start to simplify your problem by taking out parts of your model, in an attempt to identify the problematic areas.

1 Like

Thanks for the response!

Yeah I’ve been running on my laptop and so far it doesn’t seem to be the case (unless there is a memory limit setting I’m unaware of). Python is using less than 1 gig and the mesh is a 2d 15,15 mesh.

Yeah the solution diverges. I’m also leaning towards that being the problem. I’ve looked at the output and it all seems fine (physical with realistic values) until just this one step (which happens semi randomly).

When you’re saying to print the norms of the solution, do you mean to print the L2 norm? I did that and they seem reasonable. It makes me wondering if maybe there is a local error that is too high? Is there a good way to get that from dolfinx?

I’ve been trying to simplify the simulation but honestly it seems to be coming from the coupling of heat + Gel swelling effects. Either one independently works just fine, so it’s quite hard to debug.

When you’re saying to print the norms of the solution, do you mean to print the L2 norm?

Yeah the L2-norm, or even just the solution vector’s 2-norm – just anything that can indicate solution divergence.

It makes me wondering if maybe there is a local error that is too high? Is there a good way to get that from dolfinx?

The infinity-norm of a vector is the maximum absolute value found in that vector – you can try looking at that. What I tend to do for debugging is to convert vectors to Numpy arrays, with which you can then use utilities like linalg.norm or anything else that could be useful.

Next to looking at the norms of the solution vector, you could try to compute the 2-norms and infinity-norms of the terms of your variational form to see where things are blowing up. If you assemble each term as a vector (with assemble_vector) you can convert the vectors to Numpy. If you see that one specific term is blowing up, that could give you some info on where in your model things are blowing up.

Yeah I’ve been running on my laptop and so far it doesn’t seem to be the case (unless there is a memory limit setting I’m unaware of). Python is using less than 1 gig and the mesh is a 2d 15,15 mesh.

As a sanity check, try running the same model with smaller time steps and a finer mesh. It’s usually a pretty low-effort thing to do, but it could give you some information on how rapidly solution divergence occurs. It also helps to eliminate some low-hanging fruit that could result in non-convergence issues. Lastly, try tightening your Newton solver convergence tolerance to something like 1e-14 or so.

Thanks for the very detailed response. I think I found the issue which is there was some buckling happening causing an instability In the mesh. How to fix that is another issue.

One question I still had is what you meant about using the assemble_vector function.
Do you have an example. In my case I’m doing a nonlinear solve, so I just have a residual form. I would have loved to be able to plot the local residuals (or just see the numpy matrix) for each on of the terms if that’s possible.

Best,
Jorge

Consider looking at this Dolfinx tutorial, which briefly covers the Dolfinx Newton solver for nonlinear problems. If you have a residual form you can use assemble_vector from dolfinx.fem to construct a residual vector, where each vector entry corresponds to one basis function of your TestFunction. This is exactly how the right-hand-side is constructed inside the Newton solver. The norm of the residual vector will give you an idea of how well your nonlinear solution has converged.

Your residual form will consist of multiple terms, right? If you use assemble_vector to construct the corresponding vectors of each separate term of your residual form, you can compute their norms. You can check whether the norm of any specific term starts blowing up.

1 Like

Thanks great thanks!

That worked for me and I can see the residual changing now and such.

One last thing I had a question of is if’s normal that the reported error from the solver and the error computed from the residual differ?

For example here the newton solver error reported is 3.4e-13

While the norm I’m calculating is 2.18e-11
Screenshot 2023-10-14 at 3.39.29 PM

They are both very small so I wouldn’t be surprised if it’s some numerical error but I wanted to make sure that there wasn’t something else I wasn’t seeing.

For context I am setting the Dirichlet BC’s to 0 and applying them to the vector before getting the norm.

You’re likely seeing small differences because of how the Dirichlet BCs are being taken into account. For example, inside the Newton solver the columns corresponding to Dirichlet BCs are moved to the right-hand side of the iterative linearized system (i.e. to the residual vector) by a process called lifting, which is also covered in the tutorial that I mentioned earlier. That likely affects the norm of the residual vector inside the Newton solver. The difference you showed shouldn’t matter much.

1 Like