A new non-zero element appeared in the time loop, causing malloc

Hi everyone!
As the title suggests,my program generated non-zero elements during matrix assembly at step 394 of the loop.I run it in parallel twice with the same result(also at step 394 instead of other steps), which means it may be a code issue.
Because reproducing the error may take a long time to calculate, the code I provided is missing the solution part:hou_gmres_para_pre(A,b,x,maxit,10e-8) is custom gmres solver.I am sure it is correct because it can run 393 steps.

import petsc4py
from mpi4py import MPI
from petsc4py import PETSc
import numpy as np
import dolfinx
from dolfinx import default_real_type, fem, io, mesh,default_scalar_type
from dolfinx.fem.petsc import assemble_matrix, assemble_vector
from ufl import (CellDiameter, FacetNormal, TestFunction, TrialFunction, avg,
                 conditional, div, dot, dS, ds, dx, grad, gt, inner, outer,
                 TrialFunctions,TestFunctions,curl,cross)
import ufl
from dolfinx.io import gmshio,XDMFFile
from basix.ufl import element,mixed_element 
from dolfinx.fem.petsc import (apply_lifting, assemble_matrix, assemble_vector, 
                                 create_vector, set_bc,create_matrix)
import time
from dolfinx.fem.petsc import (apply_lifting, assemble_matrix_block, assemble_vector_block, 
                                 create_vector_block, set_bc,create_matrix_block,create_matrix)
import os 

def norm_L2(comm, v):
    """Compute the L2(Ω)-norm of v"""
    return np.sqrt(comm.allreduce(fem.assemble_scalar(fem.form(inner(v, v) * dx)), op=MPI.SUM))

def norm_L2_B(comm, v):
    # """Compute the L2(Ω)-norm of B""" input is A not B 
    return np.sqrt(comm.allreduce(fem.assemble_scalar(fem.form(inner(curl(v), curl(v)) * dx)), op=MPI.SUM))

def norm_r(x):
    return (x[0]**2+x[1]**2+x[2]**2)**0.5

def r_2(x):
    return (x[0]**2+x[1]**2+x[2]**2)

def noslip(x):
    return np.stack((np.zeros(x.shape[1]), np.zeros(x.shape[1]), np.zeros(x.shape[1])))

def boundT_0(x):
    values = np.zeros((1, x.shape[1]), dtype=PETSc.ScalarType)
    return values

def boundT_1(x):
    values = np.ones((1, x.shape[1]), dtype=PETSc.ScalarType)
    return values

def boundA(x):
    values = np.zeros((gdim, x.shape[1]), dtype=PETSc.ScalarType)
    return values

def initial_A(x):
    values = np.zeros((gdim, x.shape[1]), dtype=PETSc.ScalarType)
    values[0]=x[0]*15/np.sqrt(2)/16*np.sin(np.pi*(norm_r(x)-ri))*(2*x[2]**2-r_2(x))/r_2(x) \
            -x[1]/norm_r(x)*5/np.sqrt(2)/16*(-48*ri*ro+(4*ro+ri*(4+3*ro))*6*norm_r(x)-4*(4+3*(ri+ro))*r_2(x)+9*norm_r(x)**3)
    values[1]=x[1]*15/np.sqrt(2)/16*np.sin(np.pi*(norm_r(x)-ri))*(2*x[2]**2-r_2(x))/r_2(x) \
            +x[0]/norm_r(x)*5/np.sqrt(2)/16*(-48*ri*ro+(4*ro+ri*(4+3*ro))*6*norm_r(x)-4*(4+3*(ri+ro))*r_2(x)+9*norm_r(x)**3)
    values[2]=x[2]*15/np.sqrt(2)/16*np.sin(np.pi*(norm_r(x)-ri))*(2*x[2]**2-r_2(x))/r_2(x)
    return values

def T_x(x):
    return (2*norm_r(x)-ri-ro)

def initial_T(x):
    return ro*ri/norm_r(x)-ri+21/np.sqrt(17920*np.pi)*(1-3*T_x(x)**2+3*T_x(x)**4-T_x(x)**6)*((x[0]**2+x[1]**2)/(x[0]**2+x[1]**2+x[2]**2))**2*\
           (1-8*x[0]**2/(x[0]**2+x[1]**2)+8*x[0]**4/(x[0]**2+x[1]**2)**2)

t_end=0.1 
num_time_steps =100 

maxit=1000

out_interval=1
Ekman=0.001
Pm=5
Ra=100
Pr=1
ro=20/13
ri=7/13
tag="cut"
##########
csv_dir_u='u_csv'+tag+'/'
csv_dir_B='B_csv'+tag+'/'

Vol=4/3*np.pi*(ro**3-ri**3)

if MPI.COMM_WORLD.Get_rank() ==0:

    if not os.path.exists(csv_dir_B):
        os.makedirs(csv_dir_B)

    if not os.path.exists(csv_dir_u):
        os.makedirs(csv_dir_u)

    filelist=["Ekin_"+tag+".txt","Mkin_"+tag+".txt","SCAL_U_"+tag+".txt","Force_L_"+tag+".txt","Force_C_"+tag+".txt","Force_I_"+tag+".txt",
            "Force_O_"+tag+".txt","Force_V_"+tag+".txt"]

    for filename in filelist:
        if os.path.exists(filename):
            with open(filename,"w") as file:
                pass 

#msh,cell_markers,facet_markers=gmshio.read_from_msh("ball8.msh",MPI.COMM_WORLD,gdim=3) 
read_mesh = io.XDMFFile(MPI.COMM_WORLD, "cp6n_0.24w.xdmf" ,"r") 
msh=read_mesh.read_mesh(name="Grid")  
gdim = msh.geometry.dim   

U_space=fem.functionspace(msh,("RT",2)) 
P_space=fem.functionspace(msh,("DG",1)) 
A_space=fem.functionspace(msh,("N1curl",2)) 
T_space=fem.functionspace(msh,("CG",1)) 

u,v=ufl.TrialFunction(U_space), ufl.TestFunction(U_space)
P,q=ufl.TrialFunction(P_space), ufl.TestFunction(P_space)
A,Av=ufl.TrialFunction(A_space), ufl.TestFunction(A_space)
T,Tv=ufl.TrialFunction(T_space), ufl.TestFunction(T_space)

u_offset = U_space.dofmap.index_map.size_local * U_space.dofmap.index_map_bs
p_offset = u_offset + P_space.dofmap.index_map.size_local * P_space.dofmap.index_map_bs
A_offset = p_offset + A_space.dofmap.index_map.size_local * A_space.dofmap.index_map_bs
T_offset = A_offset + T_space.dofmap.index_map.size_local * T_space.dofmap.index_map_bs

# Funcion space for visualising the velocity field
WW = fem.functionspace(msh, ("CG", 4, (gdim,)))
#TT = fem.functionspace(msh, ("Discontinuous Lagrange", k + 1))
TT = fem.functionspace(msh, ("Lagrange", 1))


u_n_1=fem.Function(U_space)
u_n_2=fem.Function(U_space)
u_D = fem.Function(U_space)

A_n_1=fem.Function(A_space)
A_n_2=fem.Function(A_space)

A_n_1.interpolate(initial_A)
A_n_2.interpolate(initial_A)


T_n_1=fem.Function(T_space)
T_n_2=fem.Function(T_space)
T_n_1.interpolate(initial_T)
T_n_2.interpolate(initial_T)
T_D=fem.Function(T_space)
T_D.interpolate(boundT_1)   #cell_tag_oc.find(12)


boundaries = [(1, lambda x: np.isclose(x[0]**2+x[1]**2+x[2]**2, ro**2)),
              (2, lambda x: np.isclose(x[0]**2+x[1]**2+x[2]**2, ri**2))]

facet_indices, facet_markers = [], []
fdim = msh.topology.dim - 1
for (marker, locator) in boundaries:
    facets = mesh.locate_entities(msh, fdim, locator)
    facet_indices.append(facets)
    facet_markers.append(np.full_like(facets, marker))
facet_indices = np.hstack(facet_indices).astype(np.int32)
facet_markers = np.hstack(facet_markers).astype(np.int32)
sorted_facets = np.argsort(facet_indices)
facet_tag = mesh.meshtags(msh, fdim, facet_indices[sorted_facets], facet_markers[sorted_facets])
ds = ufl.Measure("ds", domain=msh, subdomain_data=facet_tag)

msh.topology.create_connectivity(msh.topology.dim-1, msh.topology.dim)

class BoundaryCondition_u():
    def __init__(self, type, marker, values):
        self._type = type
        if type == "Dirichlet":
            u_D = fem.Function(U_space)
            u_D.interpolate(values)
            facets = facet_tag.find(marker)
            dofs = fem.locate_dofs_topological(U_space, fdim, facets)
            self._bc = fem.dirichletbc(u_D, dofs)
        elif type == "Neumann":
                self._bc = inner(values, v) * ds(marker)
        elif type == "Robin":
            self._bc = values[0] * inner(u-values[1], v)* ds(marker)
        else:
            raise TypeError("Unknown boundary condition: {0:s}".format(type))
    @property
    def bc(self):
        return self._bc

    @property
    def type(self):
        return self._type


class BoundaryCondition_A(): 
    def __init__(self, type, marker, values):
        self._type = type
        if type == "Dirichlet":
            u_D = fem.Function(A_space)
            u_D.interpolate(values)
            facets = facet_tag.find(marker)
            dofs = fem.locate_dofs_topological(A_space, fdim, facets)
            self._bc = fem.dirichletbc(u_D, dofs)
        elif type == "Neumann":
                self._bc = inner(values, v) * ds(marker)
        elif type == "Robin":
            self._bc = values[0] * inner(u-values[1], v)* ds(marker)
        else:
            raise TypeError("Unknown boundary condition: {0:s}".format(type))
    @property
    def bc(self):
        return self._bc

    @property
    def type(self):
        return self._type


class BoundaryCondition_T():
    def __init__(self, type, marker, values):
        self._type = type
        if type == "Dirichlet":
            u_D = fem.Function(T_space)
            u_D.interpolate(values)
            #facets = facet_tag.find(marker)
            dofs = fem.locate_dofs_geometrical(T_space, boundaries[marker-1][1])
#            fem.locate_dofs_topological((W.sub(2),W2), fdim, facets)
            
            self._bc = fem.dirichletbc(u_D, dofs)
        elif type == "Neumann":
                self._bc = inner(values, v) * ds(marker)
        elif type == "Robin":
            self._bc = values[0] * inner(u-values[1], v)* ds(marker)
        else:
            raise TypeError("Unknown boundary condition: {0:s}".format(type))
    @property
    def bc(self):
        return self._bc

    @property
    def type(self):
        return self._type


boundary_conditions_u = [BoundaryCondition_u("Dirichlet", 1, noslip),
                       BoundaryCondition_u("Dirichlet",2, noslip)]


boundary_conditions_T = [BoundaryCondition_T("Dirichlet", 1, boundT_0),
                       BoundaryCondition_T("Dirichlet", 2, boundT_1)]


bcs=[]
for condition in boundary_conditions_u:
    if condition.type == "Dirichlet":
        bcs.append(condition.bc)

for condition in boundary_conditions_T: 
    if condition.type == "Dirichlet": 
        bcs.append(condition.bc) 

   

delta_t = fem.Constant(msh, default_real_type(t_end / num_time_steps))
alpha = fem.Constant(msh, default_real_type(6.0))

h = CellDiameter(msh)
n = FacetNormal(msh)
x = ufl.SpatialCoordinate(msh)


def mean(vn,vn_1): 
    return 1/2*(vn+vn_1) 

def star(v_n_1,v_n_2):
    return 1/2*(3*v_n_1-v_n_2)

def jump(phi, n):
    return outer(phi("+"), n("+")) + outer(phi("-"), n("-"))


lmbda = conditional(gt(dot(star(u_n_1,u_n_2), n), 0), 1, 0)
u_uw = lmbda("+") * mean(u,u_n_1)("+") + lmbda("-") * mean(u,u_n_1)("-")



a_00=Ekman*(inner(u / delta_t, v) * dx - inner(u_n_1 / delta_t, v) * dx)


a_00+= Ekman*(-inner(mean(u,u_n_1), div(outer(v, star(u_n_1,u_n_2)))) * dx +\
    inner((dot(star(u_n_1,u_n_2), n))("+") * u_uw, v("+")) * dS + \
    inner((dot(star(u_n_1,u_n_2), n))("-") * u_uw, v("-")) * dS + \
    inner(dot(star(u_n_1,u_n_2), n) * lmbda * u, v) * ds)


a_00+=Ekman * (inner(grad(mean(u,u_n_1)), grad(v)) * dx
                     + (alpha / avg(h)) * inner(jump(mean(u,u_n_1), n), jump(v, n)) * dS)

a_00+=2*inner(cross(ufl.as_vector([0,0,1]),mean(u,u_n_1)),v)*dx



zeros=fem.Constant(msh, default_real_type(0))

a_01= -inner(P, div(v)) * dx    
a_10= -inner(div(u), q) * dx
a_11= inner(zeros*P,q)*dx    

a_02  = 1.0/Pm * inner(A/ delta_t,cross(curl(star(A_n_1,A_n_2)),v))*dx - 1.0/Pm * inner(A_n_1/ delta_t,cross(curl(star(A_n_1,A_n_2)),v))*dx
a_00 += 1.0/Pm *inner(cross(curl(star(A_n_1,A_n_2)),mean(u,u_n_1)),cross(curl(star(A_n_1,A_n_2)),v))*dx


a_03 = -Ra * inner(ufl.as_vector([x[0],x[1],x[2]])*mean(T,T_n_1),v) *dx 

a_22 = inner(A/ delta_t,Av) * dx -inner(A_n_1/ delta_t,Av) * dx
a_20 = inner(cross(curl(star(A_n_1,A_n_2)),mean(u,u_n_1)),Av) *dx
a_22 +=1/Pm *inner(curl(mean(A,A_n_1)),curl(Av)) *dx 
a_22 +=inner(cross(curl(A),n),Av)*ds 

a_33 =inner(T / delta_t, Tv) * dx -inner(T_n_1 / delta_t, Tv) * dx 
a_33 +=1/Pr*inner(grad(mean(T,T_n_1)),grad(Tv))*dx 
a_33 +=0.5*inner(dot(star(u_n_1,u_n_2),grad(mean(T,T_n_1))),Tv)*dx 
a_33 -=0.5*inner(dot(star(u_n_1,u_n_2),grad(Tv)),mean(T,T_n_1))*dx 

a_00_final=fem.form(ufl.lhs(a_00))
a_01_final=fem.form(ufl.lhs(a_01))
a_02_final=fem.form(ufl.lhs(a_02))
a_20_final=fem.form(ufl.lhs(a_20))

a_11_final=fem.form(a_11)
a_10_final=fem.form(ufl.lhs(a_10))

a_22_final=fem.form(ufl.lhs(a_22))

a_33_final=fem.form(ufl.lhs(a_33))

L0=fem.form(ufl.rhs(a_00)+ufl.rhs(a_01)+ufl.rhs(a_02)+ufl.rhs(a_03))

L1 = fem.form(inner(fem.Constant(msh, 0.0), q) * dx)
L2=fem.form(ufl.rhs(a_22)+ufl.rhs(a_20))
L3=fem.form(ufl.rhs(a_33))

a = [    
        [a_00_final, a_01_final, a_02_final,     None,    ],
        [a_10_final, a_11_final,     None,       None,    ],
        [a_20_final,     None,   a_22_final,     None,    ],
        [   None,        None,       None,      a_33_final,],
    ]    

L = [L0, L1, L2, L3]  

def jump2(phi):
    return phi("+") - phi("-")

ap=(inner(grad(P), grad(q)) * dx
                     + (alpha / avg(h)) * inner(jump2(P), jump2(q)) * dS)

fp=Ekman/delta_t*inner(P,q)*dx   
fp+=Ekman*inner(grad(P),grad(q))*dx  
fp+=Ekman*inner(dot(star(u_n_1,u_n_2),grad(P)),q)*dx 
fp+=Ekman*(alpha / avg(h)) * inner(jump(P,n), jump(q,n)) * dS 

Mp=fem.form(inner(P,q)*dx)    
Ap=fem.form(ap)     
Fp=fem.form(fp)     

a_p = ([   
        [a_00_final, a_01_final, a_02_final,     None,   ],
        [   None,    a_11_final,   None,          None,   ],
        [   None,     None,      a_22_final,      None,   ],
        [   None,     None,       None,      a_33_final, ],
    ])    

A = create_matrix_block(a) 
b=  create_vector_block(L) 
x = A.createVecRight() 

ksp = PETSc.KSP().create(msh.comm)  # type: ignore  
ksp.setOperators(A)  
ksp.setType("gmres")  
ksp.setInitialGuessNonzero(False)  
ksp.getPC().setType(PETSc.PC.Type.BJACOBI)  
opts = PETSc.Options()  # type: ignore 
opts["ksp_rtol"] = 1.0e-10    
opts["ksp_atol"] = 1.0e-3      
opts["ksp_max_it"] =2000      
ksp.setFromOptions()  

A.zeroEntries()  
assemble_matrix_block(A, a, bcs=bcs)  # type: ignore  
A.assemble()   

with b.localForm() as b_loc: 
    b_loc.set(0) 
assemble_vector_block(b, L, a, bcs=bcs)  # type: ignore 

P=fem.petsc.create_matrix_nest(a_p)
#P = fem.petsc.assemble_matrix_nest(a_p, bcs=bcs) 
P.assemble() 

b = create_vector_block(L) 
with b.localForm() as b_loc:
        b_loc.set(0)
assemble_vector_block(b, L, a, bcs=bcs)  # type: ignore

comm = MPI.COMM_WORLD  
rank=MPI.COMM_WORLD.Get_rank()  
size = comm.Get_size()   

nested_IS = P.getNestISs()   
P.destroy()  

P0 = assemble_matrix(a_00_final, bcs=bcs) 
P0.assemble() 
b0 = create_vector(L0) 
x0 = b0.duplicate()  
ksp_g0 = PETSc.KSP().create(msh.comm)  # type: ignore 
ksp_g0.setOperators(P0) 
ksp_g0.setType('gmres') 
ksp_g0.setInitialGuessNonzero(False) 
ksp_g0.getPC().setType(PETSc.PC.Type.ASM) 
opts = PETSc.Options()  # type: ignore 
opts["ksp_rtol"] = 1.0e-10    
opts["ksp_atol"] = 1.0e-6    
opts["ksp_max_it"] =2000       
ksp_g0.setFromOptions()  
#ksp_g0.setMonitor(monitor_residual)  

P1 = assemble_matrix(Ap, bcs=[])  
P1.assemble()   
b1 = create_vector(L1)  
x1 = b1.duplicate()    
ksp_g1 = PETSc.KSP().create(msh.comm)  # type: ignore   
ksp_g1.setOperators(P1)  
ksp_g1.setType('gmres')  #'preonly'   
ksp_g1.setInitialGuessNonzero(False)   
ksp_g1.getPC().setType(PETSc.PC.Type.GAMG)   
opts = PETSc.Options()  # type: ignore  
opts["ksp_rtol"] = 1.0e-10    
opts["ksp_atol"] = 1.0e-6     
opts["ksp_max_it"] =2000       
ksp_g1.setFromOptions()   
#ksp_g1.setMonitor(monitor_residual)    

P12 = assemble_matrix(Mp, bcs=[])  
P12.assemble()    
b2 = create_vector(fem.form(L1))   
x2 = b2.duplicate()     
ksp_g12 = PETSc.KSP().create(msh.comm)  # type: ignore  
ksp_g12.setOperators(P12)   
ksp_g12.setType('cg')  #'preonly'   
ksp_g12.setInitialGuessNonzero(False)     
ksp_g12.getPC().setType(PETSc.PC.Type.BJACOBI)     
opts = PETSc.Options()  # type: ignore    
opts["ksp_rtol"] = 1.0e-10  
opts["ksp_atol"] = 1.0e-6    
opts["ksp_max_it"] =2000       
ksp_g12.setFromOptions()    

PFP = create_matrix(Fp)    
PFP.assemble()   

P2 = assemble_matrix(a_22_final,bcs=bcs)  
P2.assemble()  
b2 = create_vector(L2)    
x2 = b2.duplicate()  
ksp_g2 = PETSc.KSP().create(msh.comm)  # type: ignore  
ksp_g2.setOperators(P2)    
ksp_g2.setType('cg') 
ksp_g2.getPC().setType(PETSc.PC.Type.BJACOBI)   
opts = PETSc.Options()  # type: ignore
opts["ksp_rtol"] = 1.0e-6    
opts["ksp_atol"] = 1.0e-6     
opts["ksp_max_it"] =2000       
ksp_g2.setFromOptions()   
#ksp_g2.setMonitor(monitor_residual)   

P3 = assemble_matrix(a_33_final, bcs=bcs)
P3.assemble()
b3 = create_vector(L3)
x3 = b3.duplicate() 
ksp_g3 = PETSc.KSP().create(msh.comm)  # type: ignore
ksp_g3.setOperators(P3) 
ksp_g3.setType('cg')  
ksp_g3.setInitialGuessNonzero(False) 
ksp_g3.getPC().setType(PETSc.PC.Type.GAMG) 
opts = PETSc.Options()  # type: ignore 
opts["ksp_rtol"] = 1.0e-10   
opts["ksp_atol"] = 1.0e-6     
opts["ksp_max_it"] =2000      
ksp_g3.setFromOptions()    
#sp_g3.setMonitor(monitor_residual)     

P01 = assemble_matrix(a_01_final, bcs=bcs)    
P01.assemble()    

P02 = assemble_matrix(a_02_final, bcs=bcs)  
P02.assemble()   

t=0  
u_vis = fem.Function(WW)   
P_vis = fem.Function(TT)   
A_vis = fem.Function(WW)    
B_vis=fem.Function(WW)    
T_vis = fem.Function(TT)    

u_file = io.VTXWriter(msh.comm, "u_dynamo_"+tag+".bp4", [u_vis._cpp_object],engine="BP4") 
P_file = io.VTXWriter(msh.comm, "P_dynamo_"+tag+".bp4", [P_vis._cpp_object],engine="BP4") 
A_file = io.VTXWriter(msh.comm, "A_dynamo_"+tag+".bp4", [A_vis._cpp_object],engine="BP4") 
T_file = io.VTXWriter(msh.comm, "T_dynamo_"+tag+".bp4", [T_vis._cpp_object],engine="BP4") 
B_file = io.VTXWriter(msh.comm, "B_dynamo_"+tag+".bp4", [B_vis._cpp_object],engine="BP4")

u_file.write(t) 
A_file.write(t) 
T_file.write(t) 

dof_coordinates = WW.tabulate_dof_coordinates() 
num_local_dofs = WW.dofmap.index_map.size_local 

u_h=fem.Function(U_space)
p_h=fem.Function(P_space)
A_h=fem.Function(A_space)
T_h=fem.Function(T_space)

## time loop
for n in range(1000):
    t += delta_t.value

    A.zeroEntries()  
    fem.petsc.assemble_matrix_block(A, a, bcs=bcs)  # type: ignore  
    A.assemble()  
    
    with b.localForm() as b_loc:   
        b_loc.set(0)   
    fem.petsc.assemble_vector_block(b, L, a, bcs=bcs)  # type: ignore   
    
    P0 = assemble_matrix(a_00_final, bcs=bcs)   
    P0.assemble()   
    
    P3 = assemble_matrix(a_33_final, bcs=bcs)   
    P3.assemble()  
    
    P2= assemble_matrix(a_22_final, bcs=bcs)   
    P2.assemble()  
    
    P02 = assemble_matrix(a_02_final, bcs=bcs)   
    P02.assemble()    

    PFP = assemble_matrix(Fp, bcs=[])    
    PFP.assemble()   

    hou_gmres_para_pre(A,b,x,maxit,10e-8)    # SOLVE problem
    
    u_h.x.array[:u_offset] = x.array_r[:u_offset] 
    p_h.x.array[:p_offset-u_offset]=x.array_r[u_offset:p_offset] 
    A_h.x.array[:A_offset-p_offset]=x.array_r[p_offset:A_offset] 
    T_h.x.array[:T_offset-A_offset]=x.array_r[A_offset:T_offset]  
    
    u_h.x.scatter_forward()
    p_h.x.scatter_forward()
    A_h.x.scatter_forward() 
    T_h.x.scatter_forward() 
    
    ekin=norm_L2(msh.comm,u_h)
    mkin=norm_L2_B(msh.comm,A_h)

    if n % 50 ==0:
       
        u_vis.interpolate(u_h) 
        P_vis.interpolate(p_h)
        A_vis.interpolate(A_h)
        T_vis.interpolate(T_h)

        B_expr = fem.Expression(curl(A_h),WW.element.interpolation_points()) 
        B_vis.interpolate(B_expr) 

        u_file.write(t) 
        A_file.write(t) 
        T_file.write(t) 
        B_file.write(t) 
        
    u_n_2.x.array[:] = u_n_1.x.array[:]
    u_n_1.x.array[:] = u_h.x.array[:]

    A_n_2.x.array[:] = A_n_1.x.array[:]
    A_n_1.x.array[:] = A_h.x.array[:]

    T_n_2.x.array[:] = T_n_1.x.array[:]
    T_n_1.x.array[:] = T_h.x.array[:]

try:
    u_file.close()
    A_file.close()
    T_file.close()
    P_file.close()
    B_file.close()
except NameError:
    pass

The error is:

 File "/data/huawei_share/wanghao/work_fenicsx/pcd_A.py", line 964, in <module>
    P0 = assemble_matrix(a_00_final, bcs=bcs)
  File "/public/home/wanghao/.conda/envs/fenicsx09/lib/python3.13/functools.py", line 931, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/public/home/wanghao/.conda/envs/fenicsx09/lib/python3.13/site-packages/dolfinx/fem/petsc.py", line 445, in assemble_matrix
    assemble_matrix_mat(A, a, bcs, diagonal, constants, coeffs)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/wanghao/.conda/envs/fenicsx09/lib/python3.13/site-packages/dolfinx/fem/petsc.py", line 469, in assemble_matrix_mat
    A.assemblyEnd(PETSc.Mat.AssemblyType.FLUSH)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "petsc4py/PETSc/Mat.pyx", line 3336, in petsc4py.PETSc.Mat.assemblyEnd
petsc4py.PETSc.Error: error code 63
[ 0] MatAssemblyEnd() at /home/conda/feedstock_root/build_artifacts/bld/rattler-build_petsc_1738766536/work/src/mat/interface/matrix.c:5873
[ 0] MatAssemblyEnd_MPIAIJ() at /home/conda/feedstock_root/build_artifacts/bld/rattler-build_petsc_1738766536/work/src/mat/impls/aij/mpi/mpiaij.c:779
[ 0] MatSetValues_MPIAIJ() at /home/conda/feedstock_root/build_artifacts/bld/rattler-build_petsc_1738766536/work/src/mat/impls/aij/mpi/mpiaij.c:598
[ 0] Argument out of range
[ 0] New nonzero at (384,55860) caused a malloc. Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check

I read a webpage with the same error result as mine.For the assembly of the Navistox equation, my assembly problem occurred in the fluid block.This confuses me.
[ Malloc while assembling Jacobian in parallel - General - FEniCS Project]

As we do not have the mesh, we cannot reproduce the error.
One thing to note is that I would recommend creating the matrices P0, P1 etc outside the loop, as done with A, as you otherwise will experience a memory creep from PETSc (as you haven’t called P0.destroy()) and you are recreating the matrix structure in the loop, making it very slow.

Thanks dokken!
Even if the grid is given, it takes 16 processes to run for 6 hours to reproduce the error.The cost of reproducing is expensive, and I cannot understand this error. It was reported at step 394 in the loop, not at the beginning.So I’m just here to collect some possible suggestions.
For the assembly you mentioned being placed outside the loop, it is because P0 P1 P2 P3 are the main diagonal block matrices of my preconditioner matrix, and they are time-dependent.

I mean the construction of the matrix, not the assembly itself. Ie similar to

Where A is created outside the loop, then entries are zeroed out, and the matrix is filled with new data.

Thank you, Dokken, for providing suggestions for improvement.The problem has been solved!