# Speeding up dolfinx code

Dear colleagues,

I want to solve Usadel equations with fenicsx for a structure superconductor/ ferromagnet/ superconductor. These equations are formulated for 8 complex functions, \chi^{1}, … \chi^{8} with the help of 2to2 matrices:
\hat{\gamma} = \chi^1 \hat{1} + \chi^{2} \hat{\sigma}_x + \chi^{3} \hat{\sigma}_y + \chi^{4} \hat{\sigma}_z,\qquad \hat{ \tilde{\gamma} } = \chi^5 \hat{1} + \chi^{6} \hat{\sigma}_x + \chi^{7} \hat{\sigma}_y + \chi^{8}\hat{\sigma}_z;\\ \hat{N} = (\hat{1} + \hat{\gamma}\hat{\tilde{\gamma}})^{-1}, \qquad \hat{\tilde{N}} = -(\hat{1} + \hat{\tilde{\gamma}}\hat{\gamma})^{-1}.\\
Here hats \hat{\ } are used for matrices 2 to 2, \hat{\sigma}_{x,y,z} are Pauli matrices.

Using there matrices, I formulate Usadel equations as \nabla^2 \hat{\gamma} + 2\nabla\hat{\gamma} \hat{\tilde{N}}\hat{\tilde{\gamma}} \nabla{\hat{\gamma}} = \omega_n \hat{\gamma},\ \nabla^2 \hat{\tilde{\gamma}} - 2\nabla\hat{\tilde{\gamma}} \hat{{N}}\hat{\gamma} \nabla{\hat{\tilde{\gamma}}} = \omega_n \hat{\tilde{\gamma}}.

These equations are non-linear, to solve them I use Picard iterations which go as follows:
\hat{\tilde{N}}^{(n)} = -\left(1 + \hat{\tilde{\gamma}}^{(n)} \hat{\tilde{\gamma}}^{(n)}\right)^{-1}, \qquad \hat{N}^{(n)} = \left(1 + \hat{\gamma}^{(n)}\hat{\tilde{\gamma}}^{(n)}\right)^{-1};\\ 2 \nabla' \hat{\gamma}\hat{\tilde{N}} \hat{\tilde{\gamma}} \nabla\hat{\gamma} \to \frac{1}{3}\left\{ 2 \nabla' \hat{\gamma}^{(n+1)}\hat{\tilde{N}}^{(n)} \hat{\tilde{\gamma}}^{(n)} \nabla\hat{\gamma}^{(n)} + 2 \nabla' \hat{\gamma}^{(n)}\hat{\tilde{N}}^{(n)} \hat{\tilde{\gamma}}^{(n)} \nabla\hat{\gamma}^{(n+1)} + 2 \nabla' \hat{\gamma}^{(n)}\hat{\tilde{N}}^{(n)} \hat{\tilde{\gamma}}^{(n+1)} \nabla\hat{\gamma}^{(n)} \right\},\\ 2 \nabla' \hat{\tilde{\gamma}}\hat{N} \hat{\gamma} \nabla\hat{\tilde{\gamma}} \to \frac{1}{3}\left\{ 2 \nabla' \hat{\tilde{\gamma}}^{(n+1)}\hat{N}^{(n)} \hat{\gamma}^{(n)} \nabla\hat{\tilde{\gamma}}^{(n)} + 2 \nabla' \hat{\tilde{\gamma}}^{(n)}\hat{N}^{(n)} \hat{\gamma}^{(n)} \nabla\hat{\tilde{\gamma}}^{(n+1)} + 2 \nabla' \hat{\tilde{\gamma}}^{(n)}\hat{N}^{(n)} \hat{\gamma}^{(n+1)} \nabla\hat{\tilde{\gamma}}^{(n)} \right\}.
Here n is the number of the current iteration in non-linearity. After I obtain the solution of the full Usadel equations after a few iterations in the ferromagnet, I calculate charge fluxes, one from the left superconductor to the ferromagnet, I_L, the another from the ferromagnet to superconductor, I_R. Superconductors play a role of boundary conditions only.
These currents are calculated as integrals: I \propto \int dS \operatorname{Im}\ \operatorname{Tr}_2\left\{ \left(1 + \hat{\tilde{\gamma}}\hat{\gamma}\right)^{-3}\hat{\tilde{\gamma}}\left[\nabla\hat{{\gamma}} - \hat{{\gamma}}(\nabla \hat{\tilde{\gamma}})\hat{{\gamma}}\right] \right\}

From physical point of view, I have to obtain the relation I_L = I_R. However, for a MixedElement containing linear functions, I didn’t succeeded. As far as I understand, the problem comes from too small degree of Lagrange polynomials. However, the program appears to be too slow, hours. Are there any ideas how to speed the code up?

MWE goes as follows:


from mpi4py import MPI
import ufl
import dolfinx
import dolfinx.fem.petsc as petsc

import numpy as np
import sympy as sp
from sympy.physics import msigma

import time
t_start = time.time()

# dimensionless geometrical sizes
d_F = 20./25.
L_y = 100./25.
w_F = 30./25.
d_S = 50./25.

t = 0.1 # temperature t = T/T_c

N_F_x = 10*int(w_F)
N_F_y = 10*int(L_y)
N_F_z = 10*int(d_F +2*d_S)

print(" The mesh with the dimensions N_F = ", (N_F_x, N_F_y, N_F_z)," is beign created...")

mesh_In_F = dolfinx.mesh.create_box(comm=MPI.COMM_WORLD,
points=((-w_F, L_y/2, -d_S - d_F/2), (0., -L_y/2, d_F/2 + d_S)),
n=(N_F_x, N_F_y, N_F_z),
cell_type=dolfinx.mesh.CellType.tetrahedron)

chi_el = ufl.FiniteElement("CG", mesh_In_F.ufl_cell(), 1)
V = dolfinx.fem.FunctionSpace(mesh_In_F, ufl.MixedElement([chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el]))
print(" Mesh was created. ")

return np.logical_and.reduce((-d_S - d_F/2 <=  p[2], p[2] <= - d_F/2,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

return np.logical_and.reduce((d_F/2 <= p[2], p[2] <= d_F/2 + d_S,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

mesh_In_F.topology.create_connectivity(mesh_In_F.topology.dim-1, mesh_In_F.topology.dim)

SF_R_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_right_lead)
SF_R_facets_mark = np.zeros_like(SF_R_facets) + 1
SF_R_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_R_facets).astype(np.int32), np.array(SF_R_facets_mark).astype(np.int32))

SF_L_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_left_lead)
SF_L_facets_mark = np.zeros_like(SF_L_facets) + 2
SF_L_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_L_facets).astype(np.int32), np.array(SF_L_facets_mark).astype(np.int32))

dofs_L = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_left_lead) for i in range(V.num_sub_spaces)]
dofs_R = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_right_lead) for i in range(V.num_sub_spaces)]

# here iter is the index of the current iteration in non-linearity for the Usadel equation
def getBCs(chi_L, chi_R):
bc_L = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_L[i])), dofs_L[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
bc_R = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_R[i])), dofs_R[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
BCs = [*bc_L, *bc_R]
return BCs

omega_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((2*0 + 1)*t))

Delta_s = 1.
varphi_L = -np.pi/3
varphi_R = np.pi/3
chi_L = [0, 0, np.exp(1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
chi_R = [0, 0, np.exp(1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
BCs = getBCs(chi_L, chi_R)

eta = ufl.TestFunction(V)
chi_TF = ufl.TrialFunction(V)
chi_prev = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi_prev.sub(i).interpolate(lambda x: np.full((x.shape[1],), 0.))
chi = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi.sub(i).interpolate(lambda x: np.full((x.shape[1],), 1. ))

sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
gamma_m = ufl.as_matrix(sigma_basis[0]*chi_prev[0] + sigma_basis[1]*chi_prev[1] +
sigma_basis[2]*chi_prev[2] + sigma_basis[3]*chi_prev[3])
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi_prev[4] + sigma_basis[1]*chi_prev[5] +
sigma_basis[2]*chi_prev[6] + sigma_basis[3]*chi_prev[7])
gamma_mx = ufl.as_matrix(sigma_basis[0]*chi_TF[0] + sigma_basis[1]*chi_TF[1] +
sigma_basis[2]*chi_TF[2] + sigma_basis[3]*chi_TF[3])
gamma_mtx = ufl.as_matrix(sigma_basis[0]*chi_TF[4] + sigma_basis[1]*chi_TF[5] +
sigma_basis[2]*chi_TF[6] + sigma_basis[3]*chi_TF[7])
N_m = ufl.inv(ufl.Identity(2) + gamma_m*gamma_mt)
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

A = +2*( ufl.Dx(gamma_mx, 0)*N_mt*gamma_mt*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_mx, 1)*N_mt*gamma_mt*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_mx, 2)*N_mt*gamma_mt*ufl.Dx(gamma_m, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 2) )/3
B = -2*( ufl.Dx(gamma_mtx, 0)*N_m*gamma_m*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mtx, 1)*N_m*gamma_m*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mtx, 2)*N_m*gamma_m*ufl.Dx(gamma_mt, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_m*ufl.Dx(gamma_mtx, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_m*ufl.Dx(gamma_mtx, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_m*ufl.Dx(gamma_mtx, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_mx*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_mx*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_mx*ufl.Dx(gamma_mt, 2) )/3

def proj_2_2_to_8_1(A, B):
sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
return [ufl.tr(A*sigma_basis[0])/2,
ufl.tr(A*sigma_basis[1])/2,
ufl.tr(A*sigma_basis[2])/2,
ufl.tr(A*sigma_basis[3])/2,
ufl.tr(B*sigma_basis[0])/2,
ufl.tr(B*sigma_basis[1])/2,
ufl.tr(B*sigma_basis[2])/2,
ufl.tr(B*sigma_basis[3])/2]

def get_norm_of_difference(a, b):
comm = a.function_space.mesh.comm
difference =  dolfinx.fem.form((a - b)**2 * ufl.dx)
E = np.abs(np.sqrt(comm.allreduce(dolfinx.fem.assemble_scalar(difference), MPI.SUM)))
return E

def F0_mm_f_ufl(x, omega_n):
return omega_n*ufl.Identity(8)

x = ufl.SpatialCoordinate(mesh_In_F)
ufl.inner(ufl.as_vector(proj_2_2_to_8_1(A, B)), eta) - \
ufl.inner(F0_mm_f_ufl(x, omega_n)*chi_TF, eta)
dx = ufl.Measure("dx", domain=mesh_In_F)
LeftHS_fem_form = dolfinx.fem.form(LeftHS*dx)
I_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((0, 0, 0, 0, 0, 0, 0, 0)))
RightHS_fem_form = dolfinx.fem.form(ufl.inner(I_n, eta)*dx)

omega_n.value = (2*0 + 1)*t
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()

RightHS_v = dolfinx.fem.petsc.create_vector(RightHS_fem_form)

solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.ILU)
solver.setOperators(LeftHS_mmm)

iter = 0
norm_of_difference = get_norm_of_difference(chi, chi_prev)
iter_max = 25
atol = 1e-4
print(" Iterations in non-linearity: ")
while (iter <= iter_max and norm_of_difference >= atol):
with RightHS_v.localForm() as loc_1:
loc_1.set(0)
RightHS_v= dolfinx.fem.petsc.assemble_vector(RightHS_v, RightHS_fem_form)
dolfinx.fem.petsc.apply_lifting( RightHS_v, [LeftHS_fem_form], [BCs])
dolfinx.fem.petsc.set_bc(RightHS_v, BCs)
solver.solve(RightHS_v, chi.vector)
chi.x.scatter_forward()
norm_of_difference = get_norm_of_difference(chi, chi_prev)
chi_prev.x.array[:] = chi.x.array
chi_prev.x.scatter_forward()
iter = iter + 1
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
solver.setOperators(LeftHS_mmm)
print(" Iter = ", iter, " is finishesd with norm_of_difference =", norm_of_difference)

gamma_m = ufl.as_matrix(sigma_basis[0]*chi[0] + sigma_basis[1]*chi[1] +
sigma_basis[2]*chi[2] + sigma_basis[3]*chi[3])
dgamma_mx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[0], 0) + sigma_basis[1]*ufl.Dx(chi[1], 0) +
sigma_basis[2]*ufl.Dx(chi[2], 0) + sigma_basis[3]*ufl.Dx(chi[3], 0))
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi[4] + sigma_basis[1]*chi[5] +
sigma_basis[2]*chi[6] + sigma_basis[3]*chi[7])
dgamma_mtx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[4], 0) + sigma_basis[1]*ufl.Dx(chi[5], 0) +
sigma_basis[2]*ufl.Dx(chi[6], 0) + sigma_basis[3]*ufl.Dx(chi[7], 0))
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_R_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(1)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
print(" currenr from F to right S: ", I)

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_L_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(2)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
print(" currenr from left S to F: ", I)

print(" time used = ", time.time() - t_start)

1 Like

I would suggest you start by timing different parts of your code, so that you can point anyone interested in helping you to the bit that takes time for your application.
This can easily be done with

import time

start = time.perf_counter()
# Do something
end = time.perf_counter()
print(f"NAME_OF_TASK: {end-start:.3e}")

Dear dokken,

Here is the code used for time measuring:

from mpi4py import MPI
import ufl
import dolfinx
import dolfinx.fem.petsc as petsc

import numpy as np
import sympy as sp
from sympy.physics import msigma

import time
t_start = time.time()

# dimensionless geometrical sizes
d_F = 20./25.
L_y = 100./25.
w_F = 30./25.
d_S = 50./25.

t = 0.1 # temperature t = T/T_c

N_F_x = 10*int(w_F)
N_F_y = 10*int(L_y)
N_F_z = 10*int(d_F +2*d_S)

print(" The mesh with the dimensions N_F = ", (N_F_x, N_F_y, N_F_z)," is beign created...")

mesh_In_F = dolfinx.mesh.create_box(comm=MPI.COMM_WORLD,
points=((-w_F, L_y/2, -d_S - d_F/2), (0., -L_y/2, d_F/2 + d_S)),
n=(N_F_x, N_F_y, N_F_z),
cell_type=dolfinx.mesh.CellType.tetrahedron)

chi_el = ufl.FiniteElement("CG", mesh_In_F.ufl_cell(), 1)
V = dolfinx.fem.FunctionSpace(mesh_In_F, ufl.MixedElement([chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el,
chi_el]))
print(" Mesh was created. ")

return np.logical_and.reduce((-d_S - d_F/2 <=  p[2], p[2] <= - d_F/2,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

return np.logical_and.reduce((d_F/2 <= p[2], p[2] <= d_F/2 + d_S,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

mesh_In_F.topology.create_connectivity(mesh_In_F.topology.dim-1, mesh_In_F.topology.dim)

SF_R_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_right_lead)
SF_R_facets_mark = np.zeros_like(SF_R_facets) + 1
SF_R_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_R_facets).astype(np.int32), np.array(SF_R_facets_mark).astype(np.int32))

SF_L_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_left_lead)
SF_L_facets_mark = np.zeros_like(SF_L_facets) + 2
SF_L_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_L_facets).astype(np.int32), np.array(SF_L_facets_mark).astype(np.int32))

dofs_L = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_left_lead) for i in range(V.num_sub_spaces)]
dofs_R = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_right_lead) for i in range(V.num_sub_spaces)]

# here iter is the index of the current iteration in non-linearity for the Usadel equation
def getBCs(chi_L, chi_R):
bc_L = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_L[i])), dofs_L[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
bc_R = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_R[i])), dofs_R[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
BCs = [*bc_L, *bc_R]
return BCs

omega_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((2*0 + 1)*t))

Delta_s = 1.
varphi_L = -np.pi/3
varphi_R = np.pi/3
chi_L = [0, 0, np.exp(1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
chi_R = [0, 0, np.exp(1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
BCs = getBCs(chi_L, chi_R)

eta = ufl.TestFunction(V)
chi_TF = ufl.TrialFunction(V)
chi_prev = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi_prev.sub(i).interpolate(lambda x: np.full((x.shape[1],), 0.))
chi = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi.sub(i).interpolate(lambda x: np.full((x.shape[1],), 1. ))

sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
gamma_m = ufl.as_matrix(sigma_basis[0]*chi_prev[0] + sigma_basis[1]*chi_prev[1] +
sigma_basis[2]*chi_prev[2] + sigma_basis[3]*chi_prev[3])
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi_prev[4] + sigma_basis[1]*chi_prev[5] +
sigma_basis[2]*chi_prev[6] + sigma_basis[3]*chi_prev[7])
gamma_mx = ufl.as_matrix(sigma_basis[0]*chi_TF[0] + sigma_basis[1]*chi_TF[1] +
sigma_basis[2]*chi_TF[2] + sigma_basis[3]*chi_TF[3])
gamma_mtx = ufl.as_matrix(sigma_basis[0]*chi_TF[4] + sigma_basis[1]*chi_TF[5] +
sigma_basis[2]*chi_TF[6] + sigma_basis[3]*chi_TF[7])
N_m = ufl.inv(ufl.Identity(2) + gamma_m*gamma_mt)
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

A = +2*( ufl.Dx(gamma_mx, 0)*N_mt*gamma_mt*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_mx, 1)*N_mt*gamma_mt*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_mx, 2)*N_mt*gamma_mt*ufl.Dx(gamma_m, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 2) )/3
B = -2*( ufl.Dx(gamma_mtx, 0)*N_m*gamma_m*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mtx, 1)*N_m*gamma_m*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mtx, 2)*N_m*gamma_m*ufl.Dx(gamma_mt, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_m*ufl.Dx(gamma_mtx, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_m*ufl.Dx(gamma_mtx, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_m*ufl.Dx(gamma_mtx, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_mx*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_mx*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_mx*ufl.Dx(gamma_mt, 2) )/3

def proj_2_2_to_8_1(A, B):
sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
return [ufl.tr(A*sigma_basis[0])/2,
ufl.tr(A*sigma_basis[1])/2,
ufl.tr(A*sigma_basis[2])/2,
ufl.tr(A*sigma_basis[3])/2,
ufl.tr(B*sigma_basis[0])/2,
ufl.tr(B*sigma_basis[1])/2,
ufl.tr(B*sigma_basis[2])/2,
ufl.tr(B*sigma_basis[3])/2]

def get_norm_of_difference(a, b):
comm = a.function_space.mesh.comm
difference =  dolfinx.fem.form((a - b)**2 * ufl.dx)
E = np.abs(np.sqrt(comm.allreduce(dolfinx.fem.assemble_scalar(difference), MPI.SUM)))
return E

def F0_mm_f_ufl(x, omega_n):
return omega_n*ufl.Identity(8)

t_form = time.time()
print("-> Creating FEM forms...")
x = ufl.SpatialCoordinate(mesh_In_F)
ufl.inner(ufl.as_vector(proj_2_2_to_8_1(A, B)), eta) - \
ufl.inner(F0_mm_f_ufl(x, omega_n)*chi_TF, eta)
dx = ufl.Measure("dx", domain=mesh_In_F)
LeftHS_fem_form = dolfinx.fem.form(LeftHS*dx)
I_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((0, 0, 0, 0, 0, 0, 0, 0)))
RightHS_fem_form = dolfinx.fem.form(ufl.inner(I_n, eta)*dx)
print("-> FEM forms are created with time used = ", time.time() - t_form, "\n")

omega_n.value = (2*0 + 1)*t
t_LHS = time.time()
print("-> First assembling of LHS matrix...")
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
print("-> LHS matrix is assebmled in time = ", time.time() - t_LHS, "\n")

print("-> Creating of RHS vector...")
t_RHS = time.time()
RightHS_v = dolfinx.fem.petsc.create_vector(RightHS_fem_form)
print("-> RHS vector is created in time = ", time.time()  - t_RHS, "\n")

solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.ILU)
solver.setOperators(LeftHS_mmm)

iter = 0
norm_of_difference = get_norm_of_difference(chi, chi_prev)
iter_max = 25
atol = 1e-4
print(" Iterations in non-linearity: ")
while (iter <= iter_max and norm_of_difference >= atol):

t_RHS = time.time()
print("---> Preparing of RHS vector... ")
with RightHS_v.localForm() as loc_1:
loc_1.set(0)
RightHS_v= dolfinx.fem.petsc.assemble_vector(RightHS_v, RightHS_fem_form)
dolfinx.fem.petsc.apply_lifting( RightHS_v, [LeftHS_fem_form], [BCs])
dolfinx.fem.petsc.set_bc(RightHS_v, BCs)
print("---> RHS vector is done in time = ", time.time() - t_RHS,"\n")

t_system = time.time()
print("---> Solving of a linear system...")
solver.solve(RightHS_v, chi.vector)
chi.x.scatter_forward()
print("---> Linear system is solved in time = ", time.time() - t_system,"\n")

norm_of_difference = get_norm_of_difference(chi, chi_prev)
chi_prev.x.array[:] = chi.x.array
chi_prev.x.scatter_forward()
iter = iter + 1

t_LHS = time.time()
print("---> Preparing of LHS Matrix...")
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
solver.setOperators(LeftHS_mmm)
print("---> LHS matrix for a linear system is done in time = ", time.time() - t_LHS,"\n")

print(" Iter = ", iter, " is finishesd with norm_of_difference =", norm_of_difference, "\n")

gamma_m = ufl.as_matrix(sigma_basis[0]*chi[0] + sigma_basis[1]*chi[1] +
sigma_basis[2]*chi[2] + sigma_basis[3]*chi[3])
dgamma_mx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[0], 0) + sigma_basis[1]*ufl.Dx(chi[1], 0) +
sigma_basis[2]*ufl.Dx(chi[2], 0) + sigma_basis[3]*ufl.Dx(chi[3], 0))
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi[4] + sigma_basis[1]*chi[5] +
sigma_basis[2]*chi[6] + sigma_basis[3]*chi[7])
dgamma_mtx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[4], 0) + sigma_basis[1]*ufl.Dx(chi[5], 0) +
sigma_basis[2]*ufl.Dx(chi[6], 0) + sigma_basis[3]*ufl.Dx(chi[7], 0))
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_R_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(1)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
print(" currenr from F to right S: ", I)

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_L_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(2)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
print(" currenr from left S to F: ", I)

print(" time used = ", time.time() - t_start)


Typical output goes as follows:

 The mesh with the dimensions N_F =  (10, 40, 40)  is beign created...
Mesh was created.
-> Creating FEM forms...
-> FEM forms are created with time used =  0.015367507934570312

-> First assembling of LHS matrix...
-> LHS matrix is assebmled in time =  28.22361731529236

-> Creating of RHS vector...
-> RHS vector is created in time =  0.00025773048400878906

Iterations in non-linearity:
---> Preparing of RHS vector...
---> RHS vector is done in time =  2.014235496520996

---> Solving of a linear system...
---> Linear system is solved in time =  2.335974931716919

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.87514328956604

Iter =  1  is finishesd with norm_of_difference = 4.833472627134693

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.0137710571289062

---> Solving of a linear system...
---> Linear system is solved in time =  2.6611082553863525

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.853037357330322

Iter =  2  is finishesd with norm_of_difference = 0.8245152167148219

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.000584840774536

---> Solving of a linear system...
---> Linear system is solved in time =  2.728177547454834

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.922567129135132

Iter =  3  is finishesd with norm_of_difference = 0.22354354140679672

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.0153162479400635

---> Solving of a linear system...
---> Linear system is solved in time =  2.689244270324707

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.91971254348755

Iter =  4  is finishesd with norm_of_difference = 0.08945213933669799

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.0090768337249756

---> Solving of a linear system...
---> Linear system is solved in time =  2.6780736446380615

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.809255599975586

Iter =  5  is finishesd with norm_of_difference = 0.03567672364057387

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.03198504447937

---> Solving of a linear system...
---> Linear system is solved in time =  2.6629178524017334

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.971071243286133

Iter =  6  is finishesd with norm_of_difference = 0.015752475846741037

---> Preparing of RHS vector...
---> RHS vector is done in time =  1.9841434955596924

---> Solving of a linear system...
---> Linear system is solved in time =  2.7339539527893066

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.765327215194702

Iter =  7  is finishesd with norm_of_difference = 0.00669283384055061

---> Preparing of RHS vector...
---> RHS vector is done in time =  1.9946134090423584

---> Solving of a linear system...
---> Linear system is solved in time =  2.6735894680023193

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  28.382792949676514

Iter =  8  is finishesd with norm_of_difference = 0.0029584609449828563

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.0215728282928467

---> Solving of a linear system...
---> Linear system is solved in time =  2.6708736419677734

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  29.341240406036377

Iter =  9  is finishesd with norm_of_difference = 0.001283145212423076

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.050027847290039

---> Solving of a linear system...
---> Linear system is solved in time =  2.7267017364501953

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  28.719499111175537

Iter =  10  is finishesd with norm_of_difference = 0.0005647141536477506

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.0711829662323

---> Solving of a linear system...
---> Linear system is solved in time =  2.8433594703674316

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  28.872745275497437

Iter =  11  is finishesd with norm_of_difference = 0.000246611795292919

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.078775644302368

---> Solving of a linear system...
---> Linear system is solved in time =  2.7357213497161865

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  27.885119199752808

Iter =  12  is finishesd with norm_of_difference = 0.00010827009990300962

---> Preparing of RHS vector...
---> RHS vector is done in time =  2.052931308746338

---> Solving of a linear system...
---> Linear system is solved in time =  2.7425034046173096

---> Preparing of LHS Matrix...
---> LHS matrix for a linear system is done in time =  28.368104457855225

Iter =  13  is finishesd with norm_of_difference = 4.739143892036076e-05

currenr from F to right S:  2.448588273869488
currenr from left S to F:  -1.966393124881869
time used =  458.40062141418457


As you see, the most time consuming operation is assembling the left-hand-side matrix of the system.

Dear dokken,

I am sorry to bother you again, but is there any way to speed up the assembling the matrix of my system?

I recommend you systematically simplify (or the other direction, build up) your problem until you notice performance becoming unacceptable for your application. This will narrow down the component which requires optimization. The developers of FEniCS would be interested to know if you can report poor performance in a specific component of the project; however, it is unlikely we are motivated to debug the entirety of your project on your behalf.

Sorry.

As far as I understand, the equations cannot be futher simplified. The problem comes from the fact that there are the matrices \hat{N}, \hat{\tilde{N}} which are defined as some inverse matrices, \hat{N} = (1 + \hat{\gamma}\hat{\tilde{\gamma}})^{-1} and \hat{\tilde{N}} = -(1 + \hat{\tilde{\gamma}}\hat{\gamma})^{-1} from 2 per 2 matrices \hat{\gamma}, \hat{\tilde{\gamma}} with all non-zero coefficients.
The equations describing \hat{N} and \hat{\tilde{N}} are very long.

Therefore, may be more correct question is how can I use mpi for parallel asembling of the whole system matrix, particulary the last of those two lines:

 LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()


with some lenghty LeftHS_fem_form?

Would you use, for example, jit flags for such an optimization?

Are you certain the time is spent assembling the matrix? Or is the time spent checking the JIT cache and/or JIT compilation of your formulation?

Dear nate,

I solve a non-linear equation replacing it with the set of its linear approxmiations. So, I have a loop over these approximations.

In every step of the loop I do the following code:

print("---> Preparing of LHS Matrix...")
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
solver.setOperators(LeftHS_mmm)
print("---> LHS matrix for a linear system is done in time = ", time.time() - t_LHS,"\n")


This part takes approximately 30 seconds for linear Lagrange polynomials. This time holds for the first iteration too.

The object LeftHS_fem_form is defined outside of the loop. It depends on the solution from the previous step, which is chi_prev in the examplse above. That’s why I reassemble its matrix every step.

From this point of view, I would say that the time is spent for assembling the matrix.

There are a few things you can do:

1. Change from mixedelement to vectorelement
chi_el = ufl.VectorElement("Lagrange", mesh_In_F.ufl_cell(), 1, dim=8)
V = dolfinx.fem.FunctionSpace(mesh_In_F, chi_el)

1. Reuse the matrix
print("---> Preparing of LHS Matrix...")
LeftHS_mmm.zeroEntries()
dolfinx.fem.petsc.assemble_matrix(LeftHS_mmm, LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()

print("---> LHS matrix for a linear system is done in time = ",
time.time() - t_LHS, "\n")

1 Like

Dear dokken,

So, I don’t need to reasseble this matrix for every step even if it depends on the previous solution?

Thank you very much, I am going to try it!

chi_el = ufl.VectorElement("Lagrange", mesh_In_F.ufl_cell(), 1, dim=8)
V = dolfinx.fem.FunctionSpace(mesh_In_F, chi_el)


You do need to reassemble it, but you do not need to recreate the matrix (it should save you some time).

Further, you can optimize the code for assembly by calling:

LeftHS_fem_form = dolfinx.fem.form(
LeftHS*dx, jit_options={"cffi_extra_compile_args": ["-O3", "-march=native"], "cffi_libraries": ["m"]})


which reduces the runtime a bit further.

Secondly, you can run your code over multiple processors (however, you would have to use something else than ILU for that).

You can use Hypre PILUT.


solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.HYPRE)
solver.getPC().setHYPREType("pilut")
solver.setOperators(LeftHS_mmm)


For parallel execution, you also need:

    RightHS_v = dolfinx.fem.petsc.assemble_vector(RightHS_v, RightHS_fem_form)
dolfinx.fem.petsc.apply_lifting(RightHS_v, [LeftHS_fem_form], [BCs])
mode=petsc.PETSc.ScatterMode.REVERSE)
dolfinx.fem.petsc.set_bc(RightHS_v, BCs)
mode=petsc.PETSc.ScatterMode.FORWARD)


i.e. a last scatter forward for the rhs.

The current settings reduced the assembly time to 13 seconds with two processes.

With 4 processes it is reduced to 7 seconds.

3 Likes

Dear dokken,

thank you very much!

When I tried this preconditioner, then its solution does not coince with that one which I obtained using 1 process with ILU.

solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.HYPRE)
solver.getPC().setHYPREType("pilut")
solver.setOperators(LeftHS_mmm)


That’s why time decreased from 270 seconds to 30 seconds. Are there any onther preconditioners which may be usefull here?

I tried with

solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.JACOBI)
solver.setOperators(LeftHS_mmm)


This preconditioner gives 300 s on 1 process and 145 for 4 processes.

Please provide the full code that you are currently running, as you might be missing some communication steps for parallel computing.

Note that if you are bound to serial solvers, the assembly time cannot be reduced further than a given point. The form you are considering is rather complex, and requires a lot of operations for assembly. The easiest way to reduce the runtime is to make your code parallel compatible.

Dear dokken,

Here it is the current program with your suggestions:

from mpi4py import MPI
comm=MPI.COMM_WORLD

import ufl
import dolfinx
import dolfinx.fem.petsc as petsc

import numpy as np
import sympy as sp
from sympy.physics import msigma

import time
t_start = time.time()

def to_log(s):
if comm.rank == 0:
print(s)

# dimensionless geometrical sizes
d_F = 20./25.
L_y = 100./25.
w_F = 30./25.
d_S = 50./25.

t = 0.1 # temperature t = T/T_c

N_F_x = 10*int(w_F)
N_F_y = 10*int(L_y)
N_F_z = 10*int(d_F +2*d_S)

to_log(" The mesh with the dimensions N_F = " + str(N_F_x) + " " + str(N_F_y) + " "+  str(N_F_z) + " is beign created...")

mesh_In_F = dolfinx.mesh.create_box(comm,
points=((-w_F, L_y/2, -d_S - d_F/2), (0., -L_y/2, d_F/2 + d_S)),
n=(N_F_x, N_F_y, N_F_z),
cell_type=dolfinx.mesh.CellType.tetrahedron)

chi_el = ufl.VectorElement("Lagrange", mesh_In_F.ufl_cell(), 1, dim=8)
V = dolfinx.fem.FunctionSpace(mesh_In_F, chi_el)
to_log(" Mesh was created. ")

return np.logical_and.reduce((-d_S - d_F/2 <=  p[2], p[2] <= - d_F/2,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

return np.logical_and.reduce((d_F/2 <= p[2], p[2] <= d_F/2 + d_S,  - L_y/2 <= p[1], p[1] <= L_y/2, np.isclose(p[0], 0.) ))

mesh_In_F.topology.create_connectivity(mesh_In_F.topology.dim-1, mesh_In_F.topology.dim)

SF_R_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_right_lead)
SF_R_facets_mark = np.zeros_like(SF_R_facets) + 1
SF_R_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_R_facets).astype(np.int32), np.array(SF_R_facets_mark).astype(np.int32))

SF_L_facets = dolfinx.mesh.locate_entities_boundary(mesh_In_F, mesh_In_F.topology.dim - 1, is_left_lead)
SF_L_facets_mark = np.zeros_like(SF_L_facets) + 2
SF_L_subdomain_data = dolfinx.mesh.meshtags(mesh_In_F, mesh_In_F.topology.dim-1, np.array(
SF_L_facets).astype(np.int32), np.array(SF_L_facets_mark).astype(np.int32))

dofs_L = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_left_lead) for i in range(V.num_sub_spaces)]
dofs_R = [dolfinx.fem.locate_dofs_geometrical((V.sub(i), V.sub(i).collapse()[0]), is_right_lead) for i in range(V.num_sub_spaces)]

# here iter is the index of the current iteration in non-linearity for the Usadel equation
def getBCs(chi_L, chi_R):
bc_L = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_L[i])), dofs_L[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
bc_R = [dolfinx.fem.dirichletbc(dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType(chi_R[i])), dofs_R[i][0], V.sub(i)) for i in range(V.num_sub_spaces)]
BCs = [*bc_L, *bc_R]
return BCs

omega_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((2*0 + 1)*t))

Delta_s = 1.
varphi_L = -np.pi/3
varphi_R = np.pi/3
chi_L = [0, 0, np.exp(1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_L)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
chi_R = [0, 0, np.exp(1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0,
0, 0, np.exp(-1j*varphi_R)*(omega_n.value/Delta_s - np.sqrt(omega_n.value**2/Delta_s**2 + 1.)), 0]
BCs = getBCs(chi_L, chi_R)

eta = ufl.TestFunction(V)
chi_TF = ufl.TrialFunction(V)
chi_prev = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi_prev.sub(i).interpolate(lambda x: np.full((x.shape[1],), 1.))
chi = dolfinx.fem.Function(V, dtype=np.complex128)
for i in range(V.num_sub_spaces):
chi.sub(i).interpolate(lambda x: np.full((x.shape[1],), 0. ))

sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
gamma_m = ufl.as_matrix(sigma_basis[0]*chi_prev[0] + sigma_basis[1]*chi_prev[1] +
sigma_basis[2]*chi_prev[2] + sigma_basis[3]*chi_prev[3])
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi_prev[4] + sigma_basis[1]*chi_prev[5] +
sigma_basis[2]*chi_prev[6] + sigma_basis[3]*chi_prev[7])
gamma_mx = ufl.as_matrix(sigma_basis[0]*chi_TF[0] + sigma_basis[1]*chi_TF[1] +
sigma_basis[2]*chi_TF[2] + sigma_basis[3]*chi_TF[3])
gamma_mtx = ufl.as_matrix(sigma_basis[0]*chi_TF[4] + sigma_basis[1]*chi_TF[5] +
sigma_basis[2]*chi_TF[6] + sigma_basis[3]*chi_TF[7])
N_m = ufl.inv(ufl.Identity(2) + gamma_m*gamma_mt)
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

A = +2*( ufl.Dx(gamma_mx, 0)*N_mt*gamma_mt*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_mx, 1)*N_mt*gamma_mt*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_mx, 2)*N_mt*gamma_mt*ufl.Dx(gamma_m, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mt*ufl.Dx(gamma_mx, 2) +\
ufl.Dx(gamma_m, 0)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 0) +\
ufl.Dx(gamma_m, 1)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 1) +\
ufl.Dx(gamma_m, 2)*N_mt*gamma_mtx*ufl.Dx(gamma_m, 2) )/3
B = -2*( ufl.Dx(gamma_mtx, 0)*N_m*gamma_m*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mtx, 1)*N_m*gamma_m*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mtx, 2)*N_m*gamma_m*ufl.Dx(gamma_mt, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_m*ufl.Dx(gamma_mtx, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_m*ufl.Dx(gamma_mtx, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_m*ufl.Dx(gamma_mtx, 2) +\
ufl.Dx(gamma_mt, 0)*N_m*gamma_mx*ufl.Dx(gamma_mt, 0) +\
ufl.Dx(gamma_mt, 1)*N_m*gamma_mx*ufl.Dx(gamma_mt, 1) +\
ufl.Dx(gamma_mt, 2)*N_m*gamma_mx*ufl.Dx(gamma_mt, 2) )/3

def proj_2_2_to_8_1(A, B):
sigma_basis_p = [sp.eye(2), msigma(1), msigma(2), msigma(3)]
sigma_basis = [ufl.as_matrix(np.array(sigma_basis_p[i], dtype=np.complex128).tolist()) for i in range(len(sigma_basis_p)) ]
return [ufl.tr(A*sigma_basis[0])/2,
ufl.tr(A*sigma_basis[1])/2,
ufl.tr(A*sigma_basis[2])/2,
ufl.tr(A*sigma_basis[3])/2,
ufl.tr(B*sigma_basis[0])/2,
ufl.tr(B*sigma_basis[1])/2,
ufl.tr(B*sigma_basis[2])/2,
ufl.tr(B*sigma_basis[3])/2]

def get_norm_of_difference(a, b):
comm = a.function_space.mesh.comm
difference =  dolfinx.fem.form((a - b)**2 * ufl.dx)
E = np.abs(np.sqrt(comm.allreduce(dolfinx.fem.assemble_scalar(difference), MPI.SUM)))
return E

def F0_mm_f_ufl(x, omega_n):
return omega_n*ufl.Identity(8)

t_form = time.time()
to_log("-> Creating FEM forms...")
x = ufl.SpatialCoordinate(mesh_In_F)
ufl.inner(ufl.as_vector(proj_2_2_to_8_1(A, B)), eta) - \
ufl.inner(F0_mm_f_ufl(x, omega_n)*chi_TF, eta)
dx = ufl.Measure("dx", domain=mesh_In_F)
LeftHS_fem_form = dolfinx.fem.form(LeftHS*dx , jit_options={"cffi_extra_compile_args": ["-O3", "-march=native"], "cffi_libraries": ["m"]})
I_n = dolfinx.fem.Constant(mesh_In_F, petsc.PETSc.ScalarType((0, 0, 0, 0, 0, 0, 0, 0)))
RightHS_fem_form = dolfinx.fem.form(ufl.inner(I_n, eta)*dx)
to_log("-> FEM forms are created with time used = " + str( time.time() - t_form) + "\n")

omega_n.value = (2*0 + 1)*t
t_LHS = time.time()
to_log("-> First assembling of LHS matrix...")
LeftHS_mmm = dolfinx.fem.petsc.assemble_matrix(LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
to_log("-> LHS matrix is assebmled in time = " + str(time.time() - t_LHS) + "\n")

to_log("-> Creating of RHS vector...")
t_RHS = time.time()
RightHS_v = dolfinx.fem.petsc.create_vector(RightHS_fem_form)
to_log("-> RHS vector is created in time = " + str( time.time()  - t_RHS) + "\n")

#solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
#solver.setType(petsc.PETSc.KSP.Type.GMRES)
#solver.getPC().setType(petsc.PETSc.PC.Type.JACOBI)
#solver.setOperators(LeftHS_mmm)

solver = petsc.PETSc.KSP().create(mesh_In_F.comm)
solver.setType(petsc.PETSc.KSP.Type.GMRES)
solver.getPC().setType(petsc.PETSc.PC.Type.HYPRE)
solver.getPC().setHYPREType("pilut")
solver.setOperators(LeftHS_mmm)

iter = 0
norm_of_difference = get_norm_of_difference(chi, chi_prev)
iter_max = 25
atol = 1e-4
to_log(" Iterations in non-linearity: ")
while (iter <= iter_max and norm_of_difference >= atol):

t_RHS = time.time()
to_log("---> Preparing of RHS vector... ")
with RightHS_v.localForm() as loc_1:
loc_1.set(0)
RightHS_v = dolfinx.fem.petsc.assemble_vector(RightHS_v, RightHS_fem_form)
dolfinx.fem.petsc.apply_lifting(RightHS_v, [LeftHS_fem_form], [BCs])
mode=petsc.PETSc.ScatterMode.REVERSE)
dolfinx.fem.petsc.set_bc(RightHS_v, BCs)
mode=petsc.PETSc.ScatterMode.FORWARD)

to_log("---> RHS vector is done in time = " + str( time.time() - t_RHS) + "\n")

t_system = time.time()
to_log("---> Solving of a linear system...")
solver.solve(RightHS_v, chi.vector)
to_log(f"Convergence reason: { solver.getConvergedReason()}")
chi.x.scatter_forward()
to_log("---> Linear system is solved in time = " + str( time.time() - t_system) + "\n")

norm_of_difference = get_norm_of_difference(chi, chi_prev)
chi_prev.x.array[:] = chi.x.array
chi_prev.x.scatter_forward()
iter = iter + 1

t_LHS = time.time()
to_log("---> Preparing of LHS Matrix...")
LeftHS_mmm.zeroEntries()
dolfinx.fem.petsc.assemble_matrix(LeftHS_mmm, LeftHS_fem_form, bcs=BCs)
LeftHS_mmm.assemble()
to_log("---> LHS matrix for a linear system is done in time = "+
str(time.time() - t_LHS) + "\n")

to_log(" Iter = " + str(iter) + " is finishesd with norm_of_difference =" + str( norm_of_difference) + "\n")

gamma_m = ufl.as_matrix(sigma_basis[0]*chi[0] + sigma_basis[1]*chi[1] +
sigma_basis[2]*chi[2] + sigma_basis[3]*chi[3])
dgamma_mx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[0], 0) + sigma_basis[1]*ufl.Dx(chi[1], 0) +
sigma_basis[2]*ufl.Dx(chi[2], 0) + sigma_basis[3]*ufl.Dx(chi[3], 0))
gamma_mt = ufl.as_matrix(sigma_basis[0]*chi[4] + sigma_basis[1]*chi[5] +
sigma_basis[2]*chi[6] + sigma_basis[3]*chi[7])
dgamma_mtx = ufl.as_matrix(sigma_basis[0]*ufl.Dx(chi[4], 0) + sigma_basis[1]*ufl.Dx(chi[5], 0) +
sigma_basis[2]*ufl.Dx(chi[6], 0) + sigma_basis[3]*ufl.Dx(chi[7], 0))
N_mt = - ufl.inv(ufl.Identity(2) + gamma_mt*gamma_m)

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_R_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(1)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
to_log(" currenr from F to right S: " + str(I) )

ds = ufl.Measure("ds", domain=mesh_In_F, subdomain_data=SF_L_subdomain_data)
j_x = dolfinx.fem.form( 4*ufl.pi*ufl.tr(-N_mt*N_mt*N_mt*gamma_mt*( dgamma_mx - gamma_m*dgamma_mtx*gamma_m) )*ds(2)  )
comm = chi.function_space.mesh.comm
I = t*np.imag(comm.allreduce(dolfinx.fem.assemble_scalar(j_x), MPI.SUM))
to_log(" currenr from left S to F: " + str(I))

to_log(" time used = " + str(time.time() - t_start))


So, instead of HYPRE pilut I tried JACOBI. It works in 193 seconds with 2 processes. Also it gives reasonable values for the currents. The variant with HYPRE works in 43 seconds and gives zeros instead of currents. As I understand from the table Summary of Sparse Linear Solvers Available In PETSc — PETSc 3.20.4 documentation , the reason HYPRE does not converge stems from the complex numbers in my system. Then only way I see is to try multiprocessing with different preconditioners, but may be there is some trivial knowledge that I do not have about which preconditioner one should use?

I forgot that Hypre doesn’t do complex numbers
I am not very familiar with the system of equations that you are solving, so I dont know a preconditioner of the top of my head.

The current timings that you are mentioning, are they for the whole code or for the individual solve step?
Because if we look back at what you had earlier, you spent 28 seconds per assembly per time step and 450 seconds in total.

Dear dokken,

You help was very appreciated.

are they for the whole code or for the individual solve step?

No, now I am speaking about the whole code. 28 seconds per assembly were reduced to approximately 15 seconds.

But it is still a question for me, how to deal with degree of polynomials higher than 1. I think that there is no way except for multiprocessing and a server with lots of RAM. For degree=2, matrix assembly takes more than 3-4 minutes. I cannot say more precisely because I was mainly focused on the case of linear functions.

With the current formulation, I agree that you would need to use MPI to solve higher order problems.