Hey there!
I have to solve a complex problem from dynamic thermoelasticity. A corresponding toy problem is the wave equation with homogeneous Dirichlet boundary conditions and a time-dependent force term:
import dolfin as df

T = 0.5          # final time
dt = 0.01        # time step
nx = ny = 1000   # mesh resolution

# ramped source, supported on the corner patch [0, 0.1] x [0, 0.1]
f = df.Expression('x[0] <= 0.1 && x[1] <= 0.1 ? fmin(10*t, 1.0) : 0.0',
                  t=0, degree=0)

mesh = df.UnitSquareMesh(nx, ny)
V = df.FunctionSpace(mesh, 'CG', 2)
bc = df.DirichletBC(V, df.Constant(0), df.CompiledSubDomain('on_boundary'))

u_m = df.Function(V)    # solution at the previous time step
u_mm = df.Function(V)   # solution two time steps back
u = df.TrialFunction(V)
v = df.TestFunction(V)

# implicitly averaged scheme for u_tt = Laplace(u) + f
a = u*v*df.dx + dt**2/4*df.dot(df.grad(u), df.grad(v))*df.dx
L = ((2*u_m - u_mm)*v*df.dx
     - dt**2/4*df.dot(df.grad(2*u_m + u_mm), df.grad(v))*df.dx
     + dt**2*f*v*df.dx)

u = df.Function(V)
t = 0
N = round(T/dt)  # note: np.int64(T/dt) truncates to 49 due to round-off
A, b = df.assemble_system(a, L, bc)
for l in range(N):
    t += dt
    f.t = t  # update the source time before assembling the right-hand side
    b = df.assemble(L)
    bc.apply(b)
    df.solve(A, u.vector(), b, 'gmres', 'icc')
    u_mm.assign(u_m)
    u_m.assign(u)
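For what it's worth, here is a minimal 1-D NumPy analogue of the same time discretization that I used to convince myself the scheme behaves sensibly; grid size, names, and the forcing are simplified placeholders, not the real problem:

```python
import numpy as np

# 1-D finite-difference sketch of the same implicitly averaged scheme
# for u_tt = u_xx + f with u = 0 at both ends (illustrative only).
n = 99                       # interior grid points
h = 1.0 / (n + 1)
dt = 0.01
T = 0.5
x = np.linspace(h, 1 - h, n)

# K approximates the negative 1-D Laplacian on the interior points
K = (np.diag(np.full(n, 2.0))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

A = np.eye(n) + dt**2 / 4 * K    # left-hand-side operator, fixed in time

def force(t):
    # mimics the ramped, localized source of the 2-D problem
    return np.where(x <= 0.1, min(10 * t, 1.0), 0.0)

u_m = np.zeros(n)    # previous time step (zero initial data)
u_mm = np.zeros(n)   # two time steps back
t = 0.0
for _ in range(round(T / dt)):
    t += dt
    b = (2 * u_m - u_mm
         - dt**2 / 4 * (K @ (2 * u_m + u_mm))
         + dt**2 * force(t))
    u_new = np.linalg.solve(A, b)
    u_mm, u_m = u_m, u_new
```

The scheme is unconditionally stable, so the solution stays bounded for this forcing.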
In my original problem, assembling the right-hand side takes roughly as long as solving the linear system. Currently the code effectively runs on a single core, which is far too time-consuming, so I would like to make use of the 16 cores of my machine.
Now, replacing the solve line by
df.solve(A, u.vector(), b, 'mumps')
allows the script to be run via mpirun. This speeds up assembly by a factor of 10, but makes the solve a factor of 2 slower.
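For reference, the invocation I use (assuming one rank per core; the script name is a placeholder):

```shell
mpirun -n 16 python3 wave.py
```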
Hence my question: is it possible to benefit from multicore processing when solving the linear system, and if not, can parallelization be restricted to the assembly step?