Where to put Control(u)

Hi, I am solving an inverse problem using FEniCS, and I have a question about the proper way of setting this up. Do you know whether I should put control = Control(u) before defining J and calling forward(), or after?

# control parameter u, which is also the initial state for the dynamic model
u = Function(V)
u.vector()[:] = 0
# run forward simulation; "storage" stores the intermediate field values that will be used to compute the objective function
forward(u,storage)

# define the objective function and run the optimization
J = assemble(inner(storage["u_final"], storage["u_final"])*dx)
Jhat = ReducedFunctional(J, control)
problem = MinimizationProblem(Jhat)
solver = IPOPTSolver(problem, parameters=parameters)
u_opt = solver.solve()

You only need to define control = Control(parameter) before you create your reduced functional. It does not matter whether the control is defined before or after the forward solve or the definition of J.
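For instance, a minimal sketch (with placeholder names for the forward solve and the functional):

from dolfin import *
from dolfin_adjoint import *

mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, "CG", 1)

f = Function(V)           # the parameter we want to control
f.vector()[:] = 1.0
uh = Function(V)          # the state

# control = Control(f)    # fine here ...
u, v = TrialFunction(V), TestFunction(V)
solve(inner(grad(u), grad(v))*dx + inner(u, v)*dx == inner(f, v)*dx, uh)
J = assemble(inner(uh, uh)*dx)
control = Control(f)      # ... or here, since f itself is never reassigned above

Jhat = ReducedFunctional(J, control)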

Hi Dokken, thanks for the help! In my case, placing control = Control(parameter) at different points leads to very different optimization results.

I think the issue might be that I update the parameter inside the forward function. Do you think that would matter in that case?

It should not matter. Please make a minimal example that illustrates the differences you observe.

I am sorry, it is hard to come up with a minimal example in my case. I will attach the code here and try to make it clear:

The forward model uses Chorin’s projection method to solve the Navier-Stokes equations:

def forward(u0,V,total_steps,objective,storage,points,mask,mesh,loss_fn):

    T = 10   # final time
    num_steps = 10000    # number of time steps
    dt = T / num_steps # time step size
    mu = 0.001
    nu = mu
    rho = 1            # density

    inflow   = 'near(x[0], -1)'
    outflow  = 'near(x[0], 4)'
    walls    = 'near(x[1], -0.5) || near(x[1], 0.5)'
    cylinder = 'on_boundary && not(near(x[0], -1)) && not(near(x[0], 4)) && not(near(x[1], -0.5)) && not(near(x[1], 0.5))'
    
    Q = FunctionSpace(mesh, "Lagrange", 1)
    V1 = FunctionSpace(mesh, "Lagrange", 1)
    
    # Define inflow profile
    inflow_profile = ('1', '0')

    # Define boundary conditions
    bcu_inflow = DirichletBC(V, Expression(inflow_profile, degree=2), inflow)
    bcu_walls = DirichletBC(V, Constant((0, 0)), walls)
    bcu_cylinder = DirichletBC(V, Constant((0, 0)), cylinder)

    bcp_outflow = DirichletBC(Q, Constant(0), outflow)
    bcu = [bcu_inflow,bcu_walls, bcu_cylinder]
    bcp = [bcp_outflow]

    # Define trial and test functions
    u = TrialFunction(V)
    v = TestFunction(V)
    p = TrialFunction(Q)
    q = TestFunction(Q)

    # Define functions for solutions at previous and current time steps
    u1 = Function(V)
    p1 = Function(Q)


    # Define expressions used in variational forms
    n  = FacetNormal(mesh)
    f  = Constant((0, 0))
    k  = Constant(dt)
    nu = Constant(nu)
    rho = Constant(rho)

    # Define variational problem for step 1
    F1 = inner(u - u0, v)*dx + k*inner(grad(u0)*u0, v)*dx + \
         k*nu*inner(grad(u), grad(v))*dx - inner(f, v)*dx
    a1 = lhs(F1)
    L1 = rhs(F1)

    # Define variational problem for step 2
    a2 = inner(grad(p), grad(q))*dx
    L2 = -(1/k)*div(u1)*q*dx

    # Define variational problem for step 3
    a3 = inner(u, v)*dx
    L3 = inner(u1, v)*dx - k*inner(grad(p1), v)*dx

    # Assemble matrices
    A1 = assemble(a1)
    A2 = assemble(a2)
    A3 = assemble(a3)

    # Use amg preconditioner if available
    prec = "amg" if has_krylov_solver_preconditioner("amg") else "default"

    # Use nonzero guesses - essential for CG with non-symmetric BC
    parameters['krylov_solver']['nonzero_initial_guess'] = True

    # Time-stepping
    t = 0
    for n in range(total_steps):

        # Update current time
        t += dt

        # Step 1: Tentative velocity step
        b1 = assemble(L1)
        [bc.apply(A1, b1) for bc in bcu]
        solve(A1, u1.vector(), b1, 'gmres', 'default')

        # Step 2: Pressure correction step
        b2 = assemble(L2)
        [bc.apply(A2, b2) for bc in bcp]
        [bc.apply(p1.vector()) for bc in bcp]
        solve(A2, p1.vector(), b2, 'gmres', 'default')

        # Step 3: Velocity correction step
        b3 = assemble(L3)
        [bc.apply(A3, b3) for bc in bcu]
        solve(A3, u1.vector(), b3, 'gmres', 'default')
        # Plot solution

        # Update previous solution
        u0.assign(u1)
        if n%10==0:
            d_observe = Function(V, name="data")
            d_observe.vector()[:] = storage['observation'][n+1,:]
            if objective is not None:
                objective(u0, d_observe, n,points,loss_fn=loss_fn)
   

I also have this storage dictionary and objective function; they can mostly be ignored, but are attached here in case they are needed:


def objective_point_wise(storage, u=None, d_observe=None,idx=None,points=None,finalize=False,obversation_steps=None):
    if finalize:
        count = 0
        for idx in range(obversation_steps.shape[0]):
            step = obversation_steps[idx]
            if idx==0:
                loss = storage['functional'][step]
                count = count + 1
            else:
                if step>=len(storage['functional']):
                    continue
                else:
                    loss += storage['functional'][step]
                    count = count + 1
        storage["loss"] = loss/count
        return loss/count
    if idx==0:
        storage["functional"] = []
        storage["estimate_field"]=[]
        storage["gt_observation"]=[]
        storage["gt_field"]=[]
        storage["loss"] = 0
        loss = 0
    observation = [d_observe(p) for p in points]
    prediction = [u(p) for p in points]
    storage["functional"].append(sum([(observation[i][0]-prediction[i][0])**2+(observation[i][1]-prediction[i][1])**2 for i in range(len(points))])/len(points))

Then the optimization goes like this:

storage = {"functional": [],'solution':solution,'observation':observation, "time": 0, "estimate_field": [], "gt_observation": [],'gt_field':[],'loss':0}
obj = lambda u, d, idx,observed_coords,loss_fn, Q: objective_point_wise(storage,u, d, idx,observed_coords,loss_fn, Q)
observation_setup = {"start_time":500,"timestep":10,"endtime":580}
obversation_steps = np.arange(observation_setup["endtime"]-observation_setup["start_time"])[0:observation_setup["endtime"]-observation_setup["start_time"]:observation_setup["timestep"]]
obversation_steps = np.floor(obversation_steps/observation_setup["timestep"]).astype(int)


# initialize u at 0
V = VectorFunctionSpace(mesh, 'P', 2)
u = Function(V)
u.vector()[:] = 0

forward(u,V=V,total_steps=observation_setup["endtime"]-observation_setup["start_time"],
            objective=obj,storage=storage,points=observed_coords,mask=mask,mesh=mesh,loss_fn=loss_fn)
control = Control(u)

J = objective_point_wise(storage, finalize=True,obversation_steps=obversation_steps)
print("J:{}".format(J))
Jhat = ReducedFunctional(J, control, eval_cb_post=eval_cb)
problem = MinimizationProblem(Jhat)
parameters = {'maximum_iterations': 100}

solver = IPOPTSolver(problem, parameters=parameters)
u_opt = solver.solve()

Depending on where I put control = Control(u), I get different results. I am also wondering whether this is the correct way of doing this initial-state optimization task.

I cannot really help you if I cannot execute the code.
There are always ways of making a minimal example: start by solving your problem on a built-in mesh, where you create some dummy data for your observations.
Then solve for only one time step instead of multiple steps.

You could even check that your functional is working using a coupled Stokes solver instead of a Chorin scheme. All these things would vastly simplify your code.
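As a rough sketch (with placeholder boundary conditions and a dummy body-force control, not your exact setup), checking the functional with a coupled Taylor-Hood Stokes solve could look like:

from dolfin import *
from dolfin_adjoint import *

mesh = UnitSquareMesh(16, 16)
P2 = VectorElement("Lagrange", mesh.ufl_cell(), 2)
P1 = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
W = FunctionSpace(mesh, P2 * P1)

Vf = VectorFunctionSpace(mesh, "Lagrange", 2)
f = Function(Vf)                    # placeholder control (body force)
control = Control(f)

(u, p) = TrialFunctions(W)
(v, q) = TestFunctions(W)
a = inner(grad(u), grad(v))*dx - div(v)*p*dx - q*div(u)*dx
L = inner(f, v)*dx

# Inflow and walls; the outflow is left as a do-nothing boundary
bcs = [DirichletBC(W.sub(0), Constant((1, 0)), "near(x[0], 0)"),
       DirichletBC(W.sub(0), Constant((0, 0)), "near(x[1], 0) || near(x[1], 1)")]

w = Function(W)
solve(a == L, w, bcs)
uh, ph = w.split()

J = assemble(inner(uh, uh)*dx)      # the functional you want to sanity-check
Jhat = ReducedFunctional(J, control)
print(Jhat(f))                      # re-evaluate the functional through the tape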

I also note that you are using an unstable finite element pair, P1-P1.
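A stable Taylor-Hood (P2-P1) pair would, for instance, be:

V = VectorFunctionSpace(mesh, "Lagrange", 2)  # velocity: piecewise quadratic
Q = FunctionSpace(mesh, "Lagrange", 1)        # pressure: piecewise linear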


I came up with a minimal example! Here is the code. If I place control = Control(u_init) before or after forward, I get different optimization results.


from dolfin import *
from dolfin_adjoint import *
import numpy as np
def forward(u_init,V,total_steps,objective,storage,points,mesh,timesteps=1,store=False):

    dt = 0.001
    mu = 0.1
    nu = mu
    rho = 1            # density

    inflow   = 'near(x[0], 0)'
    outflow  = 'near(x[0], 1)'
    walls    = 'near(x[1], 0) || near(x[1],1)'

    Q = FunctionSpace(mesh, "Lagrange", 1)
    
    # Define inflow profile
    inflow_profile = ('1', '0')

    # Define boundary conditions
    bcu_inflow = DirichletBC(V, Expression(inflow_profile, degree=2), inflow)
    bcu_walls = DirichletBC(V, Constant((0, 0)), walls)

    bcp_outflow = DirichletBC(Q, Constant(0), outflow)
    bcu = [bcu_inflow,bcu_walls]
    bcp = [bcp_outflow]

    # Define trial and test functions
    u = TrialFunction(V)
    v = TestFunction(V)
    p = TrialFunction(Q)
    q = TestFunction(Q)

    # Define functions for solutions at previous and current time steps
    u0 = u_init
    u1 = Function(V)
    p1 = Function(Q)


    # Define expressions used in variational forms
    n  = FacetNormal(mesh)
    f  = Constant((0, 0))
    k  = Constant(dt)
    nu = Constant(nu)
    rho = Constant(rho)

    # Define variational problem for step 1
    F1 = inner(u - u0, v)*dx + k*inner(grad(u0)*u0, v)*dx + \
         k*nu*inner(grad(u), grad(v))*dx - inner(f, v)*dx
    a1 = lhs(F1)
    L1 = rhs(F1)

    # Define variational problem for step 2
    a2 = inner(grad(p), grad(q))*dx
    L2 = -(1/k)*div(u1)*q*dx

    # Define variational problem for step 3
    a3 = inner(u, v)*dx
    L3 = inner(u1, v)*dx - k*inner(grad(p1), v)*dx

    # Assemble matrices
    A1 = assemble(a1)
    A2 = assemble(a2)
    A3 = assemble(a3)

    # Use nonzero guesses - essential for CG with non-symmetric BC
    parameters['krylov_solver']['nonzero_initial_guess'] = True

    # Time-stepping
    t = 0
    u_solution_dof = np.zeros([total_steps+1,int(V.dim())])
    if store:
        u_solution = np.zeros([total_steps+1,2, Q.dim()])
        u_solution_dof = np.zeros([total_steps+1,int(V.dim())])
        flux_x, flux_y = u1.split(deepcopy=True)
        u_solution[0,0, :] = flux_x.compute_vertex_values()
        u_solution[0,1, :] = flux_y.compute_vertex_values()
        u_solution_dof[0,:] = u1.vector().get_local()
        p_solution = np.zeros([total_steps+1, Q.dim()])
        p_solution[0, :] = p1.compute_vertex_values()
        
    for n in range(total_steps):
        # Update current time
        t += dt

        # Step 1: Tentative velocity step
        b1 = assemble(L1)
        [bc.apply(A1, b1) for bc in bcu]
        solve(A1, u1.vector(), b1, 'gmres', 'default')

        # Step 2: Pressure correction step
        b2 = assemble(L2)
        [bc.apply(A2, b2) for bc in bcp]
        [bc.apply(p1.vector()) for bc in bcp]
        solve(A2, p1.vector(), b2, 'gmres', 'default')

        # Step 3: Velocity correction step
        b3 = assemble(L3)
        [bc.apply(A3, b3) for bc in bcu]
        solve(A3, u1.vector(), b3, 'gmres', 'default')
        # Plot solution
        
        # Update previous solution
        u0.assign(u1)
        if not store:
            if n%timesteps==0 and (n!=0 or total_steps==1):
                print("save:{} timestep".format(n))
                d_observe = Function(V, name="data")
                d_observe.vector()[:] = storage['observation'][n+1,:]
                if objective is not None:
                    objective(u0, d_observe, n,points)

        if store:
            flux_x, flux_y = u1.split(deepcopy=True)
            u_solution[n+1,0, :] = flux_x.compute_vertex_values()
            u_solution[n+1,1, :] = flux_y.compute_vertex_values()     
            u_solution_dof[n+1, :] = u1.vector().get_local()
            p_solution[n+1,:] = p1.compute_vertex_values()
    
    return u_solution_dof



def objective_point_wise(storage, u=None, d_observe=None,idx=None,points=None,finalize=False):
    if finalize:
        for loss in storage['functional']:
            storage["loss"] = storage["loss"]+loss
        loss = storage["loss"]
        return loss
    if idx==0:
        storage["functional"] = []
        storage["gt_field"]=[]
        storage["loss"] = 0
    observation = [d_observe(p) for p in points]
    prediction = [u(p) for p in points]
    
    storage["functional"].append(sum([(observation[i][0]-prediction[i][0])**2+(observation[i][1]-prediction[i][1])**2 for i in range(len(points))])/len(points))
    
    storage["time"]=idx

def eval_cb(j, u):
    print(j)

    

## get observations
mesh = UnitSquareMesh(10, 10)
V = VectorFunctionSpace(mesh, 'P', 2)
u0 = Function(V)
u0.vector()[:] = 1
u_solution_dof = forward(u0,V,total_steps=2,objective=None,storage=None,points=None,mesh=mesh,store=True)
observed_coords = (V.tabulate_dof_coordinates())[::2,:]

storage = {"functional": [],'solution':u_solution_dof,'observation':u_solution_dof, "time": 0, "estimate_field": [], "gt_observation": [],'gt_field':[],'loss':0}

obj = lambda u, d, idx,observed_coords: objective_point_wise(storage,u, d, idx,observed_coords)
observation_setup = {"start_time":0,"timestep":1,"endtime":1}
obversation_steps = np.arange(observation_setup["endtime"]-observation_setup["start_time"])[0:observation_setup["endtime"]-observation_setup["start_time"]:observation_setup["timestep"]]
obversation_steps = np.floor(obversation_steps/observation_setup["timestep"]).astype(int)
print(obversation_steps)

## start recover init state
V = VectorFunctionSpace(mesh, 'P', 2)
u_init = Function(V)
control = Control(u_init)
forward(u_init,V=V,total_steps=observation_setup["endtime"]-observation_setup["start_time"],
            objective=obj,storage=storage,points=observed_coords,mesh=mesh)
# control = Control(u_init)

J = objective_point_wise(storage, finalize=True)
print("J:{}".format(J))
Jhat = ReducedFunctional(J, control, eval_cb_post=eval_cb)
problem = MinimizationProblem(Jhat)
parameters = {'maximum_iterations': 100}

solver = IPOPTSolver(problem, parameters=parameters)
u_init_estimates = solver.solve()

Besides, I use the Chorin scheme as I don’t want the initial state to depend on p. I am using P2-P1 now.

The whole point is that it is rather tricky for anyone to help you when your code contains so many different aspects.
As I said, this is not the intended behaviour, but all the different aspects of your code make it tricky to find the culprit.
For instance, the following minimal example does not exhibit the same issue:

from dolfin import *
from dolfin_adjoint import *

def forward(uh, f):
    V = uh.function_space()
    u = TrialFunction(V)
    v = TestFunction(V)
    a = inner(grad(u), grad(v)) * dx + inner(u, v) * dx
    L = inner(f, v) * dx
    solve(a==L, uh)
    return assemble(inner(uh,uh)*dx + inner(grad(uh), grad(uh))*dx)

mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)
f = Function(V)
f.vector()[:] = 1

control = Control(f)
J = forward(u, f)
#control = Control(f)


Jhat = ReducedFunctional(J, control)
problem = MinimizationProblem(Jhat)
parameters = {'maximum_iterations': 100}
solver = IPOPTSolver(problem, parameters=parameters)
u_init_estimates = solver.solve()
print(u_init_estimates.vector().norm("l2"))

So this problem illustrates why we use ReducedFunctional(J, Control(f)) and not just ReducedFunctional(J, f). Basically, some operations are done in place, such as solve and Function.assign. When you use these functions you change the value of your function (say f), and so you might be interested in controlling either the first or the second “version” of this function.
We could have just always chosen the leaf node, i.e. the first “version” of f (which is how it behaved before Control was introduced), but that would make things like this invalid:

f = Function(V) # This is the first version of f
f.assign(Constant(1)) # This implicitly creates a new version of f: the result of assigning 1 to it.
forward(f)
...

If you have a time-dependent problem, where u is overwritten for every time-step, then the difference between

control = Control(u)
forward(u)

and

forward(u)
control = Control(u)

is that in the first case you control (or compute gradients with respect to) the initial condition, and in the second case you control the solution at the last time step. Effectively this means you just compute the partial derivative of the functional J with respect to u at the last time step (i.e. no PDEs).
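Here is a rough sketch of the two placements on a toy time-dependent problem (not your Navier-Stokes code):

from dolfin import *
from dolfin_adjoint import *

mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, "CG", 1)

u = Function(V)                  # the state, overwritten every time step below
u.vector()[:] = 1.0
control = Control(u)             # here: controls the initial condition
# (moving this line to just before ReducedFunctional would instead control
#  the last "version" of u, i.e. the solution at the final time step)

du, v = TrialFunction(V), TestFunction(V)
u_next = Function(V)
dt = 0.01
for n in range(5):
    # one implicit Euler step of a heat equation
    solve(inner(du, v)*dx + dt*inner(grad(du), grad(v))*dx == inner(u, v)*dx, u_next)
    u.assign(u_next)             # in-place update: creates a new "version" of u

J = assemble(inner(u, u)*dx)
Jhat = ReducedFunctional(J, control)
dJ = Jhat.derivative()           # gradient with respect to the initial condition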


I see, thanks sebastkm, so basically I should have

control = Control(u)
forward(u)

since I want to optimize the initial state.

Yes. That would be correct.


This is awesome, thanks sebastkm. Just a follow-up question: this won’t matter if the parameter doesn’t get updated inside forward(), right?

Correct. Which is why most examples will just write ReducedFunctional(J, Control(f)).


Hi sebastkm, I found that using this code to solve the inverse problem is super slow. Do you think this is expected, or should the gradient calculation be roughly as fast as the forward model, and hence not this slow?

In this case you could get a bit of a speedup by defining a KrylovSolver (or three) outside of your time loop. But that will only be a general speedup, and won’t fix your problem.
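Something along these lines (a sketch reusing the names from your forward(); it assumes your dolfin_adjoint version annotates KrylovSolver.solve, otherwise keep using solve(A, x, b) and this only illustrates the idea):

# Build the Krylov solvers once, outside the time loop, and reuse them.
solver1 = KrylovSolver("gmres", "default")
solver2 = KrylovSolver("gmres", "default")
solver3 = KrylovSolver("gmres", "default")

[bc.apply(A1) for bc in bcu]   # the matrices do not change between steps
[bc.apply(A2) for bc in bcp]
[bc.apply(A3) for bc in bcu]
solver1.set_operator(A1)
solver2.set_operator(A2)
solver3.set_operator(A3)

for n in range(total_steps):
    b1 = assemble(L1)
    [bc.apply(b1) for bc in bcu]
    solver1.solve(u1.vector(), b1)
    # ... steps 2 and 3 analogously ...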

The reason your gradient computations are slow is that you are using the pointwise evaluation feature (i.e. u(p)). The adjoint implementation for this is really slow at the moment.
You could look here for some suggestions, but there are currently no very efficient approaches that are not quite restrictive in the choice of measurement points.
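One such (restrictive) workaround, sketched here under the assumption that your observations can be written into a Function in the same space V as the state, is to use an integral misfit instead of point evaluations:

# Sketch: write the observed dof values into a Function and use an integral
# misfit, which avoids the slow pointwise-evaluation adjoint. Only valid if the
# observations can be represented as a Function in the same space as u0.
d_observe = Function(V, name="data")
d_observe.vector()[:] = storage['observation'][n+1, :]

# annotated scalar; sum these contributions over the observation steps to get J
J_step = assemble(inner(u0 - d_observe, u0 - d_observe)*dx)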