# Wrong convergence rates for gradient of custom function

Hello,

I have trouble validating the gradient (with a Taylor test) of some operations based on a simple custom function I implemented. The convergence rates are sometimes correct, depending on which operation is performed before the custom function is called…

First, here is the custom function (file `map_dofs.py`), which rearranges the degrees of freedom of an input function according to a given map, producing an output function in the same FunctionSpace:

``````
from fenics import *

def map_dofs(func0, dof_map_1in0):
    func1 = Function(func0.function_space())
    func1.vector()[:] = func0.vector()[dof_map_1in0]
    return func1
``````
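In other words, `map_dofs` is plain fancy indexing on the vector of dof values: entry `i` of the output takes the value at index `dof_map_1in0[i]` of the input. A pure-NumPy sketch of the same operation, with made-up values (no FEniCS needed):

```python
import numpy as np

# Hypothetical dof values and map, for illustration only
values = np.array([10.0, 20.0, 30.0, 40.0])
dof_map_1in0 = np.array([2, 0, 3, 1])  # output entry i reads input entry dof_map_1in0[i]

mapped = values[dof_map_1in0]
print(mapped)  # [30. 10. 40. 20.]
```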

Here is the overloaded version, which contains the corresponding adjoint derivation (file `map_dofs_overloaded.py`):

``````
from fenics import *
from fenics_adjoint import *
from pyadjoint import Block
from pyadjoint.overloaded_function import overload_function

from map_dofs import map_dofs
import numpy as np

backend_map_dofs = map_dofs

class MapDofsBlock(Block):
    def __init__(self, func0, dof_map_1in0, **kwargs):
        super(MapDofsBlock, self).__init__()
        self.kwargs = kwargs
        self.add_dependency(func0)
        self.dof_map_1in0 = dof_map_1in0  # Store the map

    def __str__(self):
        return 'MapDofsBlock'

    def evaluate_adj_component(self, inputs, adj_inputs, block_variable, idx, prepared=None):
        output = adj_inputs[0].copy()
        output[:] = output[self.dof_map_1in0]
        return output

    def recompute_component(self, inputs, block_variable, idx, prepared):
        return backend_map_dofs(inputs[0], self.dof_map_1in0)

map_dofs = overload_function(map_dofs, MapDofsBlock)
``````

Here is a minimal working example:

``````
from dolfin import *
from dolfin_adjoint import *
import numpy as np

from map_dofs_overloaded import map_dofs

mesh = UnitSquareMesh(4, 4)
V = FunctionSpace(mesh, 'CG', 2)
u_0 = interpolate(Expression("x[0]", degree=3), V)

n = V.dim()  # Number of dofs

list_idx_dof = [x for x in range(n)]
np.random.shuffle(list_idx_dof)  # Shuffle the list of indices randomly for a general example

dof_map_1in0 = list_idx_dof

power_value = 1  # Change to 3 for the error
u_0_pen = project(u_0**power_value, V)
u_1 = map_dofs(u_0_pen, dof_map_1in0)  # Use the custom function

m0 = Control(u_0)
J = assemble(((u_1)**2)*dx)**(1/2)  # Arbitrary functional to serve as objective function
Jhat = ReducedFunctional(J, m0)

# Taylor remainder test to verify the computation of the gradient
h = Function(V)
h.vector()[:] = 0.01*(1.0 - 0.0)  # constant perturbation direction
conv_rate = taylor_test(Jhat, u_0, h)
``````

which runs as expected and gives:

``````
Running Taylor test
Computed residuals: [1.513063333844866e-09, 3.7829600191182635e-10, 9.457792696186402e-11, 2.3644987814699903e-11]
Computed convergence rates: [1.9998849429826506, 1.9999401040323632, 1.9999691216334434]
``````
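For context, each printed rate is the base-2 logarithm of the ratio of consecutive Taylor residuals (the step size is halved each time); a correct gradient makes the remainder second order, so the rates should approach 2. A sketch of this computation on the residuals above:

```python
import numpy as np

residuals = np.array([1.513063333844866e-09, 3.7829600191182635e-10,
                      9.457792696186402e-11, 2.3644987814699903e-11])
rates = np.log2(residuals[:-1] / residuals[1:])
print(rates)  # each rate ≈ 2, matching the output above
```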

However, if `power_value` is greater than 1, e.g. `power_value = 3`, the Taylor test fails with convergence rates of ~1.0 instead of ~2.0:

``````
Running Taylor test
Computed residuals: [6.020749139947801e-05, 3.009948012942537e-05, 1.5048673712567212e-05, 7.524070273277936e-06]
Computed convergence rates: [1.0002044381198376, 1.000102226048465, 1.0000511147760685]
``````

Do you have some ideas regarding these wrong convergence rates?

Dear thabuis,

If you perform the “power-operation” after your dof mapping, the convergence rates are correct.
I don’t know the reason, though…

Replace:

``````
power_value = 3
u_0_pen = project(u_0**power_value, V)
u_1 = map_dofs(u_0_pen, dof_map_1in0)  # Use the custom function
``````

by:

``````
u_mapped = map_dofs(u_0, dof_map_1in0)  # Use the custom function
power_value = 3
u_1 = project(u_mapped**power_value, V)
``````

The `taylor_test` then gives:

``````
Running Taylor test
Computed residuals: [1.6558765013731467e-08, 4.139580977508732e-09, 1.034881469161677e-09, 2.5871860752526123e-10]
Computed convergence rates: [2.0000384320081404, 2.0000192034580038, 2.000009812961877]
``````

I hope this helps

Thanks a lot @Brad! Indeed, this works.
However, in my code I perform multiple mappings of the degrees of freedom (DOFs) after the power-operation.

Performing the power-operation once beforehand avoids repeating it on each of the mapped fields, which improves the computational time.

So if someone has another idea, or an explanation of why the former version does not work, it would be much appreciated.

The reason you see this is that the adjoint in `MapDofsBlock` is not correctly implemented. The initial Taylor test only appears to give the correct gradient because you use a constant perturbation direction: applying `map_dofs` to grad J does not change its inner product with a constant `h`. If you change to a non-constant direction, e.g. `h.vector()[:] = 0.01*np.random.rand(V.dim())`, you should see that the Taylor test fails even with `power_value = 1`.
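This observation can be checked directly: a permutation only reorders the summands of the inner product, so a constant direction cannot see it. A small deterministic sketch with illustrative values (not from the actual problem):

```python
import numpy as np

g = np.array([1.0, 2.0, 3.0, 4.0])           # stand-in for grad J
perm = np.array([1, 2, 3, 0])                # stand-in for the dof shuffle

h_const = np.full(4, 0.01)                   # constant direction, as in the original test
h_rand = np.array([0.01, 0.02, 0.03, 0.04])  # non-constant direction

# Permuting g is invisible to a constant direction: the sum is merely reordered
print(g[perm] @ h_const, g @ h_const)  # equal (≈ 0.1)
# ...but not to a non-constant one
print(g[perm] @ h_rand, g @ h_rand)    # ≈ 0.24 vs ≈ 0.30
```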

The problem in your adjoint is that you use the original shuffled dof map instead of the correct adjoint one. That is, you need a list that reverses the rearrangement of the dofs.

This can be achieved with:

``````list_idx_dof = np.array([x for x in range(n)])
dof_map_1in0 = list_idx_dof.copy()
np.random.shuffle(dof_map_1in0) # Shuffle randomly the list of indexes for a general example
reverse_dof_map_1in0 = list_idx_dof[np.argsort(dof_map_1in0)]
``````
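Note that since `list_idx_dof` is the identity here, the reverse map is simply `np.argsort(dof_map_1in0)`. A quick pure-NumPy sanity check that it undoes the forward mapping:

```python
import numpy as np

np.random.seed(0)  # for reproducibility of this sketch
n = 8
dof_map_1in0 = np.random.permutation(n)          # forward shuffle
reverse_dof_map_1in0 = np.argsort(dof_map_1in0)  # its inverse permutation

v = np.random.rand(n)
w = v[dof_map_1in0]                              # forward mapping, as in map_dofs
print(np.array_equal(w[reverse_dof_map_1in0], v))  # True: reverse map restores v
```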

For those interested, here is the file `map_dofs_overloaded.py` with the updated adjoint derivation:

``````
from fenics import *
from fenics_adjoint import *
from pyadjoint import Block

from map_dofs import map_dofs
import numpy as np

backend_map_dofs = map_dofs

class MapDofsBlock(Block):
    def __init__(self, func0, dof_map_1in0, **kwargs):
        super(MapDofsBlock, self).__init__()
        self.kwargs = kwargs
        self.add_dependency(func0)
        list_idx_dof = np.array([x for x in range(len(dof_map_1in0))])
        dof_map_0in1 = list_idx_dof[np.argsort(dof_map_1in0)]
        self.dof_map_0in1 = dof_map_0in1  # Store the map for adjoint computation
        self.dof_map_1in0 = dof_map_1in0  # Store the map for direct computation

    def __str__(self):
        return 'MapDofsBlock'