Calculating Gradients/Sensitivities for Optimization

In adjoint analysis we set out to calculate design sensitivities for optimizing the objective function \(C\) and for fulfilling the constraints \(\boldsymbol{g},\boldsymbol{h}\). As the procedure for \(C\) and the constraints is exactly the same, the calculation will only be demonstrated for the objective function.

Notation

Before we start, a short note on notation: \(\frac{d y}{d x}\) denotes the total derivative and \(\frac{\partial y}{\partial x}\) the partial derivative, which only accounts for the direct (explicit) dependence of \(y\) on \(x\) and treats all other variables as constants.

As an example, take the function \(g(x)=x^4\) and rewrite it as \(q(u)=u^2\) with \(u=x^2\). Since \(q\) does not depend explicitly on \(x\), the partial derivative is

\[\frac{\partial q}{\partial x}=0 \]

and the total derivative is

\[\begin{split}\begin{align} \frac{d q}{d x} &=\frac{\partial q}{\partial x} + \frac{\partial q}{\partial u} \cdot \frac{\partial u}{\partial x} \\ \frac{d q}{d x} &= 0 + 2u \cdot 2x = 4x^3 \end{align}\end{split}\]
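The distinction can be checked numerically. The following is a minimal sketch using central finite differences; the function names `q`, `u_of`, `partial_dq_dx` and `total_dq_dx` as well as the step size `h` are choices made for this illustration only.

```python
def q(x, u):
    """q(u) = u^2, written with an (unused) explicit x argument."""
    return u ** 2


def u_of(x):
    """Intermediate variable u = x^2."""
    return x ** 2


def partial_dq_dx(x, h=1e-6):
    # Partial derivative: perturb x only, keep u frozen at its current value.
    u = u_of(x)
    return (q(x + h, u) - q(x - h, u)) / (2 * h)


def total_dq_dx(x, h=1e-6):
    # Total derivative: u follows the perturbation of x.
    return (q(x + h, u_of(x + h)) - q(x - h, u_of(x - h))) / (2 * h)


x0 = 1.5
partial = partial_dq_dx(x0)
total = total_dq_dx(x0)
analytic = 4 * x0 ** 3

print("partial:", partial)
print("total:  ", total, "analytic 4x^3:", analytic)
```

The partial derivative vanishes because `q` never uses its `x` argument, while the total derivative agrees with \(4x^3\) up to the finite-difference error.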

Direct Method

We differentiate \(C\) with respect to a design variable \(x\), which yields

\[\frac{d C}{d x} = \frac{\partial C}{\partial x} + \nabla_{\boldsymbol{u}}C^T \frac{\partial \boldsymbol{u}}{\partial x}\]

While \(\frac{\partial C}{\partial x}\) and \(\nabla_{\boldsymbol{u}}C\) are easy to evaluate, \(\frac{\partial \boldsymbol{u}}{\partial x}\) is more difficult to obtain. Recalling the physical problem

\[\boldsymbol{K}\boldsymbol{u} = \boldsymbol{f},\]

its solution can be stated as

\[\boldsymbol{u} = \boldsymbol{K}^{-1}\boldsymbol{f},\]

where \(\boldsymbol{K}^{-1}\) is the inverse matrix of \(\boldsymbol{K}\). We assume for the moment that the right-hand side \(\boldsymbol{f}\) is independent of \(x\), therefore we can write

\[\frac{\partial \boldsymbol{u}}{\partial x} = \frac{\partial \boldsymbol{K}^{-1}}{\partial x}\boldsymbol{f},\]

and after looking up the derivative of a matrix with respect to a scalar (https://en.wikipedia.org/wiki/Matrix_calculus#Matrix-by-scalar_identities) this becomes

\[\frac{\partial \boldsymbol{u}}{\partial x} = -\boldsymbol{K}^{-1} \frac{\partial \boldsymbol{K}}{\partial x}\boldsymbol{K}^{-1} \boldsymbol{f}.\]

For all practical purposes this approach is too expensive: \(\boldsymbol{K}\) is very large, which makes the inversion and the matrix products computationally prohibitive, and the calculation has to be repeated for every design variable \(x\).
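To make the cost structure concrete, here is a minimal sketch of the direct method on a toy problem. Everything in it is an assumption made for illustration: two springs in series with stiffnesses `x[0]` and `x[1]`, a unit load on the free node, the compliance \(C = \boldsymbol{f}^T\boldsymbol{u}\) as objective, and a hand-written 2x2 solver `solve2` standing in for a real linear solver. Instead of forming \(\boldsymbol{K}^{-1}\), the sketch solves \(\boldsymbol{K}\frac{\partial \boldsymbol{u}}{\partial x} = -\frac{\partial \boldsymbol{K}}{\partial x}\boldsymbol{u}\), which is the same expression with \(\boldsymbol{u} = \boldsymbol{K}^{-1}\boldsymbol{f}\) inserted.

```python
def stiffness(x):
    """Stiffness matrix of two springs in series (toy example)."""
    return [[x[0] + x[1], -x[1]],
            [-x[1], x[1]]]


# dK/dx_i for the two design variables (constant for this toy problem).
DK_DX = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, -1.0], [-1.0, 1.0]],
]


def solve2(A, b):
    """Solve a 2x2 linear system A y = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def compliance(x, f):
    """Objective C = f^T u with u from K u = f."""
    return dot(f, solve2(stiffness(x), f))


def direct_sensitivities(x, f):
    K = stiffness(x)
    u = solve2(K, f)
    grads = []
    for dK in DK_DX:
        # One extra linear solve per design variable: K du/dx = -(dK/dx) u
        rhs = [-r for r in matvec(dK, u)]
        du_dx = solve2(K, rhs)
        # dC/dx = partial C / partial x (zero here) + grad_u C^T du/dx,
        # and grad_u C = f for the compliance.
        grads.append(dot(f, du_dx))
    return grads


x = [2.0, 4.0]
f = [0.0, 1.0]
grad_direct = direct_sensitivities(x, f)

# Central finite differences as an independent check.
h = 1e-6
grad_fd = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    grad_fd.append((compliance(xp, f) - compliance(xm, f)) / (2 * h))

print("direct:", grad_direct)
print("finite difference:", grad_fd)
```

Note that the loop performs one linear solve per design variable, which is what makes the direct method expensive when there are many design variables.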

Adjoint Analysis

In adjoint analysis, one rewrites the objective function as

\[\tilde{C} = C + \boldsymbol{\lambda}^T \left( \boldsymbol{K}\boldsymbol{u} - \boldsymbol{f} \right)\]

where \(\boldsymbol{\lambda}\) is a vector which we call the adjoint vector or, more generally, the adjoint variables. Since \(\boldsymbol{K}\boldsymbol{u} - \boldsymbol{f} = \boldsymbol{0}\), we have only added \(\boldsymbol{\lambda}^T \boldsymbol{0}\) to the objective function, therefore the values of \(\boldsymbol{\lambda}\) can be chosen arbitrarily. After differentiation

\[\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \nabla_{\boldsymbol{u}}C^T \frac{\partial \boldsymbol{u}}{\partial x} + \boldsymbol{\lambda}^T \left( \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u} + \boldsymbol{K} \frac{\partial \boldsymbol{u}}{\partial x} - \frac{\partial \boldsymbol{f}}{\partial x} \right)\]

we again assume for the sake of clarity that the right-hand side \(\boldsymbol{f}\) of the physical problem is independent of \(x\), therefore \(\frac{\partial \boldsymbol{f}}{\partial x}=\boldsymbol{0}\), and we regroup the terms:

\[\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \left( \nabla_{\boldsymbol{u}}C^T + \boldsymbol{\lambda}^T \boldsymbol{K}\right)\frac{\partial \boldsymbol{u}}{\partial x} + \boldsymbol{\lambda}^T \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u}\]

We now notice that if

\[\nabla_{\boldsymbol{u}}C^T + \boldsymbol{\lambda}^T \boldsymbol{K} = \boldsymbol{0}\]

the troublesome derivative \(\frac{\partial \boldsymbol{u}}{\partial x}\) drops out of the expression for \(\frac{d \tilde{C}}{d x}\). As \(\boldsymbol{\lambda}\) is arbitrary, we are free to choose it such that this condition holds. Transposing and rearranging the terms gives the adjoint problem

\[\boldsymbol{K}^T \boldsymbol{\lambda} = -\nabla_{\boldsymbol{u}}C\]

which yields the final expression for the sensitivities as

\[\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \boldsymbol{\lambda}^T \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u}.\]

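Note that \(\tilde{C} = C\) whenever \(\boldsymbol{K}\boldsymbol{u} = \boldsymbol{f}\) holds, so this is also the sensitivity of \(C\) itself. In adjoint analysis the calculation of gradients therefore amounts to solving just one additional linear problem, the adjoint problem, whose solution \(\boldsymbol{\lambda}\) can be reused for every design variable \(x\). This is much cheaper than the direct method.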
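The adjoint procedure can be sketched on a small toy problem. As before, the setup is an assumption made purely for illustration: two springs in series with stiffnesses `x[0]` and `x[1]`, a unit load on the free node, the compliance \(C = \boldsymbol{f}^T\boldsymbol{u}\) as objective (so \(\frac{\partial C}{\partial x} = 0\) and \(\nabla_{\boldsymbol{u}}C = \boldsymbol{f}\)), and a hand-written 2x2 solver `solve2` in place of a real linear solver.

```python
def stiffness(x):
    """Stiffness matrix of two springs in series (toy example)."""
    return [[x[0] + x[1], -x[1]],
            [-x[1], x[1]]]


# dK/dx_i for the two design variables (constant for this toy problem).
DK_DX = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, -1.0], [-1.0, 1.0]],
]


def solve2(A, b):
    """Solve a 2x2 linear system A y = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def compliance(x, f):
    """Objective C = f^T u with u from K u = f."""
    return dot(f, solve2(stiffness(x), f))


def adjoint_sensitivities(x, f):
    K = stiffness(x)
    u = solve2(K, f)                      # state problem  K u = f
    grad_u_C = f                          # gradient of C = f^T u w.r.t. u
    # Adjoint problem  K^T lambda = -grad_u C, solved once.
    lam = solve2(transpose(K), [-g for g in grad_u_C])
    # dC/dx_i = partial C / partial x_i (zero here) + lambda^T (dK/dx_i) u
    return [dot(lam, matvec(dK, u)) for dK in DK_DX]


x = [2.0, 4.0]
f = [0.0, 1.0]
grad_adjoint = adjoint_sensitivities(x, f)

# Central finite differences as an independent check.
h = 1e-6
grad_fd = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    grad_fd.append((compliance(xp, f) - compliance(xm, f)) / (2 * h))

print("adjoint:", grad_adjoint)
print("finite difference:", grad_fd)
```

Only two linear solves are needed here, one for \(\boldsymbol{u}\) and one for \(\boldsymbol{\lambda}\), regardless of how many design variables there are; each additional sensitivity costs just a matrix-vector product and a dot product.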