# Calculating Gradients/Sensitivities for Optimization

In adjoint analysis we set out to calculate the design sensitivities needed for optimizing the objective function $C$ and for fulfilling the constraints $\boldsymbol{g},\boldsymbol{h}$. As the procedure for $C$ and for the constraints is exactly the same, the calculation is only demonstrated for the objective function.

## Notation

Before starting, a short note on notation: $\frac{d y}{d x}$ denotes the total derivative and $\frac{\partial y}{\partial x}$ the partial derivative, which only accounts for the **direct (explicit)** dependence of $y$ on $x$, treating all other variables as fixed. Take the function $g(x)=x^4$ and rewrite it as $q(u)=u^2$ with $u=x^2$. As $q$ does not depend explicitly on $x$, the partial derivative is

```{math}
\frac{\partial q}{\partial x}=0
```

and the total derivative is

```{math}
\begin{align}
\frac{d q}{d x} &=\frac{\partial q}{\partial x} + \frac{\partial q}{\partial u} \cdot \frac{\partial u}{\partial x} \\
\frac{d q}{d x} &= 0 + 2u \cdot 2x = 4x^3
\end{align}
```

(direct-method)=
## Direct Method

We differentiate $C$ with respect to a design variable $x$, which yields

```{math}
\frac{d C}{d x} = \frac{\partial C}{\partial x} + \nabla_{\boldsymbol{u}}C^T \frac{\partial \boldsymbol{u}}{\partial x}
```

While $\frac{\partial C}{\partial x}$ and $\nabla_{\boldsymbol{u}}C$ are easy to evaluate, $\frac{\partial \boldsymbol{u}}{\partial x}$ is not, because $\boldsymbol{u}$ depends on $x$ only implicitly. Recalling the physical problem

```{math}
\boldsymbol{K}\boldsymbol{u} = \boldsymbol{f},
```

its solution can be stated as

```{math}
\boldsymbol{u} = \boldsymbol{K}^{-1}\boldsymbol{f},
```

where $\boldsymbol{K}^{-1}$ is the inverse of the matrix $\boldsymbol{K}$.
We assume for the moment that the right-hand side is independent of $x$, therefore we can write

```{math}
\frac{\partial \boldsymbol{u}}{\partial x} = \frac{\partial \boldsymbol{K}^{-1}}{\partial x}\boldsymbol{f},
```

and after looking up the derivative of an inverse matrix with respect to a scalar (https://en.wikipedia.org/wiki/Matrix_calculus#Matrix-by-scalar_identities) this becomes

```{math}
\frac{\partial \boldsymbol{u}}{\partial x} = -\boldsymbol{K}^{-1} \frac{\partial \boldsymbol{K}}{\partial x}\boldsymbol{K}^{-1} \boldsymbol{f}.
```

For all practical purposes this expression is unusable: $\boldsymbol{K}$ is a very large matrix, which makes its inversion and the matrix products computationally too expensive, in particular as the products have to be evaluated for every design variable.

(adjoint-analysis)=
## Adjoint Analysis

In adjoint analysis, one rewrites the objective function as

```{math}
\tilde{C} = C + \boldsymbol{\lambda}^T \left( \boldsymbol{K}\boldsymbol{u} - \boldsymbol{f} \right)
```

where $\boldsymbol{\lambda}$ is an arbitrary vector, which we call the adjoint vector or, more generally, the adjoint variables. Its values are arbitrary because the term in parentheses vanishes for every $\boldsymbol{u}$ that solves the physical problem: we have only added $\boldsymbol{\lambda}^T \boldsymbol{0}$, so $\tilde{C} = C$. After differentiation

```{math}
\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \nabla_{\boldsymbol{u}}C^T \frac{\partial \boldsymbol{u}}{\partial x} + \boldsymbol{\lambda}^T \left( \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u} + \boldsymbol{K} \frac{\partial \boldsymbol{u}}{\partial x} - \frac{\partial \boldsymbol{f}}{\partial x} \right)
```

we again assume, for the sake of clarity, that the right-hand side of the physical
problem $\boldsymbol{f}$ is independent of $x$, therefore $\frac{\partial \boldsymbol{f}}{\partial x}=\boldsymbol{0}$, and we regroup the terms:

```{math}
\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \left( \nabla_{\boldsymbol{u}}C^T + \boldsymbol{\lambda}^T \boldsymbol{K}\right)\frac{\partial \boldsymbol{u}}{\partial x} + \boldsymbol{\lambda}^T \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u}
```

We now notice that if

```{math}
\nabla_{\boldsymbol{u}}C^T + \boldsymbol{\lambda}^T \boldsymbol{K} = \boldsymbol{0}
```

the troublesome derivative $\frac{\partial \boldsymbol{u}}{\partial x}$ drops out of the expression for $\frac{d \tilde{C}}{d x}$. As $\boldsymbol{\lambda}$ is arbitrary, we are free to choose it such that this condition holds. Transposing the condition and rearranging the terms gives the adjoint problem

```{math}
\boldsymbol{K}^T \boldsymbol{\lambda} = -\nabla_{\boldsymbol{u}}C
```

which yields the final expression for the sensitivities as

```{math}
\frac{d \tilde{C}}{d x} = \frac{\partial C}{\partial x} + \boldsymbol{\lambda}^T \frac{\partial \boldsymbol{K}}{\partial x} \boldsymbol{u}.
```

Because the added term $\boldsymbol{\lambda}^T \left( \boldsymbol{K}\boldsymbol{u} - \boldsymbol{f} \right)$ is zero for every solution of the physical problem, this is also the sensitivity of $C$ itself. In adjoint analysis the calculation of gradients therefore amounts to solving just one additional linear problem, whose cost does not depend on the number of design variables, which makes it much cheaper than the direct method.
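As a sanity check of the derivation, the following minimal sketch compares the direct method, the adjoint method and central finite differences on a tiny system. Everything specific in it is an assumption made for illustration and not part of the text above: the three-spring stiffness matrix with the spring stiffnesses as design variables, the objective $C = \tfrac{1}{2}\boldsymbol{u}^T\boldsymbol{u} + 0.1\sum_i x_i$, and all function names. The inverse is never formed explicitly; linear solves are used instead.

```python
import numpy as np


def stiffness(x):
    """Hypothetical 3-DOF spring chain fixed at one end; x[i] is the stiffness of spring i."""
    k1, k2, k3 = x
    return np.array([
        [k1 + k2, -k2, 0.0],
        [-k2, k2 + k3, -k3],
        [0.0, -k3, k3],
    ])


def dstiffness(i, n=3):
    """dK/dx_i. K is linear in x with K(0) = 0, so the derivative is K evaluated at a unit vector."""
    unit = np.zeros(n)
    unit[i] = 1.0
    return stiffness(unit)


def objective(x, u):
    """Assumed objective C(x, u) = 0.5 u.u + 0.1 sum(x), so dC/dx_i (partial) = 0.1 and grad_u C = u."""
    return 0.5 * u @ u + 0.1 * np.sum(x)


def solve_state(x, f):
    """Solve the physical problem K u = f."""
    return np.linalg.solve(stiffness(x), f)


def gradient_direct(x, f):
    """Direct method: one extra linear solve per design variable for du/dx_i."""
    K = stiffness(x)
    u = np.linalg.solve(K, f)
    grad = np.empty(len(x))
    for i in range(len(x)):
        du_dx = -np.linalg.solve(K, dstiffness(i) @ u)  # -K^{-1} dK/dx_i u
        grad[i] = 0.1 + u @ du_dx                       # dC/dx_i + grad_u C^T du/dx_i
    return grad


def gradient_adjoint(x, f):
    """Adjoint method: a single extra solve K^T lambda = -grad_u C for all design variables."""
    K = stiffness(x)
    u = np.linalg.solve(K, f)
    lam = np.linalg.solve(K.T, -u)
    return np.array([0.1 + lam @ dstiffness(i) @ u for i in range(len(x))])


def gradient_fd(x, f, h=1e-6):
    """Central finite differences on C(x, u(x)) as an independent reference."""
    grad = np.empty(len(x))
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        grad[i] = (objective(xp, solve_state(xp, f)) - objective(xm, solve_state(xm, f))) / (2 * h)
    return grad


x0 = np.array([2.0, 1.5, 1.0])   # spring stiffnesses (design variables)
f0 = np.array([0.0, 0.0, 1.0])   # unit load at the free end, independent of x

g_direct = gradient_direct(x0, f0)
g_adjoint = gradient_adjoint(x0, f0)
g_fd = gradient_fd(x0, f0)

print("direct :", g_direct)
print("adjoint:", g_adjoint)
print("fin.dif:", g_fd)
```

The direct variant needs one additional solve per design variable, whereas the adjoint variant needs a single additional solve regardless of how many design variables there are. For three variables the difference is negligible, but it is the reason the adjoint approach scales to problems with many design variables.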