A descent-direction-based optimization algorithm in which the descent direction is taken to be the normalized negative gradient, $$\mathbf{d}^{(k)} = -\frac{\nabla_\mathbf{x} f(\mathbf{x}^{(k)})}{\lVert\nabla_\mathbf{x} f(\mathbf{x}^{(k)})\rVert}.$$ Here we assume that we want to solve the optimization problem of finding $$\mathbf{x}^* = \text{argmin}_\mathbf{x} f(\mathbf{x})$$ for an objective $$f : \mathbb{R}^d \rightarrow \mathbb{R}$$.
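A minimal sketch of this method with a fixed step size (the function names, the step size `alpha`, and the test objective are illustrative choices, not from the book):

```python
import numpy as np

def normalized_gradient_descent(f, grad, x0, alpha=0.1, iters=100):
    """Minimize f by stepping along the normalized negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-12:   # at a stationary point
            break
        d = -g / np.linalg.norm(g)      # descent direction d^(k)
        x = x + alpha * d               # fixed step size for illustration
    return x

# Example: f(x) = ||x||^2, minimizer at the origin
x_min = normalized_gradient_descent(lambda x: x @ x, lambda x: 2 * x, [3.0, 4.0])
```

Because the direction is normalized, a fixed step size cannot adapt near the minimizer: the iterate ends up within roughly `alpha` of the optimum rather than converging exactly, which is one motivation for choosing $$\alpha^{(k)}$$ by line search as below.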

The update rule is thus $$\mathbf{x}^{(k+1)} \leftarrow \mathbf{x}^{(k)} + \alpha^{(k)}\mathbf{d}^{(k)}$$ with the step size chosen by exact line search, $$\alpha^{(k)} = \text{argmin}_\alpha f(\mathbf{x}^{(k)}+\alpha \mathbf{d}^{(k)})$$. For the optimal $$\alpha^{(k)}$$, the chain rule gives $$0 = \frac{\partial f(\mathbf{x}^{(k+1)})}{\partial \alpha^{(k)}} = \nabla_\mathbf{x} f(\mathbf{x}^{(k+1)})^\top \mathbf{d}^{(k)}$$, so the gradient at the new point is orthogonal to $$\mathbf{d}^{(k)}$$. Since $$\mathbf{d}^{(k+1)}$$ is the normalized negative of that gradient, the directions taken in steps $$k$$ and $$k+1$$ are orthogonal to each other.
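The orthogonality of consecutive directions can be checked numerically. For a quadratic $$f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top A \mathbf{x}$$ the exact line search has a closed form, $$\alpha = -\frac{\mathbf{g}^\top \mathbf{d}}{\mathbf{d}^\top A \mathbf{d}}$$; the matrix and starting point below are arbitrary illustrative choices:

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x with A symmetric positive definite;
# its gradient is A x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x

x = np.array([2.0, -1.0])
g = grad(x)
d1 = -g / np.linalg.norm(g)           # first descent direction
alpha = -(g @ d1) / (d1 @ A @ d1)     # exact line search for a quadratic
x_next = x + alpha * d1

g_next = grad(x_next)
d2 = -g_next / np.linalg.norm(g_next) # next descent direction
print(np.isclose(d1 @ d2, 0.0))       # True: consecutive directions are orthogonal
```

The closed form follows from setting $$\frac{d}{d\alpha}\, \tfrac{1}{2}(\mathbf{x}+\alpha\mathbf{d})^\top A (\mathbf{x}+\alpha\mathbf{d}) = \mathbf{d}^\top A \mathbf{x} + \alpha\, \mathbf{d}^\top A \mathbf{d} = 0$$.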

## Thoughts

• Why do the authors in (Kochenderfer and Wheeler 2019) separate descent-direction-based methods from gradient descent? Wouldn't the optimal descent direction be the one given by the gradient?
• (Kochenderfer and Wheeler 2019) Chapter 5: it might be clearer to state what the gradient in Equation 5.1 is taken with respect to.
• TODO P70 (Kochenderfer and Wheeler 2019): the orthogonality of successive directions was not discussed in any prior material. When the best step size is chosen, the derivative of $$f$$ with respect to $$\alpha$$ is 0, and by the chain rule the consecutive directions are then orthogonal. Better to also write this down under directional derivatives.
• TODO Analysis based on eigendecomposition, first for convex functions and then for non-convex functions via a convex approximation.

## References

Kochenderfer, Mykel J., and Tim A. Wheeler. 2019. Algorithms for Optimization. The MIT Press.

Created: 2022-04-04 Mon 23:40
