Gradient Descent

A descent direction based optimization algorithm where the descent direction is taken to be \(\mathbf{d}^{(k)} = -\frac{\nabla_{\mathbf{x}} \hat{f}(\mathbf{x}^{(k)}) = -\frac{\nabla_\mathbf{x} f(\mathbf{x}^{(k)})}{||\nabla_\mathbf{x} f(\mathbf{x}^{(k)})||}\). Here we assume that we want to solve the optimization problem of finding \(x\) such that \(\mathbf{x} = \text{argmin}_\mathbf{x} f : \mathbb{R}^d \rightarrow \mathbb{R}\).

The update rule is thus \(\mathbf{x}^{(k+1)} \leftarrow \mathbf{x}^{(k)} - \alpha^{(k)}\mathbf{d}^{(k)}\) with \(\alpha^{(k)} = \text{argmin}_\alpha f(\mathbf{x}^{(k)}+\alpha \mathbf{d}^{(k)})\). For the optimal \(\alpha^{(k)}\), we have that \(0=\frac{\partial f(\mathbf{x}^{(k+1)})}{\partial \alpha^{(k)}} = \frac{\partial f(\mathbf{x}^{(k+1)})}{\partial \mathbf{x}^{(k+1)}}\frac{\partial \mathbf{x}^{(k+1)}}{\partial \alpha^{(k)}}\) which implies the directions taken in step \(k, k+1\) are orthogonal to each other.

Thoughts

Why do the authors in (Kochenderfer and Wheeler 2019) separate descent direction based methods and gradient descent, would the optimal direction for descent be the direction given the gradient?
(Kochenderfer and Wheeler 2019) Chapter 5, maybe it would be more clear to mention what the gradient is taken with respect to in Equation 5.1?
TODO P70 (Kochenderfer and Wheeler 2019) Orthogonality of directions were not discussed in any material prior, when choosing the best step size, its derivative with respect to \(\alpha\) is 0, by the chain rule we will see that in this case the directions are orthogonal. Better to also write down under directional derivatives.
TODO Analysis based on eigen-decomposition, first of convex functions then non-convex functions based on convex approximation.

References

Kochenderfer, Mykel J., and Tim A. Wheeler. 2019. Algorithms for Optimization. The MIT Press.