Matrix Calculus

Differentiating Scalars

Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$.

With respect to vectors

Let $\mathbf{x} \in \mathbb{R}^n$. Then $\frac{\partial f}{\partial \mathbf{x}} = \left[\frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n}\right]$, a row vector that is the transpose of the gradient, i.e. $\frac{\partial f}{\partial \mathbf{x}} = (\nabla f)^T$.

Some common examples are given below:

If $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$, then the gradient (as a column vector) is $f'(\mathbf{x})=\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\mathbf{x}$ and the Hessian is $f''(\mathbf{x})=\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)$.
If $f(\mathbf{w}) = \frac{1}{2}\|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2$, then the gradient (as a column vector) is $f'(\mathbf{w})=\mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y})$ and the Hessian is $f''(\mathbf{w})=\mathbf{X}^T\mathbf{X}$.
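
The two gradient formulas above can be sanity-checked numerically. The sketch below is only an illustration: it assumes NumPy, uses randomly drawn test data, and the helper `numerical_gradient` is a hypothetical name introduced here. It compares each closed-form gradient with a central-difference estimate.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4

# Quadratic form f(x) = 1/2 x^T A x, with A deliberately non-symmetric.
A = rng.normal(size=(n, n))
x = rng.normal(size=n)
quad = lambda v: 0.5 * v @ A @ v
quad_grad = 0.5 * (A + A.T) @ x

# Least squares f(w) = 1/2 ||y - X w||^2.
X = rng.normal(size=(6, n))
y = rng.normal(size=6)
w = rng.normal(size=n)
lsq = lambda v: 0.5 * np.sum((y - X @ v) ** 2)
lsq_grad = X.T @ (X @ w - y)

# Compare the closed forms with the numerical estimates.
quad_ok = np.allclose(numerical_gradient(quad, x), quad_grad, atol=1e-5)
lsq_ok = np.allclose(numerical_gradient(lsq, w), lsq_grad, atol=1e-5)
print(quad_ok, lsq_ok)
```

Using a non-symmetric $\mathbf{A}$ matters here: with a symmetric matrix the check could not distinguish $\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\mathbf{x}$ from $\mathbf{A}\mathbf{x}$.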

With respect to matrices

Let $\mathbf{W} \in \mathbb{R}^{m \times n}$, then $\frac{\partial f}{\partial \mathbf{W}} = \begin{bmatrix} \frac{\partial f}{\partial \mathbf{W}_{11}} & \dots & \frac{\partial f}{\partial \mathbf{W}_{1n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial \mathbf{W}_{m1}} & \dots & \frac{\partial f}{\partial \mathbf{W}_{mn}} \end{bmatrix}$
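
As a small illustration of this layout, consider the example $f(\mathbf{W}) = \mathbf{a}^T\mathbf{W}\mathbf{b}$ (chosen here for illustration, not taken from the notes), whose derivative is $\mathbf{a}\mathbf{b}^T$ and has the same shape as $\mathbf{W}$. The sketch below assumes NumPy and random test data, and `numerical_matrix_derivative` is a hypothetical helper name.

```python
import numpy as np

def numerical_matrix_derivative(f, W, eps=1e-6):
    """Central-difference estimate of df/dW, laid out with the same shape as W."""
    D = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            step = np.zeros_like(W)
            step[i, j] = eps
            D[i, j] = (f(W + step) - f(W - step)) / (2 * eps)
    return D

rng = np.random.default_rng(0)
m, n = 3, 2
a = rng.normal(size=m)
b = rng.normal(size=n)
W = rng.normal(size=(m, n))

f = lambda M: a @ M @ b      # scalar-valued function of a matrix
analytic = np.outer(a, b)    # entry (i, j) is a_i * b_j

numeric = numerical_matrix_derivative(f, W)
matches = np.allclose(numeric, analytic, atol=1e-6)
print(numeric.shape, matches)
```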

Differentiating Vectors

Let $\mathbf{f} : \mathbb{R}^m \rightarrow \mathbb{R}^n$.

With respect to vectors

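Let $\mathbf{x} \in \mathbb{R}^m$. Then $\frac{\partial \mathbf{f}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}$ is the Jacobian, with entries $\left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)_{ij} = \frac{\partial f_i}{\partial x_j}$.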
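
A minimal sketch of the Jacobian for a concrete map, assuming NumPy and the illustrative choice $\mathbf{f}(\mathbf{x}) = \tanh(\mathbf{A}\mathbf{x})$ with $\mathbf{A} \in \mathbb{R}^{n \times m}$ (this function and the helper name `numerical_jacobian` are assumptions made for the example). For $\mathbf{f} : \mathbb{R}^m \rightarrow \mathbb{R}^n$ the Jacobian has shape $n \times m$, with row $i$ and column $j$ holding $\partial f_i / \partial x_j$.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: row i, column j holds d f_i / d x_j."""
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
m, n = 3, 2
A = rng.normal(size=(n, m))
x = rng.normal(size=m)

f = lambda v: np.tanh(A @ v)   # maps R^m to R^n

# Chain rule: J = diag(1 - tanh(Ax)^2) A
analytic = (1 - np.tanh(A @ x) ** 2)[:, None] * A

J = numerical_jacobian(f, x)
jac_ok = np.allclose(J, analytic, atol=1e-6)
print(J.shape, jac_ok)
```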

With respect to matrices

Let $\mathbf{W} \in \mathbb{R}^{n \times m}$. Then $\frac{\partial \mathbf{f}}{\partial \mathbf{W}} \in \mathbb{R}^{n \times m \times n}$*. In practice we compute each $\frac{\partial \mathbf{f}_k}{\partial \mathbf{W}_{ij}}$ separately.
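
To make the entry-by-entry computation concrete, the sketch below assumes the specific dependence $\mathbf{f}(\mathbf{W}) = \mathbf{W}\mathbf{x}$ for a fixed $\mathbf{x} \in \mathbb{R}^m$ (an assumption, since the notes do not state how $\mathbf{f}$ depends on $\mathbf{W}$). Under that assumption $\frac{\partial \mathbf{f}_k}{\partial \mathbf{W}_{ij}} = x_j$ when $k = i$ and $0$ otherwise. The code assumes NumPy and stores the result as an $n \times m \times n$ array indexed by $(i, j, k)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3
W = rng.normal(size=(n, m))
x = rng.normal(size=m)

f = lambda M: M @ x   # assumed dependence on W; output lives in R^n

# Numerical tensor: T[i, j, k] approximates d f_k / d W_ij.
eps = 1e-6
T = np.zeros((n, m, n))
for i in range(n):
    for j in range(m):
        step = np.zeros_like(W)
        step[i, j] = eps
        T[i, j, :] = (f(W + step) - f(W - step)) / (2 * eps)

# Analytic entries: d f_k / d W_ij = x_j if k == i, else 0.
analytic = np.einsum("ik,j->ijk", np.eye(n), x)

tensor_ok = np.allclose(T, analytic, atol=1e-6)
print(T.shape, tensor_ok)
```

Most entries of the tensor are zero, which is one reason to work with the individual entries rather than materialising the full array.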

Thoughts

Read about the pushforward for vector-by-vector differentiation.
* From Kevin Clark’s “Computing Neural Network Gradients”. Find a textbook with more details on this.