Matrix Calculus

Differentiating Scalars

Let \(f : \mathbb{R}^n \rightarrow \mathbb{R}\).

With respect to vectors

Let \(\mathbf{x} \in \mathbb{R}^n\). Then \(\frac{\partial f}{\partial \mathbf{x}} = \left[\frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n}\right]\), a row vector equal to the transpose of the gradient, i.e. \(\frac{\partial f}{\partial \mathbf{x}} = (\nabla f)^T\).

Some common examples are given below, where \(f'\) denotes the gradient (written as a column vector) and \(f''\) the Hessian:

  • If \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}\) then \(f'(\mathbf{x})=\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\mathbf{x}\) and \(f''(\mathbf{x})=\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\).
  • If \(f(\mathbf{w}) = \frac{1}{2}||\mathbf{y}-\mathbf{X}\mathbf{w}||^2\) then \(f'(\mathbf{w})=\mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{y})\) and \(f''(\mathbf{w})=\mathbf{X}^T\mathbf{X}\).
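The two gradient formulas above can be sanity-checked numerically. The following is a minimal sketch, assuming central finite differences as the reference and small random inputs; the helper names (`grad_fd`, `quad_grad`, `lsq_grad`) are made up for illustration and use only the Python standard library.

```python
import random

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def quad(A, x):
    # f(x) = 1/2 x^T A x
    return 0.5 * sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def quad_grad(A, x):
    # Claimed gradient: 1/2 (A + A^T) x
    n = len(A)
    S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    return matvec(S, x)

def lsq(X, y, w):
    # f(w) = 1/2 ||y - X w||^2
    return 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, matvec(X, w)))

def lsq_grad(X, y, w):
    # Claimed gradient: X^T (X w - y)
    r = [pi - yi for pi, yi in zip(matvec(X, w), y)]
    return matvec(transpose(X), r)

rng = random.Random(0)  # fixed seed so the check is repeatable
n, rows = 3, 5
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]
X = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(rows)]
y = [rng.uniform(-1, 1) for _ in range(rows)]
w = [rng.uniform(-1, 1) for _ in range(n)]

# Largest absolute gap between the closed form and the numerical estimate.
quad_err = max(abs(a - b) for a, b in
               zip(quad_grad(A, x), grad_fd(lambda v: quad(A, v), x)))
lsq_err = max(abs(a - b) for a, b in
              zip(lsq_grad(X, y, w), grad_fd(lambda v: lsq(X, y, v), w)))
print(quad_err, lsq_err)
```

Both functions are quadratic, so the central difference is exact up to rounding and both gaps should be tiny. Note that \(\mathbf{A}\) is deliberately not symmetric here, which is what makes the \(\frac{1}{2}(\mathbf{A}+\mathbf{A}^T)\) factor matter.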

With respect to matrices

Let \(\mathbf{W} \in \mathbb{R}^{m \times n}\) and let \(f\) be a scalar function of \(\mathbf{W}\). Then \(\frac{\partial f}{\partial \mathbf{W}} = \begin{bmatrix} \frac{\partial f}{\partial \mathbf{W}_{11}} & \dots & \frac{\partial f}{\partial \mathbf{W}_{1n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial \mathbf{W}_{m1}} & \dots & \frac{\partial f}{\partial \mathbf{W}_{mn}} \end{bmatrix}\), which has the same shape as \(\mathbf{W}\).
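As a small illustration of the entrywise definition, the sketch below assumes the scalar function \(f(\mathbf{W}) = \mathbf{a}^T\mathbf{W}\mathbf{b}\), which is not from these notes and is chosen only because its derivative has the simple closed form \(\mathbf{a}\mathbf{b}^T\). Each entry \(\frac{\partial f}{\partial \mathbf{W}_{ij}}\) is estimated by a central difference; the function names are hypothetical.

```python
def bilinear(a, W, b):
    # f(W) = a^T W b (an assumed example function)
    return sum(a[i] * W[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

def matrix_grad_fd(f, W, h=1e-6):
    """Estimate df/dW entry by entry; the result has the same shape as W."""
    m, n = len(W), len(W[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[i][j] += h
            Wm[i][j] -= h
            G[i][j] = (f(Wp) - f(Wm)) / (2 * h)
    return G

a = [1.0, -2.0]                          # length m = 2
b = [0.5, 3.0, -1.0]                     # length n = 3
W = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # shape m x n

G = matrix_grad_fd(lambda M: bilinear(a, M, b), W)
outer = [[ai * bj for bj in b] for ai in a]  # closed form a b^T

grad_shape = (len(G), len(G[0]))
max_err = max(abs(G[i][j] - outer[i][j])
              for i in range(len(a)) for j in range(len(b)))
print(grad_shape, max_err)
```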

Differentiating Vectors

Let \(\mathbf{f} : \mathbb{R}^m \rightarrow \mathbb{R}^n\).

With respect to vectors

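Let \(\mathbf{x} \in \mathbb{R}^m\). Then \(\frac{\partial \mathbf{f}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}\) is the Jacobian, with entries \(\left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)_{ij} = \frac{\partial \mathbf{f}_i}{\partial x_j}\).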
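A minimal sketch of the Jacobian as an \(n \times m\) array whose \((i, j)\) entry is \(\frac{\partial \mathbf{f}_i}{\partial x_j}\), built column by column with central differences. The example function (with \(m = 2\), \(n = 3\)) and the helper name `jacobian_fd` are assumptions made for illustration.

```python
import math

def jacobian_fd(f, x, h=1e-6):
    """Central-difference Jacobian: J[i][j] approximates d f_i / d x_j."""
    n, m = len(f(x)), len(x)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def f(x):
    # Assumed example map from R^2 to R^3.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f_jacobian(x):
    # Hand-derived Jacobian of the example map above.
    return [[x[1], x[0]],
            [math.cos(x[0]), 0.0],
            [0.0, 2 * x[1]]]

x0 = [0.7, -1.3]
J_num = jacobian_fd(f, x0)
J_exact = f_jacobian(x0)

jac_shape = (len(J_num), len(J_num[0]))
jac_err = max(abs(J_num[i][j] - J_exact[i][j])
              for i in range(3) for j in range(2))
print(jac_shape, jac_err)
```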

With respect to matrices

Let \(\mathbf{W} \in \mathbb{R}^{n \times m}\). Then \(\frac{\partial \mathbf{f}}{\partial \mathbf{W}} \in \mathbb{R}^{n \times m \times n}\)*. In practice we compute each \(\frac{\partial \mathbf{f}_k}{\partial \mathbf{W}_{ij}}\) separately.
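To make the entry-by-entry computation concrete, the sketch below assumes the linear map \(\mathbf{f}(\mathbf{x}) = \mathbf{W}\mathbf{x}\), which these notes do not fix explicitly. In that case \(\mathbf{f}_k = \sum_j \mathbf{W}_{kj} x_j\), so \(\frac{\partial \mathbf{f}_k}{\partial \mathbf{W}_{ij}}\) equals \(x_j\) when \(k = i\) and \(0\) otherwise. The tensor is stored as `T[i][j][k]`, an indexing choice made here so that its shape is \(n \times m \times n\).

```python
def linear(W, x):
    # f(x) = W x (assumed form of f for this illustration)
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def dfk_dWij(x, k, i, j):
    # Closed form for f = W x: x_j if k == i, else 0.
    return x[j] if k == i else 0.0

n, m = 2, 3
W = [[0.1, -0.4, 0.7], [1.2, 0.3, -0.5]]  # shape n x m
x = [1.0, -2.0, 0.5]                      # length m
h = 1e-6

# T[i][j][k] approximates d f_k / d W_ij via central differences.
T = [[[0.0] * n for _ in range(m)] for _ in range(n)]
max_err = 0.0
for i in range(n):
    for j in range(m):
        Wp = [row[:] for row in W]
        Wm = [row[:] for row in W]
        Wp[i][j] += h
        Wm[i][j] -= h
        fp, fm = linear(Wp, x), linear(Wm, x)
        for k in range(n):
            T[i][j][k] = (fp[k] - fm[k]) / (2 * h)
            max_err = max(max_err, abs(T[i][j][k] - dfk_dWij(x, k, i, j)))

tensor_shape = (len(T), len(T[0]), len(T[0][0]))
print(tensor_shape, max_err)
```

Most entries of the tensor are zero for this choice of \(\mathbf{f}\), which is one reason working with individual entries is usually more convenient than materialising the full array.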


  • Read about the pushforward for vector by vector differentiation.
  • * From Kevin Clark’s “Computing Neural Network Gradients”. Find a textbook with more details on this.

Author: Nazaal

Created: 2022-04-04 Mon 23:39