Importance Sampling

Let \(X\) be a random variable with distribution \(\pi\), called the target distribution, and let \(\varphi\) be a function, called the target function. Importance sampling is a method for computing expectations of the form:

\[ \mathbb{E}[\varphi(x)] = \int \varphi(x)\pi(x) \: dx\]

In general, it is assumed that generating samples from \(\pi\) is hard. We therefore introduce a proposal distribution \(\pi_p\) that is easy to sample from and satisfies \(\pi_p(x) > 0\) wherever \(\varphi(x)\pi(x) \neq 0\), i.e. its support covers that of the integrand. We can then rewrite the integral as: \[ \mathbb{E}[\varphi(x)] = \int \varphi(x)\frac{\pi(x)}{\pi_p(x)}\pi_p(x) \: dx = \int \varphi(x)w(x)\pi_p(x) \: dx, \qquad w(x) = \frac{\pi(x)}{\pi_p(x)}. \] Since we can sample from \(\pi_p\), we can compute a Monte Carlo estimate of the above, the importance sampling estimator, as \(\mathbb{E}[\varphi(x)] \approx \frac{1}{N}\sum_{n=1}^N w(x_n)\varphi(x_n)\), where the \(x_n\) are sampled independently from \(\pi_p\) and the \(w(x_n) = \frac{\pi(x_n)}{\pi_p(x_n)}\) are called the importance weights.
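A minimal sketch of the plain estimator, using only the Python standard library. The setup is an assumption chosen for illustration: target \(\pi = \mathcal{N}(0, 1)\), target function \(\varphi(x) = x^2\) (so the true value is 1), and proposal \(\pi_p = \mathcal{N}(0, 2^2)\). The helper names `normal_pdf` and `importance_sampling` are hypothetical.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normalized density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_sampling(phi, target_pdf, proposal_sample, proposal_pdf, n, rng):
    """Plain importance sampling estimate of E_pi[phi(X)].

    Draws x_1..x_n from the proposal and averages w(x_n) * phi(x_n),
    where w(x) = pi(x) / pi_p(x). Requires the *normalized* target density.
    """
    total = 0.0
    for _ in range(n):
        x = proposal_sample(rng)
        w = target_pdf(x) / proposal_pdf(x)  # importance weight
        total += w * phi(x)
    return total / n


# Illustrative (assumed) setup: pi = N(0, 1), phi(x) = x^2, pi_p = N(0, 2^2).
rng = random.Random(0)
est = importance_sampling(
    phi=lambda x: x * x,
    target_pdf=normal_pdf,
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    n=100_000,
    rng=rng,
)
print(est)  # should be close to E[X^2] = 1
```

Choosing a proposal that is wider than the target keeps the weights bounded here (\(w(x) = 2e^{-3x^2/8} \le 2\)); a proposal with lighter tails than the target can instead produce weights with very large or even infinite variance.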

Above, we assumed that we can evaluate the target density \(\pi\) exactly. In practice, however, we may only be able to evaluate an unnormalized density \(\tilde{\pi}\) such that \(\pi(x) = \frac{\tilde{\pi}(x)}{Z_\pi} = \frac{\tilde{\pi}(x)}{\int \tilde{\pi}(x)\: dx}\), where the normalizing constant \(Z_\pi\) is unknown. By substitution, we have: \[ \mathbb{E}[\varphi(x)] = \int \varphi(x)\pi(x) \: dx = \int \frac{\varphi(x)\tilde{\pi}(x)}{\int \tilde{\pi}(x)\: dx} \: dx = \frac{\int \varphi(x)\tilde{\pi}(x)\:dx}{\int \tilde{\pi}(x)\: dx} = \frac{\int \varphi(x)\frac{\tilde{\pi}(x)}{\pi_p(x)}\pi_p(x) \:dx}{\int \frac{\tilde{\pi}(x)}{\pi_p(x)}\pi_p(x)\: dx} \] Estimating the numerator and the denominator separately by Monte Carlo gives the self-normalized importance sampling estimator, \(\mathbb{E}[\varphi(x)] \approx \frac{\frac{1}{N}\sum_{n=1}^N w(x_n)\varphi(x_n)}{\frac{1}{N}\sum_{n=1}^N w(x_n)}\), where the \(x_n\) are sampled from \(\pi_p\) and the \(w(x_n) = \frac{\tilde{\pi}(x_n)}{\pi_p(x_n)}\) are called the unnormalized weights. Unlike the direct estimator, which is unbiased, the self-normalized estimator is biased, but it is consistent, i.e. it tends to the true value as \(N\rightarrow \infty\). Note that, as a byproduct, the denominator estimates the normalizing constant, \(Z_\pi = \int \tilde{\pi}(x) \: dx \approx \frac{1}{N}\sum_{n=1}^N w(x_n)\), provided \(\pi_p\) is itself normalized.
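A minimal sketch of the self-normalized estimator under assumed, illustrative choices: the unnormalized target \(\tilde{\pi}(x) = e^{-x^2/2}\) (so \(\pi = \mathcal{N}(0,1)\) and the true \(Z_\pi = \sqrt{2\pi}\)), \(\varphi(x) = x^2\), and proposal \(\pi_p = \mathcal{N}(0, 2^2)\). The function names are hypothetical. It returns both the estimate of \(\mathbb{E}[\varphi(x)]\) and the byproduct estimate of \(Z_\pi\).

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normalized density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def self_normalized_is(phi, target_unnorm, proposal_sample, proposal_pdf, n, rng):
    """Self-normalized importance sampling with an unnormalized target.

    Returns (estimate of E_pi[phi(X)], estimate of Z_pi). The Z_pi estimate
    is only meaningful if proposal_pdf is a normalized density.
    """
    sum_w = 0.0      # accumulates w(x_n)
    sum_w_phi = 0.0  # accumulates w(x_n) * phi(x_n)
    for _ in range(n):
        x = proposal_sample(rng)
        w = target_unnorm(x) / proposal_pdf(x)  # unnormalized weight tilde_pi / pi_p
        sum_w += w
        sum_w_phi += w * phi(x)
    # The 1/N factors in numerator and denominator cancel in the ratio.
    return sum_w_phi / sum_w, sum_w / n


# Illustrative (assumed) setup: tilde_pi(x) = exp(-x^2 / 2), pi_p = N(0, 2^2).
rng = random.Random(0)
est, z_est = self_normalized_is(
    phi=lambda x: x * x,
    target_unnorm=lambda x: math.exp(-0.5 * x * x),
    proposal_sample=lambda r: r.gauss(0.0, 2.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    n=100_000,
    rng=rng,
)
print(est, z_est, math.sqrt(2.0 * math.pi))  # est near 1, z_est near sqrt(2*pi)
```

Only \(\tilde{\pi}\) is ever evaluated, so the code never needs \(Z_\pi\); comparing `z_est` against \(\sqrt{2\pi}\) is possible here only because this toy target happens to have a known normalizing constant.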

Thoughts

Put more details on the proposal distribution.