Iterative policy evaluation

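Given a policy \(\pi\), to compute its value function \(V_\pi(s)\), start with arbitrary values \(V_0(s)\) for all \(s\) (terminal states, if any, fixed at 0). Then repeatedly apply the Bellman expectation equation as an update rule:
\(V_{i+1}(s) = \mathbb{E}_\pi[R_{t+1} + \gamma V_i(S_{t+1}) \mid S_t = s] = \sum_a \pi(a|s) \sum_{s'} p(s'|s,a)[r(s,a,s') + \gamma V_i(s')]\)
As \(i \to \infty\), \(V_i\) converges to \(V_\pi\) whenever \(\gamma < 1\) or every episode is guaranteed to terminate under \(\pi\).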
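A minimal Python sketch of the update above. It assumes a tabular MDP stored as nested dicts, where `P[s][a]` is a list of `(probability, next_state, reward)` triples encoding \(p(s'|s,a)\) and \(r(s,a,s')\), and `pi[s][a]` holds \(\pi(a|s)\). This format, the function name `policy_evaluation`, and the parameters `theta` and `max_iters` are hypothetical choices for illustration. Stopping once the largest per-state change drops below `theta` is a common convention, not something stated in these notes.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular MDP format (an assumption for this sketch):
# P[s][a] = list of (probability, next_state, reward), i.e. p(s'|s,a) and r(s,a,s').
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]
Policy = Dict[int, Dict[int, float]]  # pi[s][a] = pi(a|s)


def policy_evaluation(P: MDP, pi: Policy, gamma: float = 0.9,
                      theta: float = 1e-8, max_iters: int = 10_000) -> Dict[int, float]:
    """Iterate V_{i+1}(s) = sum_a pi(a|s) sum_s' p(s'|s,a) [r + gamma * V_i(s')]."""
    V = {s: 0.0 for s in P}  # arbitrary start: V_0(s) = 0 for all s
    for _ in range(max_iters):
        V_new = {}
        for s in P:
            total = 0.0
            for a, pi_a in pi[s].items():
                for prob, s_next, r in P[s][a]:
                    total += pi_a * prob * (r + gamma * V[s_next])
            V_new[s] = total
        # Largest change between V_i and V_{i+1}, used as the stopping test
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new  # synchronous update: every V_{i+1}(s) was computed from V_i only
        if delta < theta:
            break
    return V


# Usage: state 0 has two actions chosen uniformly at random.
# Action 0 stays in state 0 with reward 1; action 1 moves to the
# absorbing state 1 with reward 0. State 1 loops to itself with reward 0.
P_example: MDP = {
    0: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 0.0)]},
}
pi_example: Policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}

V_example = policy_evaluation(P_example, pi_example, gamma=0.9)
print(round(V_example[0], 4))  # 0.9091
```

Here the exact answer follows from \(V(0) = 0.5(1 + 0.9V(0)) + 0.5(0 + 0.9 \cdot 0)\), so \(V(0) = 10/11\) and \(V(1) = 0\). The two-array update mirrors the equation exactly, since each \(V_{i+1}(s)\) uses only \(V_i\). Updating `V` in place, so that new values are used as soon as they are computed, also converges to \(V_\pi\) and often does so in fewer sweeps.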

Thoughts