# Value iteration

An algorithm to solve Markov Decision Processes (MDPs). Initialize the value function at iteration 0 to be zero at all states, i.e. $$\forall s \; V_0^*(s)=0$$. Then, iterate as follows: $$V_{i+1}^*(s) = \max_a \mathbb{E}[R_{t+1}+\gamma V_i^*(S_{t+1}) \mid S_t = s, A_t = a] = \max_a \sum_{s'} p(s'|s,a)[r(s,a,s')+\gamma V_i^*(s')]$$.

In the above formulation the reward is assumed to be a deterministic function of the transition $(s, a, s')$.

This algorithm converges to the optimal value function $$V^*(s)$$, because the Bellman optimality operator is a $\gamma$-contraction in the sup norm.
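The update above can be sketched in NumPy. The MDP below (2 states, 2 actions, with hypothetical transition probabilities `P[a, s, s']` and deterministic rewards `R[a, s, s']`) is a made-up toy example, not from the note; the stopping tolerance is likewise an illustrative choice.

```python
import numpy as np

# Hypothetical toy MDP: P[a, s, s'] = p(s'|s,a), R[a, s, s'] = r(s,a,s').
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
R = np.array([
    [[1.0, 0.0], [0.0, 2.0]],   # action 0
    [[0.0, 0.0], [0.0, 1.0]],   # action 1
])
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)               # V_0(s) = 0 for all s
    while True:
        # Q[a, s] = sum_{s'} p(s'|s,a) [r(s,a,s') + gamma * V_i(s')]
        Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
        V_new = Q.max(axis=0)            # V_{i+1}(s) = max_a Q[a, s]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V_star = value_iteration(P, R, gamma)
```

At convergence, `V_star` satisfies the Bellman optimality equation: applying the update once more leaves it (numerically) unchanged.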

## Thoughts

• Look into convergence details and the uniqueness of $$V^*(s)$$.

Created: 2022-03-13 Sun 21:45
