# Monte-Carlo policy evaluation

To evaluate a policy $$\pi$$, given complete episodes generated by following $$\pi$$, do the following for each episode:

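• Each time state $$s$$ is visited at a time step $$t$$, update $$N(s) \leftarrow N(s)+1$$, $$S(s) \leftarrow S(s)+G_t$$, $$V(s) = S(s)/N(s)$$, where $$G_t$$ is the return from time step $$t$$ to the end of the episode.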

As the number of episodes approaches infinity, $$V(s) \rightarrow V_\pi(s)$$.
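The update rule above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function name `mc_policy_evaluation`, the episode format (a list of `(state, reward)` pairs, with `reward` being the reward received after leaving that state), the discount factor `gamma`, and the `first_visit` flag are all assumptions made for the example. With `first_visit=False` every visit to $$s$$ counts, as in the rule above; with `first_visit=True` only the first visit in each episode counts.

```python
from collections import defaultdict


def mc_policy_evaluation(episodes, gamma=1.0, first_visit=False):
    """Estimate V(s) by averaging returns observed in complete episodes.

    Assumed format: each episode is a list of (state, reward) pairs,
    where reward is the reward received after leaving that state.
    """
    N = defaultdict(int)    # visit counts N(s)
    S = defaultdict(float)  # summed returns S(s)

    for episode in episodes:
        # Scan backwards so that G_t = r_t + gamma * G_{t+1}.
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()  # back to chronological order

        seen = set()  # states already counted in this episode
        for state, G_t in returns:
            if first_visit and state in seen:
                continue  # first-visit MC skips repeat visits
            seen.add(state)
            N[state] += 1
            S[state] += G_t

    # V(s) = S(s) / N(s) for every state that was visited at least once.
    return {s: S[s] / N[s] for s in N}


# Two hand-made episodes (hypothetical data, undiscounted).
episodes = [
    [("A", 1.0), ("B", 2.0)],
    [("A", 1.0), ("A", 3.0)],
]
V_every = mc_policy_evaluation(episodes)
V_first = mc_policy_evaluation(episodes, first_visit=True)
```

In the second episode state `A` is visited twice, so the two variants average over different sets of returns and generally give different estimates on finite data.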

## Thoughts

• Clearly differentiate between first-visit MC and every-visit MC as mentioned in Sutton-Barto p.92.

Created: 2022-03-13 Sun 21:44

