# Monte-Carlo policy evaluation

To evaluate a policy \(\pi\), given complete episodes generated by following \(\pi\), do the following for each episode:

- Each time state \(s\) is visited at time step \(t\), update \(N(s)=N(s)+1\), \(S(s)=S(s)+G_t\), and \(V(s)=S(s)/N(s)\), where \(G_t\) is the (discounted) return following time \(t\).

As the number of episodes approaches infinity, \(V(s) \rightarrow V_\pi(s)\): each \(V(s)\) is the sample mean of the returns observed from \(s\).
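The update above can be sketched as follows. This is a minimal every-visit sketch (it updates \(N\) and \(S\) on every occurrence of \(s\) in an episode), assuming episodes are given as lists of `(state, reward)` pairs, where `reward` is the reward received on leaving that state; the function name and episode format are illustrative, not from the source.

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma=1.0):
    """Every-visit Monte-Carlo policy evaluation.

    episodes: list of episodes, each a list of (state, reward) pairs.
    Returns V(s) = S(s) / N(s), the mean observed return from each state.
    """
    N = defaultdict(int)    # visit counts N(s)
    S = defaultdict(float)  # cumulative returns S(s)
    for episode in episodes:
        G = 0.0
        # Walk the episode backwards so G_t = r + gamma * G_{t+1}
        # can be accumulated in a single pass.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            N[state] += 1
            S[state] += G
    return {s: S[s] / N[s] for s in N}
```

Walking the episode backwards avoids recomputing each \(G_t\) from scratch; with \(\gamma = 1\), \(G_t\) is simply the sum of the remaining rewards.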

## Thoughts

- Clearly differentiate between first-visit MC and every-visit MC, as discussed in Sutton & Barto, p. 92.
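For reference, the first-visit variant only counts the first occurrence of each state per episode, so the averaged returns are independent samples across episodes. A sketch under the same assumed episode format as above (lists of `(state, reward)` pairs; names are illustrative):

```python
from collections import defaultdict

def mc_evaluate_first_visit(episodes, gamma=1.0):
    """First-visit Monte-Carlo policy evaluation."""
    N = defaultdict(int)
    S = defaultdict(float)
    for episode in episodes:
        # Backward pass: compute G_t for every time step t.
        returns = []
        G = 0.0
        for _, reward in reversed(episode):
            G = reward + gamma * G
            returns.append(G)
        returns.reverse()
        # Forward pass: update only at the first visit to each state.
        seen = set()
        for (state, _), G_t in zip(episode, returns):
            if state not in seen:
                seen.add(state)
                N[state] += 1
                S[state] += G_t
    return {s: S[s] / N[s] for s in N}
```

On an episode that revisits a state, e.g. `[('a', 1.0), ('a', 1.0)]` with \(\gamma = 1\), first-visit gives \(V(a) = 2\) while every-visit averages both returns to \(1.5\).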