Monte-Carlo policy evaluation

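To evaluate a policy \(\pi\), given complete episodes generated by following \(\pi\), do the following for each episode:

  • For each time step \(t\) at which state \(s\) is visited, increment the visit counter \(N(s) \leftarrow N(s) + 1\).
  • Increment the total return \(S(s) \leftarrow S(s) + G_t\), where \(G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots\) is the discounted return from time \(t\).
  • Estimate the value as the mean return, \(V(s) = S(s) / N(s)\).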

As the number of episodes approaches infinity (provided each state \(s\) keeps being visited), \(V(s) \rightarrow V_\pi(s)\) by the law of large numbers.
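A minimal sketch of the averaging procedure in Python, assuming each episode is given as a list of (state, reward) pairs where the reward is \(R_{t+1}\), the reward received after leaving the state at time \(t\). This episode format and the names `returns` and `mc_evaluate` are illustrative assumptions, not from any library. It counts every visit to a state (every-visit MC).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Assumed episode format: [(S_0, R_1), (S_1, R_2), ...].
Episode = List[Tuple[Hashable, float]]


def returns(episode: Episode, gamma: float) -> List[float]:
    """Compute G_t = R_{t+1} + gamma * G_{t+1} for every time step, working backwards."""
    g = 0.0
    out = [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        g = episode[t][1] + gamma * g
        out[t] = g
    return out


def mc_evaluate(episodes: List[Episode], gamma: float = 1.0) -> Dict[Hashable, float]:
    """Every-visit Monte-Carlo policy evaluation: V(s) = S(s) / N(s)."""
    n: Dict[Hashable, int] = defaultdict(int)            # visit counter N(s)
    s_total: Dict[Hashable, float] = defaultdict(float)  # total return S(s)
    for episode in episodes:
        for (state, _), g in zip(episode, returns(episode, gamma)):
            n[state] += 1
            s_total[state] += g
    return {state: s_total[state] / n[state] for state in n}


episodes = [[("A", 0.0), ("B", 1.0)], [("A", 3.0)]]
print(mc_evaluate(episodes))             # {'A': 2.0, 'B': 1.0}
print(mc_evaluate(episodes, gamma=0.5))  # {'A': 1.75, 'B': 1.0}
```

Computing the returns backwards means each \(G_t\) costs one addition and one multiplication, instead of re-summing the rest of the episode for every time step.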

Thoughts

  • Clearly differentiate between first-visit MC and every-visit MC as mentioned in Sutton-Barto p.92.
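As a starting point for that distinction: first-visit MC uses only the return following the first occurrence of \(s\) in each episode, while every-visit MC uses the return following every occurrence. The sketch below shows which returns each variant collects from a single episode in which state A is visited twice. The (state, reward) episode format and the function name `visit_returns` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Assumed episode format: [(S_0, R_1), (S_1, R_2), ...].
Episode = List[Tuple[Hashable, float]]


def visit_returns(episode: Episode, gamma: float, first_visit: bool) -> Dict[Hashable, List[float]]:
    """Collect the returns G_t that a single episode contributes to each state."""
    # Compute G_t backwards: G_t = R_{t+1} + gamma * G_{t+1}.
    g, gs = 0.0, [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        g = episode[t][1] + gamma * g
        gs[t] = g
    collected: Dict[Hashable, List[float]] = defaultdict(list)
    for (state, _), g_t in zip(episode, gs):
        if first_visit and state in collected:
            continue  # first-visit MC ignores later visits to the same state
        collected[state].append(g_t)
    return dict(collected)


episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
print(visit_returns(episode, gamma=1.0, first_visit=True))   # {'A': [3.0], 'B': [2.0]}
print(visit_returns(episode, gamma=1.0, first_visit=False))  # {'A': [3.0, 2.0], 'B': [2.0]}
```

From this one episode, the first-visit estimate of \(V(A)\) is 3.0 and the every-visit estimate is 2.5; the two only differ for states revisited within an episode.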

Author: Nazaal

Created: 2022-03-13 Sun 21:44
