Markov Decision Processes (MDPs)

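A Markov Decision Process (MDP) is a 5-tuple \((S,A,P_a,R_a,\gamma)\), where \(S\) is the set of states, \(A\) is the set of actions, \(P_a(s,s')\) is the probability that taking action \(a\) in state \(s\) leads to state \(s'\), \(R_a(s,s')\) is the immediate reward received for that transition, and \(\gamma \in [0,1]\) is the discount factor.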
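
As a rough illustration, the five components of the tuple might be held in code as follows. This is only a sketch under stated assumptions: \(S\) and \(A\) are taken to be finite, \(R_a\) is read as a reward attached to a transition \(s \to s'\), and \(\gamma\) is taken to lie in \([0,1]\). The class name `FiniteMDP`, its field names, and the two-state example are hypothetical and not part of these notes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

State = str
Action = str


@dataclass
class FiniteMDP:
    """Hypothetical container for a finite MDP (S, A, P_a, R_a, gamma)."""

    states: FrozenSet[State]
    actions: FrozenSet[Action]
    # transitions[(s, a)][s2] = P_a(s, s2): probability of moving s -> s2 under a
    transitions: Dict[Tuple[State, Action], Dict[State, float]]
    # rewards[(s, a, s2)] = R_a(s, s2); missing entries are treated as 0 (assumption)
    rewards: Dict[Tuple[State, Action, State], float]
    gamma: float

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if the components do not describe a valid MDP."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s in self.states:
            for a in self.actions:
                dist = self.transitions.get((s, a))
                if dist is None:
                    raise ValueError(f"no transition distribution for {(s, a)}")
                if not set(dist) <= self.states:
                    raise ValueError(f"unknown successor state for {(s, a)}")
                if any(p < 0 for p in dist.values()):
                    raise ValueError(f"negative probability for {(s, a)}")
                if abs(sum(dist.values()) - 1.0) > tol:
                    raise ValueError(f"probabilities for {(s, a)} do not sum to 1")

    def expected_reward(self, s: State, a: Action) -> float:
        """Expected immediate reward: sum over s2 of P_a(s, s2) * R_a(s, s2)."""
        dist = self.transitions[(s, a)]
        return sum(p * self.rewards.get((s, a, s2), 0.0) for s2, p in dist.items())


# Hypothetical two-state example, purely for illustration.
mdp = FiniteMDP(
    states=frozenset({"s0", "s1"}),
    actions=frozenset({"stay", "go"}),
    transitions={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"): {"s0": 1.0},
    },
    rewards={("s0", "go", "s1"): 1.0},
    gamma=0.9,
)
mdp.validate()
```

Storing \(P_a\) as one distribution per (state, action) pair keeps the "rows sum to one" check local to each pair; a dense array indexed by state and action would be an equally reasonable choice.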


  • Value functions deserve their own node; mention how they induce a partial ordering on policies.
  • Starting with Markov processes, then Markov reward processes, then MDPs is, I think, a great way to lay out the ideas.
  • Need to look at the precise sigma-algebras and formal details when using the Bellman expectation equation to decompose the value functions into immediate and future components.
  • Sutton and Barto (2018, p. 26) define \(R_t\) as the reward from the action taken at time \(t\), i.e. \(A_t\), yet on p. 48 \(R_t\) denotes the reward from the action taken at time \(t-1\), i.e. \(A_{t-1}\).
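
The decomposition and the ordering mentioned in the notes above can be sketched as follows. This is a hedged summary, not a rigorous statement: the symbols \(\pi\) (a policy), \(G_t\) (the return) and \(v_\pi\) (the state-value function) do not appear in the notes and are assumed to follow standard usage, the reward following \(A_t\) is indexed as \(R_{t+1}\), and all measure-theoretic details are glossed over.

```latex
\begin{align*}
G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \\
v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
          = \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \right], \\
\pi \geq \pi' &\iff v_\pi(s) \geq v_{\pi'}(s) \quad \text{for all } s \in S.
\end{align*}
```

The second line splits the value into an immediate term \(R_{t+1}\) and a discounted future term \(\gamma\, v_\pi(S_{t+1})\). The ordering in the third line is only partial because two policies can each be better than the other in different states.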
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.

Author: Nazaal

Created: 2022-03-13 Sun 21:44