Partially Observable Markov Decision Processes (POMDPs)

A Partially Observable Markov Decision Process (POMDP) is a 7-tuple $$(S,A,O,P_a,R_a,Z_a,\gamma)$$ where

• $$S$$ is the state space.
• $$A$$ is the action space.
• $$O$$ is the observation space.
• $$P_a(s,s'):S \times A \times S \rightarrow [0,1] =\mathbb{P}(S_{t+1}=s' \mid S_t=s,A_t=a)$$ is the probability of transitioning to state $$s'$$ from the current state $$s$$ under action $$a$$; transitions obey the Markov property.
• $$R_a(s) : S \times A \rightarrow \mathbb{R} = \mathbb{E}[R_{t+1} \mid S_t=s,A_t=a]$$ is the expected immediate reward for taking action $$a$$ in state $$s$$.
• $$Z_a(o,s'): O \times S \times A \rightarrow [0,1] = \mathbb{P}(O_{t+1}=o \mid S_{t+1}=s', A_t=a)$$ is the observation model: the probability of receiving observation $$o$$ after action $$a$$ leaves the process in state $$s'$$.
• $$\gamma \in [0,1]$$ is the discount factor for rewards.

A POMDP is an MDP whose state is hidden, or equivalently a Hidden Markov Model (HMM) augmented with actions and rewards. Since the agent never observes the state directly, it acts on a belief state: a probability distribution over $$S$$ updated by Bayes' rule after each action and observation.
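The belief update implied by the definitions above can be sketched as a small Bayes filter: $$b'(s') \propto Z_a(o,s') \sum_s P_a(s,s')\, b(s)$$. The two-state problem and its numbers below are purely illustrative, not from the source.

```python
import numpy as np

# Hypothetical 2-state POMDP with a single "listen" action (a = 0).
# T[a][s, s'] encodes P_a(s, s'); Z[a][s', o] encodes Z_a(o, s').
T = {0: np.array([[1.0, 0.0],
                  [0.0, 1.0]])}   # listening leaves the state unchanged
Z = {0: np.array([[0.85, 0.15],   # in state 0, the correct observation
                  [0.15, 0.85]])} # is heard with probability 0.85

def belief_update(b, a, o):
    """Bayes filter: b'(s') ∝ Z_a(o, s') * sum_s P_a(s, s') * b(s)."""
    predicted = T[a].T @ b            # predict: sum_s P(s'|s,a) b(s)
    unnorm = Z[a][:, o] * predicted   # correct with observation likelihood
    return unnorm / unnorm.sum()      # normalize to a distribution

b0 = np.array([0.5, 0.5])             # uniform initial belief
b1 = belief_update(b0, a=0, o=0)
print(b1)                             # → [0.85 0.15]
```

One noisy observation of state 0 shifts the belief from uniform to $$(0.85, 0.15)$$; because the belief is a sufficient statistic for the history, a POMDP can be recast as a fully observable MDP over belief states.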

Thoughts

TODO Put POMDP algorithms

Created: 2022-03-13 Sun 21:45
