# Partially Observable Markov Decision Processes (POMDPs)

A Partially Observable Markov Decision Process (POMDP) is a 7-tuple \((S,A,O,P_a,R_a,Z_a,\gamma)\) where

- \(S\) is the **state space**.
- \(A\) is the **action space**.
- \(O\) is the **observation space**.
- \(P_a(s,s') : S \times A \times S \rightarrow [0,1] = \mathbb{P}(S_{t+1}=s' \mid S_t=s, A_t=a)\) is the **transition probability** of moving to the next state \(s'\) from the current state \(s\) under action \(a\), which obeys the Markov property.
- \(R_a(s) : S \times A \rightarrow \mathbb{R} = \mathbb{E}[R_{t+1} \mid S_t=s, A_t=a]\) is the **expected immediate reward** for taking action \(a\) in state \(s\).
- \(Z_a(o,s') : O \times S \times A \rightarrow [0,1] = \mathbb{P}(O_{t+1}=o \mid S_{t+1}=s', A_t=a)\) is the **observation model**: the probability of receiving observation \(o\) when action \(a\) leads to state \(s'\).
- \(\gamma \in [0,1]\) is the **discount factor** for rewards.
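As a concrete illustration, the 7-tuple can be encoded directly as probability tables. The sketch below uses Python with NumPy and the classic two-state "tiger" problem; the specific numbers (85% listening accuracy, the reward values) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Two-state "tiger" POMDP, used only to illustrate the 7-tuple.
# S = {tiger-left, tiger-right}, A = {listen, open-left, open-right},
# O = {hear-left, hear-right}.
S, A, O = 2, 3, 2

# P_a(s, s'): transition probabilities, indexed [a, s, s'].
# Listening leaves the state unchanged; opening a door resets it uniformly.
P = np.empty((A, S, S))
P[0] = np.eye(S)                    # listen: state persists
P[1] = P[2] = np.full((S, S), 0.5)  # open: problem resets

# Z_a(o, s'): observation probabilities, indexed [a, s', o].
# Listening is 85% accurate; after opening, observations are uninformative.
Z = np.empty((A, S, O))
Z[0] = np.array([[0.85, 0.15],
                 [0.15, 0.85]])
Z[1] = Z[2] = np.full((S, O), 0.5)

# R_a(s): expected immediate reward, indexed [a, s].
R = np.array([[-1.0,   -1.0],    # listen costs 1
              [-100.0, 10.0],    # open-left: tiger or treasure
              [10.0, -100.0]])   # open-right

gamma = 0.95

# Sanity checks: each row of P and Z is a probability distribution.
assert np.allclose(P.sum(axis=2), 1.0)
assert np.allclose(Z.sum(axis=2), 1.0)
```

Indexing conventions vary across libraries; the `[action, state, next-state]` layout here is just one convenient choice.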

A POMDP is thus an MDP with hidden states: the agent never observes \(s\) directly and must infer it from observations. Equivalently, it is a Hidden Markov Model (HMM) extended with actions and rewards.
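The HMM view suggests how an agent copes with hidden state: it maintains a belief \(b\), a probability distribution over \(S\), and updates it by Bayes' rule after each action–observation pair, \(b'(s') \propto Z_a(o,s') \sum_s P_a(s,s')\, b(s)\). A minimal sketch in Python with NumPy (the function name, array layout, and example numbers are my own, not from the text):

```python
import numpy as np

def belief_update(b, a, o, P, Z):
    """Bayes-filter update of a belief over hidden states.

    b : (S,)       current belief, b[s] = P(S_t = s | history)
    a : int        action taken
    o : int        observation received
    P : (A, S, S)  transition model,  P[a, s, s'] = P(s' | s, a)
    Z : (A, S, O)  observation model, Z[a, s', o] = P(o | s', a)
    """
    predicted = b @ P[a]                 # sum_s P_a(s, s') b(s)
    unnormalized = Z[a, :, o] * predicted
    return unnormalized / unnormalized.sum()

# Example with a noisy two-state "stay/move" model (numbers are illustrative).
P = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])            # a single action
Z = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])            # observation favors the true state
b = np.array([0.5, 0.5])                # start uninformed
b = belief_update(b, a=0, o=0, P=P, Z=Z)
# → belief [0.8, 0.2]: observation 0 shifts probability toward state 0
```

The belief is a sufficient statistic for the history, which is why a POMDP can be reformulated as a fully observable "belief MDP" over the continuous space of distributions over \(S\).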