# Policy evaluation

In an MDP, *policy evaluation* is the problem of computing the state-value function $$V^{\pi}(s)$$ for every state $$s$$ under a fixed policy $$\pi$$. The value function is characterized by the following Bellman equation:

$$V^{\pi}(s) = \sum_{s'}\sum_{a}p(s',a|s)[r(s,a,s')+\gamma V^{\pi}(s')]=\sum_{s'}\sum_{a}p(s'|s,a)\pi(a|s)[r(s,a,s')+\gamma V^{\pi}(s')]$$

where $$\pi(a|s)$$ is the probability that the policy selects action $$a$$ in state $$s$$ (so the joint probability factors as $$p(s',a|s) = p(s'|s,a)\,\pi(a|s)$$), and $$r(s,a,s')$$ is the expected reward received when transitioning to state $$s'$$ after taking action $$a$$ in state $$s$$.
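The equation above can be turned into an iterative algorithm by repeatedly applying the right-hand side as an update until $$V^{\pi}$$ stops changing. A minimal sketch, using a small hypothetical 2-state, 2-action MDP (the transition probabilities, rewards, and policy below are made up for illustration):

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] = p(s'|s, a), R[s, a, s'] = r(s, a, s')
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # transitions from state 0
              [[0.5, 0.5], [0.3, 0.7]]])  # transitions from state 1
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0                           # reward 1 for landing in state 1
pi = np.array([[0.5, 0.5], [0.9, 0.1]])    # pi[s, a] = pi(a|s), fixed policy
gamma = 0.9

def policy_evaluation(P, R, pi, gamma, tol=1e-10):
    """Iterate the Bellman expectation backup until convergence."""
    V = np.zeros(P.shape[0])
    while True:
        # V_new(s) = sum_a pi(a|s) sum_s' p(s'|s,a) [r(s,a,s') + gamma V(s')]
        V_new = np.einsum("sa,sat,sat->s", pi, P, R + gamma * V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V = policy_evaluation(P, R, pi, gamma)
```

At convergence, `V` satisfies the Bellman equation to within the tolerance: applying the backup once more leaves it (essentially) unchanged.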

## Thoughts

• Mention linear system involved here.
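On the linear-system point in the note above: for a finite MDP the Bellman equation is linear in $$V^{\pi}$$, so it can be written as $$V^{\pi} = R^{\pi} + \gamma P^{\pi} V^{\pi}$$ and solved directly as $$(I - \gamma P^{\pi})V^{\pi} = R^{\pi}$$. A sketch with a hypothetical 2-state, 2-action MDP (all numbers made up for illustration):

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] = p(s'|s, a), R[s, a, s'] = r(s, a, s'),
# pi[s, a] = pi(a|s).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0
pi = np.array([[0.5, 0.5], [0.9, 0.1]])
gamma = 0.9

# Marginalize out actions under pi to get the policy's Markov chain:
# P_pi[s, s'] = sum_a pi(a|s) p(s'|s,a); R_pi[s] = expected one-step reward.
P_pi = np.einsum("sa,sat->st", pi, P)
R_pi = np.einsum("sa,sat,sat->s", pi, P, R)

# Bellman equation in matrix form: (I - gamma P_pi) V = R_pi
V = np.linalg.solve(np.eye(P_pi.shape[0]) - gamma * P_pi, R_pi)
```

The direct solve costs $$O(|S|^3)$$, so iterative methods are preferred when the state space is large; for small problems it gives the exact fixed point in one step.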

Created: 2022-03-13 Sun 21:44
