# ε-greedy exploration

A method for choosing actions from a Q-function that ensures exploration. With probability \(1-\epsilon\) the policy acts greedily, \(\pi(s) = \operatorname{argmax}_{a}Q(s,a)\); with probability \(\epsilon\) it picks an action uniformly at random (assuming a finite action space).
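A minimal sketch of this selection rule, assuming `Q` is a dict mapping `(state, action)` pairs to values (the names `Q`, `state`, and `actions` are illustrative, not from the original note):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """Choose an action epsilon-greedily from a finite action set."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore: uniform random action
    # exploit: the greedy action under the current Q estimates
    return max(actions, key=lambda a: Q[(state, a)])
```

With `epsilon=0` this is purely greedy; with `epsilon=1` it is purely random.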

To converge to an optimal policy, \(\epsilon\) must be reduced over time. For example, in episodic environments, the Greedy in the Limit with Infinite Exploration (GLIE) condition can be satisfied by choosing \(\epsilon_k = \frac{b}{b+k}\) for episode \(k\), where \(b\) is a positive constant.