Interestingness elements for explainable reinforcement learning: Understanding agents’ capabilities and limitations

Details

- Title: Interestingness elements for explainable reinforcement learning: Understanding agents’ capabilities and limitations
- Author(s): Pedro Sequeira, Melinda Gervasio

Summary

The authors develop a task- and model-agnostic framework that extracts "interestingness" elements (a term borrowed from the association rule mining literature) from an agent's interaction with its environment, to see whether such elements explain the agent's behaviour well. Interaction data is passed to an introspection step, which extracts the interestingness elements; a visual summary is then generated from them and shown to the end user. The interestingness elements are:
- Frequency, to determine frequent and infrequent situations.
- Certainty, to determine how certain the agent was of the actions it took.
- Transition values, to analyze how the value attributed to some state changes with respect to possible states visited at the next time-step.
- Sequences, where the agent starts from a (local) minimum and goes to a (local) maximum.
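
To make the first three elements concrete, here is a minimal sketch, not the paper's implementation. It assumes a tabular setting where states are hashable and the agent exposes action probabilities and state values. The function names, the use of normalized entropy as the certainty measure, and the strict-inequality test for local extrema are all assumptions made for illustration.

```python
from collections import Counter
import math


def state_frequencies(trajectories):
    """Relative visit frequency of each state across all trajectories."""
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def action_certainty(action_probs):
    """Certainty as 1 minus normalized entropy (assumed measure).

    Returns 1.0 when all probability mass is on one action and
    approximately 0.0 when the distribution is uniform.
    """
    n = len(action_probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return 1.0 - entropy / math.log(n)


def local_extremum(value, next_values):
    """Classify a state by comparing its value with observed next-state values.

    Returns "min" if the state's value is below every next-state value,
    "max" if it is above every one, and None otherwise.
    """
    if not next_values:
        return None
    if all(value < v for v in next_values):
        return "min"
    if all(value > v for v in next_values):
        return "max"
    return None
```

Under these assumptions, a sequence element would be a trajectory segment that starts in a state classified as "min" and ends in one classified as "max".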
The authors run a user study in the game of Frogger in which different agents are given different state spaces and different reward functions, both unknown to the subjects. The study tests whether the elements above help users infer this hidden information about the capabilities and objectives of each agent. One interesting result was that showing all elements resulted in greater divergence between subjects' responses and true agent performance.

Thoughts

Particularly liked the literature review in this paper.