Explainable Reinforcement Learning (XRL) Resources

This repository aims to keep an up-to-date list of research on explainable reinforcement learning (XRL). The repository supplements the survey paper found here. If you find this helpful, please give this repository a star and cite the survey paper.

Missing resource(s), issue(s), or question(s)? Please open an issue here, or feel free to email me (contact [at] xrl [dot] ai).

Resources

Awesome Explainable Reinforcement Learning. Link

Survey Papers

#/Link	Title	Venue/Journal	Year
1	Explainability in Deep Reinforcement Learning, a Review into Current Methods and Applications	ACM Comput. Surv.	2023
2	Explainable Reinforcement Learning: A Survey and Comparative Review	ACM Comput. Surv.	2023
3	A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges	CoRR	2023
4	Explainable reinforcement learning (XRL): a systematic literature review and taxonomy	Mach. Learn.	2023
5	Explainable reinforcement learning for broad-XAI: a conceptual framework and survey	Neural Comput. Appl.	2023
6	Explainable Deep Reinforcement Learning: State of the Art and Challenges	ACM Comput. Surv.	2022
7	A Survey on Interpretable Reinforcement Learning	CoRR	2022
8	Explainability in reinforcement learning: perspective and position	CoRR	2022
9	Explainable AI and Reinforcement Learning - A Systematic Review of Current Approaches and Trends	Frontiers Artif. Intell.	2021
10	Explainability in deep reinforcement learning	Knowl. Based Syst.	2021
11	Explainable Reinforcement Learning: A Survey	CD-MAKE	2020
12	Reinforcement Learning Interpretation Methods: A Survey	IEEE Access	2020

Papers

#/Link	Title	Venue/Journal	Year
1	Local Explanations for Reinforcement Learning	AAAI	2023
2	GANterfactual-RL: Understanding Reinforcement Learning Agents' Strategies through Visual Counterfactual Explanations	AAMAS	2023
3	Interpreting a deep reinforcement learning model with conceptual embedding and performance analysis	Appl. Intell.	2023
4	Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes	ICLR	2023
5	Explaining Reinforcement Learning with Shapley Values	ICML	2023
6	Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies	IDA	2023
7	Extracting Decision Tree From Trained Deep Reinforcement Learning in Traffic Signal Control	IEEE Trans. Comput. Soc. Syst.	2023
8	Explainable Reinforcement Learning via a Causal World Model	IJCAI	2023
9	Unveiling Concepts Learned by a World-Class Chess-Playing Agent	IJCAI	2023
10	Learning state importance for preference-based reinforcement learning	Mach. Learn.	2023
11	Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction	NeurIPS	2023
12	State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding	NeurIPS	2023
13	StateMask: Explaining Deep Reinforcement Learning through State Mask	NeurIPS	2023
14	Explainable robotic systems: understanding goal-driven actions in a reinforcement learning scenario	Neural Comput. Appl.	2023
15	Hierarchical goals contextualize local reward decomposition explanations	Neural Comput. Appl.	2023
16	Comparing explanations in RL	Neural Comput. Appl.	2023
17	Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization	Neural Networks	2023
18	IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit Based on Analyses of Interestingness	xAI	2023
19	"I Don't Think So": Summarizing Policy Disagreements for Agent Comparison	AAAI	2022
20	CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents	AAMAS	2022
21	Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions	AAMAS	2022
22	Lazy-MDPs: Towards Interpretable RL by Learning When to Act	AAMAS	2022
23	Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems	ACSOS	2022
24	Analysis of Explainable Goal-Driven Reinforcement Learning in a Continuous Simulated Environment	Algorithms	2022
25	BEERL: Both Ends Explanations for Reinforcement Learning	Applied Sciences	2022
26	Energy-Efficient Driving for Adaptive Traffic Signal Control Environment via Explainable Reinforcement Learning	Applied Sciences	2022
27	Comparing Strategies for Visualizing the High-Dimensional Exploration Behavior of CPS Design Agents	DESTION	2022
28	InAction: Interpretable Action Decision Making for Autonomous Driving	ECCV	2022
29	Enhanced Oblique Decision Tree Enabled Policy Extraction for Deep Reinforcement Learning in Power System Emergency Control	Electric Power Systems Research	2022
30	Attributation Analysis of Reinforcement Learning-Based Highway Driver	Electronics	2022
31	Multi-objective Genetic Programming for Explainable Reinforcement Learning	EuroGP	2022
32	Deep-Learning-based Fuzzy Symbolic Processing with Agents Capable of Knowledge Communication	ICAART	2022
33	Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations	ICLR	2022
34	POETREE: Interpretable Policy Learning with Adaptive Decision Trees	ICLR	2022
35	Programmatic Reinforcement Learning without Oracles	ICLR	2022
36	Explaining Reinforcement Learning Policies through Counterfactual Trajectories	ICML Workshop on HILL	2022
37	Mean-variance Based Risk-sensitive Reinforcement Learning with Interpretable Attention	ICMVA	2022
38	Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning	ICPR	2022
39	Explaining Intelligent Agent's Future Motion on Basis of Vocabulary Learning With Human Goal Inference	IEEE Access	2022
40	Interpretable Autonomous Flight Via Compact Visualizable Neural Circuit Policies	IEEE Robotics Autom. Lett.	2022
41	Explainable AI in Deep Reinforcement Learning Models for Power System Emergency Control	IEEE Trans. Comput. Soc. Syst.	2022
42	Hierarchical Program-Triggered Reinforcement Learning Agents for Automated Driving	IEEE Trans. Intell. Transp. Syst.	2022
43	Interpretable End-to-End Urban Autonomous Driving With Latent Deep Reinforcement Learning	IEEE Trans. Intell. Transp. Syst.	2022
44	Continuous Action Reinforcement Learning From a Mixture of Interpretable Experts	IEEE Trans. Pattern Anal. Mach. Intell.	2022
45	Self-Supervised Discovering of Interpretable Features for Reinforcement Learning	IEEE Trans. Pattern Anal. Mach. Intell.	2022
46	Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning	IEEE Trans. Pattern Anal. Mach. Intell.	2022
47	Visual Analytics for RNN-Based Deep Reinforcement Learning	IEEE Trans. Vis. Comput. Graph.	2022
48	Toward Interpretable-AI Policies Using Evolutionary Nonlinear Decision Trees for Discrete-Action Systems	IEEE Transactions on Cybernetics	2022
49	Understanding via Exploration: Discovery of Interpretable Features With Deep Reinforcement Learning	IEEE Transactions on Neural Networks and Learning Systems	2022
50	Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction	IJCAI Workshop on XAI	2022
51	ACMViz: a visual analytics approach to understand DRL-based autonomous control model	J. Vis.	2022
52	Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning	KSEM	2022
53	Towards Explainable Reinforcement Learning Using Scoring Mechanism Augmented Agents	KSEM	2022
54	Explainable Reinforcement Learning via Model Transforms	NeurIPS	2022
55	GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis	NeurIPS	2022
56	Inherently Explainable Reinforcement Learning in Natural Language	NeurIPS	2022
57	Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning	NeurIPS	2022
58	ProtoX: Explaining a Reinforcement Learning Agent via Prototyping	NeurIPS	2022
59	MoËT: Mixture of Expert Trees and its application to verifiable reinforcement learning	Neural Networks	2022
60	Analysing deep reinforcement learning agents trained with domain randomisation	Neurocomputing	2022
61	Why? Why not? When? Visual Explanations of Agent Behaviour in Reinforcement Learning	PacificVis	2022
62	Driving behavior explanation with multi-level fusion	Pattern Recognit.	2022
63	Acquisition of chess knowledge in AlphaZero	Proc. Natl. Acad. Sci. U.S.A.	2022
64	Learning Interpretable, High-Performing Policies for Autonomous Driving	Robotics: Science and Systems	2022
65	Event-driven temporal models for explanations - ETeMoX: explaining reinforcement learning	Softw. Syst. Model.	2022
66	Toward a Psychology of Deep Reinforcement Learning Agents Using a Cognitive Architecture	Top. Cogn. Sci.	2022
67	DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning	AAAI	2021
68	Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods	AAAI	2021
69	TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments	AAAI	2021
70	Explaining Deep Reinforcement Learning Agents in the Atari Domain through a Surrogate Model	AIIDE	2021
71	A framework of explanation generation toward reliable autonomous robots	Adv. Robotics	2021
72	Explainable Deep Reinforcement Learning for UAV autonomous path planning	Aerospace Science and Technology	2021
73	Explaining robot policies	Applied AI Letters	2021
74	Counterfactual state explanations for reinforcement learning agents via generative deep learning	Artif. Intell.	2021
75	Local and global explanations of agent behavior: Integrating strategy summaries with saliency maps	Artif. Intell.	2021
76	XPM: An Explainable Deep Reinforcement Learning Framework for Portfolio Management	CIKM	2021
77	Interactive Explanations: Diagnosis and Repair of Reinforcement Learning Based Agent Behaviors	CoG	2021
78	CDT: Cascading Decision Trees for Explainable Reinforcement Learning	CoRR	2021
79	Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents	CoRR	2021
80	Approximating a deep reinforcement learning docking agent using linear model trees	ECC	2021
81	Robotic Lever Manipulation using Hindsight Experience Replay and Shapley Additive Explanations	ECC	2021
82	Off-Policy Differentiable Logic Reinforcement Learning	ECML PKDD	2021
83	Neuro-Symbolic Reinforcement Learning with First-Order Logic	EMNLP	2021
84	Explainable Reinforcement Learning for Longitudinal Control	ICAART	2021
85	Explainable deep reinforcement learning for portfolio management: an empirical approach	ICAIF	2021
86	Explainable Reinforcement Learning for Human-Robot Collaboration	ICAR	2021
87	DRIVE: Deep Reinforced Accident Anticipation with Visual Explanation	ICCV	2021
88	Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions	ICLR	2021
89	Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning	ICLR	2021
90	Learning "What-if" Explanations for Sequential Decision-Making	ICLR	2021
91	Discovering symbolic policies with deep reinforcement learning	ICML	2021
92	Re-understanding Finite-State Representations of Recurrent Policy Networks	ICML	2021
93	Explainable Reinforcement Learning with the Tsetlin Machine	IEA/AIE	2021
94	A Blood Glucose Control Framework Based on Reinforcement Learning With Safety and Interpretability: In Silico Validation	IEEE Access	2021
95	Symbolic Regression Methods for Reinforcement Learning	IEEE Access	2021
96	Efficient Robotic Object Search Via HIEM: Hierarchical Policy Learning With Intrinsic-Extrinsic Modeling	IEEE Robotics Autom. Lett.	2021
97	Learning to Discover Task-Relevant Features for Interpretable Reinforcement Learning	IEEE Robotics Autom. Lett.	2021
98	Explaining Deep Learning Models Through Rule-Based Approximation and Visualization	IEEE Trans. Fuzzy Syst.	2021
99	Interpretable Decision-Making for Autonomous Vehicles at Highway On-Ramps With Latent Space Reinforcement Learning	IEEE Trans. Veh. Technol.	2021
100	Explainable AI methods on a deep reinforcement learning agent for automatic docking	IFAC-PapersOnLine	2021
101	Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning	IJCNN	2021
102	Programmatic Policy Extraction by Iterative Local Search	ILP	2021
103	Explaining the Decisions of Deep Policy Networks for Robotic Manipulations	IROS	2021
104	XAI-N: Sensor-based Robot Navigation using Expert Policies and Decision Trees	IROS	2021
105	Mixed Autonomous Supervision in Traffic Signal Control	ITSC	2021
106	Can You Trust Your Autonomous Car? Interpretable and Verifiably Safe Reinforcement Learning	IV	2021
107	Explaining a Deep Reinforcement Learning Docking Agent Using Linear Model Trees with User Adapted Visualization	Journal of Marine Science and Engineering	2021
108	Visual Analysis of Deep Q-network	KSII Trans. Internet Inf. Syst.	2021
109	Automatic discovery of interpretable planning strategies	Mach. Learn.	2021
110	EDGE: Explaining Deep Reinforcement Learning Policies	NeurIPS	2021
111	Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning	NeurIPS	2021
112	Learning to Synthesize Programs as Interpretable and Generalizable Policies	NeurIPS	2021
113	Machine versus Human Attention in Deep Reinforcement Learning Tasks	NeurIPS	2021
114	Explainable Artificial Intelligence (XAI) for Increasing User Trust in Deep Reinforcement Learning Driven Autonomous Systems	NeurIPS Workshop on Deep RL	2021
115	Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment	NeurIPS Workshop on Machine Learning for Health	2021
116	Feature-Based Interpretable Reinforcement Learning based on State-Transition Models	SMC	2021
117	A co-evolutionary approach to interpretable reinforcement learning in environments with continuous action spaces	SSCI	2021
118	Interpretable AI Agent Through Nonlinear Decision Trees for Lane Change Problem	SSCI	2021
119	Learning Sparse Evidence-Driven Interpretation to Understand Deep Reinforcement Learning Agents	SSCI	2021
120	Explainable Reinforcement Learning through a Causal Lens	AAAI	2020
121	Attribution-based Salience Method towards Interpretable Reinforcement Learning	AAAI-MAKE	2020
122	Learning an Interpretable Traffic Signal Control Policy	AAMAS	2020
123	Optimization Methods for Interpretable Differentiable Decision Trees Applied to Reinforcement Learning	AISTATS	2020
124	Interestingness elements for explainable reinforcement learning: Understanding agents' capabilities and limitations	Artif. Intell.	2020
125	Model primitives for hierarchical lifelong reinforcement learning	Auton. Agents Multi Agent Syst.	2020
126	Understanding the Behavior of Reinforcement Learning Agents	BIOMA	2020
127	Methodology for Interpretable Reinforcement Learning Model for HVAC Energy Control	Big Data	2020
128	Explaining Autonomous Driving by Learning End-to-End Visual Attention	CVPRW	2020
129	Understanding Learned Reward Functions	CoRR	2020
130	Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis	Complex & Intelligent Systems	2020
131	DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning	Comput. Graph. Forum	2020
132	Understanding RL Vision	Distill	2020
133	Interpretable policies for reinforcement learning by empirical fuzzy sets	Eng. Appl. Artif. Intell.	2020
134	Neuroevolution of self-interpretable agents	GECCO	2020
135	Topological Visualization Method for Understanding the Landscape of Value Functions and Structure of the State Space in Reinforcement Learning	ICAART	2020
136	Identifying Critical States by the Action-Based Variance of Expected Return	ICANN	2020
137	TLdR: Policy Summarization for Factored SSP Problems Using Temporal Abstractions	ICAPS	2020
138	Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution	ICLR	2020
139	Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning	ICLR	2020
140	Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents	ICLR	2020
141	Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions	ICML	2020
142	Deep Reinforcement Learning for Safe Local Planning of a Ground Vehicle in Unknown Rough Terrain	IEEE Robotics Autom. Lett.	2020
143	Towards Interpretable Reinforcement Learning with State Abstraction Driven by External Knowledge	IEICE Trans. Inf. Syst.	2020
144	Improved Policy Extraction via Online Q-Value Distillation	IJCNN	2020
145	Visualization of topographical internal representation of learning robots	IJCNN	2020
146	Explainable navigation system using fuzzy reinforcement learning	IJIDeM	2020
147	Explainability of Intelligent Transportation Systems using Knowledge Compilation: a Traffic Light Controller Case	ITSC	2020
148	xGAIL: Explainable Generative Adversarial Imitation Learning for Explainable Human Decision Analysis	KDD	2020
149	What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes	NeurIPS	2020
150	Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data	NeurIPS Workshop on Challenges of Real-World RL	2020
151	DynamicsExplorer: Visual Analytics for Robot Control Tasks involving Dynamics and LSTM-based Control Policies	PacificVis	2020
152	Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving	Robotics Auton. Syst.	2020
153	Modelling Agent Policies with Interpretable Imitation Learning	TAILOR	2020
154	Interpretable, Verifiable, and Robust Reinforcement Learning via Program Synthesis	xxAI - Beyond Explainable AI	2020
155	Generation of Policy-Level Explanations for Reinforcement Learning	AAAI	2019
156	SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning	AAAI	2019
157	Towards Better Interpretability in Deep Q-Networks	AAAI	2019
158	Toward Robust Policy Summarization	AAMAS	2019
159	Towards Governing Agent's Efficacy: Action-Conditional β-VAE for Deep Transparent Reinforcement Learning	ACML	2019
160	Memory-Based Explainable Reinforcement Learning	AI	2019
161	Summarizing agent strategies	Auton. Agents Multi Agent Syst.	2019
162	Enabling robots to communicate their objectives	Auton. Robots	2019
163	Visualization of Deep Reinforcement Learning using Grad-CAM: How AI Plays Atari Games?	CoG	2019
164	Explaining Reward Functions in Markov Decision Processes	FLAIRS	2019
165	Explanation-Based Reward Coaching to Improve Human Performance via Reinforcement Learning	HRI	2019
166	Free-Lunch Saliency via Attention in Atari Agents	ICCVW	2019
167	Deep reinforcement learning with relational inductive biases	ICLR	2019
168	Learning Finite State Representations of Recurrent Policy Networks	ICLR	2019
169	Neural Logic Reinforcement Learning	ICML	2019
170	Interpretable Approximation of a Deep Reinforcement Learning Agent as a Set of If-Then Rules	ICMLA	2019
171	Semantic Predictive Control for Explainable and Efficient Policy Learning	ICRA	2019
172	DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks	IEEE Trans. Vis. Comput. Graph.	2019
173	Visualizing Deep Q-Learning to Understanding Behavior of Swarm Robotic System	IES	2019
174	Exploring Computational User Models for Agent Policy Summarization	IJCAI	2019
175	Explaining Reinforcement Learning to Mere Mortals: An Empirical Study	IJCAI	2019
176	Counterfactual States for Atari Agents via Generative Deep Learning	IJCAI Workshop on XAI	2019
177	Distilling Deep Reinforcement Learning Policies in Soft Decision Trees	IJCAI Workshop on XAI	2019
178	Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation	IROS	2019
179	Reinforcement Learning with Explainability for Traffic Signal Control	ITSC	2019
180	Interestingness Elements for Explainable Reinforcement Learning through Introspection	IUI Workshops	2019
181	Explainable Reinforcement Learning via Reward Decomposition	IJCAI Workshop on XAI	2019
182	Enhancing Explainability of Deep Reinforcement Learning Through Selective Layer-Wise Relevance Propagation	KI	2019
183	Imitation-Projected Programmatic Reinforcement Learning	NeurIPS	2019
184	Towards Interpretable Reinforcement Learning Using Attention Augmented Agents	NeurIPS	2019
185	Verbal Explanations for Deep Reinforcement Learning Neural Networks with Attention on Extracted Features	RO-MAN	2019
186	A formal methods approach to interpretable reinforcement learning for robotic planning	Sci. Robotics	2019
187	HIGHLIGHTS: Summarizing Agent Behavior to People	AAMAS	2018
188	Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations	AIES	2018
189	Transparency and Explanation in Deep Reinforcement Learning Neural Networks	AIES	2018
190	Visual Rationalizations in Deep Reinforcement Learning for Atari Games	BNAIC	2018
191	Textual Explanations for Self-Driving Vehicles	ECCV	2018
192	Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees	ECML PKDD	2018
193	Interpretable policies for reinforcement learning by genetic programming	Eng. Appl. Artif. Intell.	2018
194	Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming	GECCO	2018
195	Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning	ICLR	2018
196	Programmatically Interpretable Reinforcement Learning	ICML	2018
197	Visualizing and Understanding Atari Agents	ICML	2018
198	Deep Reinforcement Learning Monitor for Snapshot Recording	ICMLA	2018
199	Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences	IJCAI Workshop on XAI	2018
200	Explaining Deep Adaptive Programs via Reward Decomposition	IJCAI/ECAI Workshop on XAI	2018
201	Establishing Appropriate Trust via Critical States	IROS	2018
202	Unsupervised Video Object Segmentation for Deep Reinforcement Learning	NeurIPS	2018
203	Verifiable Reinforcement Learning via Policy Extraction	NeurIPS	2018
204	Visual Sparse Bayesian Reinforcement Learning: A Framework for Interpreting What an Agent Has Learned	SSCI	2018
205	Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies	Eng. Appl. Artif. Intell.	2017
206	Autonomous Self-Explanation of Behavior for Interactive Reinforcement Learning Agents	HAI	2017
207	Improving Robot Controller Transparency Through Autonomous Policy Explanation	HRI	2017
208	Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention	ICCV	2017
209	Application of Instruction-Based Behavior Explanation to a Reinforcement Learning Agent with Changing Policy	ICONIP	2017
210	Graying the black box: Understanding DQNs	ICML	2016

Citation

@article{bekkemoen23,
author       = {Yanzhe Bekkemoen},
title        = {Explainable reinforcement learning (XRL): a systematic literature review and taxonomy},
journal      = {Mach. Learn.},
volume       = {113},
number       = {1},
pages        = {355--441},
year         = {2023},
url          = {https://doi.org/10.1007/s10994-023-06479-7},
doi          = {10.1007/s10994-023-06479-7}
}
