Improving Reinforcement Learning Exploration with Causal Models

Abstract

In this work, we explore a novel way to enhance reinforcement learning (RL) in autonomous agents by integrating causal models into the learning process. We introduce the Causality-Driven Reinforcement Learning (CDRL) framework, which improves exploration efficiency by leveraging causal discovery. Rather than requiring agents to explore all possible actions and states, CDRL learns a causal model from a simplified version of the environment. This model focuses on key transitions, such as actions that lead directly to success or failure, and then serves as a guide for navigating more complex environments. The two-phase approach—Causal Discovery (CD) followed by Causal RL (CRL)—allows for a faster and more efficient learning process.
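
To make the two phases concrete, the sketch below shows one way the idea could be realised: the CD phase estimates how likely each (state feature, action) pair is to lead to failure by interacting with a simplified environment, and the CRL phase uses those estimates to mask actions before handing control to an ordinary RL algorithm. This is a minimal illustrative sketch, not the exact implementation from the paper; the `CausalModel` class, the environment interface (`reset`, `step`, `actions`, `feature`), and the failure-probability threshold are all assumptions made for the example.

```python
import random
from collections import defaultdict


class CausalModel:
    """Tabular estimate of P(outcome | state feature, action), learned in a simplified environment."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, feature, action, outcome):
        # Record one observed transition; outcome is assumed symbolic, e.g. "success"/"failure"/"neutral".
        self.counts[(feature, action)][outcome] += 1

    def failure_prob(self, feature, action):
        outcomes = self.counts[(feature, action)]
        total = sum(outcomes.values())
        return outcomes["failure"] / total if total else 0.0


def causal_discovery(simple_env, episodes=500):
    """Phase 1 (CD): learn the causal model by interacting with a simplified environment."""
    model = CausalModel()
    for _ in range(episodes):
        state = simple_env.reset()
        done = False
        while not done:
            action = random.choice(simple_env.actions)
            # The simplified environment is assumed to report a symbolic outcome per step.
            next_state, outcome, done = simple_env.step(action)
            model.update(simple_env.feature(state), action, outcome)
            state = next_state
    return model


def causally_guided_actions(model, env, state, threshold=0.9):
    """Phase 2 (CRL): mask actions the causal model predicts lead to failure,
    leaving a smaller action set for the underlying RL algorithm to explore."""
    allowed = [a for a in env.actions
               if model.failure_prob(env.feature(state), a) < threshold]
    return allowed or env.actions  # never return an empty action set
```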

Using various RL algorithms, we demonstrate that CDRL consistently improves both the speed and effectiveness of learning in complex environments. Not only does it reduce the number of actions required to complete tasks, but it also generalizes well to previously unseen environments. The experiments show that CDRL can be implemented in both online and offline settings, with offline learning yielding particularly promising results, as the causal model can be trained once and reused in larger environments without degradation in performance.
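
In the offline setting mentioned above, the causal model produced by the CD phase can be trained once on the small environment and then reused unchanged while agents are trained on larger ones. A hedged sketch of that workflow, reusing the illustrative helpers from the previous snippet; `SimpleGridEnv`, `GridEnv`, and `agent` are hypothetical names, and the agent can be any RL algorithm (e.g. Q-learning):

```python
# Hypothetical offline workflow: one causal model, reused across larger environments.
model = causal_discovery(SimpleGridEnv(size=5))        # CD phase, run once

for env in [GridEnv(size=10), GridEnv(size=20)]:       # CRL phase, per environment
    state = env.reset()
    done = False
    while not done:
        actions = causally_guided_actions(model, env, state)
        action = agent.select(state, actions)          # underlying RL algorithm picks among allowed actions
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)
        state = next_state
```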

This approach offers a modular solution to enhance existing RL frameworks, making it adaptable and scalable for a wide range of applications in AI and autonomous systems.

Explore the Code and Results on GitHub


Environment Examples

Below are examples of the environments considered:


Plot Extract

Here is an extract from the results:




