DOLCIT Seminar

Monday, December 11, 2017
1:00pm to 2:00pm
Annenberg 213
Sample-Efficient Deep RL for Robotics: Generalizing On-policy, Off-policy, and Model-based Approaches
Shixiang (Shane) Gu, University of Cambridge and Max Planck Institute for Intelligent Systems

Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in various environments. However, most successes have been exclusively in simulation, and results in real-world applications such as robotics are limited, largely due to the poor sample efficiency of typical deep RL algorithms. In this talk, I will present methods that improve the sample efficiency of these algorithms, blurring the boundaries among classic model-based RL, off-policy model-free RL, and on-policy model-free RL. The first part of the talk will discuss Q-Prop, a control variate technique for policy gradient methods that combines on-policy and off-policy learning, along with empirical results and a theoretical analysis of its variance reduction. The second part of the talk focuses on temporal difference models (TDMs), an extension of goal-conditioned value functions that enables model-based planning at multiple time resolutions. TDMs generalize traditional predictive models, bridge the gap between model-based and off-policy model-free RL, and empirically lead to substantial improvements in sample efficiency with a vectorized implementation.
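
For context, a minimal sketch of the Q-Prop gradient estimator, reconstructed from the published description of the method rather than from the talk itself (the adaptive weighting used in the conservative and aggressive variants is omitted). Here $\pi_\theta$ is a stochastic policy with mean $\mu_\theta(s)$, $Q_w$ is an off-policy fitted critic, and $\hat{A}$ is the on-policy Monte Carlo advantage estimate:

\[
\nabla_\theta J(\theta) =
\mathbb{E}_{\rho_\pi,\pi}\!\left[\big(\hat{A}(s_t,a_t) - \bar{A}_w(s_t,a_t)\big)\,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\right]
+ \mathbb{E}_{\rho_\pi}\!\left[\nabla_a Q_w(s_t,a)\big|_{a=\mu_\theta(s_t)}\,\nabla_\theta \mu_\theta(s_t)\right],
\]
\[
\bar{A}_w(s_t,a_t) = \nabla_a Q_w(s_t,a)\big|_{a=\mu_\theta(s_t)}\,\big(a_t - \mu_\theta(s_t)\big).
\]

The first expectation is a standard likelihood-ratio policy gradient whose variance is reduced by subtracting the linearized critic as a control variate; the second adds the subtracted quantity back analytically, in the style of a deterministic policy gradient, so the estimator remains unbiased while exploiting off-policy data through $Q_w$.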
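Similarly, a hedged sketch of the temporal difference model objective, following the published TDM formulation with $d$ a goal-distance function chosen for illustration: a TDM is a goal- and horizon-conditioned Q-function $Q(s, a, g, \tau)$ trained with the recursion

\[
Q(s_t, a_t, g, \tau) \leftarrow \mathbb{E}_{s_{t+1}}\!\left[-d(s_{t+1}, g)\,\mathbf{1}[\tau = 0] + \max_{a} Q(s_{t+1}, a, g, \tau - 1)\,\mathbf{1}[\tau > 0]\right].
\]

When $\tau = 0$ the value is simply the negative distance to the goal, so the TDM behaves like a multi-step predictive model usable for planning; for larger $\tau$ it is an ordinary off-policy Q-function, which is what lets TDMs interpolate between model-based and model-free RL. The vectorized implementation mentioned in the abstract refers, in the published work, to predicting the distance per state dimension rather than as a single scalar.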

For more information, please contact Stephan Zheng by email at [email protected].