Control Meets Learning Seminar

Wednesday November 11, 2020 9:00 AM

A new approach to contextual bandits

Speaker: Sasha Rakhlin, Center for Statistics, IDSS / Department of Brain & Cognitive Sciences / Laboratory for Information & Decision Systems Center for Brains, Minds, and Machines, Massachusetts Institute of Technology
Location: Online Event

A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks. In this talk, we will describe a universal and optimal reduction from contextual bandits to online regression. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the online regression method is optimal. We then turn to the case of iid data and present an adaptive method that attains fast instance-dependent rates, whenever certain disagreement-based notions of problem complexity are bounded. Time permitting, we extend these ideas to the setting of block-MDPs and provide oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.

Joint work with Dylan Foster, David Simchi-Levi, and Yunzong Xu

Contact: Jolene Brink at 6263952813
For more information visit: