ON COVARIATE SHIFT OF LATENT CONFOUNDERS IN IMITATION AND REINFORCEMENT LEARNING

Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

Research output: Contribution to conferencePaperpeer-review

2 Scopus citations

Abstract

We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We begin by defining the problem of learning from confounded expert data in a contextual MDP setup. We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup. We then discuss the problem of distribution shift between the expert data and the online environment when the data is only partially observable. We prove possibility and impossibility results for imitation learning under arbitrary distribution shift of the missing covariates. When additional external reward is provided, we propose a sampling procedure that addresses the unknown shift and prove convergence to an optimal solution. Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.

Original languageEnglish
StatePublished - 2022
Externally publishedYes
Event10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: 25 Apr 202229 Apr 2022

Conference

Conference10th International Conference on Learning Representations, ICLR 2022
CityVirtual, Online
Period25/04/2229/04/22

Bibliographical note

Publisher Copyright:
© 2022 ICLR 2022 - 10th International Conference on Learning Representationss. All rights reserved.

Fingerprint

Dive into the research topics of 'ON COVARIATE SHIFT OF LATENT CONFOUNDERS IN IMITATION AND REINFORCEMENT LEARNING'. Together they form a unique fingerprint.

Cite this