Analyzing and Overcoming Degradation in Warm-Start Reinforcement Learning

Benjamin Wexler, Elad Sarafian, Sarit Kraus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Reinforcement Learning (RL) for robotic applications can benefit from a warm-start, where the agent is initialized with a pretrained behavioral policy. However, when transitioning to RL updates, degradation in performance can occur, which may compromise the robot's safety. This degradation, which constitutes an inability to properly utilize the pretrained policy, is attributed to extrapolation error in the value function, a result of high values being assigned to Out-Of-Distribution actions not present in the behavioral policy's data. We investigate why the magnitude of degradation varies across policies and why the policy fails to quickly return to behavioral performance. We present visual confirmation of our analysis and draw comparisons to the Offline RL setting, which suffers from similar difficulties. We propose a novel method, Confidence Constrained Learning (CCL) for Warm-Start RL, that reduces degradation by balancing between the policy gradient and constrained learning according to a confidence measure of the Q-values. For the constrained learning component we propose a novel objective, Positive Q-value Distance (CCL-PQD). We investigate a variety of constraint-based methods that aim to overcome the degradation, and find they constitute solutions for a multi-objective optimization problem between maximal performance and minimal degradation. Our results demonstrate that hyperparameter tuning for CCL-PQD produces solutions on the Pareto Front of this multi-objective problem, allowing the user to balance between performance and tolerable compromises to the robot's safety.
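The abstract's core idea can be sketched in a few lines. The snippet below is an illustrative reading only, not the paper's exact formulation: `confidence` is a hypothetical Q-value confidence measure (here, agreement between twin Q-estimates), and `ccl_pqd_loss` blends a policy-gradient term with a one-sided "positive Q-value distance" constraint, with the mixing weight set by the confidence. All names and the exact form of both terms are assumptions made for this sketch.

```python
import numpy as np

def confidence(q1, q2):
    # Hypothetical confidence measure: close to 1 when twin Q-estimates
    # agree, decaying toward 0 as they diverge (a proxy for value
    # extrapolation error on Out-Of-Distribution actions).
    return float(np.exp(-np.abs(q1 - q2).mean()))

def ccl_pqd_loss(q_pi, q_beta, c):
    # Illustrative CCL-style objective (not the paper's exact definition):
    #   - a policy-gradient term that maximizes Q under the current policy,
    #   - a PQD-style constraint that penalizes only the *positive* part of
    #     the Q-value distance, i.e. only when the behavioral action is
    #     valued above the current policy's action.
    # The confidence c interpolates between the two terms.
    pg_loss = -q_pi.mean()                       # maximize Q(s, a_pi)
    pqd = np.maximum(q_beta - q_pi, 0.0).mean()  # one-sided distance
    return c * pg_loss + (1.0 - c) * pqd
```

With high confidence (c → 1) the update behaves like an unconstrained policy gradient; with low confidence (c → 0) it falls back toward the behavioral policy's value ordering, which is the degradation-reduction mechanism the abstract describes.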

Original language: English
Title of host publication: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4048-4055
Number of pages: 8
ISBN (Electronic): 9781665479271
DOIs
State: Published - 2022
Event: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022 - Kyoto, Japan
Duration: 23 Oct 2022 - 27 Oct 2022

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
Volume: 2022-October
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
Country/Territory: Japan
City: Kyoto
Period: 23/10/22 - 27/10/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Funding

Benjamin Wexler, Elad Sarafian and Sarit Kraus are with the Department of Computer Science at Bar-Ilan University, Ramat-Gan, Israel [email protected] [email protected] [email protected]. This research has been partly supported by the EU Project TAILOR under grant 952215 and the DSI at BIU.

Funders: European Commission (grant 952215); Defence Science Institute
