Fast and Safe Policy Adaptation via Alignment-based Transfer

Abstract: Applying deep reinforcement learning to physical systems, as opposed to learning in simulation, presents additional challenges in terms of sample efficiency and safety. Collecting large amounts of hardware demonstration data is time-consuming, and the exploratory behavior of reinforcement learning algorithms may lead the system into dangerous states, especially during the early stages of training. To address these challenges, we apply transfer learning to reuse a previously learned policy instead of learning from scratch. Specifically, we propose a method in which, given a source policy, policy adaptation via transfer learning produces a target policy suitable for real-world deployment. For policy adaptation, alignment-based transfer learning is applied to trajectories generated by the source policy and their corresponding safe target trajectories. We apply this method to manipulators and show that it supports both inter-task and inter-robot transfer while accounting for safety. We also show that the resulting target policy is robust and can be further improved with reinforcement learning.
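
The abstract describes the alignment step only at a high level, so the Python sketch below is a rough illustration of the general idea rather than the authors' algorithm: it fits least-squares linear maps between time-aligned samples from source trajectories and corresponding safe target trajectories, then composes those maps with a frozen source policy to obtain a target policy. All names (fit_alignment, make_adapted_policy, A_state, A_action), the linear-map assumption, and the synthetic data are our own simplifications for illustration.

import numpy as np

def fit_alignment(X_from, X_to):
    """Fit a least-squares linear map A such that X_to ~= X_from @ A.

    X_from, X_to: arrays of shape (N, d_from) and (N, d_to) holding
    time-aligned samples from corresponding trajectories.
    """
    A, *_ = np.linalg.lstsq(X_from, X_to, rcond=None)
    return A

def make_adapted_policy(source_policy, A_state, A_action):
    """Compose alignment maps with a frozen source policy.

    A_state:  maps target states into the source state space.
    A_action: maps source actions into the target action space.
    source_policy: callable, source state -> source action.
    """
    def target_policy(s_tgt):
        s_src = s_tgt @ A_state        # target state -> source state
        a_src = source_policy(s_src)   # query the frozen source policy
        return a_src @ A_action        # source action -> target action
    return target_policy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic paired trajectories (purely illustrative stand-ins for
    # source-policy rollouts and their safe target counterparts).
    src_states = rng.normal(size=(200, 4))
    tgt_states = src_states @ rng.normal(size=(4, 6))    # 6-dim target state
    src_actions = src_states @ rng.normal(size=(4, 2))
    tgt_actions = src_actions @ rng.normal(size=(2, 3))  # 3-dim target action

    A_state = fit_alignment(tgt_states, src_states)      # target -> source states
    A_action = fit_alignment(src_actions, tgt_actions)   # source -> target actions
    W_src = rng.normal(size=(4, 2))
    pi_src = lambda s: s @ W_src                         # stand-in source policy
    pi_tgt = make_adapted_policy(pi_src, A_state, A_action)
    print(pi_tgt(tgt_states[0]).shape)                   # -> (3,)

A linear map keeps the sketch minimal; the same composition would apply with any learned alignment (e.g., a nonlinear regressor) fitted from the paired trajectories.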

BibTeX

@inproceedings{kim2019fast,
  title={Fast and Safe Policy Adaptation via Alignment-based Transfer},
  author={Kim, Jigang and Choi, Seungwon and Kim, H. Jin},
  booktitle={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={990--996},
  year={2019},
  organization={IEEE}
}