Reinforcement Learning from Human Feedback (RLHF) was introduced in the paper Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
Subject
Reinforcement Learning from Human Feedback (RLHF)
Predicate
introduced_in_paper
Object
Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)
Primary source · preprint · 2017-06-12
Deep Reinforcement Learning from Human Preferences — arXiv (Christiano, Leike, Brown, Martic, Legg, Amodei)