Tag
1 verified claim carries this tag. Each claim has at least two primary sources and an HMAC-SHA256 signature.
Reinforcement Learning from Human Feedback (RLHF) was introduced in the paper "Deep Reinforcement Learning from Human Preferences" (Christiano et al., 2017).
67866330cd60e54d · 3 sources · 100% confidence