What is RLHF?
Raw LLMs are trained only to predict the next token, so out of the box they imitate whatever was in their training data rather than follow instructions. RLHF (Reinforcement Learning from Human Feedback) is the training process that tames them into helpful assistants.
In practice, humans don't rate outputs in isolation as "Good" or "Bad"; they compare pairs of responses to the same prompt and pick the better one. A reward model learns to score outputs so that its scores agree with these human preferences, and reinforcement learning (typically PPO) then fine-tunes the main model to maximize that score. It's effectively dog training at massive scale.
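To make the reward-model step concrete, here is a minimal sketch of the standard pairwise (Bradley-Terry) preference loss. The `RewardModel` class, its tiny bag-of-embeddings backbone, and the random token ids are all hypothetical stand-ins so the example runs end to end; real systems use a fine-tuned LLM backbone scoring full (prompt, response) text.

```python
# Sketch: training a reward model on pairwise human preferences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in: maps a sequence of token ids to a scalar score."""
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):
        # Average token embeddings, then project to one scalar reward.
        return self.score(self.embed(token_ids).mean(dim=1)).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake preference data: for each prompt, the human preferred the
# "chosen" response over the "rejected" one (token ids are made up).
chosen = torch.randint(0, 1000, (8, 16))    # preferred responses
rejected = torch.randint(0, 1000, (8, 16))  # dispreferred responses

# Bradley-Terry loss: push the chosen score above the rejected score.
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
opt.step()
```

Once trained, this scalar score is what the RL stage maximizes: the main model generates responses, the reward model grades them, and the policy is nudged toward higher-scoring outputs.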