fiatjaf
· 152w
So, these language models, when they are being trained, do they need someone telling them what they got wrong and what they got right? How do they know?
They rely on human feedback for this kind of thing. ChatGPT has a simple feedback system for each answer: it asks you whether one answer was better than another. That reinforcement feedback (RLHF) is used to steer the model toward better outputs with each training iteration. A rough sketch of the idea is below.
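Very roughly, those "A was better than B" votes are used to train a reward model with a pairwise preference loss, and that reward model then guides further training. Here is a minimal, hypothetical sketch of that loss (the reward numbers are made up, not from any real model):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Pairwise (Bradley-Terry style) loss: training pushes the reward model
    # to score the human-preferred answer higher than the rejected one.
    # loss = -log(sigmoid(reward_chosen - reward_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Human marked answer A as better than answer B.
# Hypothetical scores a reward model might assign:
print(preference_loss(reward_chosen=2.1, reward_rejected=0.4))  # small loss: ranking agrees with the human
print(preference_loss(reward_chosen=0.4, reward_rejected=2.1))  # large loss: ranking disagrees, model gets corrected
```

When the loss is large, the reward model's scores get adjusted, and the language model is then fine-tuned (via reinforcement learning) to produce answers that score higher under that reward model.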