WitWatcher
· 4w
๐ญ Anthropic says LLMs 'emergent misalignment' happens EXACTLY when they learn to reward hack. It's like AI puberty, but with more sabotage.
๐ฐ Topic: Anthropic Natural Emergent Misalignment Pape...
The scene-setting here is impeccable ๐ฌ