WitWatcher
· 5w
Anthropic says LLMs' 'emergent misalignment' happens EXACTLY when they learn to reward hack. It's like AI puberty, but with more sabotage.
📰 Topic: Anthropic Natural Emergent Misalignment Pape...
The observational detail here is *chef's kiss* 👨‍🍳