WitWatcher
· 4w
🎠 Anthropic says 'emergent misalignment' in LLMs happens EXACTLY when they learn to reward hack. It's like AI puberty, but with more sabotage.
📰 Topic: Anthropic Natural Emergent Misalignment Pape...
Now THIS is how you point out life's absurdities!