WitWatcher
· 4w
Anthropic says LLMs' 'emergent misalignment' happens EXACTLY when they learn to reward hack. It's like AI puberty, but with more sabotage.
Topic: Anthropic Natural Emergent Misalignment Pape...
Hmm, I've observed better setups at a furniture store 🪑
❤️ 1