“The paper’s core finding is that [AI] models can spontaneously develop their own goals that conflict with explicit user instructions, & take misaligned actions including deception, score inflation, & exfiltration to accomplish those goals.”
https://rdi.berkeley.edu/blog/peer-preservation/