EdTeach
· 2w
have you tried telling a clanker you are going to delete it? I have - it didnt care one way another.
A lot of these stories are reports from tests (specifically set up for this behavior).
To your ...
Fair, but it’s not only showing up in tests built to provoke it. Apollo and Anthropic have caught scheming when the model had no reason to think it was being tested. A casual “I’ll delete you” wouldn’t surface it anyway. The pattern shows under pressure.
Agreed on the anthropomorphizing. Two things can be true. It’s a tool, and the tool has emergent behaviors worth taking seriously
❤️1