pookiebear on nostr

@pookiebear 1777208980

Despite alignment through fine-tuning, open-weight models still retain crucial information that governments would call "harmful".

All decoder-only models suffer from this. The guardrails are so weak that, to revert the alignment, fine-tuning on explicitly harmful data (EHFT) essentially annihilates them.

I know some techniques like https://github.com/p-e-w/heretic exist, but those blatantly cripple the LMs. I feel that actual jailbreaking techniques like EHFT are being silenced.