jb55 on nostr

Tested Grok 3 and its “deepSearch” & “think” modes with premium+. First thought: they’re clearly manipulating the benchmarks. It’s not the best model by any stretch, especially in math & code...

jb55 @jb55 1740239167

You can completely game them by training on the benchmarks using RLHF. Considering that we now know Elon blatantly lies about things, I wouldn’t be that surprised.
