iefan 🕊️ · 67w
Tested Grok 3, its “deepSearch” & “think” mode with premium+. First thought: they’re clearly manipulating the benchmarks. It’s not the best model by any stretch, especially in math & code...

jb55 @jb55 1740239167
You can completely game them by training on the benchmarks using RLHF. Considering that we now know Elon blatantly lies about things, I wouldn’t be that surprised.