jack on nostr

not sure how well the benchmarks reflect everyday use, but the fact that a model as small as 4B can score higher than GPT-4o (more or less ChatGPT’s “main” model for a while) is CRAZY!

gonna try to squeeze this onto my Mac mini (M2, 8 GB RAM / 256 GB SSD 😅)

[alt 1] Tweet from Simon Willison (@simonw): “Qwen3.5 4B apparently out-scores GPT-4o on some of the classic benchmarks (!)”, quoting a tweet showing the 4-billion-parameter Qwen3.5 model scoring higher than GPT-4o on 5 of 7 benchmarks.