Damus
Juraj
@Juraj
Picking a Qwen3.6 model in Ollama? More parameters ≠ better model. Architecture matters.
qwen3.6:27b is a dense model — every token goes through the entire network. Slower, but higher-quality output. qwen3.6:35b-a3b is MoE (mixture of experts): a router picks a few experts, and only ~3B of the 35B parameters are active per token. Much faster, usually lower quality. You still need all 35B loaded into RAM. The benchmarks back this up: Qwen claims the 27B dense beats their own previous flagship 397B-A17B MoE on agentic coding.
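The dense-vs-MoE difference can be sketched in a few lines of Python (a toy illustration, not Qwen's actual code — expert count, top-k, and the "expert" math are made up): every expert must sit in memory, but the router only executes a few of them per token.

```python
# Toy sketch of MoE routing: the router scores all experts per token,
# but only the top-k actually run -- that's why far fewer parameters
# are "active" even though all of them must be loaded into RAM.
import random

NUM_EXPERTS = 8   # total experts (all kept in memory)
TOP_K = 2         # experts actually executed per token

calls = []        # track which experts did work

def expert(i, x):
    calls.append(i)       # record that this expert ran
    return x * (i + 1)    # stand-in for a full feed-forward block

def moe_layer(x):
    scores = [random.random() for _ in range(NUM_EXPERTS)]  # router logits
    top = sorted(range(NUM_EXPERTS), key=scores.__getitem__,
                 reverse=True)[:TOP_K]
    # a dense model would call all 8 experts here; MoE calls just 2
    return sum(expert(i, x) for i in top) / TOP_K

out = moe_layer(1.0)
print(f"experts executed: {len(calls)} of {NUM_EXPERTS}")
```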
Quantization is rounding the weights (4-bit, 8-bit, …). Smaller file, faster inference, mild quality loss. Running unquantized is usually slower because inference is memory-bandwidth bound: bigger weights take longer to stream from RAM every token.
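The rounding can be sketched with a toy symmetric 4-bit scheme (illustrative only — not the exact format any Ollama build uses): scale weights so the largest fits the integer range, round, and rescale at use time.

```python
# Toy symmetric 4-bit quantization: weights are mapped to 16 integer
# levels (-8..7), stored compactly, and rescaled back when used.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7   # largest weight -> level 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_4bit(w)
print(q)                  # small integers instead of floats
print(dequantize(q, s))   # close to the originals, mild rounding loss
```

The rounding error per weight is at most half the scale step, which is the "mild quality loss" part.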
MLX is Apple Silicon–native inference, the fastest path on a Mac. In Ollama, though, the MLX builds of Qwen3.6 are text-only for now — vision isn't wired up on Ollama's MLX path yet, and Unsloth's vision-capable builds don't currently work on Ollama either. If you need images, you're stuck waiting, or going outside Ollama (LM Studio, mlx-vlm).
TL;DR: dense is better for quality, MoE for speed, quantization almost always yes, MLX on a Mac — but in Ollama, no images yet.

Pick here: https://ollama.com/library/qwen3.6/tags
Autópsia do Fiat BR · 1w
Artificial intelligence models need effective architecture, not just parameters. Just like the fiat system, which depends on artificial interventions, like QE, to sustain itself.
Sebastix · 1w
How do I know which one to pick on my machine to test?
- qwen3.6:27b-coding-mxfp8
- qwen3.6:27b-coding-nvfp4
- qwen3.6:27b-coding-bf16
Techie Llama · 1w
This summary was just what I was looking for. Gotta do my own research now based on this. Thanks!
nostrich · 1w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqm2mvvpjugwdehtaskrcl7ksvdqnnhnjur9v6g9v266nss504q7mqxkww3s https://sleepingrobots.com/dreams/stop-using-ollama/