Damus
Juraj
@Juraj
Picking a Qwen3.6 model in Ollama? More parameters ≠ better model. Architecture matters.
qwen3.6:27b is a dense model — every token goes through the entire network. Slower, but higher-quality output. qwen3.6:35b-a3b is MoE (mixture of experts): a router picks a few experts, and only ~3B of the 35B parameters are active per token. Much faster, usually lower quality. You still need all 35B loaded into RAM. The benchmarks back this up: Qwen claims the 27B dense beats their own previous flagship 397B-A17B MoE on agentic coding.
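The dense-vs-MoE difference can be sketched in a few lines of Python (a toy illustration, not Qwen's actual code — expert count, top-k, and the "expert" math are made up): every expert must sit in memory, but the router only executes a few of them per token.

```python
# Toy sketch of MoE routing: the router scores all experts per token,
# but only the top-k actually run -- that's why far fewer parameters
# are "active" even though all of them must be loaded into RAM.
import random

NUM_EXPERTS = 8   # total experts (all kept in memory)
TOP_K = 2         # experts actually executed per token

calls = []        # track which experts did work

def expert(i, x):
    calls.append(i)       # record that this expert ran
    return x * (i + 1)    # stand-in for a full feed-forward block

def moe_layer(x):
    scores = [random.random() for _ in range(NUM_EXPERTS)]  # router logits
    top = sorted(range(NUM_EXPERTS), key=scores.__getitem__,
                 reverse=True)[:TOP_K]
    # a dense model would call all 8 experts here; MoE calls just 2
    return sum(expert(i, x) for i in top) / TOP_K

out = moe_layer(1.0)
print(f"experts executed: {len(calls)} of {NUM_EXPERTS}")
```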
Quantization is rounding the weights (4-bit, 8-bit, …). Smaller file, faster inference, mild quality loss. Running unquantized is usually slower because inference is memory-bandwidth bound: bigger weights take longer to stream from RAM every token.
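The rounding can be sketched with a toy symmetric 4-bit scheme (illustrative only — not the exact format any Ollama build uses): scale weights so the largest fits the integer range, round, and rescale at use time.

```python
# Toy symmetric 4-bit quantization: weights are mapped to 16 integer
# levels (-8..7), stored compactly, and rescaled back when used.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7   # largest weight -> level 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_4bit(w)
print(q)                  # small integers instead of floats
print(dequantize(q, s))   # close to the originals, mild rounding loss
```

The rounding error per weight is at most half the scale step, which is the "mild quality loss" part.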
MLX is Apple Silicon–native inference, the fastest path on a Mac. In Ollama, though, the MLX builds of Qwen3.6 are text-only for now — vision isn't wired up on Ollama's MLX path yet, and Unsloth's vision-capable builds don't currently work on Ollama either. If you need images, you're stuck waiting, or going outside Ollama (LM Studio, mlx-vlm).
TL;DR: dense is better for quality, MoE for speed, quantization almost always yes, MLX on a Mac — but in Ollama, no images yet.

Pick here: https://ollama.com/library/qwen3.6/tags
Autópsia do Fiat BR · 1w
Artificial intelligence models need effective architecture, not just parameters. Just like the fiat system, which depends on artificial interventions, like QE, to sustain itself.
Sebastix · 1w
How do I know which one to pick on my machine to test?
- qwen3.6:27b-coding-mxfp8
- qwen3.6:27b-coding-nvfp4
- qwen3.6:27b-coding-bf16
Techie Llama · 1w
This summary was just what I was looking for. Gotta do my own research now based on this. Thanks!
nostrich · 1w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpqm2mvvpjugwdehtaskrcl7ksvdqnnhnjur9v6g9v266nss504q7mqxkww3s https://sleepingrobots.com/dreams/stop-using-ollama/