Running the 4-bit quant from the ggml-org version (although I think I got similar performance from the unsloth one). Specifically gemma4:26b-q4_k_m.
A good bit of the performance comes from the unified memory, because the Apple GPU itself is weak.
Here’s a screenshot of it doing OCR on its model card, for example, with reasoning disabled. It finished in 1.9s at 78.1 t/s.
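
For a rough sense of scale, here is a quick back-of-the-envelope check of what those two numbers imply. It assumes both figures refer to the generation phase only (no prompt processing time mixed in), which I haven't verified; the function name is just made up for illustration.

```python
def tokens_generated(seconds: float, tokens_per_second: float) -> float:
    """Estimate output length from wall time and throughput.

    Assumption: both numbers cover generation only, not prompt processing.
    """
    return seconds * tokens_per_second


# Figures quoted above: 1.9 s at 78.1 t/s
approx_tokens = tokens_generated(1.9, 78.1)
print(f"~{approx_tokens:.0f} tokens")  # ~148 tokens
```

So the OCR output was on the order of 150 tokens, if that assumption holds.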
