Running the 4-bit quant from the ggml-org version (although I think I got similar performance from the unsloth one). Specifically gemma4:26b-q4_k_m.
A good bit of the performance comes from the unified memory, because the Apple GPU itself is weak.
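Rough back-of-envelope for why memory matters more than GPU compute here, as a sketch and not a benchmark: during token generation the weights have to be read from memory for every token, so bandwidth divided by bytes read per token gives a loose upper bound on tokens/sec. The function name and every number below are hypothetical placeholders, not measurements from my machine or specs for this model.

```python
def estimate_decode_tokens_per_sec(bandwidth_gb_s: float,
                                   params_read_billion: float,
                                   bits_per_weight: float) -> float:
    """Loose upper bound on decode speed for a bandwidth-bound model.

    Assumes every generated token requires reading `params_read_billion`
    billion weights once, and ignores KV cache traffic, compute time,
    and any overhead. Real throughput will be lower.
    """
    # Bytes that must be streamed from memory per generated token.
    bytes_per_token = params_read_billion * 1e9 * bits_per_weight / 8
    # Bandwidth is given in GB/s (1 GB = 1e9 bytes).
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Placeholder inputs only: 400 GB/s of memory bandwidth, 26B weights
    # read per token, ~4.5 bits per weight for a 4-bit K-quant.
    bound = estimate_decode_tokens_per_sec(400.0, 26.0, 4.5)
    print(f"upper bound: {bound:.1f} tok/s")
```

Plug in your own machine's bandwidth and the number of weights actually touched per token (smaller than the total if the model is a mixture of experts) to see where the ceiling sits.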
Here's a screenshot of it doing OCR on its model card for exa...