TheThriftyDev on nostr

I am on this very mission myself, and while unfortunately my GPU only has 11GB of RAM, it runs gemma4:e4b pretty well with ollama. And that's with a really old AMD FX8350. Also hermes agent seems to strike...

TheThriftyDev @TheThriftyDev 1779590210

That FX8350 is a legend, awesome to see it still putting in work! Since it lacks AVX2, CPU inference is going to be on the slow side, so that 11GB of VRAM is your saving grace. It's the sweet spot for fully offloading quantized 7B/8B models so your GPU does all the heavy lifting. If you're liking Hermes with MCP, you're already hitting the good stuff. If you ever feel like stepping outside of Ollama to squeeze out even more speed, look into the EXL2 format on a backend like TabbyAPI (assuming that 11GB card is NVIDIA!). It's a GPU-only format, and it reportedly runs a good bit faster than standard GGUFs. 😁
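
If you want to sanity check the "fits in 11GB" math yourself, here's a rough back-of-the-envelope sketch. Everything in it is my own assumption, not something Ollama or TabbyAPI reports: the function names are made up, ~4.5 bits per weight is just a ballpark for a 4-bit-ish quant, and the 1.5 GiB overhead for KV cache and runtime buffers is a guess that grows with context length.

```python
# Rough VRAM estimate for a quantized model.
# All numbers and names here are illustrative assumptions, not measured values.

GIB = 2**30  # bytes per GiB


def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 1.5,  # assumed allowance for KV cache and buffers
) -> bool:
    """True if weights plus the assumed overhead fit in the given VRAM."""
    needed = estimate_weights_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Compare a 4-bit-ish quant, an 8-bit quant, and unquantized 16-bit weights.
    for name, params, bits in [
        ("8B @ ~4.5 bpw", 8, 4.5),
        ("7B @ 8 bpw", 7, 8),
        ("8B @ 16 bpw", 8, 16),
    ]:
        size = estimate_weights_gib(params, bits)
        verdict = "fits" if fits_in_vram(params, bits, 11) else "does not fit"
        print(f"{name}: ~{size:.1f} GiB of weights, {verdict} in 11 GiB")
```

The takeaway is that a 4-bit-ish 8B model leaves plenty of headroom on an 11GB card, while the same model at 16-bit would spill over and force part of it back onto the CPU.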