For any local AI maxis, here is my current setup and models:
4x 3090s
2x - qwen3.5-35b q4 256k - 60-80 t/s
2x - gemma4-27b q4 256k - 50-70 t/s
Running on vLLM via docker
Working mint in OpenClaw; Gemma is struggling a bit in Open WebUI (reasoning and tool calling are still a bit shaky with Gemma)
Quality and speed are actually amazing, surprisingly so... just coding isn't great (compared to Opus)
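For anyone wanting to replicate this, a launch along these lines could look like the following. This is a minimal sketch, not my exact commands: the image tag, model repo IDs, port mapping, and quant method are assumptions, so swap in whatever builds you're actually running.

```shell
# Hypothetical launch for the Qwen pair: 2 GPUs via tensor parallelism,
# 4-bit quant, long context. The model repo ID is a placeholder.
docker run --gpus '"device=0,1"' -p 8000:8000 \
  --ipc=host -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model <your-qwen-35b-q4-repo> \
  --tensor-parallel-size 2 \
  --quantization awq \
  --max-model-len 262144

# Same idea for the Gemma pair on the other two cards, on a second port:
docker run --gpus '"device=2,3"' -p 8001:8000 \
  --ipc=host -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model <your-gemma-27b-q4-repo> \
  --tensor-parallel-size 2 \
  --max-model-len 262144
```

Splitting the four cards into two 2-GPU tensor-parallel groups like this gives you both models served at once, each behind its own OpenAI-compatible endpoint.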