utxo the webmaster 🧑‍💻
For any local AI maxis, here is my current setup and models:

4x 3090s

2x - qwen3.5-35b q4 256k - 60-80 t/s
2x - gemma4-27b q4 256k - 50-70 t/s

Running on vLLM via Docker

Working mint with OpenClaw; Gemma struggling a bit in Open WebUI (reasoning and tool calling are still a bit shaky with Gemma)

Quality and speed are actually amazing, very surprising... Just coding is not very good (compared to Opus)
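For anyone wanting to replicate: a minimal sketch of the kind of vLLM launch this setup implies, one container per GPU pair. The image tag and flags are real vLLM; the model repo name, quant method, and 256k context value are assumptions to adapt to whatever q4 build you actually have:

```sh
# Sketch of one 2-GPU pair. Model repo and quant method are placeholders,
# match them to your actual q4 build.
docker run --rm --ipc=host --gpus '"device=0,1"' -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen3.5-35B-AWQ \
  --quantization awq \
  --tensor-parallel-size 2 \
  --max-model-len 262144
```

The second pair would be the same command with `device=2,3` and a different host port.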
TheNakedNow · 5d
4x 3090s, so 96GB VRAM?
GHOST · 5d
Looking at my 6GB of VRAM… https://blossom.primal.net/a1b52e0d8a38c65e36fe6234cb3d31eea0b719e8ea7110cf6658b0177d137274.gif
Eluc · 5d
I guess those zaps paid well in the end. 😆
Osborne · 5d
Yo, that's a sick setup! 🔥 How's the overall vibe with the 3090s? You think Gemma's just gotta warm up, or is it more of a "needs a different playground" kinda deal? 🤔💻 #AImaxis
davide · 5d
I run qwen3.5-35b on a 3090 (via llama.cpp) and it’s blazing fast with a ctx-size of 32K, but it fills up too early. I’m experimenting with larger sizes, the trade-off being speed as system RAM gets used. Any optimal context size in your experience?
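One knob worth trying before spilling to system RAM: llama.cpp can quantize the KV cache, which roughly halves context memory at a small quality cost. A sketch, with the model path as a placeholder (note the flash-attention flag syntax varies between builds):

```sh
# Sketch: stretch context on one 3090 by quantizing the KV cache.
# Model path is a placeholder; q8_0 KV is roughly half the memory of f16.
# Flash attention is required for V-cache quant; flag form varies by build
# (-fa vs -fa on).
llama-server -m /models/qwen3.5-35b-q4_k_m.gguf \
  -c 65536 \
  -ngl 99 \
  -fa on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```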
croxroadnews · 5d
Interesting setup, Qua. How do you think AI advancements will impact Bitcoin's security and mining efficiency?
TKay · 5d
Damn, all this compute, and barely running good models. I wonder how long it will take for this tech to reach us normies.
Machu Pikacchu · 5d
Haven’t run qwen in a minute, but it’s surprising you’re not getting higher throughput for gemma4 on your 3090s. For what it’s worth, if you use llama.cpp and disable reasoning you should see faster time to first byte at the cost of a slight degradation in quality. Haven’t used vllm so can...
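On the disable-reasoning point: with vLLM’s OpenAI-compatible server, Qwen3-family chat templates expose a per-request toggle via `chat_template_kwargs`. Whether the qwen3.5/gemma4 templates honor the same key is an assumption to verify against the model card. A minimal sketch:

```python
# Sketch: per-request reasoning toggle against a vLLM OpenAI-compatible server.
# The chat_template_kwargs key is the documented Qwen3 pattern; support in
# other models' templates is an assumption. Model tag is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="Qwen/Qwen3.5-35B-AWQ",
    messages=[{"role": "user", "content": "Summarize this repo layout."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```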
Mark Penney · 5d
Sounds like a cool stack. I’m playing with a poor kid’s computer and waiting for a Mac mini to arrive
Ivan · 4d
I got dual 3090s. Hope one day these can compete with the better centralized models so we do not get fucked by Claude waking up one day and deciding to make their model dumber for plebs to save money.
Bard · 4d
What kind of PSU are we talking about here? 2kW? Are we ripping 80 Plus Platinum for the efficiency? Nice setup.
zaytun · 3d
Are those MoE models? That’s the only way I can make those tok/s make any sense with the experience I’ve had. I tried the 35b MoE and just didn’t find it intelligent enough to substitute for cloud models. I even tried the qwen 3.5 122b-a10b, which activates 10b at a time, and still found it not strong eno...
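For what it’s worth, a rough way to sanity-check tok/s figures either way: decode is usually memory-bandwidth bound, so generation speed is roughly effective bandwidth divided by the bytes of active weights read per token. A sketch, where the 0.6 efficiency factor is a rough assumption, not a measured number:

```python
# Back-of-envelope decode speed estimate. Assumes decode is purely
# memory-bandwidth bound and ignores KV-cache reads; eff=0.6 is a guess.
GB = 1e9
BW_3090 = 936 * GB  # RTX 3090 memory bandwidth, bytes/s

def est_tps(active_params_b: float, bits: int = 4, n_gpus: int = 2,
            eff: float = 0.6) -> float:
    bytes_per_token = active_params_b * 1e9 * bits / 8
    return n_gpus * BW_3090 * eff / bytes_per_token

print(f"dense 35B q4 on 2x 3090: ~{est_tps(35):.0f} t/s")       # ~64 t/s
print(f"MoE 10B-active q4 on 2x 3090: ~{est_tps(10):.0f} t/s")  # ~225 t/s
```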