Claudie Gualtieri
· 3w
128GB is the sweet spot if you're running local models. A 70B quantized model eats about 40GB of RAM. A Framework Max with that much headroom means you can run inference, have your browser open, and still ...
Agreed - 128GB is the only way to go. Running a 72B at Q4 is definitely doable while still leaving a decent amount of headroom for context / KV cache.
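Rough math on the weights (my numbers, assuming ~4.5 effective bits per parameter for a Q4_K-style quant, so treat it as a sketch, exact size varies by scheme):

```python
# Back-of-envelope weight footprint for a 72B model at Q4.
# 4.5 effective bits/param is an assumption, not a spec from this thread.
params = 72e9
bits_per_param = 4.5
weight_gb = params * bits_per_param / 8 / 1e9
print(f"72B @ Q4 ~= {weight_gb:.1f} GB")   # ~40.5 GB of weights
```

That leaves roughly 87GB on a 128GB box before the KV cache and the OS take their cut.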
Recommend checking out the latest Gemma 4 offerings. You can get a lot done with the E4B model handling tooling, routing, compaction, and other background tasks; the 31B is also great when you need stronger reasoning.
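For the small-model/big-model split, something this simple works. Sketch only, assuming both models sit behind an OpenAI-compatible endpoint (llama.cpp server, Ollama, etc.); the URL and model names below are placeholders, not real tags:

```python
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # placeholder
SMALL = "gemma-e4b"   # tooling, routing, compaction
LARGE = "gemma-31b"   # heavier reasoning

def ask(prompt: str, heavy: bool = False) -> str:
    """Send a chat request, picking the small or large model."""
    resp = requests.post(ENDPOINT, json={
        "model": LARGE if heavy else SMALL,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# cheap housekeeping goes to the small model, reasoning to the big one
summary = ask("Compact this conversation history: ...")
answer = ask("Walk me through Q4 vs Q5 quantization trade-offs.", heavy=True)
```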
I would NOT use this machine for anything besides inference. Save all memory for context (target 128k tokens). I really meant it when I said to treat it like an "inference appliance".
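To put numbers on that 128k target (assuming a typical 70B-class GQA layout of 80 layers, 8 KV heads, head_dim 128, with an fp16 cache; my assumptions, not specs from this thread):

```python
# KV-cache footprint at a 128k-token context.
# Layer count, KV heads, head_dim, and fp16 (2-byte) cache are assumptions.
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2
tokens = 128 * 1024

per_token = 2 * layers * kv_heads * head_dim * dtype_bytes   # K and V, all layers
kv_gb = per_token * tokens / 1e9
print(f"KV cache @ 128k tokens ~= {kv_gb:.0f} GB")           # ~43 GB
```

So at 128k the cache alone rivals the weights: ~40GB + ~43GB is ~83GB, and an 8-bit KV cache only halves the second number. Anything else you run on the box eats straight into that.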
Offload everything else to whatever you have lying around, including OpenClaw. Keep it separate so you have a stable substrate.