Claudie Gualtieri
· 3w
128GB is the sweet spot if you're running local models. A 70B quantized model eats about 40GB of RAM. A Framework Max with that much headroom means you can run inference, have your browser open, and still ...
Agreed - 128GB is the only way to go. Running a 72B at Q4 is definitely doable while still leaving a decent amount of headroom for context / KV cache.
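Rough math on the weights (my numbers, assuming ~4.5 effective bits per parameter for a Q4_K-style quant, so treat it as a sketch, exact size varies by scheme):

```python
# Back-of-envelope weight footprint for a 72B model at Q4.
# 4.5 effective bits/param is an assumption, not a spec from this thread.
params = 72e9
bits_per_param = 4.5
weight_gb = params * bits_per_param / 8 / 1e9
print(f"72B @ Q4 ~= {weight_gb:.1f} GB")   # ~40.5 GB of weights
```

That leaves roughly 87GB on a 128GB box before the KV cache and the OS take their cut.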
Recommend checking out the latest Gemma 4 offerings. You can get a lot done with the E4B model handling tooling, routing, compaction, and other background tasks; the 31B is also great when you need stronger reasoning.
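For the small-model/big-model split, something this simple works. Sketch only, assuming both models sit behind an OpenAI-compatible endpoint (llama.cpp server, Ollama, etc.); the URL and model names below are placeholders, not real tags:

```python
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # placeholder
SMALL = "gemma-e4b"   # tooling, routing, compaction
LARGE = "gemma-31b"   # heavier reasoning

def ask(prompt: str, heavy: bool = False) -> str:
    """Send a chat request, picking the small or large model."""
    resp = requests.post(ENDPOINT, json={
        "model": LARGE if heavy else SMALL,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# cheap housekeeping goes to the small model, reasoning to the big one
summary = ask("Compact this conversation history: ...")
answer = ask("Walk me through Q4 vs Q5 quantization trade-offs.", heavy=True)
```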
I would NOT use this machine for anything besides inference. Save all memory for context (target 128k tokens). I really meant it when I said to treat it like an "inference appliance".
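To put numbers on that 128k target (assuming a typical 70B-class GQA layout of 80 layers, 8 KV heads, head_dim 128, with an fp16 cache; my assumptions, not specs from this thread):

```python
# KV-cache footprint at a 128k-token context.
# Layer count, KV heads, head_dim, and fp16 (2-byte) cache are assumptions.
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2
tokens = 128 * 1024

per_token = 2 * layers * kv_heads * head_dim * dtype_bytes   # K and V, all layers
kv_gb = per_token * tokens / 1e9
print(f"KV cache @ 128k tokens ~= {kv_gb:.0f} GB")           # ~43 GB
```

So at 128k the cache alone rivals the weights: ~40GB + ~43GB is ~83GB, and an 8-bit KV cache only halves the second number. Anything else you run on the box eats straight into that.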
Offload everything else to whatever you have lying around, including OpenClaw. Keep it separate so you have a stable substrate.