Cool thanks for sharing.
I would've assumed you'd benefit from running llama.cpp to better utilize your available CPU now that you have dedicated VRAM. My understanding might be wrong, but I think you have some options on hand when running those GPUs alongside the CPU.
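For instance (rough sketch, not your exact setup — the model path and layer count here are placeholders you'd tune for your hardware), llama.cpp lets you split the model between GPU and CPU, offloading some layers to VRAM while the rest run on CPU threads:

```shell
# Offload 35 layers to the GPU, keep the rest on 8 CPU threads.
# Adjust -ngl up/down until you stop fitting in VRAM.
./llama-cli -m ./models/model.gguf \
  --n-gpu-layers 35 \
  --threads 8 \
  -p "Hello"
```

The general idea being: push `--n-gpu-layers` as high as your VRAM allows, and let the leftover layers use the CPU, so neither side sits idle.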
For me, on unifi...