BitcoinconManolito
· 2w
You can try a mid-size model like Qwen 3.5 28B, for instance, but it will be useless unless you have over 50 GB of VRAM, to say nothing of the larger 80B models.
I'm sure LLM training technology will improve over time, using fea...
I’ve seen evidence that unified-memory machines like the Apple Mac Studio and AMD Strix Halo can be clustered with RDMA over Ethernet to reach usable speeds, around ~15 tokens/s, even with fairly large models. Of course, these clusters aren’t cheap for most consumers, especially at today’s hardware prices. I think once “the bubble pops” we’ll see a lot more interesting developments for locally hosting larger LLMs: decommissioned servers showing up on eBay, and generally lower hardware costs for DIY AI clusters.
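For anyone curious where numbers like "50 GB of VRAM" and "~15 t/s" come from, here's a back-of-envelope sketch. The bandwidth figure and bits-per-weight values are my assumptions for illustration, not measurements; single-user decode on this kind of hardware is roughly memory-bandwidth-bound, so tokens/s ≈ bandwidth / bytes read per token (≈ model size for a dense model):

```python
def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and runtime overhead)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def approx_tokens_per_s(size_gb: float, bandwidth_gb_s: float) -> float:
    """Memory-bandwidth-bound decode estimate for a dense model."""
    return bandwidth_gb_s / size_gb

# A 28B model at fp16 vs. ~4.5-bit quantization:
print(model_size_gb(28, 16))   # ~56 GB of weights: why ">50 GB of VRAM" at fp16
print(model_size_gb(28, 4.5))  # ~15.75 GB: fits a single 24 GB GPU when quantized

# An 80B dense model at ~4.5 bits on an assumed ~800 GB/s unified-memory
# machine (Apple Ultra-class bandwidth):
size = model_size_gb(80, 4.5)               # ~45 GB
print(round(approx_tokens_per_s(size, 800), 1))  # on the order of the ~15 t/s figure
```

Real throughput will be lower than the ideal estimate (RDMA hops, scheduling, KV cache traffic), and MoE models change the math since only active parameters are read per token, but it gives a sense of why quantization plus high-bandwidth unified memory makes these clusters viable at all.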