I’ve seen evidence that unified-memory machines like the Apple Mac Studio and AMD Strix Halo systems can be clustered with RDMA over Ethernet to reach usable speeds, around 15 t/s with fairly large models. Of course, these clusters are not cheap for most consumers, especially at today’s hardware pric...