nicodemus · 4w
This is true: GPUs are faster for inference. But you'll also be consuming 1500 watts, have to deal with the thermal issues, and still struggle to fit a model larger than 32B with decent quantization. Alternatively, the 395 chips and their NPU are doing pretty well. Combine 2 of them and you're lo...
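The "larger than 32B" ceiling falls out of simple memory arithmetic. A minimal sketch of that back-of-envelope math (my own illustration, not from the comment; the 10% runtime overhead is an assumption, and the KV cache is ignored):

```python
# Rough estimate of weight memory at common quantization levels,
# to see why ~32B is about the ceiling for a single consumer GPU.

def weights_gb(params_b: float, bits_per_weight: float, overhead: float = 1.1) -> float:
    """Approximate weight memory in GB; `overhead` covers runtime buffers (assumed 10%)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

for params in (7, 14, 32, 70):
    for bits in (4, 5, 8):
        print(f"{params}B @ {bits}-bit ~= {weights_gb(params, bits):.1f} GB")
```

At 4-bit, 32B weights come to roughly 18 GB, just squeezing into a 24 GB card before the KV cache eats the rest; a 70B model at the same quantization needs close to 40 GB and simply doesn't fit, which matches the comment's point.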