@:blank: doesn't seem like a good explanation to me. good inference is already free or close to it if you go remote and ignore the americans (opencode free models, glm, deepseek). 'rich nerds' are the only ones who actually have a 4090/5090 with enough vram to run this stuff. why would 'they' (me/kaia) be fine with broken inference on every model that isn't half a year old?
still in awe at how bad llama.cpp / ollama / vllm / lm studio / literally everything is. how can there still be a market for "good inference software that actually works"? who's gonna be the first?
pretty impressed with minimax m3. it seems better than gpt 5.5 at UI/UX work, and the basic programming / debugging capabilities are also very good. seems better than glm-5.1 to me, at least. the context window is a huge 1 million tokens, and it's multimodal, so you can tell it to take a look at a website it wrote and fix the visual bugs.
full prompt: "Can you write a 3d solar system animation / simulation with three js? the planets should look as realistic as possible, so try to find some public textures for them online."