Crazy, man... it's all moving so fast. DDTree coming up too, whatever the heck that means.
Speculative decoding: a small, fast draft model guesses the next few tokens the big model would produce, and the big model verifies those guesses in a single pass, keeping however many it agrees with.
My ELI5 barely does it justice, but yeah, I think that's roughly what it is!
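A rough sketch of the draft-and-verify idea, with toy stand-in "models" (just functions picking from a tiny vocabulary, not real LLMs) so it runs anywhere; the names and the 80% agreement rate are made up for illustration:

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def target_next(context):
    # The big, slow model: deterministic next token (toy rule).
    return VOCAB[len(context) % len(VOCAB)]

def draft_next(context):
    # The small, fast draft model: agrees with the target ~80% of the time.
    guess = target_next(context)
    return guess if random.random() < 0.8 else random.choice(VOCAB)

def speculative_step(context, k=4):
    # 1) Draft model cheaply guesses k tokens ahead, one at a time.
    draft = []
    for _ in range(k):
        draft.append(draft_next(context + draft))
    # 2) Target model checks all k guesses at once (one big pass in
    #    a real system); keep the matching prefix.
    accepted = []
    for tok in draft:
        if target_next(context + accepted) == tok:
            accepted.append(tok)
        else:
            # First mismatch: take the target's own token and stop.
            accepted.append(target_next(context + accepted))
            break
    return accepted

context = []
while len(context) < 12:
    context += speculative_step(context)

print(context[:12])
```

The output is always exactly what the target model alone would produce; the speedup comes from the target validating several draft tokens per pass instead of generating one at a time.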
It's crazy to feel the speed at which the software behind local LLMs is being optimized. It means a machine bought today that can run x-billion-parameter models should be able to handle x+y billion parameters in two months.