Damus
note1r2qz3...
zaytun profile picture
Crazy man.. its all moving so fast. DDTree coming up too, whatever the heck that means.

Speculative decoding that optimizes guessing which draft a model has created as response to a prompt that will best answer the question.

My ELI5 sentence there doesnt even make sense but yeah i think thats somewhat what it is!

Its crazy to feel the speed at which the software behind local LLM is being optimized. It means buying a machine today that can run x billion parameter models should be able to do x+y billion parameters in two months.