> because they're not like alphazero, they just spit words
This seems to be changing right now: here's a recent paper on, roughly speaking, AlphaZero-like research that specifically uses coding problems. They have the model learn from experience rather than from traditional datasets.
https://arxiv.org/abs...