Damus
julesh
@julesh
It's weirdly difficult to find good sources that explain how transformers and attention actually work, considering how important they are. Everybody just seems to repeat a variant of this diagram from the original paper, as though it actually explains anything.

Jade Master · 23w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq6hx49ptzmru6djkmefdzdtx8dep9jup92hf8sngjcxk2h9h7msysa03n8d it's just a string diagram in Para(SLens) or something right?