julesh · 23w
It's weirdly difficult to find good sources that explain how transformers and attention actually work, considering how important they are. Everybody just seems to repeat a variant of this diagram from...

Jade Master @Jade Master 1768392745
@nprofile1q... it's just a string diagram in Para(SLens) or something, right?