Damus

Recent Notes

julesh
More than 6 years after noticing it, I have *finally* understood the Bayesian chain rule (*) intuitively

If you have conditional distributions P(y | x) and P(z | y), the Bayesian chain rule really just says
P(x | z) = ∫dy P(y | z) P(x | y)
Only if you expand P(y | z) using Bayes' law do you get the term P(y), which is the pushforward of the prior P(x) along P(y | x). It's exactly like writing the differential chain rule in Leibniz notation,
dz/dx = dy/dx dz/dy
which similarly suppresses the pushforward (the point f(x) at which g' is evaluated) that the Lagrange/Jacobi notation makes explicit:
(g ∘ f)'(x) = f'(x) g'(f(x))

* https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2023.24
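A minimal sketch, not taken from the linked paper, that checks the identity above numerically. It assumes hypothetical two-valued X, Y, Z with made-up distributions and uses exact fractions so the two sides can be compared with `==`. The helper names `pushforward`, `bayes_invert` and `compose` are illustrative only. Both sides are built from nothing but the prior P(x) and the two channels; P(y) enters exactly where P(y | z) is obtained by Bayes' law.

```python
from fractions import Fraction as F

# Hypothetical toy data: X, Y, Z each take values {0, 1}.
prior_x = [F(1, 3), F(2, 3)]          # P(x)
p_y_given_x = [[F(3, 4), F(1, 4)],    # P(y | x), stored [x][y]
               [F(1, 5), F(4, 5)]]
p_z_given_y = [[F(9, 10), F(1, 10)],  # P(z | y), stored [y][z]
               [F(1, 2), F(1, 2)]]


def pushforward(prior, channel):
    """P(b) = sum_a P(b | a) P(a): push a prior through a channel."""
    return [sum(channel[a][b] * prior[a] for a in range(len(prior)))
            for b in range(len(channel[0]))]


def bayes_invert(prior, channel):
    """Bayes' law P(a | b) = P(b | a) P(a) / P(b), stored [b][a]."""
    marginal = pushforward(prior, channel)
    return [[channel[a][b] * prior[a] / marginal[b] for a in range(len(prior))]
            for b in range(len(marginal))]


def compose(first, second):
    """P(c | a) = sum_b P(b | a) P(c | b), with first[a][b] and second[b][c]."""
    return [[sum(first[a][b] * second[b][c] for b in range(len(second)))
             for c in range(len(second[0]))]
            for a in range(len(first))]


# Left-hand side: invert the composite channel P(z | x) against the prior P(x).
lhs = bayes_invert(prior_x, compose(p_y_given_x, p_z_given_y))  # P(x | z), [z][x]

# Right-hand side: the Bayesian chain rule.
prior_y = pushforward(prior_x, p_y_given_x)       # pushforward prior P(y)
p_x_given_y = bayes_invert(prior_x, p_y_given_x)  # P(x | y), stored [y][x]
p_y_given_z = bayes_invert(prior_y, p_z_given_y)  # P(y | z), stored [z][y]
rhs = compose(p_y_given_z, p_x_given_y)           # sum_y P(y | z) P(x | y), [z][x]

assert lhs == rhs
for z, row in enumerate(lhs):
    print(f"P(x | z={z}) =", [str(p) for p in row])
```

Inverting P(z | y) against the original prior P(x) instead of `prior_y` would generally break the equality, which is the sense in which the Leibniz-style formula hides the pushforward.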
julesh
Downside of LLM culture that I have not seen mentioned anywhere before: The best way to understand something is to explain it to someone else, and people don't ask other people to explain things anymore
Bartosz Milewski · 19w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq6hx49ptzmru6djkmefdzdtx8dep9jup92hf8sngjcxk2h9h7msysa03n8d Unfortunately Macron is very unpopular in France and people predict that RN, the party of Marine Le Pen, may win the next elections. The main problem with unifying Europe is the lopsi...
julesh
The hardest problem in computer science is naming things; therefore, Nicolaas de Bruijn solved the hardest problem in computer science
julesh
It's weirdly difficult to find good sources that explain how transformers and attention actually work, considering how important they are. Everybody just seems to repeat a variant of this diagram from the original paper, as though it actually explains anything

Jade Master · 22w
nostr:nprofile1qy2hwumn8ghj7un9d3shjtnyd968gmewwp6kyqpq6hx49ptzmru6djkmefdzdtx8dep9jup92hf8sngjcxk2h9h7msysa03n8d it's just a string diagram in Para(SLens) or something right?