More than 6 years after noticing it, I have *finally* understood the Bayesian chain rule (*) intuitively.
If you have a prior P(x) and conditional distributions P(y | x) and P(z | y), the Bayesian chain rule really just says
P(x | z) = ∫dy P(y | z) P(x | y)
Only when you expand P(y | z) using Bayes' law, P(y | z) = P(z | y) P(y) / P(z), does the term P(y) appear, and P(y) = ∫dx P(y | x) P(x) is the pushforward of the prior. It's exactly like writing the differential chain rule in Leibniz notation,
dz/dx = dy/dx dz/dy
which similarly notationally suppresses the pushforward compared to the Lagrange/Jacobi notation
(g ∘ f)'(x) = f'(x) g'(f(x))
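A small numerical check of the Bayesian version, as a sketch on finite sets where a conditional distribution is a row-stochastic matrix. The numbers are made up and the helper names (pushforward, compose, bayes_inverse) are hypothetical, not from the linked paper. Inverting the composite channel x → z directly should give the same P(x | z) as chaining the two inverses, as long as P(y | z) is computed against the pushforward P(y) rather than the original prior.

```python
from math import isclose

# A conditional distribution on finite sets as a row-stochastic matrix:
# kernel[a][b] = P(b | a).  (Requires Python 3.9+ for list[...] aliases.)
Kernel = list[list[float]]


def pushforward(p: list[float], k: Kernel) -> list[float]:
    """Output marginal: q[b] = sum_a p[a] * k[a][b]."""
    return [sum(p[a] * k[a][b] for a in range(len(p))) for b in range(len(k[0]))]


def compose(k1: Kernel, k2: Kernel) -> Kernel:
    """Run k1 then k2: result[a][c] = sum_b k1[a][b] * k2[b][c]."""
    return [
        [sum(k1[a][b] * k2[b][c] for b in range(len(k2))) for c in range(len(k2[0]))]
        for a in range(len(k1))
    ]


def bayes_inverse(p: list[float], k: Kernel) -> Kernel:
    """Bayesian inverse w.r.t. prior p: inv[b][a] = p[a] * k[a][b] / q[b].

    The normaliser q is the pushforward of p along k; assumes every q[b] > 0.
    """
    q = pushforward(p, k)
    return [[p[a] * k[a][b] / q[b] for a in range(len(p))] for b in range(len(q))]


# Illustrative (made-up) numbers: prior P(x), channels P(y | x) and P(z | y).
p_x = [0.3, 0.7]
f = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]]   # f[x][y] = P(y | x)
g = [[0.9, 0.1],
     [0.4, 0.6],
     [0.2, 0.8]]        # g[y][z] = P(z | y)

# Left-hand side: invert the composite channel x -> z in one go.
direct = bayes_inverse(p_x, compose(f, g))  # direct[z][x] = P(x | z)

# Right-hand side: P(x | z) = sum_y P(y | z) P(x | y).
# P(y | z) is inverted against the pushforward prior P(y), not P(x).
p_y = pushforward(p_x, f)
g_inv = bayes_inverse(p_y, g)               # g_inv[z][y] = P(y | z)
f_inv = bayes_inverse(p_x, f)               # f_inv[y][x] = P(x | y)
chained = compose(g_inv, f_inv)             # chained[z][x]

for row_d, row_c in zip(direct, chained):
    assert all(isclose(d, c) for d, c in zip(row_d, row_c))
print("Bayesian chain rule holds:", direct)
```

The only place the prior enters the second inversion is through p_y, which plays the same role as evaluating g' at f(x) rather than at x.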
(*) https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2023.24