Damus
Fabian Giesen · 2w
That paper was never formally published (that I can track down anyhow) but note it has both regular and reciprocal square root versions (the latter on the way to computing a square root) as well as in...
Fabian Giesen profile picture
That paper was concerned with doubles and getting the initial approximation to slightly under 8 bits was sufficient to get a correctly rounded double sqrt within 3 iterations.

Tweaking the magic value to get rid of the initial LUT altogether is reasonable for float32, for the correctly rounded float64 sqrt-from-rsqrt they cared about it would've needed an extra iter which they tried to avoid.
1
Fabian Giesen · 2w
Although the original thing never got published, a different paper analyzing the algorithm (and confirming it gives correctly rounded results) was, in 1992: https://bpb-us-e1.wpmucdn.com/websites.uta.edu/dist/7/5059/files/2021/06/csd-94-850.pdf it also gives more of an explanation why it works, the...