That paper was concerned with doubles: an initial approximation accurate to slightly under 8 bits was enough to reach a correctly rounded double sqrt within 3 iterations.
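To see why roughly 8 bits is enough, here is a rough sketch of the bit-doubling arithmetic. It assumes a plain Heron/Newton step on the square root itself, which may not be the iteration the paper uses (a reciprocal-sqrt Newton step converges quadratically as well). The names `heron_step`, `correct_bits` and `bits_per_iteration` are made up for illustration, and exact rationals are used so that errors below double precision can still be measured.

```python
from fractions import Fraction
from math import log2


def heron_step(x, a):
    """One Heron/Newton step for sqrt(a): x -> (x + a/x) / 2."""
    return (x + a / x) / 2


def correct_bits(x, root):
    """Accuracy of x as an approximation of root, as -log2(relative error)."""
    err = abs(x - root) / root
    return -log2(float(err))


def bits_per_iteration(initial_error, iterations=3, root=Fraction(2)):
    """Return the accuracy in bits before and after each iteration.

    Assumption: the target is a perfect square (root * root), so the true
    root is known exactly and the error can be computed without rounding.
    """
    a = root * root
    x = root * (1 + initial_error)  # start with the given relative error
    history = [correct_bits(x, root)]
    for _ in range(iterations):
        x = heron_step(x, a)
        history.append(correct_bits(x, root))
    return history


if __name__ == "__main__":
    # Exactly 8 bits, then "slightly under 8 bits" (1/180 is about 7.5 bits).
    for e0 in (Fraction(1, 2**8), Fraction(1, 180)):
        print([round(b, 2) for b in bits_per_iteration(e0)])
```

Each step slightly more than doubles the number of correct bits, so three steps from about 8 bits go past the 53-bit significand of a double, while two steps do not. Being accurate beyond 53 bits is necessary for a correctly rounded result but not sufficient on its own; the final rounding still has to be handled separately.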
Tweaking the magic value to get rid of the initial LUT altogether is reasonable for float32, for the correctly rounded floa...