Talk:Fast inverse square root
Fast inverse square root is a former featured article candidate. Please view the links under Article milestones below to see why the nomination was archived. For older candidates, please check the archive. | ||||||||||||||||||||||
Fast inverse square root has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it. | ||||||||||||||||||||||
|
This article is rated GA-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||||||||||||
|
Re-organization of the article
editI have rewritten a significant fraction of the article, and shifted some other pieces around. The main reason for doing this is that I felt that, as it was, the article had too many long and hard to read equations which obscured the very essential point of the algorithm. Namely, that aliasing an int to a float is a quick and dirty approximation of its logarithm. From this, everything else derives trivially. I tried give this point a more central place in the article, and also provided a graph comparing Ix with a scaled and shifted log2(x). The other significant changes I made are:
- I removed the figure about two's complement numbers, as it is essentially irrelevant to the point.
- I moved the worked example up: This example gives absolutely no clue of why this works, it only serves to give some “feeling” of how it works. It is then probably better to have it before we go into providing insights on the math.
- I move the Accuracy section after Newton's method, because it discusses the accuracy of the whole algorithm (including the Newton iteration), not only of the first steps.
— Edgar.bonet (talk) 12:05, 25 July 2014 (UTC)
- LGTM, thanks. Protonk (talk) 18:25, 25 July 2014 (UTC)
Poor figures
editThere are two figures in this article that do not have axis labels and no description of the axes is given in the figure captions. As such, they are essentially useless because you cannot tell what they mean. They should be corrected or removed, and the the perpetrators should be rapped across the knuckles, then tarred and feathered for their shoddy work. http://xkcd.com/833/
128.174.127.111 (talk) 15:21, 30 September 2014 (UTC)
- You should add some! The first one is literally just numbers against numbers. There's no real meaningful axis label unless you want to say "this here is an integer" and "this here is the floating point representation". The second one does need axis labels, as it is measuring error, but you can certainly add them to the caption (or the figure if you'd like to update the file itself). Protonk (talk) 15:32, 30 September 2014 (UTC)
I committed the first of these graphs, and I am aware of the missing axis labels. There are two curves in the graph, as identified by the key in the top-left quadrant:
- Ix
- L log2(x) + L (B − σ)
I did not label the y-axis because it can represent two different things. I could put a label like “Ix or L log2(x) + L (B − σ)”, but I really do not believe this would have any usefulness. As for the x-axis, the appropriate label would be just “x”, as this is the name given to the independent variable in the two expressions above. Would such a label be useful? — Edgar.bonet (talk) 12:53, 1 October 2014 (UTC)
The other 2 constants
editThis article goes into considerable depth over the optimization of the "magic constant", but ignores the other 2. Jan Kadlec wrote a program to see if he could optimize all three. After an exhaustive search:
float inv_sqrt(float x)
{ union { float f; uint32 u; } y = {x};
y.u = 0x5F3759DFul - (y.u >> 1);
return 0.5f * y.f * (3.0f - x * y.f * y.f); }
becomes:
float inv_sqrt(float x)
{ union { float f; uint32 u; } y = {x};
y.u = 0x5F1FFFF9ul - (y.u >> 1);
return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f); }
With all 3 parameters optimized, you can get min max error = 0.0006501967. All the details can be found here. — Preceding unsigned comment added by Slacka123 (talk • contribs) 06:02, 7 April 2015 (UTC) ; edited 09:26, 7 April 2015 (UTC)
- Except youre mistaken to call these "other two magic constants" parameters. They're not; they're quite literally constants. They are derived from rigorous mathematics and no one would consider changing them. It is only the first one that is more-or-less arbitrary. Yes, it can be computed exactly, but only after first deciding which definition of optimization one is using (e.g. sum of square errors, sum of absolute errors, etc.)... and that decision is merely a human decision. What definition was used to derive your code and its error? The same cannot be said for the "other two", which are unambiguous products of the calculus involved. Whether or not the code you offer improves performance is kind of beside the point, but worthy of study and consideration nonetheless. I just wouldnt use the term "parameters" to describe all of the values involved. 50.34.41.50 (talk) 16:04, 8 May 2021 (UTC)
The algorithm that explains the magic number
editI revised my document (referenced in this article), http://www.geometrictools.com/Documentation/FastInverseSqrt.pdf, which now describes a minimax algorithm that almost generates the magic number. By "almost" I mean: whoever wrote the original code probably estimated the minimax value using sampling. The code's magic number and the value I computed by implementing the minimax algorithm are nearly the same number. The algorithm is more general in that the initial guess to Newton's method is y0 = a + b*x. The first iterate is y1 = y0*(3-x*(a + b*x)^2)/2, which can be thought of as a function of 'a' and 'b'. For each (a,b), the relative error in the approximation is E(a,b) = max_{x in [1,2)}|sqrt(x)*y1-1|. If you minimize this for all a > 0 and b < 0, you obtain an error bound about half that produced by the Quake3 code, but the implementation is slower (but still faster than 1/sqrt(x)). If you constrain the problem by choosing b = -1/4 and compute the minimum of E(a,-1/4), you get an 'a' that generates a magic number nearly the same as the Quake3 code, but the global error bound is slightly smaller (which argues the code writer estimated the minimizer for 'a'). Lomont's document attempts to explain with a minimax algorithm, but for y0 fitted to 1/sqrt(x); however, the goal is for y1 to be a good estimate. Just thought I'd mention this in case whoever likes updating the main page can feel free to do so. Daveeberly (talk) 20:26, 9 May 2015 (UTC)
- I've gone ahead and updated the reference, especially since the document at the link is the new one. Other than that, was there another reference in the article you thought needed updating? Rwessel (talk) 04:32, 10 May 2015 (UTC)
Earliest reference
editIn the 1990 Graphics Gems 1 book, Paul Lalonde and Robert Dawson of Dalhousie University, Halifax, Nova Scotia share code for the fast sqrt and inverse sqrt. The method they describe is more verbose but effectively the same as the modern code, though they don't use the exact optimized magic constant. Clearly the constant is just a refinement from the basic structure they discribe. https://imcs.dvfu.ru/lib.int/docs/Programming/Graphics/Graphics%20Gems%20I.pdf See the chapter "A HIGH SPEED HIGH SPEED, LOW PRECISION PRECISION SQUARE ROOT QUARE ROOT"
http://www.realtimerendering.com/resources/GraphicsGems/gemsiii/sqrt.c — Preceding unsigned comment added by 76.102.116.190 (talk) 02:31, 9 October 2016 (UTC)
- The code your are referring to is based on a table lookup. It has no relationship whatsoever with the algorithm described in this article. — Edgar.bonet (talk) 09:43, 10 October 2016 (UTC)
- The code is now attributed to Greg Walsh (Ardent), who was inspired by Cleve Moler (MathWorks). Greg consulted at Kubota with Gary Tarolli.
References
"the general method to compute the inverse square root was to calculate an approximation for 1/√x" ?
editIn the section "Overview of the code", there is the expression:
At the time, the general method to compute the inverse square root was to calculate an approximation for 1/√x, then revise that approximation via another method until it came within an acceptable error range of the actual result.
The word "at the time" looks to mean the method before fast inverse square root. However, the expression also matches the fast method. The method just uses special way to calculate an approximation for 1/√x. I think the past method was to calculate the multiplicative inverse and then to calculate the square root of it (it can be reversed). It should be clarified in the article. Gjhdvhd (talk) 09:44, 23 October 2016 (UTC)
Point of the Method?
editThis article starts motivating the algorithm by stating
At the time, it was generally computationally expensive to compute the reciprocal of a floating-point number
This suggests that the point of the algorithm is to avoid the cost of computing the inverse (of the square root). However, it rather seems that the point of the algorithm is to avoid having more than one Newton iteration. — Preceding unsigned comment added by 79.210.119.29 (talk) 02:21, 28 April 2017 (UTC)
- Yes and no. They are not exclusive. Both are costly. And both amount to doing the same thing. The purpose of the algorithm is to get a good first guess for Newton, but you really dont even need one iteration if youre need for precision is low enough. Newton converges fast but not fast enough... there are too many operations going on. This is slow, especially when computing thousands of them per second. The same is true of computing the inverse square root directly. The algorithm was created at a time when computer graphics and early physics engines needed to normalize vectors quickly but the processors were too slow to render in real time. The original code even included a commented-out second iteration, so is not strictly being avoided. 50.34.41.50 (talk) 16:34, 8 May 2021 (UTC)
Standard square root?
editIs there a similar method to calculate normal square root? — Preceding unsigned comment added by 2A01:119F:21D:7900:A5CD:B2BA:D621:91CF (talk) 09:38, 2 December 2017 (UTC)
- Good question. Honestly I dont know. But you could potentially just deduct 1 from the exponent of a floating point. This would be as easy as converting the floating point to an integer using the same type-assignment trickery, to that you would deduct 1 offset by 23 bits (for single precision) to align it with the exponent, without touching the mantissa at all, and then convert back to float. This should suffice as a decent first guess, and you can go on to perform the appropriate Newton iteration. 50.34.41.50 (talk) 16:27, 8 May 2021 (UTC)
- Yes, there is a similar method to calculate the normal square root. The start of the algorithm is to shift the bit representation of the non-negative floating point number number one to the right and then add a constant. As with the inverse square root, there is some flexibility in the value of the constant, but a key observation is that √1 = 1 so
float one = 1.0; float constant = (*(uint32_t *)&one) >> 1; uint32_t temp = (*(uint32_t *)&x) >> 1 + constant; float sqrt = *(float *)&temp;
- will be pretty reasonable. —Quantling (talk | contribs) 23:00, 11 May 2021 (UTC)
Proposal to Protect Page
editI'd like to propose protecting this page under "Pending changes" (WP:PEND). Looking at the edit history, almost every "Undo" for the past year is reverting someone's removal of the profanity in the algorithm's code comments. Further, many of the undone edits are by new or unregistered users. Ignoring these, I see about 2 accepted edits per month, which seems infrequent. The page protection guidelines state that WP:PEND is appropriate for "[i]nfrequently edited articles with high levels of vandalism ... from unregistered and new users", which seems appropriate for this page. I realize the profanity edits are made in good faith and aren't actual vandalism, but they're still disruptive. If there is some way to protect just the code block, that would be better, but I don't know of one. Decka7 (talk) 15:55, 20 November 2018 (UTC)
- I'd agree, but protecting a portion only is not facilitated. However, instead of presenting the notorious original as text, which is easily edited, it could be presented as an image of such text which is not so easy to edit in part. The image file might be protected from deletion itself. So, does someone have access to an image of the original source file, as printed then? Or will an image of the current source do, via a screen grab? NickyMcLean (talk) 02:08, 13 March 2019 (UTC)
- I disagree in principle. Although I oppose censorship in all forms, provided they are not justifiably illegal, I do think that the purpose of a wiki, by its original intended definition, is to facilitate community edits and be a place of evolving encyclopedic information catered by the people. To deny people the ability to edit is to change the entire platform from a wikipedia to an encyclopedia britanica instead. And hand "authority" to edit to a select few... and who is to decide who these elites are? Do you think you'll be one of them? And in doing so, you are censoring those would-be editors in kind. So either way there is censorship of someone. The question is whether or not you want to negate the founding principle of the website itself. If you desire a static encyclopedia, might I suggest the Britannica.
- You admit the edits are done in good faith but call it disruptive. How? What does the profanity add to the discussion? If the article is about the algorithm and the mathematics, perhaps you can justify the inclusion of profanity?! I dont see your criticism as legitimate. If you want to criticize "hypersensitive censors", Im right there with you. If you want to criticize those who would edit someone elses original work, which I call vandalism, Id be right there with you. But youre not. Youre calling it "disruptive" editing. How? I dont see what the profanity adds to a discussion on coding, or how its removal diminishes anyones ability to learn and understand the algorithm. If that truly is the purpose of this page, and not just an excuse for programmers to slip profanity into the wiki and feel justified for doing so, like some adolescent child giggling at a bad word. In fact, it is the profantiy that serves no purpose for the article and ends up being disruptive to the learning of those very hyper-sensitive people.
- NickyMcLean offers a good suggestion. We can include the censored code as a code block and simultaneously embed an original image adjacent. The latter may be hidden in a drop down. 50.34.41.50 (talk) 16:16, 8 May 2021 (UTC)
Accuracy claim
editUnder accuracy it claims a worse case error of 1.7% When I run the algorithm I get a worse case relative error of 0.00174915. Ie 0.17% Has the article simply shifted the decimal place? It would be nice is someone else could double check.
Bill W102102 (talk) 09:57, 12 March 2019 (UTC)
- One of the features of the much-vaunted metric system is the ease with which powers of ten or factors of a thousand are miscounted. Here, although not involving Systeme International, by chance, the example used is the rather boring number 0·01 whose square root is 0·1, and so the reciprocal is 10. The article is quoting the actual difference (x - rsqrt(x)) which is indeed 0·017478 and which, as a percentage of one is 1·7478% But the value is ten, and so the relative error is a percentage of ten and thus 0·17478% A factor of ten has been overlooked, as you say. NickyMcLean (talk) 02:08, 13 March 2019 (UTC)
Thank you BillW102102 (talk) 10:40, 13 March 2019 (UTC)
Picture of John Carmack
editIs this the only Wikipedia article that's illustrated by a picture of a person who definitely did not do the work? How about deleting the picture? Zaslav (talk) 23:49, 5 June 2019 (UTC)
Agreed, it's nonsense to have Carmack's mugshot here. He already has his own WikiPedia page and that image makes good sense there but here it is just noise. — Preceding unsigned comment added by 213.127.87.1 (talk) 10:02, 14 November 2019 (UTC)
- so too is the f-bomb also just a distraction from the discussion of the algorithm.
Moroz Cite ref
editThe "Moroz 2018." citation reference doesn't seem to link properly, but I'm not sure why. —Fishrock123 (talk) 18:59, 29 January 2020 (UTC)
What the fuck comment
editIs the comment present in the original code? Is the profanity merely gratuitous? Is its presence in this article necessary? 87.75.117.183 (talk) 18:36, 10 January 2021 (UTC)
- After a little researching, it seems the comment was indeed present in the Quake 3 source code. See here. 87.75.117.183 (talk) 19:32, 10 January 2021 (UTC)
@David Eppstein: Wikipedia:Offensive material says "Material that would be considered vulgar or obscene by typical Wikipedia readers should be used if and only if its omission would cause the article to be less informative, relevant, or accurate, and no equally suitable alternative is available". I propose for debate that the article should state that a swear word has been edited and that the use of the f-bomb within the material should be replaced by f***. —Quantling (talk | contribs) 18:26, 17 April 2021 (UTC)
- Agree: I agree with my own proposal. I believe that the modifications that I suggest are a less offensive alternative that suitably conveys 100% of the information, relevancy, and accuracy. —Quantling (talk | contribs) 18:26, 17 April 2021 (UTC)
- Keep the comment as-is, as many past discussions have agreed. It's a direct quote. We should not be altering direct quotes. WP:NOTCENSORED applies. And more, my strong impression of the censored versions is that they come across as condescending and insulting in their assumption that readers are unable to handle a swear word that nowadays is so often-heard that it cannot be particularly shocking. Further, your proposed revision is a direct violation of Wikipedia:Offensive material, which tells us "words should never be minced by replacing letters with dashes, asterisks, or other symbols". —David Eppstein (talk) 18:30, 17 April 2021 (UTC)
- You are allowed to omit portions of direct quotes, particularly when it distracts from the greater point. This is done all the time in academia, research, news reports, etc., and there are easily looked up grammatical rules for doing so. Especially if the portion omitted has nothing to do with the grander topic of discussion, argument or article for which the quote is used. To deny this fact is arguably a sign of ignorance at best, or at worse a perverse and adolescent ad hoc justification to use profanity. Preservation of original work is not a legitimate criticism on its own because this is not an original work anyway (its a wikipedia article discussing an original work), and aside from that it contributes nothing to the conversation (the work in question is the algorithm, not the commentary). The innovation and creativity is in the algorithm, not in the commentary. The original work can be cited. I would argue that demonstrating some proof of the necessity of profanity in an article about an algorithm is in order here. 50.34.41.50 (talk) 16:38, 8 May 2021 (UTC)
Inconsistent Magic Constant
editAs pointed out in the article, the constant 0x5f3759df gives rise to the σ of 0.0450466. But the article also clearly states just prior that optimum σ as calculated by the uniform norm definition is 0.0430357, which implies a constant of 0x5f37bcb6 or 0x5f37bcb5. There is a fair deviation here that is unjustified. However insignificant it may be there is no explanation for the inconsistency in the article.
As a side note, Id like to see the derivation of either number from the uniform norm. I attempted to find my own optimal with the minimal square error approach and the error is actually larger. Some explanation as to why the uniform norm is the best one, or the better one at least, would be reasonable for this article. 50.34.41.50 (talk) 16:55, 8 May 2021 (UTC)
- This bit confused me, too. I think the pitfall is to be clear on what you want to optimize for. The article correctly states 0.0430357 as the optimal value w.r.t the uniform norm, that is ,
- where and . However, this value is not optimal if you include the subsequent steps, i.e. floating point conversion and Newton's method, and want to minimize the relative error w.r.t to the inverse square root. In fact, as stated later in the article "It is not known precisely how the exact value for the magic number was determined.", although the referenced PDF https://web.archive.org/web/20150511044204/http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf comes very close. Mayby this hint should be added to the article in order to avoid confusion.
- Now for the derivation: since we only consider the closed unit interval and all functions are continous, the extreme value theorem guarantees the existence of a maximum and a minimum. This is either a local extremum, which we derive the usual way by differentiating:
- The sign part comes from the chain rule applied to . Sovling for x gives and the absolute difference at this point is
- ,
- where . The other possible maxima lie at the boundaries of the unit interval (which conincide):
- Thus , which is minimal for . Notice that is also tangent in (both have slope 1 and ), which is used in the pdf mentioned. There, another possible choice is derived, where is chosen, such that errors average out to zero: . The first part of the integral is , the second one is , solving for gives
- One can also compute exact values for when minimizing to different norms
- : First, split the integral into three parts: from to , to and to , where and are the intersections of f and g. Then differentiate by using the Leibniz integral rule. One arrives at the condition , which is only true for . This finally yields .
- : the integral can be calculated directy to be , which has a minimum at and coincides whith from above!
- In the end, what you choose always depends on your use case.Limpaldu (talk) 09:02, 7 March 2022 (UTC)
Name of the function
editThe inverse of the square root function is surely the square. The function in this article might be called the "Reciprocal of the Square Root" but "inverse square root" is used elsewhere. --82.3.131.192 (talk) 18:02, 24 May 2021 (UTC)
- In fact no. There is nothing wrong with using the term "inverse" here. The context is multiplication (finding the norm of a vector by multiplying that vector by the calculated value, which serves to normalize the vector). The inverse function (of multiplication) is division, or if you prefer the multiplication by a multiplicative inverse. Its valid terminology if you understand the mathematical context. Read up on some of the abstract algebra. 50.34.41.50 (talk) 20:40, 27 June 2021 (UTC)
- I thought about using "reciprocal" more in the article when I started it but the main sources and commentary refers to it as a (multiplicative) inverse. Protonk (talk) 18:07, 4 November 2021 (UTC)
Blanket revert by David Eppstein on 2021-10-22
edit- Undo multiple disimprovements (to address two: youtube is not a reliable source, and Wikipedia is not a code repository; we don't need yet another implementation here
Your revert reintroduces a grammatical error, an error that "The following code follows C standards" but then the code that was given still used illegal unions. I had restructured the text to make this more clear. Reintroduces the error that that memcpy would be slower or was more memory infefficient which is not true. And reintroduces the error that an extra variable would incur additional cost. That was at least misleading.
If you are of the opinion that there is too much bloat then we should remove the sample code with the union, since that is not actually an improvement over the original code of Quake III Arena.
While the Youtube video is not a reliable source, the author is a well known expert and has published best practices books on C++, and the same information as in the source can be found on Wikipedia itself Type punning. I encourage you to find a better link instead of blanket reverting 5 different constructive edits. Kwinzman (talk) 08:41, 22 October 2021 (UTC)
- Why was my edition of the algorithm removed though? I was the one who added the type-punning version which is valid C and is the recommended fashion of doing such type-punning in standard C. ܐܵܬܘܿܪܵܝܵܐ 02:44, 4 November 2021 (UTC)
- The article now appears self-contradictory. It says that union-based type-punning produces undefined behavior, and then presents a union-based type-punning formulation as if it were a more well-defined replacement for the original code. Is that contradiction included in the intent of your edits? If the union-based type-punning is just as bad as the original, what is the point of spelling it out in code rather than just saying so in text? If the intent is something like "this works in C but not in C++" it needs to be spelled out more clearly. Anyway, do we really need the detailed code for three different minor variations of the same algorithm? —David Eppstein (talk) 07:59, 4 November 2021 (UTC)
- In C++, union-based type-punning is undefined. In C however, union-based type-punning is legal and defined behavior. The reason for the union-based type-punning is to showcase the code in a standards-compliant fashion. ܐܵܬܘܿܪܵܝܵܐ 04:38, 12 November 2021 (UTC)
- Good point! However the memcpy version is legal in both C and C++ and closer to the original code. So I would prefer the memcpy version. I don't have a strong opinion on this though. Kwinzman (talk) 23:23, 17 November 2021 (UTC)
- In C++, union-based type-punning is undefined. In C however, union-based type-punning is legal and defined behavior. The reason for the union-based type-punning is to showcase the code in a standards-compliant fashion. ܐܵܬܘܿܪܵܝܵܐ 04:38, 12 November 2021 (UTC)
- The article now appears self-contradictory. It says that union-based type-punning produces undefined behavior, and then presents a union-based type-punning formulation as if it were a more well-defined replacement for the original code. Is that contradiction included in the intent of your edits? If the union-based type-punning is just as bad as the original, what is the point of spelling it out in code rather than just saying so in text? If the intent is something like "this works in C but not in C++" it needs to be spelled out more clearly. Anyway, do we really need the detailed code for three different minor variations of the same algorithm? —David Eppstein (talk) 07:59, 4 November 2021 (UTC)
- Why was my edition of the algorithm removed though? I was the one who added the type-punning version which is valid C and is the recommended fashion of doing such type-punning in standard C. ܐܵܬܘܿܪܵܝܵܐ 02:44, 4 November 2021 (UTC)
- Why should the article talk about C++ implementations in the first place? Protonk (talk) 18:11, 4 November 2021 (UTC)
Paul Kinney's method
editI've removed reference to a similar approximation developed simultaneously to Kahan and Ng's in 1986 because I can't seem to find a reference which exists that actually talks about it. I also cannot verify that the cited reference in the article (A New Generation of Hardware and Software for the FPS T Series) supports the claim in text.
There is a modern mention by Kinney in a submission to a conference. The submission was not accepted, but more importantly for us if you read the section it doesn't say anything more than "In 1986, one of the authors of this paper originally invented this method, which was called “The Kinney Method,” to implement vector square root for the production FPS T Series Hypercube Supercomputer (Gustafson, 1986). William Kahan and K.C. Ng at Berkeley also independently discovered this around 1986." The Gustafson 1986 cite is to a white-paper which doesn't mention square roots (reciprocal or otherwise) or use of floating point / integer storage to gain an approximate logarithm. I am not willing to say I've exhausted all the published literature but I've made a good faith effort to find any contemporary account of it and I can't find any.
Accordingly I have removed the material. If you want to restore it please consider posting as to why in this discussion. Thank you. Protonk (talk) 19:42, 17 January 2022 (UTC)
Undef behaviout
editUsing of union dosent remove undef behaviour from the code. The trick of using unions as typecast is comonly used in code, nearly all C programers use it, but the C standart defines it as undefined behaviout, only reading from the same type you wrote before is defined for enums. Plus the required space to save enum is equal to the bigest size among used variables/structs in the enum. Saving to one and reading from second member of enum is NOT defined - we only assune the compiler is clever and dont do some tricks with it. There is no need for compiler to do this, it will harm the efectivity of it, so obviously this trick works... But it is NOT defined by C standart, so it dosent remove the undef vehaviour from the code. 2A00:102A:5004:A4B6:F584:6CC9:A4F0:4CE6 (talk) 09:51, 26 January 2022 (UTC)
How Halley's method can be faster than Newton method?
editHere https://en.wiki.x.io/wiki/Fast_inverse_square_root#Zero_finding about Halley's method you can read that it converges cubicly while newton method converges quadraticly (square). But in Halley's method need each time one division operation. So basicly for one Halley's method iteration need two iterations... Say first number chosen with precision 0.001. Then with Newton's method after two iterations precision will be (0.001^2)^2= 0.000000000001. And with Halley's method after one iteration (which takes time like two Newton's method iterations) precision will be 0.001^3=0.000000001. Also I think that what is written here "Subsequent additions by hardware manufacturers have made this algorithm redundant for the most part. For example, on x86, Intel introduced the SSE instruction rsqrtss in 1999. In a 2009 benchmark on the Intel Core 2, this instruction took 0.85ns per float compared to 3.54ns for the fast inverse square root algorithm, and had less error." is comparing SSE single precision with x87 FPU double precision. But I recently realise, that from here https://en.wiki.x.io/wiki/Division_algorithm#Newton–Raphson_division division operation needs about the same time as calulating inverse square root with Newton's method. So basicly it's very unclear from where comes this Intel SSE instruction ~4x speed by using Halley's method? Paraboloid (talk) 09:58, 10 May 2024 (UTC)
- Update 1. Never mind. I read here https://web.archive.org/web/20210208132927/http://assemblyrequired.crashworks.org/timing-square-root/ and it apears that SSE using probably Newton method (or Goldschmidt’s algorithm, which is almost the same as Newton method). And "Compiler sqrt(x)/x87 FPU FSQRT" calculates square root per 24ns and "SSE intrinsic ssqrts" calculates square root per 11.9ns. So SSE two times faster than x87 FPU. The only diference, that x87 FPU calculates in Double Extended precision (80 bit) and SSE seems to calculate in single precision (32 bit). Paraboloid 10 May 2024