Fast inverse square root

{{Short description|Root-finding algorithm}}

[[File:OpenArena-Rocket.jpg|thumb|Lighting and reflection calculations, as in the video game OpenArena, use the fast inverse square root code to compute angles of incidence and reflection.]]

Fast inverse square root, sometimes referred to as {{mono|Fast InvSqrt()}} or by the hexadecimal constant {{mono|0x5F3759DF}}, is an algorithm that estimates \frac{1}{\sqrt{x}}, the reciprocal (or multiplicative inverse) of the square root of a 32-bit floating-point number x in IEEE 754 floating-point format. The algorithm is best known for its implementation in 1999 in Quake III Arena, a first-person shooter video game heavily based on 3D graphics. With subsequent hardware advancements, especially the x86 SSE instruction rsqrtss, this algorithm is not generally the best choice for modern computers, though it remains an interesting historical example.

The algorithm accepts a 32-bit floating-point number as the input and stores a halved value for later use. Then, treating the bits representing the floating-point number as a 32-bit integer, a logical shift right by one bit is performed and the result subtracted from the number {{mono|0x5F3759DF}}, which is a floating-point representation of an approximation of \sqrt{2^{127}}. This results in the first approximation of the inverse square root of the input. Treating the bits again as a floating-point number, it runs one iteration of Newton's method, yielding a more precise approximation.

History

William Kahan and K.C. Ng at Berkeley wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations. In the late 1980s, Cleve Moler at Ardent Computer learned about this technique and passed it along to his coworker Greg Walsh. Greg Walsh devised the now-famous constant and fast inverse square root algorithm. Gary Tarolli was consulting for Kubota, the company funding Ardent at the time, and likely brought the algorithm to 3dfx Interactive circa 1994.

Jim Blinn demonstrated a simple approximation of the inverse square root in a 1997 column for IEEE Computer Graphics and Applications.{{Sfn|Blinn|1997|pp=80–84}} Reverse engineering of other contemporary 3D video games uncovered a variation of the algorithm in Activision's 1997 Interstate '76.{{Cite web |last=Peelar |first=Shane |date=1 June 2021 |title=Fast reciprocal square root... in 1997?! |url=https://inbetweennames.net/blog/2021-05-06-i76rsqrt/}}

Quake III Arena, a first-person shooter video game, was released in 1999 by id Software and used the algorithm. Brian Hook may have brought the algorithm from 3dfx to id Software. A discussion of the code appeared on the Chinese developer forum CSDN in 2000, and Usenet and the gamedev.net forum spread the code widely in 2002 and 2003.{{Sfn|Lomont|2003|pp=1–2}} Speculation arose as to who wrote the algorithm and how the constant was derived; some guessed John Carmack. Quake III{{'}}s full source code was released at QuakeCon 2005, but provided no answers. The authorship question was resolved in 2006 when Greg Walsh, the original author, contacted Beyond3D after their speculation gained popularity on Slashdot.

In 2007 the algorithm was implemented in some dedicated hardware vertex shaders using field-programmable gate arrays (FPGA).{{Cite book |last1=Zafar |first1=Saad |last2=Adapa |first2=Raviteja |title=2014 International Conference on Advances in Electrical Engineering (ICAEE) |chapter=Hardware architecture design and mapping of 'Fast Inverse Square Root' algorithm |date=January 2014 |chapter-url=https://ieeexplore.ieee.org/document/6838433 |pages=1–4 |doi=10.1109/ICAEE.2014.6838433|isbn=978-1-4799-3543-7 |s2cid=2005623 }}{{Sfn|Middendorf|2007|pp=155–164}}

Motivation

[[File:Surface_normals.svg|thumb|Surface normals are used extensively in lighting and shading calculations, requiring the calculation of norms for vectors. A field of vectors normal to a surface is shown here.]]

File:Reflection for Semicircular Mirror.svg

The inverse square root of a floating point number is used in digital signal processing to normalize a vector, scaling it to length 1 to produce a unit vector.{{Sfn|Blinn|2003|p=130}} For example, computer graphics programs use inverse square roots to compute angles of incidence and reflection for lighting and shading. 3D graphics programs must perform millions of these calculations every second to simulate lighting. When the code was developed in the early 1990s, most floating point processing power lagged the speed of integer processing. This was troublesome for 3D graphics programs before the advent of specialized hardware to handle transform and lighting. Computation of square roots usually depends upon many division operations, which for floating point numbers are computationally expensive. The fast inverse square root generates a good approximation without any division step, using only an integer shift, an integer subtraction, and a few floating-point multiplications.

The length of the vector is determined by calculating its Euclidean norm: the square root of the sum of squares of the vector components. When each component of the vector is divided by that length, the new vector will be a unit vector pointing in the same direction. In a 3D graphics program, all vectors are in three-dimensional space, so \boldsymbol v would be a vector (v_1, v_2, v_3). Then,

:\|\boldsymbol{v}\| = \sqrt{v_1^2+v_2^2+v_3^2}

is the Euclidean norm of the vector, and the normalized (unit) vector is

:\begin{align}
\boldsymbol{\hat{v}} &= \frac{1}{\left\|\boldsymbol{v}\right\|}\boldsymbol{v}\\
&= \frac{1}{\sqrt{v_1^2+v_2^2+v_3^2}}\,\boldsymbol{v},
\end{align}

where the fraction term is the inverse square root of v_1^2+v_2^2+v_3^2.

At the time, floating-point division was generally expensive compared to multiplication; the fast inverse square root algorithm bypassed the division step, giving it its performance advantage.

Overview of the code

The following C code is the fast inverse square root implementation from Quake III Arena, stripped of C preprocessor directives, but including the exact original comment text:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

At the time, the general method to compute the inverse square root was to calculate an approximation for \frac{1}{\sqrt{x}}, then revise that approximation via another method until it came within an acceptable error range of the actual result. Common software methods in the early 1990s drew approximations from a lookup table.{{Sfn|Eberly|2001|p=504}} The key of the fast inverse square root was to directly compute an approximation by utilizing the structure of floating-point numbers, proving faster than table lookups. The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division.{{Sfn|Lomont|2003|p=1}} The algorithm was designed with the IEEE 754-1985 32-bit floating-point specification in mind, but investigation from Chris Lomont showed that it could be implemented in other floating-point specifications.{{Sfn|Lomont|2003}}

The advantages in speed offered by the fast inverse square root trick came from treating the 32-bit floating-point word<ref group=note>Use of the type long reduces the portability of this code on modern systems. For the code to execute properly, sizeof(long) must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems, sizeof(long) is 8 bytes. The more portable replacement is int32_t.</ref> as an integer, shifting it right by one bit, and subtracting the result from a "magic" constant, {{mono|0x5F3759DF}}.{{Sfn|Lomont|2003|p=3}}{{Sfn|McEniry|2007|pp=2, 16}}{{Sfn|Eberly|2001|p=2}} This bit shift and integer subtraction result in a bit pattern which, when reinterpreted as a floating-point number, is a rough approximation of the inverse square root of the input. One iteration of Newton's method is performed to gain some accuracy, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for Newton's method; however, it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors also released in 1999.{{r|ruskin|agner}}

=Worked example=

As an example, the number x=0.15625 can be used to calculate \frac{1}{\sqrt{x}} \approx 2.52982. The first steps of the algorithm are illustrated below:

 0011_1110_0010_0000_0000_0000_0000_0000  Bit pattern of both x and i
 0001_1111_0001_0000_0000_0000_0000_0000  Shift right one position: (i >> 1)
 0101_1111_0011_0111_0101_1001_1101_1111  The magic number 0x5F3759DF
 0100_0000_0010_0111_0101_1001_1101_1111  The result of 0x5F3759DF - (i >> 1)

Interpreting as IEEE 32-bit representation:

 0_01111100_01000000000000000000000  1.25 × 2^−3
 0_00111110_00100000000000000000000  1.125 × 2^−65
 0_10111110_01101110101100111011111  1.432430... × 2^63
 0_10000000_01001110101100111011111  1.307430... × 2^1

Reinterpreting this last bit pattern as a floating point number gives the approximation y=2.61486, which has an error of about 3.4%. After one iteration of Newton's method, the final result is y=2.52549, an error of only 0.17%.
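The worked example can be reproduced with a short C sketch. The helper names below (float_to_bits, bits_to_float, rsqrt_initial_guess, rsqrt_refine) are hypothetical and not part of the Quake III source. The sketch copies bits with memcpy rather than the pointer cast of the original code, and it assumes that float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a float into a 32-bit unsigned integer.
   Assumes sizeof(float) == 4. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Copy a 32-bit pattern back into a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* First approximation only: magic constant minus the shifted bit pattern. */
float rsqrt_initial_guess(float x)
{
    return bits_to_float(0x5F3759DFu - (float_to_bits(x) >> 1));
}

/* One Newton iteration applied to an existing approximation y. */
float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
```

For x = 0.15625, float_to_bits(x) is 0x3E200000 and the bit pattern of rsqrt_initial_guess(x) is 0x402759DF, matching the rows of the table above.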

=Avoiding undefined behavior=

According to the C standard, reinterpreting a floating point value as an integer by casting and then dereferencing a pointer to it is not valid (undefined behavior). Another way is to place the floating point value in a union containing an additional 32-bit unsigned integer member; accesses to that integer provide a bit-level view of the contents of the floating point value.

#include <stdint.h> // uint32_t

float Q_rsqrt(float number)
{
    union {
        float    f;
        uint32_t i;
    } conv = { .f = number };
    conv.i  = 0x5f3759df - (conv.i >> 1);
    conv.f *= 1.5F - (number * 0.5F * conv.f * conv.f);
    return conv.f;
}

For C++, however, type punning through a union is also undefined behavior. In modern C++, the recommended method for implementing this function's casts is through C++20's std::bit_cast on C++23's std::float32_t types. This also allows the function to work in constexpr context:

import std;

constexpr std::float32_t Q_rsqrt(std::float32_t number) noexcept
{
    const auto y = std::bit_cast<std::float32_t>(
        0x5f3759df - (std::bit_cast<std::uint32_t>(number) >> 1));
    return y * (1.5f32 - (number * 0.5f32 * y * y));
}

Algorithm

The algorithm computes \frac{1}{\sqrt{x}} by performing the following steps:

  1. Alias the argument x to an integer as a way to compute an approximation of the binary logarithm \log_{2}(x)
  2. Use this approximation to compute an approximation of \log_{2}\left(\frac{1}{\sqrt{x}}\right) = -\frac{1}{2} \log_{2}(x)
  3. Alias back to a float, as a way to compute an approximation of the base-2 exponential
  4. Refine the approximation using a single iteration of Newton's method.

=Floating-point representation=

{{Main|Single-precision floating-point format}}

Since this algorithm relies heavily on the bit-level representation of single-precision floating-point numbers, a short overview of this representation is provided here. To encode a non-zero real number x as a single precision float, the first step is to write x as a normalized binary number:{{Sfn|Goldberg|1991|p=7}}

:\begin{align}
x &= \pm 1.b_1b_2b_3\ldots \times 2^{e_x}
\end{align}

where the exponent e_x is an integer, and 1.b_1b_2b_3\ldots is the binary representation of the significand. Since the single bit before the point in the significand is always 1, it need not be stored. The equation can be rewritten as:

:\begin{align}
x &= (-1)^{S_x} \cdot 2^{e_x} (1 + m_x)
\end{align}

where m_x means 0.b_1b_2b_3\ldots, so m_x \in [0, 1). From this form, three unsigned integers are computed:{{Sfn|Goldberg|1991|pp=15–20}}

  • S_x, the "sign bit", is 0 if x is positive and 1 if x is negative (1 bit)
  • E_x = e_x + B is the "biased exponent", where B = 127 is the "exponent bias"<ref group=note>E_x should be in the range [1, 254] for x to be representable as a normal number.</ref> (8 bits)
  • M_x = m_x \times L, where L = 2^{23}<ref group=note>The only real numbers that can be represented exactly as floating point are those for which M_x is an integer. Other numbers can only be represented approximately by rounding them to the nearest exactly representable number.</ref> (23 bits)

Thus: e_x = E_x -B and m_x = \frac{M_x}{L}.

These fields are then packed, left to right, into a 32-bit container.{{Sfn|Goldberg|1991|p=16}}

As an example, consider again the number x = 0.15625 = 0.00101_2. Normalizing x yields:

:x = (-1)^{0} \cdot 2^{-3}(1 + 0.25) = +2^{-3}(1 + 0.25)

and thus, the three unsigned integer fields are:

  • S = 0
  • E = -3 + 127 = 124 = 0111\ 1100_2
  • M = 0.25 \times 2^{23} = 2\ 097\ 152 = 0010\ 0000\ 0000\ 0000\ 0000\ 0000_2

These fields are packed as shown in the figure below:

The number is represented in binary as: I_x = S_x \cdot 2^{31} + E_xL + M_x

Also, since this algorithm works on real numbers, \sqrt{x} is only defined for x \geq 0. The code thus assumes x \geq 0 and S_x = 0.

With S_x = 0, the integer representation and the value of the input number simplify to:

:I_x = E_xL +M_x

:x = (-1)^0 \cdot 2^{e_x} (1 + m_x) = + 2^{e_x} (1 + m_x)
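The packing of the three fields can be checked with a small C sketch. The struct and function names (float_fields, split_float, pack_fields) are hypothetical and introduced only for illustration; the sketch assumes that float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* The three unsigned fields of a single-precision number. */
typedef struct {
    uint32_t sign;      /* S_x, 1 bit */
    uint32_t exponent;  /* E_x, 8 bits, biased by B = 127 */
    uint32_t mantissa;  /* M_x, 23 bits */
} float_fields;

/* Split the bit pattern of x into S_x, E_x and M_x. */
float_fields split_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* assumes sizeof(float) == 4 */

    float_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Recombine the fields as I_x = S_x * 2^31 + E_x * L + M_x, with L = 2^23. */
uint32_t pack_fields(float_fields f)
{
    return (f.sign << 31) + (f.exponent << 23) + f.mantissa;
}
```

For x = 0.15625, split_float yields S = 0, E = 124 and M = 2097152, the values derived in the example above.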

=Aliasing to an integer as an approximate logarithm=

If \frac{1}{\sqrt{x}} were to be calculated without a computer or a calculator, a table of logarithms would be useful, together with the identity \log_b\left(\frac{1}{\sqrt{x}}\right) = \log_b\left(x^{-\frac{1}{2}}\right) = -\frac{1}{2} \log_b(x), which is valid for every base b. The fast inverse square root is based on this identity, and on the fact that aliasing a float32 to an integer gives a rough approximation of its logarithm. Here is how:

If x is a positive normal number:

:x = 2^{e_x} (1 + m_x)

then

:\log_2(x) = e_x + \log_2(1 + m_x)

and since m_x \in [0, 1), the logarithm on the right-hand side can be approximated by{{Sfn|McEniry|2007|p=3}}

:\log_2(1 + m_x) \approx m_x + \sigma

where \sigma is a free parameter used to tune the approximation. For example, \sigma = 0 yields exact results at both ends of the interval, while \sigma = \frac{1}{2} - \frac{1+\ln(\ln(2))}{2\ln(2)} \approx 0.0430357 yields the optimal approximation (the best in the sense of the uniform norm of the error). However, this value is not used by the algorithm as it does not take subsequent steps into account.

File:Log by aliasing to int.svg

Thus there is the approximation

:\log_2(x) \approx e_x + m_x + \sigma.

Interpreting the floating-point bit-pattern of x as an integer I_x yields<ref group=note>Since x is positive, S_x = 0.</ref>

:\begin{align}
I_x &= E_x L + M_x\\
&= L (e_x + B + m_x)\\
&= L (e_x + m_x + \sigma + B - \sigma)\\
&\approx L \log_2(x) + L (B - \sigma).
\end{align}

It then appears that I_x is a scaled and shifted piecewise-linear approximation of \log_2(x), as illustrated in the figure on the right. In other words, \log_2(x) is approximated by

:\log_2(x) \approx \frac{I_x}{L} - (B - \sigma).
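The final formula can be evaluated directly, as in the following C sketch. The function name approx_log2 is hypothetical; the caller supplies \sigma so that different tunings can be compared, and float is assumed to be a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Approximate log2(x) for a positive normal float x by reading its bit
   pattern as an integer: log2(x) ~ I_x / L - (B - sigma). */
double approx_log2(float x, double sigma)
{
    const double L = 8388608.0;  /* 2^23 */
    const double B = 127.0;      /* exponent bias */

    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* assumes sizeof(float) == 4 */

    return (double)bits / L - (B - sigma);
}
```

With \sigma = 0 the result is exact whenever x is a power of two, because m_x = 0 there. With \sigma \approx 0.0430357 the worst-case error over each interval between powers of two is roughly halved, at the cost of no longer being exact at the endpoints.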

=First approximation of the result=

The calculation of y=\frac{1}{\sqrt{x}} is based on the identity

:\log_2(y) = - \tfrac{1}{2}\log_2(x)

Using the approximation of the logarithm above, applied to both x and y, the above equation gives:

:\frac{I_y}{L} - (B - \sigma) \approx - \frac{1}{2}\left(\frac{I_x}{L} - (B - \sigma)\right)

Thus, an approximation of I_y is:

:I_y \approx \tfrac{3}{2} L (B - \sigma) - \tfrac{1}{2} I_x

which is written in the code as

 i = 0x5f3759df - ( i >> 1 );

The first term above is the magic number

:\tfrac{3}{2} L (B - \sigma) = \mathtt{0x5F3759DF}

from which it can be inferred that \sigma \approx 0.0450466. The second term, \frac{1}{2}I_x, is calculated by shifting the bits of I_x one position to the right.{{sfn|Hennessey|Patterson|1998|p=305}}
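The relation between \sigma and the magic number can be checked numerically. In the C sketch below, the function names magic_from_sigma and sigma_from_magic are hypothetical; both simply evaluate \tfrac{3}{2} L (B - \sigma) or its inverse in double precision.

```c
#include <stdint.h>

/* Magic constant implied by a given sigma: round(3/2 * L * (B - sigma)). */
uint32_t magic_from_sigma(double sigma)
{
    const double L = 8388608.0;  /* 2^23 */
    const double B = 127.0;      /* exponent bias */
    return (uint32_t)(1.5 * L * (B - sigma) + 0.5);  /* round to nearest */
}

/* Sigma implied by a given magic constant: B - magic / (3/2 * L). */
double sigma_from_magic(uint32_t magic)
{
    const double L = 8388608.0;
    const double B = 127.0;
    return B - (double)magic / (1.5 * L);
}
```

Here magic_from_sigma(0.0450466) evaluates to 0x5F3759DF, while \sigma = 0 gives 0x5F400000, so the tuning parameter only affects the lower bits of the constant.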

=Newton's method=

{{Main|Newton's method}}

{{Multiple image

| align =

| direction = vertical

| total_width =

| image1 = First initial approximate value.png

| image2 = Relative error between Fast inverse square root and 1-sqrt().png

| image3 = 2nd-iter.png

| image4 = 3rd-iter.png

| image5 = 4th-iter.png

| image_gap = 0

| footer = Relative error between direct calculation and fast inverse square root carrying out 0, 1, 2, 3, and 4 iterations of Newton's root-finding method. Note that double precision is adopted and the smallest representable difference between two double precision numbers is reached after carrying out 4 iterations.

}}

The number y=\tfrac{1}{\sqrt{x}} is a solution of the equation \tfrac{1}{y^2}-x=0. The approximation yielded by the earlier steps can be refined by using a root-finding method, a method that finds the zero of a function. The algorithm uses Newton's method: if there is an approximation, y_n for y, then a better approximation y_{n+1} can be calculated by taking y_n - \tfrac{f(y_n)}{f'(y_n)}, where f'(y_n) is the derivative of f(y) at y_n.{{Sfn|Hardy|1908|p=323}} Applied to the equation \tfrac{1}{y^2}-x=0, Newton's method gives

:\begin{align}
f(y) &= \frac{1}{y^2} - x\\
f'(y) &= -\frac{2}{y^3}\\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}\\
&= y_n + \frac{y_n^3}{2} \left(\frac{1}{y_n^2} - x\right)\\
&= y_n \left(\frac{3}{2} - \frac{x}{2}y_n^2\right),
\end{align}

which is written in the code as y = y * ( threehalfs - ( x2 * y * y ) );.

By repeating this step, using the output of the function (y_{n+1}) as the input of the next iteration, the algorithm causes y to converge to the inverse square root.{{Sfn|McEniry|2007|p=6}} For the purposes of the Quake III engine, only one iteration was used. A second iteration remained in the code but was commented out.{{Sfn|Eberly|2001|p=2}}
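The repeated update can be sketched in C as below. The function names rsqrt_newton_step and rsqrt_newton are hypothetical, and double precision is used here only to make the convergence visible; the Quake III code works in single precision and applies the step once.

```c
/* One Newton step for f(y) = 1/y^2 - x:
   y_{n+1} = y_n * (3/2 - (x/2) * y_n^2). */
double rsqrt_newton_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Apply the step a given number of times, starting from the guess y0. */
double rsqrt_newton(double x, double y0, int steps)
{
    double y = y0;
    for (int i = 0; i < steps; ++i) {
        y = rsqrt_newton_step(x, y);
    }
    return y;
}
```

Starting from the first approximation of the worked example, rsqrt_newton(0.15625, 2.61486, 1) gives approximately 2.52549. Each further step roughly squares the relative error, so a handful of iterations reaches the limit of double precision.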

=Accuracy=

After one iteration of Newton's method, the relative error of the approximation is below 0.2%. The single graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method), for inputs starting at 0.01, where the standard library gives 10.0 as a result and InvSqrt() gives 9.982522, a relative difference of 0.0017478, or 0.175% of the true value, 10. The absolute error only drops from then on, and the relative error stays within the same bounds across all orders of magnitude.

Subsequent improvements

=Magic number=

It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize approximation error by choosing the magic number R over a range. He first computed the optimal constant for the linear approximation step as {{mono|0x5F37642F}}, close to {{mono|0x5F3759DF}}, but this new constant gave slightly less accuracy after one iteration of Newton's method.{{Sfn|Lomont|2003|p=10}} Lomont then searched for a constant optimal even after one and two Newton iterations and found {{mono|0x5F375A86}}, which is more accurate than the original at every iteration stage.{{Sfn|Lomont|2003|p=10}} He concluded by asking whether the exact value of the original constant was chosen through derivation or trial and error.{{Sfn|Lomont|2003|pp=10–11}} Lomont said that the magic number for the 64-bit IEEE 754 type double is {{mono|0x5FE6EC85E7DE30DA}}, but it was later shown by Matthew Robertson to be exactly {{mono|0x5FE6EB50C7B537A9}}.

Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well,{{cite web |last=Kadlec |first=Jan |year=2010 |title=Řrřlog::Improving the fast inverse square root |url=http://rrrola.wz.cz/inv_sqrt.html |access-date=2020-12-14 |type=personal blog |archive-date=2018-07-09 |archive-url=https://web.archive.org/web/20180709021629/http://rrrola.wz.cz/inv_sqrt.html |url-status=dead }} arriving after an exhaustive search at

 conv.i = 0x5F1FFFF9 - ( conv.i >> 1 );
 conv.f *= 0.703952253f * ( 2.38924456f - x * conv.f * conv.f );
 return conv.f;
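The fragment above can be wrapped into a complete function, for example as in the following C sketch. The function name rsqrt_kadlec is hypothetical. Only the three constants are taken from the fragment; the memcpy-based bit reinterpretation replaces the union and is an assumption of this sketch, as is a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Variant with adjusted constants in both the initial guess and the
   single Newton-like step; x is the original input. */
float rsqrt_kadlec(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);        /* assumes sizeof(float) == 4 */
    i = 0x5F1FFFF9u - (i >> 1);      /* modified magic constant */
    memcpy(&y, &i, sizeof y);

    /* Modified refinement step: both coefficients differ from the
       3/2 and 1/2 of the plain Newton iteration. */
    return y * (0.703952253f * (2.38924456f - x * y * y));
}
```

Unlike the original code, this variant multiplies by the unhalved input x, since the factor of one half is absorbed into the adjusted constants.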

A complete mathematical analysis for determining the magic number is now available for single-precision floating-point numbers.{{Sfn|Moroz|Walczyk|Hrynchyshyn|Holimath|2018}}{{Cite journal|last=Muller|first=Jean-Michel|date=December 2020|title=Elementary Functions and Approximate Computing|url=https://ieeexplore.ieee.org/document/9106347|journal=Proceedings of the IEEE|volume=108|issue=12|pages=2146|doi=10.1109/JPROC.2020.2991885|s2cid=219047769 |issn=0018-9219}}

=Zero finding=

Intermediate to the use of one vs. two iterations of Newton's method in terms of speed and accuracy is a single iteration of Halley's method. In this case, Halley's method is equivalent to applying Newton's method with the starting formula f(y) = \frac{1}{y^{1/2}} - xy^{3/2} = 0. The update step is then

:y_{n+1} = y_{n} - \frac{f(y_n)}{f'(y_n)} = y_n \left(\frac{3 + xy_n^2}{1 + 3xy_n^2}\right),

where the implementation should calculate xy_n^2 only once, via a temporary variable.
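A minimal C sketch of this update, with the hypothetical function name rsqrt_halley_step, computes x y_n^2 once in a temporary variable as suggested:

```c
/* One Halley step for the inverse square root:
   y_{n+1} = y_n * (3 + x*y_n^2) / (1 + 3*x*y_n^2). */
double rsqrt_halley_step(double x, double y)
{
    double t = x * y * y;  /* x * y_n^2, computed only once */
    return y * (3.0 + t) / (1.0 + 3.0 * t);
}
```

Applied to the first approximation of the worked example, rsqrt_halley_step(0.15625, 2.61486) lands within about 0.001% of the true value 2.52982, compared with 0.17% for one Newton step. The price is one floating-point division, which the Newton step avoids.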

=Obsolescence=

Subsequent additions by hardware manufacturers have made this algorithm redundant for the most part. For example, on x86, Intel introduced the SSE instruction rsqrtss in 1999. In a 2009 benchmark on the Intel Core 2, this instruction took 0.85 ns per float compared to 3.54 ns for the fast inverse square root algorithm, and had less error.
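From C, the rsqrtss instruction is usually reached through a compiler intrinsic. The sketch below uses _mm_rsqrt_ss from the xmmintrin.h header; the wrapper name hw_rsqrt is hypothetical, and the fallback to 1.0f / sqrtf(x) on targets where the compiler does not define __SSE__ is an assumption of this sketch rather than part of any benchmark cited here.

```c
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Approximate 1/sqrt(x) with the hardware instruction where available. */
float hw_rsqrt(float x)
{
#if defined(__SSE__)
    /* Load x into the low lane, apply rsqrtss, read the low lane back. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    /* Fallback for targets without SSE: exact library computation. */
    return 1.0f / sqrtf(x);
#endif
}
```

The hardware result is itself an approximation, and its low-order bits may differ between processor models, so code that needs reproducible output across machines should not rely on it being bit-exact.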

Some low-cost embedded systems do not have specialized square root instructions. However, manufacturers of these systems usually provide trigonometric and other math libraries, based on algorithms such as CORDIC.

See also

  • {{format link|Methods of computing square roots#Approximations that depend on the floating point representation}}
  • Magic number

Notes

{{Reflist|group=note}}

References

{{Reflist|refs=

{{cite web|url=http://www.netlib.org/fdlibm/e_sqrt.c|title=sqrt implementation in fdlibm - See W. Kahan and K.C. Ng's discussion in comments in lower half of this code}}

{{cite web|last1=Moler|first1=Cleve|title=Symplectic Spacewar|url=http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13|website=MATLAB Central - Cleve's Corner|date=19 June 2012 |publisher=MATLAB|access-date=2014-07-21}}

{{cite web|url=http://www.beyond3d.com/content/articles/8/|title=Origin of Quake3's Fast InvSqrt()|last=Sommefeldt|first=Rys|date=2006-11-29|work=Beyond3D|access-date=2009-02-12}}

{{cite web|url=https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c|title=quake3-1.32b/code/game/q_math.c|work=Quake III Arena|publisher=id Software|access-date=2017-01-21|archive-url=https://web.archive.org/web/20170729072505/https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c#L552|archive-date=2017-07-29|url-status=bot: unknown}}

{{cite web|url=http://www.beyond3d.com/content/articles/15/|title=Origin of Quake3's Fast InvSqrt() - Part Two|access-date=2008-04-19|author=Sommefeldt, Rys|date=2006-12-19|publisher=Beyond3D}}

{{cite web|url=http://assemblyrequired.crashworks.org/timing-square-root/ |title=Timing square root |work=Some Assembly Required |first=Elan |last=Ruskin |date=2009-10-16 |access-date=2015-05-07 |archive-url=https://web.archive.org/web/20210208132927/http://assemblyrequired.crashworks.org/timing-square-root/ | archive-date=2021-02-08}}

{{cite web|url=https://github.com/z88dk/z88dk/tree/master/libsrc/_DEVELOPMENT/math/float/math32#sqrt-and-invsqrt |title=z88dk is a collection of software development tools that targets the 8080 and z80 computers. |author=feilipu|website=GitHub }}

{{cite web|url=http://www.agner.org/optimize/instruction_tables.pdf |title=Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs |access-date=2017-09-08 |first=Agner |last=Fog}}

{{cite web|url=https://mrober.io/papers/rsqrt.pdf|title=A Brief History of InvSqrt|author=Matthew Robertson|date=2012-04-24|publisher=UNBSJ}}

{{cite web|url=http://bbs.csdn.net/topics/41888 |title=Discussion on CSDN |archive-url=https://web.archive.org/web/20150702180504/http://bbs.csdn.net/topics/41888 |archive-date=2015-07-02 |url-status=dead }}

{{cite web|url=https://mrob.com/pub/math/numbers-16.html#le009_16|title=Notable Properties of Specific Numbers|last=Munafo|first=Robert|website=mrob.com|archive-url=https://web.archive.org/web/20181116074733/https://mrob.com/pub/math/numbers-16.html#le009_16|archive-date=16 November 2018|url-status=live}}

}}

=Bibliography=

{{Refbegin|30em}}

  • {{cite journal

|last=Blinn

|first= Jim

|title=Floating Point Tricks

|journal=IEEE Computer Graphics & Applications

|date=July 1997

|volume=17

|issue=4

|doi=10.1109/38.595279

|page=80}}

  • {{cite book

|last=Blinn

|first=Jim

|title=Jim Blinn's Corner: Notation, Notation, Notation

|publisher=Morgan Kaufmann

|year=2003

|isbn=1-55860-860-5}}

  • {{cite book

|last=Eberly

|first=David

|title=3D Game Engine Design

|publisher=Morgan Kaufmann

|year=2001

|isbn=978-1-55860-593-0

|url-access=registration

|url=https://archive.org/details/isbn_9781558605930

}}

  • {{cite journal

|last1=Goldberg

|first1=David

|title=What every computer scientist should know about floating-point arithmetic

|journal=ACM Computing Surveys

|date=1991

|volume=23

|issue=1

|pages=5–48

|doi=10.1145/103162.103163

|s2cid=222008826

}}

  • {{cite book|last1=Hardy|first1=Godfrey|title=A Course of Pure Mathematics|date=1908|publisher=Cambridge University Press|location=Cambridge, UK|isbn=0-521-72055-9|url=http://www.gutenberg.org/files/38769}}
  • {{cite book

|ref={{harvid|Hennessey|Patterson|1998}}

|last=Hennessey

|first=John

|author2=Patterson, David A.

|title=Computer Organization and Design

|url=https://archive.org/details/computerorganiz000henn

|url-access=registration

|publisher=Morgan Kaufmann Publishers

|location=San Francisco, CA

|year=1998

|edition=2nd

|isbn=978-1-55860-491-9}}

  • {{cite web

|url=http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf

|title=Fast Inverse Square Root

|last=Lomont

|first=Chris

|date=February 2003

|access-date=2009-02-13}}

  • {{cite web

|url=http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf

|title=The Mathematics Behind the Fast Inverse Square Root Function Code

|last=McEniry

|first=Charles

|date=August 2007

|archive-url=https://web.archive.org/web/20150511044204/http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf

|archive-date=2015-05-11}}

  • {{cite conference

| ref = {{harvid|Middendorf|2007}}

| last1 =Middendorf | first1 =Lars

| last2 = Mühlbauer |first2=Felix

| last3 = Umlauf |first3=George

| last4 = Bodba |first4=Christophe

| date = June 1, 2007

| title = Embedded Vertex Shader in FPGA

| conference = IFIP TC10 Working Conference:International Embedded Systems Symposium (IESS)

| book-title = Embedded System Design: Topics, Techniques and Trends

| editor = Rettberg, Achin

| others = et al.

| publisher = Springer

| location = Irvine, California

| isbn = 978-0-387-72257-3

| doi = 10.1007/978-0-387-72258-0_14

| url = https://link.springer.com/content/pdf/10.1007/978-0-387-72258-0_14.pdf

| url-status = live

| archive-url = https://web.archive.org/web/20190501235316/https://link.springer.com/content/pdf/10.1007/978-0-387-72258-0_14.pdf

| archive-date = 2019-05-01

| doi-access = free

}} {{open access}}

  • {{cite web

|archive-url=https://web.archive.org/web/20090215020337/http://www.hackszine.com/blog/archive/2008/12/quakes_fast_inverse_square_roo.html

|url=http://www.hackszine.com/blog/archive/2008/12/quakes_fast_inverse_square_roo.html

|title=Quake's fast inverse square root

|last=Striegel

|first=Jason

|date=2008-12-04

|work=Hackszine

|publisher=O'Reilly Media

|archive-date=2009-02-15

|access-date=2013-01-07}}

  • {{Cite web

|last=IEEE Computer Society

|author-link=IEEE Computer Society

|date=1985

|title=754-1985 - IEEE Standard for Binary Floating-Point Arithmetic

|url=https://standards.ieee.org/ieee/754/993/ |publisher=Institute of Electrical and Electronics Engineers

|ref=ieee754}}

  • {{cite journal

|last1=Moroz

|first1=Leonid V.

|last2=Walczyk|first2= Cezary J.

|last3=Hrynchyshyn|first3= Andriy

|last4=Holimath|first4= Vijay

|last5=Cieslinski|first5= Jan L.

|title=Fast calculation of inverse square root with the use of magic constant analytical approach

|journal=Applied Mathematics and Computation

|publisher=Elsevier Science Inc.

|date=January 2018

|volume=316

|issue=C

|doi=10.1016/j.amc.2017.08.025

|pages=245–255|arxiv=1603.04483

|s2cid=7494112}}

{{Refend}}

Further reading

  • {{cite journal

|last=Kushner

|first=David

|date=August 2002

|title=The wizardry of Id

|journal=IEEE Spectrum

|volume=39

|issue=8

|pages=42–47

|doi=10.1109/MSPEC.2002.1021943}}