Pollard's rho algorithm

{{Short description|Integer factorization algorithm}}

{{about|the integer factorization algorithm|the discrete logarithm algorithm|Pollard's rho algorithm for logarithms}}

Pollard's rho algorithm is an algorithm for integer factorization. It was invented by John Pollard in 1975.{{cite journal |last=Pollard |first=J. M. |year=1975 |title=A Monte Carlo method for factorization |url=https://www.cs.cmu.edu/~avrim/451f11/lectures/lect1122_Pollard.pdf |journal=BIT Numerical Mathematics |volume=15 |issue=3 |pages=331–334 |doi=10.1007/bf01933667 |s2cid=122775546}} It uses only a small amount of space, and its expected running time is proportional to the square root of the smallest prime factor of the composite number being factorized.

Core ideas

The algorithm is used to factorize a number n = pq, where p is a non-trivial factor. A polynomial modulo n, called g(x) (e.g., g(x) = (x^2 + 1) \bmod n), is used to generate a pseudorandom sequence. It is important to note that g(x) must be a polynomial. A starting value, say 2, is chosen, and the sequence continues as x_1 = g(2), x_2 = g(g(2)), x_3 = g(g(g(2))), etc. The sequence is related to another sequence \{x_k \bmod p\}. Since p is not known beforehand, this sequence cannot be explicitly computed in the algorithm. Yet in it lies the core idea of the algorithm.

Because the number of possible values for these sequences is finite, both the \{x_k\} sequence, which is mod n, and \{x_k \bmod p\} sequence will eventually repeat, even though these values are unknown. If the sequences were to behave like random numbers, the birthday paradox implies that the number of x_k before a repetition occurs would be expected to be O(\sqrt N), where N is the number of possible values. So the sequence \{x_k \bmod p\} will likely repeat much earlier than the sequence \{x_k\}. When one has found k_1, k_2 such that x_{k_1}\neq x_{k_2} but x_{k_1}\equiv x_{k_2} \pmod p, the number |x_{k_1}-x_{k_2}| is a nonzero multiple of p that is smaller than n, so \gcd(|x_{k_1}-x_{k_2}|, n) is a non-trivial divisor of n.
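The effect can be seen numerically. The following is a minimal sketch, assuming g(x) = (x^2 + 1) \bmod n, the starting value 2, and the values n = 8051 and p = 97 borrowed from the worked example later in the article; the helper name `steps_until_repeat` is hypothetical. Unlike the real algorithm, it uses p directly, purely for illustration.

```python
from math import gcd

def g(x, n):
    """Iteration function assumed for this sketch: x^2 + 1 modulo n."""
    return (x * x + 1) % n

def steps_until_repeat(modulus, start=2):
    """Count iterations of g (mod modulus) until some value recurs."""
    seen = set()
    x = start % modulus
    steps = 0
    while x not in seen:
        seen.add(x)
        x = g(x, modulus)
        steps += 1
    return steps

n, p = 8051, 97  # 8051 = 83 * 97; p is known here only for illustration

# First few terms of the sequence modulo n: x_0 .. x_4
xs = [2]
for _ in range(4):
    xs.append(g(xs[-1], n))

# x_1 and x_4 differ modulo n but agree modulo p, so their difference
# shares the factor p with n.
collision_gcd = gcd(abs(xs[4] - xs[1]), n)

print(steps_until_repeat(p))   # 4: the sequence mod 97 repeats after 4 steps
print(collision_gcd)           # 97
```

The sequence modulo p can never repeat later than the sequence modulo n, because any repetition modulo n is also a repetition modulo p.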

Once a sequence has a repeated value, the sequence will cycle, because each value depends only on the one before it. This structure of eventual cycling gives rise to the name "rho algorithm", owing to similarity to the shape of the Greek letter ρ when the values x_1 \bmod p, x_2 \bmod p, etc. are represented as nodes in a directed graph.

File:Pollard rho cycle.svg

The cycle is detected by Floyd's cycle-finding algorithm: two nodes i and j (i.e., x_i and x_j) are kept. In each step, one moves to the next node in the sequence and the other moves forward by two nodes. After that, it is checked whether \gcd(x_i - x_j, n) \ne 1. If it is not 1, then there is a repetition in the \{x_k \bmod p\} sequence (i.e. x_i \bmod p = x_j \bmod p). This works because if x_i \bmod p is the same as x_j \bmod p, the difference between x_i and x_j is necessarily a multiple of p. This always happens eventually, and the resulting greatest common divisor (GCD) is a divisor of n other than 1. It may, however, be n itself, since the two sequences might repeat at the same time. In this (uncommon) case the algorithm fails, and it can be repeated with a different parameter.

Algorithm

The algorithm takes as its inputs {{mvar|n}}, the integer to be factored; and {{tmath|g(x)}}, a polynomial in {{mvar|x}} computed modulo {{mvar|n}}. In the original algorithm, g(x) = (x^2 - 1) \bmod n, but nowadays it is more common to use g(x) = (x^2 + 1) \bmod n. The output is either a non-trivial factor of {{mvar|n}}, or failure.

It performs the following steps:{{cite book |last1=Cormen |first1=Thomas H. |authorlink=Thomas H. Cormen |last2=Leiserson |first2=Charles E. |authorlink2=Charles E. Leiserson |last3=Rivest |first3=Ronald L. |authorlink3=Ronald L. Rivest |last4=Stein |first4=Clifford |authorlink4=Clifford Stein |name-list-style=amp |chapter=Section 31.9: Integer factorization |title=Introduction to Algorithms |year=2009 |edition=third |publisher=MIT Press |location=Cambridge, MA |pages=975–980|isbn=978-0-262-03384-8 }} (this section discusses only Pollard's rho algorithm).

Pseudocode for Pollard's rho algorithm

    x ← 2 // starting value
    y ← x
    d ← 1
    while d = 1:
        x ← g(x)
        y ← g(g(y))
        d ← gcd(|x - y|, n)
    if d = n:
        return failure
    else:
        return d

Here {{mvar|x}} and {{mvar|y}} correspond to {{tmath|x_i}} and {{tmath|x_j}} in the previous section. Note that this algorithm may fail to find a nontrivial factor even when {{mvar|n}} is composite. In that case, the method can be tried again, using a starting value of x other than 2 (0 \leq x < n) or a different {{tmath|g(x)}}, such as g(x) = (x^2 + b) \bmod n with 1 \leq b < n-2.
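The pseudocode translates almost line for line into Python. The following is a minimal sketch rather than a reference implementation: the function names `pollard_rho` and `find_factor`, the `None` return value for failure, and the retry policy (trying b = 1, 2, ... up to a hypothetical `max_b`) are assumptions made for illustration.

```python
from math import gcd

def pollard_rho(n, start=2, b=1):
    """Floyd-style rho with g(x) = (x^2 + b) mod n.

    Returns a non-trivial divisor of n, or None on failure (d == n).
    """
    def g(x):
        return (x * x + b) % n

    x = y = start
    d = 1
    while d == 1:
        x = g(x)          # x advances one step
        y = g(g(y))       # y advances two steps
        d = gcd(abs(x - y), n)
    return None if d == n else d

def find_factor(n, max_b=20):
    """Retry with a different constant b after each failure (assumed policy)."""
    for b in range(1, max_b + 1):
        d = pollard_rho(n, b=b)
        if d is not None:
            return d
    return None

print(pollard_rho(8051))   # 97
```

For a prime {{mvar|n}} every attempt fails, so in practice a primality test would normally be run before calling such a routine.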

Example factorization

Let n = 8051 and g(x) = (x^2 + 1) \bmod 8051.

File:Rho-example-animated.gif

{| class="wikitable" style="text-align:right"
! width=30 | {{mvar|i}}
! width=60 | {{mvar|x}}
! width=60 | {{mvar|y}}
! {{math|gcd({{abs|''x'' − ''y''}}, 8051)}}
|-
| 1 || 5 || 26 || 1
|-
| 2 || 26 || 7474 || 1
|-
| 3 || 677 || 871 || 97
|-
| 4 || 7474 || 1481 || 1
|}

Now 97 is a non-trivial factor of 8051. Starting values other than {{math|1=x = y = 2}} may give the cofactor (83) instead of 97. One extra iteration is shown above to make it clear that {{mvar|y}} moves twice as fast as {{mvar|x}}. Note that even after a repetition, the GCD can return to 1.
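The rows of this example can be regenerated with a few lines of Python. This is a small sketch, assuming the Floyd-style update from the pseudocode (x advances one step, y two) and g(x) = (x^2 + 1) \bmod n; the function name `floyd_trace` is hypothetical.

```python
from math import gcd

def floyd_trace(n, steps, start=2):
    """Return rows (i, x, y, gcd(|x - y|, n)) for the first `steps` iterations."""
    def g(v):
        return (v * v + 1) % n

    x = y = start
    rows = []
    for i in range(1, steps + 1):
        x = g(x)
        y = g(g(y))
        rows.append((i, x, y, gcd(abs(x - y), n)))
    return rows

for row in floyd_trace(8051, 4):
    print(row)
# (1, 5, 26, 1)
# (2, 26, 7474, 1)
# (3, 677, 871, 97)
# (4, 7474, 1481, 1)
```

The last row shows the GCD returning to 1 after the factor 97 has appeared in row 3.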

Variants

In 1980, Richard Brent published a faster variant of the rho algorithm. He used the same core ideas as Pollard but a different method of cycle detection, replacing Floyd's cycle-finding algorithm with the related Brent's cycle finding method.{{cite journal |last=Brent |first=Richard P. |authorlink=Richard Brent (scientist) |year=1980 |title=An Improved Monte Carlo Factorization Algorithm |journal=BIT |volume=20 |issue=2 |pages=176–184 |url=https://maths-people.anu.edu.au/~brent/pub/pub051.html |doi=10.1007/BF01933190|s2cid=17181286 }} CLRS gives a heuristic analysis and failure conditions (the trivial divisor n is found).
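A rough sketch of the idea, using the common textbook form of Brent's cycle detection: a saved value is compared against each new term, and the saved value is refreshed after windows of length 1, 2, 4, 8, and so on. The function name `brent_rho`, the window schedule, and the `None` failure value are assumptions here and are not taken from Brent's paper.

```python
from math import gcd

def brent_rho(n, start=2, b=1):
    """Rho with Brent-style cycle detection; g(x) = (x^2 + b) mod n.

    Returns a non-trivial divisor of n, or None if only n itself is found.
    """
    def g(v):
        return (v * v + b) % n

    x = start
    window = 1                      # length of the current comparison window
    while True:
        saved = x                   # value that new terms are compared against
        for _ in range(window):
            x = g(x)
            d = gcd(abs(x - saved), n)
            if d != 1:
                return None if d == n else d
        window *= 2                 # double the window, then refresh `saved`

print(brent_rho(8051))    # 97
print(brent_rho(10403))   # 101
```

Each step evaluates g once, whereas the Floyd-style loop evaluates it three times per step.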

A further improvement was made by Pollard and Brent. They observed that if \gcd(a,n) > 1, then also \gcd(ab,n) > 1 for any positive integer {{tmath|b}}. In particular, instead of computing \gcd (|x-y|,n) at every step, it suffices to define {{tmath|z}} as the product of 100 consecutive |x-y| terms modulo {{tmath|n}}, and then compute a single \gcd(z,n). A major speedup results, as 100 {{math|gcd}} steps are replaced with 99 multiplications modulo {{tmath|n}} and a single {{math|gcd}}. Occasionally it may cause the algorithm to fail by introducing a repeated factor, for instance when {{tmath|n}} is a square. But it then suffices to go back to the previous {{math|gcd}} term, where \gcd(z,n)=1, and use the regular ρ algorithm from there (see Exercise 31.9-4 in CLRS).
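The batching idea can be sketched as follows. For simplicity this sketch applies it to the Floyd-style loop from the pseudocode rather than to Brent's variant; the function name `rho_batched`, the `batch` parameter, and the checkpoint-and-replay fallback are illustrative assumptions.

```python
from math import gcd

def rho_batched(n, start=2, b=1, batch=100):
    """Rho that takes one gcd per `batch` steps; g(x) = (x^2 + b) mod n.

    Returns a non-trivial divisor of n, or None on failure.
    """
    def g(v):
        return (v * v + b) % n

    x = y = start
    while True:
        x_saved, y_saved = x, y          # state where the last gcd was 1
        z = 1
        for _ in range(batch):
            x = g(x)
            y = g(g(y))
            z = z * abs(x - y) % n       # accumulate the differences mod n
        d = gcd(z, n)
        if d == 1:
            continue                     # nothing found in this batch
        if d < n:
            return d                     # non-trivial divisor
        # d == n: the batch absorbed every factor of n. Replay it from the
        # checkpoint with one gcd per step, as in the regular algorithm.
        x, y = x_saved, y_saved
        while True:
            x = g(x)
            y = g(g(y))
            d = gcd(abs(x - y), n)
            if d != 1:
                return None if d == n else d

print(rho_batched(8051))            # 97
print(rho_batched(8051, batch=2))   # 97
```

For a number as small as 8051, a batch of 100 steps overshoots and the replay branch does the work; the saving only becomes visible when the factors are large enough that many batches pass with a GCD of 1.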

Application

The algorithm is very fast for numbers with small factors, but slower in cases where all factors are large. The ρ algorithm's most remarkable success was the 1980 factorization of the Fermat number {{math|''F''<sub>8</sub>}} = 1238926361552897 × 93461639715357977769163558199606896584051237541638188580280321.{{cite journal |last1=Brent |first1=R.P. |last2=Pollard |first2=J. M. |year=1981 |title=Factorization of the Eighth Fermat Number |journal=Mathematics of Computation |volume=36 |issue=154 |pages=627–630 |doi=10.2307/2007666|jstor=2007666 |doi-access=free }} The ρ algorithm was a good choice for {{math|''F''<sub>8</sub>}} because the prime factor {{mvar|p}} = 1238926361552897 is much smaller than the other factor. The factorization took 2 hours on a UNIVAC 1100/42.

Example: factoring {{mvar|n}} = 10403 = 101 · 103

The following table shows numbers produced by the algorithm, starting with x=2 and using the polynomial g(x) = (x^2 + 1) \bmod 10403.

The third and fourth columns of the table contain additional information not known by the algorithm.

They are included to show how the algorithm works.

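{| class="wikitable" style="text-align:right;"
! {{tmath|x}} !! {{tmath|y}} !! {{tmath|x \bmod 101}} !! {{tmath|y \bmod 101}} !! step
|-
| 2 || 2 || 2 || 2 || 0
|-
| 5 || 2 || 5 || 2 || 1
|-
| 26 || 2 || 26 || 2 || 2
|-
| 677 || 26 || 71 || 26 || 3
|-
| 598 || 26 || 93 || 26 || 4
|-
| 3903 || 26 || 65 || 26 || 5
|-
| 3418 || 26 || 85 || 26 || 6
|-
| 156 || 3418 || 55 || 85 || 7
|-
| 3531 || 3418 ||{{rh|align=right}}| 97 || 85 || 8
|-
| 5168 || 3418 || 17 || 85 || 9
|-
| 3724 || 3418 || 88 || 85 || 10
|-
| 978 || 3418 || 69 || 85 || 11
|-
| 9812 || 3418 || 15 || 85 || 12
|-
| 5983 || 3418 || 24 || 85 || 13
|-
| 9970 || 3418 || 72 || 85 || 14
|-
| 236 || 9970 || 34 || 72 || 15
|-
| 3682 || 9970 || 46 || 72 || 16
|-
| 2016 || 9970 ||{{rh|align=right}}| 97 || 72 || 17
|-
| 7087 || 9970 || 17 || 72 || 18
|-
| 10289 || 9970 || 88 || 72 || 19
|-
| 2594 || 9970 || 69 || 72 || 20
|-
| 8499 || 9970 || 15 || 72 || 21
|-
| 4973 || 9970 || 24 || 72 || 22
|-
| 2799 || 9970 ||{{rh|align=right}}| 72 || 72 || 23
|}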

The first repetition modulo 101 is 97, which occurs in step 17. The repetition is not detected until step 23, when x \equiv y \pmod{101}. This causes \gcd (x - y, n) = \gcd (2799 - 9970, n) to be p = 101, and a factor is found.
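The rows of this example can be reproduced with a short script. In the table, y stays fixed while x advances and is refreshed after steps 2, 6 and 14, which resembles Brent-style cycle detection; the sketch below assumes exactly that schedule, inferred from the table rather than stated in the text. The function name `trace_10403` is hypothetical, and p = 101 is used only to fill the two extra columns.

```python
from math import gcd

def trace_10403(n=10403, p=101, start=2):
    """Return (rows, d); each row is (x, y, x mod p, y mod p, step)."""
    def g(v):
        return (v * v + 1) % n

    x = y = start
    rows = [(x, y, x % p, y % p, 0)]
    step, next_save = 0, 2
    while True:
        step += 1
        x = g(x)
        rows.append((x, y, x % p, y % p, step))
        d = gcd(abs(x - y), n)
        if d != 1:
            return rows, d
        if step == next_save:            # refresh y after steps 2, 6, 14, ...
            y = x
            next_save = 2 * next_save + 2

rows, d = trace_10403()
print(rows[-1], d)   # (2799, 9970, 72, 72, 23) 101
```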

Complexity

If the pseudorandom number x = g(x) occurring in the Pollard ρ algorithm were an actual random number, it would follow from the birthday paradox that success would be achieved half the time within O(\sqrt p)\le O(n^{1/4}) iterations. It is believed that the same analysis applies as well to the actual rho algorithm, but this is a heuristic claim, and rigorous analysis of the algorithm remains open.{{cite book|title=Mathematics of Public Key Cryptography|first=Steven D.|last=Galbraith|publisher=Cambridge University Press|year=2012|isbn=9781107013926|contribution=14.2.5 Towards a rigorous analysis of Pollard rho|pages=272–273|url=https://books.google.com/books?id=owd76BElvosC&pg=PA272}}
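The heuristic bound can be worked out as follows. This is the standard birthday-paradox estimate under the idealized assumption that the values x_k \bmod p are independent and uniformly distributed over p residues, which is not proven for the actual sequence; p denotes the smallest prime factor of a composite n.

```latex
\begin{align*}
\Pr[\text{no repetition among } k \text{ values}]
  &= \prod_{i=1}^{k-1}\left(1 - \frac{i}{p}\right)
   \approx \exp\!\left(-\frac{k(k-1)}{2p}\right), \\
\exp\!\left(-\frac{k^2}{2p}\right) = \frac{1}{2}
  &\iff k = \sqrt{2p\ln 2} \approx 1.18\sqrt{p}, \\
p \le \sqrt{n}
  &\implies \sqrt{p} \le n^{1/4}.
\end{align*}
```

So under this model roughly 1.18\sqrt{p} iterations give an even chance of a repetition modulo p, which is the source of the O(\sqrt p)\le O(n^{1/4}) estimate.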


Notes

{{reflist|group=note}}

References

{{reflist}}

Further reading

  • {{cite conference |first1=Shi |last1=Bai |first2=Richard P. |last2=Brent |authorlink2=Richard P. Brent |title=On the Efficiency of Pollard's Rho Method for Discrete Logarithms |conference=The Australasian Theory Symposium (CATS2008) |location=Wollongong |date=January 2008 |book-title=Conferences in Research and Practice in Information Technology, Vol. 77 |pages=125–131 |url=https://maths-people.anu.edu.au/~brent/pub/pub231.html}} Describes the improvements available from different iteration functions and cycle-finding algorithms.
  • {{cite book |last1=Katz |first1=Jonathan |last2=Lindell |first2=Yehuda |chapter=Chapter 8 |title=Introduction to Modern Cryptography | year=2007 |publisher=CRC Press}}
  • {{cite book | author =Samuel S. Wagstaff, Jr. | title=The Joy of Factoring | publisher=American Mathematical Society | location=Providence, RI | year=2013 | isbn=978-1-4704-1048-3 |url=https://www.ams.org/bookpages/stml-68 |author-link=Samuel S. Wagstaff, Jr. |pages= 135–138 }}