Dixon's factorization method

In number theory, Dixon's factorization method (also Dixon's random squares method{{cite book |first=Thorsten |last=Kleinjung |chapter=Factorization of a 768-Bit RSA Modulus |title=Advances in Cryptology – CRYPTO 2010 |series=Lecture Notes in Computer Science |year=2010 |volume=6223 |pages=333–350 |doi=10.1007/978-3-642-14623-7_18 |display-authors=etal|isbn=978-3-642-14622-0 |s2cid=11556080 }} or Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method. Unlike for other factor base methods, its run-time bound comes with a rigorous proof that does not rely on conjectures about the smoothness properties of the values taken by a polynomial.

The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.{{cite journal |last=Dixon |first=J. D. |author-link=John D. Dixon |year=1981 |title=Asymptotically fast factorization of integers |url=https://www.ams.org/journals/mcom/1981-36-153/S0025-5718-1981-0595059-1/S0025-5718-1981-0595059-1.pdf |journal=Math. Comp. |volume=36 |issue=153 |pages=255–260 |doi=10.1090/S0025-5718-1981-0595059-1 |jstor=2007743 |doi-access=free}}

Basic idea

Dixon's method is based on finding a congruence of squares modulo the integer N which is intended to factor. Fermat's factorization method finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x2 mod N is a perfect square (in the integers):

:x^2\equiv y^2\quad(\hbox{mod }N),\qquad x\not\equiv\pm y\quad(\hbox{mod }N).

For example, if {{nowrap|N {{=}} 84923}}, (by starting at 292, the first number greater than {{radic|N}} and counting up) the {{nowrap|5052 mod 84923}} is 256, the square of 16. So {{nowrap|(505 − 16)(505 + 16) {{=}} 0 mod 84923}}. Computing the greatest common divisor of {{nowrap|505 − 16}} and N using Euclid's algorithm gives 163, which is a factor of N.

In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are only {{radic|N}} squares less than N.

Dixon's method replaces the condition "is the square of an integer" with the much weaker one "has only small prime factors"; for example, there are 292 squares smaller than 84923; 662 numbers smaller than 84923 whose prime factors are only 2,3,5 or 7; and 4767 whose prime factors are all less than 30. (Such numbers are called B-smooth with respect to some bound B.)

If there are many numbers a_1 \ldots a_n whose squares can be factorized as a_i^2 \mod N = \prod_{j=1}^m b_j^{e_{ij}} for a fixed set b_1 \ldots b_m of small primes, linear algebra modulo 2 on the matrix e_{ij} will give a subset of the a_i whose squares combine to a product of small primes to an even power — that is, a subset of the a_i whose squares multiply to the square of a (hopefully different) number mod N.

Method

Suppose the composite number N is being factored. Bound B is chosen, and the factor base is identified (which is called P), the set of all primes less than or equal to B. Next, positive integers z are sought such that z2 mod N is B-smooth. Therefore we can write, for suitable exponents ai,

: z^2 \text{ mod } N = \prod_{p_i\in P} p_i^{a_i}

When enough of these relations have been generated (it is generally sufficient that the number of relations be a few more than the size of P), the methods of linear algebra, such as Gaussian elimination, can be used to multiply together these various relations in such a way that the exponents of the primes on the right-hand side are all even:

: {z_1^2 z_2^2 \cdots z_k^2 \equiv \prod_{p_i\in P} p_i^{a_{i,1}+a_{i,2}+\cdots+a_{i,k}}\ \pmod{N}\quad (\text{where } a_{i,1}+a_{i,2}+\cdots+a_{i,k} \equiv 0\pmod{2}) }

This yields a congruence of squares of the form {{nowrap|a2b2 (mod N),}} which can be turned into a factorization of N, {{nowrap|N {{=}} gcd(a + b, N) × (N/gcd(a + b, N)).}} This factorization might turn out to be trivial (i.e. {{nowrap|N {{=}} N × 1}}), which can only happen if {{nowrap|a ≡ ±b (mod N),}} in which case another try must be made with a different combination of relations; but if a nontrivial pair of factors of N is reached, the algorithm terminates.

Pseudocode

This section is taken directly from Dixon(1981).

Dixon's Algorithm

Initialization. Let L be a list of integers in the range [1, n], and let P = {p₁, ..., pₕ} be the list of the h primes ≤ v. Let B and Z be initially empty lists (Z will be indexed by B).

Step 1. If L is empty, exit (algorithm unsuccessful). Otherwise, take the first term z from L, remove it from L, and proceed to Step 2.

Step 2. Compute w as the least positive remainder of z² mod n. Factor w as:

w = w' \prod_i p_i^{a_i}

where w′ has no factor in P. If w′ = 1, proceed to Step 3; otherwise, return to Step 1.

Step 3. Let a ← (a₁, ..., aₕ). Add a to B and z to Z. If B has at most h elements, return to Step 1; otherwise, proceed to Step 4.

Step 4. Find the first vector c in B that is linearly dependent (mod 2) on earlier vectors in B. Remove c from B and z_c from Z. Compute coefficients f_b such that:

\mathbf{c} \equiv \sum_{b \in B} f_b \mathbf{b} \pmod{2}

Define:

\mathbf{d} = (d_1, \dots, d_n) \gets \frac{1}{2} \left(\mathbf{c} + \sum f_b \mathbf{b} \right)

Proceed to Step 5.

Step 5. Compute:

x \gets z_c \prod_b z_b^{f_b}, \quad y \gets \prod_i p_i^{d_i}

so that:

x^2 \equiv \prod_i p_i^{2d_i} = y^2 \mod n.

If x \equiv y or x \equiv -y \pmod{n}, return to Step 1. Otherwise, return:

\gcd(n, x + y)

which provides a nontrivial factor of n, and terminate successfully.

Step-by-step example : factorizing ( n = 84923) using Dixon’s Algorithm

This example is lifted directly from the LeetArxiv substack.Kibicho, Murage (2025). [https://leetarxiv.substack.com/p/hand-written-paper-implementation Hand-Written Paper Implementation]: Asymptotically Fast Factorization of Integers. Credit is given to the original author.

  • Initialization:
  • Define a list of numbers L, ranging from 1 to 84923:

::: L = \{1, \dots, 84923\}

:* Define a value v, which is the smoothness factor:

::: v = 7

:* Define a list P containing all the prime numbers less than or equal to v:

::: P = {2, 3, 5, 7}

:* Define B and Z, two empty lists. B is a list of powers, while Z is a list of accepted integers:

::: B = [ ]

::: Z = [ ]

  • Step 1: Iterating z values
  • Write a for loop that indexes the list L. Each element in L is indexed as z. The for loop exits at the end of the list.

::

int n = 84923;

for(int i = 1; i <= n; i++)

{

int z = i;

}

  • Step 2: Computing z^2 \mod n and v-smooth Prime Factorization
  • To proceed, compute z^2 \mod 84923 for each z, then express the result as a prime factorization.

::

1^2 \mod 84923 \equiv 1 \mod 84923 = 2^0 \cdot 3^0 \cdot 5^0 \cdot 7^0 \mod 84923

::

\vdots

::

513^2 \mod 84923 = 8400 \mod 84923 = 2^4 \cdot 3^1 \cdot 5^2 \cdot 7^1 \mod 84923

::

\vdots

::

537^2 \mod 84923 = 33600 \mod 84923 = 2^6 \cdot 3^1 \cdot 5^2 \cdot 7^1 \mod 84923

::

538^2 \mod 84923 = 34675 \mod 84923 = 5^2 \cdot 19^1 \cdot 73^1 \mod 84923

::This step continues for all values of z in the range.

  • Step 3: If z^2 \mod 84923 is 7-smooth, then append its powers to list B and append z to list Z.

::Z = \{1, 513, 537\}

::B = \{ [0, 0, 0, 0], [4, 1, 2, 1], [6, 1, 2, 1] \}

  • Step 4: This step is split into two parts.

:: Part 1: Find B modulo 2.

::

B =

\begin{pmatrix}

0 & 0 & 0 & 0 \\

4 & 1 & 2 & 1 \\

6 & 1 & 2 & 1

\end{pmatrix}

\mod 2

\equiv

B =

\begin{pmatrix}

0 & 0 & 0 & 0 \\

0 & 1 & 0 & 1 \\

0 & 1 & 0 & 1

\end{pmatrix}

:: Part 2: Check if any row combinations of B sum to even numbers

:::For example, summing Row 2 and Row 3 gives us a vector of even numbers.

:::

R_2 = \{0, 1, 0, 1\} and R_3 = \{0, 1, 0, 1\}

:::then

:::R_2 + R_3 = \{0, 1, 0, 1\} + \{0, 1, 0, 1\}

:::R_2 + R_3 = \{0, 2, 0, 2\}.

  • Step 5 : This step is split into four parts.
  • Part 1. (Finding x): Multiply the corresponding z values for the rows found in Step 4. Then find the square root. This gives us x.
  • For Row 2, we had 2^4 * 3^1 * 5^2 * 7^1.
  • For Row 3, we had 2^6 * 3^1 * 5^2 * 7^1.
  • Thus, we find x :

::::

\begin{array}{ll}

(513 \cdot 537) ^ 2 \mod 84923 = y ^ 2 \\

\\

\text{where} \quad x^2 \mod 84923 = (513 \cdot 537) ^ 2 \mod 84923 \\

\\

\text{thus} \quad x = (513 \cdot 537) \mod 84923 \\

\\

\text{so} \quad x = 275481 \mod 84923\\

\\

\text{Finally} \quad x = 20712 \mod 84923\\

\end{array}

:* Part 2. (Finding y): Multiply the corresponding smooth factorizations for the rows found in Step 4. Then find the square root. This gives us y.

:::

\begin{array}{ll}

y^2 = (2^4 \cdot 3^1 \cdot 5^2 \cdot 7^1) \times (2^6 \cdot 3^1 \cdot 5^2 \cdot 7^1) \\

\\

\text{By the multiplication law of exponents,} \\

y^2 = 2^{(4+6)} \cdot 3^{(1+1)} \cdot 5^{(2+2)} \cdot 7^{(1+1)} \\

\\

\text{Thus,} \\

y^2 = 2^{10} \cdot 3^2 \cdot 5^4 \cdot 7^2 \\

\\

\text{Taking square roots on both sides gives} \\

y = 2^5 \cdot 3^1 \cdot 5^2 \cdot 7^1 \\

\\

\text{Therefore,} \\

y = 32 \times 3 \times 25 \times 7 \\

\\

\text{Finally,} \\

y = 16800

\end{array}

:* Part 3. (Find x + y and x - y) where x = 20712 and y = 16800.

::: x + y = 20712 + 16800 = 37512

::: x - y = 20712 - 16800 = 3912

:* Part 4. Compute GCD(x+y, n) and GCD(x-y, n), where n = 84923, x+y = 292281 and x-y = 258681

:::

\begin{array}{ll}

\gcd(37512, 84923) = 521 \\

\gcd(3912, 84923) = 163

\end{array}

Quick check shows 84923 = 521 \times 163.

Optimizations

The quadratic sieve is an optimization of Dixon's method. It selects values of x close to the square root of {{var|N}} such that x2 modulo N is small, thereby largely increasing the chance of obtaining a smooth number.

Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number z cannot have more than \log_2 z factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.

A more sophisticated analysis, using the approximation that a number has all its prime factors less than N^{1/a} with probability about a^{-a} (an approximation to the Dickman–de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of \exp\left(\sqrt{\log N \log \log N}\right).

The optimal complexity of Dixon's method is

:O\left(\exp\left(2 \sqrt 2 \sqrt{\log n \log \log n}\right)\right)

in big-O notation, or

:L_n [1/2, 2 \sqrt 2]

in L-notation.

References