Remez algorithm

{{Short description|Algorithm to approximate functions}}

The Remez algorithm or Remez exchange algorithm, published by Evgeny Yakovlevich Remez in 1934, is an iterative algorithm used to find simple approximations to functions, specifically, approximations by functions in a Chebyshev space that are the best in the uniform norm (L∞) sense.{{cite journal |author-link=Evgeny Yakovlevich Remez |first=E. Ya. |last=Remez |title=Sur la détermination des polynômes d'approximation de degré donnée |journal=Comm. Soc. Math. Kharkov |volume=10 |pages=41 |date=1934 }}
{{cite journal |author-mask=1 |first=E. |last=Remes (Remez) |title=Sur un procédé convergent d'approximations successives pour déterminer les polynômes d'approximation |journal=Compt. Rend. Acad. Sci. |volume=198 |pages=2063–5 |language=fr |date=1934 |url=https://gallica.bnf.fr/ark:/12148/bpt6k31506/f2063.item}}
{{cite journal |author-mask=1 |first=E. |last=Remes (Remez) |title=Sur le calcul effectif des polynomes d'approximation de Tschebyschef |journal=Compt. Rend. Acad. Sci. |volume=199 |issue= |pages=337–340 |language=fr |date=1934 |url=https://gallica.bnf.fr/ark:/12148/bpt6k3151h/f337.item}}
It is sometimes referred to as the Remes algorithm or the Reme algorithm.{{Cite journal |last=Chiang |first=Yi-Ling F. |date=November 1988 |title=A Modified Remes Algorithm |url=https://epubs.siam.org/doi/10.1137/0909072 |journal=SIAM Journal on Scientific and Statistical Computing |volume=9 |issue=6 |pages=1058–1072 |doi=10.1137/0909072 |issn=0196-5204}}

A typical example of a Chebyshev space is the subspace of Chebyshev polynomials of order n in the space of real continuous functions on an interval, C[a, b]. The polynomial of best approximation within a given subspace is defined to be the one that minimizes the maximum absolute difference between the polynomial and the function. In this case, the form of the solution is characterized by the equioscillation theorem.
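
As a small worked illustration (the function and degree are chosen only for this example), the equioscillation theorem says that a polynomial p of degree at most n is the best uniform approximation to a continuous function f on [a, b] exactly when the error f - p attains its maximum absolute value at n + 2 or more points of [a, b] with alternating signs. For f(x) = x^2 on [-1, 1] and n = 1, the constant p(x) = 1/2 has |f(x) - p(x)| \le 1/2 on the whole interval and

:f(-1) - p(-1) = \tfrac{1}{2}, \quad f(0) - p(0) = -\tfrac{1}{2}, \quad f(1) - p(1) = \tfrac{1}{2},

so the maximum error is attained with alternating signs at n + 2 = 3 points, and no polynomial of degree at most 1 has a smaller maximum error.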

Procedure

The Remez algorithm starts with the function f to be approximated and a set X of n + 2 sample points x_1, x_2, ..., x_{n+2} in the approximation interval, usually the extrema of a Chebyshev polynomial linearly mapped to the interval. The steps are:

  • Solve the linear system of equations

: b_0 + b_1 x_i + ... + b_n x_i^n + (-1)^i E = f(x_i) (where i = 1, 2, ..., n+2),

:for the unknowns b_0, b_1, ..., b_n and E.

  • Use the b_i as coefficients to form a polynomial P_n.
  • Find the set M of points of local maximum error |P_n(x) - f(x)| .
  • If the errors at every m \in M are of equal magnitude and alternate in sign, then P_n is the minimax approximation polynomial. If not, replace X with M and repeat the steps above.

The result is called the polynomial of best approximation or the minimax approximation polynomial.
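
The steps above can be sketched in Python with NumPy. This is only an illustrative sketch under simplifying assumptions, not a reference implementation: the name `remez` and all its parameters are hypothetical, the local extrema are located on a dense grid rather than by a line search, the indices run from 0 (which only flips the sign of E), and the code assumes the error curve splits into exactly n + 2 regions of constant sign.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def remez(f, n, a=-1.0, b=1.0, max_iter=30, grid_size=4001, rtol=1e-9):
    """Grid-based sketch of the exchange iteration (hypothetical helper).

    Returns the monomial coefficients b_0..b_n plus the smallest and largest
    error amplitude found on the grid in the final iteration.
    """
    k = np.arange(n + 2)
    # Initial reference: Chebyshev extrema mapped linearly to [a, b], increasing.
    x = 0.5 * (a + b) - 0.5 * (b - a) * np.cos(np.pi * k / (n + 1))
    grid = np.linspace(a, b, grid_size)
    signs = (-1.0) ** k
    for _ in range(max_iter):
        # Solve sum_j b_j x_i^j + (-1)^i E = f(x_i) for b_0..b_n and E.
        A = np.hstack([np.vander(x, n + 1, increasing=True), signs[:, None]])
        sol = np.linalg.solve(A, f(x))
        coeffs = sol[:-1]
        # Error P_n - f on the grid, split where its sign changes.
        err = P.polyval(grid, coeffs) - f(grid)
        s = np.where(err >= 0, 1, -1)
        breaks = np.nonzero(np.diff(s))[0] + 1
        segments = np.split(np.arange(grid_size), breaks)
        if len(segments) != n + 2:
            raise RuntimeError("expected exactly n + 2 sign regions")
        # One point of largest |error| per region: the candidate set M.
        idx = np.array([seg[np.argmax(np.abs(err[seg]))] for seg in segments])
        z = np.abs(err[idx])
        lower, upper = z.min(), z.max()
        # Stop once the alternating errors have (nearly) equal magnitude.
        if upper - lower <= rtol * upper:
            break
        x = grid[idx]  # replace X with M and repeat
    return coeffs, lower, upper

coeffs, lower, upper = remez(np.exp, 3)
print(coeffs, lower, upper)
```

The grid search keeps the sketch short; its accuracy is limited by the grid spacing, so a production implementation would refine each extremum separately.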

A review of technicalities in implementing the Remez algorithm is given by W. Fraser.{{cite journal |doi=10.1145/321281.321282 |first=W. |last=Fraser |title=A Survey of Methods of Computing Minimax and Near-Minimax Polynomial Approximations for Functions of a Single Independent Variable |journal=J. ACM |volume=12 |pages=295–314 |year=1965 |issue=3 |s2cid=2736060 |doi-access=free }}

=Choice of initialization=

The Chebyshev nodes are a common choice for the initial approximation because of their role in the theory of polynomial interpolation. For the initialization of the optimization problem for function f by the Lagrange interpolant L_n(f), it can be shown that the error of this initial approximation is bounded by

:\lVert f - L_n(f)\rVert_\infty \le (1 + \lVert L_n\rVert_\infty) \inf_{p \in P_n} \lVert f - p\rVert_\infty

with the norm or Lebesgue constant of the Lagrange interpolation operator L_n of the nodes (t_1, ..., t_{n+1}) being

:\lVert L_n\rVert_\infty = \overline{\Lambda}_n(T) = \max_{-1 \le x \le 1} \lambda_n(T; x),

T being the zeros of the Chebyshev polynomials, and the Lebesgue functions being

:\lambda_n(T; x) = \sum_{j = 1}^{n + 1} \left| l_j(x) \right|, \quad l_j(x) = \prod_{\stackrel{i = 1}{i \ne j}}^{n + 1} \frac{(x - t_i)}{(t_j - t_i)}.

Theodore A. Kilgore,{{cite journal |doi=10.1016/0021-9045(78)90013-8 |first=T. A. |last=Kilgore |title=A characterization of the Lagrange interpolating projection with minimal Tchebycheff norm |journal=J. Approx. Theory |volume=24 |pages=273–288 |year=1978 |issue=4 |doi-access= }} Carl de Boor, and Allan Pinkus{{cite journal |doi=10.1016/0021-9045(78)90014-X |first1=C. |last1=de Boor |first2=A. |last2=Pinkus |title=Proof of the conjectures of Bernstein and Erdös concerning the optimal nodes for polynomial interpolation |journal=Journal of Approximation Theory |volume=24 |pages=289–303 |year=1978 |issue=4 |doi-access=free }} proved that there exists a unique optimal set of nodes t_i for each L_n, although it is not known explicitly for (ordinary) polynomials. Similarly, \underline{\Lambda}_n(T) = \min_{-1 \le x \le 1} \lambda_n(T; x), and the optimality of a choice of nodes can be expressed as \overline{\Lambda}_n - \underline{\Lambda}_n \ge 0.

For Chebyshev nodes, which provide a suboptimal but analytically explicit choice, the asymptotic behavior is known as{{cite journal |first1=F. W. |last1=Luttmann |first2=T. J. |last2=Rivlin |title=Some numerical experiments in the theory of polynomial interpolation |journal=IBM J. Res. Dev. |volume=9 |pages=187–191 |year=1965 |issue=3 |doi= 10.1147/rd.93.0187}}

:\overline{\Lambda}_n(T) = \frac{2}{\pi} \log(n + 1) + \frac{2}{\pi}\left(\gamma + \log\frac{8}{\pi}\right) + \alpha_{n + 1}

({{math|γ}} being the Euler–Mascheroni constant) with

:0 < \alpha_n < \frac{\pi}{72 n^2} for n \ge 1,

and upper bound{{cite book |first=T.J. |last=Rivlin |chapter=The lebesgue constants for polynomial interpolation |chapter-url=https://link.springer.com/chapter/10.1007/BFb0063594 |doi=10.1007/BFb0063594 |series=Lecture Notes in Mathematics |volume=399 |editor-last=Garnir |editor-first=H.G. |editor2-last=Unni |editor2-first=K.R. |editor3-last=Williamson |editor3-first=J.H. |title=Functional Analysis and its Applications |publisher=Springer |date=1974 |isbn=978-3-540-37827-3 |pages=422–437 }}

:\overline{\Lambda}_n(T) \le \frac{2}{\pi} \log(n + 1) + 1
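
The definitions above can be checked numerically. The following sketch estimates the Lebesgue constant of the Chebyshev zeros on a dense grid and prints it next to the upper bound; the function names, the sample count, and the tested values of n are assumptions made for this illustration.

```python
import numpy as np

def chebyshev_zeros(n):
    """The n + 1 zeros of the Chebyshev polynomial of degree n + 1."""
    k = np.arange(1, n + 2)
    return np.cos((2 * k - 1) * np.pi / (2 * (n + 1)))

def lebesgue_constant(nodes, samples=20001):
    """Estimate max over [-1, 1] of sum_j |l_j(x)| by dense sampling."""
    x = np.linspace(-1.0, 1.0, samples)
    total = np.zeros_like(x)
    for j, tj in enumerate(nodes):
        others = np.delete(nodes, j)
        # l_j(x) = prod_{i != j} (x - t_i) / (t_j - t_i)
        lj = np.prod((x[:, None] - others) / (tj - others), axis=1)
        total += np.abs(lj)
    return total.max()

for n in (5, 10, 20):
    lam = lebesgue_constant(chebyshev_zeros(n))
    bound = 2 / np.pi * np.log(n + 1) + 1
    print(n, lam, bound)
```

Sampling gives only an estimate from below of the true maximum, so agreement with the bound here is a sanity check rather than a proof.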

Lev Brutman{{cite journal |doi=10.1137/0715046 |first=L. |last=Brutman |title=On the Lebesgue Function for Polynomial Interpolation |journal=SIAM J. Numer. Anal. |volume=15 |pages=694–704 |year=1978 |issue=4 |bibcode=1978SJNA...15..694B }} obtained the following bound for n \ge 3, with \hat{T} being the zeros of the expanded Chebyshev polynomials:

:\overline{\Lambda}_n(\hat{T}) - \underline{\Lambda}_n(\hat{T}) < \overline{\Lambda}_3 - \frac{1}{6} \cot \frac{\pi}{8} + \frac{\pi}{64} \frac{1}{\sin^2(3\pi/16)} - \frac{2}{\pi}(\gamma - \log\pi)\approx 0.201.

Rüdiger Günttner{{cite journal |doi=10.1137/0717043 |first=R. |last=Günttner |title=Evaluation of Lebesgue Constants |journal=SIAM J. Numer. Anal. |volume=17 |pages=512–520 |year=1980 |issue=4 |bibcode=1980SJNA...17..512G }} obtained from a sharper estimate for n \ge 40

:\overline{\Lambda}_n(\hat{T}) - \underline{\Lambda}_n(\hat{T}) < 0.0196.

Detailed discussion

This section provides more information on the steps outlined above. In this section, the index i runs from 0 to n+1.

Step 1: Given x_0, x_1, ... x_{n+1}, solve the linear system of n+2 equations

: b_0 + b_1 x_i + ... + b_n x_i^n + (-1)^i E = f(x_i) (where i = 0, 1, ..., n+1),

:for the unknowns b_0, b_1, ..., b_n and E.

It should be clear that (-1)^i E in this equation makes sense only if the nodes x_0, ...,x_{n+1} are ordered, either strictly increasing or strictly decreasing. Then this linear system has a unique solution. (As is well known, not every linear system has a solution.) Also, the solution can be obtained with only O(n^2) arithmetic operations while a standard solver from the library would take O(n^3) operations. Here is the simple proof:

Compute the standard n-th degree interpolant p_1(x) to f(x) at the first n+1 nodes and also the standard n-th degree interpolant p_2(x) to the ordinates (-1)^i:

:p_1(x_i) = f(x_i), p_2(x_i) = (-1)^i, i = 0, ..., n.

To this end, use Newton's interpolation formula each time, with the divided differences of order 0, ..., n and O(n^2) arithmetic operations.

The polynomial p_2(x) has its i-th zero between x_{i-1} and x_i,\ i=1, ...,n, and thus no further zeroes between x_n and x_{n+1}: p_2(x_n) and p_2(x_{n+1}) have the same sign (-1)^n.

The linear combination p(x) := p_1 (x) - p_2(x)\!\cdot\!E is also a polynomial of degree n and

:p(x_i) = p_1(x_i) - p_2(x_i)\!\cdot\! E \ = \ f(x_i) - (-1)^i E,\ \ \ \ i =0, \ldots, n.

This is the same as the equation above for i = 0, ... ,n and for any choice of E.

The same equation for i = n+1 is

:p(x_{n+1}) \ = \ p_1(x_{n+1}) - p_2(x_{n+1})\!\cdot\!E \ = \ f(x_{n+1}) - (-1)^{n+1} E

and needs special reasoning: solved for the variable E, it is the definition of E:

:E \ := \ \frac{p_1(x_{n+1}) - f(x_{n+1})}{p_2(x_{n+1}) + (-1)^n}.

As mentioned above, the two terms in the denominator have the same sign, so the denominator is nonzero: E and thus p(x) \equiv b_0 + b_1x + \ldots + b_nx^n are always well-defined.
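
The construction in this proof can be sketched as follows, assuming NumPy and ordered nodes. The helper names `newton_coeffs`, `newton_eval` and `solve_reference` are hypothetical and chosen only for this illustration; the polynomial is kept in Newton form rather than converted to the coefficients b_0, ..., b_n.

```python
import numpy as np

def newton_coeffs(x, y):
    """Divided differences of y at nodes x, using O(n^2) operations."""
    c = np.array(y, dtype=float)
    for k in range(1, len(x)):
        c[k:] = (c[k:] - c[k - 1:-1]) / (x[k:] - x[:-k])
    return c

def newton_eval(c, x, t):
    """Evaluate the Newton-form polynomial with coefficients c at t."""
    result = c[-1]
    for k in range(len(c) - 2, -1, -1):
        result = result * (t - x[k]) + c[k]
    return result

def solve_reference(f, x):
    """Return (E, p) with p(x_i) + (-1)^i E = f(x_i) for i = 0..n+1."""
    x = np.asarray(x, dtype=float)
    head, last = x[:-1], x[-1]            # first n + 1 nodes and x_{n+1}
    n = len(head) - 1
    signs = (-1.0) ** np.arange(n + 1)
    c1 = newton_coeffs(head, f(head))     # p_1 interpolates f
    c2 = newton_coeffs(head, signs)       # p_2 interpolates (-1)^i
    E = (newton_eval(c1, head, last) - f(last)) / (
        newton_eval(c2, head, last) + (-1.0) ** n)
    c = c1 - E * c2                       # Newton coefficients of p_1 - E p_2
    return E, lambda t: newton_eval(c, head, t)

nodes = np.cos(np.pi * np.arange(5, -1, -1) / 5)   # increasing, n = 4
E, p = solve_reference(np.exp, nodes)
print(E, p(nodes) - np.exp(nodes))
```

Because divided differences on a fixed set of nodes are linear in the data, the Newton coefficients of p are obtained directly as `c1 - E * c2` without a further interpolation step.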

The error at the given n+2 ordered nodes is positive and negative in turn because

:p(x_i) - f(x_i) \ = \ -(-1)^i E,\ \ i = 0, ... , n\!+\!1.

The theorem of de La Vallée Poussin states that under this condition no polynomial of degree n exists with error less than E. Indeed, if such a polynomial existed, call it \tilde p(x), then the difference p(x)-\tilde p(x) = (p(x) - f(x)) - (\tilde p(x) - f(x)) would still be alternately positive and negative at the n+2 nodes x_i and therefore have at least n+1 zeros, which is impossible for a polynomial of degree n.

Thus, this E is a lower bound for the minimum error which can be achieved with polynomials of degree n.

Step 2 changes the notation from b_0 + b_1x + ... + b_nx^n to p(x).

Step 3 improves upon the input nodes x_0, ..., x_{n+1} and their errors \pm E as follows.

In each P-region (a region where the error p(x) - f(x) is positive), the current node x_i is replaced with the local maximizer \bar{x}_i, and in each N-region (a region where the error is negative) x_i is replaced with the local minimizer. (Expect \bar{x}_0 at A, the \bar {x}_i near x_i, and \bar{x}_{n+1} at B.) No high precision is required here; the standard line search with a couple of quadratic fits should suffice. (See {{cite book |last1=Luenberger |first1=D.G. |last2=Ye |first2=Y. |chapter=Basic Descent Methods |chapter-url=https://link.springer.com/chapter/10.1007/978-0-387-74503-9_8 |title=Linear and Nonlinear Programming |publisher=Springer |edition=3rd |series=International Series in Operations Research & Management Science |volume=116 |date=2008 |isbn=978-0-387-74503-9 |pages=215–262 |doi=10.1007/978-0-387-74503-9_8}})

Let z_i := p(\bar{x}_i) - f(\bar{x}_i). Each amplitude |z_i| is greater than or equal to E. The theorem of de La Vallée Poussin and its proof also apply to z_0, ..., z_{n+1}, with \min\{|z_i|\} \geq E as the new lower bound for the best error possible with polynomials of degree n.

Moreover, \max\{|z_i|\} comes in handy as an obvious upper bound for that best possible error.

Step 4: With \min\,\{|z_i|\} and \max\,\{|z_i|\} as lower and upper bound for the best possible approximation error, one has a reliable stopping criterion: repeat the steps until \max\{|z_i|\} - \min\{|z_i|\} is sufficiently small or no longer decreases. These bounds indicate the progress.

Variants

Some modifications of the algorithm are present in the literature.{{Citation |last1=Egidi |first1=Nadaniela |title=A New Remez-Type Algorithm for Best Polynomial Approximation |date=2020 |url=http://link.springer.com/10.1007/978-3-030-39081-5_7 |work=Numerical Computations: Theory and Algorithms |volume=11973 |pages=56–69 |editor-last=Sergeyev |editor-first=Yaroslav D. |place=Cham |publisher=Springer |doi=10.1007/978-3-030-39081-5_7 |isbn=978-3-030-39080-8 |last2=Fatone |first2=Lorella |last3=Misici |first3=Luciano |s2cid=211159177 |editor2-last=Kvasov |editor2-first=Dmitri E.}} These include:

  • Replacing more than one sample point with the locations of nearby maximum absolute differences.{{Citation needed|date=March 2022}}
  • Replacing all of the sample points in a single iteration with the locations of all alternating-sign maximum differences.{{cite journal |last1=Temes |first1=G.C. |last2=Barcilon |first2=V. |last3=Marshall |first3=F.C. |title=The optimization of bandlimited systems |journal=Proceedings of the IEEE |volume=61 |issue=2 |pages=196–234 |date=1973 |doi=10.1109/PROC.1973.9004 |issn=0018-9219}}
  • Using the relative error to measure the difference between the approximation and the function, especially if the approximation will be used to compute the function on a computer which uses floating point arithmetic.
  • Including zero-error point constraints.
  • The Fraser-Hart variant, used to determine the best rational Chebyshev approximation.{{Cite journal |last=Dunham |first=Charles B. |date=1975 |title=Convergence of the Fraser-Hart algorithm for rational Chebyshev approximation |url=https://www.ams.org/mcom/1975-29-132/S0025-5718-1975-0388732-9/ |journal=Mathematics of Computation |language=en |volume=29 |issue=132 |pages=1078–1082 |doi=10.1090/S0025-5718-1975-0388732-9 |issn=0025-5718|doi-access=free }}

See also

{{Portal|Mathematics}}

  • {{annotated link|Hadamard's lemma}}
  • {{annotated link|Laurent series}}
  • {{annotated link|Padé approximant}}
  • {{annotated link|Newton series}}
  • {{annotated link|Approximation theory}}
  • {{annotated link|Function approximation}}

References

{{Reflist}}