repeated median regression

In robust statistics, repeated median regression, also known as the repeated median estimator, is a robust linear regression algorithm.

The estimator has a breakdown point of 50%. Although it is equivariant under scaling, or under linear transformations of either its explanatory variable or its response variable, it is not under affine transformations that combine both variables.Peter J. Rousseeuw, Nathan S. Netanyahu, and David M. Mount, "[https://wis.kuleuven.be/stat/robust/papers/publications-1993/rousseeuwnetanyahumount-computrmline-ndsdar-1993.pdf New Statistical and Computational Results on the Repeated Median Regression Estimator]", in New Directions in Statistical Data Analysis and Robustness, edited by Stephan Morgenthaler, Elvezio Ronchetti, and Werner A. Stahel, Birkhauser Verlag, Basel, 1993, pp. 177-194. It can be calculated in O(n^2) time by brute force, in O(n \log^2 n) time using more sophisticated techniques,{{cite conference

| last1 = Stein | first1 = Andrew

| last2 = Werman | first2 = Michael

| contribution = Finding the repeated median regression line

| contribution-url = https://dl.acm.org/citation.cfm?id=139404.139485

| isbn = 0-89791-466-X

| location = Philadelphia, PA, USA

| pages = 409–413

| publisher = Society for Industrial and Applied Mathematics

| title = Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '92)

| year = 1992}} or in O(n\log n) randomized expected time.{{citation

| last1 = Matoušek | first1 = J. | author1-link = Jiří Matoušek (mathematician)

| last2 = Mount | first2 = D. M. | author2-link = David Mount

| last3 = Netanyahu | first3 = N. S. | author3-link = Nathan Netanyahu

| doi = 10.1007/PL00009190

| issue = 2

| journal = Algorithmica

| mr = 1484533

| pages = 136–150

| title = Efficient randomized algorithms for the repeated median line estimator

| volume = 20

| year = 1998| s2cid = 17362967 }} It may also be calculated using an on-line algorithm with O(n) update time.{{Cite journal|last1=Bernholt|first1=Thorsten|last2=Fried|first2=Roland|title=Computing the update of the repeated median regression line in linear time|journal=Information Processing Letters|volume=88|issue=3|pages=111–117|doi=10.1016/s0020-0190(03)00350-8|year=2003|hdl=2003/5224|url=https://www.econstor.eu/bitstream/10419/77373/2/2002-43.pdf |hdl-access=free}}

Method

The repeated median method estimates the slope of the regression line y = A + Bx for a set of points (X_i, Y_i) as

:\widehat B = \underset{i}{\operatorname{median}} \ \underset{j\,\ne\,i}{\operatorname{median}} \ \operatorname{slope}(i, j)

where \operatorname{slope}(i,j) is defined as (Y_j - Y_i) / (X_j - X_i).{{Cite web|url=http://apps.dtic.mil/dtic/tr/fulltext/u2/a092660.pdf|archive-url=https://web.archive.org/web/20180728183415/http://www.dtic.mil/dtic/tr/fulltext/u2/a092660.pdf|url-status=live|archive-date=July 28, 2018|title=Technical Report No. 172, Series 2 By Department of Statistics Princeton University: Robust Regression Using Repeated Medians|last=Siegel|first=Andrew|date=September 1980|website=|access-date=20 February 2018}}

The estimated Y-axis intercept is defined as

:\widehat A = \underset{i}{\operatorname{median}} \ \underset{j\,\ne\,i}{\operatorname{median}} \ \operatorname{intercept}(i, j)

where \operatorname{intercept}(i, j) is defined as (X_j Y_i - X_i Y_j ) / (X_j - X_i).

A simpler and faster alternative to estimate the intercept \widehat A is to use the value \widehat B just estimated, thus:

:\widehat A = \underset{i}{\operatorname{median}} \ (y_i - \widehat {B} x_i)

Note: The direct and hierarchical methods of estimating \widehat A give slightly different values, with the hierarchical method normally being the best estimate. This latter hierarchical approach is idential to the method of estimating \widehat A in Theil–Sen estimator regression.

See also

References