Cramér–von Mises criterion

{{Short description|Statistical test}}

{{Too technical|date=July 2023}}

In statistics, the Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function F^* compared to a given empirical distribution function F_n, or for comparing two empirical distributions. It is also used as part of other algorithms, such as minimum distance estimation. It is defined as

:\omega^2 = \int_{-\infty}^{\infty} [F_n(x) - F^*(x)]^2\,\mathrm{d}F^*(x)

In one-sample applications, F^* is the theoretical distribution and F_n is the empirically observed distribution. Alternatively, both distributions can be empirically estimated; this is called the two-sample case.
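As a rough check on the definition, the one-sample \omega^2 can be approximated by numerical integration. The sketch below assumes that F^* is the Uniform(0, 1) distribution, so that F^*(x) = x and \mathrm{d}F^*(x) = \mathrm{d}x on [0, 1]. The function name and the midpoint-rule step count are illustrative choices, not part of any standard API.

```python
from bisect import bisect_right


def omega2_uniform(sample, steps=100_000):
    """Approximate omega^2 = int [F_n(x) - F*(x)]^2 dF*(x) for F* = Uniform(0, 1).

    Assumption: every observation lies in [0, 1]. With F* uniform,
    dF*(x) = dx on [0, 1], so a plain midpoint rule over [0, 1] suffices.
    """
    xs = sorted(sample)
    n = len(xs)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h                  # midpoint of the k-th subinterval
        f_n = bisect_right(xs, t) / n      # empirical CDF: fraction of points <= t
        total += (f_n - t) ** 2 * h        # F*(t) = t for the uniform distribution
    return total


sample = [0.25, 0.75]
print(len(sample) * omega2_uniform(sample))  # close to 1/24
```

For this two-point sample, the integral is exactly 1/48, so n\omega^2 = 1/24. This agrees with the closed-form one-sample expression given in the next section, where both squared terms vanish and only 1/(12n) remains.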

The criterion is named after Harald Cramér and Richard Edler von Mises, who first proposed it in 1928–1930.<ref>{{cite journal |first=H. |last=Cramér |title=On the Composition of Elementary Errors |journal=Scandinavian Actuarial Journal |volume=1928 |issue=1 |pages=13–74 |year=1928 |doi=10.1080/03461238.1928.10416862 }}</ref><ref>{{cite book |first=R. E. |last=von Mises |title=Wahrscheinlichkeit, Statistik und Wahrheit |publisher=Julius Springer |year=1928 }}</ref> The generalization to two samples is due to Anderson.<ref>{{cite journal |last=Anderson |first=T. W. |author-link=Theodore Wilbur Anderson |year=1962 |title=On the Distribution of the Two-Sample Cramer–von Mises Criterion |journal=Annals of Mathematical Statistics |volume=33 |issue=3 |pages=1148–1159 |publisher=Institute of Mathematical Statistics |issn=0003-4851 |doi=10.1214/aoms/1177704477 |url=http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aoms/1177704477 |format=PDF |access-date=June 12, 2009 |doi-access=free }}</ref>

The Cramér–von Mises test is an alternative to the Kolmogorov–Smirnov test (1933).<ref>A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione", ''Giorn. Ist. Ital. Attuari'', 4 (1933), pp. 83–91.</ref>

Cramér–von Mises test (one sample)

Let x_1,x_2,\ldots,x_n be the observed values, sorted in increasing order, and let F be the hypothesized cumulative distribution function (F^* above). Then the statistic is{{rp|1153}}<ref>Pearson, E. S.; Hartley, H. O. (1972). ''Biometrika Tables for Statisticians'', Volume 2, CUP. {{ISBN|0-521-06937-8}} (page 118 and Table 54)</ref>

:T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^n \left[ \frac{2i-1}{2n} - F(x_i) \right]^2.

If T is larger than the tabulated critical value for the chosen significance level, the hypothesis that the data came from the distribution F can be rejected.
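The one-sample formula translates directly into code. In this sketch, the hypothesized F is passed in as a plain Python function, which is an assumption of the example and not a specific library's interface. The result T is the value that would be compared against a tabulated critical value; no critical values are computed here. Recent SciPy versions provide `scipy.stats.cramervonmises`, which can serve as an independent cross-check.

```python
import math


def cvm_one_sample(sample, cdf):
    """One-sample Cramér–von Mises statistic T = n * omega^2 via the sorted-data formula.

    `cdf` is any callable returning the hypothesized F(x) (illustrative interface).
    """
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - cdf(x)) ** 2   # i runs over 1..n in sorted order
        for i, x in enumerate(xs, start=1)
    )


def uniform_cdf(x):
    """CDF of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


def std_normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(cvm_one_sample([0.25, 0.75], uniform_cdf))  # 1/24, about 0.0417
print(cvm_one_sample([0.1, 0.2], uniform_cdf))    # 11/30, about 0.3667
print(cvm_one_sample([-1.0, 0.0, 1.0], std_normal_cdf))
```

The data are sorted inside the function, so callers can pass observations in any order.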

=Watson test=

A modified version of the Cramér–von Mises test is the Watson test,<ref>Watson, G. S. (1961). "Goodness-Of-Fit Tests on a Circle", ''Biometrika'', 48 (1/2), 109–114. {{JSTOR|2333135}}</ref> which uses the statistic U^2, defined as

:U^2= T-n( \bar{F}-\tfrac{1}{2} )^2,

where

:\bar{F}=\frac{1}{n} \sum_{i=1}^n F(x_i).
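Watson's statistic only subtracts a correction term from T. The sketch below computes U^2 for data on a circle of circumference 1, assuming a uniform F, so that F(x) is the arc length from an arbitrarily chosen origin. The function names are illustrative. Moving the origin, which is the same as rotating every point by a fixed amount modulo 1, leaves U^2 unchanged, while T itself changes. This is the property that suits U^2 to circular data.

```python
def watson_u2(sample, cdf):
    """Watson's U^2 = T - n * (Fbar - 1/2)^2, with T the one-sample Cramér–von Mises statistic."""
    xs = sorted(sample)
    n = len(xs)
    probs = [cdf(x) for x in xs]                      # F(x_i) for the sorted data
    t = 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - p) ** 2 for i, p in enumerate(probs, start=1)
    )
    f_bar = sum(probs) / n                            # mean of the F(x_i)
    return t - n * (f_bar - 0.5) ** 2


def uniform_cdf(x):
    """CDF of Uniform(0, 1): arc length from the origin on a circle of circumference 1."""
    return min(max(x, 0.0), 1.0)


# Rotate the same two points by different amounts (modulo 1); U^2 stays the same.
for shift in (0.0, 0.5, 0.85):
    rotated = [(v + shift) % 1.0 for v in [0.1, 0.2]]
    print(shift, round(watson_u2(rotated, uniform_cdf), 6))  # U^2 = 73/600 each time
```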

Cramér–von Mises test (two samples)

Let x_1,x_2,\ldots,x_N and y_1,y_2,\ldots,y_M be the observed values in the first and second samples, respectively, each sorted in increasing order. Let r_1,r_2,\ldots,r_N be the ranks of the x values in the combined sample, and let s_1,s_2,\ldots,s_M be the ranks of the y values in the combined sample. Anderson{{rp|1149}} shows that

:T = \frac{NM}{N+M} \omega^2 = \frac{U}{N M (N+M)} - \frac{4 M N - 1}{6(M+N)}

where U is defined as

:U = N \sum_{i=1}^N (r_i-i)^2 + M \sum_{j=1}^M (s_j-j)^2
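The rank formula can be checked numerically against the definition. In Anderson's two-sample formulation, the integral is taken with respect to the empirical distribution H_{N+M} of the pooled sample. This measure puts mass 1/(N+M) on each observation, so \omega^2 reduces to a finite sum. The sketch below computes T both ways. The function names are illustrative, and the rank version assumes that there are no ties.

```python
import random


def cvm_two_sample(x, y):
    """Two-sample T from Anderson's rank formula; assumes no tied values."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    pooled = sorted(xs + ys)
    if len(set(pooled)) != n + m:
        raise ValueError("tied values: the rank formula as written assumes none")
    rank = {v: k for k, v in enumerate(pooled, start=1)}  # 1-based rank in the combined sample
    u = (n * sum((rank[v] - i) ** 2 for i, v in enumerate(xs, start=1))
         + m * sum((rank[v] - j) ** 2 for j, v in enumerate(ys, start=1)))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


def cvm_two_sample_direct(x, y):
    """Same T from the definition: N*M/(N+M) times the integral of (F_N - G_M)^2 dH_{N+M}."""
    n, m = len(x), len(y)
    total = 0.0
    for z in list(x) + list(y):             # H_{N+M} puts mass 1/(N+M) on each point
        f_n = sum(v <= z for v in x) / n    # empirical CDF of the first sample at z
        g_m = sum(v <= z for v in y) / m    # empirical CDF of the second sample at z
        total += (f_n - g_m) ** 2
    omega2 = total / (n + m)
    return n * m / (n + m) * omega2


print(cvm_two_sample([1, 3], [2, 4]))  # 0.125
random.seed(0)
a = [random.gauss(0, 1) for _ in range(7)]
b = [random.gauss(0.5, 1) for _ in range(5)]
print(cvm_two_sample(a, b), cvm_two_sample_direct(a, b))  # agree up to rounding
```

The rank version needs only one sort of the pooled data. The direct version is quadratic in the sample size and is intended only as a check.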

If the value of T is larger than the tabulated values,{{rp|1154–1159}} the hypothesis that the two samples come from the same distribution can be rejected. (Some books{{Specify|date=December 2008}} give critical values for U, which is more convenient, as it avoids the need to compute T via the expression above. The conclusion will be the same.)

The above assumes that there are no duplicates in the x, y, and r sequences, so each x_i is unique and its rank in the sorted list x_1,\ldots,x_N is i. If there are duplicates, and x_i through x_j are a run of identical values in the sorted list, then one common approach is the midrank method:<ref>Ruymgaart, F. H. (1980). "A unified approach to the asymptotic distribution theory of certain midrank statistics". In: ''Statistique non Parametrique Asymptotique'', pp. 1–18, J. P. Raoult (Ed.), Lecture Notes in Mathematics, No. 821, Springer, Berlin.</ref> assign each duplicate a "rank" of (i+j)/2. Duplicates can therefore modify all four quantities r_i, i, s_j, and j in the expressions (r_i-i)^2 and (s_j-j)^2 above.
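The following sketch shows one reading of the midrank rule as stated. Midranks replace the pooled ranks r_i and s_j, and they also replace the within-sample positions i and j. The function names are hypothetical, and no claim is made here about how ties affect the tabulated critical values.

```python
def midranks(values):
    """1-based midranks of an already sorted sequence: a run of equal values
    occupying positions i..j receives the rank (i + j) / 2."""
    ranks = []
    k = 0
    while k < len(values):
        j = k
        while j + 1 < len(values) and values[j + 1] == values[k]:
            j += 1                              # extend the run of ties
        mid = ((k + 1) + (j + 1)) / 2           # convert 0-based k..j to 1-based positions
        ranks.extend([mid] * (j - k + 1))
        k = j + 1
    return ranks


def cvm_two_sample_midrank(x, y):
    """Two-sample T from the rank formula, with midranks replacing r_i, i, s_j and j."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    pooled = sorted(xs + ys)
    pooled_rank = dict(zip(pooled, midranks(pooled)))  # tied values share one midrank
    u = (n * sum((pooled_rank[v] - i) ** 2 for v, i in zip(xs, midranks(xs)))
         + m * sum((pooled_rank[v] - j) ** 2 for v, j in zip(ys, midranks(ys))))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


print(midranks([1, 2, 2, 3]))                  # [1.0, 2.5, 2.5, 4.0]
print(cvm_two_sample_midrank([1, 2], [2, 3]))  # 0.1875
```

When there are no ties, every midrank is an ordinary rank, and the function reduces to the rank formula above.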

References

{{Reflist}}

  • {{Cite book |editor1=D'Agostino, R. B. |editor2=Stephens, M. A. |year=1986 |title=Goodness-of-Fit Techniques |chapter=Tests Based on EDF Statistics |author=M. A. Stephens |publisher=Marcel Dekker |location=New York |isbn=0-8247-7487-6}}

Further reading

  • {{cite journal |last=Xiao |first=Y. |author2=A. Gordon |author3=A. Yakovlev |date=January 2007 |title=A C++ Program for the Cramér–von Mises Two-Sample Test |journal=Journal of Statistical Software |volume=17 |issue=8 |doi=10.18637/jss.v017.i08 |s2cid=54098783 |issn=1548-7660 |oclc=42456366 |url=http://www.jstatsoft.org/v17/i08/paper |format=PDF |access-date=June 12, 2009 |doi-access=free }}

{{DEFAULTSORT:Cramer-von Mises criterion}}

Category:Statistical distance

Category:Nonparametric statistics

Category:Normality tests