V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental 1947 paper ({{harvtxt|von Mises|1947}}). V-statistics are closely related to U-statistics (U for "unbiased"), introduced by Wassily Hoeffding in 1948 ({{harvtxt|Hoeffding|1948}}; see also {{harvtxt|Lee|1990}} and {{harvtxt|Koroljuk|Borovskich|1994}}). A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals $T(F_n)$ of the empirical distribution function $F_n$ are called statistical functionals (von Mises (1947), p. 309; Serfling (1980), p. 210). Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.

= Examples of statistical functions =

The k-th central moment is the functional $T(F)=\int(x-\mu)^k \, dF(x)$, where $\mu = E[X]$ is the expected value of X. The associated statistical function is the sample k-th central moment,

: $T_n=m_k=T(F_n) = \frac 1n \sum_{i=1}^n (x_i - \overline x)^k.$
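As a minimal sketch, the sample k-th central moment can be computed by plugging the empirical distribution into the functional, so the integral becomes an average over the sample. The function name `sample_central_moment` and the data are illustrative, not from the source.

```python
def sample_central_moment(xs, k):
    """T(F_n): the k-th central moment of the empirical distribution of xs."""
    n = len(xs)
    mean = sum(xs) / n  # the mean of F_n replaces mu = E[X]
    # Integrating against F_n is the same as averaging over the sample.
    return sum((x - mean) ** k for x in xs) / n


data = [1.0, 2.0, 3.0, 4.0]  # illustrative sample
m2 = sample_central_moment(data, 2)
m3 = sample_central_moment(data, 3)
print(m2, m3)
```

For this symmetric sample the third central moment vanishes, and the second is the variance with divisor n rather than n − 1.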

The chi-squared goodness-of-fit statistic is a statistical function T(F_n), corresponding to the statistical functional

: $T(F) = \sum_{i=1}^k \frac{(\int_{A_i} \, dF - p_i)^2}{p_i},$

where A_i are the k cells and p_i are the specified probabilities of the cells under the null hypothesis.
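The functional can be evaluated at the empirical distribution by replacing $\int_{A_i} dF$ with the fraction of observations falling in cell A_i. The sketch below assumes the cells are half-open intervals given by a list of edges; the function name and data are hypothetical. Note that the classical Pearson statistic, written in terms of observed and expected counts, equals n times this plug-in value.

```python
def chi_squared_functional(xs, edges, probs):
    """T(F_n) = sum_i (phat_i - p_i)^2 / p_i for cells [edges[i], edges[i+1])."""
    n = len(xs)
    total = 0.0
    for i, p in enumerate(probs):
        lo, hi = edges[i], edges[i + 1]
        # Mass that the empirical distribution assigns to cell A_i.
        phat = sum(1 for x in xs if lo <= x < hi) / n
        total += (phat - p) ** 2 / p
    return total


data = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9, 0.95, 0.99]  # illustrative sample
edges = [0.0, 0.5, 1.0]   # two cells: [0, 0.5) and [0.5, 1)
probs = [0.5, 0.5]        # null probabilities of the cells
t = chi_squared_functional(data, edges, probs)
pearson = len(data) * t   # usual count-based Pearson statistic
print(t, pearson)
```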

The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

: $T(F) = \int (F(x) - F_0(x))^2 \, w(x;F_0) \, dF_0(x),$

where w(x; F₀) is a specified weight function and F₀ is a specified null distribution. If w is identically equal to 1, then T(F_n) is the well-known Cramér–von Mises goodness-of-fit statistic; if $w(x;F_0)=[F_0(x)(1-F_0(x))]^{-1}$, then T(F_n) is the Anderson–Darling statistic.
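For the unweighted Cramér–von Mises case (weight equal to 1) and a continuous null distribution, the integral at F = F_n has a well-known closed form in terms of the sorted values u_(i) = F₀(x_(i)): T(F_n) = 1/(12n²) + (1/n) Σ ((2i − 1)/(2n) − u_(i))². The following is a sketch under those assumptions; the function name `cvm_functional` and the uniform null are illustrative choices.

```python
def cvm_functional(xs, cdf0):
    """T(F_n) = integral of (F_n - F_0)^2 dF_0, via the order-statistic formula.

    Assumes cdf0 is a continuous null distribution function.
    """
    n = len(xs)
    u = sorted(cdf0(x) for x in xs)  # probability-integral transform, sorted
    s = sum(((2 * i - 1) / (2 * n) - ui) ** 2 for i, ui in enumerate(u, start=1))
    return 1.0 / (12 * n * n) + s / n


def uniform_cdf(x):
    """Null distribution F_0: uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


data = [0.2, 0.5, 0.9]  # illustrative sample
t = cvm_functional(data, uniform_cdf)
print(t)  # the usual test statistic is n * t
```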

= Representation as a V-statistic =

Suppose x₁, ..., x_n is a sample. In typical applications the statistical function has a representation as the V-statistic

: $V_{mn} = \frac{1}{n^m} \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n h(x_{i_1}, x_{i_2}, \dots, x_{i_m}),$

where h is a symmetric kernel function. Serfling (1980, Section 6.5) discusses how to find the kernel in practice. $V_{mn}$ is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x₁, ..., x_n, the corresponding V-statistic is defined as

: $V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n h(x_i, x_j).$
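The defining sum runs over all n^m index tuples, including those with repeated indices. A direct (and deliberately naive, O(n^m)) sketch of that definition is shown here; the function name `v_statistic` and the absolute-difference kernel are illustrative assumptions.

```python
from itertools import product


def v_statistic(xs, kernel, m):
    """V-statistic of degree m: average of kernel over all n**m index tuples."""
    n = len(xs)
    # product(xs, repeat=m) enumerates every tuple, repeated indices included.
    total = sum(kernel(*args) for args in product(xs, repeat=m))
    return total / n ** m


data = [1.0, 2.0, 4.0]  # illustrative sample

# Degree 1 with the kernel h(x) = x gives the sample mean.
v1 = v_statistic(data, lambda x: x, 1)

# Degree 2 with the symmetric kernel h(x, y) = |x - y|.
v2 = v_statistic(data, lambda x, y: abs(x - y), 2)
print(v1, v2)
```

The diagonal terms h(x_i, x_i) are part of the sum, which is what distinguishes a V-statistic from the corresponding U-statistic.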

= Example of a V-statistic =

An example of a degree-2 V-statistic is the second central moment m₂.

If h(x, y) = (x − y)²/2, the corresponding V-statistic is

: $V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2,$

which is the maximum likelihood estimator of the variance under a normal model. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

: $s^2 = {n \choose 2}^{-1} \sum_{i < j} \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2.$
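The two estimators can be compared numerically with the kernel h(x, y) = (x − y)²/2. The sketch below uses hypothetical helper names (`v_stat_2`, `u_stat_2`) and illustrative data; the V-statistic averages over all ordered pairs, while the U-statistic averages over distinct pairs only.

```python
from itertools import combinations


def variance_kernel(x, y):
    """Symmetric degree-2 kernel h(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def v_stat_2(xs, h):
    """Average of h over all n^2 ordered pairs, diagonal included."""
    n = len(xs)
    return sum(h(x, y) for x in xs for y in xs) / n ** 2


def u_stat_2(xs, h):
    """Average of h over the n(n-1)/2 pairs with i < j."""
    n = len(xs)
    return sum(h(x, y) for x, y in combinations(xs, 2)) / (n * (n - 1) / 2)


data = [1.0, 2.0, 3.0, 4.0]  # illustrative sample
v = v_stat_2(data, variance_kernel)  # variance with divisor n
u = u_stat_2(data, variance_kernel)  # variance with divisor n - 1
print(v, u)
```

Because this kernel vanishes on the diagonal, the two values differ only by the factor (n − 1)/n.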

Asymptotic distribution

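The three examples of statistical functions above have different asymptotic distributions: the sample central moment is asymptotically normal, the chi-squared statistic is asymptotically chi-squared, and the Cramér–von Mises statistic is asymptotically a weighted sum of chi-squared variables.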

Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics (Serfling (1980, Ch. 5–6); Lee (1990, Ch. 3)). Let A(m) be the property defined by:

:A(m):

(i) $\operatorname{Var}(h(X_1, \dots, X_k)) = 0$ for $k < m$, and $\operatorname{Var}(h(X_1, \dots, X_k)) > 0$ for $k = m$;

(ii) $n^{m/2} R_{mn}$ tends to zero (in probability), where $R_{mn}$ is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic is a sample mean and the Central Limit Theorem implies that T(F_n) is asymptotically normal.

In the variance example above, m₂ is asymptotically normal with mean $\sigma^2$ and variance $(\mu_4 - \sigma^4)/n$, where $\mu_4=E(X-E(X))^4$.

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and $E[h^2(X_1,X_2)]<\infty, \, E|h(X_1,X_1)|<\infty,$ and $E[h(x,X_1)]\equiv 0$. Then $nV_{2,n}$ converges in distribution to a weighted sum of independent chi-squared variables:

: $n V_{2,n} {\stackrel d \longrightarrow} \sum_{k=1}^\infty \lambda_k Z^2_k,$

where $Z_k$ are independent standard normal variables and $\lambda_k$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic $V_{2,n}$ is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional is an example of a degenerate kernel V-statistic (see Lee (1990, p. 160) for the kernel function).
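A simple illustration, not taken from the source, is the product kernel h(x, y) = xy with E[X] = 0 and Var(X) = σ². Then E[h(x, X₁)] = x·E[X₁] = 0, the V-statistic equals the squared sample mean, and n·V_{2,n} = (√n·x̄)² converges in distribution to σ²Z², so a single weight λ₁ = σ² is nonzero. The sketch below checks the algebraic identity and runs a small seeded simulation with standard normal data; the sample size, replication count, and seed are arbitrary choices.

```python
import random


def v_stat_2(xs, h):
    """Degree-2 V-statistic: average of h over all n^2 ordered pairs."""
    n = len(xs)
    return sum(h(x, y) for x in xs for y in xs) / n ** 2


def product_kernel(x, y):
    """Degenerate kernel when E[X] = 0: E[h(x, X)] = x * E[X] = 0."""
    return x * y


# Identity check: with this kernel the V-statistic is the squared sample mean.
data = [-1.0, 0.5, 2.0]
v = v_stat_2(data, product_kernel)
mean_sq = (sum(data) / len(data)) ** 2

# Simulation: n * V_{2,n} for standard normal samples (sigma^2 = 1).
rng = random.Random(0)
n, reps = 50, 2000
draws = []
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    draws.append(n * xbar ** 2)  # equals n * v_stat_2(xs, product_kernel)

avg = sum(draws) / reps  # should be near sigma^2 = 1, the mean of sigma^2 * Z^2
print(v, mean_sq, avg)
```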


References

{{cite journal

| last = Hoeffding | first = W.

| year = 1948

| title = A class of statistics with asymptotically normal distribution

| journal = Annals of Mathematical Statistics

| volume = 19 | issue = 3

| pages = 293–325

| jstor = 2235637

| doi=10.1214/aoms/1177730196

| doi-access = free

}}

{{cite book

| last1 = Koroljuk | first1 = V.S.

| last2 = Borovskich | first2 = Yu.V.

| year = 1994

| title = Theory of U-statistics

| edition = English translation by P.V.Malyshev and D.V.Malyshev from the 1989 Ukrainian

| publisher = Kluwer Academic Publishers | location = Dordrecht

| isbn = 0-7923-2608-3

}}

{{cite book

| last = Lee | first = A.J.

| year = 1990

| title = U-Statistics: theory and practice

| publisher = Marcel Dekker, Inc. | location = New York

| isbn = 0-8247-8253-4

}}

{{cite journal

| last = Neuhaus | first = G.

| year = 1977

| title = Functional limit theorems for U-statistics in the degenerate case

| journal = Journal of Multivariate Analysis

| volume = 7 | issue = 3

| pages = 424–439

| doi = 10.1016/0047-259X(77)90083-5

| doi-access = free

}}

{{cite journal

| last = Rosenblatt | first = M.

| year = 1952

| title = Limit theorems associated with variants of the von Mises statistic

| journal = Annals of Mathematical Statistics

| volume = 23 | issue = 4

| pages = 617–623

| jstor = 2236587

| doi=10.1214/aoms/1177729341

| doi-access = free

}}

{{cite book

| last = Serfling | first = R.J.

| year = 1980

| title = Approximation theorems of mathematical statistics

| publisher = John Wiley & Sons | location = New York

| isbn = 0-471-02403-1

}}

{{cite book

| last1 = Taylor | first1 = R.L.

| last2 = Daffer | first2 = P.Z.

| last3 = Patterson | first3 = R.F.

| year = 1985

| title = Limit theorems for sums of exchangeable random variables

| publisher = Rowman and Allanheld | location = New Jersey

}}

{{cite journal

| last = von Mises | first = R.

| year = 1947

| title = On the asymptotic distribution of differentiable statistical functions

| journal = Annals of Mathematical Statistics

| volume = 18 | issue = 2

| pages = 309–348

| jstor = 2235734

| doi=10.1214/aoms/1177730385

| doi-access = free

}}

