Empirical process

In probability theory, an empirical process is a stochastic process that characterizes the deviation of the empirical distribution function from its expectation.

In mean field theory, limit theorems for empirical measures are studied as the number of objects becomes large; these theorems generalise the central limit theorem. Applications of the theory of empirical processes arise in non-parametric statistics.{{Cite journal | last1 = Mojirsheibani | first1 = M. | title = Nonparametric curve estimation with missing data: A general empirical process approach | doi = 10.1016/j.jspi.2006.02.016 | journal = Journal of Statistical Planning and Inference | volume = 137 | issue = 9 | pages = 2733–2758 | year = 2007 }}

Definition

For independent and identically distributed random variables X₁, X₂, ..., X_n in $\mathbb{R}$ with common cumulative distribution function F(x), the empirical distribution function is defined by

: $F_n(x)=\frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),$

where I_C is the indicator function of the set C.

For every fixed x, F_n(x), n = 1, 2, ..., is a sequence of random variables that converges to F(x) almost surely by the strong law of large numbers. That is, F_n converges to F pointwise. Glivenko and Cantelli strengthened this result by proving that the convergence of F_n to F is uniform in x, a result known as the Glivenko–Cantelli theorem.{{Cite journal | last1 = Wolfowitz | first1 = J. | doi = 10.1214/aoms/1177728852 | title = Generalization of the Theorem of Glivenko-Cantelli | journal = The Annals of Mathematical Statistics | volume = 25 | pages = 131–138 | year = 1954 | doi-access = free }}
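The uniform convergence described above can be illustrated numerically. The following is a minimal sketch, assuming a Uniform(0, 1) sample so that F(x) = x; the helper names `ecdf` and `sup_distance_uniform` are hypothetical and chosen only for this illustration.

```python
import bisect
import random


def ecdf(sample):
    """Return F_n as a function, where F_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns the number of observations that are <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


def sup_distance_uniform(sample):
    """sup_x |F_n(x) - F(x)| under the assumption F(x) = x (Uniform(0, 1))."""
    xs = sorted(sample)
    n = len(xs)
    # F_n is a step function that jumps at the order statistics, so the
    # supremum is reached at a jump point or just to the left of one.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)  # fixed seed so that repeated runs agree
for n in (100, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, sup_distance_uniform(sample))
```

The printed distance should shrink as n grows, roughly at the rate $1/\sqrt{n}$, which is the scaling that motivates the factor $\sqrt{n}$ in the empirical process.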

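More generally, let X₁, X₂, ..., X_n be independent random elements of a measurable space S with common distribution P, and let $P_n=\frac{1}{n}\sum_{i=1}^n\delta_{X_i}$ denote their empirical measure. A centered and scaled version of the empirical measure is the signed measure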

: $G_n(A)=\sqrt{n}(P_n(A)-P(A))$

It induces a map on measurable functions f given by

: $f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)$

By the central limit theorem, $G_n(A)$ converges in distribution to a normal random variable $N(0,P(A)(1-P(A)))$ for a fixed measurable set A. Similarly, for a fixed function f, $G_nf$ converges in distribution to a normal random variable $N(0,\mathbb{E}(f-\mathbb{E}f)^2)$, provided that $\mathbb{E}f$ and $\mathbb{E}f^2$ exist.
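The limiting variance for a fixed set can be checked by simulation. The sketch below assumes Uniform(0, 1) data and the set A = (−∞, 0.3], so that P(A) = 0.3 and P(A)(1 − P(A)) = 0.21; the function names and the choices of n, the number of replications, and the seed are illustrative assumptions.

```python
import math
import random
import statistics


def G_n_set(sample, p_A, in_A):
    """G_n(A) = sqrt(n) * (P_n(A) - P(A)), with A described by the predicate in_A."""
    n = len(sample)
    P_n_A = sum(1 for x in sample if in_A(x)) / n  # empirical measure of A
    return math.sqrt(n) * (P_n_A - p_A)


def simulate(n=200, reps=2000, t=0.3, seed=1):
    """Draw `reps` independent copies of G_n(A) for A = (-inf, t], Uniform(0, 1) data."""
    rng = random.Random(seed)
    return [
        G_n_set([rng.random() for _ in range(n)], t, lambda x: x <= t)
        for _ in range(reps)
    ]


draws = simulate()
print("mean    :", statistics.mean(draws))       # should be close to 0
print("variance:", statistics.pvariance(draws))  # should be close to 0.21
```

Because $nP_n(A)$ is binomial with parameters n and P(A), the variance of $G_n(A)$ equals P(A)(1 − P(A)) for every n; the central limit theorem adds that the shape of the distribution becomes normal.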

Definition

: $\bigl(G_n(c)\bigr)_{c\in\mathcal{C}}$ is called an empirical process indexed by $\mathcal{C}$, a collection of measurable subsets of S.

: $\bigl(G_nf\bigr)_{f\in\mathcal{F}}$ is called an empirical process indexed by $\mathcal{F}$, a collection of measurable functions from S to $\mathbb{R}$.

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of Donsker classes: sets of functions with the useful property that empirical processes indexed by these classes converge weakly to a certain Gaussian process. While it can be shown that Donsker classes are Glivenko–Cantelli classes, the converse is not true in general.

Example

As an example, consider empirical distribution functions. For real-valued iid random variables X₁, X₂, ..., X_n they are given by

: $F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.$

In this case, empirical processes are indexed by the class $\mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}.$ It has been shown that $\mathcal{C}$ is a Donsker class; in particular,

: $\sqrt{n}(F_n(x)-F(x))$ converges weakly in $\ell^\infty(\mathbb{R})$ to a Brownian bridge B(F(x)).
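One consequence of this convergence can be checked numerically: a standard Brownian bridge B has covariance Cov(B(u), B(v)) = min(u, v) − uv, so the empirical process evaluated at two points s and t should have covariance close to min(F(s), F(t)) − F(s)F(t). The following is a rough sketch under the assumption of Uniform(0, 1) data (so F(x) = x on [0, 1]); all function names and simulation settings are hypothetical.

```python
import math
import random


def empirical_process_at(sample, points):
    """Evaluate sqrt(n) * (F_n(x) - F(x)) at each x in points, assuming F(x) = x."""
    n = len(sample)
    return [
        math.sqrt(n) * (sum(1 for v in sample if v <= x) / n - x)
        for x in points
    ]


def bridge_covariance(u, v):
    """Covariance of a standard Brownian bridge at times u and v in [0, 1]."""
    return min(u, v) - u * v


def simulated_covariance(s, t, n=200, reps=2000, seed=2):
    """Monte Carlo estimate of the covariance of the empirical process at s and t."""
    rng = random.Random(seed)
    pairs = [
        empirical_process_at([rng.random() for _ in range(n)], (s, t))
        for _ in range(reps)
    ]
    mean_s = sum(a for a, _ in pairs) / reps
    mean_t = sum(b for _, b in pairs) / reps
    return sum((a - mean_s) * (b - mean_t) for a, b in pairs) / reps


print("Brownian bridge:", bridge_covariance(0.3, 0.7))
print("simulated      :", simulated_covariance(0.3, 0.7))
```

Matching covariances at pairs of points is only a check on the finite-dimensional distributions; weak convergence in $\ell^\infty(\mathbb{R})$ is a stronger statement that this sketch does not verify.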

References

External links

[http://www.stat.yale.edu/~pollard/Books/Iowa Empirical Processes: Theory and Applications], by David Pollard, a textbook available online.
[http://www.bios.unc.edu/~kosorok/current.pdf Introduction to Empirical Processes and Semiparametric Inference], by Michael Kosorok, another textbook available online.

