Continuous mapping theorem

{{Short description|Probability theorem}}

{{Distinguish|text=the contraction mapping theorem}}

In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine's definition, is a function that maps convergent sequences into convergent sequences: if xn → x then g(xn) → g(x). The continuous mapping theorem states that this remains true if the deterministic sequence {xn} is replaced with a sequence of random variables {Xn}, and the standard notion of convergence of real numbers “→” is replaced with one of the types of convergence of random variables.
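
The theorem can also be checked numerically. The following sketch is an illustrative addition: it builds Xn = X + Un/n, so that Xn → X in probability, and verifies empirically that g(Xn) → g(X) in probability for the continuous map g = exp; the choice of g, the uniform noise Un, and the tolerance 0.01 are arbitrary assumptions made for the demonstration.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# One fixed draw of X, and X_n = X + U_n / n, so that X_n -> X in probability.
X = rng.normal(size=100_000)
g = np.exp  # any continuous function would do here

for n in [10, 100, 1000]:
    Xn = X + rng.uniform(-1, 1, size=X.size) / n
    # Empirical Pr(|g(X_n) - g(X)| > 0.01): should shrink toward 0 as n grows.
    print(n, np.mean(np.abs(g(Xn) - g(X)) > 0.01))
</syntaxhighlight>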

This theorem was first proved by Henry Mann and Abraham Wald in 1943,{{cite journal | doi = 10.1214/aoms/1177731415 | last1 = Mann |first1=H. B. | last2=Wald |first2=A. | year = 1943 | title = On Stochastic Limit and Order Relationships | journal = Annals of Mathematical Statistics | volume = 14 | issue = 3 | pages = 217–226 | jstor = 2235800 | doi-access = free }} and it is therefore sometimes called the Mann–Wald theorem.{{cite book | last = Amemiya | first = Takeshi | author-link = Takeshi Amemiya | year = 1985 | title = Advanced Econometrics | publisher = Harvard University Press | location = Cambridge, MA | isbn = 0-674-00560-0 | url = https://books.google.com/books?id=0bzGQE14CwEC&pg=pA88 |page=88 }} Meanwhile, Denis Sargan refers to it as the general transformation theorem.{{cite book |first=Denis |last=Sargan |title=Lectures on Advanced Econometric Theory |location=Oxford |publisher=Basil Blackwell |year=1988 |isbn=0-631-14956-2 |pages=4–8 }}

Statement

Let {Xn}, X be random elements defined on a metric space S. Suppose a function {{nowrap|g: SS′}} (where S′ is another metric space) has the set of discontinuity points Dg such that {{nowrap|1=Pr[X ∈ Dg] = 0}}. Then{{cite book | last = Billingsley | first = Patrick | author-link = Patrick Billingsley | title = Convergence of Probability Measures | year = 1969 | publisher = John Wiley & Sons | isbn = 0-471-07242-7|page=31 (Corollary 1) }}{{cite book | last = van der Vaart | first = A. W. | title = Asymptotic Statistics | year = 1998 | publisher = Cambridge University Press | location = New York | isbn = 0-521-49603-9 | url =https://books.google.com/books?id=UEuQEM5RjWgC&pg=PA7 |page=7 (Theorem 2.3) }}

: \begin{align}
X_n \ \xrightarrow{\text{d}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{d}}\ g(X); \\[6pt]
X_n \ \xrightarrow{\text{p}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{p}}\ g(X); \\[6pt]
X_n \ \xrightarrow{\text{a.s.}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{a.s.}}\ g(X).
\end{align}

where the superscripts "d", "p", and "a.s." denote convergence in distribution, convergence in probability, and almost sure convergence, respectively.
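
As a concrete use of the distributional case: if Zn →d N(0, 1), for instance via the central limit theorem, then Zn² →d χ² with one degree of freedom, because the map x ↦ x² is continuous everywhere. The following simulation sketch illustrates this; the uniform summands, the sample sizes, and the Kolmogorov–Smirnov check are illustrative choices, not part of the theorem.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n = 10_000, 500

# Central limit theorem: Z_n = sqrt(n) (mean(U) - 1/2) / sigma  ->d  N(0, 1),
# where sigma^2 = 1/12 is the variance of Uniform(0, 1).
U = rng.uniform(size=(reps, n))
Zn = np.sqrt(n) * (U.mean(axis=1) - 0.5) / np.sqrt(1.0 / 12.0)

# Continuous mapping with g(x) = x^2: Z_n^2  ->d  chi-square(1).
# A large Kolmogorov-Smirnov p-value is consistent with the claimed limit.
print(stats.kstest(Zn**2, "chi2", args=(1,)))
</syntaxhighlight>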

Proof

This proof has been adapted from {{harv|van der Vaart|1998|loc=Theorem 2.3}}.

Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x − y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

=Convergence in distribution=

We will need a particular statement from the portmanteau theorem: that convergence in distribution X_n\xrightarrow{d}X is equivalent to

: \mathbb E f(X_n) \to \mathbb E f(X) for every bounded continuous functional f.

So it suffices to prove that \mathbb E f(g(X_n)) \to \mathbb E f(g(X)) for every bounded continuous functional f. For simplicity, assume that g is continuous; then F = f \circ g is itself a bounded continuous functional, and the claim follows from the statement above. The general case, in which g may be discontinuous on a set Dg with Pr[X ∈ Dg] = 0, is slightly more technical.
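
This step can also be checked empirically. The sketch below is an illustrative example (the choices g(x) = x², f = arctan, and the perturbation Zn/√n are assumptions made for the demonstration): it estimates E f(g(Xn)) and E f(g(X)) by Monte Carlo and observes that the gap closes as n grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

g = np.square   # the (continuous) map of the theorem
f = np.arctan   # a bounded continuous test function


def F(x):
    # F = f ∘ g is itself bounded and continuous.
    return f(g(x))


X = rng.normal(size=200_000)
for n in [10, 100, 1000]:
    Xn = X + rng.normal(size=X.size) / np.sqrt(n)  # X_n -> X
    # Monte Carlo estimates of E f(g(X_n)) and E f(g(X)) should converge.
    print(n, abs(F(Xn).mean() - F(X).mean()))
</syntaxhighlight>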

=Convergence in probability=

Fix an arbitrary ε > 0. Then for any δ > 0 consider the set Bδ defined as

: B_\delta = \big\{x\in S \mid x\notin D_g:\ \exists y\in S:\ |x-y|<\delta,\, |g(x)-g(y)|>\varepsilon\big\}.

This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point that maps outside the ε-neighborhood of g(x). By the definition of continuity, this set shrinks as δ goes to zero, so that limδ → 0 Bδ = ∅.

Now suppose that |g(X) − g(Xn)| > ε. This implies that at least one of the following is true: either |X − Xn| ≥ δ, or X ∈ Dg, or X ∈ Bδ. In terms of probabilities this can be written as

: \Pr\big(\big|g(X_n)-g(X)\big|>\varepsilon\big) \leq \Pr\big(|X_n-X|\geq\delta\big) + \Pr(X\in B_\delta) + \Pr(X\in D_g).

On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to the empty set. And the last term is identically equal to zero by the assumption of the theorem. Therefore, the conclusion is that

: \lim_{n\to\infty}\Pr \big(\big|g(X_n)-g(X)\big|>\varepsilon\big) = 0,

which means that g(Xn) converges to g(X) in probability.
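
The conclusion can be illustrated even for a g that is discontinuous on a set of probability zero. In the sketch below (an illustrative example; the normal distribution, the noise scale, and the threshold 0.5 are arbitrary choices), g = sign is discontinuous only at 0, and Pr(X = 0) = 0 for a normal X, so the empirical probability Pr(|g(Xn) − g(X)| > ε) still vanishes.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)

# sign(.) is discontinuous only at 0, and Pr(X = 0) = 0 for continuous X,
# so the theorem applies despite the discontinuity.
g = np.sign

X = rng.normal(size=500_000)
for n in [10, 100, 1000, 10_000]:
    Xn = X + rng.normal(size=X.size) / n  # X_n -> X in probability
    # The sign flips only when X lies within roughly 1/n of the
    # discontinuity at 0, an event of vanishing probability.
    print(n, np.mean(np.abs(g(Xn) - g(X)) > 0.5))
</syntaxhighlight>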

=Almost sure convergence=

By definition of the continuity of the function g(·),

: \lim_{n\to\infty}X_n(\omega) = X(\omega) \quad\Rightarrow\quad \lim_{n\to\infty}g(X_n(\omega)) = g(X(\omega))

at each point X(ω) where g(·) is continuous. Therefore,

: \begin{align}
\Pr\left(\lim_{n\to\infty}g(X_n) = g(X)\right)
&\geq \Pr\left(\lim_{n\to\infty}g(X_n) = g(X),\ X\notin D_g\right) \\
&\geq \Pr\left(\lim_{n\to\infty}X_n = X,\ X\notin D_g\right) = 1,
\end{align}

because the intersection of two almost sure events is almost sure.

By definition, we conclude that g(Xn) converges to g(X) almost surely.
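
The almost-sure case can be observed path by path. In the sketch below (an illustrative example; the use of the strong law of large numbers and the map g = exp are assumptions made for the demonstration), the running mean Mn of i.i.d. Uniform(0, 1) draws converges to 1/2 almost surely, so exp(Mn) converges to exp(1/2) along almost every sample path.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)

# Strong law of large numbers: the running mean M_n -> 1/2 almost surely,
# hence by the continuous mapping theorem exp(M_n) -> exp(1/2) pathwise.
paths, n_max = 5, 100_000
U = rng.uniform(size=(paths, n_max))
M = U.cumsum(axis=1) / np.arange(1, n_max + 1)  # one running mean per path

for n in [100, 10_000, 100_000]:
    # Worst-case deviation across the simulated paths at time n.
    print(n, np.abs(np.exp(M[:, n - 1]) - np.exp(0.5)).max())
</syntaxhighlight>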

References