Sanov's theorem

{{Short description|Mathematical theorem}}


In mathematics and information theory, '''Sanov's theorem''' gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector x^n = (x_1, x_2, \ldots, x_n). Then, we have the following bound on the probability that the empirical measure \hat{p}_{x^n} of the samples falls within the set A:

:q^n(\hat{p}_{x^n}\in A) \le (n+1)^{|X|} 2^{-nD_{\mathrm{KL}}(p^*\|q)},

where p^* = \arg\min_{p \in A} D_{\mathrm{KL}}(p\|q) is the information projection of q onto A, and

:D_{\mathrm{KL}}(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}

is the Kullback–Leibler divergence.

In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.
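For concreteness, the bound can be checked numerically. The following sketch is illustrative only: the binary alphabet, the fair-coin distribution q, the threshold defining A = \{p : p(1) \ge 0.6\}, and the sample size n are assumptions made for the example, not part of the theorem.

<syntaxhighlight lang="python">
import math

def kl_div(p, q):
    """Kullback-Leibler divergence D(p||q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative setup: X = {0, 1}, q a fair coin, A = {p : p(1) >= 0.6}.
# The information projection of q onto A is p* = (0.4, 0.6),
# the distribution in A closest to q in KL divergence.
n = 1000
q = (0.5, 0.5)
p_star = (0.4, 0.6)
bound = (n + 1) ** 2 * 2 ** (-n * kl_div(p_star, q))  # (n+1)^{|X|} 2^{-n D(p*||q)}

# Exact value of q^n(p_hat in A): the probability that at least 60%
# of the n fair-coin flips come up 1, from the binomial distribution.
exact = sum(math.comb(n, k) for k in range(math.ceil(0.6 * n), n + 1)) / 2 ** n

print(f"exact q^n(p_hat in A) = {exact:.3e}")
print(f"Sanov upper bound     = {bound:.3e}")
</syntaxhighlight>

Because of the polynomial prefactor (n+1)^{|X|}, the bound is loose for moderate n, but it captures the correct exponential decay rate.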

Furthermore, if A is the closure of its interior, then

:\lim_{n\to\infty}\frac{1}{n}\log q^n(\hat{p}_{x^n}\in A) = -D_{\mathrm{KL}}(p^*\|q).
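Continuing the same illustrative coin example (again an assumption of the sketch, not part of the statement), this limit can be observed directly: A = \{p : p(1) \ge 0.6\} is the closure of its interior, and the normalized log-probability approaches -D_{\mathrm{KL}}(p^*\|q).

<syntaxhighlight lang="python">
import math

def kl_div(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Limit predicted by the theorem: -D(p*||q), with p* = (0.4, 0.6) and q fair.
target = -kl_div((0.4, 0.6), (0.5, 0.5))

for n in (100, 1000, 10000):
    # Exact q^n(p_hat in A) from the binomial distribution.
    prob = sum(math.comb(n, k) for k in range(math.ceil(0.6 * n), n + 1)) / 2 ** n
    print(f"n = {n:5d}: (1/n) log2 q^n(p_hat in A) = {math.log2(prob) / n:+.5f}"
          f"  (limit {target:+.5f})")
</syntaxhighlight>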

Technical statement

Define:

  • \Sigma is a finite set with |\Sigma| \geq 2, understood as the alphabet.
  • \Delta(\Sigma) is the simplex of probability distributions on \Sigma; it is a subset of \R^\Sigma.
  • L_n is a random variable taking values in \Delta(\Sigma): given n i.i.d. samples drawn from the distribution \mu, L_n is the vector of relative frequencies of the letters of \Sigma in the sample (the empirical measure).
  • \mathcal L_n is the set of values that L_n can take. In other words, it is

:\{(a_1/n, \dots, a_{|\Sigma|}/n) : \textstyle\sum_i a_i = n,\ a_i \in \N\}.

Then, Sanov's theorem states:{{Cite journal |last1=Dembo |first1=Amir |last2=Zeitouni |first2=Ofer |date=2010 |title=Large Deviations Techniques and Applications |url=https://link.springer.com/book/10.1007/978-3-642-03311-7 |journal=Stochastic Modelling and Applied Probability |volume=38 |language=en |pages=16–17 |doi=10.1007/978-3-642-03311-7 |isbn=978-3-642-03310-0 |issn=0172-4568|url-access=subscription }}

  • For every measurable subset S \subseteq \Delta(\Sigma), -\inf_{\nu \in int(S)} D(\nu\|\mu) \leq \liminf_n \frac 1n \ln P_\mu(L_n \in S) \leq \limsup_n \frac 1n \ln P_\mu(L_n \in S) \leq -\inf_{\nu \in cl(S)} D(\nu\|\mu)
  • For every open subset U \subseteq \Delta(\Sigma), -\lim_n \inf_{\nu \in U \cap \mathcal L_n} D(\nu\|\mu) = \lim_n \frac 1n \ln P_\mu(L_n \in U) = -\inf_{\nu \in U} D(\nu\|\mu)

Here, int(S) means the interior, and cl(S) means the closure.
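Since \mathcal L_n is finite, the statement can be checked by brute-force enumeration of types. The sketch below is illustrative only: the three-letter alphabet, the distribution \mu, and the set S = \{\nu : \nu_1 \le 0.3\} are assumptions chosen for the example. It sums the exact multinomial probabilities of the types lying in S and compares \frac 1n \ln P_\mu(L_n \in S) with -\inf_{\nu \in S \cap \mathcal L_n} D(\nu\|\mu).

<syntaxhighlight lang="python">
import math

def kl_nats(p, q):
    """D(p||q) in nats, with the convention 0 * log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

mu = (0.5, 0.3, 0.2)          # sampling distribution on a 3-letter alphabet

def in_S(nu):                 # S = {nu : nu_1 <= 0.3}, a closed convex set
    return nu[0] <= 0.3

for n in (20, 100, 500):
    prob, inf_D = 0.0, math.inf
    # Enumerate the lattice L_n = {(a1/n, a2/n, a3/n) : a1 + a2 + a3 = n}.
    for a1 in range(n + 1):
        for a2 in range(n - a1 + 1):
            a3 = n - a1 - a2
            nu = (a1 / n, a2 / n, a3 / n)
            if not in_S(nu):
                continue
            # Smallest divergence among lattice types in S; as n grows this
            # approaches the infimum of D(nu||mu) over S.
            inf_D = min(inf_D, kl_nats(nu, mu))
            # Exact multinomial probability that L_n equals this type.
            log_p = (math.lgamma(n + 1)
                     - sum(math.lgamma(a + 1) for a in (a1, a2, a3))
                     + sum(a * math.log(m) for a, m in zip((a1, a2, a3), mu)))
            prob += math.exp(log_p)
    print(f"n = {n:4d}: (1/n) ln P(L_n in S) = {math.log(prob) / n:+.5f},"
          f"  -inf D(nu||mu) = {-inf_D:+.5f}")
</syntaxhighlight>

As n grows, the gap between the two printed quantities shrinks at rate O((log n)/n), reflecting the polynomially many types at each n.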

References

{{Reflist}}

  • {{Cite book |last1=Cover |first1=Thomas M. |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |url=https://archive.org/details/elementsinformat00cove_577 |url-access=limited |publisher=Wiley Interscience |edition=2 |date=2006 |location=Hoboken, New Jersey |pages=[https://archive.org/details/elementsinformat00cove_577/page/n387 362] |isbn=9780471241959}}

  • Sanov, I. N. (1957) "On the probability of large deviations of random variables". Mat. Sbornik 42(84), No. 1, 11–44.
  • Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". ''Математический сборник'' 42(84), No. 1, 11–44. (Russian original of the preceding entry.)

Category:Information theory

Category:Probabilistic inequalities

{{probability-stub}}