Error exponents in hypothesis testing

In statistical hypothesis testing, the error exponent of a hypothesis testing procedure is the rate at which the probabilities of Type I and Type II errors decay exponentially with the size of the sample used in the test. For example, if the error probability P_{\mathrm{error}} of a test decays as e^{-n \beta}, where n is the sample size, the error exponent is \beta.

Formally, the error exponent of a test is defined as the limit, as the sample size grows, of the negative logarithm of the error probability divided by the sample size: \lim_{n \to \infty}\frac{-\ln P_\text{error}}{n}. Error exponents for different hypothesis tests are computed using Sanov's theorem and other results from large deviations theory.
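As a numerical illustration of this definition, the sketch below assumes a simple test (not from the text above) that declares a coin biased when at least a fraction a of n tosses land heads. If the coin is in fact fair (p = 0.5), the error probability is the binomial upper tail P(S_n \ge a n), and large deviations theory (Cramér's theorem, or equivalently Sanov's theorem on a two-letter alphabet) predicts the exponent D(a \parallel p), the Kullback–Leibler divergence between Bernoulli(a) and Bernoulli(p). The function names and the values p = 0.5, a = 0.7 are illustrative assumptions.

```python
import math


def log_binomial_upper_tail(n: int, p: float, a: float) -> float:
    """Natural log of P(S_n >= a*n) for S_n ~ Binomial(n, p), assuming p < a < 1."""
    k_min = math.ceil(a * n)
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(k_min, n + 1)
    ]
    # log-sum-exp keeps the sum stable when the tail probability is tiny
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def binary_kl(a: float, p: float) -> float:
    """KL divergence D(Bernoulli(a) || Bernoulli(p)) in nats."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def empirical_exponent(n: int, p: float, a: float) -> float:
    """Finite-n ratio -ln P_error / n from the definition of the error exponent."""
    return -log_binomial_upper_tail(n, p, a) / n


if __name__ == "__main__":
    p, a = 0.5, 0.7  # hypothetical fair coin and 70% decision threshold
    for n in (10, 100, 1000, 4000):
        print(n, round(empirical_exponent(n, p, a), 4))
    print("predicted limit D(a||p):", round(binary_kl(a, p), 4))
```

The finite-n ratio approaches the predicted exponent from above, with a gap that shrinks roughly like (\ln n)/n because of the polynomial prefactor in front of e^{-n D(a \parallel p)}.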

Error exponents in binary hypothesis testing

Consider a binary hypothesis testing problem in which observations are modeled as independent and identically distributed random variables under each hypothesis. Let Y_1, Y_2, \ldots, Y_n denote the observations. Let f_0 denote the probability density function of each observation Y_i under the null hypothesis H_0 and let f_1 denote the probability density function of each observation Y_i under the alternative hypothesis H_1.

In this case there are two possible error events. A type 1 error, also called a false positive, occurs when the null hypothesis is true but is rejected. A type 2 error, also called a false negative, occurs when the alternative hypothesis is true but the null hypothesis is not rejected. The probability of a type 1 error is denoted P (\mathrm{error}\mid H_0) and the probability of a type 2 error is denoted P (\mathrm{error}\mid H_1).
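To make the two error events concrete, the following sketch assumes a Gaussian shift model that is not part of the text above: f_0 = N(0, 1), f_1 = N(\mu, 1), and a test that rejects H_0 when the sample mean exceeds a threshold t. Under these assumptions both error probabilities have closed forms in the standard normal CDF \Phi. The helper names are hypothetical.

```python
import math
from typing import Tuple


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), written with erfc so the lower tail stays accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def error_probabilities(n: int, mu: float, threshold: float) -> Tuple[float, float]:
    """Return (P(error | H0), P(error | H1)) for the test that rejects H0
    when the sample mean of n observations exceeds `threshold`.

    Assumed (illustrative) model: f_0 = N(0, 1) and f_1 = N(mu, 1), so the
    sample mean is N(0, 1/n) under H0 and N(mu, 1/n) under H1.
    """
    root_n = math.sqrt(n)
    # Type 1: H0 true but the sample mean lands above the threshold (false positive).
    type1 = std_normal_cdf(-threshold * root_n)
    # Type 2: H1 true but the sample mean stays at or below the threshold (false negative).
    type2 = std_normal_cdf((threshold - mu) * root_n)
    return type1, type2


if __name__ == "__main__":
    # Raising the threshold trades fewer false positives for more false negatives.
    for t in (0.3, 0.5, 0.7):
        print(t, error_probabilities(25, 1.0, t))
```

With t = \mu/2 the two error probabilities coincide; moving t in either direction lowers one at the expense of the other, which is the trade-off the Neyman–Pearson and Bayesian formulations resolve in different ways.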

=Optimal error exponent for Neyman–Pearson testing=

In the Neyman–Pearson{{citation | last1 = Neyman | first1 = J. | authorlink1 = Jerzy Neyman| last2 = Pearson | first2 = E. S. | authorlink2 = Egon Pearson| doi = 10.1098/rsta.1933.0009 | title = On the problem of the most efficient tests of statistical hypotheses | journal = Philosophical Transactions of the Royal Society of London A | volume = 231 | issue = 694–706 | pages = 289–337 | year = 1933 | jstor = 91247 |bibcode = 1933RSPTA.231..289N | url = http://www.stats.org.uk/statistical-inference/NeymanPearson1933.pdf | doi-access = free }} version of binary hypothesis testing, one is interested in minimizing the probability of type 2 error P (\text{error}\mid H_1) subject to the constraint that the probability of type 1 error P (\text{error}\mid H_0) is less than or equal to a pre-specified level \alpha. By the Neyman–Pearson lemma, the optimal testing procedure in this setting is a likelihood-ratio test.{{cite book|title=Testing Statistical Hypotheses|edition=3|isbn=978-0-387-98864-1|last1=Lehmann|first1=E. L.|authorlink1 = Erich Leo Lehmann|first2=Joseph P.|last2=Romano|year=2005|publisher=Springer|location=New York}} Furthermore, for any fixed \alpha \in (0,1), the type 2 error probability of the optimal test decays exponentially in the sample size n according to \lim_{n \to \infty} \frac{- \ln P (\mathrm{error}\mid H_1)}{n} = D(f_0\parallel f_1).{{cite book|title=Elements of Information Theory|edition=2|last1=Cover|first1=Thomas M.|authorlink1=Thomas M. Cover|first2=Joy A.|last2=Thomas|year=2006|publisher=Wiley-Interscience|location=New York}} The error exponent D(f_0\parallel f_1) = \int f_0(x) \ln \frac{f_0(x)}{f_1(x)} \, dx is the Kullback–Leibler divergence between the distributions of the observations under the two hypotheses; notably, it does not depend on \alpha. This result is known as the Chernoff–Stein lemma, and the exponent is also referred to as the Chernoff–Stein lemma exponent.
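The Chernoff–Stein limit can be checked numerically in a case where everything is available in closed form. The sketch below assumes (for illustration only) f_0 = N(0, 1) and f_1 = N(\mu, 1) with \mu > 0, for which D(f_0 \parallel f_1) = \mu^2/2. The likelihood ratio is increasing in the sample mean, so the level-\alpha Neyman–Pearson test rejects H_0 when the sample mean exceeds z_{1-\alpha}/\sqrt{n}, where z_{1-\alpha} is the standard normal quantile; its type 2 error probability is then \Phi(z_{1-\alpha} - \mu\sqrt{n}). Function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def std_normal_cdf(x: float) -> float:
    """Phi(x) via erfc, accurate far into the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def np_type2_exponent(n: int, mu: float, alpha: float) -> float:
    """-ln P(error | H1) / n for the level-alpha Neyman-Pearson test.

    Assumed model: f_0 = N(0, 1), f_1 = N(mu, 1) with mu > 0. The test rejects
    H0 when the sample mean exceeds z_{1-alpha} / sqrt(n), so P(error | H0) = alpha.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    type2 = std_normal_cdf(z - mu * math.sqrt(n))
    return -math.log(type2) / n


def gaussian_kl(mu: float) -> float:
    """D(N(0,1) || N(mu,1)) = mu^2 / 2 nats."""
    return mu * mu / 2.0


if __name__ == "__main__":
    mu, alpha = 1.0, 0.05  # hypothetical shift and significance level
    for n in (10, 100, 1000):
        print(n, round(np_type2_exponent(n, mu, alpha), 3))
    print("Chernoff-Stein limit:", gaussian_kl(mu))
```

In this Gaussian example the finite-n ratio approaches \mu^2/2 from below, with a gap that shrinks roughly like \mu z_{1-\alpha}/\sqrt{n}, so convergence is slow. With \mu = 1 and n much beyond 1000, the type 2 probability underflows double precision and a log-domain tail approximation would be needed.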

=Optimal error exponent for average error probability in Bayesian hypothesis testing=

In the Bayesian version of binary hypothesis testing, one is interested in minimizing the average error probability under both hypotheses, assuming a prior probability of occurrence for each hypothesis. Let \pi_0 denote the prior probability of hypothesis H_0. In this case the average error probability is given by P_\text{ave} = \pi_0 P (\text{error}\mid H_0) + (1-\pi_0)P (\text{error}\mid H_1). In this setting a likelihood-ratio test (the maximum a posteriori rule) is again optimal, and for any fixed prior 0 < \pi_0 < 1 the optimal average error probability decays as \lim_{n \to \infty} \frac{- \ln P_\text{ave} }{n} = C(f_0,f_1), where C(f_0,f_1) is the Chernoff information between the two distributions, defined as C(f_0,f_1) = \max_{\lambda \in [0,1]} \left[-\ln \int (f_0(x))^\lambda (f_1(x))^{(1-\lambda)} \, dx \right]. This exponent does not depend on \pi_0.
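For distributions on a finite alphabet the integral becomes a sum, and the maximization over \lambda is one-dimensional. The objective -\ln \sum_x f_0(x)^\lambda f_1(x)^{1-\lambda} is concave in \lambda (its negative is a log-sum-exp of functions affine in \lambda), so the sketch below uses a plain ternary search on [0, 1]; this search method is a choice made here, not something the text prescribes. It also computes the exact average error of the maximum a posteriori test on n observations from a two-letter alphabet, to show -\ln P_\text{ave}/n approaching C(f_0, f_1). All function names and the example distributions are hypothetical.

```python
import math
from typing import Sequence


def chernoff_information(f0: Sequence[float], f1: Sequence[float], iters: int = 200) -> float:
    """C(f0, f1) = max over lam in [0, 1] of -ln sum_x f0(x)^lam * f1(x)^(1 - lam)."""

    def objective(lam: float) -> float:
        return -math.log(sum(p ** lam * q ** (1.0 - lam) for p, q in zip(f0, f1)))

    lo, hi = 0.0, 1.0
    for _ in range(iters):  # ternary search; valid because the objective is concave
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)


def bayes_error_exponent(n: int, f0: Sequence[float], f1: Sequence[float], pi0: float = 0.5) -> float:
    """-ln P_ave / n for the MAP test on n i.i.d. draws from a two-letter alphabet.

    With k = number of occurrences of symbol 0 (a sufficient statistic), the MAP
    rule errs with probability min(pi0 * P(k | H0), (1 - pi0) * P(k | H1)) at each k.
    Assumes all entries of f0 and f1 are strictly positive.
    """
    log_terms = []
    for k in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        l0 = math.log(pi0) + log_comb + k * math.log(f0[0]) + (n - k) * math.log(f0[1])
        l1 = math.log(1 - pi0) + log_comb + k * math.log(f1[0]) + (n - k) * math.log(f1[1])
        log_terms.append(min(l0, l1))
    m = max(log_terms)  # log-sum-exp for numerical stability
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms))) / n


if __name__ == "__main__":
    f0, f1 = (0.9, 0.1), (0.1, 0.9)  # hypothetical example distributions
    print("C(f0, f1):", round(chernoff_information(f0, f1), 4))
    for n in (10, 100, 1000):
        print(n, round(bayes_error_exponent(n, f0, f1), 4))
```

For this symmetric pair the maximizing \lambda is 1/2 and C(f_0, f_1) = -\ln(2\sqrt{0.09}) = -\ln 0.6 \approx 0.511; the exact Bayesian exponent approaches this value from above as n grows.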

References