Poisson binomial distribution

{{Short description|Probability distribution}}

{{Probability distribution

| name = Poisson binomial

| type = mass

| parameters = \mathbf{p}\in [0,1]^n — success probabilities for each of the n trials

| support = k \in \{0, 1, \dots, n\}

| pdf = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)

| cdf = \sum\limits_{l=0}^k \sum\limits_{A\in F_l} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)

| mean = \sum\limits_{i=1}^n p_i

| median =

| mode =

| variance = \sigma^2 =\sum\limits_{i=1}^n (1 - p_i)p_i

| skewness = \frac{1}{\sigma^3}\sum\limits_{i=1}^n ( 1-2p_i ) ( 1-p_i ) p_i

| kurtosis = \frac{1}{\sigma^4}\sum\limits_{i=1}^n ( 1 - 6(1 - p_i)p_i )( 1 - p_i )p_i

| entropy =

| mgf = \prod\limits_{j=1}^n (1-p_j+p_j e^t)

| cf = \prod\limits_{j=1}^n (1-p_j+p_j e^{it})

| pgf = \prod\limits_{j=1}^n (1-p_j+p_j z)

}}

In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.

In other words, it is the probability distribution of the number of successes in a collection of n independent yes/no experiments with success probabilities p_1, p_2, \dots , p_n. The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is p_1 = p_2 = \cdots = p_n.

Definitions

=Probability mass function=

The probability of having k successful trials out of a total of n can be written as the sum

{{Cite journal

| volume = 3

| issue = 2

| pages = 295–312

| last = Wang

| first = Y. H.

| title = On the number of successes in independent trials

| journal = Statistica Sinica

| year = 1993

| url = http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf

}}

:\Pr(K=k) = \sum\limits_{A\in F_k} \prod\limits_{i\in A} p_i \prod\limits_{j\in A^c} (1-p_j)

where F_k is the set of all subsets of k integers that can be selected from \{1,2,3,...,n\}. For example, if n = 3, then F_2=\left\{ \{1,2\},\{1,3\},\{2,3\} \right\}. A^c is the complement of A, i.e. A^c =\{1,2,3,\dots,n\}\smallsetminus A.

F_k will contain n!/((n-k)!k!) elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n = 30, F_{15} contains over 10^8 elements). However, there are other, more efficient ways to calculate \Pr(K=k).
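For small n the defining sum can nevertheless be evaluated directly. A minimal Python sketch using only the standard library (the function name pmf_bruteforce is ours):

<syntaxhighlight lang="python">
from itertools import combinations
from math import prod

def pmf_bruteforce(k, p):
    """Pr(K = k) by summing over every size-k subset A of trials (small n only)."""
    n = len(p)
    return sum(
        prod(p[i] for i in A) * prod(1 - p[j] for j in range(n) if j not in A)
        for A in combinations(range(n), k)
    )

# Three trials with success probabilities 0.1, 0.5 and 0.9:
print(pmf_bruteforce(2, [0.1, 0.5, 0.9]))  # 0.455
</syntaxhighlight>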

As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula

{{Cite journal

| volume = 27

| issue = 3

| pages = 123–124

| last = Shah

| first = B. K.

| title = On the distribution of the sum of independent integer valued random variables

| journal = American Statistician

| year = 1994

| jstor=2683639

}}

{{Cite journal

| volume = 81

| issue = 3

| pages = 457

| last = Chen

| first = X. H.

|author2=A. P. Dempster |author3=J. S. Liu

| title = Weighted finite population sampling to maximize entropy

| journal = Biometrika

| year = 1994

| url = http://www.people.fas.harvard.edu/~junliu/TechRept/94folder/cdl94.pdf

| doi=10.1093/biomet/81.3.457

}}

:\Pr (K=k)= \begin{cases}

\prod\limits_{i=1}^n (1-p_i) & k=0 \\

\frac{1}{k} \sum\limits_{i=1}^k (-1)^{i-1}\Pr (K=k-i)T(i) & k>0 \\

\end{cases}

where

: T(i)=\sum\limits_{j=1}^n \left( \frac{p_j}{1-p_j} \right)^i.

The recursive formula is not numerically stable and should be avoided if n is greater than approximately 20.
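For illustration, a direct Python transcription of the recursion, assuming no p_i equals one (the function name is ours):

<syntaxhighlight lang="python">
def pmf_recursive(p):
    """Return [Pr(K=0), ..., Pr(K=n)]; numerically unstable for large n."""
    n = len(p)
    # T(i) = sum_j (p_j / (1 - p_j))^i, for i = 1..n (index 0 is unused)
    T = [None] + [sum((pj / (1 - pj)) ** i for pj in p) for i in range(1, n + 1)]
    pr = [0.0] * (n + 1)
    pr[0] = 1.0
    for pj in p:
        pr[0] *= 1.0 - pj                      # Pr(K=0) = prod (1 - p_i)
    for k in range(1, n + 1):
        pr[k] = sum((-1) ** (i - 1) * pr[k - i] * T[i] for i in range(1, k + 1)) / k
    return pr

print(pmf_recursive([0.1, 0.5, 0.9]))  # ≈ [0.045, 0.455, 0.455, 0.045]
</syntaxhighlight>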

An alternative is to use a divide-and-conquer algorithm: assuming n = 2^b is a power of two, and denoting by f(p_{i:j}) the probability mass function of the Poisson binomial distribution of p_i, \dots, p_j and by * the convolution operator, we have f(p_{1:2^b}) = f(p_{1:2^{b-1}})*f(p_{2^{b-1}+1:2^b}).
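A recursive NumPy sketch of this splitting (with np.convolve supplying the convolution, n need not actually be a power of two; the function name is ours):

<syntaxhighlight lang="python">
import numpy as np

def pmf_split(p):
    """PMF of the Poisson binomial of p by divide and conquer."""
    if len(p) == 1:
        return np.array([1.0 - p[0], p[0]])  # PMF of a single Bernoulli trial
    mid = len(p) // 2
    return np.convolve(pmf_split(p[:mid]), pmf_split(p[mid:]))

print(pmf_split([0.1, 0.5, 0.9]))  # ≈ [0.045, 0.455, 0.455, 0.045]
</syntaxhighlight>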

More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors P_1, \dots, P_n where P_i=[1-p_i,p_i]. This observation leads to the direct convolution (DC) algorithm for computing \Pr (K=0) through \Pr (K=n) :

 // PMF and nextPMF begin at index 0
 function DC(p_1, \dots, p_n) is
     declare new PMF array of size 1
     PMF[0] = 1
     for i = 1 to n do
         declare new nextPMF array of size i + 1
         nextPMF[0] = (1 - p_i) * PMF[0]
         nextPMF[i] = p_i * PMF[i - 1]
         for k = 1 to i - 1 do
             nextPMF[k] = p_i * PMF[k - 1] + (1 - p_i) * PMF[k]
         repeat
         PMF = nextPMF
     repeat
     return PMF
 end function

\Pr (K=k) will be found in PMF[k]. DC is numerically stable, exact, and, when implemented as a software routine, exceptionally fast for n \leq 2000 . It can also be quite fast for larger n , depending on the distribution of the p_i .
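For illustration, the DC pseudocode transcribes to Python as follows (a sketch; the function name is ours):

<syntaxhighlight lang="python">
def pmf_dc(p):
    """Return [Pr(K=0), ..., Pr(K=n)] by direct convolution."""
    pmf = [1.0]
    for p_i in p:
        next_pmf = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            next_pmf[k] += (1.0 - p_i) * mass   # trial i fails
            next_pmf[k + 1] += p_i * mass       # trial i succeeds
        pmf = next_pmf
    return pmf

print(pmf_dc([0.1, 0.5, 0.9]))  # ≈ [0.045, 0.455, 0.455, 0.045]
</syntaxhighlight>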

Another possibility is using the discrete Fourier transform.

{{Cite journal

| volume = 46

| issue = 2

| pages = 803–817

| last = Fernandez

| first = M.

|author2=S. Williams

| title = Closed-Form Expression for the Poisson-Binomial Probability Density Function

| journal = IEEE Transactions on Aerospace and Electronic Systems

| year = 2010

| doi = 10.1109/TAES.2010.5461658

| bibcode = 2010ITAES..46..803F

| s2cid = 1456258

}}

:\Pr (K=k)=\frac{1}{n+1} \sum_{\ell=0}^n C^{-\ell k} \prod_{m=1}^n \left( 1+(C^\ell-1) p_m \right)

where C=\exp \left( \frac{2i\pi }{n+1} \right) and i=\sqrt{-1}.
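Because the sum over \ell is itself a discrete Fourier transform, the expression can be evaluated with an FFT. A NumPy sketch (the function name is ours):

<syntaxhighlight lang="python">
import numpy as np

def pmf_dft(p):
    """Return [Pr(K=0), ..., Pr(K=n)] from the closed-form DFT expression."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    C = np.exp(2j * np.pi / (n + 1))
    # Samples x_l = prod_m (1 + (C^l - 1) p_m) of the generating function:
    x = np.array([np.prod(1 + (C ** l - 1) * p) for l in range(n + 1)])
    # Pr(K=k) = (1/(n+1)) * sum_l C^{-lk} x_l, i.e. a forward DFT of x:
    return np.fft.fft(x).real / (n + 1)

print(pmf_dft([0.1, 0.5, 0.9]))  # ≈ [0.045, 0.455, 0.455, 0.045]
</syntaxhighlight>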

Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu{{Cite journal

| volume = 7

| pages = 875–892

| last = Chen

| first = S. X.

|author2=J. S. Liu

| title = Statistical Applications of the Poisson–Binomial and conditional Bernoulli distributions

| journal = Statistica Sinica

| year = 1997

| url = http://www3.stat.sinica.edu.tw/statistica/password.asp?vol=7&num=4&art=4

}} and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.{{Cite journal |last1=Biscarri |first1=William |last2=Zhao |first2=Sihai Dave |last3=Brunner |first3=Robert J. |date=2018-06-01 |title=A simple and fast method for computing the Poisson binomial distribution function |url=https://www.sciencedirect.com/science/article/pii/S0167947318300082 |journal=Computational Statistics & Data Analysis |volume=122 |pages=92–100 |doi=10.1016/j.csda.2018.01.007 |issn=0167-9473}}

= Cumulative distribution function =

The cumulative distribution function (CDF) can be expressed as:

: \Pr(K \leq k) = \sum^{k}_{\ell=0} \sum_{A\in F_\ell} \prod_{i\in A} p_i \prod_{j\in A^c} (1-p_j),

where F_\ell is the set of all subsets of size \ell that can be selected from \{1,2,3,\ldots,n\}.

It can be computed by invoking the DC function above, and then adding elements 0 through k of the returned PMF array.
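For example, in Python (a sketch; the reduce/np.convolve idiom computes the same PMF as DC):

<syntaxhighlight lang="python">
from functools import reduce
import numpy as np

def cdf_poisson_binomial(p):
    """Return [Pr(K<=0), ..., Pr(K<=n)]."""
    pmf = reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p))
    return np.cumsum(pmf)

print(cdf_poisson_binomial([0.1, 0.5, 0.9]))  # ≈ [0.045, 0.5, 0.955, 1.0]
</syntaxhighlight>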

Properties

=Mean and variance=

Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions:

: \mu = \sum\limits_{i=1}^n p_i

: \sigma^2 =\sum\limits_{i=1}^n (1-p_i) p_i
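These formulas are easy to verify against an exact PMF, for instance (a sketch reusing the convolution idiom from above):

<syntaxhighlight lang="python">
from functools import reduce
import numpy as np

p = np.array([0.1, 0.5, 0.9])
pmf = reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p))
k = np.arange(len(pmf))

mean_pmf = (k * pmf).sum()                       # E[K] from the PMF
var_pmf = ((k - mean_pmf) ** 2 * pmf).sum()      # Var[K] from the PMF
assert np.isclose(mean_pmf, p.sum())             # mu = sum of the p_i
assert np.isclose(var_pmf, ((1 - p) * p).sum())  # sigma^2 = sum of p_i(1 - p_i)
</syntaxhighlight>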

=Entropy=

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number of trials and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.{{Cite journal

| volume = 47 | issue = 5

| pages = 2039–2041

| last = Harremoës

| first = P.

| title = Binomial and Poisson distributions as maximum entropy distributions

| journal = IEEE Transactions on Information Theory

| year = 2001

| url = https://web.archive.org/web/20160116154755id_/http://yaroslavvb.com/papers/harremoes-binomial.pdf

| doi=10.1109/18.930936

}}
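This chain of bounds can be observed numerically, for example (a sketch assuming SciPy is available; the Poisson PMF is truncated far into its tail):

<syntaxhighlight lang="python">
from functools import reduce
import numpy as np
from scipy.stats import binom, poisson

def entropy(pmf):
    pmf = pmf[pmf > 0]
    return -(pmf * np.log(pmf)).sum()  # Shannon entropy in nats

p = np.array([0.1, 0.5, 0.9])
n, mu = len(p), p.sum()
h_pb = entropy(reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p)))
h_binomial = binom.entropy(n, mu / n)                 # same n, same mean
h_poisson = entropy(poisson.pmf(np.arange(200), mu))  # same mean
assert h_pb <= h_binomial <= h_poisson
</syntaxhighlight>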

The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities p_1, p_2, \dots , p_n.{{Cite encyclopedia

|editor1-first = J.

|editor1-last = Gani

|editor2-first = V.K.

|editor2-last = Rohatgi

|author1-last = Shepp

|author1-first = Lawrence

|author2-last = Olkin

|author2-first = Ingram

|title = Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution

|encyclopedia = Contributions to probability: A collection of papers dedicated to Eugene Lukacs

|pages = 201–206

|publisher = Academic Press

|location = New York

|year = 1981

|mr = 0618689

|url = https://books.google.com/books?id=9qziBQAAQBAJ

|isbn = 0-12-274460-8

}} This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2015-03-05|title=A proof of the Shepp–Olkin entropy concavity conjecture |journal=Bernoulli|volume = 23|issue=4B|pages = 3638–3649 | arxiv=1503.01570 |doi=10.3150/16-BEJ860|s2cid=8358662}} The Shepp–Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in p_i, if all p_i \leq 1/2. This conjecture was also proved by Hillion and Johnson, in 2019.{{Cite journal |last1=Hillion|first1=Erwan|last2=Johnson|first2=Oliver|date=2019-11-09|title=A proof of the Shepp–Olkin entropy monotonicity conjecture |journal=Electronic Journal of Probability| volume = 24 |number=126 | pages = 1–14 |doi=10.1214/19-EJP380|doi-access=free|arxiv=1810.09791}}

=Chernoff bound=

The probability that a Poisson binomial distribution gets large can be bounded using its moment generating function as follows (valid when s \geq \mu and for any t>0):

:

\begin{align}

\Pr[S\ge s]

&\le \exp(-st)\operatorname E\left[\exp\left[t \sum_i X_i\right]\right]

\\&= \exp(-st)\prod_i (1-p_i+e^t p_i)

\\&= \exp\left(-st + \sum_i \log\left( p_i (e^t - 1) + 1\right)\right)

\\&\le \exp\left(-st + \sum_i \log\left( \exp(p_i(e^t-1))\right)\right)

\\&= \exp\left(-st + \sum_i p_i(e^t-1)\right)

\\&= \exp\left(s-\mu-s\log\frac{s}{\mu}\right),

\end{align}

where we took t=\log\left(s/\mu\right). This is similar to the tail bounds of a binomial distribution.
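A numerical spot check of the final bound (a sketch; the exact tail probability comes from the convolution idiom used earlier):

<syntaxhighlight lang="python">
from functools import reduce
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.0, 1.0, 50)
mu = p.sum()
pmf = reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p))

s = int(np.ceil(mu)) + 8                # any tail point with s >= mu
exact_tail = pmf[s:].sum()              # Pr[S >= s]
chernoff = np.exp(s - mu - s * np.log(s / mu))
assert exact_tail <= chernoff
</syntaxhighlight>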

Related distribution

= Approximation by binomial distribution =

A Poisson binomial distribution PB can be approximated by a binomial distribution B where \mu, the mean of the p_i, is the success probability of B. The variances of PB and B are related by the formula

: \operatorname{Var}(PB)=\operatorname{Var}(B)- \sum_{i=1}^n (p_i-\mu)^2

As can be seen, the closer the p_i are to \mu, that is, the more the p_i tend to homogeneity, the larger PB's variance. When all the p_i are equal to \mu, PB becomes B, \operatorname{Var}(PB)=\operatorname{Var}(B), and the variance is at its maximum.

Ehm has determined bounds for the total variation distance of PB and B, in effect providing bounds on the error introduced when approximating PB with B. Let \nu = 1 - \mu and d(PB,B) be the total variation distance of PB and B. Then

: d(PB,B)\le(1-\mu^{n+1}-\nu^{n+1}) \frac{\sum_{i=1}^n (p_i-\mu)^2}{(n+1)\mu\nu}

: d(PB,B)\ge C\min\left\{\,1,\frac{1}{n\mu\nu}\,\right\} \sum_{i=1}^n (p_i-\mu)^2

where C\ge\frac{1}{124}.

d(PB,B) tends to 0 if and only if \operatorname{Var}(PB)/\operatorname{Var}(B) tends to 1.{{Cite journal |last=Ehm |first=Werner |date=1991-01-01 |title=Binomial approximation to the Poisson binomial distribution |url=https://dx.doi.org/10.1016/0167-7152%2891%2990170-V |journal=Statistics & Probability Letters |volume=11 |issue=1 |pages=7–16 |doi=10.1016/0167-7152(91)90170-V |issn=0167-7152|url-access=subscription }}
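Ehm's upper bound is straightforward to evaluate alongside the exact total variation distance, for example (a sketch; here \mu is the mean of the p_i, as above):

<syntaxhighlight lang="python">
from functools import reduce
from math import comb
import numpy as np

p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
n, mu = len(p), p.mean()
nu = 1.0 - mu
pb = reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p))
b = np.array([comb(n, k) * mu**k * nu**(n - k) for k in range(n + 1)])

d_tv = 0.5 * np.abs(pb - b).sum()   # total variation distance d(PB, B)
upper = (1 - mu**(n + 1) - nu**(n + 1)) * ((p - mu)**2).sum() / ((n + 1) * mu * nu)
assert d_tv <= upper
</syntaxhighlight>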

= Approximation by Poisson distribution =

A Poisson binomial distribution PB can also be approximated by a Poisson distribution Po with mean \lambda=\sum_{i=1}^n p_i. Barbour and Hall have shown that

: \frac{1}{32}\min\left\{\,\frac{1}{\lambda},1\,\right\} \sum_{i=1}^n p_i^2\le d(PB, Po)\le\frac{1-e^{-\lambda}}{\lambda} \sum_{i=1}^n p_i^2

where d(PB,Po) is the total variation distance of PB and Po.{{Cite journal

| last1=Barbour | first1=A. D.

| last2=Hall | first2=Peter

| date=1984

| title=On the Rate of Poisson Convergence | url=https://www.zora.uzh.ch/id/eprint/23054/11/Barbour_1984V.pdf

| journal=Mathematical Proceedings of the Cambridge Philosophical Society

| volume=95

| issue=3

| pages=473–480

| doi=10.1017/S0305004100061806| doi-broken-date=24 December 2024

}} It can be seen that the smaller the p_i, the better Po approximates PB.

As \operatorname{Var}(Po)=\lambda=\sum_{i=1}^n p_i and \operatorname{Var}(PB)=\sum\limits_{i=1}^n p_i-\sum\limits_{i=1}^n p_i^2, we have \operatorname{Var}(Po)>\operatorname{Var}(PB); so a Poisson binomial distribution's variance is bounded above by the variance of a Poisson distribution with \lambda=\sum_{i=1}^n p_i, and the smaller the p_i, the closer \operatorname{Var}(Po) will be to \operatorname{Var}(PB).
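Both Barbour–Hall bounds can be checked numerically (a sketch assuming SciPy; the p_i are chosen small, where the Poisson approximation is good):

<syntaxhighlight lang="python">
from functools import reduce
import numpy as np
from scipy.stats import poisson

p = np.array([0.02, 0.05, 0.10, 0.08, 0.01])
lam = p.sum()
pb = reduce(np.convolve, ([1.0 - p_i, p_i] for p_i in p))
po = poisson.pmf(np.arange(len(pb)), lam)  # tail beyond n is negligible here

d_tv = 0.5 * np.abs(pb - po).sum()
lower = (1 / 32) * min(1 / lam, 1.0) * (p**2).sum()
upper = (1 - np.exp(-lam)) / lam * (p**2).sum()
assert lower <= d_tv <= upper
</syntaxhighlight>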

Computational methods

The reference

{{cite journal |last1=Hong |first1=Yili |title=On computing the distribution function for the Poisson binomial distribution |journal=Computational Statistics & Data Analysis |date=March 2013 |volume=59 |pages=41–51 | doi = 10.1016/j.csda.2012.10.006}}

discusses techniques for evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:

* An R package [https://CRAN.R-project.org/package=poibin poibin] was provided along with the paper; it computes the CDF, PMF, quantile function, and random numbers of the Poisson binomial distribution. For the PMF, either a DFT algorithm or a recursive algorithm can be selected to compute the exact values, and approximation methods using the normal and Poisson distributions are also available.
* [https://github.com/tsakim/poibin poibin – Python implementation] – can compute the PMF and CDF, using the DFT method described in the paper.

See also

{{Portal|Mathematics}}

References