Continuous Bernoulli distribution

{{short description|Probability distribution}}

{{Distinguish|Bernoulli distribution}}

{{Probability distribution

| name = Continuous Bernoulli distribution

| type = density

| pdf_image = File:CB pdf.png

| notation = \mathcal{CB}(\lambda)

| parameters = \lambda \in (0,1)

| support = x \in [0, 1]

| pdf = C(\lambda) \lambda^x (1-\lambda)^{1-x}\!
where C(\lambda) = \begin{cases} 2 &\text{if } \lambda = \frac{1}{2}\\ \frac{2 \tanh^{-1}(1-2\lambda)}{1-2\lambda} &\text{ otherwise} \end{cases}

| cdf = \begin{cases} x &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda^x (1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} &\text{ otherwise} \end{cases}\!

| mean = \operatorname{E}[X] = \begin{cases} \frac{1}{2} &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda}{2\lambda - 1} + \frac{1}{2 \tanh^{-1}(1-2\lambda)} &\text{ otherwise} \end{cases}\!

| variance = \operatorname{var}[X] = \begin{cases} \frac{1}{12} &\text{ if } \lambda = \frac{1}{2} \\ -\frac{(1-\lambda) \lambda}{(1-2\lambda)^2} + \frac{1}{(2 \tanh^{-1}(1-2\lambda))^2} &\text{ otherwise} \end{cases}\!

}}

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution<ref>Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).</ref><ref>PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli</ref><ref>TensorFlow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli {{Webarchive|url=https://web.archive.org/web/20201125001136/https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli |date=2020-11-25 }}</ref> is a family of continuous probability distributions parameterized by a single shape parameter \lambda \in (0, 1), defined on the unit interval x \in [0, 1], by:

: p(x | \lambda) \propto \lambda^x (1-\lambda)^{1-x}.
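The density can be evaluated numerically using the normalizing constant C(\lambda) given above; the following is a minimal sketch in Python (NumPy assumed; the function name is illustrative).

<syntaxhighlight lang="python">
import numpy as np

def continuous_bernoulli_pdf(x, lam):
    """Normalized continuous Bernoulli density on [0, 1] (illustrative sketch)."""
    if np.isclose(lam, 0.5):
        c = 2.0  # limiting value of the normalizing constant at lambda = 1/2
    else:
        c = 2.0 * np.arctanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c * lam ** x * (1.0 - lam) ** (1.0 - x)

# Sanity check: averaging the density over a fine grid on [0, 1]
# approximates its integral, which should be (close to) one.
xs = np.linspace(0.0, 1.0, 100001)
print(continuous_bernoulli_pdf(xs, 0.3).mean())  # ~1.0
</syntaxhighlight>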

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,<ref>Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.</ref><ref>Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).</ref> for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, [0,1]-valued data.<ref>Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning (pp. 1558-1566).</ref><ref>Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).</ref><ref>PyTorch VAE tutorial: https://github.com/pytorch/examples/tree/master/vae</ref><ref>Keras VAE tutorial: https://blog.keras.io/building-autoencoders-in-keras.html</ref> This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss defines a true log-likelihood only for discrete, \{0,1\}-valued data.
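The difference between the negative binary cross entropy and the proper continuous Bernoulli log-likelihood is exactly the log normalizing constant \log C(\lambda). A minimal PyTorch sketch of this relationship, using the ContinuousBernoulli distribution cited above (the tensors are purely illustrative):

<syntaxhighlight lang="python">
import torch
import torch.nn.functional as F
from torch.distributions import ContinuousBernoulli

x = torch.tensor([0.20, 0.70, 0.95])    # continuous, [0, 1]-valued observations ("pixels")
lam = torch.tensor([0.30, 0.60, 0.80])  # shape parameters (e.g. decoder outputs)

# Negative binary cross entropy: the unnormalized log-density x*log(lam) + (1-x)*log(1-lam)
neg_bce = -F.binary_cross_entropy(lam, x, reduction='none')

# Proper continuous Bernoulli log-likelihood, which includes the normalizing constant C(lam)
log_prob = ContinuousBernoulli(probs=lam).log_prob(x)

print(log_prob - neg_bce)  # equals log C(lam), a positive correction term (>= log 2)
</syntaxhighlight>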

The continuous Bernoulli also defines an exponential family of distributions. Writing \eta = \log\left(\lambda/(1-\lambda)\right) for the natural parameter, the density can be rewritten in canonical form:

: p(x | \eta) \propto \exp (\eta x) .
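Since \int_0^1 e^{\eta x}\,dx = (e^{\eta} - 1)/\eta, the corresponding normalized density in canonical form is, for \eta \neq 0,

: p(x | \eta) = \frac{\eta}{e^{\eta} - 1}\, e^{\eta x},

and the uniform density on [0,1] is recovered in the limit \eta \to 0 (that is, \lambda \to 1/2).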

Statistical inference

Given a sample of N points x_1,\dots,x_N with x_i\in[0,1] for all i, the maximum likelihood estimator of \lambda has no closed form. Because the continuous Bernoulli is an exponential family with sufficient statistic x, maximizing the likelihood is equivalent to matching the mean of the fitted distribution to the empirical mean,

:\mu(\hat{\lambda})=\bar{x}=\frac{1}{N}\sum_{i=1}^N x_i,

where \mu(\lambda) = \operatorname{E}[X] is the mean given above; this equation is solved numerically for \hat{\lambda}. By the invariance of maximum likelihood under reparameterization, the corresponding estimator of the natural parameter is the logit of \hat{\lambda},

:\hat{\eta}=\operatorname{logit}(\hat{\lambda})=\log\left(\hat{\lambda}/(1-\hat{\lambda})\right).
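A minimal numerical sketch in Python (NumPy and SciPy assumed; names are illustrative) that computes \hat{\lambda} by maximizing the log-likelihood directly:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sample of [0, 1]-valued observations (e.g. grey-scale pixel intensities).
x = np.array([0.12, 0.45, 0.33, 0.80, 0.56, 0.27])

def neg_log_likelihood(lam):
    # -sum_i [ log C(lam) + x_i * log(lam) + (1 - x_i) * log(1 - lam) ]
    if np.isclose(lam, 0.5):
        log_c = np.log(2.0)
    else:
        log_c = np.log(2.0 * np.arctanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam))
    return -np.sum(log_c + x * np.log(lam) + (1.0 - x) * np.log(1.0 - lam))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1.0 - 1e-6), method='bounded')
lam_hat = res.x
eta_hat = np.log(lam_hat / (1.0 - lam_hat))  # natural parameter: the logit of lam_hat
print(lam_hat, eta_hat)
</syntaxhighlight>

Note that, unlike in the discrete Bernoulli case, \hat{\lambda} generally differs from the sample mean \bar{x}.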

Related distributions

= Bernoulli distribution =

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set \{0,1\} by the probability mass function:

: p(x) = p^x (1-p)^{1-x},

where p is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval [0,1] results in the continuous Bernoulli probability density function, up to a normalizing constant.
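Unlike in the discrete case, this functional form does not integrate to one over the unit interval: for \lambda \neq 1/2,

: \int_0^1 \lambda^x (1-\lambda)^{1-x} \, dx = \frac{2\lambda - 1}{\log\left(\lambda/(1-\lambda)\right)} = \frac{1-2\lambda}{2\tanh^{-1}(1-2\lambda)} = \frac{1}{C(\lambda)},

which is the reciprocal of the normalizing constant C(\lambda) given above.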

= Beta distribution =

The Beta distribution has the density function:

: p(x) \propto x^{\alpha - 1} (1-x)^{\beta - 1},

which can be re-written as:

: p(x) \propto x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1},

where \alpha_1, \alpha_2 are positive scalar parameters, and (x_1, x_2) represents an arbitrary point inside the 1-simplex, \Delta^{1} = \{ (x_1, x_2): x_1 > 0, x_2 > 0, x_1 + x_2 = 1 \} . Switching the role of the parameter and the argument in this density function, we obtain:

: p(x) \propto \alpha_1^{x_1} \alpha_2^{x_2}.

This family is identifiable only up to a rescaling of the parameters: since x_1 + x_2 = 1, multiplying both \alpha_1 and \alpha_2 by the same positive constant leaves the normalized density unchanged. Imposing the linear constraint \alpha_1 + \alpha_2 = 1 and writing \lambda = \alpha_1, we obtain:

: p(x) \propto \lambda^{x_1} (1-\lambda)^{x_2},

corresponding exactly to the continuous Bernoulli density.

= Exponential distribution =

An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution with a suitably chosen parameter, as made explicit below.
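Explicitly, truncating an exponential density with rate \mu > 0 to the unit interval gives p(x) \propto e^{-\mu x} on [0,1]; comparing with the canonical form p(x | \eta) \propto e^{\eta x} identifies \eta = -\mu, or equivalently

: \lambda = \frac{e^{-\mu}}{1 + e^{-\mu}} = \frac{1}{1 + e^{\mu}}.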

= Continuous categorical distribution =

The multivariate generalization of the continuous Bernoulli is called the continuous-categorical distribution.<ref>Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).</ref>

References

{{Reflist}}