Cox's theorem

{{Short description|Derivation of the laws of probability theory}}

{{Bayesian statistics}}

Cox's theorem, named after the physicist Richard Threlkeld Cox, is a derivation of the laws of probability theory from a certain set of postulates.{{Cite journal | last = Cox | first = R. T. | author-link = Richard Threlkeld Cox| doi = 10.1119/1.1990764 | title = Probability, Frequency and Reasonable Expectation | journal = American Journal of Physics | volume = 14 | pages = 1–10 | year = 1946 | issue = 1 | bibcode = 1946AmJPh..14....1C }}{{cite book|first=R. T. |last=Cox |author-link=Richard Threlkeld Cox |title=The Algebra of Probable Inference |publisher=Johns Hopkins University Press |location=Baltimore, MD |year=1961 }} This derivation justifies the so-called "logical" interpretation of probability, as the laws of probability derived by Cox's theorem are applicable to any proposition. Logical (also known as objective Bayesian) probability is a type of Bayesian probability. Other forms of Bayesianism, such as the subjective interpretation, are given other justifications.

Cox's assumptions

Cox wanted his system to satisfy the following conditions:

  1. Divisibility and comparability – The plausibility of a proposition is a real number and is dependent on information we have related to the proposition.
  2. Common sense – Plausibilities should vary sensibly with the assessment of plausibilities in the model.
  3. Consistency – If the plausibility of a proposition can be derived in many ways, all the results must be equal.

The postulates as stated here are taken from Arnborg and Sjödin. Stefan Arnborg and Gunnar Sjödin, On the foundations of Bayesianism, Preprint: Nada, KTH (1999), http://www.stats.org.uk/cox-theorems/ArnborgSjodin2001.pdf Stefan Arnborg and Gunnar Sjödin, A note on the foundations of Bayesianism, Preprint: Nada, KTH (2000a), http://www.stats.org.uk/bayesian/ArnborgSjodin1999.pdf Stefan Arnborg and Gunnar Sjödin, "Bayes rules in finite models," in European Conference on Artificial Intelligence, Berlin, (2000b), https://frontiersinai.com/ecai/ecai2000/pdf/p0571.pdf

"Common sense" includes consistency with Aristotelian logic in the sense that logically equivalent propositions shall have the same plausibility.

The postulates as originally stated by Cox were not mathematically rigorous (although more so than the informal description above), as noted by Halpern. Joseph Y. Halpern, "A counterexample to theorems of Cox and Fine," Journal of AI research, 10, 67–85 (1999), http://www.jair.org/media/536/live-536-2054-jair.ps.Z {{Webarchive|url=https://web.archive.org/web/20151125021821/http://www.jair.org/media/536/live-536-2054-jair.ps.Z |date=2015-11-25 }} Joseph Y. Halpern, "Technical Addendum, Cox's theorem Revisited," Journal of AI research, 11, 429–435 (1999), http://www.jair.org/media/644/live-644-1840-jair.ps.Z {{Webarchive|url=https://web.archive.org/web/20151125022616/http://www.jair.org/media/644/live-644-1840-jair.ps.Z |date=2015-11-25 }} However, it appears to be possible to augment them with various mathematical assumptions made either implicitly or explicitly by Cox to produce a valid proof.

Cox's notation:

:The plausibility of a proposition A given some related information X is denoted by A\mid X.

Cox's postulates and functional equations are:

  • The plausibility of the conjunction AB of two propositions A, B, given some related information X, is determined by the plausibility of A given X and that of B given AX.

:In the form of a functional equation:

::AB\mid X=g(A\mid X,B\mid AX)

:Because of the associative nature of the conjunction in propositional logic, the consistency with logic gives a functional equation saying that the function g is an associative binary operation.

  • Additionally, Cox postulates the function g to be monotonic.

:All strictly increasing associative binary operations on the real numbers are isomorphic to multiplication of numbers in a subinterval of {{closed-closed|0, +∞}}, which means that there is a monotonic function w mapping plausibilities to {{closed-closed|0, +∞}} such that

::w(AB\mid X)=w(A\mid X)w(B\mid AX)

  • In case A given X is certain, we have AB\mid X=B\mid X and B\mid AX=B\mid X due to the requirement of consistency. The general equation then leads to

:w(B\mid X)=w(A\mid X)w(B\mid X)

:This shall hold for any proposition B, which leads to

::w(A\mid X)=1

  • In case A given X is impossible, we have AB\mid X=A\mid X and A\mid BX=A\mid X due to the requirement of consistency. The general equation (with the A and B factors switched) then leads to

:w(A\mid X)=w(B\mid X)w(A\mid X)

:This shall hold for any proposition B, which, without loss of generality, leads to a solution

::w(A\mid X)=0

::Due to the requirement of monotonicity, this means that w maps plausibilities to the interval {{closed-closed|0, 1}}.

  • The plausibility of a proposition determines the plausibility of the proposition's negation.

:This postulates the existence of a function f such that

::w(\text{not } A\mid X)=f(w(A\mid X))

:Because "a double negative is an affirmative", consistency with logic gives a functional equation

::f(f(x))=x,

:saying that the function f is an involution, i.e., it is its own inverse.

  • Furthermore, Cox postulates the function f to be monotonic.

:The above functional equations and consistency with logic imply that

::w(AB\mid X)=w(A\mid X)f(w(\text{not }B\mid AX))=w(A\mid X)f\left( {w(A\text{ not }B\mid X) \over w(A\mid X)} \right)

:Since AB is logically equivalent to BA, we also get

::w(A\mid X)f\left( {w(A\text{ not }B\mid X) \over w(A\mid X)} \right)=w(B\mid X)f\left( {w(B\text{ not }A\mid X) \over w(B\mid X)} \right)

:If, in particular, B=\text{ not }(AD), then also A\text{ not } B = \text{not }B and B\text{ not }A=\text{ not }A and we get

::w(A\text{ not }B\mid X)=w(\text{not }B\mid X)=f(w(B\mid X))

:and

::w(B\text{ not }A\mid X)=w(\text{not }A\mid X)=f(w(A\mid X))

:Abbreviating w(A\mid X)=x and w(B\mid X)=y we get the functional equation

::x\,f\left({f(y) \over x}\right)=y\,f\left({f(x) \over y}\right)
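The role of the rescaling function w in the conjunction postulate can be illustrated numerically. The following is a minimal sketch, not part of Cox's derivation: it assumes a hypothetical plausibility scale on which plausibilities are non-positive numbers (a logarithmic scale) and the combination rule g is ordinary addition. The names g, w, is_associative and turns_into_product are chosen for illustration only. Addition is associative and strictly increasing in each argument, and the exponential function plays the part of w, turning g into multiplication and sending certainty (plausibility 0 on this scale) to 1.

```python
import math

def g(u, v):
    """Hypothetical conjunction rule on a log-plausibility scale (u, v <= 0)."""
    return u + v

def w(u):
    """Monotonic rescaling of the log scale onto the interval (0, 1]."""
    return math.exp(u)

def is_associative(op, values, tol=1e-12):
    """Check op(op(a, b), c) == op(a, op(b, c)) on all sample triples."""
    return all(
        math.isclose(op(op(a, b), c), op(a, op(b, c)), abs_tol=tol)
        for a in values for b in values for c in values
    )

def turns_into_product(op, rescale, values, tol=1e-12):
    """Check rescale(op(a, b)) == rescale(a) * rescale(b) on all sample pairs."""
    return all(
        math.isclose(rescale(op(a, b)), rescale(a) * rescale(b), rel_tol=tol)
        for a in values for b in values
    )

# Sample plausibilities on the assumed log scale; 0.0 stands for certainty.
samples = [0.0, -0.1, -0.7, -2.5, -10.0]

print(is_associative(g, samples))         # associativity of the assumed rule
print(turns_into_product(g, w, samples))  # w(g(u, v)) = w(u) * w(v)
print(w(0.0))                             # certainty is mapped to 1
```

Any other strictly increasing associative rule could be substituted for g, provided a matching rescaling is supplied; the sketch only checks the two properties on a finite sample and proves nothing in general.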

Implications of Cox's postulates

The laws of probability derivable from these postulates are the following. Edwin Thompson Jaynes, Probability Theory: The Logic of Science, Cambridge University Press (2003); preprint version (1996) at {{cite web |url=http://omega.albany.edu:8008/JaynesBook.html |title=Archived copy |access-date=2016-01-19 |url-status=dead |archive-url=https://web.archive.org/web/20160119131820/http://omega.albany.edu:8008/JaynesBook.html |archive-date=2016-01-19 }}; Chapters 1 to 3 of published version at http://bayes.wustl.edu/etj/prob/book.pdf

Let A\mid B be the plausibility of the proposition A given B satisfying Cox's postulates. Then there is a function w mapping plausibilities to the interval [0,1] and a positive number m such that

  1. Certainty is represented by w(A\mid B)=1.
  2. w^m(A\mid B)+w^m(\text{not }A\mid B)=1.
  3. w(AB\mid C)=w(A\mid C)w(B\mid AC)=w(B\mid C)w(A\mid BC).
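Property 2 corresponds to the negation function f(x) = (1 - x^m)^{1/m}. As a numerical sanity check (a sketch under stated assumptions, not a proof), the code below verifies that this f is an involution and satisfies the functional equation x f(f(y)/x) = y f(f(x)/y) at a few sample points. The function names and the sample values are chosen for illustration; the points are restricted to x^m + y^m ≥ 1, which keeps the argument f(y)/x inside [0, 1].

```python
import math

def f(x, m):
    """Candidate negation function f(x) = (1 - x^m)^(1/m) on [0, 1]."""
    return (1.0 - x**m) ** (1.0 / m)

def lhs(x, y, m):
    """Left-hand side of the functional equation: x * f(f(y) / x)."""
    return x * f(f(y, m) / x, m)

def rhs(x, y, m):
    """Right-hand side of the functional equation: y * f(f(x) / y)."""
    return y * f(f(x, m) / y, m)

def closed_form(x, y, m):
    """Both sides simplify algebraically to (x^m + y^m - 1)^(1/m)."""
    return (x**m + y**m - 1.0) ** (1.0 / m)

# Illustrative sample points, all satisfying x^m + y^m >= 1 for each m used.
points = [(0.9, 0.8), (0.95, 0.7), (0.85, 0.85)]
exponents = [0.5, 1.0, 2.0, 3.0]

for m in exponents:
    for x, y in points:
        assert math.isclose(f(f(x, m), m), x, rel_tol=1e-9)          # involution
        assert math.isclose(lhs(x, y, m), rhs(x, y, m), rel_tol=1e-9)  # symmetry
        assert math.isclose(lhs(x, y, m), closed_form(x, y, m), rel_tol=1e-9)

print("functional equation holds at all sample points")
```

The case m = 1 gives f(x) = 1 - x, the familiar rule for the probability of a negation.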

It is important to note that the postulates imply only these general properties. We may recover the usual laws of probability by setting a new function, conventionally denoted P or \Pr, equal to w^m. Then we obtain the laws of probability in a more familiar form:

  1. Certain truth is represented by \Pr(A\mid B)=1, and certain falsehood by \Pr(A\mid B)=0.
  2. \Pr(A\mid B)+\Pr(\text{not }A\mid B)=1.
  3. \Pr(AB\mid C)=\Pr(A\mid C)\Pr(B\mid AC)=\Pr(B\mid C)\Pr(A\mid BC).
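The step from w to \Pr = w^m can be written out explicitly. The product rule survives because raising both sides to the power m preserves multiplication, and the negation rule becomes a plain sum by definition of \Pr:

```latex
\begin{aligned}
\Pr(AB\mid C) &= w^m(AB\mid C)
  = \bigl(w(A\mid C)\,w(B\mid AC)\bigr)^m
  = w^m(A\mid C)\,w^m(B\mid AC)
  = \Pr(A\mid C)\,\Pr(B\mid AC), \\
\Pr(A\mid B)+\Pr(\text{not }A\mid B) &= w^m(A\mid B)+w^m(\text{not }A\mid B) = 1 .
\end{aligned}
```

Since m is positive, x \mapsto x^m is strictly increasing on [0,1] and fixes 0 and 1, so the ordering of plausibilities and the representation of certainty are unchanged.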

Rule 2 is a rule for negation, and rule 3 is a rule for conjunction. Given that any proposition containing conjunction, disjunction, and negation can be equivalently rephrased using conjunction and negation alone (by De Morgan's laws, "A or B" is equivalent to "not (not A and not B)"), we can now handle any compound proposition.
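As an illustration of how a disjunction is handled with rules 2 and 3 alone, the following sketch computes \Pr(A or B) as 1 - \Pr(not A and not B). The joint distribution is a hypothetical example chosen for the arithmetic, used only to supply the input values \Pr(A) and \Pr(B | not A); the helper names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical joint distribution over the truth values of (A, B).
joint = {
    (True, True): Fraction(1, 5),
    (True, False): Fraction(3, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(2, 5),
}

def pr(event):
    """Probability of an event, given as a predicate on (a, b)."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

def pr_given(event, cond):
    """Conditional probability Pr(event | cond)."""
    return pr(lambda a, b: event(a, b) and cond(a, b)) / pr(cond)

def pr_or_via_rules():
    """Pr(A or B) obtained from the negation and conjunction rules only."""
    p_not_a = 1 - pr(lambda a, b: a)                                   # rule 2
    p_not_b_given_not_a = 1 - pr_given(lambda a, b: b,
                                       lambda a, b: not a)             # rule 2
    p_neither = p_not_a * p_not_b_given_not_a                          # rule 3
    return 1 - p_neither                                               # rule 2

direct = pr(lambda a, b: a or b)
sum_rule = pr(lambda a, b: a) + pr(lambda a, b: b) - pr(lambda a, b: a and b)

print(pr_or_via_rules(), direct, sum_rule)  # all three equal 3/5
```

The agreement with \Pr(A) + \Pr(B) - \Pr(AB) reflects the general sum rule, which follows from rules 2 and 3 by the same steps.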

The laws thus derived yield finite additivity of probability, but not countable additivity. The measure-theoretic formulation of Kolmogorov assumes that a probability measure is countably additive. This slightly stronger condition is necessary for certain results. An elementary example (in which this assumption merely simplifies the calculation rather than being necessary for it) is that the probability of seeing heads for the first time after an even number of flips in a sequence of coin flips is \tfrac13.{{citation | last = Price | first = David T. | doi = 10.2307/2319450 | journal = American Mathematical Monthly | jstor = 2319450 | mr = 350798 | pages = 886–889 | title = Countable additivity for probability measures | volume = 81 | year = 1974 | issue = 8 }}
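The coin-flip value can be checked with exact arithmetic. For a fair coin, the first head appears on flip n with probability 2^{-n}, so the event "first head on an even flip" is the disjoint union of the cases n = 2, 4, 6, .... Finite additivity gives each partial sum; countable additivity identifies the probability of the whole event with their limit. The sketch below assumes a fair coin and uses illustrative function names.

```python
from fractions import Fraction

def first_head_on_flip(n):
    """Probability that the first head of a fair coin occurs on flip n."""
    return Fraction(1, 2**n)

def partial_sum(k_max):
    """Probability that the first head occurs on one of flips 2, 4, ..., 2*k_max."""
    return sum(first_head_on_flip(2 * k) for k in range(1, k_max + 1))

for k_max in (1, 2, 5, 10):
    gap = Fraction(1, 3) - partial_sum(k_max)
    # The gap to 1/3 is exactly 1 / (3 * 4^k_max), so it shrinks geometrically.
    print(k_max, partial_sum(k_max), float(gap))
```

Every partial sum is strictly below \tfrac13, and the remaining gap is 1/(3 \cdot 4^K) after K terms.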

Interpretation and further discussion

Cox's theorem has come to be used as one of the justifications for the use of Bayesian probability theory. For example, in Jaynes it is discussed in detail in chapters 1 and 2 and is a cornerstone for the rest of the book. Probability is interpreted as a formal system of logic, the natural extension of Aristotelian logic (in which every statement is either true or false) into the realm of reasoning in the presence of uncertainty.

It has been debated to what degree the theorem excludes alternative models for reasoning about uncertainty. For example, if certain "unintuitive" mathematical assumptions were dropped, then alternatives could be devised, e.g., an example provided by Halpern. However, Arnborg and Sjödin suggest additional "common sense" postulates, which would allow the assumptions to be relaxed in some cases while still ruling out the Halpern example. Other approaches were devised by Hardy Michael Hardy, "Scaled Boolean algebras", [http://www.sciencedirect.com/science/journal/01968858 Advances in Applied Mathematics], August 2002, pages 243–292 (or [https://arxiv.org/abs/math.PR/0203249 preprint]); Hardy has said, "I assert there that I think Cox's assumptions are too strong, although I don't really say why. I do say what I would replace them with." (The quote is from a Wikipedia discussion page, not from the article.) or Dupré and Tipler. Dupré, Maurice J. & Tipler, Frank J. (2009). [http://projecteuclid.org/download/pdf_1/euclid.ba/1340369856 "New Axioms for Rigorous Bayesian Probability"], Bayesian Analysis, 4(3): 599–606.

The original formulation of Cox's theorem is in {{Harvtxt|Cox|1946}}, which is extended with additional results and more discussion in {{Harvtxt|Cox|1961}}. Jaynes cites Abel Niels Henrik Abel "Untersuchung der Functionen zweier unabhängig veränderlichen Gröszen x und y, wie f(x, y), welche die Eigenschaft haben, dasz f[z, f(x,y)] eine symmetrische Function von z, x und y ist.", Jour. Reine u. angew. Math. (Crelle's Jour.), 1, 11–15, (1826). for the first known use of the associativity functional equation. János Aczél János Aczél, Lectures on Functional Equations and their Applications, Academic Press, New York, (1966). provides a long proof of the "associativity equation" (pages 256–267). Jaynes{{rp|27}} reproduces the shorter proof by Cox in which differentiability is assumed. A guide to Cox's theorem by Van Horn aims at comprehensively introducing the reader to all these references.{{Cite journal | last1 = Van Horn | first1 = K. S. | doi = 10.1016/S0888-613X(03)00051-3 | title = Constructing a logic of plausible inference: A guide to Cox's theorem | journal = International Journal of Approximate Reasoning | volume = 34 | pages = 3–24 | year = 2003 | doi-access = }}

Baoding Liu, the founder of uncertainty theory, criticizes Cox's theorem for presuming that the truth value of conjunction P \land Q is a twice differentiable function f of truth values of the two propositions P and Q, i.e., T(P \land Q) = f(T(P), T(Q)), which excludes uncertainty theory's "uncertain measure" from its start, because the function f(x, y) = x \land y,{{refn|group=lower-alpha|Liu uses the symbol ∧ as the "minimum operator", most likely referring to a binary operation that takes two numbers and returns the smaller (or minimum) of the two.}} used in uncertainty theory, is not differentiable with respect to x and y.{{Cite book |last=Liu |first=Baoding |title=Uncertainty Theory |date=2015 |publisher=Springer Berlin Heidelberg : Imprint: Springer |isbn=978-3-662-44354-5 |edition=4th ed. 2015 |series=Springer Uncertainty Research |location=Berlin, Heidelberg |pages=459–460}} According to Liu, "there does not exist any evidence that the truth value of conjunction is completely determined by the truth values of individual propositions, let alone a twice differentiable function."
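The non-differentiability that Liu's objection rests on can be seen with one-sided finite differences. The sketch below is illustrative only and is not taken from Liu's book: it compares the minimum operator with ordinary multiplication at the point x = y = 0.5, using hypothetical helper names and an arbitrarily chosen step size.

```python
def conj_min(x, y):
    """Minimum operator, written as x ∧ y in uncertainty theory."""
    return min(x, y)

def conj_product(x, y):
    """Ordinary multiplication, shown for contrast."""
    return x * y

def one_sided_slopes(func, x, y, h=1e-6):
    """Left and right finite-difference slopes of func with respect to x."""
    left = (func(x, y) - func(x - h, y)) / h
    right = (func(x + h, y) - func(x, y)) / h
    return left, right

# On the diagonal x = y the two slopes of min disagree (about 1 versus 0),
# so min has no partial derivative there.
print(one_sided_slopes(conj_min, 0.5, 0.5))

# For the product both slopes agree (about 0.5, the value of y).
print(one_sided_slopes(conj_product, 0.5, 0.5))
```

Away from the diagonal the minimum operator is locally linear, so the kink is confined to the points where x = y.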


Notes

{{reflist|group=lower-alpha}}

References

{{Reflist}}

Further reading

  • {{cite book |first=Terrence L. |last=Fine |authorlink=Terrence L. Fine |title=Theories of Probability : An examination of foundations |publisher=Academic Press |location=New York |year=1973 |isbn=0-12-256450-2 }}
  • {{cite book |first1=C. Ray |last1=Smith |first2=Gary |last2=Erickson |chapter=From Rationality and Consistency to Bayesian Probability |pages=29–44 |title=Maximum Entropy and Bayesian Methods |editor-first=John |editor-last=Skilling |location=Dordrecht |publisher=Kluwer |year=1989 |isbn=0-7923-0224-9 |doi=10.1007/978-94-015-7860-8_2 }}
