Notation in probability and statistics

Probability theory and statistics have some commonly used conventions, in addition to standard mathematical notation and mathematical symbols.

Probability theory

  • Random variables are usually written in upper case Roman letters, such as X or Y. A random variable is typically described in words, such as "the height of a subject" for a continuous variable, "the number of cars in the school car park" for a discrete variable, or "the colour of the next bicycle" for a categorical variable. It does not represent a single number or a single category. For instance, P(X = x) denotes the probability that a particular realisation of the random variable X (e.g., height, number of cars, or bicycle colour) is equal to a particular value or category x (e.g., 1.735 m, 52, or purple). X and x must not be confused: X is an idea, whereas x is a value. They are related, but they do not have identical meanings.
  • Particular realisations of a random variable are written in the corresponding lower case letters. For example, x_1, x_2, \ldots, x_n could be a sample corresponding to the random variable X. A cumulative probability is formally written P(X\le x) to distinguish the random variable from its realisation.{{Cite web |date=2021-08-09 |title=Calculating Probabilities from Cumulative Distribution Function |url=https://analystprep.com/cfa-level-1-exam/quantitative-methods/calculating-probabilities-from-cumulative-distribution-function/ |access-date=2024-02-26}}
  • The probability is sometimes written \mathbb{P} to distinguish it from other functions and from a generic measure P, which avoids having to state that "P is a probability". \mathbb{P}(X\in A) is short for \mathbb{P}(\{\omega \in\Omega: X(\omega) \in A\}), where \Omega is the sample space, X is a random variable that is a function of \omega (i.e., it depends upon \omega), and \omega is some outcome of interest within the domain specified by \Omega (say, a particular height, or a particular colour of a car). The notation \Pr(A) is used as an alternative.
  • \mathbb{P}(A \cap B) or \mathbb{P}[B \cap A] indicates the probability that events A and B both occur. The joint probability distribution of random variables X and Y is denoted by P(X, Y), the joint probability mass function or probability density function by f(x, y), and the joint cumulative distribution function by F(x, y).
  • \mathbb{P}(A \cup B) or \mathbb{P}[B \cup A] indicates the probability of either event A or event B occurring ("or" in this case means one or the other or both).
  • σ-algebras are usually written with uppercase calligraphic letters (e.g. \mathcal F for the set of sets on which the probability P is defined).
  • Probability density functions (pdfs) and probability mass functions are denoted by lowercase letters, e.g. f(x), or f_X(x).
  • Cumulative distribution functions (cdfs) are denoted by uppercase letters, e.g. F(x), or F_X(x).
  • Survival functions or complementary cumulative distribution functions are often denoted by placing an overbar over the symbol for the cumulative distribution function: \overline{F}(x) = 1-F(x), or denoted as S(x).
  • In particular, the pdf of the standard normal distribution is denoted by \varphi(z), and its cdf by \Phi(z).
  • Some common operators:

:* \mathrm{E}[X] : expected value of X

:* \operatorname{var}[X] : variance of X

:* \operatorname{cov}[X,Y] : covariance of X and Y

  • "X is independent of Y" is often written X \perp Y or X \perp\!\!\!\perp Y, and "X is independent of Y given W" is often written

:X \perp\!\!\!\perp Y \,|\, W or

:X \perp Y \,|\, W

  • P(A\mid B), the conditional probability, is the probability of A given B.{{Citation |title=Probability and stochastic processes |date=2013-07-22 |url=http://dx.doi.org/10.1201/b15257-3 |work=Applied Stochastic Processes |pages=9–36 |access-date=2023-12-08 |publisher=Chapman and Hall/CRC |doi=10.1201/b15257-3 |isbn=978-0-429-16812-3}}
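
The conventions above can be made concrete with a small sketch. The Python example below assumes a hypothetical random variable X, the outcome of one roll of a fair six-sided die. The die, the events A and B, and the helper names `f`, `F`, `S`, `prob` and `cond_prob` are illustrative choices that mirror the notation in this section; they do not come from any library.

```python
from fractions import Fraction

# Assumption for illustration: X is the outcome of one roll of a fair die.
# Each key is a realisation x; each value is f(x) = P(X = x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def f(x):
    """Probability mass function f(x) = P(X = x)."""
    return pmf.get(x, Fraction(0))


def F(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


def S(x):
    """Survival function S(x) = 1 - F(x) = P(X > x)."""
    return 1 - F(x)


def prob(event):
    """P(X in A) for a set A of possible values."""
    return sum((f(x) for x in event), Fraction(0))


def cond_prob(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B); needs P(B) > 0."""
    return prob(a & b) / prob(b)


# E[X] and var[X] computed directly from the pmf.
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

A = {2, 4, 6}  # event "X is even"
B = {4, 5, 6}  # event "X is at least 4"

print("F(4) =", F(4))
print("S(4) =", S(4))
print("E[X] =", mean)
print("var[X] =", variance)
print("P(A and B) =", prob(A & B))
print("P(A or B) =", prob(A | B))
print("P(A | B) =", cond_prob(A, B))
```

Exact fractions are used so that the results can be compared directly with hand calculation; for this die, E[X] = 7/2 and var[X] = 35/12.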

Statistics

  • Greek letters (e.g. θ, β) are commonly used to denote unknown parameters (population parameters).{{Cite web |date=1999-02-13 |title=Letters of the Greek Alphabet and Some of Their Statistical Uses |url=https://lesn.appstate.edu/olson/EDL7150/Components/Other%20useful%20links/Greek%20Alphabet%20and%20Statistics.htm |access-date=2024-02-26 |website=les.appstate.edu/}}
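  • A tilde (~) denotes "has the probability distribution of"; for example, X \sim N(\mu, \sigma^2) states that X is normally distributed with mean \mu and variance \sigma^2.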
  • Placing a hat, or caret (also known as a circumflex), over a true parameter denotes an estimator of it, e.g., \widehat{\theta} is an estimator for \theta.
  • The arithmetic mean of a series of values x_1,x_2, \ldots,x_n is often denoted by placing an "overbar" over the symbol, e.g. \bar{x}, pronounced "x bar".
  • Some commonly used symbols for sample statistics are given below:

:* the sample mean \bar{x},

:* the sample variance s^2,

:* the sample standard deviation s,

:* the sample correlation coefficient r,

:* the sample cumulants k_r.

  • Some commonly used symbols for population parameters are given below:

:* the population mean \mu,

:* the population variance \sigma^2,

:* the population standard deviation \sigma,

:* the population correlation \rho,

:* the population cumulants \kappa_r.

  • x_{(k)} is used for the k^\text{th} order statistic, where x_{(1)} is the sample minimum and x_{(n)} is the sample maximum in a sample of size n.{{Cite web |title=Order Statistics |url=https://www.colorado.edu/amath/sites/default/files/attached-files/order_stats.pdf |access-date=2024-02-26 |website=colorado.edu}}
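
As a rough illustration of the sample statistics listed above, the following Python sketch computes them for a small made-up paired sample. The data values and the helper `order_statistic` are hypothetical; the sample variance is taken with divisor n − 1, which is what `statistics.variance` computes, and r is written out from the usual product-moment definition, which is an assumption about which correlation coefficient is meant.

```python
import math
import statistics

# Hypothetical paired sample (x_i, y_i), i = 1, ..., n; the values are made up.
x = [7, 1, 9, 3, 5]
y = [15, 3, 19, 7, 11]
n = len(x)

x_bar = statistics.mean(x)    # sample mean, "x bar"
s2 = statistics.variance(x)   # sample variance s^2 (divisor n - 1)
s = statistics.stdev(x)       # sample standard deviation s

# Sample correlation coefficient r, written out from its definition.
y_bar = statistics.mean(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)
r = s_xy / math.sqrt(s_xx * s_yy)


def order_statistic(sample, k):
    """Return x_(k), the k-th smallest value, with k running from 1 to n."""
    return sorted(sample)[k - 1]


print("x_bar =", x_bar)
print("s^2 =", s2)
print("s =", s)
print("r =", r)
print("x_(1) =", order_statistic(x, 1))
print("x_(n) =", order_statistic(x, n))
```

Because y was constructed as 2x + 1, the correlation here comes out as 1 up to floating-point rounding; real data would of course give a value strictly between −1 and 1 in most cases.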

Critical values

The α-level upper critical value of a probability distribution is the value exceeded with probability \alpha, that is, the value x_\alpha such that F(x_\alpha) = 1-\alpha, where F is the cumulative distribution function. Standard notations exist for the upper critical values of some commonly used distributions in statistics.
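
The defining relation F(x_\alpha) = 1-\alpha can be checked numerically. The sketch below uses the standard normal distribution from Python's standard library as an example; the choice of distribution, the level \alpha = 0.05, and the function name `upper_critical_value` are assumptions made for illustration.

```python
from statistics import NormalDist


def upper_critical_value(dist, alpha):
    """Return x_alpha such that F(x_alpha) = 1 - alpha for the given distribution."""
    return dist.inv_cdf(1 - alpha)


standard_normal = NormalDist(mu=0.0, sigma=1.0)
alpha = 0.05
x_alpha = upper_critical_value(standard_normal, alpha)

# Check the defining property: the value is exceeded with probability alpha.
exceedance = 1 - standard_normal.cdf(x_alpha)

print("x_alpha =", round(x_alpha, 4))
print("P(X > x_alpha) =", round(exceedance, 4))
```

The same function works for any `NormalDist` instance; other distributions would need a library that provides their inverse cumulative distribution function.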

Linear algebra

  • Matrices are usually denoted by boldface capital letters, e.g. \bold{A}.
  • Column vectors are usually denoted by boldface lowercase letters, e.g. \bold{x}.
  • The transpose operator is denoted by either a superscript T (e.g. \bold{A}^\mathrm{T}) or a prime symbol (e.g. \bold{A}').
  • A row vector is written as the transpose of a column vector, e.g. \bold{x}^\mathrm{T} or \bold{x}'.
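
A minimal sketch of these conventions in plain Python, assuming a matrix is stored as a list of rows and a column vector as an n × 1 matrix. The storage layout, the values of `A` and `x`, and the helper `transpose` are hypothetical choices for illustration.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical values: A is a 2 x 3 matrix and x is a 3 x 1 column vector.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [[1], [0], [2]]

A_T = transpose(A)  # 3 x 2 matrix, written A^T or A'
x_T = transpose(x)  # 1 x 3 row vector, written x^T or x'

print("A^T =", A_T)
print("x^T =", x_T)
```

Transposing twice returns the original matrix, which is a quick sanity check on any implementation of the operator.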

Abbreviations

Common abbreviations include:

References

{{Reflist}}

{{refbegin}}

  • {{Citation |title = Recommended Standards for Statistical Symbols and Notation. COPSS Committee on Symbols and Notation |first1 = Max |last1 = Halperin |first2 = H. O. |last2 = Hartley |first3 = P. G. |last3 = Hoel |journal = The American Statistician |volume = 19 |year = 1965 |pages = 12–14 |issue = 3 |doi = 10.2307/2681417 |jstor = 2681417 }}

{{refend}}