generalized extreme value distribution
{{Short description|Family of probability distributions}}
{{More citations needed|date=May 2020}}
{{Probability distribution
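| type = density
| notation = <math>\textrm{GEV}(\mu, \sigma, \xi)</math>
| parameters = <math>\mu \in \mathbb{R}</math> (location)<br />
<math>\sigma > 0</math> (scale)<br />
<math>\xi \in \mathbb{R}</math> (shape)
| support = <math>x \in \bigl[ \mu - \tfrac{\sigma}{\xi}, +\infty \bigr)</math> when <math>\xi > 0</math><br />
<math>x \in (-\infty, +\infty)</math> when <math>\xi = 0</math><br />
<math>x \in \bigl( -\infty, \mu - \tfrac{\sigma}{\xi} \bigr]</math> when <math>\xi < 0</math>
| pdf = <math>\tfrac{1}{\sigma} \, t(x)^{\xi + 1} \mathrm{e}^{-t(x)}</math><br />
where <math>t(x) = \begin{cases}
\left[ 1 + \xi \left( \tfrac{x - \mu}{\sigma} \right) \right]^{-1/\xi} & \text{if } \xi \neq 0 \\
\exp\left(-\tfrac{x - \mu}{\sigma} \right) & \text{if } \xi = 0 \end{cases}</math>
| cdf = <math>\mathrm{e}^{-t(x)}</math> for <math>x</math> in the support (see above)
| mean = <math>\begin{cases}
\mu + \dfrac{\sigma (g_1 - 1)}{\xi} & \text{if } \xi \neq 0, \xi < 1 \\
\mu + \sigma\gamma & \text{if } \xi = 0 \\
\infty & \text{if } \xi \geq 1 \end{cases}</math><br />
where <math>g_k = \Gamma(1 - k \xi)</math> (see Gamma function)<br />
and <math>\gamma</math> is Euler’s constant
| median = <math>\begin{cases}
\mu + \sigma \, \dfrac{(\ln 2)^{-\xi} - 1}{\xi} & \text{if } \xi \neq 0 \\
\mu - \sigma \ln\ln2 & \text{if } \xi = 0 \end{cases}</math>
| mode = <math>\begin{cases}
\mu + \sigma \, \dfrac{(1 + \xi)^{-\xi} - 1}{\xi} & \text{if } \xi \neq 0 \\
\mu & \text{if } \xi = 0 \end{cases}</math>
| variance = <math>\begin{cases}
\sigma^2 \, \dfrac{g_2 - g_1^2}{\xi^2} & \text{if } \xi \neq 0, \xi < \tfrac{1}{2} \\
\sigma^2 \, \frac{\pi^2}{6} & \text{if } \xi = 0 \\
\infty & \text{if } \xi \geq \tfrac{1}{2} \end{cases}</math>
| skewness = <math>\begin{cases}
\sgn(\xi) \, \dfrac{g_3 - 3 g_2 g_1 + 2 g_1^3}{(g_2 - g_1^2)^{3/2}} & \text{if } \xi \neq 0, \xi < \tfrac{1}{3} \\
\dfrac{12 \sqrt{6} \, \zeta(3)}{\pi^3} & \text{if } \xi = 0 \end{cases}</math><br />
where <math>\sgn(x)</math> is the sign function<br />
and <math>\zeta(x)</math> is the Riemann zeta function
| kurtosis = <math>\begin{cases}
\dfrac{g_4 - 4 g_3 g_1 - 3 g_2^2 + 12 g_2 g_1^2 - 6 g_1^4}{(g_2 - g_1^2)^2} & \text{if } \xi \neq 0, \xi < \tfrac{1}{4} \\
\tfrac{12}{5} & \text{if } \xi = 0\end{cases}</math>
| entropy = <math>\ln \sigma + \gamma \xi + \gamma + 1</math>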
| mgf = see {{harvp|Muraleedharan|Guedes Soares|Lucas|2011}}
| char = see {{harvp|Muraleedharan|Guedes Soares|Lucas|2011}}
}}
In probability theory and statistics, the generalized extreme value (GEV) distribution
{{cite web
|last=Weisstein |first=Eric W.
|title=Extreme value distribution
|website=mathworld.wolfram.com
|url=https://mathworld.wolfram.com/ExtremeValueDistribution.html
|access-date=2021-08-06
|lang=en
}}
is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families, also known as type I, II and III extreme value distributions. By the extreme value theorem, the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. Such a limit distribution need not exist: its existence requires regularity conditions on the tail of the underlying distribution. Even so, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables.
In some fields of application the generalized extreme value distribution is known as the Fisher–Tippett distribution, named after R.A. Fisher and L.H.C. Tippett, who recognised the three different forms outlined below. However, usage of this name is sometimes restricted to the special case of the Gumbel distribution. The origin of the common functional form for all three distributions dates back to at least {{harvp|Jenkinson|1955}},
{{cite journal
|last=Jenkinson |first=Arthur F.
|title=The frequency distribution of the annual maximum (or minimum) values of meteorological elements
|journal=Quarterly Journal of the Royal Meteorological Society
|pages=158–171
|year=1955
|volume=81 |issue=348
|doi=10.1002/qj.49708134804
|bibcode=1955QJRMS..81..158J
}}
{{cite book
|last1=Haan |first1=Laurens
|last2=Ferreira |first2=Ana
|year=2007
|title=Extreme Value Theory: An introduction
|publisher=Springer
}}
although it could also have been given by {{harvp|von Mises|1936}}.
{{cite journal
|last=von Mises |first=R.
|year=1936
|title=La distribution de la plus grande de {{mvar|n}} valeurs
|journal=Rev. Math. Union Interbalcanique
|volume=1 |pages=141–160
}}
Specification
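Using the standardized variable <math>s = \tfrac{x - \mu}{\sigma}</math>, where <math>\mu</math>, the location parameter, can be any real number, and <math>\sigma > 0</math> is the scale parameter, the cumulative distribution function of the GEV distribution is
: <math>F(s; \xi) = \begin{cases}
\exp\bigl( -\mathrm{e}^{-s} \bigr) & \text{for } \xi = 0 , \\
\exp \bigl( \! - ( 1 + \xi s)^{-1/\xi} \bigr) & \text{for } \xi \neq 0 \text{ and } \xi s > -1 , \\
0 & \text{for } \xi > 0 \text{ and } s \le -\tfrac{1}{\xi} , \\
1 & \text{for } \xi < 0 \text{ and } s \ge \tfrac{1}{|\xi|} ,
\end{cases}</math>
where <math>\xi</math>, the shape parameter, can be any real number. Thus, for <math>\xi > 0</math>, the expression is valid for <math>s > -\tfrac{1}{\xi}</math>, while for <math>\xi < 0</math> it is valid for <math>s < -\tfrac{1}{\xi}</math>. In the first case, <math>-\tfrac{1}{\xi}</math> is the negative, lower end-point, where <math>F</math> is {{math|0}}; in the second case, <math>-\tfrac{1}{\xi}</math> is the positive, upper end-point, where <math>F</math> is 1. For <math>\xi = 0</math>, the second expression is formally undefined and is replaced with the first expression, which is the result of taking the limit of the second as <math>\xi \to 0</math>, in which case <math>s</math> can be any real number.
In the special case of <math>x = \mu</math>, we have <math>s = 0</math>, so <math>F(0; \xi) = \mathrm{e}^{-1} \approx 0.368</math> regardless of the values of <math>\xi</math> and <math>\sigma</math>.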
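The probability density function of the standardized distribution is
: <math>f(s; \xi) = \begin{cases}
\mathrm{e}^{-s} \exp\bigl( -\mathrm{e}^{-s} \bigr) & \text{for } \xi = 0 , \\
(1 + \xi s)^{-( 1 + 1/\xi )} \exp\bigl(\! -( 1 + \xi s )^{-1/\xi} \bigr) & \text{for } \xi \neq 0 \text{ and } \xi s > -1 , \\
0 & \text{otherwise;} \end{cases}</math>
again valid for <math>s > -\tfrac{1}{\xi}</math> in the case <math>\xi > 0</math>, and for <math>s < -\tfrac{1}{\xi}</math> in the case <math>\xi < 0</math>. The density is zero outside of the relevant range. In the case <math>\xi = 0</math>, the density is positive on the whole real line.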
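The piecewise definitions of the distribution function and density can be written down directly in code. The following is a minimal Python sketch, not a reference implementation; the function names <code>gev_cdf</code> and <code>gev_pdf</code> are illustrative choices, and the density is returned on the scale of the original variable, so the standardized density is divided by the scale parameter.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function at x, using s = (x - mu) / sigma."""
    s = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case, the limit of the general expression as xi -> 0.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: at or below the lower end-point when xi > 0,
        # at or above the upper end-point when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV density at x (standardized density divided by sigma)."""
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-s - math.exp(-s)) / sigma
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0
    return t ** (-(1.0 + 1.0 / xi)) * math.exp(-t ** (-1.0 / xi)) / sigma
```

Evaluating <code>gev_cdf</code> at the location parameter gives the same value for every shape and scale, which is the special case noted above.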
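Since the cumulative distribution function is invertible, the quantile function for the GEV distribution has an explicit expression, namely
: <math>Q(p; \mu, \sigma, \xi) = \begin{cases}
\mu - \sigma \ln ( -\ln p ) & \text{for } \xi = 0 \text{ and } p \in (0, 1) , \\
\mu + \dfrac{\sigma}{\xi} \big( ( -\ln p)^{-\xi} - 1 \big) & \text{for } \xi > 0 \text{ and } p \in [0, 1) , \text{ or } \xi < 0 \text{ and } p \in (0, 1] ; \end{cases}</math>
and therefore the quantile density function <math>q \equiv \tfrac{\mathrm{d}Q}{\mathrm{d}p}</math> is
: <math>q(p; \sigma, \xi) = \frac{\sigma}{( -\ln p )^{\xi + 1} \, p} \qquad \text{for } p \in (0, 1) ,</math>
valid for <math>\sigma > 0</math> and for any real <math>\xi</math>.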
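Because the quantile function is available in closed form, GEV variates can be generated by inverse transform sampling. The sketch below assumes only the Python standard library; the names <code>gev_quantile</code> and <code>gev_sample</code> and the default seed are illustrative, and for simplicity the probability argument is restricted to the open interval (0, 1).

```python
import math
import random

def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.0):
    """Quantile function Q(p; mu, sigma, xi) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

def gev_sample(n, mu=0.0, sigma=1.0, xi=0.0, seed=0):
    """Draw n GEV variates by applying the quantile function to uniforms."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()  # uniform on [0, 1); skip the end-point 0
        if u > 0.0:
            out.append(gev_quantile(u, mu, sigma, xi))
    return out
```

For example, <code>gev_quantile(0.5, mu, sigma, xi)</code> returns the median of the distribution.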
Example of probability density functions for distributions of the GEV family.
Summary statistics
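Using <math>g_k \equiv \Gamma(1 - k \xi)</math> for <math>k \in \{ 1, 2, 3, 4 \}</math>, where <math>\Gamma(\cdot)</math> is the gamma function, some simple statistics of the distribution are given by:{{citation needed|date=May 2011}}
: <math>\operatorname{E}(X) = \mu + \bigl( g_1 - 1 \bigr) \frac{\sigma}{\xi}</math> for <math>\xi < 1 ,</math>
: <math>\operatorname{Var}(X) = \bigl( g_2 - g_1^2 \bigr) \frac{\sigma^2}{\xi^2} ,</math>
: <math>\operatorname{Mode}(X) = \mu + \bigl( (1 + \xi)^{-\xi} - 1 \bigr) \frac{\sigma}{\xi} .</math>
The skewness is
: <math>\operatorname{skewness}(X) = \begin{cases}
\dfrac{g_3 - 3 g_2 g_1 + 2 g_1^3}{\bigl( g_2 - g_1^2 \bigr)^{3/2}} \cdot \sgn(\xi) & \text{for } \xi \ne 0 , \\
\dfrac{12 \sqrt{6} \, \zeta(3)}{\pi^3} \approx 1.14 & \text{for } \xi = 0 .
\end{cases}</math>
The excess kurtosis is:
: <math>\operatorname{excess\ kurtosis}(X) = \frac{g_4 - 4 g_3 g_1 + 6 g_2 g_1^2 - 3 g_1^4}{\bigl( g_2 - g_1^2 \bigr)^2} - 3 .</math>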
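The mean and variance can be evaluated numerically from the gamma function. The following Python sketch uses <code>math.gamma</code> from the standard library; the function names are illustrative, and the conventions for the Gumbel case and for the infinite moments are taken from the infobox rather than from the formulas above.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def gev_mean(mu, sigma, xi):
    """Mean of GEV(mu, sigma, xi); infinite for xi >= 1."""
    if xi >= 1.0:
        return math.inf
    if xi == 0.0:
        return mu + sigma * EULER_GAMMA
    g1 = math.gamma(1.0 - xi)
    return mu + sigma * (g1 - 1.0) / xi

def gev_variance(sigma, xi):
    """Variance of GEV(mu, sigma, xi); infinite for xi >= 1/2."""
    if xi >= 0.5:
        return math.inf
    if xi == 0.0:
        return sigma ** 2 * math.pi ** 2 / 6.0
    g1 = math.gamma(1.0 - xi)
    g2 = math.gamma(1.0 - 2.0 * xi)
    return sigma ** 2 * (g2 - g1 ** 2) / xi ** 2
```

For shape values very close to zero the general expressions suffer from cancellation, so a production implementation would likely switch to a series expansion there; this sketch does not.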
Link to Fréchet, Weibull, and Gumbel families
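The shape parameter <math>\xi</math> governs the tail behavior of the distribution. The sub-families are defined by three cases: <math>\xi = 0</math>, <math>\xi > 0</math>, and <math>\xi < 0</math>; these correspond, respectively, to the Gumbel, Fréchet, and Weibull families, whose cumulative distribution functions are displayed below.
- Type I or Gumbel extreme value distribution, case <math>\xi = 0</math>, for all <math>x \in (-\infty, +\infty)</math>:
: <math>F(x; \mu, \sigma, 0) = \exp \left( -\exp \left( -\frac{x - \mu}{\sigma} \right) \right) .</math>
- Type II or Fréchet extreme value distribution, case <math>\xi > 0</math>, for all <math>x \in \left( \mu - \tfrac{\sigma}{\xi}, +\infty \right)</math>:
:Let <math>\alpha \equiv \tfrac{1}{\xi} > 0</math> and <math>y \equiv 1 + \tfrac{\xi}{\sigma} (x - \mu) ;</math>
: <math>F(x; \mu, \sigma, \xi) = \begin{cases}
0 & y \leq 0 \quad \text{i.e. } x \leq \mu - \tfrac{\sigma}{\xi} , \\
\exp \left( -\dfrac{1}{y^\alpha} \right) & y > 0 \quad \text{i.e. } x > \mu - \tfrac{\sigma}{\xi} .
\end{cases}</math>
- Type III or reversed Weibull extreme value distribution, case <math>\xi < 0</math>, for all <math>x \in \left( -\infty, \mu + \tfrac{\sigma}{|\xi|} \right)</math>:
:Let <math>\alpha \equiv -\tfrac{1}{\xi} > 0</math> and <math>y \equiv 1 - \tfrac{|\xi|}{\sigma} (x - \mu) ;</math>
: <math>F(x; \mu, \sigma, \xi) = \begin{cases}
\exp \left( -y^\alpha \right) & y > 0 \quad \text{i.e. } x < \mu + \tfrac{\sigma}{|\xi|} , \\
1 & y \leq 0 \quad \text{i.e. } x \geq \mu + \tfrac{\sigma}{|\xi|} .
\end{cases}</math>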
The subsections below remark on properties of these distributions.
= Modification for minima rather than maxima =
The theory here relates to data maxima and the distribution being discussed is an extreme value distribution for maxima. A generalised extreme value distribution for data minima can be obtained, for example, by substituting <math>-x</math> for <math>x</math> in the distribution function and subtracting the cumulative distribution from one: that is, replace <math>F(x)</math> with {{nobr|<math>1 - F(-x)</math>.}} Doing so yields yet another family of distributions.
= Alternative convention for the Weibull distribution =
The ordinary Weibull distribution arises in reliability applications and is obtained from the distribution here by using the variable <math>t = \mu - x</math>, which gives a strictly positive support, in contrast to the use in the formulation of extreme value theory here. This arises because the ordinary Weibull distribution is used for cases that deal with data minima rather than data maxima. The distribution here has an additional parameter compared to the usual form of the Weibull distribution and, in addition, is reversed so that the distribution has an upper bound rather than a lower bound. Importantly, in applications of the GEV, the upper bound is unknown and so must be estimated, whereas when applying the ordinary Weibull distribution in reliability applications the lower bound is usually known to be zero.
= Ranges of the distributions =
Note the differences in the ranges of interest for the three extreme value distributions: Gumbel is unlimited, Fréchet has a lower limit, while the reversed Weibull has an upper limit.
More precisely, univariate extreme value theory describes which of the three is the limiting law, depending on the initial law of {{mvar|X}} and in particular on the tail of that original distribution.
= Distribution of log variables =
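One can link the type I to types II and III in the following way: If the cumulative distribution function of some random variable <math>X</math> is of type II, and with the positive numbers as support, i.e. <math>F(x) = \exp \bigl( -(x / \sigma)^{-\alpha} \bigr)</math> for <math>x > 0</math> with <math>\alpha, \sigma > 0</math>, then the cumulative distribution function of <math>\ln X</math> is of type I, namely <math>\exp \bigl( -\exp ( -\alpha (x - \ln \sigma) ) \bigr)</math>, with location <math>\ln \sigma</math> and scale <math>1 / \alpha</math>. Similarly, if the cumulative distribution function of <math>X</math> is of type III, and with the negative numbers as support, i.e. <math>F(x) = \exp \bigl( -(-x / \sigma)^{\alpha} \bigr)</math> for <math>x < 0</math>, then the cumulative distribution function of <math>-\ln(-X)</math> is of type I, namely <math>\exp \bigl( -\exp ( -\alpha (x + \ln \sigma) ) \bigr)</math>, with location <math>-\ln \sigma</math> and scale <math>1 / \alpha</math>.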
Link to logit models (logistic regression)
Multinomial logit models, and certain other types of logistic regression, can be phrased as latent variable models with error variables distributed as Gumbel distributions (type I generalized extreme value distributions). This phrasing is common in the theory of discrete choice models, which include logit models, probit models, and various extensions of them, and derives from the fact that the difference of two type-I GEV-distributed variables follows a logistic distribution, of which the logit function is the quantile function. The type-I GEV distribution thus plays the same role in these logit models as the normal distribution does in the corresponding probit models.
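The claim that the difference of two type-I variables follows a logistic distribution can be checked by simulation. The sketch below assumes the two Gumbel variables are independent and share a common scale; the function names, the sample size and the seed are arbitrary illustrative choices.

```python
import math
import random

def gumbel_sample(rng, alpha=0.0, beta=1.0):
    """One Gumbel(alpha, beta) variate by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) at the end-point
        u = rng.random()
    return alpha - beta * math.log(-math.log(u))

def logistic_cdf(x, loc=0.0, scale=1.0):
    """Distribution function of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

def empirical_cdf_of_difference(x, n=20000, alpha_x=1.0, alpha_y=0.0,
                                beta=1.0, seed=0):
    """Fraction of simulated differences X - Y that are at most x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        d = gumbel_sample(rng, alpha_x, beta) - gumbel_sample(rng, alpha_y, beta)
        if d <= x:
            hits += 1
    return hits / n
```

With the default arguments, <code>empirical_cdf_of_difference(x)</code> should be close to <code>logistic_cdf(x, 1.0, 1.0)</code>, the logistic distribution whose location is the difference of the two Gumbel locations.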
Properties
The cumulative distribution function of the generalized extreme value distribution solves the stability postulate equation.{{Citation needed|date=May 2011}} The generalized extreme value distribution is a special case of a max-stable distribution, and is a transformation of a min-stable distribution.
Applications
- The GEV distribution is widely used in the treatment of "tail risks" in fields ranging from insurance to finance. In the latter case, it has been considered as a means of assessing various financial risks via metrics such as value at risk.
{{cite report
|last=Moscadelli |first=Marco
|date=30 July 2004
|title=The modelling of operational risk: Experience with the analysis of the data collected by the Basel Committee
|ssrn=557214 |doi=10.2139/ssrn.557214
|type=non-peer reviewed article
|url=http://www.unalmed.edu.co/~ndgirald/Archivos%20Lectura/Archivos%20curso%20Riesgo%20Operativo/moscadelli%202004.pdf
|via=Archivos curso Riesgo Operativo de N.D. Girald (unalmed.edu.co)
}}
{{cite journal
|last1 = Guégan |first1 = D.
|last2 = Hassani |first2 = B.K.
|year = 2014
|title = A mathematical resurgence of risk management: An extreme modeling of expert opinions
|journal = Frontiers in Finance and Economics
|volume = 11 |issue = 1 |pages = 25–45
|ssrn = 2558747
}}
[[File:GEV Surinam.png|thumb|300px|Fitted GEV probability distribution to monthly maximum one-day rainfalls in October, Surinam
{{cite web
|title=CumFreq for probability distribution fitting
|website=waterlog.info
|url=https://www.waterlog.info/cumfreq.htm
}} See also CumFreq.
]]
- However, the resulting shape parameters have been found to lie in the range for which the mean and variance are undefined, which underlines the fact that reliable data analysis is often impossible.
{{cite web
|first=Kjersti |last=Aas
|date=23 January 2008
|title={{grey|[no title cited]}}
|type=lecture
|publisher=Norges teknisk-naturvitenskapelige universitet
|place=Trondheim, NO
|website=citeseerx.ist.psu.edu
|citeseerx=10.1.1.523.6456
|url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.523.6456&rep=rep1&type=pdf
|format=PDF |url-status=dead
}}
{{full citation|date=October 2024}}
- In hydrology the GEV distribution is applied to extreme events such as annual maximum one-day rainfalls and river discharges.
{{cite journal
|last1=Liu |first1=Xin
|last2=Wang |first2=Yu
|date=September 2022
|title=Quantifying annual occurrence probability of rainfall-induced landslide at a specific slope
|journal=Computers and Geotechnics
|volume=149 |page=104877
|doi=10.1016/j.compgeo.2022.104877 |doi-access=free
|bibcode=2022CGeot.14904877L
|s2cid=250232752 |lang=en
}}
The blue picture, made with CumFreq, illustrates an example of fitting the GEV distribution to ranked annual maximum one-day rainfalls, and also shows the 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.
= Example for normally distributed variables =
Let <math>\{ X_i \mid 1 \le i \le n \}</math> be i.i.d. normally distributed random variables with mean {{math|0}} and variance {{math|1}}.
The Fisher–Tippett–Gnedenko theorem
{{cite book
|last1=David |first1=Herbert A.
|title=Order Statistics
|last2=Nagaraja |first2=Haikady N.
|publisher=John Wiley & Sons
|year=2004
|page=299
}}
tells us that, for large <math>n</math>, approximately
: <math>\max \{ X_i \mid 1 \le i \le n \} \sim \textrm{GEV}(\mu_n, \sigma_n, 0) ,</math>
where
: <math>\begin{align}
\mu_n &= \Phi^{-1}\left( 1 - \frac{1}{n} \right) \\
\sigma_n &= \Phi^{-1}\left( 1 - \frac{1}{n \mathrm{e}} \right) - \Phi^{-1}\left( 1 - \frac{1}{n} \right) .
\end{align}</math>
This allows us to estimate, e.g., the mean of <math>\max \{ X_i \mid 1 \le i \le n \}</math> from the mean of the GEV distribution:
: <math>\begin{align}
\operatorname{\mathbb E}\left\{ \max\left\{ X_i \mid 1 \le i \le n \right\} \right\}
& \approx \mu_n + \gamma_{\mathsf E} \, \sigma_n \\
&= (1 - \gamma_{\mathsf E}) \, \Phi^{-1}\left( 1 - \frac{1}{n} \right) + \gamma_{\mathsf E} \, \Phi^{-1}\left( 1 - \frac{1}{\mathrm{e} n} \right) \\
&= \sqrt{\log \left( \frac{n^2}{2 \pi \log \left( \frac{n^2}{2\pi} \right)} \right)} \cdot \left( 1 + \frac{\gamma_{\mathsf E}}{\log n} + o \left( \frac{1}{\log n} \right) \right) ,
\end{align}</math>
where <math>\gamma_{\mathsf E}</math> is the Euler–Mascheroni constant.
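The normalizing constants and the resulting estimate of the expected maximum are straightforward to compute. The following Python sketch uses <code>statistics.NormalDist.inv_cdf</code> from the standard library as the inverse of the standard normal distribution function; the function names are illustrative, and the sample size is assumed to be at least 2 so that both probabilities lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant

def normal_max_gev_params(n):
    """Return (mu_n, sigma_n) for the maximum of n standard normals, n >= 2."""
    phi_inv = NormalDist().inv_cdf  # inverse standard normal CDF
    mu_n = phi_inv(1.0 - 1.0 / n)
    sigma_n = phi_inv(1.0 - 1.0 / (n * math.e)) - mu_n
    return mu_n, sigma_n

def approx_expected_max(n):
    """Gumbel-based approximation mu_n + gamma * sigma_n of E[max]."""
    mu_n, sigma_n = normal_max_gev_params(n)
    return mu_n + EULER_GAMMA * sigma_n
```

The approximation grows slowly with the sample size, in line with the square-root-of-logarithm behaviour of the last line of the expansion.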
Related distributions
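- If <math>X \sim \textrm{GEV}(\mu, \sigma, \xi)</math> then <math>m X + b \sim \textrm{GEV}(m \mu + b, m \sigma, \xi)</math>
- If <math>X \sim \textrm{Gumbel}(\mu, \sigma)</math> (Gumbel distribution) then <math>X \sim \textrm{GEV}(\mu, \sigma, 0)</math>
- If <math>X \sim \textrm{Weibull}(\sigma, \mu)</math> (Weibull distribution) then <math>\mu \left( 1 - \sigma \log \tfrac{X}{\sigma} \right) \sim \textrm{GEV}(\mu, \sigma, 0)</math>
- If <math>X \sim \textrm{GEV}(\mu, \sigma, 0)</math> then <math>\sigma \exp \left( -\tfrac{X - \mu}{\mu \sigma} \right) \sim \textrm{Weibull}(\sigma, \mu)</math> (Weibull distribution)
- If <math>X \sim \textrm{Exponential}(1)</math> (Exponential distribution) then <math>\mu - \sigma \log X \sim \textrm{GEV}(\mu, \sigma, 0)</math>
- If <math>X \sim \textrm{Gumbel}(\alpha_X, \beta)</math> and <math>Y \sim \textrm{Gumbel}(\alpha_Y, \beta)</math> then <math>X - Y \sim \textrm{Logistic}(\alpha_X - \alpha_Y, \beta)</math> (see Logistic distribution).
- If <math>X</math> and <math>Y \sim \textrm{Gumbel}(\alpha, \beta)</math> then <math>X + Y \nsim \textrm{Logistic}(2 \alpha, \beta)</math> (The sum is not a logistic distribution).
:: Note that <math>\operatorname{\mathbb E}\{ X + Y \} = 2 \alpha + 2 \beta \gamma \neq 2 \alpha = \operatorname{\mathbb E}\left\{ \textrm{Logistic}(2 \alpha, \beta) \right\} .</math>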
= Proofs =
4. Let <math>X \sim \textrm{Weibull}(\sigma, \mu)</math>; then the cumulative distribution of <math>g(X) = \mu \left( 1 - \sigma \log \tfrac{X}{\sigma} \right)</math> is:
: <math>\begin{align}
\operatorname{\mathbb P}\left\{ \mu \left( 1 - \sigma \log \frac{X}{\sigma} \right) < x \right\} &= \operatorname{\mathbb P}\left\{ \log \frac{X}{\sigma} > \frac{1 - x/\mu}{\sigma} \right\} \\
& \text{(since the logarithm is strictly increasing)} \\
&= \operatorname{\mathbb P}\left\{ X > \sigma \exp\left[ \frac{1 - x/\mu}{\sigma} \right] \right\} \\
&= \exp\left( - \left( \sigma \exp\left[ \frac{1 - x/\mu}{\sigma} \right] \cdot \frac{1}{\sigma} \right)^\mu \right) \\
&= \exp\left( - \left( \exp\left[ \frac{1 - x/\mu}{\sigma} \right] \right)^\mu \right) \\
&= \exp\left( - \exp\left[ \frac{\mu - x}{\sigma} \right] \right) \\
&= \exp\left( - \exp\left[ - s \right] \right) , \quad s = \frac{x - \mu}{\sigma} ,
\end{align}</math>
:which is the cdf for <math>\textrm{GEV}(\mu, \sigma, 0) .</math>
5. Let <math>X \sim \textrm{Exponential}(1)</math>; then the cumulative distribution of <math>g(X) = \mu - \sigma \log X</math> is:
: <math>\begin{align}
\operatorname{\mathbb P}\left\{ \mu - \sigma \log X < x \right\} &= \operatorname{\mathbb P}\left\{ \log X > \frac{\mu - x}{\sigma} \right\} \\
& \text{(since the logarithm is strictly increasing)} \\
&= \operatorname{\mathbb P}\left\{ X > \exp\left( \frac{\mu - x}{\sigma} \right) \right\} \\
&= \exp\left[ - \exp\left( \frac{\mu - x}{\sigma} \right) \right] \\
&= \exp\left[ - \exp(-s) \right] , \quad \text{where} \quad s \equiv \frac{x - \mu}{\sigma} ;
\end{align}</math>
:which is the cumulative distribution of <math>\textrm{GEV}(\mu, \sigma, 0) .</math>
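The transformation in proof 5 can also be checked empirically. The sketch below, using only the Python standard library, transforms unit-rate exponential variates and compares the empirical distribution function with the type-I distribution function; the function names, sample size and seed are illustrative choices rather than part of the proof.

```python
import math
import random

def gumbel_cdf(x, mu, sigma):
    """Type-I (Gumbel) distribution function with location mu and scale sigma."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def transformed_exponential_sample(n, mu, sigma, seed=0):
    """Apply g(X) = mu - sigma * log(X) to n draws of X ~ Exponential(1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.expovariate(1.0)
        if x > 0.0:  # guard against log(0)
            out.append(mu - sigma * math.log(x))
    return out

def empirical_cdf(sample, x):
    """Fraction of the sample that is at most x."""
    return sum(1 for v in sample if v <= x) / len(sample)
```

For a sample of moderate size, <code>empirical_cdf(sample, x)</code> should agree with <code>gumbel_cdf(x, mu, sigma)</code> up to sampling error.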
See also
- Extreme value theory (univariate theory)
- Fisher–Tippett–Gnedenko theorem
- Generalized Pareto distribution
- German tank problem, opposite question of population maximum given sample maximum
- Pickands–Balkema–De Haan theorem
References
{{reflist|refs=
{{cite book
|last1=Muraleedharan |first1=G.
|last2=Guedes Soares |first2=C.
|last3=Lucas |first3=Cláudia
|year=2011
|section=Characteristic and moment generating functions of generalised extreme value distribution (GEV)
|editor-first=Linda L. |editor-last=Wright
|title=Sea Level Rise, Coastal Engineering, Shorelines, and Tides
|at=Chapter 14, pp. 269–276
|publisher=Nova Science Publishers
|isbn=978-1-61728-655-1
}}
{{cite journal
|last1 = Norton
|first1 = Matthew
|last2 = Khokhlov
|first2 = Valentyn
|last3 = Uryasev
|first3 = Stan
|year = 2019
|title = Calculating CVaR and bPOE for common probability distributions with application to portfolio optimization and density estimation
|journal = Annals of Operations Research
|volume = 299
|issue = 1–2
|pages = 1281–1315
|publisher = Springer
|doi = 10.1007/s10479-019-03373-1
|arxiv = 1811.11301
|s2cid = 254231768
|url = http://uryasev.ams.stonybrook.edu/wp-content/uploads/2019/10/Norton2019_CVaR_bPOE.pdf
|access-date = 2023-02-27
|archive-date = 2023-03-31
|archive-url = https://web.archive.org/web/20230331230821/http://uryasev.ams.stonybrook.edu/wp-content/uploads/2019/10/Norton2019_CVaR_bPOE.pdf
|url-status = dead
}}
}}
Further reading
{{refbegin|25em|small=yes}}
- {{cite book
|last1=Embrechts |first1=Paul
|last2=Klüppelberg |first2=Claudia |author2-link= Claudia Klüppelberg
|last3=Mikosch |first3=Thomas
|title=Modelling Extremal Events for Insurance and Finance
|year=1997
|location=Berlin, DE
|publisher=Springer Verlag
|isbn=9783540609315
|url=https://books.google.com/books?id=BXOI2pICfJUC |via=Google books
}}
- {{cite book
| last1=Leadbetter | first1=M.R.
| last2=Lindgren | first2=G.
| last3=Rootzén | first3=H.
| year=1983
| title=Extremes and Related Properties of Random Sequences and Processes
| publisher=Springer-Verlag
| isbn = 0-387-90731-9
}}
- {{cite book
| last=Resnick | first=S.I.
| year=1987
| title=Extreme Values, Regular Variation, and Point Processes
| publisher=Springer-Verlag
| isbn = 0-387-96481-9
}}
- {{cite book
| last=Coles | first=Stuart
| year=2001
| title=An Introduction to Statistical Modeling of Extreme Values
| url = https://books.google.com/books?id=2nugUEaKqFEC&pg=PP1
| publisher=Springer-Verlag
| isbn = 1-85233-459-2
}}
{{refend}}
{{ProbDistributions|continuous-variable}}
Category:Continuous distributions