Hurdle model

A hurdle model is a class of statistical models in which a random variable is modelled in two parts: the first gives the probability of attaining the value 0, and the second models the probability of the non-zero values. Hurdle models are often motivated by an excess of zeros in the data that more standard statistical models do not adequately account for.

In a hurdle model, a random variable x is modelled as

: $\Pr (x = 0) = \theta$

: $\Pr (x = k) = (1 - \theta) \, p_{x \ne 0}(k), \quad k \ne 0$

where $p_{x \ne 0}$ is a probability distribution truncated at 0, so that it assigns no probability to 0 and sums to one over the non-zero values.
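As an illustration, the following is a minimal sketch of the probability mass function of a hurdle model for count data. It assumes a zero-truncated Poisson distribution for the non-zero part, with the non-zero probability mass $1 - \theta$ spread over the positive counts; the function names and the parameter values `theta=0.6` and `lam=2.5` are chosen here for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing the count k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k, theta, lam):
    """Pr(x = k) under a hurdle model with a zero-truncated Poisson part.

    theta is the probability of a zero; lam is the Poisson rate of the
    untruncated distribution (both are illustrative assumptions).
    """
    if k == 0:
        return theta
    # Zero-truncated Poisson: renormalise the Poisson pmf over k >= 1.
    truncated = poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))
    # The non-zero counts share the remaining mass 1 - theta.
    return (1.0 - theta) * truncated


# The probabilities over all counts should sum to (approximately) one.
total = sum(hurdle_poisson_pmf(k, theta=0.6, lam=2.5) for k in range(60))
```

In this sketch the zero probability is set by `theta` alone, and the rate `lam` affects only how the remaining mass is distributed among the positive counts.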

Hurdle models were introduced by John G. Cragg in 1971,{{cite journal |first=John G. |last=Cragg |year=1971 |jstor=1909582 |title=Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods |journal=Econometrica |volume=39 |issue=5 |pages=829–844 |doi=10.2307/1909582 }} where the non-zero values of x were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for the values of x to attain non-zero values, hence the designation hurdle model. Hurdle models were later developed for count data, with Poisson, geometric,{{cite journal |first=John |last=Mullahy |doi=10.1016/0304-4076(86)90002-3 |title=Specification and testing of some modified count data models |journal=Journal of Econometrics |volume=33 |issue=3 |year=1986 |pages=341–365 }} and negative binomial{{cite journal |first1=A. H. |last1=Welsh |first2=R. B. |last2=Cunningham |first3=C. F. |last3=Donnelly |first4=D. B. |last4=Lindenmayer |year=1996 |doi=10.1016/0304-3800(95)00113-1 |title=Modelling the abundance of rare species: statistical models for counts with extra zeros |journal=Ecological Modelling |volume=88 |issue=1–3 |pages=297–308 }} models for the non-zero counts.

Relationship with zero-inflated models

Hurdle models differ from zero-inflated models, which model the zeros using a two-component mixture model. With a mixture model, the probability of the variable being zero is determined by both the main distribution function $p(x = 0)$ and the mixture weight $\pi$. Specifically, a zero-inflated model for a random variable x is

: $\Pr (x = 0) = \pi + (1 - \pi) \times p(x = 0)$

: $\Pr (x = h_i) = (1 - \pi) \times p(x = h_i)$

where $\pi$ is the mixture weight that determines the amount of zero-inflation and $h_i$ denotes a non-zero value. A zero-inflated model can only increase $\Pr (x = 0)$ relative to the main distribution, whereas hurdle models are not subject to this restriction.{{cite journal |first1=Yongyi |last1=Min |first2=Alan |last2=Agresti |year=2005 |doi=10.1191/1471082X05st084oa |title=Random effect models for repeated measures of zero-inflated count data |journal=Statistical Modelling |volume=5 |issue=1 |pages=1–19 |s2cid=2400918 |citeseerx=10.1.1.296.3503 }}
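The difference can be checked numerically. The sketch below assumes a Poisson main distribution with rate `lam = 2.0` and a mixture weight $\pi$ between 0 and 1; the function names and all parameter values are illustrative assumptions rather than part of any particular library.

```python
import math


def zip_zero_prob(pi, lam):
    """Pr(x = 0) for a zero-inflated Poisson: pi + (1 - pi) * p(x = 0)."""
    return pi + (1.0 - pi) * math.exp(-lam)


def hurdle_zero_prob(theta):
    """Pr(x = 0) for a hurdle model is the free parameter theta itself."""
    return theta


lam = 2.0
base = math.exp(-lam)  # zero probability of the plain Poisson, about 0.135

# For any mixture weight in [0, 1], the zero-inflated model gives at
# least as many zeros as the plain Poisson distribution.
zip_probs = [zip_zero_prob(pi, lam) for pi in (0.0, 0.25, 0.5, 1.0)]

# A hurdle model can also place less mass on zero than the plain Poisson
# (zero deflation), simply by choosing theta below that value.
hurdle_deflated = hurdle_zero_prob(0.05)
```

Every entry of `zip_probs` is at least `base`, while `hurdle_deflated` lies below it, which the zero-inflated form cannot reproduce for a mixture weight in that range.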

References

Category:Statistical models
