tobit model

{{Short description|Statistical model for censored regressands}}

In statistics, a tobit model is any of a class of regression models in which the observed range of the dependent variable is censored in some way.{{cite book |first=Fumio |last=Hayashi |author-link=Fumio Hayashi |title=Econometrics |url=https://archive.org/details/econometrics00haya_012 |url-access=limited |location=Princeton |publisher=Princeton University Press |year=2000 |isbn=0-691-01018-8 |pages=[https://archive.org/details/econometrics00haya_012/page/n534 518]–521 }} The term was coined by Arthur Goldberger in reference to James Tobin,{{cite book |first=Arthur S. |last=Goldberger |title=Econometric Theory |url=https://archive.org/details/econometrictheor0000gold |url-access=registration |location=New York |publisher=J. Wiley |year=1964 |pages=[https://archive.org/details/econometrictheor0000gold/page/253 253–55] |isbn=9780471311010 }}{{efn|When asked why it was called the "tobit" model rather than "Tobin", James Tobin explained that the term was introduced by Arthur Goldberger, either as a portmanteau of "Tobin's probit" or as a reference to The Caine Mutiny, a novel by Tobin's friend Herman Wouk, in which Tobin makes a cameo as "Mr Tobit". Tobin reports having asked Goldberger which it was, and Goldberger refused to say. See {{cite journal |first=Robert J. 
|last=Shiller |title=The ET Interview: Professor James Tobin |journal=Econometric Theory |volume=15 |issue=6 |year=1999 |pages=867–900 |doi=10.1017/S0266466699156056 |s2cid=122574727 }} }} who developed the model in 1958 to mitigate the problem of zero-inflated data for observations of household expenditure on durable goods.{{Cite journal |last=Tobin |first=James |year=1958 |title=Estimation of Relationships for Limited Dependent Variables |journal=Econometrica |volume=26 |issue=1 |pages=24–36 |jstor=1907382 |doi=10.2307/1907382 |url=http://cowles.yale.edu/sites/default/files/files/pub/d00/d0003-r.pdf }}{{efn|An almost identical model was independently suggested by Anders Hald in 1949, see {{cite journal |first=A. |last=Hald |year=1949 |title=Maximum Likelihood Estimation of the Parameters of a Normal Distribution which is Truncated at a Known Point |journal=Scandinavian Actuarial Journal |volume=49 |issue=4 |pages=119–134 |doi=10.1080/03461238.1949.10419767 }} }} Because Tobin's method can be easily extended to handle truncated and other non-randomly selected samples,{{efn|A sample (y_{i}, \mathbf{x}_{i}) is {{em|censored}} in y_{i} when \mathbf{x}_{i} is observed for all observations i = 1, 2, \ldots, n, but the true value of y_{i} is known only for a restricted range of observations. If the sample is {{em|truncated}}, both \mathbf{x}_{i} and y_{i} are only observed if y_{i} falls in the restricted range. See {{cite book |first=Richard |last=Breen |title=Regression Models : Censored, Samples Selected, or Truncated Data |location=Thousand Oaks |publisher=Sage |year=1996 |isbn=0-8039-5710-6 |pages=2–4 |url=https://books.google.com/books?id=btrvKnZSqIIC&pg=PA4 }} }} some authors adopt a broader definition of the tobit model that includes these cases.{{Cite journal |last=Amemiya |first=Takeshi |author-link=Takeshi Amemiya |year=1984 |title=Tobit Models: A Survey |journal=Journal of Econometrics |volume=24 |issue=1–2 |pages=3–61 |doi=10.1016/0304-4076(84)90074-5 }}

Tobin's idea was to modify the likelihood function so that it reflects the unequal sampling probability for each observation depending on whether the latent dependent variable fell above or below the determined threshold.{{cite book |first=Peter |last=Kennedy |author-link=Peter Kennedy (economist) |title=A Guide to Econometrics |location=Cambridge |publisher=MIT Press |edition=Fifth |year=2003 |isbn=0-262-61183-X |pages=283–284 }} For a sample that, as in Tobin's original case, was censored from below at zero, the sampling probability for each non-limit observation is simply the height of the appropriate density function. For any limit observation, it is the cumulative distribution, i.e. the integral below zero of the appropriate density function. The tobit likelihood function is thus a mixture of densities and cumulative distribution functions.{{cite book |first=Herman J. |last=Bierens |title=Introduction to the Mathematical and Statistical Foundations of Econometrics |url=https://archive.org/details/introductiontoma00bier_187 |url-access=limited |publisher=Cambridge University Press |year=2004 |page=[https://archive.org/details/introductiontoma00bier_187/page/n229 207] }}

The likelihood function

Below are the likelihood and log-likelihood functions for a type I tobit, that is, a tobit censored from below at y_L, so that the observed y_j equals y_L whenever the latent variable y_j^* \leq y_L . To write out the likelihood function, first define an indicator function I :

: I(y) = \begin{cases}

0 & \text{if } y \leq y_L, \\

1 & \text{if } y > y_L.

\end{cases}

Next, let \Phi be the standard normal cumulative distribution function and \varphi the standard normal probability density function. For a data set with N observations, the likelihood function for a type I tobit is

: \mathcal{L}(\beta, \sigma) = \prod _{j=1}^N \left(\frac{1}{\sigma}\varphi \left(\frac{y_j-X_j\beta }{\sigma} \right)\right)^{I(y_j)} \left(1-\Phi \left(\frac{X_j\beta-y_L}{\sigma}\right)\right)^{1-I(y_j)}

and the log likelihood is given by

:\begin{align}

\log \mathcal{L}(\beta, \sigma) &= \sum^N_{j = 1} I(y_j) \log \left( \frac{1}{\sigma} \varphi\left( \frac{y_j - X_j\beta}{\sigma} \right) \right) + (1 - I(y_j)) \log\left( 1- \Phi\left( \frac{X_j \beta - y_L}{\sigma} \right) \right) \\

&= \sum_{y_j>y_L} \log \left( \frac{1}{\sigma} \varphi\left( \frac{y_j - X_j\beta}{\sigma} \right) \right) + \sum_{y_j=y_L} \log\left( \Phi\left( \frac{ y_L - X_j \beta}{\sigma} \right) \right)

\end{align}
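The second form of the log-likelihood can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name tobit_loglik, the list-of-rows data layout and the toy data are assumptions made for illustration, not part of any particular software package.

```python
import math

def norm_pdf(z):
    """Standard normal density (varphi in the text)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function (Phi in the text)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, xs, ys, y_L=0.0):
    """Type I tobit log-likelihood for data censored from below at y_L.

    beta  : list of coefficients
    sigma : positive scale parameter
    xs    : list of regressor rows X_j (each the same length as beta)
    ys    : observed responses y_j (equal to y_L when censored)
    """
    total = 0.0
    for x_row, y in zip(xs, ys):
        xb = sum(b * x for b, x in zip(beta, x_row))
        if y > y_L:
            # Non-limit observation: log of the scaled normal density.
            total += math.log(norm_pdf((y - xb) / sigma) / sigma)
        else:
            # Limit observation: log of the probability that y* <= y_L.
            total += math.log(norm_cdf((y_L - xb) / sigma))
    return total

# Hypothetical toy data: a constant plus one regressor, censored at zero.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, 2.0]]
ys = [0.0, 1.2, 0.0, 2.3]
ll = tobit_loglik([0.1, 1.0], 1.0, xs, ys)
```

In practice this function would be passed (with its sign flipped) to a numerical optimizer to obtain the maximum likelihood estimates of \beta and \sigma.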

=Reparametrization=

The log-likelihood as stated above is not globally concave, which complicates the maximum likelihood estimation. Olsen suggested the simple reparametrization \beta = \delta/\gamma and \sigma^2 = \gamma^{-2}, resulting in a transformed log-likelihood,

:\log \mathcal{L}(\delta, \gamma) = \sum_{y_j>y_L} \left\{ \log \gamma + \log \left[ \varphi\left( \gamma y_j - X_j \delta \right) \right] \right\} + \sum_{y_j=y_L} \log\left[ \Phi\left( \gamma y_L - X_j \delta \right) \right]

which is globally concave in terms of the transformed parameters.{{cite journal |first=Randall J. |last=Olsen |title=Note on the Uniqueness of the Maximum Likelihood Estimator for the Tobit Model |journal=Econometrica |volume=46 |issue=5 |year=1978 |pages=1211–1215 |doi=10.2307/1911445 |jstor=1911445 }}
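As a numerical check of the reparametrization, the sketch below evaluates the log-likelihood in both parametrizations and confirms that they agree when \delta = \beta/\sigma and \gamma = 1/\sigma. It assumes a single regressor without intercept for brevity; the function names and the toy data are illustrative only.

```python
import math

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik_original(beta, sigma, xs, ys, y_L=0.0):
    """Type I tobit log-likelihood in the (beta, sigma) parametrization."""
    total = 0.0
    for x, y in zip(xs, ys):
        if y > y_L:
            total += -math.log(sigma) + norm_logpdf((y - x * beta) / sigma)
        else:
            total += math.log(norm_cdf((y_L - x * beta) / sigma))
    return total

def loglik_olsen(delta, gamma, xs, ys, y_L=0.0):
    """The same log-likelihood in Olsen's (delta, gamma) parametrization."""
    total = 0.0
    for x, y in zip(xs, ys):
        if y > y_L:
            total += math.log(gamma) + norm_logpdf(gamma * y - x * delta)
        else:
            total += math.log(norm_cdf(gamma * y_L - x * delta))
    return total

# Hypothetical toy data, censored from below at zero.
xs = [0.5, 1.5, -1.0, 2.0]
ys = [0.0, 1.2, 0.0, 2.3]

beta, sigma = 0.8, 1.3
gamma = 1.0 / sigma        # sigma^2 = gamma^(-2)
delta = beta * gamma       # beta = delta / gamma

ll_original = loglik_original(beta, sigma, xs, ys)
ll_olsen = loglik_olsen(delta, gamma, xs, ys)
```

The two values coincide up to floating-point error; only the shape of the surface over the parameter space differs, which is what makes the transformed problem easier to maximize.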

For the truncated (tobit II) model, Orme showed that while the log-likelihood is not globally concave, it is concave at any stationary point under the above transformation.{{cite journal |first=Chris |last=Orme |title=On the Uniqueness of the Maximum Likelihood Estimator in Truncated Regression Models |journal=Econometric Reviews |volume=8 |year=1989 |issue=2 |pages=217–222 |doi=10.1080/07474938908800171 }}{{cite journal |first=Shigeru |last=Iwata |title=A Note on Multiple Roots of the Tobit Log Likelihood |journal=Journal of Econometrics |volume=56 |issue=3 |year=1993 |pages=441–445 |doi=10.1016/0304-4076(93)90129-S }}

=Consistency=

If the relationship parameter \beta is estimated by regressing the observed y_i on x_i , the resulting ordinary least squares regression estimator is inconsistent. It will yield a downwards-biased estimate of the slope coefficient and an upward-biased estimate of the intercept. Takeshi Amemiya (1973) has proven that the maximum likelihood estimator suggested by Tobin for this model is consistent.{{Cite journal |last=Amemiya |first=Takeshi |year=1973 |title=Regression analysis when the dependent variable is truncated normal |journal=Econometrica |volume=41 |issue=6 |pages=997–1016 |jstor=1914031 |doi=10.2307/1914031 }}
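The direction of the least squares bias can be illustrated with a small simulation. The sketch below is an illustration under assumed settings (true intercept 0, true slope 1, standard normal regressor and error, censoring from below at zero, a fixed random seed); none of these choices come from the cited literature.

```python
import random

def ols_fit(xs, ys):
    """Simple least squares fit of y on a constant and x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20000
true_intercept, true_slope = 0.0, 1.0

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
latent = [true_intercept + true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
observed = [max(0.0, y_star) for y_star in latent]   # censored from below at zero

# Regression on the (normally unobservable) latent variable recovers the truth.
a_latent, b_latent = ols_fit(xs, latent)
# Regression on the censored variable does not.
a_censored, b_censored = ols_fit(xs, observed)
```

In this setup the slope estimated from the censored data falls well below the true value of 1 and the intercept rises above the true value of 0, and increasing n does not remove the gap.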

=Interpretation=

The \beta coefficient should not be interpreted as the effect of x_i on y_i, as one would with a linear regression model; this is a common error. Instead, it should be interpreted as the combination of

(1) the change in y_i of those above the limit, weighted by the probability of being above the limit; and

(2) the change in the probability of being above the limit, weighted by the expected value of y_i if above.{{Cite journal

|last1=McDonald |first1=John F.

|last2=Moffitt |first2=Robert A.

|year=1980

|title=The Uses of Tobit Analysis

|journal=The Review of Economics and Statistics

|volume=62 |issue=2

|pages=318–321

|jstor=1924766

|doi= 10.2307/1924766

}}
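The two components can be computed explicitly for a tobit censored from below at zero with normal errors. The following sketch assumes the standard closed-form expressions for the normal case; the function name mcdonald_moffitt and the input values are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mcdonald_moffitt(xb, beta_k, sigma):
    """Decompose the marginal effect of x_k on E[y | x], censoring at zero.

    xb     : value of the linear index x'beta at the point of evaluation
    beta_k : coefficient on the regressor of interest
    sigma  : scale parameter
    Returns (part1, part2, total).
    """
    z = xb / sigma
    prob_above = norm_cdf(z)                  # P(y > 0 | x)
    lam = norm_pdf(z) / prob_above            # inverse Mills ratio
    mean_above = xb + sigma * lam             # E[y | y > 0, x]

    d_mean_above = beta_k * (1.0 - z * lam - lam ** 2)   # dE[y | y > 0]/dx_k
    d_prob_above = beta_k * norm_pdf(z) / sigma          # dP(y > 0)/dx_k

    part1 = prob_above * d_mean_above   # (1) change above the limit, weighted by probability
    part2 = mean_above * d_prob_above   # (2) change in probability, weighted by expected value
    total = beta_k * prob_above         # overall effect on E[y | x]
    return part1, part2, total

part1, part2, total = mcdonald_moffitt(xb=0.5, beta_k=2.0, sigma=1.5)
```

The two parts sum to \beta_k \Phi(x\beta/\sigma), which is smaller in magnitude than the coefficient itself whenever some probability mass lies at the limit.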

Variations of the tobit model

Variations of the tobit model can be produced by changing where and when censoring occurs. {{harvtxt|Amemiya|1985|loc=p. 384}} classifies these variations into five categories (tobit type I – tobit type V), where tobit type I stands for the first model described above. Schnedler (2005) provides a general formula to obtain consistent likelihood estimators for these and other variations of the tobit model.{{Cite journal |last=Schnedler |first=Wendelin |year=2005 |title=Likelihood estimation for censored random vectors |journal=Econometric Reviews |volume=24 |issue=2 |pages=195–217 |url= http://www.uni-heidelberg.de/md/awi/forschung/dp417.pdf|doi=10.1081/ETC-200067925 |hdl=10419/127228 |s2cid=55747319 }}

=Type I=

The tobit model is a special case of a censored regression model, because the latent variable y_i^* cannot always be observed while the independent variable x_i is observable. A common variation of the tobit model is censoring at a value y_L different from zero:

: y_i = \begin{cases}

y_i^* & \text{if } y_i^* >y_L, \\

y_L & \text{if } y_i^* \leq y_L.

\end{cases}

Another example is censoring of values above y_U.

: y_i = \begin{cases}

y_i^* & \text{if } y_i^* < y_U, \\

y_U & \text{if } y_i^* \geq y_U.

\end{cases}

Yet another model results when y_i is censored from above and below at the same time.

: y_i = \begin{cases}

y_i^* & \text{if } y_L < y_i^* < y_U, \\

y_L & \text{if } y_i^* \leq y_L, \\

y_U & \text{if } y_i^* \geq y_U.

\end{cases}
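The three censoring rules for Type I can be written as a single function. This is a minimal sketch; the helper name censor and the sample values are illustrative assumptions.

```python
import math

def censor(y_star, y_L=-math.inf, y_U=math.inf):
    """Map a latent value y* to the observed value y.

    y_L : lower censoring point (default: no censoring from below)
    y_U : upper censoring point (default: no censoring from above)
    """
    if y_star <= y_L:
        return y_L
    if y_star >= y_U:
        return y_U
    return y_star

latent = [-1.5, 0.2, 3.7, 10.0]

from_below = [censor(v, y_L=0.0) for v in latent]       # [0.0, 0.2, 3.7, 10.0]
from_above = [censor(v, y_U=5.0) for v in latent]       # [-1.5, 0.2, 3.7, 5.0]
two_sided = [censor(v, y_L=0.0, y_U=5.0) for v in latent]  # [0.0, 0.2, 3.7, 5.0]
```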

The rest of the models will be presented as being bounded from below at 0, though this can be generalized as done for Type I.

=Type II=

Type II tobit models introduce a second latent variable.{{cite book |last=Amemiya |first=Takeshi |title=Advanced econometrics |publisher=Harvard University Press |location=Cambridge, Mass |year=1985 |isbn=0-674-00560-0 |oclc=11728277 |page=[https://archive.org/details/advancedeconomet00amem/page/384 384] |url-access=registration |chapter=Tobit Models |url=https://archive.org/details/advancedeconomet00amem}}

: y_{2i} = \begin{cases}

y_{2i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

In Type I tobit, the latent variable absorbs both the process of participation and the outcome of interest. Type II tobit allows the process of participation (selection) and the outcome of interest to be independent, conditional on observable data.

The Heckman selection model falls into the Type II tobit,{{cite journal|last1=Heckman|first1=James J.|title=Sample Selection Bias as a Specification Error|journal=Econometrica|volume=47|issue=1|year=1979|pages=153–161|issn=0012-9682|doi=10.2307/1912352|jstor=1912352}} which is sometimes called Heckit after James Heckman.{{cite journal|last1=Sigelman|first1=Lee|last2=Zeng|first2=Langche|title=Analyzing Censored and Sample-Selected Data with Tobit and Heckit Models|journal=Political Analysis|volume=8|issue=2|year=1999|pages=167–182|issn=1047-1987|doi=10.1093/oxfordjournals.pan.a029811|jstor=25791605}}
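The Type II observation rule can be sketched as follows. The function name observe_type2 is hypothetical, and returning a participation indicator alongside the outcome is an assumption made here to show that only the sign of the first latent variable enters the observed data.

```python
def observe_type2(y1_star, y2_star):
    """Observed data for one unit under the Type II rule.

    y1_star : latent selection (participation) variable
    y2_star : latent outcome variable
    Returns (d, y2), where d = 1 if y1* > 0 and 0 otherwise.
    """
    if y1_star > 0:
        return 1, y2_star   # outcome observed
    return 0, 0.0           # outcome recorded as zero

# Hypothetical latent pairs (y1*, y2*).
latent_pairs = [(0.8, 3.1), (-0.4, 2.0), (0.0, 5.0)]
observed = [observe_type2(y1, y2) for y1, y2 in latent_pairs]
# observed == [(1, 3.1), (0, 0.0), (0, 0.0)]
```

Unlike in Type I, a large latent outcome (such as 5.0 in the last pair) can go unobserved, because selection is driven by a separate latent variable.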

=Type III=

Type III introduces a second observed dependent variable.

: y_{1i} = \begin{cases}

y_{1i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

: y_{2i} = \begin{cases}

y_{2i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

The Heckman model falls into this type.

=Type IV=

Type IV introduces a third observed dependent variable and a third latent variable.

: y_{1i} = \begin{cases}

y_{1i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

: y_{2i} = \begin{cases}

y_{2i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

: y_{3i} = \begin{cases}

y_{3i}^* & \text{if } y_{1i}^* \leq 0, \\

0 & \text{if } y_{1i}^* > 0.

\end{cases}

=Type V=

Similar to Type II, in Type V only the sign of y_{1i}^* is observed.

: y_{2i} = \begin{cases}

y_{2i}^* & \text{if } y_{1i}^* >0, \\

0 & \text{if } y_{1i}^* \leq 0.

\end{cases}

: y_{3i} = \begin{cases}

y_{3i}^* & \text{if } y_{1i}^* \leq 0, \\

0 & \text{if } y_{1i}^* > 0.

\end{cases}
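The Type V observation rule, in which the sign of the first latent variable determines which of the two outcomes is observed, can be sketched as follows. The function name observe_type5 and the sample values are illustrative assumptions.

```python
def observe_type5(y1_star, y2_star, y3_star):
    """Observed data for one unit under the Type V rule.

    Only the sign of y1* is observed; it decides whether y2* or y3*
    is observed, and the other outcome is recorded as zero.
    Returns (d, y2, y3), where d = 1 if y1* > 0 and 0 otherwise.
    """
    if y1_star > 0:
        return 1, y2_star, 0.0
    return 0, 0.0, y3_star

# Hypothetical latent triples (y1*, y2*, y3*).
latent_triples = [(1.2, 4.0, 7.0), (-0.3, 4.0, 7.0)]
observed = [observe_type5(*t) for t in latent_triples]
# observed == [(1, 4.0, 0.0), (0, 0.0, 7.0)]
```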

=Non-parametric version=

If the underlying latent variable y_i^* is not normally distributed, one must use quantiles instead of moments to analyze the observable variable y_i. Powell's censored least absolute deviations (CLAD) estimator offers a possible way to achieve this.{{cite journal|last1=Powell|first1=James L|title=Least absolute deviations estimation for the censored regression model|journal=Journal of Econometrics|date=1 July 1984|volume=25|issue=3|pages=303–325|doi=10.1016/0304-4076(84)90004-6 |citeseerx=10.1.1.461.4302}}
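The idea behind CLAD is to minimize the sum of absolute deviations between y_i and \max(0, x_i\beta), which targets the conditional median rather than the conditional mean. The following sketch illustrates this with a single regressor, no intercept and a crude grid search; the simulated design (Laplace errors, true slope 1, fixed seed) and the grid are assumptions for illustration, and practical implementations use more careful optimization.

```python
import random

def clad_objective(b, xs, ys):
    """Sum of absolute deviations from the censored linear predictor max(0, x*b)."""
    return sum(abs(y - max(0.0, b * x)) for x, y in zip(xs, ys))

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 2000
true_slope = 1.0

xs = [rng.uniform(0.5, 2.0) for _ in range(n)]
# Laplace-distributed errors (non-normal, median zero), generated as the
# difference of two independent exponential draws.
errors = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
latent = [true_slope * x + e for x, e in zip(xs, errors)]
ys = [max(0.0, y_star) for y_star in latent]   # censored from below at zero

# Crude grid search over candidate slopes between 0 and 3.
grid = [i / 100.0 for i in range(0, 301)]
b_hat = min(grid, key=lambda b: clad_objective(b, xs, ys))
```

The objective is not convex in b, which is one reason a simple grid is used here instead of a gradient-based method.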

Applications

Tobit models have been applied, for example, to estimate the factors that affect grant receipt, including financial transfers distributed to sub-national governments that may apply for these grants. In these cases, grant recipients cannot receive negative amounts, so the data are left-censored. For instance, Dahlberg and Johansson (2002) analyse a sample of 115 municipalities (42 of which received a grant).{{Cite journal|last1=Dahlberg|first1=Matz|last2=Johansson|first2=Eva|date=2002-03-01|title=On the Vote-Purchasing Behavior of Incumbent Governments|journal=American Political Science Review|volume=96|issue=1|pages=27–40|doi=10.1017/S0003055402004215|issn=1537-5943|citeseerx=10.1.1.198.4112|s2cid=12718473}} Dubois and Fattore (2011) use a tobit model to investigate the role of various factors in European Union fund receipt by Polish sub-national governments that applied for funding.{{Cite journal|last1=Dubois|first1=Hans F. W.|last2=Fattore|first2=Giovanni|date=2011-07-01|title=Public Fund Assignment through Project Evaluation|journal=Regional & Federal Studies|volume=21|issue=3|pages=355–374|doi=10.1080/13597566.2011.578827|s2cid=154659642|issn=1359-7566}} The data may, however, be left-censored at a point higher than zero, which carries a risk of misspecification. Both studies apply probit and other models to check for robustness. Tobit models have also been applied in demand analysis to accommodate observations with zero expenditures on some goods. In a related application, a system of nonlinear tobit regression models has been used to jointly estimate a brand demand system with homoscedastic, heteroscedastic and generalized heteroscedastic variants.{{Cite journal|last=Baltas|first=George|date=2001|title=Utility-consistent Brand Demand Systems with Endogenous Category Consumption: Principles and Marketing Applications|journal=Decision Sciences|language=en|volume=32|issue=3|pages=399–422|doi=10.1111/j.1540-5915.2001.tb00965.x|issn=0011-7315}}

Notes

{{Notelist}}

References

{{Reflist}}

Further reading

  • {{cite book|ref=none |last=Amemiya |first=Takeshi |chapter=Tobit Models |title=Advanced Econometrics |year=1985 |publisher=Basil Blackwell |location=Oxford |isbn=0-631-13345-3 |pages=360–411 |chapter-url=https://books.google.com/books?id=0bzGQE14CwEC&pg=PA360 }}
  • {{cite book|ref=none |first=Richard |last=Breen |title=Regression Models : Censored, Samples Selected, or Truncated Data |location=Thousand Oaks |publisher=Sage |year=1996 |isbn=0-8039-5710-6 |chapter=The Tobit Model for Censored Data |pages=12–33 }}
  • {{cite book|ref=none |first=Christian |last=Gouriéroux |author-link=Christian Gouriéroux |chapter=The Tobit Model |title=Econometrics of Qualitative Dependent Variables |location=New York |publisher=Cambridge University Press |year=2000 |isbn=0-521-58985-1 |pages=170–207 |chapter-url=https://books.google.com/books?id=dE2prs_U0QMC&pg=PA170 }}
  • {{cite book|ref=none |last=King |first=Gary |author-link=Gary King (political scientist) |chapter=Models with Nonrandom Selection |title=Unifying Political Methodology : the Likelihood Theory of Statistical Inference |publisher=Cambridge University Press |year=1989 |isbn=0-521-36697-6 |pages=208–230 |chapter-url=https://books.google.com/books?id=cligOwrd7XoC&pg=PA208 }}
  • {{cite book|ref=none |first=G. S. |last=Maddala |author-link=G. S. Maddala |title=Limited-Dependent and Qualitative Variables in Econometrics |url=https://archive.org/details/limiteddependent00madd |url-access=limited |location=New York |publisher=Cambridge University Press |year=1983 |isbn=0-521-24143-X |chapter=Censored and Truncated Regression Models |pages=[https://archive.org/details/limiteddependent00madd/page/n82 149]–196 }}

{{Economics}}

Category:Regression models

Category:Single-equation methods (econometrics)