Endogeneity (econometrics)
{{Short description|Concept in econometrics}}
{{for multi|the concept in economic theory|Exogenous and endogenous variables|other uses|Endogeneity (disambiguation)}}
In econometrics, endogeneity broadly refers to situations in which an explanatory variable is correlated with the error term.{{cite book |last=Wooldridge |first=Jeffrey M. |title=Introductory Econometrics: A Modern Approach |location=Australia |publisher=South-Western |year=2009 |edition=Fourth |pages=88 |isbn=978-0-324-66054-8 }} The distinction between endogenous and exogenous variables originated in simultaneous equations models, where one separates variables whose values are determined by the model from variables which are predetermined.{{efn|For example, in a simple supply and demand model, when predicting the quantity demanded in equilibrium, the price is endogenous because producers change their price in response to demand and consumers change their demand in response to price. In this case, the price variable is said to have total endogeneity once the demand and supply curves are known. In contrast, a change in consumer tastes or preferences would be an exogenous change on the demand curve.}}{{cite book |first=Jan |last=Kmenta |authorlink=Jan Kmenta |title=Elements of Econometrics |location=New York |publisher=MacMillan |edition=Second |year=1986 |isbn=0-02-365070-2 |pages=[https://archive.org/details/elementsofeconom0003kmen/page/652 652–53] |url=https://archive.org/details/elementsofeconom0003kmen/page/652 }} Ignoring simultaneity in the estimation leads to biased estimates as it violates the exogeneity assumption of the Gauss–Markov theorem. 
The problem of endogeneity is often ignored by researchers conducting non-experimental research, and ignoring it precludes making policy recommendations.{{Cite journal|last=Antonakis|first=John|last2=Bendahan|first2=Samuel|last3=Jacquart|first3=Philippe|last4=Lalive|first4=Rafael|date=December 2010|title=On making causal claims: A review and recommendations|journal=The Leadership Quarterly|volume=21|issue=6|pages=1086–1120|doi=10.1016/j.leaqua.2010.10.010|issn=1048-9843|url=https://serval.unil.ch/resource/serval:BIB_12A79F6E956F.P001/REF.pdf}} Instrumental variable techniques are commonly used to mitigate this problem.
Besides simultaneity, correlation between explanatory variables and the error term can arise when an unobserved or omitted variable is confounding both independent and dependent variables, or when independent variables are measured with error.{{cite book |first=John |last=Johnston |authorlink=John Johnston (econometrician) |title=Econometric Methods |location=New York |publisher=McGraw-Hill |edition=Second |year=1972 |isbn=0-07-032679-7 |pages=[https://archive.org/details/econometricmetho0000john_t7q9/page/267 267–291] |url=https://archive.org/details/econometricmetho0000john_t7q9/page/267 }}
Exogeneity versus endogeneity
In a stochastic model, the notions of ordinary exogeneity, sequential exogeneity, and strong/strict exogeneity can be defined. Exogeneity is always stated relative to a parameter: a variable or set of variables is exogenous for a parameter <math>\alpha</math>. Even if a variable is exogenous for parameter <math>\alpha</math>, it might be endogenous for parameter <math>\beta</math>.
When the explanatory variables are not stochastic, they are strongly exogenous for all the parameters.
If the independent variable is correlated with the error term in a regression model, then the estimate of the regression coefficient in an ordinary least squares (OLS) regression is biased; however, if the correlation is not contemporaneous, the coefficient estimate may still be consistent. There are many methods of correcting the bias, including instrumental variable regression and Heckman selection correction.
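As a rough illustration of the instrumental variable idea mentioned above, the following sketch simulates a regressor that is correlated with the error term and compares the OLS slope with a simple instrumental variable estimate. The data-generating process, the instrument name <code>w</code>, the function name and all numerical values are assumptions made for this example only; they are not taken from the cited sources.

```python
import numpy as np


def simulate_iv(n=100_000, beta=2.0, seed=0):
    """Compare OLS and a simple IV estimate of beta in y = beta * x + u.

    Hypothetical setup: w is an instrument (it moves x but is independent
    of u), while x also depends on u and is therefore endogenous.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)            # instrument, independent of u
    u = rng.normal(size=n)            # structural error term
    x = w + u + rng.normal(size=n)    # endogenous regressor: Cov(x, u) != 0
    y = beta * x + u

    # Center the data so that slopes can be computed without an intercept.
    xc, yc, wc = x - x.mean(), y - y.mean(), w - w.mean()

    beta_ols = (xc @ yc) / (xc @ xc)  # Cov(x, y) / Var(x)
    beta_iv = (wc @ yc) / (wc @ xc)   # Cov(w, y) / Cov(w, x)
    return beta_ols, beta_iv


if __name__ == "__main__":
    b_ols, b_iv = simulate_iv()
    print(f"OLS slope: {b_ols:.3f}, IV slope: {b_iv:.3f}")
```

Under these assumptions the OLS slope converges to <math>\beta + \operatorname{Cov}(x,u)/\operatorname{Var}(x) = 2 + 1/3</math> rather than to <math>\beta = 2</math>, while the instrumental variable ratio stays close to the true value in large samples.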
= Static models =
The following are some common sources of endogeneity.
== Omitted variable ==
{{further|Omitted-variable bias}}
In this case, the endogeneity comes from an uncontrolled confounding variable: a variable that is correlated with both the independent variable in the model and the error term. (Equivalently, the omitted variable affects the independent variable and separately affects the dependent variable.)
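Assume that the "true" model to be estimated is
:<math>y_i = \alpha + \beta x_i + \gamma z_i + u_i</math>
but <math>z_i</math> is omitted from the regression model (perhaps because there is no way to measure it directly).
Then the model that is actually estimated is
:<math>y_i = \alpha + \beta x_i + \varepsilon_i</math>
where <math>\varepsilon_i = \gamma z_i + u_i</math> (thus, the <math>z_i</math> term has been absorbed into the error term).
If the correlation of <math>x</math> and <math>z</math> is not 0 and <math>z</math> separately affects <math>y</math> (meaning <math>\gamma \neq 0</math>), then <math>x</math> is correlated with the error term <math>\varepsilon</math>.
Here, <math>x</math> is not exogenous for <math>\alpha</math> and <math>\beta</math>, since, given <math>x</math>, the distribution of <math>y</math> depends not only on <math>\alpha</math> and <math>\beta</math>, but also on <math>z</math> and <math>\gamma</math>.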
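The omitted-variable argument above can be checked with a small simulation. This is a minimal sketch under assumed values: the parameter values, the mixing weight <code>rho</code> linking the regressor to the omitted variable, and the function name are hypothetical choices for illustration. It fits the "short" regression that omits the confounder and the "long" regression that includes it.

```python
import numpy as np


def simulate_omitted_variable(n=100_000, alpha=1.0, beta=2.0, gamma=3.0,
                              rho=0.5, seed=0):
    """Return the OLS slope on x with z omitted and with z included."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # confounder, omitted in the short model
    x = rho * z + rng.normal(size=n)       # x is correlated with z
    u = rng.normal(size=n)                 # error of the true model
    y = alpha + beta * x + gamma * z + u   # the "true" model

    ones = np.ones(n)
    X_short = np.column_stack([ones, x])     # z absorbed into the error term
    X_long = np.column_stack([ones, x, z])   # z controlled for

    b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
    b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]
    return b_short[1], b_long[1]


if __name__ == "__main__":
    short_slope, long_slope = simulate_omitted_variable()
    print(f"slope without z: {short_slope:.3f}, slope with z: {long_slope:.3f}")
```

With these assumed values, the short regression slope converges to <math>\beta + \gamma \operatorname{Cov}(x,z)/\operatorname{Var}(x) = 2 + 3 \cdot 0.5/1.25 = 3.2</math>, whereas the long regression recovers a slope near 2.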
== Measurement error ==
Suppose that a perfect measure of an independent variable is impossible. That is, instead of observing <math>x_i^*</math>, what is actually observed is <math>x_i = x_i^* + \nu_i</math>, where <math>\nu_i</math> is the measurement error or "noise". In this case, a model given by
:<math>y_i = \alpha + \beta x_i^* + \varepsilon_i</math>
can be written in terms of observables and error terms as
:<math>
\begin{align}
y_i & = \alpha+\beta(x_i-\nu_i) + \varepsilon_i \\[3pt]
y_i & = \alpha+\beta x_i +(\varepsilon_i - \beta\nu_i) \\[3pt]
y_i & = \alpha+\beta x_i +u_i \quad (\text{where } u_i=\varepsilon_i - \beta\nu_i)
\end{align}
</math>
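Since both <math>x_i</math> and <math>u_i</math> depend on <math>\nu_i</math>, they are correlated, so the OLS estimate of <math>\beta</math> will be biased. Under classical measurement error, where <math>\nu_i</math> is uncorrelated with <math>x_i^*</math> and <math>\varepsilon_i</math>, the bias is toward zero (attenuation bias).
Measurement error in the dependent variable, <math>y_i</math>, does not cause endogeneity, though it does increase the variance of the error term.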
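The effect of measurement error in a regressor can be illustrated numerically. The sketch below assumes classical measurement error (noise independent of the true regressor and of the model error) with unit variances; the function name and all numerical values are hypothetical and chosen only to make the attenuation easy to see.

```python
import numpy as np


def simulate_measurement_error(n=100_000, alpha=1.0, beta=2.0,
                               noise_sd=1.0, seed=0):
    """Return OLS slopes using the true regressor and the noisy one."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=n)                # true, unobserved regressor
    nu = rng.normal(scale=noise_sd, size=n)    # measurement error
    x_obs = x_star + nu                        # what is actually observed
    eps = rng.normal(size=n)
    y = alpha + beta * x_star + eps

    def ols_slope(x):
        # Slope from a regression of y on a constant and x.
        X = np.column_stack([np.ones(n), x])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    return ols_slope(x_star), ols_slope(x_obs)


if __name__ == "__main__":
    slope_true, slope_noisy = simulate_measurement_error()
    print(f"slope with x*: {slope_true:.3f}, slope with noisy x: {slope_noisy:.3f}")
```

Under these assumptions the slope on the noisy regressor converges to <math>\beta \operatorname{Var}(x^*)/(\operatorname{Var}(x^*)+\operatorname{Var}(\nu)) = 2 \cdot 1/2 = 1</math>, half of the true coefficient, while the regression on the true regressor stays near 2.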
== Simultaneity ==
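Suppose that two variables are codetermined, with each affecting the other according to the following "structural" equations:
:<math>y_i = \beta_1 x_i + \gamma_1 z_i + u_i</math>
:<math>z_i = \beta_2 x_i + \gamma_2 y_i + v_i</math>
Estimating either equation by itself results in endogeneity. In the case of the first structural equation, <math>E(z_i u_i) \neq 0</math>. Solving for <math>z_i</math> while assuming that <math>1 - \gamma_1 \gamma_2 \neq 0</math> results in
:<math>z_i = \frac{\beta_2 + \gamma_2 \beta_1}{1 - \gamma_1 \gamma_2} x_i + \frac{1}{1 - \gamma_1 \gamma_2} v_i + \frac{\gamma_2}{1 - \gamma_1 \gamma_2} u_i</math>.
Assuming that <math>x_i</math> and <math>v_i</math> are uncorrelated with <math>u_i</math>,
:<math>E(z_i u_i) = \frac{\gamma_2}{1 - \gamma_1 \gamma_2} E(u_i u_i) \neq 0</math>.
Therefore, attempts at estimating either structural equation will be hampered by endogeneity.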
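The simultaneity result can also be checked by simulation. The following sketch generates data from the reduced form of the two structural equations, using assumed coefficient values and independent standard normal errors (all hypothetical), and then estimates the first structural equation by OLS.

```python
import numpy as np


def simulate_simultaneity(n=200_000, beta1=1.0, gamma1=0.5,
                          beta2=1.0, gamma2=0.5, seed=0):
    """Return the sample mean of z*u and the OLS estimate of gamma1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)    # exogenous variable
    u = rng.normal(size=n)    # error of the first structural equation
    v = rng.normal(size=n)    # error of the second structural equation

    d = 1.0 - gamma1 * gamma2           # must be nonzero
    # Reduced form for z, obtained by substituting the first equation
    # into the second and solving for z.
    z = ((beta2 + gamma2 * beta1) * x + v + gamma2 * u) / d
    y = beta1 * x + gamma1 * z + u      # first structural equation

    mean_zu = np.mean(z * u)            # estimates E(z_i u_i)

    # OLS on the first structural equation, ignoring simultaneity.
    X = np.column_stack([x, z])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return mean_zu, coef[1]


if __name__ == "__main__":
    mean_zu, gamma1_hat = simulate_simultaneity()
    print(f"mean of z*u: {mean_zu:.3f}, OLS estimate of gamma1: {gamma1_hat:.3f}")
```

With these assumed values, <math>E(z_i u_i) = \gamma_2/(1-\gamma_1\gamma_2) \cdot E(u_i u_i) = 0.5/0.75 \approx 0.67</math>, and the OLS estimate of <math>\gamma_1</math> converges to about 0.8 rather than the true 0.5.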
= Dynamic models =
The endogeneity problem is particularly relevant in the context of time series analysis of causal processes. It is common for some factors within a causal system to be dependent for their value in period t on the values of other factors in the causal system in period t − 1. Suppose that the level of pest infestation is independent of all other factors within a given period, but is influenced by the level of rainfall and fertilizer in the preceding period. In this instance it would be correct to say that infestation is exogenous within the period, but endogenous over time.
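Let the model be <math>y = f(x, z) + u</math>. If the variable <math>x</math> is sequentially exogenous for parameter <math>\alpha</math>, and <math>y</math> does not cause <math>x</math> in the Granger sense, then the variable <math>x</math> is strongly/strictly exogenous for the parameter <math>\alpha</math>.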
== Simultaneity ==
Generally speaking, simultaneity occurs in the dynamic model just like in the example of static simultaneity above.
Footnotes
{{notelist}}
References
{{Reflist}}
Further reading
- {{cite book |first=William H. |last=Greene |title=Econometric Analysis |location=Upper Saddle River |publisher=Pearson |edition=Sixth |year=2012 |isbn=978-0-13-513740-6 }}
- {{cite book |first=Peter |last=Kennedy |title=A Guide to Econometrics |edition=Sixth |location=Malden |publisher=Blackwell |year=2008 |page=139 |isbn=978-1-4051-8257-7 }}
- {{cite book |first=Jan |last=Kmenta |authorlink=Jan Kmenta |title=Elements of Econometrics |location=New York |publisher=MacMillan |edition=Second |year=1986 |isbn=0-02-365070-2 |pages=[https://archive.org/details/elementsofeconom0003kmen/page/651 651–733] |url=https://archive.org/details/elementsofeconom0003kmen/page/651 }}
External links
- {{YouTube|dLuTjoYmfXs|Endogeneity: An inconvenient truth. Podcast with Prof. John Antonakis}}
- {{YouTube|id=WlOtUA8Rqw8&list=PLD15D38DC7AA3B737&index=14#t=7m42s|title=Lecture on Simultaneity Bias}} by Mark Thoma
- [http://sethgodin.typepad.com/seths_blog/2017/05/what-about-endogeneity.html Seth Godin's simple views on endogeneity]