Applied Econometrics

Outline

"Non spherical disturbances" : GLS Heteroskedasticity Autocorrelation Endogeneity and Instrumental variables Random explanatory variables Non-exogeneity : definition & examples The IV estimator IV in practice Examples



What if hypotheses are not verified?

- Heteroskedasticity: error terms do not all have the same variance (e.g., a size effect for firms) → typical of cross sections
- Autocorrelation: the error term of period t is correlated with the error term of period t − 1, and/or t − 2, ... → typical of time series

In these cases, V(u) ≠ σ²I_N: OLS is still unbiased but less precise. Furthermore, all the tests we derived are no longer valid, since they rely on the homoskedasticity assumption. We thus have to modify the estimation to obtain reliable inference on parameter values.


The problem with OLS

Consider the model y = Xb + u.

- Assume that V(u) = σ²Ω, with Ω any suitable matrix
- E(b̂_OLS) = b: the estimator is still unbiased
- But V(b̂_OLS) = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹
- If we run an OLS regression with software, we get b̂ computed as b̂ = (X′X)⁻¹X′y and V̂(b̂) computed as V̂(b̂) = σ̂²(X′X)⁻¹
- The software thus gives us unbiased estimates but wrong variances, so all the tests we may run (t-test, F-test) are wrong


How can we correct for these problems?

- We need to transform the model so as to get homoskedastic and independent error terms
- We can use generalized least squares (GLS) in place of ordinary least squares
- We will eventually end up re-weighting the model with the help of the variance matrix of the error terms (see the modified Chow test: same idea)


Generalized least squares

Consider the model y = Xb + u.

- Assume that V(u) = σ²Ω
- Ω is symmetric and positive definite, so Ω⁻¹ exists and Ω^{-1/2} exists as well
- Ω^{-1/2} Ω^{-1/2} = Ω⁻¹
- Multiply the model on the left-hand side by Ω^{-1/2}: Ω^{-1/2}y = Ω^{-1/2}Xb + Ω^{-1/2}u
- The error terms of this "new" model are homoskedastic and uncorrelated: V(Ω^{-1/2}u) = σ²I_N
- We can use the transformed model for the estimation of b and, above all, for inference


The GLS estimator

- Using OLS on the transformed model, we get the GLS estimator b̂_GLS = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
- It is obtained by minimizing the sum of squared residuals weighted by Ω⁻¹
- The GLS estimator is unbiased
- The "true" variance that can be used for tests is: V(b̂_GLS) = σ²(X′Ω⁻¹X)⁻¹
- Notice that the transformed model does not necessarily have a meaning: it is only useful for the estimation procedure
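A minimal numpy sketch of these formulas on simulated data (the design matrix, the diagonal Ω, and the true b are arbitrary choices made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 2

# Toy design matrix with a constant (assumed for the example)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
b_true = np.array([1.0, 2.0])

# Known heteroskedastic Omega (diagonal here, for simplicity)
omega_diag = np.linspace(0.5, 5.0, N)
u = rng.normal(size=N) * np.sqrt(omega_diag)
y = X @ b_true + u

Omega_inv = np.diag(1.0 / omega_diag)

# b_gls = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
XtOi = X.T @ Omega_inv
b_gls = np.linalg.solve(XtOi @ X, XtOi @ y)

# V(b_gls) = sigma^2 (X' Omega^{-1} X)^{-1}, with sigma^2 estimated
# from the transformed ("whitened") residuals
resid_t = (y - X @ b_gls) / np.sqrt(omega_diag)
sigma2_hat = resid_t @ resid_t / (N - k)
V_gls = sigma2_hat * np.linalg.inv(XtOi @ X)

print(b_gls, np.sqrt(np.diag(V_gls)))
```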


Using GLS

Since GLS amounts to OLS computed on a model that has nice properties, we can implement all the tests we described earlier (t-test, F-test)


The FGLS estimator

- GLS has nice properties
- The problem, however, is that we do not know the true Ω: we thus have to use an estimator of Ω, which leads to feasible generalized least squares (FGLS)
- We cannot freely estimate all the distinct variances and covariances of the N error terms, since we only have N observations
- We will thus have to estimate these variances and covariances in some restricted way (see next chapter)


Remarks

- FGLS and GLS are asymptotically equivalent (as the sample size goes to infinity)
- FGLS has good properties only asymptotically
- FGLS is usually biased in small samples
- Thus, in small samples, we cannot be sure that FGLS outperforms OLS



Heteroskedasticity

- Heteroskedasticity usually occurs with cross sections
- Assume the error terms are uncorrelated but do not have the same variance (e.g., households with very different income levels)
- The variance matrix of the error terms is a diagonal matrix σ²Ω
- It is thus easy to compute Ω⁻¹
- The resulting estimator is sometimes called the weighted least squares (WLS) estimator, because each observation is weighted proportionally to the inverse of the standard error of its error term
- It is as if we put a smaller weight on high-income households

\[
\Omega = \begin{pmatrix}
a_1 & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_N
\end{pmatrix}
\quad\text{so}\quad
\Omega^{-1} = \begin{pmatrix}
1/a_1 & 0 & \cdots & 0 \\
0 & 1/a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/a_N
\end{pmatrix}
\tag{1}
\]


Testing for heteroskedasticity

- First, simply look at the residuals
- Then, various tests are available
- Breusch-Pagan test: regress the squared residuals on all the explanatory variables
- White test: regress the squared residuals on all the explanatory variables, their squares, and their cross-products
- If we find some parameters significantly different from 0, we conclude there is heteroskedasticity
- We will eventually use these regressions to estimate Ω and run the weighted regression (see the sketch below)

Stata commands: estat hettest, bpagan, whitetst
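In Python, both tests are available in statsmodels; a minimal sketch on simulated heteroskedastic data (the data-generating process is an assumption made for the example):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
N = 500
x = rng.uniform(1, 10, size=N)
X = sm.add_constant(x)
# Heteroskedastic errors: variance grows with x (assumed DGP for the demo)
y = 1.0 + 2.0 * x + rng.normal(size=N) * x

res = sm.OLS(y, X).fit()

# Breusch-Pagan: squared residuals regressed on the explanatory variables
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pval)

# White: adds squares and cross-products of the regressors
w_stat, w_pval, wf_stat, wf_pval = het_white(res.resid, X)
print("White p-value:", w_pval)
```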


Weighted least squares in practice

Consider a heteroskedastic model y = Xb + u, with V(u_i) = E(u_i²) = σ_i²

- The variance matrix of the error terms is diagonal, with N unknown elements
- Assume for example that V(u_i) = a·x_{i,j}, where variable x_j is always positive
- A way to check whether this holds is to first estimate the original regression by OLS, then regress the squared residuals on variable x_j (in our case, without a constant)
- If we find this is the best specification, we only have to divide every term of the model, for each individual i, by √x_{i,j}
- This holds for any function of the x's: if we find that V(u_i) = a·f(x_{i,j}), then we divide the model by √f(x_{i,j}) (see the sketch below)
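A minimal sketch of this procedure, assuming the simulated variance specification V(u_i) = a·x_{i,j}:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
N = 400
xj = rng.uniform(1, 20, size=N)          # always-positive regressor (assumed)
X = sm.add_constant(xj)
y = 0.5 + 1.5 * xj + rng.normal(size=N) * np.sqrt(3.0 * xj)  # V(u_i) = a * x_ij

# Step 1: OLS on the original model, then check the variance specification
ols = sm.OLS(y, X).fit()
aux = sm.OLS(ols.resid**2, xj).fit()     # squared residuals on x_j, no constant
print("estimated a:", aux.params[0])

# Step 2: divide every term (including the constant) by sqrt(x_ij)
w = np.sqrt(xj)
y_t = y / w
X_t = X / w[:, None]
wls = sm.OLS(y_t, X_t).fit()
print(wls.params)                        # FGLS/WLS estimates of (b0, b1)
```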

Should we always use FGLS?

- FGLS does not have nice small-sample properties, while OLS does
- But using OLS gives us wrong standard errors for the estimates
- We could instead keep OLS and compute corrected standard errors
- We can use the White (1980) matrix to run tests


The White matrix

White (1980) proved that a consistent estimator of the variance matrix of the parameters is:

\[
\hat V(\hat b_{OLS}) = (X'X)^{-1}\left(\sum_{t=1}^{N} \hat u_t^2\, X_t X_t'\right)(X'X)^{-1}
\]

where the û's are the residuals from the OLS regression. This matrix can be used as an estimate of the true variance of the OLS estimators, and thus for tests. The resulting standard errors are called "White" or "robust" standard errors (because they are robust to heteroskedasticity), and can be computed directly by most software (option "robust" in Stata).
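In Python's statsmodels, cov_type="HC0" corresponds to this White (1980) matrix; a sketch that also rebuilds it by hand on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N = 300
x = rng.uniform(1, 10, size=N)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.normal(size=N) * x   # heteroskedastic errors (assumed DGP)

res_robust = sm.OLS(y, X).fit(cov_type="HC0")
print(res_robust.bse)                        # White / robust standard errors

# By hand: (X'X)^{-1} (sum_t u_t^2 x_t x_t') (X'X)^{-1}
res = sm.OLS(y, X).fit()
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * res.resid[:, None]**2).T @ X
V_white = XtX_inv @ meat @ XtX_inv
print(np.sqrt(np.diag(V_white)))             # matches res_robust.bse
```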



Autocorrelation

- The error term of period t is correlated with the error term of period t − 1, and/or t − 2, ...
- First-order autocorrelation: u_t = ρu_{t−1} + ε_t
- where |ρ| < 1 and ε is a "white noise"
- "White noise" means that for all t, E(ε_t) = 0 and V(ε_t) = σ_ε², with the ε_t uncorrelated with each other and with past u's


First-order autocorrelation

The variance matrix of the error terms σ²Ω is no longer diagonal, because Ω is no longer the identity matrix:

\[
\Omega = \begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\
\rho & 1 & \rho & \cdots & \rho^{N-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1
\end{pmatrix}
\tag{2}
\]

And we have: E(u_t) = 0, V(u_t) = σ² = σ_ε²/(1 − ρ²). The correlation coefficient between u_{t−h} and u_t is ρ^{|h|}.

Testing for first-order autocorrelation

Durbin-Watson test: H₀: ρ = 0 vs. H₁: ρ ≠ 0. Test statistic:

\[
DW = \frac{\sum_{t=2}^{N} (\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{N} \hat u_t^2}
\]

and DW ≈ 2(1 − ρ̂). The Durbin-Watson table gives two values d₁ and d₂ such that:

- DW ∈ [0, d₁] ⇒ ρ > 0
- DW ∈ [d₁, d₂] ⇒ inconclusive
- DW ∈ [d₂, 4 − d₂] ⇒ ρ = 0
- DW ∈ [4 − d₂, 4 − d₁] ⇒ inconclusive
- DW ∈ [4 − d₁, 4] ⇒ ρ < 0

Prerequisites: the model needs a constant term, and the lagged dependent variable cannot appear among the explanatory variables.
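A minimal sketch using statsmodels' durbin_watson on simulated AR(1) errors (ρ = 0.6 is an arbitrary choice for the example):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
N, rho = 200, 0.6
eps = rng.normal(size=N)
u = np.zeros(N)
for t in range(1, N):                # AR(1) errors: u_t = rho * u_{t-1} + eps_t
    u[t] = rho * u[t - 1] + eps[t]

x = rng.normal(size=N)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, X).fit()
dw = durbin_watson(res.resid)
print(dw, 2 * (1 - rho))             # DW should be close to 2(1 - rho)
```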

The Breusch-Godfrey test

- This test can detect autocorrelation of order greater than 1, and it works even when the lagged dependent variable belongs to the explanatory variables
- Run OLS on the model
- Save the residuals e_t
- Estimate the auxiliary regression explaining e_t as a linear function of the explanatory variables plus the p previous residuals
- Under the null (the parameters on the previous residuals are simultaneously equal to zero), (N − p)·R² follows a χ² distribution with p degrees of freedom
- The R² used in the test is the one from the auxiliary regression (see the sketch below)
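statsmodels implements a version of this LM test (acorr_breusch_godfrey); a sketch on the same kind of simulated AR(1) data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
N, rho = 200, 0.6
u = np.zeros(N)
for t in range(1, N):                # AR(1) errors, as in the previous sketch
    u[t] = rho * u[t - 1] + rng.normal()
x = rng.normal(size=N)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print("BG LM statistic:", lm_stat, "p-value:", lm_pval)
```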


What to do if we find autocorrelation

- Finding autocorrelation often means that the model is misspecified
- We thus should not directly switch from OLS to FGLS, but rather try to change the model
- For example: include the lagged dependent variable (y from the previous period) as an explanatory variable, switch from x to log(x), use variables expressed as growth rates instead of levels, etc.


FGLS in the case of autocorrelation

- If modifying the model does not help, we have to correct for autocorrelation using FGLS, just as we did in the case of heteroskedasticity
- We will use "quasi-differencing"


Reminder: first-order autocorrelation

\[
\Omega = \begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\
\rho & 1 & \rho & \cdots & \rho^{N-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1
\end{pmatrix}
\tag{3}
\]

So we know the form of Ω⁻¹:

\[
\Omega^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 \\
0 & -\rho & 1+\rho^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & -\rho & 1
\end{pmatrix}
\tag{4}
\]

The FGLS transforming matrix

We can find a matrix Γ such that (1 − ρ²)Ω⁻¹ = Γ′Γ, with:

\[
\Gamma = \begin{pmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & -\rho & 1
\end{pmatrix}
\tag{5}
\]

And the FGLS transformed model is written as: Γy = ΓXb + Γu.


The transformed model

- The FGLS transformed model is written as: Γy = ΓXb + Γu
- In the new model, y₁ is replaced by √(1 − ρ²)·y₁ and every other observation yᵢ is replaced by yᵢ − ρyᵢ₋₁
- The same holds for each independent variable x_j, including the constant
- Except for the first observation, all observations are "quasi-differenced"
- yᵢ − ρyᵢ₋₁ = b₀(1 − ρ) + b₁(x₁,ᵢ − ρx₁,ᵢ₋₁) + ... + εᵢ
- For all i > 1, uᵢ − ρuᵢ₋₁ = εᵢ, which is a white noise: regular OLS can now be used on this new model
- The last thing we need to run the regression is an estimate ρ̂ of ρ


Estimation: the two-step method

1. Run the original OLS regression y = Xb + u
2. Save the residuals ûᵢ
3. Estimate ρ by ρ̂ = (Σ ûᵢ ûᵢ₋₁) / (Σ ûᵢ₋₁²)
4. Quasi-difference the model using ρ̂
5. Run OLS on the transformed model

A sketch of this procedure in Python follows.
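A minimal sketch of the two-step method on simulated AR(1) data (dropping the first observation rather than rescaling it, which is one common variant):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
N, rho = 300, 0.7
u = np.zeros(N)
for t in range(1, N):
    u[t] = rho * u[t - 1] + rng.normal()
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + u
X = sm.add_constant(x)

# Steps 1-2: original OLS, save residuals
r = sm.OLS(y, X).fit().resid

# Step 3: rho_hat = sum(r_t * r_{t-1}) / sum(r_{t-1}^2)
rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])

# Step 4: quasi-difference (dropping the first observation for simplicity;
# Prais-Winsten would instead rescale it by sqrt(1 - rho^2))
y_t = y[1:] - rho_hat * y[:-1]
X_t = X[1:] - rho_hat * X[:-1]

# Step 5: OLS on the transformed model
res_fgls = sm.OLS(y_t, X_t).fit()
print(rho_hat, res_fgls.params)
```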


Estimation: the iterative method

Cochrane-Orcutt or Prais-Winsten method:

1. Run the two-step method once; save ρ̂₁ and b̂₁
2. From OLS on the (already) transformed model, save the estimated parameters and compute the residuals
3. Use these residuals to compute a new estimate ρ̂₂
4. Repeat until two successive ρ̂'s are sufficiently close to each other

Stata command: prais


Last, what if we have both autocorrelation and heteroskedasticity?

- We may use a generalization of the White variance matrix if we have both heteroskedasticity and autocorrelation that could not be corrected for
- It is used when we assume that the correlation between the u's is zero after some lag
- We get the so-called Newey-West standard errors, which are heteroskedasticity and autocorrelation consistent (HAC)
- These can be computed automatically by Stata (newey); a statsmodels sketch follows
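In statsmodels, these are available through cov_type="HAC"; a sketch (the maximum lag of 4 and the data-generating process are arbitrary choices for the example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
N = 300
u = np.zeros(N)
for t in range(1, N):
    # Errors that are both autocorrelated and heteroskedastic (assumed DGP)
    u[t] = 0.5 * u[t - 1] + rng.normal() * (1 + 0.1 * t / N)
x = rng.normal(size=N)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.bse)   # Newey-West standard errors
```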



Considering random explanatory variables

- Until now, we considered the X's as non-random
- This amounted to reasoning conditionally on X, i.e., as if the X's were fixed
- A hypothesis we widely relied upon was that X and u were independent, which is always the case if X is fixed
- But the X's can also be considered as random variables
- Conditionally on X, all the previous results hold


Hypotheses for estimation

1. E(u) = 0
2. For all t, t′: X_t is random and uncorrelated with u_{t′}
3. Rk(X) = k
4. E(uu′) = σ²I_N
5. The error terms are iid(0, σ²)
6. plim(X′X/N) = V_X, a positive definite matrix (as N goes to infinity, the X variables always retain some variance)


Properties of OLS

- We know that b̂_OLS = (X′X)⁻¹X′y = b + (X′X)⁻¹X′u
- So E(b̂_OLS) = b + E[(X′X)⁻¹X′u]
- By the law of iterated expectations: E[(X′X)⁻¹X′u] = E_X[E_u[(X′X)⁻¹X′u | X]] = E_X[(X′X)⁻¹X′E_u[u | X]], and E_u[u | X] = 0
- b̂_OLS is still unbiased
- In the same fashion, we get an expression for the variance of b̂_OLS: V(b̂_OLS) = σ²E[(X′X)⁻¹]


Unconditional inference

- We know that b̂_OLS = b + (X′X)⁻¹X′u
- If we assume that the u's are normal, b̂_OLS is normal only conditionally on X
- But if X is random, there is no reason why b̂_OLS should be normal, so in principle we cannot obtain the Student and Fisher distributions we use for tests
- However, it can be shown that the usual tests still work


Proof

Recall that for any parameter b_j, the usual Student t-statistic is:

\[
t_j = \frac{\hat b_j}{\hat\sigma_{\hat b_j}}
\]

- We proved earlier that, conditionally on X, t_j follows a Student distribution T_{N−k}
- But the T_{N−k} distribution does not depend on the X variables: so t_j follows a T_{N−k} distribution unconditionally as well
- The same kind of result can be proved for the F distribution
- Conclusion: all the tests we used are still valid, which is quite powerful given that we only need the u's to be normal, while X can follow any distribution
- The fact that these test statistics follow a given distribution that does not depend on X makes them pivotal


General properties

- We assume that the X_i are iid (independent and identically distributed)
- OLS estimators are then consistent and asymptotically normal
- We can thus run the usual t- and F-tests
- Remark: this was true as well when we considered the X's as non-random, since even if the u's were not normal, the estimators were asymptotically normal
- Thus, even if the u's are not normal, we can still use the usual t-test and F-test if the sample is large enough


Non-exogenous explanatory variables

- Consider the model y_t = X_t′b + u_t
- Assume that E(X_t′u_t) ≠ 0: some explanatory variables are correlated with the current error term
- In this case, plim(X′u/N) ≠ 0
- OLS is biased and inconsistent:

\[
\mathrm{plim}(\hat b_{OLS}) = b + \left[\mathrm{plim}\left(\frac{X'X}{N}\right)\right]^{-1}\mathrm{plim}\left(\frac{X'u}{N}\right) \neq b
\]

- Even if we increase the size of the sample, we will not get the right value of b


Omitted variables

Assume the true model is: y_t = X_t′b + w_t d + v_t. If we omit w_t and estimate instead: y_t = X_t′b + u_t

- w_t is included in u_t
- If w_t is correlated with X, then E(X_t′w_t) ≠ 0 and thus E(X_t′u_t) ≠ 0: OLS is biased
- Remark: in the fixed-X case, we would get the same result using the Frisch-Waugh theorem (see Dormont)
- When a variable that is correlated with the X's is omitted, the estimators suffer from an omitted variable bias

Variables measured with error

Consider a very simple one-variable theoretical model: y*_t = x*_t·b + v_t. Assume that x is measured with error: to run the estimation, we only observe y_t and x_t, with y_t = y*_t and x_t = x*_t + e_t, where e_t is a white noise. The estimated model is thus:

y_t = x_t·b + u_t, with u_t = v_t − b·e_t

We then have:

E(x_t u_t) = E[(x*_t + e_t)(v_t − b·e_t)] = −bσ_e² ≠ 0

OLS is thus inconsistent, and it can be shown that it is biased towards 0: the influence of x on y is underestimated.
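A short simulation of this attenuation bias (all numerical values are arbitrary choices made for the example):

```python
import numpy as np

# Classical measurement error: OLS on the mismeasured regressor is
# biased towards 0 by the factor var(x*) / (var(x*) + var(e))
rng = np.random.default_rng(8)
N, b_true = 100_000, 2.0

x_star = rng.normal(size=N)                 # true regressor
e = rng.normal(scale=0.8, size=N)           # measurement error
x = x_star + e                              # observed, mismeasured regressor
y = b_true * x_star + rng.normal(size=N)    # y depends on the true x*

b_ols = (x @ y) / (x @ x)
attenuation = x_star.var() / (x_star.var() + e.var())
print(b_ols, b_true * attenuation)          # the two should be close
```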

Simultaneous equations

Say we have the following system of equations:

- Y_t = a + bX_t + u_t   (1)
- X_t = Y_t + Z_t   (2)

Because of equation (1), Y_t is endogenous, and because of equation (2), X_t is endogenous too. At the system level, only Z_t is exogenous. To get back to the familiar setting, we should rewrite the endogenous variables as functions of the exogenous variables only.


Simultaneous equations cont'd

The system can be rewritten in the following way:

- Y_t = a/(1−b) + [b/(1−b)]·Z_t + [1/(1−b)]·u_t
- X_t = a/(1−b) + [1/(1−b)]·Z_t + [1/(1−b)]·u_t

Calling µ the "new" residual, we get back to the usual framework. Notice that X_t is a function of u_t: in equation (1) of the first system, it is thus correlated with the error term. A basic hypothesis of OLS is violated (cov(X_t, u_t) ≠ 0), and if we estimate (1) without taking into account the information provided by (2), the estimates will be inconsistent.


General issue in policy evaluation

- Say we want to evaluate the impact of a policy on people's wages (e.g., a specific program helping the unemployed)
- A model describing the wage outcome is Y_i = a + bX_i + cP_i + u_i, where X comprises individual characteristics and P is a dummy variable indicating whether the individual was assigned to the program
- If the policy is not randomized, i.e., if assignment to the program depends on unobserved individual characteristics, then P is correlated with u and the impact of the policy cannot be estimated consistently
- Intuitive reason: those who were assigned to the program are not comparable to those who were not, so the latter cannot serve as a valid control group for the former
- This is called a selection effect

Consequences

In the case of non-exogeneity (also called endogeneity) of some explanatory variables:

- OLS estimators are biased: on average, we do not get the true value of the parameters
- OLS estimators are inconsistent: even if we increase the size of the sample, the bias does not go to zero, and we will never get the true value of the parameters

We thus have to use another estimation technique, based on auxiliary variables: the instrumental variables (IV) technique.



Instrumental variables

Consider the following model, with X endogenous: y = X·b + u. Consider a set of variables Z of dimension (N, p), with the following properties:

- E(Z_t′u_t) = 0: the variables Z are exogenous
- Rk(Z) = p
- plim(Z′X/N) = V_ZX, a non-null matrix of dimension (p, k) and rank k
- plim(Z′Z/N) = V_Z, a finite positive definite matrix of dimension (p, p)

The variables Z need to be exogenous and correlated with X, and the number of instruments p must satisfy p ≥ k.

The estimator

The original model is: y = X·b + u

- Suppose we regress y, u, and every column of X on the variables Z (k + 2 regressions)
- We compute the predictions, respectively ỹ, ũ and X̃
- We use these new elements in the model instead: ỹ = X̃·b + ũ
- Let P_Z be the orthogonal projection matrix onto L(Z)
- We know that P_Z y = ỹ, P_Z u = ũ and P_Z X = X̃
- It is as if we had premultiplied the original model by P_Z
- The IV estimator is:

\[
\hat b_{IV} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y = (X'P_Z X)^{-1}X'P_Z y
\]
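A minimal numpy sketch of this formula on simulated data, where x is endogenous through a common unobserved component and z is a valid instrument (the whole setup is an assumed toy example):

```python
import numpy as np

rng = np.random.default_rng(9)
N = 10_000
z = rng.normal(size=N)
common = rng.normal(size=N)                  # source of endogeneity
x = 0.8 * z + common + rng.normal(size=N)    # x correlated with the error
u = common + rng.normal(size=N)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])

# Compute Pz X and Pz y without forming the N x N projection matrix
Pz_X = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
Pz_y = Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)

# b_iv = (X' Pz X)^{-1} X' Pz y
b_iv = np.linalg.solve(X.T @ Pz_X, X.T @ Pz_y)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS:", b_ols, "IV:", b_iv)            # OLS biased, IV close to (1, 2)
```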


Intuition

- OLS on the original model y = X·b + u is inconsistent because the variables X are correlated with u
- To get rid of this correlation, we keep only the part of the information in X that is uncorrelated with the error terms
- Algebraically, we project the model onto the subspace L(Z) generated by the Z variables, which are both exogenous and correlated with X
- The more the Z's are correlated with the X's (and the more numerous the Z's are), the more precise the estimator


Remarks

- If p = k (same number of instruments Z as explanatory variables X), we get b̂_IV = (Z′X)⁻¹Z′y
- Proof: rewrite b̂_IV, noting that in this case the matrix Z′X is square and invertible
- Even though this expression is simple, it is not recommended to choose this minimal number of instruments


Properties of the IV estimator (1)

- b̂_IV is biased in small samples
- We thus cannot derive a general expression for its variance-covariance matrix
- But it is consistent (as the sample size goes to infinity)
- It is asymptotically normal:

\[
\sqrt{N}(\hat b_{IV} - b) \to N\!\left(0,\ \sigma^2 (V_{ZX}' V_Z^{-1} V_{ZX})^{-1}\right)
\]


Properties of the IV estimator (2)

- There is no general expression for its variance-covariance matrix, but we can derive its asymptotic variance-covariance matrix, which is a consistent estimator of the "true" one:

\[
\hat V_{asympt}\left(\sqrt{N}(\hat b_{IV} - b)\right) = \hat\sigma_{IV}^2\left(\frac{X'P_Z X}{N}\right)^{-1}
\]

with σ̂²_IV = SSR/N and û = y − X·b̂_IV


Tests

- We do not have convenient finite-sample properties of the estimator
- To run tests, we therefore rely on asymptotic results and use what are called Wald tests:

\[
\frac{(C\hat b_{IV} - Cb)'\,[C(X'P_Z X)^{-1}C']^{-1}\,(C\hat b_{IV} - Cb)}{\hat\sigma_{IV}^2} \to \chi^2_r
\]

with r the number of linear restrictions: C is of size (r, k). Proof: take the expression of the asymptotic normality of b̂_IV, form its quadratic form and "divide" it by its variance; this gives a χ² distribution (the "true" variance being replaced by its consistent estimator).


How to run IV estimation

- Stata: "ivregress" command
- This amounts to running two-stage least squares (2SLS)
- Intuition: first regress y and the X's on the variables Z, then use the predictions in the model instead of the original values


Two-stage least squares (1)

- Run k + 1 regressions to get P_Z y and P_Z X
- Estimate OLS on the transformed model P_Z y = P_Z X·b + u
- We thus get b̂_IV = (X′P_Z X)⁻¹X′P_Z y
- The first k + 1 regressions can be used to assess the suitability of the instruments (they have to be correlated enough with the X's)
- Remark: this yields the same estimates if we do not replace y by P_Z y


Two-stage least squares (2)

- Warning: if we run this procedure "by hand", with two OLS regressions, instead of using the software's dedicated procedure, the standard errors from the second regression cannot be used for tests on the coefficients
- Reason: in the second-stage equation, residuals are computed as û = P_Z y − P_Z X·b̂_IV, whereas they should be computed as û = y − X·b̂_IV

A sketch of the by-hand procedure, with correctly recomputed residuals, follows.
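A sketch on the same kind of simulated endogenous data as before; the key point is that the standard errors are recomputed with û = y − X·b̂_IV:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 10_000
z = rng.normal(size=N)
common = rng.normal(size=N)
x = 0.8 * z + common + rng.normal(size=N)
y = 1.0 + 2.0 * x + common + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])

# Stage 1: regress x on Z, keep the fitted values
x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
X_hat = np.column_stack([np.ones(N), x_hat])

# Stage 2: regress y on the fitted values
b_iv = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Correct residuals use the ORIGINAL X, not the fitted values
resid = y - X @ b_iv
sigma2 = resid @ resid / (N - X.shape[1])
V_iv = sigma2 * np.linalg.inv(X_hat.T @ X_hat)   # sigma^2 (X' Pz X)^{-1}
print(b_iv, np.sqrt(np.diag(V_iv)))
```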


Remark

- Exogenous X's can be used as instruments
- In that case, 2SLS amounts to regressing the potentially endogenous explanatory variables (say, x₁ to x_j) on the exogenous explanatory variables (say, x_{j+1} to x_k) and the instruments Z


Exogeneity test

- We test H₀: E(X′u) = 0
- This is called the "Hausman test" or "Durbin-Wu-Hausman test"; in Stata it is available through the "hausman" command
- If H₀ is true, then both the OLS and IV estimators are consistent
- If H₀ is false, only the IV estimator is consistent
- The test is based on the difference between b̂_IV and b̂_OLS
- Both are asymptotically normal: if we compute the difference between the two, take its quadratic form and "divide" it by its variance matrix, we get a χ² distribution whose degrees of freedom equal the number of variables tested (those that are potentially endogenous)


A convenient auxiliary regression

- Consider the model y = Xb + u, where a subset x of the variables belonging to X might be endogenous
- Let Z denote the instruments, some belonging to X (in fact the X's without the x's) and some not
- Consider the augmented model: y = Xb + M_Z x·c + ε
- M_Z x are the residuals of the regression of x on Z
- The b̂ of this "augmented" regression is equal to the IV estimator of the original model
- Testing c = 0 amounts to testing the exogeneity of x (it is equivalent to the Hausman test); see the sketch below
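A sketch of this auxiliary regression on simulated data (the data-generating process is assumed for the example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
N = 10_000
z = rng.normal(size=N)
common = rng.normal(size=N)
x = 0.8 * z + common + rng.normal(size=N)    # x endogenous through "common"
y = 1.0 + 2.0 * x + common + rng.normal(size=N)

# M_Z x: residuals of the regression of x on Z
Z = sm.add_constant(z)
x_resid = x - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)

# Augmented regression: y on (constant, x, M_Z x)
X_aug = np.column_stack([np.ones(N), x, x_resid])
res = sm.OLS(y, X_aug).fit()
print(res.params[:2])    # equals the IV estimator of the original model
print(res.pvalues[2])    # small p-value => reject exogeneity of x
```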


Proofs

- b̂_aug = b̂_IV and the equivalence of the tests: use the Frisch-Waugh theorem
- Remark: this augmented model has no theoretical meaning; the estimation is run only for our purpose


Selecting convenient instruments

- Sargan test: H₀: E(Z′u) = 0
- Also called the test of overidentifying restrictions (Stata: overid command)

Under H₀:

\[
\frac{\hat u' P_Z \hat u}{s^2} \to \chi^2_{p-k}
\]

with û = y − X·b̂_IV and s² = û′û/N. û′P_Z û is the sum of the squared predicted values of the regression of û on Z. Remark: when p = k, the statistic is always zero, so we cannot run the test, because b̂_IV = (Z′X)⁻¹Z′y and Z′û = 0.
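A by-hand sketch of the statistic with p = 3 instruments and k = 2 regressors, on simulated data where H₀ holds (all values are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 10_000
z1, z2 = rng.normal(size=N), rng.normal(size=N)
common = rng.normal(size=N)
x = 0.5 * z1 + 0.5 * z2 + common + rng.normal(size=N)
y = 1.0 + 2.0 * x + common + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])            # k = 2
Z = np.column_stack([np.ones(N), z1, z2])       # p = 3

# Projection on L(Z) applied to a vector/matrix, without forming Pz itself
Pz = lambda M: Z @ np.linalg.solve(Z.T @ Z, Z.T @ M)
b_iv = np.linalg.solve(X.T @ Pz(X), X.T @ Pz(y))

u_hat = y - X @ b_iv
s2 = u_hat @ u_hat / N
sargan = (u_hat @ Pz(u_hat)) / s2               # ~ chi2 with p - k = 1 d.f.
print(sargan)
```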

The problem with weak instruments

- If the instruments are too weakly correlated with the X's, there can be an important bias in the estimates even if we increase the number of observations
- Moreover, the estimator then has a non-normal sampling distribution, which makes statistical inference meaningless
- The weak instrument problem is exacerbated by having many instruments, so drop the weakest ones and keep the most relevant
- A way to measure how correlated the instruments are with the potentially endogenous variables is to run the first-stage regression (i.e., regress each potentially endogenous variable on the instruments) and check its goodness of fit
- A criterion can be the global F statistic: if F