Panel data - Rémi Bazillier

M2R “Development Economics”
Empirical Methods in Development Economics
Université Paris 1 Panthéon-Sorbonne

Panel data
Rémi Bazillier

Semester 1, Academic year 2016-2017


Outline

- Panel data
- Panel data and endogeneity
- Panel production functions
  - A panel macro production function
  - A panel micro production function
- Panel estimators
  - Panel estimators
  - The micro panel production function extended

The magic of Panel Data

“Panel Data allows us to observe the unobservable”

- Unobservable effects
  - They are one of the most important reasons why it is difficult to argue for causality
  - We always face the objection that something unobserved is driving the outcome we observe
- We can use panel data to control for time-invariant unobservables
- If the unobservables correlated with our explanatory variables do vary over time, we need to take another step → Instrumental Variables (next chapter)

Panel data

- Panel data combine a cross-sectional dimension with a time-series dimension
  - But not all data that combine these two dimensions are panel data:
    - independently pooled cross sections are sampled randomly from a population at different points in time
    - panel data cover the same individuals (or the same households, firms, countries...) over time
- The structure of the panel
  - N individuals followed over T time periods
  - In microeconomic datasets, N is large while T is short
  - In macroeconomic datasets, T can be large → you may have to take into account the time-series properties of the data
- We will focus here on static models
  - Lagged dependent variables in panels create problems (→ you need to use dynamic panel data models)

Balanced and unbalanced Panels

- Balanced panel: same T for all individuals
- Unbalanced panel: T differs across individuals
- In practice, balanced and unbalanced panels are treated in the same way
- One exception: when the panel is unbalanced for reasons that are not random
  - Example: firms with relatively low levels of productivity have relatively high exit rates (smaller T)
  - In that case, you have to model the selection process


Panel data and endogeneity

- Let us set out the simplest version of a panel model:

$$ y_{it} = \beta x_{it} + u_{it} \tag{1} $$

$$ u_{it} = c_i + e_{it} \tag{2} $$

- Two elements in the error term ($c_i$ and $e_{it}$)
  - $c_i$ is the time-invariant unobservable determinant of $y_{it}$
    - An individual's ‘ability’, if we think of it as something with which they are endowed and which does not change over time
    - The quality of management (if time-invariant)
- The Least Squares Dummy Variable (LSDV) model:
  - Individual dummies to control for time-invariant unobservables

$$ y_{it} = \beta x_{it} + c_i + e_{it} \tag{3} $$

- Assumption of strict exogeneity when using the LSDV model:

$$ y_{it} = \beta x_{it} + c_i + e_{it} \tag{4} $$

  - It implies that the explanatory variables in each period are uncorrelated with the idiosyncratic error (the time-varying part of the error) in every period
- Pooled OLS estimator (estimation without $c_i$):

$$ y_{it} = \beta x_{it} + v_{it}^{OLS} \tag{5} $$

$$ v_{it}^{OLS} = c_i + e_{it} \tag{6} $$

  - The $x_{it}$ should be uncorrelated with both the time-invariant unobservables $c_i$ and the time-varying unobservables $e_{it}$

- However, even if both the $e_{it}$ and the $c_i$ are uncorrelated with the $x_{it}$, the $v_{it}^{OLS}$ will be autocorrelated, because the same $c_i$ enters the error in every period
  - Standard errors from OLS will not be correct
  - We need standard errors that are robust to heteroskedasticity and autocorrelation → option cluster in Stata
  - ... or use a different estimator that allows for autocorrelation in the error term
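The autocorrelation of the pooled error can be made explicit in one line. The sketch below assumes, beyond what the slides state, that $e_{it}$ is serially uncorrelated and uncorrelated with $c_i$, and that both components have constant variances, written here as $\sigma_c^2$ and $\sigma_e^2$ (notation introduced only for this illustration).

```latex
\begin{aligned}
\operatorname{Cov}\!\left(v_{it}^{OLS}, v_{is}^{OLS}\right)
  &= \operatorname{Cov}\!\left(c_i + e_{it},\; c_i + e_{is}\right)
   = \operatorname{Var}(c_i) = \sigma_c^2, \qquad t \neq s, \\
\operatorname{Corr}\!\left(v_{it}^{OLS}, v_{is}^{OLS}\right)
  &= \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2}.
\end{aligned}
```

Under these assumptions, any variation in $c_i$ across individuals makes the errors of the same individual positively correlated across periods, which is the dependence that clustering at the individual level is meant to account for.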


A panel macro production function

$$ \log\frac{V_{it}}{L_{it}} = \alpha \log\frac{K_{it}}{L_{it}} + (1-\alpha)\phi(E_{it}) + (1-\alpha)\log A_{it} + u_{it} \tag{7} $$

- We assume that the level of productivity differs across countries and is time-invariant ($\log A_i$), and that the change in productivity is common to all countries ($\text{Year}$):

$$ \log A_{it} = \log A_i + \lambda\,\text{Year} \tag{8} $$

- The production function is therefore:

$$ \log\frac{V_{it}}{L_{it}} = \alpha \log\frac{K_{it}}{L_{it}} + (1-\alpha)\phi(E_{it}) + (1-\alpha)\log A_i + (1-\alpha)\lambda\,\text{Year} + u_{it} \tag{9} $$

- In Hall and Jones (1999), residuals are interpreted as differences in underlying productive efficiency across countries
- Here: some time-invariant differences ($\log A_i$) and a time trend ($\text{Year}$)
  - This approach treats $A_i$ as fixed (rather than random), and the resulting estimates are fixed effects estimates
  - Here, the fixed effects are the time-invariant differences in productivity across countries

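Two waves (Year = 1 and Year = 2)

$$ \log\frac{V_{i1}}{L_{i1}} = \alpha \log\frac{K_{i1}}{L_{i1}} + (1-\alpha)\phi(E_{i1}) + (1-\alpha)\log A_i + (1-\alpha)\lambda + u_{i1} \tag{10} $$

$$ \log\frac{V_{i2}}{L_{i2}} = \alpha \log\frac{K_{i2}}{L_{i2}} + (1-\alpha)\phi(E_{i2}) + (1-\alpha)\log A_i + 2(1-\alpha)\lambda + u_{i2} \tag{11} $$

We can difference the two equations, which eliminates $\log A_i$:

$$ \Delta\log\frac{V_{it}}{L_{it}} = (1-\alpha)\lambda + \alpha\,\Delta\log\frac{K_{it}}{L_{it}} + (1-\alpha)\Delta\phi(E_{it}) + \Delta u_{it} \tag{12} $$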

Pooled OLS

Data: PWT 1980 - 2000


Some remarks

- Natural log of capital per capita: 0.57 (still much higher than 0.3)
- Education: 0.08
  - The correlation between education and income at the macro level is much weaker than in the micro data
  - A review of the evidence linking micro and macro data: Pritchett (2006), “Does learning to add up add up? The returns to schooling in aggregate data”, in Handbook of the Economics of Education, vol. 1
  - See also Cohen and Soto (2007), today's presentation
- The measure of the change in productivity: negative and significant (but small)

Fixed effects (1)

Data: PWT 1980 - 2000

Fixed effects (2)

Data: PWT 1980 - 2000

Remarks

- Two ways of estimating fixed effects (dummies or xtreg)
  - Coefficients are identical
  - Slight changes in standard errors (the finite-sample correction differs between the two methods in Stata)
- Lower coefficient on lkp (but still much higher than expected)
- Returns to education become insignificant!
- Change in productivity: not significant

First difference

Data: PWT 1980 - 2000


Remarks

- With two waves, the model in first differences gives exactly the same results as the method that uses dummy variables
- This no longer holds once we have more than two waves of data (the estimates differ)
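The equivalence with two waves, and its failure with three, can be checked numerically. The following is a minimal sketch in plain Python, assuming a single regressor and no time effect (a simplification relative to the model on the slides); the toy data and the function names `within_slope` and `first_difference_slope` are hypothetical and introduced only for illustration.

```python
def within_slope(panel):
    """Within (fixed effects) slope for one regressor: OLS on demeaned data."""
    num = den = 0.0
    for obs in panel.values():              # obs: list of (x, y) for one individual
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        num += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        den += sum((x - x_bar) ** 2 for x, _ in obs)
    return num / den


def first_difference_slope(panel):
    """First-difference slope: OLS of delta-y on delta-x, no intercept."""
    num = den = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            num += (x1 - x0) * (y1 - y0)
            den += (x1 - x0) ** 2
    return num / den


# Hypothetical toy panels: individual id -> list of (x, y), one pair per wave.
two_waves = {
    "a": [(1.0, 2.1), (2.0, 2.9)],
    "b": [(3.0, 7.2), (5.0, 8.1)],
    "c": [(2.0, 0.5), (2.5, 1.2)],
}
three_waves = {
    "a": [(1.0, 2.1), (2.0, 2.9), (4.0, 5.0)],
    "b": [(3.0, 7.2), (5.0, 8.1), (6.0, 9.5)],
    "c": [(2.0, 0.5), (2.5, 1.2), (2.0, 0.4)],
}

fe2, fd2 = within_slope(two_waves), first_difference_slope(two_waves)
fe3, fd3 = within_slope(three_waves), first_difference_slope(three_waves)
print("T = 2:", fe2, fd2)   # the two estimates coincide
print("T = 3:", fe3, fd3)   # the two estimates differ
```

With T = 2 each demeaned observation is plus or minus half the first difference, so both estimators reduce to the same ratio; with T = 3 the two transformations weight the data differently.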


A panel micro production function

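- Firm-level panel data from Ghana (Söderbom and Teal 2004)

$$ \log V_{it} = \alpha_0 + \alpha_1 \log K_{it} + \alpha_2 \log L_{it} + \alpha_3 \text{HC}_{it} + \alpha_4 \text{Sector}_i + \alpha_5 \text{Location}_i + \alpha_6 \text{Ownership}_i + \alpha_7 \text{FirmAge}_{it} + \alpha_8 \text{Wave7} + u_{it} \tag{13} $$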

Pooled OLS


Fixed effects


- Factor inputs are not significant once we control for unobservable time-invariant characteristics (fixed effects)
- The time dummy is not significant either:
  - It captures changes in total factor productivity (as we control for observed inputs, what is left is TFP)
- In fixed effects estimates, we control for the fixed effects
  - But we do not obtain consistent estimates of these fixed effects → do not interpret their estimated coefficients!


Panel estimators

$$ y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + (c_i + u_{it}) \tag{14} $$

- Pooled OLS is consistent if $c_i + u_{it}$ is uncorrelated with the explanatory variables
- Within transformation
  - Transformation of the original equation by “demeaning” the variables ($y_{it} - \bar{y}_i$)
  - It eliminates $c_i$ from the equation
  - The resulting within estimator is unbiased if $E(u_{it} - \bar{u}_i \mid x_{1it} - \bar{x}_{1i}, \dots, x_{Kit} - \bar{x}_{Ki}) = 0$, and consistent if $\operatorname{Cov}(u_{it} - \bar{u}_i,\, x_{jit} - \bar{x}_{ji}) = 0$ for $j = 1, \dots, K$
  - We do not have to assume anything about the relationship between $c_i$ and the explanatory variables
  - However, we need to assume that the explanatory variables are strictly exogenous
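The way demeaning removes $c_i$ can be written out step by step. This is a short worked derivation from equation (14), where a bar denotes the average over the T periods of individual i, for example $\bar{y}_i = \frac{1}{T}\sum_t y_{it}$.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1 x_{1it} + \dots + \beta_K x_{Kit} + c_i + u_{it} \\
\bar{y}_i &= \beta_0 + \beta_1 \bar{x}_{1i} + \dots + \beta_K \bar{x}_{Ki} + c_i + \bar{u}_i \\
y_{it} - \bar{y}_i &= \beta_1 (x_{1it} - \bar{x}_{1i}) + \dots + \beta_K (x_{Kit} - \bar{x}_{Ki}) + (u_{it} - \bar{u}_i)
\end{aligned}
```

Because $c_i$ is constant over time, it equals its own time average and drops out in the third line, together with the intercept and any other time-invariant regressor.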

- Fixed effects and first-difference estimators are exactly equivalent when T = 2
- This is generally not the case when T > 2
  - Under the null hypothesis that strict exogeneity holds, FE and FD differ only because of sampling error
  - If FE and FD are significantly different (so that the differences in the estimates cannot be attributed to sampling error), this raises concerns about the validity of the strict exogeneity assumption (see Wooldridge 2010, ch. 10)

Random Effects estimators

- POLS is consistent if $c_i$ is uncorrelated with all explanatory variables and the time-varying part of the residual, $u_{it}$, is uncorrelated with all explanatory variables
- But POLS is not efficient, because $c_i + u_{it}$ is serially correlated
- Random effects models:
  - Like POLS, the RE estimator leaves $c_i$ in the error term
  - But RE explicitly deals with the serial correlation in the equation residual ($c_i + u_{it}$)
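One standard way to see how RE handles the serial correlation is the quasi-demeaned form of equation (14). This representation is not on the slides: it is the usual textbook GLS form for a balanced panel, and it assumes that $c_i$ and $u_{it}$ have constant variances, written here as $\sigma_c^2$ and $\sigma_u^2$ (notation introduced for this illustration).

```latex
\begin{aligned}
\theta &= 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_c^2}} \\
y_{it} - \theta\bar{y}_i &= \beta_0 (1-\theta)
  + \beta_1 (x_{1it} - \theta\bar{x}_{1i}) + \dots + \beta_K (x_{Kit} - \theta\bar{x}_{Ki})
  + \big[(1-\theta)c_i + u_{it} - \theta\bar{u}_i\big]
\end{aligned}
```

If $\sigma_c^2 = 0$ then $\theta = 0$ and the transformed equation is just pooled OLS; as $T\sigma_c^2$ grows relative to $\sigma_u^2$, $\theta$ approaches 1 and the transformation approaches the within (FE) transformation. In practice the variances are unknown and $\theta$ is estimated.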

Key assumptions for consistency

- POLS: $c_i$ and $u_{it}$ have to be uncorrelated with the explanatory variables. Strict exogeneity is not required.
- FE: the residual in each period has to be uncorrelated with all explanatory variables across all time periods (strict exogeneity). The individual effects $c_i$ can be correlated with the explanatory variables.
- FD: the explanatory variables in period t have to be uncorrelated with the residual in t − 1, t and t + 1. The individual effects $c_i$ can be correlated with the explanatory variables.
- RE: strict exogeneity and zero correlation between the individual effect and the explanatory variables.

Model selection

- If strict exogeneity holds
  - If the individual effect $c_i$ is correlated with the explanatory variables, use FD or FE
  - If the individual effect $c_i$ is not correlated with the explanatory variables, use RE (more efficient)
- Testing for correlation between the $c_i$ and the explanatory variables
  - Hausman test
    - Null hypothesis not rejected: RE is consistent and should be used
    - Null hypothesis rejected: RE is not consistent; FE or FD should be used
  - The Mundlak-based approach (see next section)
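The Hausman statistic compares the FE and RE estimates relative to the difference in their variances: $H = (b_{FE} - b_{RE})' [V_{FE} - V_{RE}]^{-1} (b_{FE} - b_{RE})$, which is chi-square distributed under the null. The sketch below is a simplified single-coefficient version in plain Python; the function name `hausman_single` and the numbers passed to it are hypothetical and not taken from the course data.

```python
import math


def hausman_single(b_fe, se_fe, b_re, se_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        # Under the null RE is efficient, so Var(FE) - Var(RE) should be positive.
        raise ValueError("Var(FE) - Var(RE) is not positive: statistic undefined")
    h = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


# Hypothetical estimates, for illustration only.
h, p = hausman_single(b_fe=0.30, se_fe=0.10, b_re=0.45, se_re=0.06)
print(f"H = {h:.3f}, p-value = {p:.3f}")
```

The explicit check on the variance difference mirrors a practical caveat: in finite samples the estimated difference need not be positive (or positive definite in the multi-coefficient case), and the test result should then be read with caution.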

- Testing for the presence of an unobserved effect
  - If the regressors are strictly exogenous and $u_{it}$ is non-autocorrelated and homoskedastic, then POLS and RE are both efficient if the unobserved effect $c_i$ is constant across all individuals
    - → There are no time-invariant unobserved effects
  - If $c_i$ varies across individuals, then only RE is efficient
  - Lagrange multiplier (LM) test due to Breusch and Pagan (1980):
    - xttest0 following RE estimation
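As a rough illustration of what the LM test measures, the sketch below computes the textbook Breusch-Pagan statistic for a balanced panel from pooled OLS residuals, $LM = \frac{NT}{2(T-1)}\left[\frac{\sum_i (\sum_t \hat{v}_{it})^2}{\sum_i \sum_t \hat{v}_{it}^2} - 1\right]^2$, which is chi-square with 1 degree of freedom under the null of no individual effect. The function name `breusch_pagan_lm` and the residual values are hypothetical; this is not a reproduction of what xttest0 does internally.

```python
import math


def breusch_pagan_lm(residuals):
    """LM statistic from pooled OLS residuals of a balanced panel.

    residuals: list of N lists, each holding the T residuals of one individual.
    """
    n = len(residuals)
    t = len(residuals[0])
    if t < 2 or any(len(r) != t for r in residuals):
        raise ValueError("need a balanced panel with T >= 2")
    sum_of_squared_sums = sum(sum(r) ** 2 for r in residuals)
    sum_of_squares = sum(e ** 2 for r in residuals for e in r)
    return n * t / (2 * (t - 1)) * (sum_of_squared_sums / sum_of_squares - 1) ** 2


# Hypothetical residuals: each individual's residual is the same in both waves.
persistent = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
# Hypothetical residuals with no within-individual correlation.
unrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

lm_persistent = breusch_pagan_lm(persistent)
lm_unrelated = breusch_pagan_lm(unrelated)
p_persistent = math.erfc(math.sqrt(lm_persistent / 2))  # chi-square(1) p-value
print(lm_persistent, p_persistent, lm_unrelated)
```

The statistic is large when residuals of the same individual move together across waves, which is the footprint of a time-invariant $c_i$ left in the pooled error.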


Pooled OLS

- Now six waves of data, using the firm-level data from Ghana

Fixed effects


Random effects


Choosing between POLS and RE (Breusch-Pagan)

- We reject the hypothesis that there are no time-invariant unobserved effects
- → RE is more efficient

Choosing between FE and RE (Hausman test)

- The RE specification can be justified, since the p-value associated with the Hausman test is higher than 0.10
- However, ‘V_b − V_B is not positive definite’ and the RE standard error associated with wave 3 is higher than its FE counterpart → not in line with the assumption that RE is more efficient than FE → we should interpret the Hausman test with caution

Choosing between FE and RE (Mundlak-based approach)

- Method:
  - Add the individual-specific means of the explanatory variables to the baseline specification and estimate by OLS
  - Then test (using an F-test) the joint significance of these means
  - If we reject the hypothesis that the coefficients on all the mean variables are equal to zero → we reject the hypothesis that $c_i$ is uncorrelated with the explanatory variables $x_{jit}$ → FE instead of RE
- Here: we reject the hypothesis that $c_i$ is uncorrelated with the explanatory variables → FE estimates
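The method can be sketched on a toy panel with one regressor. This is a minimal illustration in plain Python, not the estimation run in the course: the data, the helper `ols`, and all variable names are hypothetical, and the F-test uses the classical formula based on residual sums of squares, which assumes homoskedastic, uncorrelated errors (with real panel data a cluster-robust test would be the safer choice).

```python
def ols(X, y):
    """Return OLS coefficients and residual sum of squares via the normal equations."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


# Hypothetical toy panel: individual id -> list of (x, y), one pair per wave.
panel = {
    "a": [(1.0, 2.0), (2.0, 2.4), (3.0, 3.1)],
    "b": [(4.0, 6.1), (5.0, 6.4), (6.0, 7.0)],
    "c": [(7.0, 10.0), (8.0, 10.6), (9.0, 10.9)],
    "d": [(2.0, 3.2), (4.0, 4.1), (3.0, 3.8)],
}

rows = []  # (x_it, individual mean of x, y_it)
for obs in panel.values():
    x_bar = sum(x for x, _ in obs) / len(obs)
    rows += [(x, x_bar, y) for x, y in obs]

y = [r[2] for r in rows]
X_baseline = [[1.0, r[0]] for r in rows]         # constant and x
X_mundlak = [[1.0, r[0], r[1]] for r in rows]    # plus the individual mean of x

_, rss_baseline = ols(X_baseline, y)
beta, rss_mundlak = ols(X_mundlak, y)

# F-test of H0: the coefficient on the mean variable is zero (q = 1 restriction)
n, k, q = len(y), 3, 1
f_stat = ((rss_baseline - rss_mundlak) / q) / (rss_mundlak / (n - k))

# Within (FE) slope, for comparison with the coefficient on x
num = sum((x - xb) * y_it for x, xb, y_it in rows)
den = sum((x - xb) ** 2 for x, xb, _ in rows)
within = num / den

print("coefficient on x:", beta[1], "within slope:", within)
print("coefficient on mean of x:", beta[2], "F statistic:", f_stat)
```

In this regression the coefficient on $x_{it}$ coincides with the within estimate, while the coefficient on the individual mean picks up the correlation between $c_i$ and the regressor; a large F statistic therefore points towards FE.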

What drives the productivity of Ghanaian firms?

- If you believe that RE or POLS are consistent and efficient: labour productivity is driven primarily by differences in capital intensity. Human capital plays some role, although its level of significance varies across estimators
- If you believe that only FE can be used: only time-invariant unobservables drive labour productivity
  - It is an economist's way of saying, “I don't know”!
- In this chapter, we assume that the $x_{it}$ are uncorrelated with the $u_{it}$
  - If this is not the case: endogeneity → Instrumental Variables → next chapter