Programme evaluation, Matching, RDD - Rémi Bazillier

M2R "Development Economics"
Empirical Methods in Development Economics
Université Paris 1 Panthéon Sorbonne

Programme evaluation, Matching, RDD
Rémi Bazillier [email protected]

Semester 1, Academic year 2016-2017

1 / 71

Introduction

- Programme evaluation: identify the causal effects of a "treatment" or a "programme".
- For an individual i with observed characteristics x_i assigned to treatment w ∈ {0, 1} and with observed outcome y_i, what would individual i have looked like if they had received treatment w′ instead?
- See the Roy-Rubin model (ch. 1).

2 / 71

The Roy-Rubin model

- The evaluation problem can be divided into two distinct parts:
  - a set of potential outcomes;
  - an assignment mechanism that assigns each unit to one and only one treatment at each point in time.
- The fundamental problem of causal inference (Holland 1986):
  - At any point in time, only one of these potential outcomes will actually be observed, depending on the assignment mechanism.
  - For individuals whom we observe under treatment, we have to form an estimate of what they would have looked like if they had not been treated.

3 / 71

- The observed difference in outcomes between the treated and the non-treated can be decomposed into the difference in outcomes for those who are treated (the average treatment effect on the treated, ATT) plus the difference in the potential outcome without treatment between those who actually received treatment and those who did not (the selection bias).
- When potential outcomes are uncorrelated with treatment status, the selection bias is equal to zero (the case of randomized experiments).

4 / 71
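Written out explicitly, the decomposition above follows from adding and subtracting the untreated potential outcome of the treated, E[Y(0) | T = 1]:

```latex
\begin{aligned}
\underbrace{E[Y \mid T=1] - E[Y \mid T=0]}_{\text{observed difference}}
&= \underbrace{E[Y(1)-Y(0) \mid T=1]}_{\text{ATT}}
 + \underbrace{E[Y(0) \mid T=1] - E[Y(0) \mid T=0]}_{\text{selection bias}}
\end{aligned}
```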

- Measures of impact:
  - Average Treatment Effect (ATE): τ_ATE = E[τ] = E[Y(1) − Y(0)].
  - Average Treatment Effect on the Treated (ATT): τ_ATT = E[Y(1) − Y(0) | T = 1].
- τ_ATE = τ_ATT when there is no selection bias:
  - this holds when treatment assignment is uncorrelated with potential outcomes (the hypothesis of unconditional unconfoundedness);
  - it implies the zero conditional mean assumption (see ch. 2);
  - unconfoundedness will hold if we carry out an experiment in which we randomly select the treatment group from the population of interest (a randomized controlled trial, RCT).

5 / 71
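As a toy illustration of why randomization removes the selection bias, here is a minimal simulation under an assumed data-generating process (made up for this sketch, not from the slides): under random assignment, the simple difference in means recovers the ATE.

```python
import random
from statistics import mean

random.seed(0)

# Assumed toy DGP: constant treatment effect of 2, so ATE = ATT = 2.
n = 20000
y0 = [random.gauss(10, 1) for _ in range(n)]   # potential outcome if untreated
y1 = [y + 2.0 for y in y0]                     # potential outcome if treated

# Random assignment: T is independent of (Y(0), Y(1)).
t = [random.random() < 0.5 for _ in range(n)]
observed_treated = [y1[i] for i in range(n) if t[i]]
observed_control = [y0[i] for i in range(n) if not t[i]]

# Difference in means: unbiased for the ATE, selection bias is zero.
print(round(mean(observed_treated) - mean(observed_control), 1))  # ≈ 2.0
```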

Selection on observables

- The simple difference-in-means estimator will be biased if there is selection into treatment.
- However, it is possible to obtain an unbiased estimate of the average treatment effect if the selection is on observables.
- A simple example: differences in achievement between students in private and public schools.
  - Obviously, the "treatment" (attending a private school) will be confounded by other factors (like the wealth of the household).
  - The conditional independence assumption (CIA), also called "ignorability of treatment" or "unconfoundedness", states that the potential outcomes are independent of actual treatment status, conditional on a vector of observables.
  - In that case, the ATE and ATT are the same and we can identify these effects under the overlap assumption.

6 / 71

Overlap (or common support) assumption

- Overlap holds if we have both treated and untreated individuals, conditional on the vector of observables.
- To obtain the ATE, we need to be able to evaluate:
  E(Achievement | Wealth = high, School = Private)
  E(Achievement | Wealth = high, School = State)
  E(Achievement | Wealth = low, School = Private)
  E(Achievement | Wealth = low, School = State)
- But if no household with low wealth goes to private school, we have Pr(School = Private | Wealth = low) = 0, so E(Achievement | Wealth = low, School = Private) cannot be evaluated.
- The overlap assumption is violated: we cannot obtain the ATE over the whole range of wealth.

7 / 71
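The cell-by-cell logic above can be checked mechanically. A minimal sketch with made-up (wealth, treated) pairs (hypothetical data, not the schooling example's actual counts):

```python
from collections import Counter

# Hypothetical sample: no low-wealth household attends a private school.
sample = [
    ("high", 1), ("high", 1), ("high", 0), ("high", 0),
    ("low", 0), ("low", 0), ("low", 0),
]

counts = Counter(sample)  # (wealth, treated) -> number of observations
overlap = {w: counts[(w, 1)] > 0 and counts[(w, 0)] > 0 for w in ("high", "low")}
print(overlap)  # {'high': True, 'low': False}: no ATE over the whole range
```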

- The condition of strongly ignorable treatment assignment (Rosenbaum and Rubin, 1983): the combination of the assumptions of unconfoundedness and overlap.
- In that case, we can estimate the average treatment effect without bias by regressing the outcome variable y on the treatment dummy w and the observable variable x.
  - This implies that there is no omitted variable! (not very likely...)
- If we want to allow for the possibility that the effect of the observable differs across the groups, we would need to include in the regression the interaction term between w and x.

8 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Dehejia and Wahba (1999) want to estimate the impact of a labor training program on post-intervention income levels.
- More precisely, they aim at showing that a matching approach offers estimates that are close to those stemming from a randomized experiment...
- ... provided:
  - unconfoundedness is credible (the dataset is rich: it allows one to control for key pre-intervention variables or variables that are fixed over time);
  - common support (overlap) holds.

9 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- To achieve their objective, Dehejia and Wahba (1999) rely on two groups of individuals:
  - those who are treated (data for this group stem from Lalonde (1986)'s experimental dataset called NSW (National Supported Work); this dataset is based on a randomized experiment);
  - those who are untreated (data related to this group notably stem from the PSID (Panel Study of Income Dynamics): they are observational).
- The NSW-PSID dataset is a combination of these two sources of data.
- The outcome of interest is the earnings of individuals in 1978 (post-intervention), in 1978 dollars.

10 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Lalonde (1986) estimates the following equation, based on NSW data (i.e. data stemming from a randomized experiment):

  Y78 = α + τT + u.   (1)

- What does τ stand for?

11 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Given that NSW data are experimental, one can write:
  - E[Y | T = 1] = E[Y(1)] = α + τ (the average outcome for treated individuals when the treatment is random);
  - E[Y | T = 0] = E[Y(0)] = α (the average outcome for untreated individuals when the treatment is random).
- Indeed, E[u | T] = 0.

12 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Hence, τ = E[Y(1)] − E[Y(0)] = E[Y(1) − Y(0)] = ATE.
- Lalonde (1986) finds an ATE of 1,794 dollars (OLS estimation) that is significant at the 1% level.
- Let's now rely on the NSW-PSID dataset and estimate Equation (1) with OLS.

13 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Coefficient τ is equal to −15,204.78 (and significant at the 1% level)!
- Do you see why?

14 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- This is because the assignment of individuals to the treated and to the non-treated group is not random anymore!
- We are now working on observational, not experimental, data!
- Put differently, there are characteristics (such as being unemployed before the intervention) that:
  - positively impact the probability of being enrolled in the labor training program;
  - negatively impact post-treatment earnings.

15 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Therefore, even in the absence of the labor training program, those who enrolled in this program would anyway have ended up with lower post-treatment earnings compared to those who did not enroll.
- Hence, the selection bias captured by E[Y(0) | T = 1] − E[Y(0) | T = 0] is negative... it runs against us finding a positive impact of the labor training program on post-treatment earnings.
- How could we reduce this bias?

16 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- One could introduce in Equation (1) the variables that likely influence both treatment assignment and potential outcomes (we denote this set of variables by X):

  Y78 = α + τT + X′β + u.   (2)

- Assume that you convince the reader that all of these variables are included, and hence that the unconfoundedness assumption is satisfied.
- What does τ capture now?

17 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- We can write:
  - E[Y | T = 1, X] = E[Y(1) | X] = α + τ (the average outcome for treated individuals when the treatment is random conditional on observables);
  - E[Y | T = 0, X] = E[Y(0) | X] = α (the average outcome for untreated individuals when the treatment is random conditional on observables).
- Indeed, E[u | T, X] = 0.

18 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Hence, τ = E[Y(1) | X] − E[Y(0) | X] = E[Y(1) − Y(0) | X] = CATE.
- CATE stands for Conditional Average Treatment Effect.
- What is the OLS estimate of τ equal to when X includes a large set of pre-intervention variables?

19 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

[Regression output for Equation (2) not reproduced]

20 / 71

2. The matching approach
2.1. Matching based on a multiple linear regression...

- Coefficient τ is now equal to +751.95.
- We get closer to Lalonde's estimate.
- But we are not quite there: the order of magnitude is much lower, and the estimate is not statistically significant...

21 / 71
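The move from Equation (1) to Equation (2) can be mimicked in a toy simulation (an assumed data-generating process, not the NSW-PSID data): the naive difference in means is biased downward by a confounder, while conditioning on the observable recovers the true effect.

```python
import random
from statistics import mean

random.seed(1)

# Assumed DGP: x (e.g. pre-treatment unemployment) raises the probability of
# enrolment and lowers earnings; the true treatment effect is 5.
n = 60000
x = [random.random() for _ in range(n)]
t = [1 if random.random() < x[i] else 0 for i in range(n)]
y = [5.0 * t[i] - 10.0 * x[i] + random.gauss(0, 1) for i in range(n)]

# Naive difference in means (Equation (1) logic): biased well below 5.
naive = mean(y[i] for i in range(n) if t[i]) - mean(y[i] for i in range(n) if not t[i])

# Condition on x (Equation (2) logic): compare within narrow strata of x.
strata = {}
for i in range(n):
    strata.setdefault(int(x[i] * 20), {0: [], 1: []})[t[i]].append(y[i])
adjusted = mean(mean(g[1]) - mean(g[0]) for g in strata.values() if g[0] and g[1])

print(round(naive, 1), round(adjusted, 1))  # naive far from 5, adjusted ≈ 5
```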

2. The matching approach
2.2. ... is problematic

- This is because common support usually does not hold when one implements matching based on a multiple linear regression.
- In the NSW-PSID dataset, which contains only few observations (N = 2,675), the common support assumption already fails to hold when we control for only one critical observable: education.
- Let's type the following command in Stata:

  twoway (scatter re78 education if treat==0, mcolor(black)) (scatter re78 education if treat==1, mcolor(red)), legend(order(1 "not trained" 2 "trained"))

22 / 71

2. The matching approach
2.2. ... is problematic

[Scatter plot of re78 against education, by treatment status, not reproduced]

23 / 71

2. The matching approach
2.2. ... is problematic

- It is striking that there is no common support for many values of the variable "education".
- This is the case when education is equal to 0, 2, 3, and 17.
- What does τ capture when we only control for education in Equation (2)?

24 / 71

2. The matching approach
2.2. ... is problematic

- For each value of the variable "education", Stata (or any statistical software) computes the difference in outcome (re78) between those who are treated and those who are not treated.
- The parameter τ captures the average of these differences.
- But how can these differences be meaningfully computed for values of the variable "education" where there are no treated observations?

25 / 71

2. The matching approach
2.2. ... is problematic

- The failure of the common support assumption leads to:
  - a biased estimate of the treatment effect (the difference in outcome between those who are treated and those who are not treated cannot always be computed);
  - a large variance of this estimate (since there are, in some instances, no or very few observations to construct the counterfactual).

26 / 71

2. The matching approach
2.2. ... is problematic

- Obviously, ensuring that the common support assumption holds is even more difficult when one controls for a set of observables.
- Assume that one controls for education and married.
- The variable "education" ranges from 0 to 17 while the variable "married" is a dummy.
- This means that:
  - there are now 18 × 2 = 36 different categories of individuals characterized by a specific education and marital status;
  - for each of these categories, we need both treated and non-treated observations!

27 / 71

3. Propensity score matching: theory
3.1. Three, not two, identifying assumptions

- We've just seen that implementing a matching strategy based on a multiple linear regression is clearly not a good approach.
- The alternative is to rely on a balancing score matching approach.
- This approach was defined by Rosenbaum and Rubin (1983):
  - it does not consist in matching treated and non-treated individuals based on a set of observables X;
  - it entails matching treated and non-treated individuals based on only one variable called a balancing score.

28 / 71

3. Propensity score matching: theory
3.1. Three, not two, identifying assumptions

- A balancing score is a function b(X) of X that must satisfy the following balancing assumption: T ⊥ X | b(X).
- This assumption asserts that, conditional on the balancing score, the set of observables X is independent of assignment to the treatment.
- Put differently, for observations with the same balancing score, the distribution of observables is the same across the treatment and the control group.

29 / 71

3. Propensity score matching: theory
3.1. Three, not two, identifying assumptions

- The balancing assumption is important because it ensures that one only needs to match treated and non-treated individuals based on the balancing score (i.e. matching these individuals on a set of observables is not required anymore).
- Rosenbaum and Rubin (1983) show that a possible balancing score is the propensity score.
- The propensity score is the probability for an individual to participate in a treatment given his observed characteristics X.
- It is denoted by P(T = 1 | X) = P(X).

30 / 71

3. Propensity score matching: theory
3.1. Three, not two, identifying assumptions

- An approach that consists in matching treated and non-treated individuals based on the propensity score is called propensity score matching.
- Clearly, for propensity score matching to isolate the treatment effect, three identifying assumptions are needed:
  1. the unconfoundedness assumption: (Y(0), Y(1)) ⊥ T | X;
  2. the common support assumption: 0 < P(X) < 1;
  3. the balancing assumption: T ⊥ X | P(X).
- Note that, like the common support assumption, the balancing assumption can (and must) be tested.

31 / 71

3. Propensity score matching: theory
3.2. Choosing the propensity score function

- The propensity score matching approach builds on the unconfoundedness assumption, which requires that the outcome variable be independent of treatment assignment conditional on observables.
- Hence, implementing this approach requires choosing a set of variables X, as predictors of the probability of being treated, that credibly satisfies this condition.
- Put differently, all the variables that influence both treatment assignment and the outcome variable should be included.

32 / 71

3. Propensity score matching: theory
3.2. Choosing the propensity score function

- However, these variables should be those for which there are no feedback effects.
- They should be those that are unaffected by the treatment (or by the anticipation of the treatment).
- Therefore, they are:
  - either fixed over time;
  - or pre-treatment (i.e. measured before treatment assignment)... but in this case, one must ensure that these pre-treatment variables have not been influenced by the anticipation of participation.

33 / 71

3. Propensity score matching: theory
3.2. Choosing the propensity score function

- Clearly, economic theory, a sound knowledge of previous research, and information about the institutional setting of the policy whose effect is estimated should guide the researcher in building up the model.
- It is important to mention that the final specification chosen by the researcher should be the one that:
  - satisfies the conditions mentioned above;
  - ensures that the common support and the balancing assumptions are satisfied.
- Remark: a discrete choice model (logit or probit) must be used to estimate the propensity score function, since the dependent variable (being treated or not) is binary.

34 / 71

3. Propensity score matching: theory
3.3. Computing the ATT

- Propensity score matching consists in computing the average difference between:
  - the mean outcome of treated individuals characterized by a specific propensity score;
  - and the mean outcome of untreated individuals characterized by a similar propensity score.
- Hence, the ATT, not the ATE, is estimated (since individuals are matched based on the propensity scores of the treated).

35 / 71

4. Propensity score matching: practice
4.1. Step 1: Questioning the plausibility of the unconfoundedness assumption

- What are the variables that influence both treatment assignment and potential outcomes?
- Is your dataset rich enough to control for all of them?
- If yes, proceed to Step 2.

36 / 71

4. Propensity score matching: practice
4.2. Step 2: Estimating the propensity score function

- Estimate the probability of getting the treatment as a function of variables (either fixed over time or truly pre-treatment) that influence both treatment assignment and potential outcomes.
- Rely on a logit (or probit) model.

37 / 71

4. Propensity score matching: practice
4.3. Step 3: Generating propensity scores

- Do so for all treated and non-treated observations.
- After your logit estimation, type the following command:

  predict pscore, pr

38 / 71

4. Propensity score matching: practice
4.4. Step 4: Testing the common support assumption

- First, discard:
  - observations (if any) in the control group whose propensity score is less than the minimum, or greater than the maximum, propensity score in the treatment group;
  - observations (if any) in the treatment group whose propensity score is less than the minimum, or greater than the maximum, propensity score in the control group.

39 / 71
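The min/max discard rule above can be sketched as follows, with made-up propensity scores (this is an illustration, not the psmatch2 implementation):

```python
# Hypothetical propensity scores for the two groups.
treated = [0.35, 0.50, 0.62, 0.80, 0.95]
control = [0.05, 0.20, 0.40, 0.55, 0.70]

lo = max(min(treated), min(control))   # keep scores above both group minima...
hi = min(max(treated), max(control))   # ...and below both group maxima

treated_kept = [p for p in treated if lo <= p <= hi]
control_kept = [p for p in control if lo <= p <= hi]

print(treated_kept)  # [0.35, 0.5, 0.62]  (0.80 and 0.95 discarded)
print(control_kept)  # [0.4, 0.55, 0.7]   (0.05 and 0.20 discarded)
```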

4. Propensity score matching: practice
4.4. Step 4: Testing the common support assumption

- Second, rely on the histogram command to graphically test, for each stratum (i.e. interval) of your propensity score, whether there are observations both in the treated and in the non-treated group.
- Third, rely on the tabulate command to numerically test the common support assumption.
- If the common support assumption is violated, go back to Step 2.

40 / 71

4. Propensity score matching: practice
4.5. Step 5: Testing the balancing assumption

- For each stratum of your propensity score, and for each observable characteristic used to estimate the propensity score function, run a difference-of-means analysis across treated and non-treated observations.
- Rely on the ttest command.
- If the balancing assumption is violated, go back to Step 2.

41 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- To do so, rely on the psmatch2 command.
- psmatch2 is being continuously improved and developed.
- Make sure to keep your version up to date by typing the following command:

  ssc install psmatch2, replace

42 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- The psmatch2 command allows one to perform many matching methods (type help psmatch2 for a full description).
- The most straightforward matching estimator is nearest-neighbour ("NN" hereafter) matching.
- An individual from the comparison group is chosen as a matching partner for a treated individual because it is the closest in terms of the propensity score.
- NN matching can be without or with replacement.

43 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- Without replacement, an untreated individual can be used only once as a match.
- In this case, the estimate depends on the order in which observations get matched when there is more than one observation with the same propensity score.
- Hence, if you want to be able to replicate your results, it is critical to ensure that observations in the dataset are randomly ordered.
- To do so, type the following commands:

  generate random=runiform()
  sort random

44 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- With replacement, an untreated individual can be used more than once as a match.
- Matching with replacement involves a trade-off between bias and variance:
  - the average quality of matching increases and the bias decreases;
  - the number of distinct non-treated individuals used to construct the counterfactual outcome decreases, and therefore the variance increases.

45 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- Conducting NN matching with replacement is of particular interest with data where the propensity score distribution is very different in the treatment and the control group.
- For example, if we have many treated individuals with high propensity scores but only few comparison individuals with high propensity scores, we get bad matches, as some of the high-score treated individuals will get matched to low-score non-treated individuals.
- In this case, NN matching with replacement may be a good solution.

46 / 71

4. Propensity score matching: practice
4.6. Step 6: Matching treated with non-treated individuals based on propensity scores

- Note that one can use more than one nearest neighbour, which is called oversampling (in this case, the matching is performed with replacement).
- Finally, it is worth emphasizing that NN matching faces the risk of bad matches if the closest neighbour is far away.
- This can be avoided by imposing a tolerance level on the maximum propensity score distance (called a caliper):
  - bad matches are avoided and the matching quality rises (bias decreases);
  - however, if fewer matches can be performed (compared to the situation where no tolerance level is imposed), the variance of the estimates increases.

47 / 71
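A minimal sketch of 1-nearest-neighbour matching with replacement and a caliper, on made-up (propensity score, outcome) pairs (an illustration of the idea, not psmatch2 itself):

```python
# Hypothetical (pscore, outcome) data for each group.
treated = [(0.40, 9.0), (0.55, 11.0), (0.90, 15.0)]
control = [(0.38, 8.0), (0.57, 9.5), (0.60, 10.0)]

def att_nn(treated, control, caliper=0.1):
    """ATT from 1-NN matching with replacement, dropping matches beyond the caliper."""
    diffs = []
    for p_t, y_t in treated:
        # Closest control in propensity score; controls may be reused freely.
        p_c, y_c = min(control, key=lambda c: abs(c[0] - p_t))
        if abs(p_c - p_t) <= caliper:
            diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

# The treated unit at 0.90 has no control within the caliper and is dropped.
print(att_nn(treated, control))  # → 1.25
```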

4. Propensity score matching: practice
4.7. Step 7: Ensuring that the balancing assumption is satisfied after matching

- Check whether the observable characteristics of the treated and non-treated individuals paired during the matching procedure are indeed similar.
- To do so, rely on the pstest command (after psmatch2).
- The difference-of-means analysis is provided after matching.
- For good balancing, it should be non-significant.

48 / 71

4. Propensity score matching: practice
4.7. Step 7: Ensuring that the balancing assumption is satisfied after matching

- At the end of the output of pstest, the mean of the absolute value of the "standardized percentage bias" after matching is provided.
- This bias should be less than 5%.
- If it is greater than 5%, go back to Step 6 (find another way of matching treated and non-treated individuals), or to Step 2 if necessary.

49 / 71
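The "standardized percentage bias" is, in the usual Rosenbaum-Rubin formulation, the mean difference of a covariate scaled by the square root of the average of the two group variances. A sketch on made-up covariate values (an illustration of the formula, not pstest itself):

```python
from statistics import mean, variance

def standardized_bias(x_treated, x_control):
    """100 * (mean difference) / sqrt(average of the two group variances)."""
    pooled = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return 100 * (mean(x_treated) - mean(x_control)) / pooled

# Hypothetical matched samples of one covariate (e.g. years of education).
x_t = [10, 11, 12, 12, 13]
x_c = [10, 11, 11, 12, 13]

print(round(standardized_bias(x_t, x_c), 1))  # → 17.5: above the 5% rule of thumb
```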

PSM: summary

- You can think of relying on propensity score matching if a rich and large dataset is available.
- However, this should be your least preferred option.
- The unconfoundedness assumption is indeed very difficult to buy... no empirical research based on PSM is published in the highest-ranking journals anymore.
- At any rate, if you rely on PSM, you must do so in a very rigorous way (i.e. implement each of the steps mentioned above very carefully).

50 / 71

PSM: References

- Dehejia, Rajeev H. and Sadek Wahba. 1999. Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. Journal of the American Statistical Association 94(448): 1053-1062.
- Lalonde, R. 1986. Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76(4): 604-620.
- Rosenbaum, P. and Rubin, D. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70(1): 41-50.

51 / 71

5. Regression Discontinuity Design (RDD)

- The RDD is considered by scholars as an evaluation strategy that provides results which are as compelling as the estimates derived from randomized experiments, which are widely seen as the gold standard of impact evaluation.
- Therefore, it is critical to know the specific features of the RDD, and when and how this valuable evaluation strategy can be used.

52 / 71

Regression Discontinuity Design

1. What are the specific features of the RDD?
2. When can one implement the RDD?
3. How must one implement the RDD?

53 / 71

5.1. What are the specific features of the RDD?

- Like randomized experiments, the RDD consists in comparing two outcome variables: the outcome of individuals who were treated (i.e. who received a treatment) and the outcome of individuals who were not treated (i.e. who did not receive the treatment).
- Yet, the RDD has two specific features.
- First, the treatment of the population depends on whether an observed variable exceeds a critical value denoted c, knowing that this variable is not orthogonal to the observed and unobserved characteristics of individuals.

54 / 71

5.1. What are the specific features of the RDD?

- This variable is called the "assignment" variable or the "forcing" variable.
- Denote this assignment (or forcing) variable by X.
- The first specific feature of the RDD is therefore given by:

  D = 0 if X < c,
  D = 1 if X ≥ c,

  where D = 0 means that the population is not treated and D = 1 means that the population is treated.

55 / 71

5.1. What are the specific features of the RDD?

- For instance, the RDD was first used by Donald L. Thistlethwaite and Donald T. Campbell (Journal of Educational Psychology, 1960) in order to analyze the impact of merit awards on future academic outcomes (career aspirations, enrollment in postgraduate programs, etc.).
- The RDD was particularly suitable for this research objective since the allocation of the merit awards depended on individuals' observed test scores:
  - students with test scores X greater than or equal to a cutoff value c received the award;
  - students with test scores X below the cutoff c were denied the award.

56 / 71

5.1. What are the specific features of the RDD?

- Clearly, the assignment (or forcing) variable consisting in individuals' test scores is not orthogonal:
  - either to individuals' observed characteristics (for instance, the socio-economic status of their parents);
  - or to individuals' unobserved characteristics (for instance, their intelligence quotient, their "taste for working", etc.).

57 / 71

5.1. What are the specific features of the RDD?

- Second, the RDD estimates the causal impact of the treatment (e.g. receiving the merit award) on the outcome (e.g. academic achievement) by computing the difference between the outcome of treated individuals with an X located just above c and the outcome of non-treated individuals with an X located just below c.

58 / 71

5.1. What are the specific features of the RDD?

- Assume that the relationship between y (academic achievement) and X (test scores) is as follows: [figure not reproduced]

59 / 71

5.1. What are the specific features of the RDD?

- To measure the causal impact of the merit award, the RDD tries to compute, for the same individual, the y he would get with and without the treatment.
- To do so, the RDD focuses on individuals who score c and reasons as follows:
  - B′ (related to a score c′ located just above c) is a reasonable guess for the value of y of the individual scoring c in case he receives the treatment;
  - A′′ (related to a score c′′ located just below c) is a reasonable guess for the value of y of the individual scoring c in the counterfactual case where he does not receive the treatment.
- As a consequence, the RDD considers B′ − A′′ = τ as the causal impact of merit awards on academic achievement.

60 / 71
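A minimal sharp-RDD sketch under an assumed data-generating process (made up for this illustration, not the merit-award data): the jump at the cutoff is recovered by comparing mean outcomes just above and just below c.

```python
import random
from statistics import mean

random.seed(2)

# Assumed DGP: y jumps by tau = 4 at the cutoff c, on top of a smooth trend in X.
c, tau, n = 50.0, 4.0, 200000
x = [random.uniform(0, 100) for _ in range(n)]
y = [0.1 * xi + tau * (xi >= c) + random.gauss(0, 1) for xi in x]

# Compare outcomes just above and just below the cutoff (bandwidth h = 1).
h = 1.0
above = [y[i] for i in range(n) if c <= x[i] < c + h]   # B': barely treated
below = [y[i] for i in range(n) if c - h <= x[i] < c]   # A'': barely untreated

print(round(mean(above) - mean(below), 1))  # ≈ tau (plus a small trend term)
```

In practice one would fit separate local regressions on each side of c rather than raw window means, to absorb the trend in X; the window comparison keeps the sketch minimal.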

5.2. When can one implement the RDD?

- Two conditions must be satisfied.
- First, and obviously, the treatment of the population must depend on whether an observed variable exceeds a critical value denoted c.
- Second, for τ to be considered as capturing the impact of the merit award on academic achievement, one must make sure that individuals do not have precise control over the assignment (or forcing) variable.

61 / 71

5.2. When can one implement the RDD?

- If individuals have precise control over the assignment (or forcing) variable, individuals of different types (characterized by different sets of observed and unobserved characteristics) will reach distinct outcomes.
- More precisely, individuals on one side of the cutoff c (i.e. at X = c′′ = c − e when e → 0) will be systematically different from those on the other side (i.e. at X = c′ = c + e when e → 0), both with respect to observed and unobserved characteristics.
- Let's call individuals who, given their characteristics, reach X = c′′ for sure "type A individuals", and individuals who, given their characteristics, reach X = c′ for sure "type B individuals".

62 / 71

5.2. When can one implement the RDD?

- Put differently, when individuals have precise control over the assignment (or forcing) variable, it is not possible to attribute the jump in y (the fact that y is a discontinuous function of the test score) to the impact of the merit award only.
- Indeed, in that case the jump in y also reflects the jump in individuals' observed and unobserved characteristics.

63 / 71

5.2. When can one implement the RDD?

- On the contrary, if individuals have no precise control over the assignment (or forcing) variable, then τ can be considered as the impact of the merit award on academic achievement.
- The expression "no precise control" means that individuals have only imprecise control over the assignment (or forcing) variable.

64 / 71

5.2. When can one implement the RDD?

- In other words, among those scoring near the threshold, it is a matter of "luck" as to which side of the threshold they land on.
- Put differently, type A individuals have the same probability as type B individuals of being just above rather than just below the threshold.
- This allows us to say that those who marginally fail (those with a grade just below the cutoff) and those who marginally pass (those with a grade just above the cutoff) are identical.
- This is the reason why, if the "no precise control" assumption is satisfied, the RDD is considered a local randomized experiment: in the neighborhood of the cutoff, being treated or not is orthogonal to observed and unobserved characteristics.

65 / 71

5.3. RDD in development economics papers
Asadullah (2005), "The effect of class size on student achievement: evidence from Bangladesh", Applied Economics Letters

- In Bangladesh, a Ministry of Education (MoE) circular maintains that registered secondary schools can recruit a new teacher if class enrolment exceeds 60.
- Such a teacher allocation rule results in an abrupt drop in class size whenever observed grade enrolment exceeds 60 or an integer multiple of 60 → a discontinuity.
- The true causal effect is recoverable if one uses the class size predicted by the rule as an instrument for observed class size in the achievement function.
- The outcome is the aggregate pass rate.

66 / 71

5.3. RDD in development economics papers
Asadullah (2005), "The effect of class size on student achievement: evidence from Bangladesh", Applied Economics Letters

  P_j = α + θ Comp_j + δ E_j^10 + β_IV ĈS_j^10 + Σ_i φ_i SchType_ij + e_j

where P_j is the aggregate pass rate in the SSC examination in the j-th school (the fraction of grade-10 students passing the examination by securing more than 60% marks); Comp_j is a competition index; E_j^10 is total enrolment in grade 10; ĈS_j^10 is the instrumented class size for grade 10 in the j-th school; and SchType_ij is the school type (public, private aided, girls, boys, co-education, double shift) of the j-th school.

- The instrument for class size, P-Csize_j, is a prediction of class size as a discontinuous function of E_j^10:

  P-Csize_j = E_j^10 / ([(E_j^10 − 1)/C^max] + 1),

  where [·] denotes the integer part and C^max = 60.

67 / 71
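The predicted-class-size instrument can be sketched as follows (an Angrist-Lavy-style rule with a 60-pupil maximum, as implied by the MoE circular; the bracket in the formula is read as the integer part):

```python
# Predicted class size as a discontinuous function of enrolment.
def predicted_class_size(enrolment, c_max=60):
    # Number of classes the rule implies, then average pupils per class.
    n_classes = (enrolment - 1) // c_max + 1
    return enrolment / n_classes

print(predicted_class_size(60))   # → 60.0 (one class)
print(predicted_class_size(61))   # → 30.5 (discontinuous drop: two classes)
print(predicted_class_size(121))  # ≈ 40.3 (three classes)
```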

68 / 71

5.3. RDD in development economics papers
Edmonds (2004), "Does Illiquidity Alter Child Labor and Schooling Decisions? Evidence from Household Responses to Anticipated Cash Transfers in South Africa", NBER WP 10265

- The paper studies the response of child labour supply and school attendance to anticipated social pension income in South Africa.
- Pension benefits are largely determined by age for the black population (extension of the Old Age Pension (OAP) program after the end of apartheid).
- The paper uses the age discontinuity in the pension benefit formula for identification.
- More precisely, it examines the response of child labour to the timing of income by comparing child labour supply and schooling in households that are eligible for the OAP to households that are not eligible.
- The paper finds large changes in child labor and schooling when an anticipated income is received, a finding consistent with the presence of liquidity constraints.

69 / 71

70 / 71

71 / 71

72 / 71

Conclusion

- The RDD provides a highly credible and transparent way of estimating treatment effects.
- You can rely on it as a substitute for randomized experiments as soon as there exists an assignment (or forcing) variable over which individuals have no precise control.
- Think about using it as an impact evaluation strategy for your Master 2 dissertation!
- For examples of applications of the RD design in economics, see pages 339 to 342 of Lee and Lemieux (Journal of Economic Literature, 2010).

73 / 71