Robust Hedonic Price Regressions - Olivier Gergaud's Homepage

Robust Hedonic Price Regressions: Unmasking Outliers is not Witchcraft

C. Dehon*, O. Gergaud† and V. Verardi‡

July 27, 2006

Abstract

Hedonic regressions are intensively used in economics to decompose the price of a good by valuing each of its constituent characteristics independently. The estimations are generally run using ordinary least squares (OLS). Unfortunately, since some individuals (which we call superstars) are often rewarded at higher levels than their competitors, OLS estimates will be distorted by the presence of these "outliers", leading to non-representative results. In this paper we present a methodology to overcome this problem using two robust estimation methods (LTS and M-GM). To illustrate our procedure, we use an original dataset on the Harry Potter trading card game and show that the suggested methodology should be preferred to OLS.

Keywords: Robust Statistics, Hedonic Pricing Models, Superstars

JEL Classification codes: C49, D49, Z11

* ECARES (European Center for Advanced Research in Economics and Statistics). CP 114, Av. F.D. Roosevelt, 50. B-1050 Brussels, Belgium. Tel: +32-2-6503858. Email: [email protected].

† OMI, Université de Reims Champagne-Ardenne and TEAM, Université de Paris I. Address: 57 bis, rue Pierre Taittinger, 51096 Reims Cedex, France. Tel.: +33 (0) 3.26.91.38.56, Fax: +33 (0) 3.26.91.38.69, E-mail: [email protected].

‡ ECARES (European Center for Advanced Research in Economics and Statistics) and CEE (Centre de l'Economie de l'Education). CP 139, Av. F.D. Roosevelt, 50. B-1050 Brussels, Belgium. Tel: +32-2-6504498. E-mail: [email protected].

1 Introduction

Rosen's (1974) Hedonic Price Method (HPM), originally designed for the pure competition case, has been intensively used to model prices of goods and services for which information is imperfect, quality is subjective and talent is not fully observable [1]. Some extreme examples are Harchaoui and Hamdad (2000) in the field of music, Combris et al. (1997) in that of wines, Chanel et al. (1996) for paintings and Gergaud and Verardi (2006a) for restaurants. Interestingly, in all of these examples, some cultural icons (or superstars) emerge and enjoy extreme incomes (or prices): for example, Madonna earned $50 million in 2005 [2], Le garçon à la pipe by Picasso is valued at $93 million [3], a bottle of Château Margaux 1900 can sell for $7,600 [4] and a meal for one at the restaurant Plaza Athénée (in Paris) costs approximately 300 euros [5].

[1] Throsby (1983) defines objective quality for the live performing arts by a vector of characteristics including the repertoire classification, standards of performance, production and design, standards of comfort, seating, acoustics, etc. Hamlen (1991, 1994) even used the vibrato of pop singers as an index of their objective talent.

[2] According to Forbes' Celebrity 100 list.

[3] According to Artprice.com.

[4] According to Decanter.com.

[5] According to Alain-ducasse.com.

The emergence of superstars has been analyzed intensively, both theoretically and empirically. From a theoretical viewpoint, Rosen (1981), Adler (1985) and MacDonald (1988) explain it either by a higher level of talent or by the need of consumers to share a common culture. Empirically, several articles in Arts and Sports [6] and more recently even in Leisure Games [7] have tested which of the two competing theories (Rosen-MacDonald vs. Adler) fits best, leading to the conclusion that these theories are complements rather than substitutes.

Surprisingly, the way in which the existence of superstars affects the intensively used HPM methodologically has not really been studied in depth. In HPM modeling, the idea is to measure the implicit prices that buyers attach to the attributes of vertically differentiated products. Econometrically, it consists in running a regression of the observed price $p$ on the good's objective characteristics $z = (z_1, z_2, \ldots, z_n)$, where $z_j$ measures the amount of the $j$th characteristic contained in the good. As mentioned by Lévy-Garboua and Montmarquette (2003), "Objective characteristics have been extensively used as regressors in hedonic price functions (Rosen 1974) but they fall short in the prediction of superstars à la Rosen (1981) and MacDonald (1988)". Indeed, if superstars exist, they are, by nature, outliers and attract the OLS regression hyperplane towards them, thereby biasing the estimates. Diewert (2003) even states that the presence of "outliers" is one of the most important issues that needs to be resolved before hedonic regressions can be routinely applied.

So, how should superstars be treated? The idea is to identify superstars and weight them differently from the bulk of the data. Unfortunately, classical methods for outlier diagnostics [8] fail to identify them properly and robust methodologies are needed. In this paper, we propose some robust methods to both clearly identify outliers and correctly estimate the coefficients in a hedonic pricing framework. We will concentrate on methodologies that are well suited to deal with the fact that, quite often, a large proportion of explanatory variables in HPM are qualitative. To illustrate our procedure, we use an original dataset on a good for which all characteristics are fully observable and price is common knowledge; this allows us to concentrate on the specific effect of outliers while avoiding the problems related to the difficulties in measuring talent and obtaining data on incomes. The dataset is based on the Harry Potter trading card game (Harry Potter TCG).

The paper is organized as follows: Section 2 presents the methodology needed to estimate hedonic price models taking into account the existence of outliers, Section 3 describes the data and Section 4 presents the results. Finally, Section 5 concludes.

[6] See e.g. Lucifora and Simmons (2003), Blass (1992), Chung and Cox (1994) and Hamlen (1991, 1994).

[7] Gergaud and Verardi (2006b).

[8] Such as standardized residuals, studentized residuals, Cook's distances, leverage, etc.

2 The methodology

As stated in the introduction, hedonic pricing modeling consists in regressing the price of a good on its attributes. More specifically, it consists in estimating a relation of the type

\[ p_i = z_{1i}'\beta_1 + z_{2i}'\beta_2 + \varepsilon_i, \qquad i = 1, \ldots, n \tag{1} \]

where $z_1$ is a set of qualitative variables and $z_2$ is a set of quantitative variables, and $(z_1, z_2)$ are independent of the error term $\varepsilon$. The variance of $\varepsilon$ is denoted by $\sigma^2$.
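As an illustration of relation (1), the following minimal sketch fits the model by OLS on simulated data with one qualitative attribute (a dummy) and one quantitative attribute, and then refits it after a handful of items receive a large "superstar" premium. All numerical values (sample size, coefficients, noise level, size of the premium) are illustrative assumptions and are not taken from the dataset used in this paper.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients via NumPy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
z1 = rng.integers(0, 2, size=n).astype(float)  # qualitative attribute (dummy)
z2 = rng.normal(5.0, 1.0, size=n)              # quantitative attribute
X = np.column_stack([np.ones(n), z1, z2])      # intercept, z1, z2

true_beta = np.array([1.0, 2.0, 0.5])          # assumed "true" implicit prices
p = X @ true_beta + rng.normal(0.0, 0.3, size=n)

# OLS on clean data recovers the implicit prices reasonably well.
beta_clean = ols(X, p)

# Give ten items with z1 = 1 a large premium unrelated to their attributes.
p_star = p.copy()
stars = np.where(z1 == 1)[0][:10]
p_star[stars] += 50.0
beta_star = ols(X, p_star)

print("clean data     :", np.round(beta_clean, 2))
print("with superstars:", np.round(beta_star, 2))
```

In this simulated setting the estimated implicit price of the qualitative attribute is pulled upwards by the few superstar observations, even though the attribute itself is valued identically for the bulk of the data.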

As mentioned above, it is well known that the classical OLS estimator is not robust to the presence of outliers. Since outliers appear in almost all hedonic pricing models, due to the existence of superstars, we suggest that OLS cannot be used and propose some alternative methods based on robust estimators. Two estimators will be considered: the first is the Least Trimmed Squares (LTS) estimator, proposed by Rousseeuw (1983), given by

\[ (\hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta_1, \beta_2} \sum_{i=1}^{[(1-\alpha)n]} r_{(i)}^2 \tag{2} \]

where $r_i = y_i - \hat{y}_i$ are the residuals, $r_{(1)}^2 \le \ldots \le r_{(n)}^2$ are the ordered squared residuals and $[(1-\alpha)n]$ is the number of observations on which the estimation is run ($\alpha$ is the percentage of trimming (with 0 [...]
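The LTS criterion in (2) can be illustrated with a minimal sketch. The text does not specify how the minimization is carried out, so the search strategy below (random elemental starts followed by concentration steps that refit on the observations with the smallest squared residuals) is an assumption made for illustration only, as are the function name `lts_fit`, its parameters and the simulated data.

```python
import numpy as np

def lts_fit(X, y, alpha=0.25, n_starts=200, n_csteps=20, seed=0):
    """Approximate LTS fit: minimize the sum of the h smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    h = int(np.floor((1.0 - alpha) * n))       # number of observations kept
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=k, replace=False)   # random elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            r2 = (y - X @ beta) ** 2
            new_subset = np.argsort(r2)[:h]    # keep the h best-fitted points
            if np.array_equal(np.sort(new_subset), np.sort(subset)):
                break                          # subset no longer changes
            subset = new_subset
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()    # trimmed sum of squares
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj

# Simulated example: 15% of the observations are vertical outliers.
rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)
y[:15] += 30.0

beta_lts, obj_lts = lts_fit(X, y, alpha=0.25)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("LTS:", np.round(beta_lts, 2), " OLS:", np.round(beta_ols, 2))
```

With many dummy regressors, small random subsamples can easily be collinear; the sketch relies on the minimum-norm least-squares solution in that case and makes no attempt to handle qualitative variables specifically.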
[...] 1) quantitative variables are present, $d_i$ is defined as

\[ d_i = \frac{\left| \tilde{z}_{2i}(a) - z_{1i}'\hat{\beta}_1^M \right|}{s\left( \tilde{z}_2(a) - z_1'\hat{\beta}_1^M \right)} \tag{7} \]

where $\tilde{z}_{2i}(a) = z_{2i} a$ is the projection for each direction $a \in \mathbb{R}^{p_2}$. Finally, the distances are standardized by a robust measure of scale $s$. As in Maronna and Yohai (2000), the scale estimator we use is the slight modification of the median absolute deviation (MAD) proposed by Tyler (1994).

More generally, OLS, LTS, M and M-GM can be seen as solutions to

\[ \arg\min_{\beta_1, \beta_2} \sum_{i=1}^{J} \left[ \rho\left( r(\beta_1, \beta_2) \right) \right]_{(i)} w_i \tag{8} \]

where $\rho(r(\beta_1, \beta_2))_{(1)} \le \ldots \le \rho(r(\beta_1, \beta_2))_{(n)}$ are the ordered transformed residuals. For OLS and LTS, the weights attributed to the transformed residuals are equal for all individuals and the objective function $\rho(\cdot)$ is the square function. In the case of LTS, however, we only consider the subset of the data that leads to the smallest sum of squared residuals (in our paper we trim 25% of the data). In the case of the M and M-GM estimators, we use the bounded Tukey biweight as the objective function $\rho$. For M, weights are the same for all individuals, while for M-GM they vary across individuals. For the sake of clarity, we summarize the different cases presented above in Table 1.

Table 1: Estimators

Method   J   Objective function              w_i
OLS      n   $\rho(r) = r^2$                 Constant
LTS      h   $\rho(r) = r^2$                 Constant
M        n   Tukey biweight (given below)    Constant
M-GM     n   Tukey biweight (given below)    $\min\left(1,\; a \Big/ \frac{|z_{2i} - z_{1i}'\hat{\beta}_1^M|}{s(z_2 - z_1'\hat{\beta}_1^M)}\right)$

For M and M-GM, the objective function is

\[ \rho(r) = \begin{cases} \dfrac{c^2}{6}\left(1 - \left[1 - \left(\dfrac{r}{c}\right)^2\right]^3\right) & \text{if } |r| \le c \\[2ex] \dfrac{c^2}{6} & \text{if } |r| > c \end{cases} \]

Once the regression parameters $(\beta_1, \beta_2)$ and the scale parameter of residuals ($\sigma$) are estimated, standardized residuals can be easily computed and used to detect outlying observations. Since OLS and M have been proven to be non-robust estimators (see Rousseeuw and Leroy, 1987), they cannot be used for this purpose. By contrast, LTS and M-GM have been proven to be very robust against both vertical outliers and bad leverage points [15]. Unfortunately, the gain in robustness comes with a loss in efficiency. A usual way to overcome this problem is to use a one-step weighted least squares (WLS) estimator where the weights $\omega_i$ are attributed according to the degree of outlyingness identified by estimated robust standardized residuals. They are generally awarded following two weighting criteria, a hard one and a soft one. The weighting functions used in this paper are respectively

\[ \omega_i = \begin{cases} 0 & \text{if } \left| \dfrac{r_i}{\hat{\sigma}} \right| \ge 3.5 \\[2ex] 3.5 - \left| \dfrac{r_i}{\hat{\sigma}} \right| & \text{if } 2.5 \le \left| \dfrac{r_i}{\hat{\sigma}} \right| < 3.5 \\[2ex] 1 & \text{if } \left| \dfrac{r_i}{\hat{\sigma}} \right| < 2.5 \end{cases} \]

for the soft rejection criterion and

\[ \omega_i = \begin{cases} 0 & \text{if } \left| \dfrac{r_i}{\hat{\sigma}} \right| \ge 3 \\[2ex] 1 & \text{if } \left| \dfrac{r_i}{\hat{\sigma}} \right| < 3 \end{cases} \]

[15] Vertical outliers represent individuals characterized by standard right-hand-side variables but outlying observations in the y dimension. Bad leverage points represent individuals characterized by outlying observations in the x dimension that are not on the true regression hyperplane.
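The two rejection rules and the one-step WLS refit can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the treatment of residuals lying exactly on a cutoff is a guess, the initial robust fit is replaced by a stand-in coefficient vector, and the residual scale is estimated with the normalized MAD, which the text does not confirm as the authors' choice.

```python
import numpy as np

def soft_weights(r, sigma):
    """Soft rejection: 1 below 2.5, linear decay to 0 between 2.5 and 3.5."""
    u = np.abs(r / sigma)
    return np.clip(3.5 - u, 0.0, 1.0)

def hard_weights(r, sigma, cutoff=3.0):
    """Hard rejection: weight 1 below the cutoff, 0 otherwise."""
    return (np.abs(r / sigma) < cutoff).astype(float)

def one_step_wls(X, y, weights):
    """Weighted least squares, solved as OLS on sqrt(weight)-scaled data."""
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Simulated example with ten vertical outliers (illustrative values only).
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)
y[:10] += 20.0

beta_init = np.array([1.0, 2.0])   # stand-in for an LTS or M-GM estimate
r = y - X @ beta_init
# Normalized MAD as robust residual scale (an assumption, see lead-in).
sigma_hat = 1.4826 * np.median(np.abs(r - np.median(r)))

beta_soft = one_step_wls(X, y, soft_weights(r, sigma_hat))
beta_hard = one_step_wls(X, y, hard_weights(r, sigma_hat))
print("soft:", np.round(beta_soft, 3), " hard:", np.round(beta_hard, 3))
```

In practice `beta_init` would come from the robust first stage (LTS or M-GM), so that the residuals used to build the weights are not themselves contaminated by the outliers.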