Can Google Help Now- or Fore-casting French Unemployment?


Y. Fondeur (3), F. Karamé (1,2,3)

1 - EPEE-TEPP, Université d’Evry Val d’Essonne (91)
2 - DYNARE Team, CEPREMAP (75)
3 - Centre d’Etude de l’Emploi (93)

Introduction



Economic time-series are published with a significant delay and may still be revised afterwards.



Unemployment data are obviously subject to such delays.



In France, unemployment is published on a monthly basis using an administrative source, the claimant count (“Demandeurs d’Emploi en Fin de Mois”, DEFM hereafter).



The figure for a given month becomes available on the 24th of the following month.

ETEPP 2011




Because of this publication delay, providing real-time estimates of unemployment dynamics is a real challenge.



A growing literature deals with nowcasting, i.e. predicting the present:
– Giannone et al. (2008),
– Schumacher & Breitung (2008),
– Castle et al. (2009),
– Doornik (2008, 2009).



Nowcasting needs a new kind of data: genuinely real-time data.

In the quest for such data, the Internet may be a valuable tool.




Every Monday, Google Inc. publishes weekly data on the volume of searches for keywords.

In France, Google has had a stable 90% market share for several years (vs 60% in 2006 and 70% in 2010 in the US).



The seminal paper by Choi & Varian (2009) highlights three points:
– Models including relevant Google Trends variables tend to outperform models ignoring them in terms of predictions.
– The gain can even prove quite substantial in some cases.
– In their research agenda, they also stress the possibility of predicting turning points.




The aim of our paper is to apply this kind of approach to French DEFM to produce better forecasts and/or nowcasts.



Well-chosen keywords may be connected with the online job-search behavior of employed or unemployed people, or capture how firms are affected by the labor market situation.

We use an unobserved-components approach to disentangle the components of each variable and to identify potential relations between some of them, for instance between the evolutions of their respective trends.



In this paper, we model the DEFM slope as a function of the Google data slope.



The model is estimated with a modified version of the Kalman filter taking into account non-stationarity and multiple frequencies in our data.


Plan

1. Some examples in the literature
2. Keywords and description of our dataset
3. The model
4. The estimation results
5. The out-of-sample forecasting exercise


The literature

Choi & Varian (2009) give examples of nowcasting for car and home sales in the US, and for travel to Hong Kong.

Their models with Google data improve predictions by 10% to 25%.


Several papers use Google search data for influenza virus surveillance: Ginsberg et al. (2009), Doornik (2009).




Askitas & Zimmermann (2009) apply the approach to the unemployment rate in Germany.



They use four groups of keywords:
– a single keyword meaning “unemployment rate”;
– two keywords related to the German federal employment agency, expected to be connected with people who have contacted, or are in the process of contacting, the employment agency;
– two keywords related to HR consulting, expected to proxy high-skilled workers reacting to fears of layoffs and companies preparing layoffs or reorganizations;
– eight keywords corresponding to the most popular job boards in Germany (“Stepstone” OR “Jobworld” OR “Jobscout” OR “Meinestadt” OR “meine Stadt” OR “Monster Jobs” OR “Monster.de” OR “Jobboerse”), expected to capture job-search activity.

The four resulting variables are used as regressors in a monthly model aiming at forecasting the unemployment rate.



We choose a single, easy-to-interpret keyword: “EMPLOI” (“job”).

The main reason is the emergence of Pôle Emploi in 2009, resulting from the merger of the ANPE (the national employment agency in charge of jobseeker assistance) and the ASSEDIC (the unemployment compensation agencies).



Google search activity for this term is expected to be directly connected with job searches, as it is the simplest way to find websites where jobs are posted. It may also reflect a more general concern about the labor market situation, particularly when the situation deteriorates (or is expected to).

[Figure: two time-series panels, January 2004 to January 2010. One panel plots series labelled Week 1 to Week 5 on a 15 to 95 scale; the other plots a series on a 2900 to 4100 scale.]

The monthly data

[Figure: monthly data, January 2004 to January 2010. One panel plots series labelled Week 1 to Week 5 on a 15 to 95 scale; three further panels use scales of 400 to 800, 2000 to 2800 and 450 to 750.]

The current literature using Google data mostly uses standard time-series models, such as autoregressive representations specified with automatic algorithms (Gets, for instance).

Problems:
• This approach kills long-term and seasonal information through first and seasonal differencing.
• It does not deal with the multi-frequency issue, i.e. weekly and monthly data.
=> The dataset is generally ‘impoverished’ by retaining the monthly frequency and sometimes using only one or two Google series (see Choi & Varian 2009, Doornik 2009 for instance).

How can this limitation be circumvented?

The weekly data

[Figure: the weekly data, January 2004 to December 2009. Three panels, each with two vertical scales: 15 to 95 on one axis and, on the other, 400 to 800, 2000 to 2800 and 400 to 800 respectively.]



DEFM series are monthly while the Google series is weekly.



There is a clear seasonal pattern, with an obvious break for the Google index from 2009; we therefore need a flexible representation of seasonality;

[Figure: Google index by week of the year (weeks 1 to 53), one line per year from 2004 to 2010, on a 20 to 120 scale.]



Data are non-stationary;



The trends of the series seem to be strongly related, so it is worth looking for a relation between them;

The Google index may contain substantial noise, since it may also include search queries unrelated to the labor market.

=> We choose unobserved-components models and the Kalman filter.



Component extraction is based on relatively general specification choices, not on a priori values as in traditional non-parametric filters (HP, band-pass, …). This approach:
– is more congruent with the data;
– allows components to be “mixed” in multivariate representations;
– allows forecasting.

The decomposition is based on maximum likelihood estimation, which provides:
– standard errors for unknown parameters and parameter testing;
– confidence bands for unobserved variables.

The data are non-stationary => diffuse Kalman filter (Durbin & Koopman 2001, 2003):
– efficient estimation of parameters and evaluation of state variables;
– a specific treatment of the diffuse initial conditions of the filter;
– once the effect of the initial conditions has vanished, the filter becomes a standard Kalman filter.

We consider monthly data as partially-observed weekly data => we use the univariate treatment version of the diffuse and standard Kalman filters (Durbin & Koopman 2000, 2001):
– it allows evaluating the state vector by incorporating information from observables when available;
– it considerably speeds up the estimation of large models by manipulating scalars instead of matrices.
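The idea of treating monthly data as partially-observed weekly data can be illustrated with a minimal sketch of a univariate (element-by-element) update step. This is not the authors' implementation: the function name `sequential_update`, the use of `NaN` to mark a week in which the monthly series is not observed, and the assumption of a diagonal measurement-error covariance are all choices made here for illustration.

```python
import numpy as np

def sequential_update(s, P, y, Z, h):
    """Update a predicted state with the elements of y one at a time.

    s : (m,)   predicted state
    P : (m, m) predicted state covariance
    y : (n,)   observations, NaN where a series is not observed this period
    Z : (n, m) measurement matrix
    h : (n,)   diagonal of the measurement-error covariance (assumed diagonal)
    """
    s = s.copy()
    P = P.copy()
    loglik = 0.0
    for i in range(y.size):
        if np.isnan(y[i]):
            continue                  # missing observation: no update at all
        z = Z[i]
        v = y[i] - z @ s              # scalar innovation
        f = z @ P @ z + h[i]          # scalar innovation variance
        k = P @ z / f                 # gain vector, only scalar division needed
        s = s + k * v
        P = P - np.outer(k, z @ P)
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(f) + v * v / f)
    return s, P, loglik

# Hypothetical two-series example: the second series is missing this week.
s0 = np.array([0.5, -0.2])
P0 = np.array([[1.0, 0.3], [0.3, 2.0]])
Z = np.array([[1.0, 0.0], [1.0, 1.0]])
h = np.array([0.5, 0.7])
s1, P1, ll1 = sequential_update(s0, P0, np.array([1.0, np.nan]), Z, h)
```

When the measurement errors are uncorrelated, processing the observations one by one gives the same result as the usual matrix update, while only scalars are inverted and missing entries are simply skipped.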


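The model

$$
y_t = \ln \begin{pmatrix} \text{Google}_t \\ \text{DEFM}_t \end{pmatrix}, \qquad t = 1, \dots, T
$$

$$
y_{i,t} = T_{i,t} + S_{i,t} + \varepsilon_t^{y_i}, \qquad \varepsilon_t^{y_i} \sim N(0, \sigma_{y_i}^2), \qquad i = 1, 2
$$

Seasonal component of the Google series (time-varying coefficients):

$$
S_{1,t} = \sum_{j=1}^{[S/2]} \left[ a_{j,t} \cos\left( \tau_t \frac{2 j \pi}{S} \right) + b_{j,t} \sin\left( \tau_t \frac{2 j \pi}{S} \right) \right]
$$

$$
\begin{cases}
a_{j,t} = a_{j,t-1} + \varepsilon_t^{a_j}, & \varepsilon_t^{a_j} \sim N(0, \sigma_{a_j}^2) \\
b_{j,t} = b_{j,t-1} + \varepsilon_t^{b_j}, & \varepsilon_t^{b_j} \sim N(0, \sigma_{b_j}^2)
\end{cases}
\qquad j = 1, \dots, [S/2]
$$

Seasonal component of the DEFM series (constant coefficients):

$$
S_{2,t} = \begin{cases}
\displaystyle \sum_{j=1}^{[S/2]} \left[ a_j \cos\left( \tau_t \frac{2 j \pi}{S} \right) + b_j \sin\left( \tau_t \frac{2 j \pi}{S} \right) \right] \\
0
\end{cases}
$$

[The conditions attached to the two cases of $S_{2,t}$ are not legible in the source.]

$S = 52.25$, $j = 1, \dots, 6$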
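As a rough illustration of the trigonometric seasonal term, the sketch below evaluates a sum of harmonics with fixed coefficients, $S = 52.25$ and six harmonics. The function name `trig_seasonal`, the coefficient values and the reading of $\tau_t$ as a running week index are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def trig_seasonal(tau, a, b, S=52.25):
    """Return sum_j a_j*cos(2*pi*j*tau/S) + b_j*sin(2*pi*j*tau/S) for each tau."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    j = np.arange(1, a.size + 1)                   # harmonic index j = 1..J
    angle = 2.0 * np.pi * np.outer(tau, j) / S     # shape (len(tau), J)
    return np.cos(angle) @ a + np.sin(angle) @ b

# Hypothetical coefficients for six harmonics (illustration only).
a = np.array([0.10, 0.05, 0.02, 0.01, 0.0, 0.0])
b = np.array([0.08, 0.00, 0.01, 0.00, 0.0, 0.0])
weeks = np.arange(0, 105)                          # about two years of weeks
seasonal = trig_seasonal(weeks, a, b)
```

In the time-varying version used for the Google series, each coefficient would follow its own random walk, so the arrays `a` and `b` would become part of the state vector rather than fixed inputs.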

The point of the paper

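$$
\begin{cases}
T_{i,t} = T_{i,t-1} + d_{i,t-1} + \varepsilon_t^{T_i}, & \varepsilon_t^{T_i} \sim N(0, \sigma_{T_i}^2) \\
d_{i,t} = d_{i,t-1} + \varepsilon_t^{d_i}, & \varepsilon_t^{d_i} \sim N(0, \sigma_{d_i}^2)
\end{cases}
\qquad i = 1, 2
$$

Integration of the potential Google effect in the DEFM:

$$
\begin{cases}
T_{2,t} = T_{2,t-1} + d_{2,t} + \varepsilon_t^{T_2}, & \varepsilon_t^{T_2} \sim N(0, \sigma_{T_2}^2) \\
d_{2,t} = \alpha_0 + \alpha_1 d_{1,t} + \varepsilon_t^{d_2}, & \varepsilon_t^{d_2} \sim N(0, \sigma_{d_2}^2)
\end{cases}
$$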
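To see what the slope link implies, here is a small simulation sketch of the two trends, with the DEFM slope written as a linear function of the Google slope. The function name `simulate_linked_trends`, the zero starting values and every parameter value are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_linked_trends(T, alpha0, alpha1, sig_T1, sig_d1, sig_T2, sig_d2, seed=0):
    """Simulate trend/slope pairs (T1, d1) and (T2, d2), d2 being driven by d1."""
    rng = np.random.default_rng(seed)
    T1, d1 = np.zeros(T), np.zeros(T)
    T2, d2 = np.zeros(T), np.zeros(T)
    d2[0] = alpha0 + alpha1 * d1[0]               # starting values are assumptions
    for t in range(1, T):
        # Series 1 (Google): local linear trend with a random-walk slope.
        T1[t] = T1[t - 1] + d1[t - 1] + sig_T1 * rng.standard_normal()
        d1[t] = d1[t - 1] + sig_d1 * rng.standard_normal()
        # Series 2 (DEFM): slope is a linear function of the Google slope.
        d2[t] = alpha0 + alpha1 * d1[t] + sig_d2 * rng.standard_normal()
        T2[t] = T2[t - 1] + d2[t] + sig_T2 * rng.standard_normal()
    return T1, d1, T2, d2

# Hypothetical parameter values.
T1, d1, T2, d2 = simulate_linked_trends(
    T=300, alpha0=0.001, alpha1=0.5,
    sig_T1=0.01, sig_d1=0.002, sig_T2=0.005, sig_d2=0.001,
)
```

With `alpha1 = 0` the two slopes are unrelated, so testing whether the coefficient on the Google slope differs from zero is one natural way to read "can Google help?" in this setup.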

Estimation

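State-space form:

$$
\begin{cases}
s_t = g^*(s_{t-1}, e_t; \theta) \\
y_t = m(s_t, \varepsilon_t; \theta)
\end{cases}
\equiv
\begin{cases}
s_t = A_t s_{t-1} + e_t \\
y_t = Z_t s_t + \varepsilon_t
\end{cases}
$$

Kalman filter =>

$$
\begin{cases}
\hat{s}_{t|t} = E[s_t \mid y_{1:t}; \theta] \\
P_{t|t} = V[s_t \mid y_{1:t}; \theta] \\
L_t(y_t; \theta)
\end{cases}
\qquad t = 1, \dots, T
$$

$$
\hat{\theta} = \arg\max_{\theta} L(y; \theta) = \arg\max_{\theta} \sum_{t=1}^{T} L_t(y_t; \theta)
$$

Kalman smoother =>

$$
\begin{cases}
\hat{s}_{t|T} = E[s_t \mid y_{1:T}; \hat{\theta}] \\
P_{t|T} = V[s_t \mid y_{1:T}; \hat{\theta}]
\end{cases}
$$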
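The estimation principle (run the filter, sum the log-likelihood contributions, maximize over the parameters) can be sketched on a much simpler model than the one in the talk. The example below uses a scalar local-level model and a plain grid search over one variance; the large initial variance `P0` is only a crude stand-in for the diffuse initialization, and all names and values are assumptions made for illustration.

```python
import numpy as np

def local_level_loglik(y, sig2_eps, sig2_eta, s0=0.0, P0=1e6):
    """Gaussian log-likelihood of a local-level model via the Kalman filter."""
    s, P, ll = s0, P0, 0.0
    for obs in y:
        P = P + sig2_eta                 # state forecast variance (A = 1)
        G = P + sig2_eps                 # measurement forecast variance (Z = 1)
        eta = obs - s                    # measurement error
        ll += -0.5 * (np.log(2 * np.pi) + np.log(G) + eta ** 2 / G)
        K = P / G                        # gain
        s = s + K * eta                  # state update
        P = (1.0 - K) * P
    return ll

# Simulated data with hypothetical variances (level noise 0.25, measurement noise 1).
rng = np.random.default_rng(0)
level = np.cumsum(0.5 * rng.standard_normal(200))
y = level + 1.0 * rng.standard_normal(200)

# Grid search over the level variance, measurement variance held at 1.0.
grid = np.linspace(0.05, 1.0, 20)
logliks = np.array([local_level_loglik(y, 1.0, q) for q in grid])
sig2_eta_hat = grid[np.argmax(logliks)]
```

A grid search is used here only to keep the sketch dependency-free; in practice a numerical optimizer over all variances and regression coefficients would play the same role.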

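The standard Kalman Filter

Initialization, then the following steps are repeated for each period (t = t + 1):

State forecasts

$$
\begin{cases}
\hat{s}_{t|t-1} = E[s_t \mid y_{1:t-1}] = A_t \cdot \hat{s}_{t-1|t-1} \\
P_{t|t-1} = V[s_t \mid y_{1:t-1}] = A_t \cdot P_{t-1|t-1} \cdot A_t' + Q_t
\end{cases}
$$

Measurement forecasts

$$
\begin{cases}
\hat{y}_{t|t-1} = E[y_t \mid y_{1:t-1}] = Z_t \cdot \hat{s}_{t|t-1} \\
G_t = V[y_t \mid y_{1:t-1}] = Z_t \cdot P_{t|t-1} \cdot Z_t' + H_t
\end{cases}
$$

Measurement error

$$
\eta_t = y_t - \hat{y}_{t|t-1}
$$

Log-likelihood calculation

$$
L_t(y_t \mid \theta) = -\frac{N}{2} \ln(2\pi) - \frac{1}{2} \ln |G_t| - \frac{1}{2} \eta_t' \cdot G_t^{-1} \cdot \eta_t
$$

State update

$$
\begin{cases}
K_t = P_{t|t-1} \cdot Z_t' \cdot G_t^{-1} \\
\hat{s}_{t|t} = E[s_t \mid y_{1:t}] = \hat{s}_{t|t-1} + K_t \cdot \eta_t \\
P_{t|t} = V[s_t \mid y_{1:t}] = (I - K_t \cdot Z_t) \cdot P_{t|t-1}
\end{cases}
$$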
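The recursion above translates almost line by line into code. The sketch below is a minimal NumPy version that assumes time-invariant system matrices (the slides allow them to depend on t), fully observed data and a proper (non-diffuse) initialization; the function name `kalman_filter` and the toy inputs are illustrative choices.

```python
import numpy as np

def kalman_filter(y, A, Z, Q, H, s0, P0):
    """Standard Kalman filter.

    y : (T, n) observations; A, Q : (m, m); Z : (n, m); H : (n, n)
    s0, P0 : initial state mean (m,) and covariance (m, m)
    Returns filtered states, filtered covariances and the log-likelihood.
    """
    T, n = y.shape
    m = s0.size
    s, P = s0.astype(float), P0.astype(float)
    s_filt = np.zeros((T, m))
    P_filt = np.zeros((T, m, m))
    loglik = 0.0
    for t in range(T):
        # State forecasts
        s_pred = A @ s
        P_pred = A @ P @ A.T + Q
        # Measurement forecasts and measurement error
        G = Z @ P_pred @ Z.T + H
        eta = y[t] - Z @ s_pred
        # Log-likelihood contribution
        G_inv = np.linalg.inv(G)
        _, logdet = np.linalg.slogdet(G)
        loglik += -0.5 * (n * np.log(2 * np.pi) + logdet + eta @ G_inv @ eta)
        # State update
        K = P_pred @ Z.T @ G_inv
        s = s_pred + K @ eta
        P = (np.eye(m) - K @ Z) @ P_pred
        s_filt[t], P_filt[t] = s, P
    return s_filt, P_filt, loglik

# Toy scalar example: one observation y = 1, prior N(0, 1), unit measurement noise.
one = np.array([[1.0]])
s_f, P_f, ll = kalman_filter(np.array([[1.0]]), one, one, np.array([[0.0]]), one,
                             np.array([0.0]), one)
```

In the toy example the filtered state lies halfway between the prior mean and the observation, because the prior and the measurement error have equal variance.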