Can Google Help Now- or Fore-casting French Unemployment?
Y. Fondeur (3), F. Karamé (1,2,3)
1 - EPEE-TEPP, université d’Evry Val d’Essonne (91)
2 - DYNARE Team, CEPREMAP (75)
3 - Centre d’Etude de l’Emploi (93)
Introduction
• Economic time series are published with a significant delay and may still be revised afterwards.
• Unemployment data are obviously subject to such delays.
• In France, unemployment is published on a monthly basis using an administrative source, the claimant count (“Demandeurs d’Emploi en Fin de Mois”, i.e. jobseekers registered at the end of the month; DEFM hereafter).
• DEFM figures are available on the 24th of the following month.
ETEPP 2011
Can Google Help… ?
• Given this publication delay, providing real-time estimates of unemployment dynamics is an important issue.
• There is a growing literature on nowcasting, i.e. predicting the present:
– Giannone et al. (2008)
– Schumacher & Breitung (2008)
– Castle et al. (2009)
– Doornik (2008, 2009)
• Nowcasting needs a new kind of data: truly real-time data.
• In the search for such data, the Internet may be a valuable tool.
• Every Monday, Google Inc. publishes weekly data on the search volume for keywords.
• In France, Google has held a stable 90% market share for several years (vs. 60% in 2006 and 70% in 2010 in the US).
• The seminal paper by Choi & Varian (2009) highlights three points:
– Models including relevant Google Trends variables tend to outperform models ignoring them in terms of predictions.
– The gain can prove quite substantial in some cases.
– In their research agenda, they also stress the possibility of predicting turning points.
• The aim of our paper is to apply this kind of approach to the French DEFM to produce better forecasts and/or nowcasts.
• Well-chosen keywords may be connected with the online job-search behavior of employed or unemployed people, or capture how concerned firms are about the labor market situation.
• We use an unobserved-components approach to disentangle the components of the variables and identify potential relations between some of them, for instance between the evolutions of their respective trends.
• In this paper, we model the DEFM slope as a function of the slope of the Google data.
• The model is estimated with a modified version of the Kalman filter that accounts for non-stationarity and for the multiple frequencies in our data.
Plan
1. Some examples in the literature
2. Keywords and description of our dataset
3. The model
4. The estimation results
5. The out-of-sample forecasting exercise
The literature
Choi & Varian (2009) give examples of nowcasting for car and home sales in the US, and for travel to Hong Kong.
Adding Google Trends variables improves prediction accuracy by 10% to 25%.
Several papers use Google search data for influenza virus surveillance: Ginsberg et al. (2009), Doornik (2009).
• Askitas & Zimmermann (2009) for the unemployment rate in Germany.
• They use four groups of keywords:
– a single keyword meaning “unemployment rate”;
– two keywords related to the German federal employment agency, expected to be connected with people who have contacted, or are in the process of contacting, the employment agency;
– two keywords relating to HR consulting, expected to proxy high-skilled workers reacting to fears of layoffs and companies preparing layoffs or reorganizations;
– eight keywords corresponding to the most popular job boards in Germany (“Stepstone” OR “Jobworld” OR “Jobscout” OR “Meinestadt” OR “meine Stadt” OR “Monster Jobs” OR “Monster.de” OR “Jobboerse”), expected to capture job-search activity.
• The four resulting variables are used as regressors in a monthly model aimed at forecasting the unemployment rate.
• We choose a single, easy-to-interpret keyword: “EMPLOI” (“job”).
• The main reason is the emergence of Pôle Emploi in 2009, resulting from the merger of the ANPE (the national employment agency in charge of assisting jobseekers) and the ASSEDIC (the unemployment compensation agencies); the new agency's name itself contains the keyword.
• Google search activity on this term is expected to be directly connected with job searches, as it is the simplest way to find websites where jobs are posted. It may also reflect a more general concern about the labor market situation, particularly when the situation deteriorates (or is expected to).
[Figure: Google index for “EMPLOI” by week of the month (Week 1 to Week 5), and the DEFM series, January 2004 to January 2010]
The monthly data
[Figure: Google index by week of the month (Week 1 to Week 5) and three further panels on their own scales, January 2004 to January 2010]
The current literature using Google data mostly relies on standard time-series models, such as autoregressive representations specified with automatic algorithms like Gets (general-to-specific).
Problems:
• This approach discards long-term and seasonal information by taking first and seasonal differences.
• It does not deal with mixed frequencies, such as weekly and monthly data.
=> The dataset is generally ‘impoverished’: only the monthly frequency is retained, and sometimes only one or two Google series are used (see Choi & Varian 2009, Doornik 2009 for instance).
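A minimal worked example, assuming for illustration a deterministic linear trend and a fixed seasonal pattern of period S (a simplification of the stochastic components used in this paper), shows what first and seasonal differencing removes:

$$y_t = \mu + \beta t + \gamma_t, \qquad \gamma_t = \gamma_{t-S}$$
$$\Delta_S y_t = y_t - y_{t-S} = \beta S$$
$$\Delta\Delta_S y_t = \Delta_S y_t - \Delta_S y_{t-1} = \beta S - \beta S = 0$$

The level $\mu$, the slope $\beta$ and the seasonal profile $\gamma_t$ all drop out, so a model written for $\Delta\Delta_S y_t$ carries no information about them.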
How can this limitation be circumvented?
The weekly data
[Figure: three panels, each plotting the weekly Google index (scale 15 to 95) against a DEFM series, January 2004 to December 2009]
• DEFM series are monthly, while the Google series is weekly.
• There is a clear seasonal pattern, with an obvious break in the Google index from 2009 onwards; we therefore need a flexible representation of seasonality.
[Figure: Google index by week of the year (weeks 1 to 53), one line per year from 2004 to 2010]
• Data are non-stationary.
• It may be worth looking for a relation between the trends of the series, which seem strongly related.
• The Google index may contain substantial noise, since it may also include search queries unrelated to the labor market.
=> We choose unobserved-components models and the Kalman filter.
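One way to put the monthly DEFM and the weekly Google series on a common time axis is to treat DEFM as a weekly variable that is observed only once a month. The sketch below assumes, for illustration only, that the end-of-month count is attached to the week whose seven days contain the last day of the month; the function names and the sample dates are hypothetical, not the authors' timing convention.

```python
import datetime as dt
import numpy as np

def month_end_mask(week_starts):
    """Flag the week whose seven days contain the last day of a month.

    week_starts : sequence of datetime.date, one per week (first day of week).
    A week starting on day d contains its month's last day exactly when the
    next week starts in a different month.
    """
    return np.array([(d + dt.timedelta(weeks=1)).month != d.month
                     for d in week_starts])

def to_weekly_grid(monthly_values, week_starts):
    """Put monthly observations on the weekly grid; NaN in all other weeks."""
    mask = month_end_mask(week_starts)
    grid = np.full(len(week_starts), np.nan)
    grid[mask] = np.asarray(monthly_values, dtype=float)[: int(mask.sum())]
    return grid

# Hypothetical example: 52 weekly dates starting Monday 5 January 2004
weeks = [dt.date(2004, 1, 5) + dt.timedelta(weeks=k) for k in range(52)]
defm_weekly = to_weekly_grid(np.arange(12.0), weeks)
```

Only 12 of the 52 weekly entries are filled; the NaN weeks are exactly the periods in which a filter must update the state from the Google series alone.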
• Component extraction is based on relatively general specification choices rather than on a priori parameter values, as with traditional non-parametric filters (HP, band-pass, …):
– more congruent with the data;
– allows components to be “mixed” in multivariate representations;
– allows forecasting.
• The decomposition is based on maximum likelihood estimation:
– standard errors for the unknown parameters, and parameter testing;
– confidence bands for the unobserved variables.
• The data are non-stationary => diffuse Kalman filter (Durbin & Koopman 2001, 2003):
– efficient estimation of the parameters and evaluation of the state variables;
– a specific treatment of the diffuse initial conditions of the filter;
– once the effect of the initial conditions has vanished, the filter reduces to the standard Kalman filter.
• We treat monthly data as partially observed weekly data => we use the univariate treatment version of the diffuse and standard Kalman filters (Durbin & Koopman 2000, 2001):
– the state vector is evaluated by incorporating information from each observable when it is available;
– estimation of large models is considerably faster, since the filter manipulates scalars instead of matrices.
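The univariate treatment can be sketched as a single-period update that processes the observables one at a time. This is a minimal sketch of the standard (non-diffuse) step only, assuming uncorrelated measurement errors (diagonal $H_t$) and missing observables coded as NaN; the function name `univariate_update` is hypothetical and this is not the authors' code.

```python
import numpy as np

def univariate_update(s_pred, P_pred, y, Z, h):
    """Univariate (sequential) Kalman update for one period.

    s_pred : (m,) predicted state s_{t|t-1}
    P_pred : (m, m) predicted state covariance P_{t|t-1}
    y      : (N,) observables at t, with NaN where not observed
    Z      : (N, m) measurement matrix Z_t
    h      : (N,) diagonal of H_t (measurement errors assumed uncorrelated)
    Returns s_{t|t}, P_{t|t} and the log-likelihood contribution of period t.
    """
    s = np.array(s_pred, dtype=float)
    P = np.array(P_pred, dtype=float)
    loglik = 0.0
    for i in range(len(y)):
        if np.isnan(y[i]):
            continue  # observable i missing this week: nothing to update
        z = Z[i]
        eta = y[i] - z @ s            # scalar prediction error
        g = z @ P @ z + h[i]          # scalar prediction-error variance
        k = P @ z / g                 # gain vector: a division, no matrix inverse
        s = s + k * eta
        P = P - np.outer(k, z @ P)    # P - P z z' P / g
        loglik -= 0.5 * (np.log(2.0 * np.pi) + np.log(g) + eta**2 / g)
    return s, P, loglik

# Hypothetical week in which Google is observed but DEFM is not
s, P, ll = univariate_update(np.zeros(2), np.eye(2), np.array([1.0, np.nan]),
                             np.eye(2), np.array([0.5, 0.5]))
```

With diagonal $H_t$, processing the observables sequentially gives the same filtered state, covariance and likelihood as the multivariate update, while only requiring scalar divisions.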
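The model
$$y_t = \begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} \text{Google}_t \\ \ln \text{DEFM}_t \end{pmatrix}, \qquad t = 1,\dots,T$$
$$y_{i,t} = T_{i,t} + S_{i,t} + \varepsilon^{y_i}_t, \qquad \varepsilon^{y_i}_t \sim N(0,\sigma^2_{y_i}), \qquad i = 1,2$$
$$S_{1,t} = \sum_{j=1}^{[S/2]} \left( a_{j,t} \cos\frac{2j\pi\tau_t}{S} + b_{j,t} \sin\frac{2j\pi\tau_t}{S} \right)$$
$$a_{j,t} = a_{j,t-1} + \varepsilon^{a_j}_t, \quad b_{j,t} = b_{j,t-1} + \varepsilon^{b_j}_t, \quad \varepsilon^{a_j}_t \sim N(0,\sigma^2_{a_j}), \quad \varepsilon^{b_j}_t \sim N(0,\sigma^2_{b_j}), \quad j = 1,\dots,[S/2]$$
$$S_{2,t} = \sum_{j=1}^{6} \left( a_j \cos\frac{2j\pi\tau_t}{S} + b_j \sin\frac{2j\pi\tau_t}{S} \right)$$
with $S = 52.25$.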
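The trigonometric seasonals can be evaluated directly from their definition. This is a sketch under stated assumptions: $\tau_t$ is taken to be the week index, the six harmonics ($j = 1,\dots,6$) listed on the slide are assumed to apply to the deterministic DEFM seasonal $S_{2,t}$, and the coefficient values and innovation standard deviation are hypothetical. The random-walk coefficients of $S_{1,t}$ are what allow the Google seasonal pattern to change over time, as after the 2009 break.

```python
import numpy as np

S = 52.25  # average number of weeks in a year (value shown on the slide)

def trig_seasonal(tau, a, b, period=S):
    """Trigonometric seasonal: sum_j a_j cos(2 j pi tau/S) + b_j sin(2 j pi tau/S).

    tau  : (T,) week index tau_t
    a, b : (J,) constant coefficients, or (T, J) time-varying coefficients
    Returns a (T,) array.
    """
    tau = np.asarray(tau, dtype=float)
    j = np.arange(1, np.shape(a)[-1] + 1)               # harmonics j = 1..J
    angle = 2.0 * np.pi * np.outer(tau, j) / period     # (T, J)
    return np.sum(a * np.cos(angle) + b * np.sin(angle), axis=-1)

rng = np.random.default_rng(0)
T = 6 * 52
tau = np.arange(1, T + 1)

# Google seasonal S_1: [S/2] harmonics, coefficients follow random walks
# (innovation std. dev. 0.02 is a hypothetical value)
J1 = int(S // 2)
a1 = np.cumsum(rng.normal(0.0, 0.02, size=(T, J1)), axis=0)
b1 = np.cumsum(rng.normal(0.0, 0.02, size=(T, J1)), axis=0)
S1 = trig_seasonal(tau, a1, b1)

# DEFM seasonal S_2: assumed 6 harmonics with constant coefficients
a2 = rng.normal(size=6)
b2 = rng.normal(size=6)
S2 = trig_seasonal(tau, a2, b2)
```

Because the period enters only through the angle $2j\pi\tau_t/S$, the non-integer value $S = 52.25$ poses no difficulty, which is one practical advantage of the trigonometric form over seasonal dummies for weekly data.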
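The point of the paper
$$T_{i,t} = T_{i,t-1} + d_{i,t-1} + \varepsilon^{T_i}_t, \quad d_{i,t} = d_{i,t-1} + \varepsilon^{d_i}_t, \quad \varepsilon^{T_i}_t \sim N(0,\sigma^2_{T_i}), \quad \varepsilon^{d_i}_t \sim N(0,\sigma^2_{d_i}), \quad i = 1,2$$
Integration of the potential Google effect in the DEFM:
$$T_{2,t} = T_{2,t-1} + d_{2,t} + \varepsilon^{T_2}_t, \quad d_{2,t} = \alpha_0 + \alpha_1 d_{1,t} + \varepsilon^{d_2}_t, \quad \varepsilon^{T_2}_t \sim N(0,\sigma^2_{T_2}), \quad \varepsilon^{d_2}_t \sim N(0,\sigma^2_{d_2})$$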
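A minimal simulation sketch of these trend equations, assuming zero initial trends and slopes and hypothetical parameter values (not estimates from the paper): the Google trend follows a local linear trend, and the DEFM slope is a noisy linear function of the current Google slope. The function name `simulate_trends` is illustrative.

```python
import numpy as np

def simulate_trends(T, alpha0, alpha1, sig_T1, sig_d1, sig_T2, sig_d2, seed=0):
    """Simulate (T_1, d_1) for Google and (T_2, d_2) for DEFM.

    Google: T_1,t = T_1,t-1 + d_1,t-1 + e,   d_1,t = d_1,t-1 + e
    DEFM:   T_2,t = T_2,t-1 + d_2,t + e,     d_2,t = alpha0 + alpha1 d_1,t + e
    Initial trends and slopes are set to zero (an assumption).
    """
    rng = np.random.default_rng(seed)
    T1 = np.zeros(T); d1 = np.zeros(T)
    T2 = np.zeros(T); d2 = np.zeros(T)
    d2[0] = alpha0 + alpha1 * d1[0]
    for t in range(1, T):
        # Google: local linear trend (slope enters with a one-period lag)
        d1[t] = d1[t - 1] + rng.normal(0.0, sig_d1)
        T1[t] = T1[t - 1] + d1[t - 1] + rng.normal(0.0, sig_T1)
        # DEFM: slope linked to the current Google slope
        d2[t] = alpha0 + alpha1 * d1[t] + rng.normal(0.0, sig_d2)
        T2[t] = T2[t - 1] + d2[t] + rng.normal(0.0, sig_T2)
    return T1, d1, T2, d2

T1, d1, T2, d2 = simulate_trends(520, 0.001, 0.5, 0.01, 0.001, 0.005, 0.0005)
```

Setting $\alpha_1 = 0$ cuts the link, so the DEFM trend then drifts at the constant rate $\alpha_0$ plus noise; testing $\alpha_1 = 0$ is therefore a direct way to ask whether the Google slope carries information on the DEFM slope.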
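Estimation
State-space form:
$$s_t = g^*(s_{t-1}, e_t; \theta) \;\equiv\; s_t = A_t s_{t-1} + e_t$$
$$y_t = m(s_t, \varepsilon_t; \theta) \;\equiv\; y_t = Z_t s_t + \varepsilon_t$$
Kalman filter =>
$$\hat s_{t|t} = E[s_t \mid y_{1:t}; \theta], \qquad P_{t|t} = V[s_t \mid y_{1:t}; \theta], \qquad L_t(y_t; \theta), \qquad t = 1,\dots,T$$
$$\hat\theta = \arg\max_\theta L(y;\theta) = \arg\max_\theta \sum_{t=1}^{T} L_t(y_t;\theta)$$
Kalman smoother =>
$$\hat s_{t|T} = E[s_t \mid y_{1:T}; \hat\theta], \qquad P_{t|T} = V[s_t \mid y_{1:T}; \hat\theta]$$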
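The maximum-likelihood step can be illustrated on the simplest unobserved-components model, a local level model, rather than on the full specification of the paper (a deliberate simplification). The sketch also swaps in an approximation for the diffuse initialisation: a large prior variance `kappa`, with the first observation's term dropped from the likelihood, instead of the exact diffuse filter of Durbin & Koopman. The simulated data and parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def local_level_loglik(log_vars, y, kappa=1e7):
    """Log-likelihood of y_t = mu_t + eps_t, mu_t = mu_{t-1} + xi_t,
    computed with the prediction-error decomposition sum_t L_t(y_t; theta).

    log_vars = (log sigma_eps^2, log sigma_xi^2).  The diffuse initial state is
    approximated by a large prior variance kappa, and the first observation's
    term is dropped: a simplification, not the exact diffuse Kalman filter.
    """
    sig_eps2, sig_xi2 = np.exp(log_vars)
    s, P = 0.0, kappa                  # approximate diffuse prior for mu_1
    ll = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            P = P + sig_xi2            # state forecast (A = 1, Q = sig_xi2)
        g = P + sig_eps2               # G_t
        eta = yt - s                   # eta_t
        if t > 0:
            ll -= 0.5 * (np.log(2.0 * np.pi) + np.log(g) + eta**2 / g)
        k = P / g                      # K_t
        s = s + k * eta                # state update
        P = (1.0 - k) * P
    return ll

rng = np.random.default_rng(1)
T = 500
mu = np.cumsum(rng.normal(0.0, 0.3, T))   # true sigma_xi^2 = 0.09
y = mu + rng.normal(0.0, 1.0, T)          # true sigma_eps^2 = 1.0

# theta_hat = argmax L(y; theta), i.e. minimise the negative log-likelihood
res = minimize(lambda th: -local_level_loglik(th, y), x0=np.zeros(2),
               method="L-BFGS-B", bounds=[(-10.0, 5.0)] * 2)
sig_eps2_hat, sig_xi2_hat = np.exp(res.x)
```

Optimising over log-variances keeps the variances positive without constraints on the scale of interest; the bounds only guard the optimiser against overflow in `np.exp`.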
The standard Kalman filter
Initialization, then t = t + 1:
State forecasts:
$$\hat s_{t|t-1} = E[s_t \mid y_{1:t-1}] = A_t \hat s_{t-1|t-1}, \qquad P_{t|t-1} = V[s_t \mid y_{1:t-1}] = A_t P_{t-1|t-1} A_t' + Q_t$$
Measurement forecasts and measurement error:
$$\hat y_{t|t-1} = E[y_t \mid y_{1:t-1}] = Z_t \hat s_{t|t-1}, \qquad \eta_t = y_t - \hat y_{t|t-1}, \qquad G_t = V[y_t \mid y_{1:t-1}] = Z_t P_{t|t-1} Z_t' + H_t$$
Log-likelihood calculation:
$$L_t(y_t; \theta) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|G_t| - \frac{1}{2}\eta_t' G_t^{-1} \eta_t$$
State update:
$$K_t = P_{t|t-1} Z_t' G_t^{-1}, \qquad \hat s_{t|t} = E[s_t \mid y_{1:t}] = \hat s_{t|t-1} + K_t \eta_t, \qquad P_{t|t} = V[s_t \mid y_{1:t}] = (I - K_t Z_t) P_{t|t-1}$$
where $Q_t$ and $H_t$ are the covariance matrices of $e_t$ and $\varepsilon_t$. If t < T, loop back (t = t + 1).
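The recursion above translates almost line by line into NumPy. This is a minimal sketch that assumes time-invariant matrices $A$, $Z$, $Q$, $H$ (the slides allow them to vary with $t$), a proper (non-diffuse) initial state, and no missing observations; `kalman_filter` is an illustrative name, not the authors' code.

```python
import numpy as np

def kalman_filter(y, A, Z, Q, H, s0, P0):
    """Standard Kalman filter for s_t = A s_{t-1} + e_t, y_t = Z s_t + eps_t,
    with e_t ~ N(0, Q) and eps_t ~ N(0, H).

    y : (T, N) observations; s0 : (m,) and P0 : (m, m) initial state moments.
    Returns filtered states (T, m), filtered covariances (T, m, m) and the
    total log-likelihood sum_t L_t.
    """
    T, N = y.shape
    m = s0.shape[0]
    s_filt = np.zeros((T, m))
    P_filt = np.zeros((T, m, m))
    s, P = s0, P0
    loglik = 0.0
    for t in range(T):
        # State forecasts
        s_pred = A @ s
        P_pred = A @ P @ A.T + Q
        # Measurement forecasts and measurement error
        eta = y[t] - Z @ s_pred
        G = Z @ P_pred @ Z.T + H
        G_inv = np.linalg.inv(G)
        # Log-likelihood contribution L_t
        _, logdet = np.linalg.slogdet(G)
        loglik += -0.5 * (N * np.log(2 * np.pi) + logdet + eta @ G_inv @ eta)
        # State update
        K = P_pred @ Z.T @ G_inv
        s = s_pred + K @ eta
        P = (np.eye(m) - K @ Z) @ P_pred
        s_filt[t], P_filt[t] = s, P
    return s_filt, P_filt, loglik

# One scalar observation y_1 = 1 with s_0 = 0, P_0 = 1, Q = 0, H = 1
s_f, P_f, ll = kalman_filter(np.array([[1.0]]), np.eye(1), np.eye(1),
                             np.zeros((1, 1)), np.eye(1), np.zeros(1), np.eye(1))
```

Here $G_1 = 2$ and $K_1 = 0.5$, so the filtered state and variance are both 0.5. The explicit inverse mirrors the formulas; for larger $N$, a Cholesky-based solve would be numerically preferable.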