npbr - Thibault Laurent's web page

Jul 13, 2018 - Presentation of the package npbr. Package on CRAN. On MAC OS: install the glpk library outside of R, using homebrew. (brew install glpk).
Overview of the methods implemented in npbr

Presentation of the package npbr

npbr: A Package for Nonparametric Boundary Regression in R

Abdelaati Daouia (1), Thibault Laurent (1,2), Hohsuk Noh (3)

(1) TSE: Toulouse School of Economics, University of Toulouse Capitole, France
(2) CNRS
(3) Sookmyung Women’s University

useR!2018, 13th July 2018, Brisbane, Australia

Thibault LAURENT Nonparametric Boundary Regression in R

Toulouse School of Economics, CNRS


What does “boundary regression” mean?

Theory of production (economics)
$X$: input, $Y$: output
$n$ observations $(x_1, y_1), \dots, (x_n, y_n)$
$\varphi$: boundary (or frontier)
Def: maximum producible quantity of $Y$ for any given quantity of $X$

[Figure: Sales against Number of employees, comparing the OLS regression fit with the boundary regression fit]
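To make the contrast between the two fits concrete, here is a minimal base-R sketch on simulated data. The data-generating process and all variable names are illustrative assumptions, not taken from the talk. The upward shift of the OLS line (a "corrected OLS" device) is only a crude stand-in for a boundary estimator and is not one of the npbr methods.

```r
set.seed(1)
n <- 100
x <- runif(n, 0, 8)                        # input, e.g. number of employees
true_boundary <- function(x) 2 * sqrt(x)   # hypothetical frontier, chosen for illustration
y <- true_boundary(x) * runif(n)           # outputs never exceed the frontier

# OLS runs through the middle of the cloud: residuals take both signs
ols <- lm(y ~ x)
res <- residuals(ols)
c(above = sum(res > 0), below = sum(res < 0))

# Shifting the OLS line upward by the largest residual gives a line
# that envelops every observation, which is what a boundary must do
shift <- max(res)
envelope <- fitted(ols) + shift
all(envelope >= y - 1e-8)
```

The point of the sketch is the target, not the method: OLS estimates a conditional mean, whereas a boundary estimator must lie on or above all observations.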


Example of data

[Figure: three scatter plots, one per data set]
Annual sport records: data("records") (1500m records against year)
European air controllers: data("air") (Activity produced against Activity of the controllers)
French postal services: data("post") (Deliveries against Number of employees)


Constrained boundary regression

[Figure: three panels showing the records, air and post data]
Unconstrained: method = "u"
Monotone: method = "m"
Monotone and concave: method = "mc"


Parametric model: polynomial estimators

Model structure: $\varphi_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_p x^p$
Optimization problem: find $\hat\theta = (\hat\theta_0, \dots, \hat\theta_p)$ such that $\varphi_{\hat\theta}$ envelopes the full data and minimizes the area under its graph (Hall, 1998)

[Figure: Activity produced against Activity of the controllers, with polynomial boundaries for p = 1, p = 2 and p = 3]

Choose the degree $p$ that minimizes the AIC/BIC (Daouia, 2015)
Easily implemented, but no constrained version
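The optimization problem stated in words on this slide can be written out as follows. This is a sketch under one assumption not given on the slide: the area is taken over $[a, b]$ with $a = \min_i x_i$ and $b = \max_i x_i$. Under that assumption both the objective and the constraints are linear in $\theta$, which is consistent with the package relying on an LP solver (GLPK).

```latex
\hat\theta
  = \arg\min_{\theta \in \mathbb{R}^{p+1}} \int_a^b \varphi_\theta(x)\,dx
  = \arg\min_{\theta \in \mathbb{R}^{p+1}} \sum_{j=0}^{p} \theta_j \,\frac{b^{j+1} - a^{j+1}}{j+1}
\quad \text{subject to} \quad
\varphi_\theta(x_i) = \sum_{j=0}^{p} \theta_j x_i^{j} \ge y_i, \qquad i = 1, \dots, n.
```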


Nonparametric model: FDH, DEA

Nonparametric: the model structure is not specified a priori but is instead determined from the data
Example: Free Disposal Hull (Deprins, Simar and Tulkens, 1984) or Data Envelopment Analysis (Farrell, 1957)

[Figure: two panels of Activity produced against Activity of the controllers]

Very famous and popular in the economic literature, but too sensitive to extreme values
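The sensitivity to extreme values is easy to see on a toy example. The sketch below is a minimal base-R version of the FDH frontier, using its standard definition as the largest observed output among units with input at most $x$. The function name fdh_frontier and the toy data are hypothetical; this is not the npbr implementation.

```r
# FDH frontier at x0: largest observed output among units with input <= x0.
# Returns NA where no observation has input <= x0.
fdh_frontier <- function(xtab, ytab, x) {
  sapply(x, function(x0) {
    dominated <- xtab <= x0
    if (any(dominated)) max(ytab[dominated]) else NA_real_
  })
}

xtab <- c(1, 2, 3, 4, 5)
ytab <- c(1, 3, 2, 5, 4)
grid <- c(0.5, 1, 2.5, 3, 4.5, 6)

fdh_frontier(xtab, ytab, grid)
# NA 1 3 3 5 5

# A single extreme observation lifts the whole frontier to its right
ytab_outlier <- replace(ytab, 2, 10)
fdh_frontier(xtab, ytab_outlier, grid)
# NA 1 10 10 10 10
```

Because the estimator is a running maximum, one unusually large output at a small input determines the fit everywhere beyond it.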


Nonparametric model: Local maximum frontier estimators

$\tilde\varphi(x) = \max_{i=1,\dots,n} y_i \, \mathbb{1}_{\{|x_i - x| \le h\}}$ (Gijbels and Peng, 2000)
A data-driven rule for selecting $h$: package np (Li, Lin and Racine, 2013)

[Figure: Deliveries against Number of employees, local maximum frontier with h = 500]

No constrained version, and $h$ should depend on $x$
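The local maximum formula can be transcribed almost literally into base R. The sketch below is an illustration only: the function name loc_max_frontier and the toy data are hypothetical, and it is not the package's loc_max function. Like the indicator form of the formula, it assumes non-negative outputs, since observations outside the window contribute 0 to the maximum.

```r
# Local maximum frontier: max over i of y_i * 1{|x_i - x0| <= h}
loc_max_frontier <- function(xtab, ytab, x, h) {
  sapply(x, function(x0) {
    in_window <- abs(xtab - x0) <= h   # indicator of the window around x0
    max(ytab * in_window)              # points outside the window count as 0
  })
}

xtab <- c(1, 2, 3, 4, 5)
ytab <- c(1, 3, 2, 5, 4)

loc_max_frontier(xtab, ytab, x = c(1, 3, 5), h = 1)
# 3 5 5
loc_max_frontier(xtab, ytab, x = c(1, 3, 5), h = 0.5)
# 1 2 4
```

The two calls show the role of the bandwidth: a small $h$ makes the estimate follow individual points, while a large $h$ flattens it towards the global maximum.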


Nonparametric model: Localized linear fitting

$\hat\varphi_{n,LL}(x) = \min\{z : \text{there exists } \theta \text{ s.t. } y_i \le z + \theta (x_i - x) \text{ for all } i \text{ s.t. } x_i \in (x - h, x + h)\}$ (Hall et al., 1998)
Optimal $h$: Hall and Park (2004)

[Figure: Deliveries against Number of employees, localized linear frontier with h = 1000]

No constrained version


Nonparametric model: Quadratic (or Cubic) spline fitting, Daouia et al. (2016)

Denote a partition of $[a, b]$ by $a = t_0 < t_1 < \dots < t_{k_n} = b$, with $a = \min_i x_i$ and $b = \max_i x_i$, obtained by considering the $j/k_n$-th quantiles $t_j = x_{[j n / k_n]}$ of the distinct values of $x_i$ for $j = 1, \dots, k_n - 1$
Let $N = k_n + 2$ and $\pi(x) = (\pi_0(x), \dots, \pi_{N-1}(x))^\top$ be the vector of normalized B-splines of order 2 (or 3) based on $\{t_j\}$
$\hat\varphi_n(x) = \pi(x)^\top \hat\alpha$, where $\hat\alpha$ minimizes the same objective function as $\tilde\alpha$ subject to the same envelopment constraints and the additional monotonicity constraints $\pi'(t_j)^\top \alpha \ge 0$, $j = 0, 1, \dots, k_n$, with $\pi'$ being the derivative of $\pi$
Number of inter-knot segments $k_n$: AIC or BIC
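The knot construction can be sketched in a few lines of base R. This is one reading of the formula, under two assumptions that the slide does not spell out: the bracket $[\cdot]$ denotes the integer part, and $n$ counts the distinct input values. The function name spline_knots is hypothetical and the package may compute its quantiles differently.

```r
# Knots a = t_0 < t_1 < ... < t_kn = b taken from the ordered distinct inputs
spline_knots <- function(xtab, kn) {
  xs <- sort(unique(xtab))               # ordered distinct values of x_i
  m <- length(xs)                        # assumption: n counts distinct values
  stopifnot(kn >= 1, m >= 2 * kn)        # ensures strictly increasing knots
  j <- seq_len(kn - 1)                   # interior knot indices (none if kn = 1)
  interior <- xs[floor(j * m / kn)]      # t_j = x_[j n / kn], integer part
  c(xs[1], interior, xs[m])
}

xtab <- c(1:10, 10, 3)                   # ten distinct values, two repeats
spline_knots(xtab, kn = 3)
# 1 3 6 10
spline_knots(xtab, kn = 1)
# 1 10
```

With $k_n = 1$ there are no interior knots, so the spline reduces to a single polynomial piece on $[a, b]$.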


Quadratic spline fitting

[Figure: three panels of Deliveries against Number of employees]
Unconstrained ($k_n = 3$)
Monotone ($k_n = 2$)
Monotone and concave ($k_n = 1$)


Package on CRAN

On macOS: install the glpk library outside of R, using Homebrew (brew install glpk)


npbr functions

Function         Description                                        Reference
dea_est          DEA, FDH and linearized FDH                        Farrell (1957), Deprins et al. (1984), Hall and Park (2002)
loc_est          Local linear fitting                               Hall et al. (1998), Hall and Park (2004)
loc_est_bw       Bandwidth choice for local linear fitting          Hall and Park (2004)
poly_est         Polynomial estimation                              Hall (1998)
poly_degree      Optimal polynomial degree selection                Daouia et al. (2015)
dfs_momt         Moment type estimation                             Daouia et al. (2010), Dekkers et al. (1989)
dfs_pick         Pickands type estimation                           Daouia et al. (2010), Dekkers et al. (1989)
rho_momt_pick    Conditional tail index estimation                  Daouia et al. (2010), Dekkers et al. (1989)
kopt_momt_pick   Threshold selection for moment/Pickands frontiers  Daouia et al. (2010)
dfs_pwm_regul    Nonparametric frontier regularization              Daouia et al. (2012)
loc_max          Local constant estimation                          Gijbels and Peng (2000)
pick_est         Local extreme-value estimation                     Gijbels and Peng (2000)
quad_spline_est  Quadratic spline fitting                           Daouia et al. (2015)
quad_spline_kn   Knot selection for quadratic spline fitting        Daouia et al. (2015)
cub_spline_est   Cubic spline fitting                               Daouia et al. (2015)
cub_spline_kn    Knot selection for cubic spline fitting            Daouia et al. (2015)
kern_smooth      Nonparametric kernel boundary regression           Parmeter and Racine (2013), Noh (2014)
kern_smooth_bw   Bandwidth choice for kernel boundary regression    Parmeter and Racine (2013), Noh (2014)


Usage

poly_est(xtab, ytab, x, deg, control = list("tm_limit" = 700))

xtab: a numeric vector containing the observed inputs $x_1, \dots, x_n$.
ytab: a numeric vector of the same length as xtab containing the observed outputs $y_1, \dots, y_n$.
x: a numeric vector of evaluation points at which the estimator is to be computed.
deg: an integer (polynomial degree).
control: a list of parameters passed to the GLPK solver.


Example of use

require("npbr")
data(air)
[The remaining commands, starting with the definition of x_air, were lost in extraction.]

[Figure: scatter plot of the air data, x-axis labelled "input"]
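As an illustration of the poly_est usage described earlier in the deck, a call on the air data might look like the sketch below. It is not a reconstruction of the original commands. It assumes that the air data frame stores inputs in a column xtab and outputs in a column ytab, and that poly_est returns one fitted value per evaluation point. The grid size of 101 and the names x_air and y_poly are arbitrary choices. The code runs only if npbr is installed.

```r
# Runs only if the npbr package (which depends on GLPK) is available
if (requireNamespace("npbr", quietly = TRUE)) {
  data("air", package = "npbr")

  # Assumption: inputs are in air$xtab and outputs in air$ytab
  x_air <- seq(min(air$xtab), max(air$xtab), length.out = 101)

  # One column of fitted frontier values per polynomial degree p = 1, 2, 3
  y_poly <- sapply(1:3, function(p) {
    npbr::poly_est(air$xtab, air$ytab, x_air, deg = p)
  })
  colnames(y_poly) <- paste0("p=", 1:3)
  head(y_poly)
}
```

The resulting columns can then be drawn over a scatter plot of the data, for instance with lines(x_air, y_poly[, 1]).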


Conclusion

Available on CRAN
Article published in Journal of Statistical Software (2017), http://dx.doi.org/10.18637/jss.v079.i09
Numerical illustrations given in the article: cubic spline fitting seems to be the best method, whatever the shape of the data
Hope to see you at useR!2019 in Toulouse, France, next year
