Statistics and learning
An introduction: from data to modelling

Emmanuel Rachelson and Matthieu Vignes ISAE SupAero

Wednesday 3rd September 2013

E. Rachelson & M. Vignes (ISAE), SAD, 2013, 1 / 13


Statistical approach: a quick, partial and not very comprehensive overview

- Goal of this course: not a recipe cookbook, rather a path to the mathematical reasoning that leads to dealing with the quantitative aspects of decision making from data, while still accounting for uncertainty.
- Applications of stats (and Machine Learning): pension matters, optimal insurance premiums, stock market predictions, clinical trials for a new medicine, marketing research and planning... Different branches: geostatistics, statistical physics...
- Professional AND citizen interest: sound exploitation of the available data, and don't be manipulated?!
- Few prerequisites: basic/intermediate maths and probability calculus.
- Grail: linking data to mathematical modelling, objectively quantifying and interpreting conclusions, and... awareness of limitations: statistics helps, but won't make the decision for you!

Inspiring work / our bibliography

- A. Baccini, P. Besse, S. Canu, S. Déjean, B. Laurent, C. Marteau, P. Martin and H. Milhem. Wikistat, le cours dont vous êtes le héros. http://wikistat.fr/, 2012.
- T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer, 2nd edition, 2009.
- E. Moulines, F. Roueff and J.-L. Pac (and formerly F. Rossi). Statistiques. Cours Telecom ParisTech, 2008.
- A.B. Dufour, D. Chessel, J.R. Lobry, S. Mousset and S. Dray. Enseignements de Statistique en Biologie. http://pbil.univ-lyon1.fr/R/, 2012.
- A. Garivier. Statistiques avancées. Cours Centrale, 2011.
- F. Bertrand. Page professionnelle - Enseignements. http://www-irma.u-strasbg.fr/~fbertran/enseignement/, 2012.
- S. Clémençon. Apprentissage statistique. Cours TELECOM ParisTech, 2011-2012.
- V. Monbet. Page professionnelle - Enseignements. http://perso.univ-rennes1.fr/valerie.monbet/enseignement.html, 2013.
- S. Arlot, Francis B., O. Catoni, G. Stolz and G. Obozinski. Apprentissage. Cours ENS, 2012.
- S. Loustau. Page professionnelle - Enseignements. http://www.math.univ-angers.fr/~loustau/, 2013.
- N. Chopin, D. Rosenberg and G. Stolz. Éléments de statistique pour citoyens d'aujourd'hui et managers de demain. Cours L3 HEC, 2012-2013.

And many others we just forgot to mention.


From data to modelling and back

Two different situations might occur for the same modelling:
- an empirical approach to gaining knowledge from an experiment repeated many times,
- the study of a sample drawn from a population.

Preference between two possible configurations:

Consumer ID | 1 | 2 | 3 | 4 | 5 | 6 | ...
Opinion     | A | A | B | A | B | B | ...

We denote by xi the successive opinions, taking (binary) values "A" (= 0) or "B" (= 1). A mathematician sees these as realisations of random variables denoted Xi.
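As a quick numerical illustration (in Python rather than the course's R; the value p0 = 0.42 is a hypothetical choice here, since in practice p0 is precisely what is unknown), the opinions xi can be simulated as Bernoulli draws:

```python
import random

random.seed(0)

# Hypothetical "true" preference for B; unknown in a real survey.
p0 = 0.42
n = 100

# Each interviewed consumer yields x_i in {0, 1}: "A" = 0, "B" = 1.
xs = [1 if random.random() < p0 else 0 for _ in range(n)]

# The empirical mean of the sample estimates p0.
x_bar = sum(xs) / n
print(x_bar)
```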


Localising randomness

Randomness... arises from the choice of the persons questioned, NOT from each individual answer. Incidental reminder: the Bernoulli distribution, with parameter 0 < p < 1...

- The question posed: is p0 > 1/2 or < 1/2? This is a test.
- But hey, what is p0 actually??
- All consumers cannot be interviewed, for obvious reasons!

Statistics is a sound framework to
1. describe the sample using estimates,
2. quantitatively answer the question (generalising conclusions from the sample to the full population).


Modelling: a mandatory step

- We start by modelling the problem (sample/population description, variables and their distributions).
- In the example: the "correct model" ∈ (Bern(p))p; n realisations (xi)i of iid (independent and identically distributed) random variables (Xi)i ∼ Bern(p) are available.
- Remember the Jean Tibéri vs. Lyne Cohen-Solal (+ Ph. Meyer) council election in Paris in 2008, between 20.45 and 21.15? At 20.45, the counts were (463; 409; 106), but after counting all the votes: (11,044; 11,269; 2,730).
- Construction of confidence intervals to answer the question.


Useful tools: a reminder?

- mean (central tendency): E[X], estimated by X̄ := (X1 + ... + Xn)/n,
- standard deviation (dispersion tendency): σ(X) := √(E[(X − E[X])²]) = √(E[X²] − (E[X])²),
- (we will almost never use skewness and kurtosis).
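These two estimators, and the equality of the two variance formulas above, can be checked on a small made-up sample (plain Python for illustration):

```python
import math

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy sample
n = len(xs)

# Empirical mean: estimate of E[X].
mean = sum(xs) / n

# Two equivalent forms of the variance:
# E[(X - E[X])^2] and E[X^2] - (E[X])^2.
var_centered = sum((x - mean) ** 2 for x in xs) / n
var_moments = sum(x ** 2 for x in xs) / n - mean ** 2

sigma = math.sqrt(var_centered)
print(mean, sigma)  # -> 5.0 2.0
```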


Two important probabilistic tools in statistics: Law of large numbers

Theorem. Let X1, ..., Xn be iid random variables with mean µ. Then the empirical mean converges in probability towards µ, i.e.:

    X̄n := (1/n)(X1 + ... + Xn) → µ.

In other terms, for all ε > 0, P(|X̄n − µ| > ε) → 0.
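The theorem can be watched at work in a quick simulation (Python for illustration; the Bernoulli mean 0.42 echoes the running example and is otherwise an arbitrary choice):

```python
import random

random.seed(1)
mu = 0.42  # mean of the Bernoulli variables below (illustrative choice)

def empirical_mean(n):
    """Mean of n iid Bernoulli(mu) draws."""
    return sum(1 if random.random() < mu else 0 for _ in range(n)) / n

# The deviation |X_bar_n - mu| typically shrinks as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, abs(empirical_mean(n) - mu))
```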


Two important probabilistic tools in statistics: Central limit theorem

Theorem. Let X1, ..., Xn be iid random variables which admit a moment of order 2. Denote by µ and σ the corresponding mean and standard deviation; then:

    √n (X̄n − µ) / σ → N(0, 1).

In the case of distributions with density functions, this means that

    Fn(x) := P(√n (X̄n − µ)/σ ≤ x) → P(Z ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−z²/2} dz.
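The convergence of Fn towards the standard normal CDF can be eyeballed numerically; a sketch with illustrative choices of p, n and the number of replications (Python's math.erf gives the normal CDF without external libraries):

```python
import math
import random

random.seed(2)
p, n, reps = 0.3, 400, 5000  # illustrative choices

mu = p
sigma = math.sqrt(p * (1 - p))

def standardised_mean():
    """One draw of sqrt(n) * (X_bar_n - mu) / sigma for Bernoulli(p) data."""
    xbar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    return math.sqrt(n) * (xbar - mu) / sigma

zs = [standardised_mean() for _ in range(reps)]

# Empirical CDF F_n(x) at x = 1.0 vs the standard normal CDF Phi(1.0).
x = 1.0
f_n = sum(z <= x for z in zs) / reps
phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
print(round(f_n, 3), round(phi, 3))
```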


Deciding from a sample

- Back to the "preference" example: A vs. B.

Preference between two possible configurations:

Consumer ID | 1 | 2 | 3 | 4 | 5 | 6 | ...
Opinion     | A | A | B | A | B | B | ...

- We compute x̄100 = (x1 + ... + x100)/100 = 0.42.
- Our intuition and the LLN tell us that p0 is "close" to 0.42.
- Can we conclude? Is this estimate enough?

Let's play around with the Central limit theorem...


Concluding the example, at the price of a slight risk

- We directly derive √n (X̄n − p0) / √(p0(1 − p0)) → N(0, 1), so
- with a confidence level (what is that??) of 95%: √n |X̄n − p0| / √(p0(1 − p0)) ≤ u = 1.96. This is equivalent to:

    p0 ∈ In := [X̄n − 1.96 √(p0(1 − p0))/√n ; X̄n + 1.96 √(p0(1 − p0))/√n],

- again, does this help?
- yes, using the simplifying trick √(x(1 − x)) ≤ 1/2:

    In ⊆ [X̄n − 1/√n ; X̄n + 1/√n] = [32%; 52%] in our scenario.

  Your final conclusion? Again, what is 95% here?
- is the conclusion similar if n = 1,000?

Note: 95% could have been replaced by 99%. How would this have affected the conclusion? What about 100%?
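The slide's arithmetic can be replayed directly (Python used for illustration; the figures are those of the example):

```python
import math

# Figures from the example: x_bar_100 = 0.42 from n = 100 consumers.
x_bar, n = 0.42, 100
u = 1.96  # 95% two-sided quantile of the standard normal

# Since p0 is unknown, bound sqrt(p0(1-p0)) by 1/2: half-width u/(2*sqrt(n)).
half = u / (2 * math.sqrt(n))
print(round(x_bar - half, 3), round(x_bar + half, 3))  # -> 0.322 0.518

# Further rounding u/2 = 0.98 up to 1 gives the slide's interval [32%, 52%]:
lo = round(x_bar - 1 / math.sqrt(n), 2)
hi = round(x_bar + 1 / math.sqrt(n), 2)
print(lo, hi)  # -> 0.32 0.52
```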


Hypothesis testing: the decision depends on what is tested?!

- (H0) is the basic (null) hypothesis. It will only be rejected if the data argues strongly against it (e.g. "the drug is dangerous", or "the defendant is innocent"). Then
- (H1) is the alternative hypothesis, which is accepted iff (H0) is recognised to be unacceptable.
- Back to the example: do you want to test (H0): p0 ≥ 1/2? Or (H0'): p0 ≤ 1/2? Which one is the sensible/aggressive choice?
- Lessons from this: tests are not reducible to confidence intervals, and... don't be fooled by an obscure choice of hypotheses!
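To illustrate why the choice matters, here is a sketch of a one-sided test of (H0): p0 ≥ 1/2 with the example's figures, under the normal approximation (an illustrative computation, not a procedure prescribed by the course):

```python
import math

x_bar, n = 0.42, 100
p_null = 0.5

# Under (H0): p0 >= 1/2, the least favourable case is p0 = 1/2.
# Standardised statistic under p0 = 1/2 (normal approximation):
z = math.sqrt(n) * (x_bar - p_null) / math.sqrt(p_null * (1 - p_null))

# One-sided p-value: probability of a mean at least this small under H0.
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(round(z, 2), round(p_value, 3))

# At level 5%, reject (H0) iff p_value < 0.05: here we cannot reject,
# even though x_bar = 0.42 < 1/2.
print(p_value < 0.05)
```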


Outline of the Statistics and learning course

From 18th September 2013 to 7th February 2014, you will hear about:
- descriptive statistics and modelling: estimation, dispersion measures, confidence intervals, PCA and perhaps more (CA, clustering, risk functions),
- tests, ANOVA (one or several factors, analysis of covariance),
- (linear) regression,
- trees and ensemble methods,
- kernels and parsimony,
- Markov Chain Monte Carlo methods,
- ...and lots of R ;)!
- Projects with some stats content from Nov. 2013 'til May 2014... tbc