Two methods that should be used more often - Laurent Thibault

Two methods that should be used more often: partial correlations and robust regression

Sébastien Déjean
Institut de Mathématiques de Toulouse
www.math.univ-toulouse.fr/~sdejean/
Rencontre Ingénieurs-statisticiens, 2 February 2015, UT3 Paul Sabatier

CORRELATION

Spurious correlations http://www.tylervigen.com/

cor(x,y) = 0.849

cor(x,z) = 0.883 cors(x,z) = 0.962

cor(x,x2) = -0.156 cors(x,x2) = -0.168 MI(x,x2) = 0.65
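The three coefficients quoted above react differently to the shape of a relationship. The following is a minimal R sketch, assuming that `cor` is the Pearson coefficient, that `cors` denotes the Spearman rank coefficient, and that `x2` stands for the square of `x` (the slide does not state any of this explicitly). The data and the helper `mutual_info` are hypothetical, and the histogram-based estimator is a simple stand-in, not the estimator from the bioDist package.

```r
# Illustrative data (not the data behind the slide's figures)
x  <- (-50:50) / 25   # 101 points from -2 to 2, exactly symmetric around 0
z  <- exp(x)          # monotone but nonlinear in x
x2 <- x^2             # fully determined by x, but not monotone

# Pearson measures linear association; Spearman works on ranks
r_xz    <- cor(x, z)
rho_xz  <- cor(x, z, method = "spearman")
r_xx2   <- cor(x, x2)
rho_xx2 <- cor(x, x2, method = "spearman")

# Hypothetical helper: plug-in mutual information (in nats) from a 2-D histogram
mutual_info <- function(a, b, bins = 10) {
  joint <- unclass(table(cut(a, bins), cut(b, bins))) / length(a)
  indep <- outer(rowSums(joint), colSums(joint))  # product of the marginals
  keep  <- joint > 0                              # skip empty cells (0 * log 0)
  sum(joint[keep] * log(joint[keep] / indep[keep]))
}
mi_xx2 <- mutual_info(x, x2)

print(round(c(r_xz = r_xz, rho_xz = rho_xz,
              r_xx2 = r_xx2, rho_xx2 = rho_xx2, mi_xx2 = mi_xx2), 3))
```

In this sketch the rank correlation reaches 1 for the monotone pair while the Pearson coefficient stays below 1, and both coefficients are essentially zero for the quadratic pair even though the mutual information is clearly positive.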

Numata J, Ebenhöh O, Knapp EW. Measuring correlations in metabolomic networks with mutual information. Genome Inform. 2008;20:112-22.

Package bioDist

The MINE application

http://www.exploredata.net/Downloads/MINE-Application

Terry Speed. A Correlation for the 21st Century. Science, 16 December 2011, Vol. 334 no. 6062, pp. 1502-1503.

Partial correlation

"It is surprising that a statistical technique as powerful and as easy to obtain as partial correlation is not used more often in psychology. This technique makes it possible to assess the correlation between two variables after controlling for the confounding effect of one or more other variables."
Prof. Jacques Baillargeon http://www.uqtr.uquebec.ca/~baillarg/srp-6001/cours3/partielle.htm

Wikipedia : Formally, the partial correlation between X and Y given a set of n controlling variables Z = {Z1, Z2, …, Zn}, written ρXY·Z, is the correlation between the residuals RX and RY resulting from the linear regression of X with Z and of Y with Z, respectively.
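The residual-based definition can be checked directly in base R. This is a minimal sketch on simulated data, assuming a single controlling variable `z` that drives both `x` and `y`; the coefficients, sample size and seed are arbitrary choices made for illustration.

```r
set.seed(42)
n <- 500
z <- rnorm(n)              # controlling variable
x <- 2.0 * z + rnorm(n)    # x depends on z only
y <- 1.5 * z + rnorm(n)    # y depends on z only

# Raw correlation: large, although x and y are linked only through z
r_xy <- cor(x, y)

# Partial correlation as the correlation between the two sets of residuals
res_x  <- residuals(lm(x ~ z))
res_y  <- residuals(lm(y ~ z))
r_xy_z <- cor(res_x, res_y)

# Same quantity from the closed form valid for one controlling variable
r_xz <- cor(x, z)
r_yz <- cor(y, z)
r_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

print(round(c(raw = r_xy, partial = r_xy_z, formula = r_formula), 3))
```

The residual route and the closed form give the same value, and that value is close to zero here because `z` accounts for the whole association between `x` and `y`.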

http://plus.maths.org/content/coincidence-correlation-and-chance

And talking of Jenny, there she is now, buying an ice cream from the local shop. With her family about to go out to Australia for a holiday, I ought to go and warn her that the more ice creams there are sold, the more shark attacks there are. Again, I've done my research quite thoroughly, and the numbers do not lie. Perhaps I should recommend an apple instead!

Finally, let's pop into my local primary school to chat to the head teacher. I want to tell her about research I've uncovered which shows a clear and proven link between literacy levels and hand size in children. Bigger hands make better readers, it seems. With my son starting there in the autumn, maybe now is the time to set up some sort of hand-stretching programme - perhaps on Wednesday afternoons, now that PE's been scrapped?

These examples may seem bizarre and improbable, but they are not the result of bad statistics. All the information is absolutely correct. Their strangeness comes from our own reasoning. We see two things changing together and our instinct is to assume that they are tied by cause and effect. Unfortunately, our instinct is often wrong. In all these examples a third "confounding" variable is actually the cause of two correlated variables.

It is absolutely true that people who play loud music are more likely to suffer from acne, but only because teenagers make up a big part of both groups. Acne and loud music are certainly correlated. But correlation is not causation. The same thing is true with the sharks and ice cream. The number of shark attacks and ice creams sold both go up during the summer, with the good weather encouraging people both to go in swimming and to eat ice cream. And as for large hands? Older children are bigger, and can read better!

pairs(data.frame(x,y,z))

> x <- ...
> y <- ...
> z <- ...
> cor(data.frame(x,y,z))
      x     y     z
x 1.000 0.942 0.721
y 0.942 1.000 0.649
z 0.721 0.649 1.000
> res.y.x <- ...
> res.z.x <- ...
> cor(res.y.x,res.z.x)
[1] -0.126402
> res.y.z <- ...
> res.x.z <- ...
> cor(res.x.z,res.y.z)
[1] 0.8992353

Packages
> library(ppcor)
> pcor(data.frame(x,y,z))
$estimate
      x      y      z
x 1.000  0.899  0.426
y 0.899  1.000 -0.126
z 0.426 -0.126  1.000
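The partial correlations can also be recovered from the correlation matrix alone, without the raw data. The sketch below, in base R, uses the standard identity that the partial correlation of two variables given all the others equals minus the corresponding entry of the inverse correlation matrix, rescaled by its diagonal. The input is the rounded matrix printed above, so the results should agree with the `pcor` estimates only up to that rounding (within about 0.005).

```r
# Correlation matrix copied from the output above (rounded to 3 decimals)
R <- matrix(c(1.000, 0.942, 0.721,
              0.942, 1.000, 0.649,
              0.721, 0.649, 1.000),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x", "y", "z"), c("x", "y", "z")))

P  <- solve(R)                               # inverse correlation matrix
pc <- -P / sqrt(outer(diag(P), diag(P)))     # rescale off-diagonal entries
diag(pc) <- 1                                # convention: 1 on the diagonal

print(round(pc, 3))
```

Once `x` is controlled for, the correlation between `y` and `z` drops from 0.649 to a small negative value, while the `x`-`y` correlation stays high after controlling for `z`.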

R:
● ppcor
● corpcor
● parcor
● ...

x: Temperature
y: Ice cream sales
z: Shark attacks

ROBUST REGRESSION





No unusual points.
The red and blue lines are very close.







Compared with the previous case, one point has been modified (top left).
The red line is "pulled" toward this point: its slope is lower, so that the start of the line lies closer to the outlier.
The blue line is almost unchanged from the previous case.
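The behaviour described above can be reproduced numerically. This is a minimal sketch on simulated data, assuming that the classical fit is `lm()` and taking `rlm()` from MASS (a Huber M-estimator) as the robust fit; the data, the seed and the size of the shift applied to the second point are hypothetical choices.

```r
library(MASS)  # provides rlm()

set.seed(1)
# Case 1: clean linear data (hypothetical values)
x1 <- 1:20
y1 <- 2 + 0.5 * x1 + rnorm(20, sd = 0.3)

# Case 2: same data with the second point moved far upward (top left)
x2 <- x1
y2 <- y1
y2[2] <- y2[2] + 15

slope <- function(fit) unname(coef(fit)[2])
slopes <- rbind(
  case1 = c(classical = slope(lm(y1 ~ x1)), robust = slope(rlm(y1 ~ x1))),
  case2 = c(classical = slope(lm(y2 ~ x2)), robust = slope(rlm(y2 ~ x2)))
)
print(round(slopes, 3))
```

In this sketch the classical slope drops noticeably between the two cases, whereas the robust slope barely moves.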

Least squares

Least absolute deviations

Least median of squares
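The three criteria differ only in how they summarise the residuals. Below is a minimal R sketch that writes each criterion as a function of the coefficients and fits a line under each one, assuming simulated data with one shifted point. The least absolute deviations fit uses a general-purpose optimiser as an approximate stand-in (dedicated quantile-regression routines exist), and `lqs(method = "lms")` from MASS targets a quantile of the squared residuals close to the median rather than `median()` itself.

```r
library(MASS)  # provides lqs()

set.seed(2)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.3)   # hypothetical data
y[2] <- y[2] + 15                        # one hypothetical outlier

# The three criteria as functions of b = c(intercept, slope)
res      <- function(b) y - b[1] - b[2] * x
ls_crit  <- function(b) sum(res(b)^2)      # least squares
lad_crit <- function(b) sum(abs(res(b)))   # least absolute deviations
lms_crit <- function(b) median(res(b)^2)   # least median of squares

b_ls  <- coef(lm(y ~ x))
b_lad <- optim(b_ls, lad_crit)$par         # Nelder-Mead started at the LS fit
b_lms <- coef(lqs(y ~ x, method = "lms"))

print(round(rbind(LS = b_ls, LAD = b_lad, LMS = b_lms), 3))
```

Each fit does at least as well as the least squares fit on its own criterion, and the least median of squares slope stays close to the value used to simulate the data despite the outlier.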

> library(MASS)

# Case 1: simple linear regression without an outlier
> x1 <- ...
> y1 <- ...
> regr1 <- ...
> rob1 <- ...
> plot(x1,y1,pch=16)
> abline(regr1,col="red",lwd=2)
> abline(rob1,col="blue",lwd=2)
> legend("bottomright",
    c("Classical regression",
      "Robust regression"),
    col=c("red","blue"),lty=1,lwd=2)

# Case 2: simple linear regression with an outlier
> x2 <- ...
> y2 <- ...
> y2[2] <- ...
> regr2 <- ...
> rob2 <- ...
> plot(x2,y2,pch=16)
> abline(regr2,col="red",lwd=2)
> abline(rob2,col="blue",lwd=2)
> legend("bottomright",
    c("Classical regression",
      "Robust regression"),
    col=c("red","blue"),lty=1,lwd=2)