False Discovery Rate Part I: introduction and challenges

E. Roquain

Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France

Point de Vue, 3rd February 2014

E. Roquain

FDR: introduction, challenges and perspectives. Part I.

1 / 33

1 Introduction

2 False discovery rate control

3 FDR in other statistical issues

1 Introduction

A “multiple testing joke" (http://xkcd.com)


Multiplicity problem
P(make at least one false discovery) ≫ P(the i-th is a false discovery)
A correction is needed to assess significance!
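The gap between the two probabilities can be illustrated with a minimal sketch. It assumes (this is not stated on the slide) that all m hypotheses are true nulls, tested independently at the same level α, so that P(at least one false discovery) = 1 − (1 − α)^m. The function name and the values of m are illustrative.

```python
def prob_at_least_one_false_discovery(alpha: float, m: int) -> float:
    """P(at least one false discovery) for m independent true nulls,
    each tested at level alpha (assumption: independence, all nulls true)."""
    return 1.0 - (1.0 - alpha) ** m


# A single test keeps the error probability at alpha; many tests do not.
for m in (1, 20, 100):
    print(m, prob_at_least_one_false_discovery(0.05, m))
```

With a single test the probability equals α, while it grows quickly with m, which is why an uncorrected per-test level is not a meaningful guarantee for the whole family of tests.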

Some other examples
Paradoxes due to large scale experiments
Probable facts appear significant

Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]

Open access, freely available online

Essay

Why Most Published Research Findings Are False John P. A. Ioannidis

Summary There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Published research findings are sometimes refuted by subsequent […] factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim […]

It can be proven that most claimed research findings are false.

[…] is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R⁄(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R⁄(R − βR + α). A research finding is thus […]
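The PPV formula quoted in the excerpt can be evaluated directly. The sketch below only transcribes PPV = (1 − β)R⁄(R − βR + α); the function name and the numerical values of R, α and β are illustrative assumptions, not figures from the essay.

```python
def positive_predictive_value(R: float, alpha: float, beta: float) -> float:
    """PPV = (1 - beta) * R / (R - beta * R + alpha).

    R     : ratio of true to null relationships among those probed
    alpha : Type I error rate
    beta  : Type II error rate (power is 1 - beta)
    """
    return (1.0 - beta) * R / (R - beta * R + alpha)


# Illustrative values: few true relationships (R = 0.03), power 0.8, alpha = 0.05.
print(positive_predictive_value(R=0.03, alpha=0.05, beta=0.2))
```

A claimed finding is more likely true than false exactly when (1 − β)R > α, which fails in the illustrative low-R setting above.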

Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]
[Talk Benjamini Southampton (2013)]; modeling:

Try many experiments
⇓
1000 pure noise, 30 perfect signal
⇓
publish results with a p-value ≤ 0.05
⇓
≃ 50 false discoveries, 30 true discoveries

▷ Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature: ≃ 14%
▷ Ioannidis (2014). Discussion: Why "An estimate of the science-wise false discovery rate and application to the top medical literature" is false
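The arithmetic of this toy model can be sketched as follows. It assumes that null p-values are uniform (so a fraction 0.05 of the noise experiments passes the filter) and that "perfect signal" means every real effect is published; the function name is hypothetical, and the ratio is computed from expected counts only.

```python
def science_wise_toy_model(n_noise: int, n_signal: int, level: float):
    """Expected false discoveries and resulting proportion of false
    discoveries among published results in the toy model."""
    expected_false = n_noise * level   # uniform null p-values pass with prob. `level`
    true_discoveries = n_signal        # perfect signal: all real effects are detected
    proportion_false = expected_false / (expected_false + true_discoveries)
    return expected_false, proportion_false


expected_false, proportion_false = science_wise_toy_model(1000, 30, 0.05)
print(expected_false, proportion_false)
```

In this sketch more than half of the published results are false discoveries, even though each individual test is run at level 0.05.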


Multiplicity in microarray [Hedenfalk et al. (2001)]

[Figure: BRCA1 vs BRCA2; axis label: genes]

▷ expression level (activity)
▷ genes differentially activated?
▷ 1 test for each gene
▷ thousands of genes
▷ number of replications ≪ dimension
▷ correlations

Other applications
▷ Neuroimaging (fMRI): activated regions?
▷ Econometrics: winning strategies?
▷ Astronomy: directions with stars?

Canonical setting
▷ $X_i$ = avg group 2 − avg group 1 (rescaled) for gene $i$
▷ Gaussian model:

$$
\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}
= \mu \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix},
$$

with $\mu > 0$, $H \in \{0, 1\}^m$ (fixed) and $\varepsilon \sim \mathcal{N}(0, \Gamma)$ ($\Gamma_{i,i} = 1$).
▷ $\Gamma$ = dependence structure = $I_m$ for now

Question: for each $i$, $H_i = 0$ or $H_i = 1$?
Multiple testing: favors the “0” decision
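A minimal simulation sketch of this Gaussian model in the independent case Γ = I_m. The function name, the seed handling and the choice of putting the m0 true nulls first in H are assumptions made for illustration only.

```python
import random


def simulate_canonical(m: int, m0: int, mu: float, seed: int = 0):
    """Draw X = mu * H + eps with eps ~ N(0, I_m).

    H is fixed: the first m0 coordinates are nulls (H_i = 0) and the
    remaining m - m0 carry a signal of size mu (H_i = 1).
    """
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    return H, X


H, X = simulate_canonical(m=100, m0=50, mu=2.0, seed=1)
print(sum(H), X[:3])
```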

Individual decision and errors
▷ Test statistic: $X_i$
▷ p-value: $p_i = \bar{\Phi}(X_i)$, with $\bar{\Phi}(z) = P(Z \ge z)$, $Z \sim \mathcal{N}(0, 1)$
  $p_i$ is such that:
  if $H_i = 0$, $p_i \sim U(0, 1)$
  if $H_i = 1$, $p_i \sim \bar{\Phi}(\bar{\Phi}^{-1}(\cdot) - \mu)$ (c.d.f.)
▷ Choose $\hat{H}_i = \mathbf{1}\{p_i \le t\}$ for some threshold $t$
▷ Two errors:

             Ĥ_i = 0           Ĥ_i = 1
  H_i = 0    true negative     false positive
  H_i = 1    false negative    true positive

▷ False positive more annoying
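The individual decision rule and the two error types can be sketched as below. The upper-tail p-value follows the slide's definition P(Z ≥ z); the helper names and the toy inputs are illustrative assumptions.

```python
from statistics import NormalDist


def upper_tail_p_value(x: float) -> float:
    """p = P(Z >= x) for Z ~ N(0, 1)."""
    return 1.0 - NormalDist().cdf(x)


def confusion_counts(H, p_values, t):
    """Count the four outcomes of the decisions 1{p_i <= t} against the truth H."""
    counts = {"true negative": 0, "false positive": 0,
              "false negative": 0, "true positive": 0}
    for h, p in zip(H, p_values):
        rejected = p <= t
        if h == 0:
            counts["false positive" if rejected else "true negative"] += 1
        else:
            counts["true positive" if rejected else "false negative"] += 1
    return counts


# Toy input: two nulls and two signals, thresholded at t = 0.05.
print(confusion_counts([0, 0, 1, 1], [0.5, 0.01, 0.2, 0.001], t=0.05))
```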

[Figure: Picture 1. m = 100; m0 = 50; µ = 2; Γ = Im. Points marked 0 or 1; both axes range from 0.0 to 1.0.]

[Figure: Picture 2. m = 100; m0 = 95; µ = 3; Γ = Im. Points marked 0 or 1; both axes range from 0.0 to 1.0.]

[Figure: Picture 3. m = 100; m0 = 50; µ = 0.01; Γ = Im. Points marked 0 or 1; both axes range from 0.0 to 1.0.]

[Figure: Picture 4. m = 100; m0 = 95; µ = 0.01; Γ = Im. Points marked 0 or 1; both axes range from 0.0 to 1.0.]

Proceeding as for a single test? $t \equiv \alpha = 0.1$

[Figure: points marked 0 or 1 with the threshold t = 0.1; horizontal axis from 0.0 to 1.0, vertical axis from 0.0 to 1.0, then zoomed to the range 0.00 to 0.20.]

Union bound (Bonferroni)? $t \equiv \alpha/m = 0.1/100$

[Figure: points marked 0 or 1 with the threshold t = 0.1/100; horizontal axis from 0.0 to 1.0, vertical axis from 0.00 to 0.20.]
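The trade-off between the two fixed thresholds can be quantified with a small sketch. It uses two facts from the earlier slides: null p-values are uniform, so the expected number of false positives at threshold t is m0·t, and under H_i = 1 the p-value c.d.f. is the upper tail evaluated at (upper-tail quantile of t) − µ. The function names and the parameter values (those of Picture 1, with α = 0.1) are chosen for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()


def expected_false_positives(t: float, m0: int) -> float:
    """E[#false positives] = m0 * t, since null p-values are U(0, 1)."""
    return m0 * t


def power_at_threshold(t: float, mu: float) -> float:
    """P(p_i <= t) when H_i = 1, for a signal of size mu."""
    z = _STD.inv_cdf(1.0 - t)        # upper-tail quantile: P(Z >= z) = t
    return 1.0 - _STD.cdf(z - mu)    # upper tail evaluated at z - mu


m, m0, mu, alpha = 100, 50, 2.0, 0.1
for label, t in (("uncorrected", alpha), ("Bonferroni", alpha / m)):
    print(label, expected_false_positives(t, m0), power_at_threshold(t, mu))
```

Under these assumptions the uncorrected threshold lets through several false positives on average, while the Bonferroni threshold removes them at the price of a much lower power per signal.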

Do something in between! $t_\ell = \alpha \ell / m = 0.1\,\ell/100$

[Figure: points marked 0 or 1 with the thresholds $t_\ell$; horizontal axis from 0.0 to 1.0, vertical axis from 0.00 to 0.20.]

Smart! ... and rigorous?

2 False discovery rate control

BH procedure

p-value view:
$\hat{k} = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\}$, $\hat{t} = \alpha \hat{k}/m$

c.d.f. view:
$\hat{t} = \max\{t \in [0, 1] : \hat{G}_m(t) \ge t/\alpha\}$

Decision:
$\hat{H}_i = \mathbf{1}\{p_i \le \hat{t}\} = \mathbf{1}\{X_i \ge \bar{\Phi}^{-1}(\hat{t})\}$
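The p-value view of the BH procedure can be sketched in a few lines. This is a plain transcription of the step-up rule (largest i with p_(i) ≤ αi/m), with the convention that k = 0 when no index qualifies; the function names are illustrative.

```python
def bh_threshold(p_values, alpha):
    """Return (k_hat, t_hat) with k_hat = max{i : p_(i) <= alpha * i / m}
    (0 if no index qualifies) and t_hat = alpha * k_hat / m."""
    m = len(p_values)
    k_hat = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * i / m:
            k_hat = i  # step-up: keep the largest index satisfying the condition
    return k_hat, alpha * k_hat / m


def bh_reject(p_values, alpha):
    """Decisions 1{p_i <= t_hat}, returned in the original order."""
    _, t_hat = bh_threshold(p_values, alpha)
    return [p <= t_hat for p in p_values]


# The smallest p-value (0.02) exceeds alpha/m = 0.0125, yet it is rejected
# because larger indices satisfy the condition.
print(bh_reject([0.9, 0.02, 0.022, 0.021], alpha=0.05))
```

The example shows the step-up nature of the rule: the threshold is set by the last crossing of the line αi/m, not the first.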

False discovery rate control

For a decision $\hat{H}_i = \mathbf{1}\{p_i \le \hat{t}\}$ ($\forall i$),

$$
\mathrm{FDP}(\hat{t}) = \frac{\#\{i : H_i = 0, \hat{H}_i = 1\}}{\#\{i : \hat{H}_i = 1\}} \quad \left(\tfrac{0}{0} = 0\right),
\qquad
\mathrm{FDR}(\hat{t}) = E[\mathrm{FDP}(\hat{t})]
$$

Theorem [Benjamini and Hochberg (1995)] [Benjamini and Yekutieli (2001)]
If $\Gamma = I_m$ and $\hat{t}$ is the threshold of the BH procedure, then $\forall \mu, H$,
$$\mathrm{FDR}(\hat{t}) = (m_0/m)\,\alpha \le \alpha$$
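The theorem can be checked numerically with a Monte Carlo sketch in the independent Gaussian model. The level α = 0.2, the number of replications and the seed are assumptions chosen for illustration (the slides do not state the level used in their simulations); the helper names are hypothetical.

```python
import random
from statistics import NormalDist


def bh_fdp(H, p_values, alpha):
    """False discovery proportion of BH at level alpha (0/0 is read as 0)."""
    m = len(p_values)
    ordered = sorted(p_values)
    k_hat = max((i for i in range(1, m + 1) if ordered[i - 1] <= alpha * i / m),
                default=0)
    t_hat = alpha * k_hat / m
    rejected = [p <= t_hat for p in p_values]
    n_rejected = sum(rejected)
    n_false = sum(1 for h, r in zip(H, rejected) if r and h == 0)
    return n_false / n_rejected if n_rejected else 0.0


def estimate_fdr(m, m0, mu, alpha, n_rep, seed=0):
    """Average FDP of BH over n_rep independent draws of the model."""
    rng = random.Random(seed)
    std = NormalDist()
    H = [0] * m0 + [1] * (m - m0)
    total = 0.0
    for _ in range(n_rep):
        p = [1.0 - std.cdf(mu * h + rng.gauss(0.0, 1.0)) for h in H]
        total += bh_fdp(H, p, alpha)
    return total / n_rep


# The theorem predicts (m0 / m) * alpha = 0.1 for these illustrative values.
print(estimate_fdr(m=50, m0=25, mu=3.0, alpha=0.2, n_rep=2000, seed=1))
```

Individual FDP values vary from one draw to the next; only their average is controlled.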

Simulations. m = 50; m0 = 25; µ = 3; Γ = Im

[Figure: ten successive simulated data sets, points marked 0 or 1; horizontal axis from 0.0 to 1.0, vertical axis from 0.0 to 0.4. Realized values: FDP(BH) = 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04.]

Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing

Excerpt from [Benjamini (2010, JRSSB)]: "[…]sional application areas, such as neural imaging, are on the rise also, showing its ability to be applied in many diverse types of application. The list of the 10 highest cited papers that cite Benjamini and Hochberg (1995), which is shown in Table 1, is particularly interesting, because it includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence."

[Figure: number of citations (0 to 1200) per year (1996 to 2008). Caption: "Fig. 1. Rapidly increasing number of citations of Benjamini and Hochberg (1995), suggesting that its influence has not yet reached its peak (note that the figure for 2009 is only partially shown)". Table 1, "10 most cited papers that cite Benjamini and Hochberg (1995)", is cut off on the slide.]

now > 20,000 citations on Google Scholar!

3 FDR in other statistical issues

Why should FDR thresholding be adaptive to sparsity?

[Figure: points marked 0 or 1, shown for a sparse data set (mostly 0) and for a dense data set (mostly 1); horizontal axis from 0.0 to 1.0, vertical axis from 0.00 to 0.20.]

[Linnemann] - increasing signal strength


Adaptation to unknown sparsity

$\hat{t}$ seems “adaptive” to the “quantity” of signal in the data
▷ Classification: where is the signal? [Bogdan et al. (2011)], [Neuvial and R. (2012)]
▷ Detection: is there some signal? [Ingster (2002)], [Donoho and Jin (2004)], etc.
▷ Estimation: what is the value $EX$ of the signal?
  $\widehat{EX}_i = X_i \mathbf{1}\{|X_i| \ge \hat{t}\}$ (hard thresholding)
  [Abramovich et al. (2006)], [Donoho and Jin (2006)]
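Hard thresholding with a data-driven threshold can be sketched as follows. The slide does not say how the threshold on |X_i| is derived; the sketch assumes, for illustration only, that BH is applied to two-sided p-values 2·P(Z ≥ |X_i|) and that the resulting p-value threshold is mapped back to the X scale. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def bh_x_threshold(x, alpha):
    """Threshold on |x_i| induced by BH on two-sided p-values (assumption).

    Returns +inf when BH rejects nothing, so that every coordinate is zeroed.
    """
    m = len(x)
    p_sorted = sorted(2.0 * (1.0 - _STD.cdf(abs(xi))) for xi in x)
    k_hat = max((i for i in range(1, m + 1) if p_sorted[i - 1] <= alpha * i / m),
                default=0)
    if k_hat == 0:
        return math.inf
    # |x| >= q  <=>  two-sided p-value <= alpha * k_hat / m
    return _STD.inv_cdf(1.0 - alpha * k_hat / (2.0 * m))


def hard_threshold(x, threshold):
    """Keep coordinates with |x_i| >= threshold, set the others to 0."""
    return [xi if abs(xi) >= threshold else 0.0 for xi in x]


x = [0.1, -0.2, 5.0, -6.0, 0.3]
print(hard_threshold(x, bh_x_threshold(x, alpha=0.1)))
```

The threshold moves with the data: the more coordinates carry a strong signal, the larger k̂ and the lower the cut-off on |X_i|.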

Classification

$X_i \sim \pi_{0,m}\,\mathcal{N}(0, 1) + (1 - \pi_{0,m})\,\mathcal{N}(\mu_m, 1)$, $1 \le i \le m$, i.i.d.,
but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
▷ training set = null distribution known (one-class classification)
▷ Classification rule $\hat{h}_m : \mathbb{R} \to \{0, 1\}$
▷ Risk
$$
R_m(\hat{h}_m) = (1 - \pi_{0,m})^{-1}\, E\left[ m^{-1} \sum_{i=1}^{m} \mathbf{1}\{\hat{h}_m(X_i) \ne H_i\} \right]
$$
▷ Classification boundary in (sparsity, signal) space such that
  Above the boundary, $\exists \hat{h}_m : R_m(\hat{h}_m) \to 0$ (perfect classification)
  Under the boundary, $\forall \hat{h}_m$, $R_m(\hat{h}_m) \to 1$ (unclassifiable)

Classification boundary

$\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$

[Figure: (β, r) plane with β from 0.5 to 1.0 and r from 0.0 to 1.0, split into a region "Perfect classification" and a region "Unclassifiable".]

BH: $\hat{h}^{BH}_m(x) = \mathbf{1}\{x \ge \bar{\Phi}^{-1}(\hat{t})\}$ with $\alpha_m \propto (\log m)^{-1/2}$
▷ Classification boundary attained by BH.
On the boundary: risk BH ∼ Bayes risk.

[Bogdan et al. (2011)], [Neuvial and R. (2012)]
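The normalized classification risk can be evaluated in closed form for a simple rule. The sketch below assumes a deterministic threshold rule h(x) = 1{x ≥ s}, which is not the BH rule (the BH threshold is data-driven); under the mixture model its risk is [π0·P(Z ≥ s) + (1 − π0)·P(Z < s − µ)] / (1 − π0). The function name and the numerical values are illustrative.

```python
import math
from statistics import NormalDist


def threshold_rule_risk(m: int, beta: float, r: float, s: float) -> float:
    """Normalized risk of h(x) = 1{x >= s} in the sparse mixture model
    with pi0 = 1 - m**(-beta) and mu = sqrt(2 * r * log(m))."""
    std = NormalDist()
    eps = m ** (-beta)                        # proportion of signal, 1 - pi0
    mu = math.sqrt(2.0 * r * math.log(m))
    false_pos = (1.0 - eps) * (1.0 - std.cdf(s))   # null classified as signal
    false_neg = eps * std.cdf(s - mu)               # signal classified as null
    return (false_pos + false_neg) / eps


m = 10 ** 6
print(threshold_rule_risk(m, beta=0.6, r=0.9, s=3.0))
print(threshold_rule_risk(m, beta=0.6, r=0.9, s=50.0))  # never predicts signal
```

A rule that never predicts signal has risk exactly 1 under this normalization, which is the benchmark that the "unclassifiable" regime refers to; a badly chosen fixed threshold can do even worse.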

Detection: is there some signal?

Same model: $X_i \sim \pi_{0,m}\,\mathcal{N}(0, 1) + (1 - \pi_{0,m})\,\mathcal{N}(\mu_m, 1)$, $1 \le i \le m$, i.i.d.,
but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
▷ Test $H_0$: “$\mathcal{N}(0, I_m)$” against $H_1$: “mixture”.
▷ Risk $R_m(T) = P_{H_0}(T(X) = 1) + P_{H_1}(T(X) = 0)$
▷ Detection boundary in (sparsity, signal) space such that
  Above the boundary, $\exists T : R_m(T) \to 0$ (perfect detection)
  Under the boundary, $\forall T$, $R_m(T) \to 1$ (undetectable)

Detection boundary

$\mu_m = \sqrt{2 r \log m}$, $\pi_{0,m} = 1 - m^{-\beta}$

[Figure: (β, r) plane with β from 0.5 to 1.0 and r from 0.0 to 1.0, split into regions "Perfect classification", "Perfect detection" and "Undetectable".]

$T^{BH} = \mathbf{1}\{\exists i : p_{(i)} \le \alpha_m i/m\}$ with $\alpha_m \propto (\log m)^{-1/2}$
▷ Detection boundary attained by BH when $\beta \in (3/4, 1)$
▷ Better to use “higher criticism”:
$$
\max_i \left\{ \sqrt{m}\, \frac{i/m - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}} \right\}
$$
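Both detection statistics can be computed from the sorted p-values. The sketch below transcribes the BH-based test and the higher criticism statistic as written on the slide; skipping p-values equal to 0 or 1 (where the denominator vanishes) is an added assumption, and the slide does not specify the critical value to compare the statistic with, so none is given here.

```python
import math


def bh_detects(p_values, alpha_m):
    """T_BH = 1{exists i : p_(i) <= alpha_m * i / m}."""
    m = len(p_values)
    return any(p <= alpha_m * i / m
               for i, p in enumerate(sorted(p_values), start=1))


def higher_criticism(p_values):
    """max_i sqrt(m) * (i/m - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    m = len(p_values)
    best = -math.inf
    for i, p in enumerate(sorted(p_values), start=1):
        if 0.0 < p < 1.0:  # assumption: skip degenerate p-values
            stat = math.sqrt(m) * (i / m - p) / math.sqrt(p * (1.0 - p))
            best = max(best, stat)
    return best


print(bh_detects([0.9, 0.001], alpha_m=0.05))
print(higher_criticism([0.8, 0.2]))
```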

LASSO and FDR

Regression with orthogonal design: $X \sim \mathcal{N}(\beta, I_m)$
[Bogdan et al. (2013)]: sorted $\ell_1$ penalized estimator (SLOPE)

$$
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^m} \left\{ \frac{1}{2} \|X - \beta\|^2 + \sum_{k=1}^{m} \lambda_k |\beta|_{(k)} \right\}
$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$; $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \cdots \ge |\beta|_{(m)}$

Selection with $\{i : \hat{\beta}_i \ne 0\}$:
▷ $\lambda_k = \lambda = \bar{\Phi}^{-1}(\alpha/(2m)) \simeq \sqrt{2 \log m}$: Bonferroni
▷ $\lambda_k = \bar{\Phi}^{-1}(\alpha k/(2m)) \simeq \sqrt{2 \log(m/k)}$: BH!
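The two penalty sequences can be computed from Gaussian upper-tail quantiles. The sketch below only builds the λ sequences quoted on the slide (it does not solve the SLOPE optimization problem); the function names and the values of m and α are illustrative.

```python
from statistics import NormalDist


def bonferroni_lambdas(m: int, alpha: float):
    """Constant sequence: upper-tail quantile at alpha / (2m)."""
    lam = NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))
    return [lam] * m


def bh_lambdas(m: int, alpha: float):
    """Decreasing sequence: upper-tail quantiles at alpha * k / (2m), k = 1..m."""
    std = NormalDist()
    return [std.inv_cdf(1.0 - alpha * k / (2.0 * m)) for k in range(1, m + 1)]


m, alpha = 100, 0.1
bonf, bh = bonferroni_lambdas(m, alpha), bh_lambdas(m, alpha)
print(bonf[0], bh[0], bh[-1])
```

The first BH-type penalty coincides with the Bonferroni one, and the later ones are smaller, which mirrors the relation between the Bonferroni threshold and the BH thresholds αk/m.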


Outlook

Some conclusions for FDR
⊕ Very simple
⊕ Trade-off type I / power
⊕ Adaptive to sparsity

Some issues
! Sensitive to null hypothesis
! Choosing α
! Calibrating test statistics

Main challenge
What about dependence?
