False Discovery Rate. Part I: introduction and challenges
E. Roquain
Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France
Point de Vue, 3rd February 2014
E. Roquain
FDR: introduction, challenges and outlook. Part I.
1 Introduction
2 False discovery rate control
3 FDR in other statistical issues
1 Introduction
A “multiple testing joke” (http://xkcd.com)
Multiplicity problem: P(make at least one false discovery) ≫ P(the i-th is a false discovery). A correction is needed to assess significance!
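As a quick numerical illustration of the multiplicity problem, the sketch below assumes m independent tests of true null hypotheses, each at level α: every single test is a false discovery with probability α, while P(at least one false discovery) = 1 − (1 − α)^m. Independence is an assumption, and the values m = 20 (the number of jelly-bean colours tested in the comic) and α = 0.05, as well as the function names, are our choices.

```python
import random


def prob_at_least_one_false_discovery(m, alpha):
    """P(at least one false discovery) for m independent true nulls, each tested at level alpha."""
    # Each null p-value is U(0, 1), so each test is a false discovery with probability alpha.
    return 1.0 - (1.0 - alpha) ** m


def simulate_prob(m, alpha, n_rep=20000, seed=0):
    """Monte Carlo check: draw m independent U(0, 1) null p-values per replicate."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() <= alpha for _ in range(m)) for _ in range(n_rep))
    return hits / n_rep


if __name__ == "__main__":
    m, alpha = 20, 0.05
    print(f"P(the i-th is a false discovery) = {alpha}")
    print(f"P(at least one false discovery)  = {prob_at_least_one_false_discovery(m, alpha):.4f}")
    print(f"Monte Carlo estimate             = {simulate_prob(m, alpha):.4f}")
```

With these values the probability of at least one false discovery is about 0.64, far above the 0.05 of a single test.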
Some other examples
- Paradoxes due to large-scale experiments
- Probable facts appear significant
Science-wise multiplicity issue - [Ioannidis (2005, PLoS Medicine)]
Excerpt shown on the slide (open-access essay): John P. A. Ioannidis, “Why Most Published Research Findings Are False”.
Summary: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
“It can be proven that most claimed research findings are false.”
Modeling the framework for false positive findings: “Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. ‘Negative’ research is also very useful. ‘Negative’ is actually a misnomer, and the misinterpretation is widespread. [...]”
Here R is the ratio of true to no relationships among those probed in the field (see the Summary): “[R] is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus [...]”
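The PPV formula is direct to evaluate. A minimal sketch (the function name is ours) plugging in the values of the toy model on the next slide, read as R = 30/1000, power 1 − β = 1 and α = 0.05. It also checks the immediate consequence of the formula that a claimed finding is more likely true than false exactly when (1 − β)R > α.

```python
def ppv(R, power, alpha):
    """Positive predictive value PPV = (1 - beta) R / (R - beta R + alpha).

    R: pre-study odds of a true relationship (ratio of true to no relationships),
    power: 1 - beta, alpha: Type I error rate.
    """
    beta = 1.0 - power
    return (1.0 - beta) * R / (R - beta * R + alpha)


if __name__ == "__main__":
    # Illustrative values (our choice): 30 true relationships for 1000 null ones,
    # perfect power, the usual 0.05 level.
    R, power, alpha = 30 / 1000, 1.0, 0.05
    print(f"PPV = {ppv(R, power, alpha):.3f}")
    # PPV > 1/2 exactly when (1 - beta) R > alpha
    print("more likely true than false:", power * R > alpha)
```

Here PPV = 0.03/0.08 = 0.375: under these assumptions, most claimed findings are false.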
Science-wise multiplicity issue [Ioannidis (2005, PLoS Medicine)], modeling after [Talk Benjamini, Southampton (2013)]:
Try many experiments: 1000 pure noise, 30 perfect signal
⇓
Publish the results with a p-value ≤ 0.05
⇓
≈ 50 false discoveries, 30 true discoveries
- Jager and Leek (2013). An estimate of the science-wise false discovery rate and application to the top medical literature: ≈ 14%
- Ioannidis (2014). Discussion: Why "An estimate of the science-wise false discovery rate and application to the top medical literature" is false
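The arithmetic of this toy model can be checked by simulation. A sketch under the slide's assumptions, with our reading of “perfect signal” as a study that always reaches p ≤ 0.05 and with null p-values uniform on [0, 1]; the function name is hypothetical. The expected number of false discoveries is 1000 × 0.05 = 50, so roughly 50/80 ≈ 62% of the published results are false.

```python
import random


def science_wise_experiment(n_null=1000, n_signal=30, alpha=0.05, seed=None):
    """One run of the toy model: returns (false discoveries, true discoveries) among published results."""
    rng = random.Random(seed)
    # Pure noise: the p-value is U(0, 1), so it is "published" with probability alpha.
    n_false = sum(rng.random() <= alpha for _ in range(n_null))
    # Perfect signal: assumed to always reach p <= alpha.
    n_true = n_signal
    return n_false, n_true


if __name__ == "__main__":
    V, S = science_wise_experiment(seed=1)
    print(f"false discoveries: {V}, true discoveries: {S}, false fraction: {V / (V + S):.2f}")
```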
Multiplicity in microarray data [Hedenfalk et al. (2001)]: BRCA1 vs BRCA2
(Figure: expression data for the genes.)
- Expression level (activity) of each gene
- Which genes are differentially activated?
- 1 test for each gene
- Thousands of genes
- Number of replications ≪ dimension
- Correlations
Other applications
- Neuroimaging (fMRI): activated regions?
- Econometrics: winning strategies?
- Astronomy: directions with stars?
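Canonical setting
- $X_i$ = avg group 2 − avg group 1 (rescaled) for gene $i$
- Gaussian model:
$$\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \mu \begin{pmatrix} H_1 \\ H_2 \\ \vdots \\ H_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix},$$
with $\mu > 0$, $H \in \{0,1\}^m$ (fixed) and $\varepsilon \sim \mathcal{N}(0, \Gamma)$ ($\Gamma_{i,i} = 1$).
- $\Gamma$ = dependence structure = $I_m$ for now
Question: for each $i$, is $H_i = 0$ or $H_i = 1$? Multiple testing favors the “0” decision.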
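A minimal sketch of drawing one data set from the canonical Gaussian model with Γ = I_m, using only the Python standard library. The function name, and the convention of putting the m_0 null coordinates (those with H_i = 0) first, are our choices.

```python
import random


def simulate_canonical(m, m0, mu, seed=None):
    """Draw (X, H) from X = mu * H + eps, eps ~ N(0, I_m) (the independent case Gamma = I_m).

    The first m0 coordinates are nulls (H_i = 0) and the last m - m0 carry the signal (H_i = 1);
    this ordering is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    return X, H


if __name__ == "__main__":
    X, H = simulate_canonical(m=100, m0=50, mu=2.0, seed=0)
    print(len(X), sum(H))   # m and the number m - m0 of signal coordinates
```

With these parameters the script prints `100 50`.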
Individual decision and errors
- Test statistic: $X_i$
- p-value: $p_i = \bar\Phi(X_i)$, with $\bar\Phi(z) = \mathbb{P}(Z \ge z)$, $Z \sim \mathcal{N}(0,1)$, so that if $H_i = 0$, $p_i \sim \mathcal{U}(0,1)$, and if $H_i = 1$, $p_i$ has c.d.f. $t \mapsto \bar\Phi(\bar\Phi^{-1}(t) - \mu)$
- Choose $\hat H_i = \mathbf{1}\{p_i \le t\}$ for some threshold $t$
- Two errors:
                H_i = 0          H_i = 1
  Ĥ_i = 0      true negative    false negative
  Ĥ_i = 1      false positive   true positive
- False positives are the more annoying error
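A sketch of the individual decisions: it computes $p_i = \bar\Phi(X_i)$ through the complementary error function, $\bar\Phi(z) = \mathrm{erfc}(z/\sqrt{2})/2$, and tallies the four cells of the table for a given threshold t. Function names are hypothetical; the data follow the canonical model with Γ = I_m and the parameters of Picture 1 (m = 100, m_0 = 50, µ = 2).

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z) for Z ~ N(0, 1), via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def decision_table(X, H, t):
    """Count the outcomes of hat-H_i = 1{p_i <= t}, with p_i = bar-Phi(X_i)."""
    counts = {"true negative": 0, "false negative": 0, "false positive": 0, "true positive": 0}
    for x, h in zip(X, H):
        reject = upper_tail(x) <= t
        if h == 0:
            counts["false positive" if reject else "true negative"] += 1
        else:
            counts["true positive" if reject else "false negative"] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    m, m0, mu = 100, 50, 2.0
    H = [0] * m0 + [1] * (m - m0)
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    print(decision_table(X, H, t=0.05))
```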
Picture 1. m = 100; m_0 = 50; µ = 2; Γ = I_m
(Figure: the p-values, each marked with its label H_i ∈ {0, 1}; m_0 = #{i : H_i = 0} is the number of true nulls.)
Picture 2. m = 100; m_0 = 95; µ = 3; Γ = I_m
(Figure: the p-values, each marked with its label H_i ∈ {0, 1}.)
Picture 3. m = 100; m_0 = 50; µ = 0.01; Γ = I_m
(Figure: the p-values, each marked with its label H_i ∈ {0, 1}.)
Picture 4. m = 100; m_0 = 95; µ = 0.01; Γ = I_m
(Figure: the p-values, each marked with its label H_i ∈ {0, 1}.)
Doing as for a single test? t ≡ α = 0.1
(Figure: the p-values with the threshold t, each marked with its label H_i; full view on [0, 1] and zoom on [0, 0.2].)
Union bound (Bonferroni)? t ≡ α/m = 0.1/100
(Figure: zoom on the p-values in [0, 0.2] with the threshold t, each marked with its label H_i, for a sparse and a less sparse data set.)
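To quantify the two thresholds, a sketch (function names ours) computing the probability of at least one false positive, assuming independent U(0, 1) null p-values, in the setting m = 100, m_0 = 95, α = 0.1 of Pictures 2 and 4, next to the union bound m_0 t, which holds under any dependence. With t = α/m the bound reads m_0 α/m ≤ α.

```python
def fwer_independent(m0, t):
    """P(at least one false positive) when m0 independent U(0, 1) null p-values are thresholded at t."""
    return 1.0 - (1.0 - t) ** m0


def union_bound(m0, t):
    """Union (Bonferroni) bound: P(exists null i with p_i <= t) <= m0 * t, under any dependence."""
    return m0 * t


if __name__ == "__main__":
    m, m0, alpha = 100, 95, 0.1
    for name, t in [("t = alpha", alpha), ("t = alpha/m", alpha / m)]:
        bound = min(1.0, union_bound(m0, t))
        print(f"{name:12s} P(>= 1 false positive) = {fwer_independent(m0, t):.4f}   bound = {bound:.4f}")
```

For t = α the probability is essentially 1, whereas for t = α/m it is about 0.091, below the bound 0.095.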
Do something in between! t_ℓ = αℓ/m = 0.1ℓ/100
(Figure: zoom on the p-values in [0, 0.2] with the thresholds t_ℓ, each p-value marked with its label H_i, for a less sparse and a sparse data set.)
Smart! ... and rigorous?
2 False discovery rate control
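BH procedure
- p-value view: with $p_{(1)} \le \dots \le p_{(m)}$ the ordered p-values (and the convention $p_{(0)} = 0$),
$$\hat k = \max\{0 \le i \le m : p_{(i)} \le \alpha i/m\}, \qquad \hat t = \alpha \hat k/m.$$
- c.d.f. view: with $\hat G_m(t) = m^{-1}\#\{i : p_i \le t\}$ the empirical c.d.f. of the p-values,
$$\hat t = \max\{t \in [0,1] : \hat G_m(t) \ge t/\alpha\}.$$
- Decision: $\hat H_i = \mathbf{1}\{p_i \le \hat t\} = \mathbf{1}\{X_i \ge \bar\Phi^{-1}(\hat t)\}$.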
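A minimal sketch of the BH threshold in both views (function names ours), which also checks numerically that they agree. The c.d.f. view reduces to the p-value view because $\hat G_m$ only jumps at the p-values, so the maximizing t lies on the grid αk/m, where $\hat G_m(\alpha k/m) \ge k/m$ is the same condition as $p_{(k)} \le \alpha k/m$.

```python
import random


def bh_threshold(p, alpha):
    """p-value view: k_hat = max{0 <= i <= m : p_(i) <= alpha*i/m}; returns t_hat = alpha*k_hat/m."""
    m = len(p)
    k_hat = 0
    for i, p_i in enumerate(sorted(p), start=1):
        if p_i <= alpha * i / m:
            k_hat = i   # keep the LARGEST such i (step-up), not the first crossing
    return alpha * k_hat / m


def bh_threshold_cdf(p, alpha):
    """c.d.f. view: t_hat = max{t in [0, 1] : G_m(t) >= t/alpha}, G_m the empirical c.d.f.

    The maximum is attained on the grid t = alpha*k/m, k = 0..m, where the condition
    reads #{i : p_i <= alpha*k/m} >= k.
    """
    m = len(p)
    for k in range(m, -1, -1):
        t = alpha * k / m
        if sum(p_i <= t for p_i in p) >= k:
            return t
    return 0.0  # not reached: k = 0 always satisfies the condition


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    t_hat = bh_threshold(p, alpha=0.05)
    rejected = [i for i, p_i in enumerate(p) if p_i <= t_hat]   # hat-H_i = 1{p_i <= t_hat}
    print(t_hat, rejected)
    rng = random.Random(0)
    q = [rng.random() for _ in range(50)]
    assert bh_threshold(q, 0.1) == bh_threshold_cdf(q, 0.1)   # both views agree
```

On the ten example p-values, only p_(1) = 0.001 and p_(2) = 0.008 lie below their thresholds αi/m, so $\hat t$ = 0.01 and the first two hypotheses are rejected.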
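False discovery rate control
For a decision $\hat H_i = \mathbf{1}\{p_i \le \hat t\}$ ($\forall i$),
$$\mathrm{FDP}(\hat t) = \frac{\#\{i : H_i = 0,\ \hat H_i = 1\}}{\#\{i : \hat H_i = 1\}} \quad (\text{with the convention } 0/0 = 0), \qquad \mathrm{FDR}(\hat t) = \mathbb{E}[\mathrm{FDP}(\hat t)].$$
Theorem [Benjamini and Hochberg (1995)], [Benjamini and Yekutieli (2001)]: if $\Gamma = I_m$ and $\hat t$ is the threshold of the BH procedure, then for all $\mu$ and $H$, $\mathrm{FDR}(\hat t) = (m_0/m)\,\alpha \le \alpha$.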
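A Monte Carlo sketch of the theorem in the setting of the simulation slides (m = 50, m_0 = 25, µ = 3, Γ = I_m). The level α = 0.1 is our assumption (the slides do not state it), and the helper names are ours. Averaging the FDP over replications should give a value close to (m_0/m)α = 0.05.

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z), Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_threshold(p, alpha):
    """BH threshold t_hat = alpha * k_hat / m (p-value view)."""
    m = len(p)
    k_hat = 0
    for i, p_i in enumerate(sorted(p), start=1):
        if p_i <= alpha * i / m:
            k_hat = i
    return alpha * k_hat / m


def fdp(p, H, t):
    """FDP(t) = #{i : H_i = 0, p_i <= t} / #{i : p_i <= t}, with the convention 0/0 = 0."""
    n_rej = sum(p_i <= t for p_i in p)
    n_false = sum(p_i <= t and h == 0 for p_i, h in zip(p, H))
    return n_false / n_rej if n_rej else 0.0


def mc_fdr(m=50, m0=25, mu=3.0, alpha=0.1, n_rep=2000, seed=0):
    """Average FDP of BH over n_rep data sets drawn from the canonical model with Gamma = I_m."""
    rng = random.Random(seed)
    H = [0] * m0 + [1] * (m - m0)
    total = 0.0
    for _ in range(n_rep):
        p = [upper_tail(mu * h + rng.gauss(0.0, 1.0)) for h in H]
        total += fdp(p, H, bh_threshold(p, alpha))
    return total / n_rep


if __name__ == "__main__":
    print(f"Monte Carlo FDR of BH: {mc_fdr():.4f}  (theory: (m0/m) * alpha = {25 / 50 * 0.1})")
```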
Simulations. m = 50; m_0 = 25; µ = 3; Γ = I_m
(Figures: ten simulated data sets, each showing the p-values marked with their labels H_i.) Observed FDP(BH) over the ten runs: 0, 0.16, 0.0833, 0.08, 0.12, 0.167, 0.0435, 0, 0, 0.04.
Impact of Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing” [Benjamini (2010, JRSSB)]
(Figure: number of citations per year, 1996 to 2009.) Fig. 1 of Benjamini (2010): “Rapidly increasing number of citations of Benjamini and Hochberg (1995), suggesting that its influence has not yet reached its peak (note that the figure for 2009 is only partially shown).”
Table 1 of Benjamini (2010) lists the 10 most cited papers that cite Benjamini and Hochberg (1995) (columns: rank, article citing Benjamini and Hochberg (1995), number of citations). From the accompanying text: “[...] application areas, such as neural imaging, are on the rise also, showing its ability to be applied in many diverse types of application. The list of the 10 highest cited papers that cite Benjamini and Hochberg (1995), which is shown in Table 1, is particularly interesting, because it includes six statistical papers, suggesting that further theoretical and methodological developments of the method have had significant influence.”
Now more than 20,000 citations on Google Scholar!
3 FDR in other statistical issues
Why should FDR thresholding be adaptive to sparsity?
(Figure: zoom on the p-values in [0, 0.2] for a sparse data set, mostly labels H_i = 0, and for a less sparse one, with many labels H_i = 1.)
[Linnemann] - increasing signal strength
Adaptation to unknown sparsity
$\hat t$ seems “adaptive” to the “quantity” of signal in the data:
- Classification: where is the signal? [Bogdan et al. (2011)], [Neuvial and R. (2012)]
- Detection: is there some signal? [Ingster (2002)], [Donoho and Jin (2004)], etc.
- Estimation: what is the value $\mathbb{E}X$ of the signal? $\widehat{\mathbb{E}X}_i = X_i \mathbf{1}\{|X_i| \ge \hat t\}$ (hard thresholding, with $\hat t$ here the FDR threshold expressed on the scale of the $|X_i|$) [Abramovich et al. (2006)], [Donoho and Jin (2006)]
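A sketch of FDR hard thresholding in the spirit of Abramovich et al. (2006), assuming BH is applied to the two-sided p-values $2\bar\Phi(|X_i|)$: keeping $X_i$ when its p-value is at most $\hat t$ amounts to keeping it when $|X_i| \ge \bar\Phi^{-1}(\hat t/2)$. Function names are hypothetical, and this is an illustration, not the cited authors' exact estimator.

```python
import math


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z), Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_threshold(p, alpha):
    """BH threshold t_hat = alpha * k_hat / m."""
    m = len(p)
    k_hat = 0
    for i, p_i in enumerate(sorted(p), start=1):
        if p_i <= alpha * i / m:
            k_hat = i
    return alpha * k_hat / m


def fdr_hard_threshold(X, alpha):
    """Estimate of E X: keep X_i if BH (level alpha) rejects its two-sided p-value, else set it to 0."""
    p = [2.0 * upper_tail(abs(x)) for x in X]     # two-sided p-values 2 * bar-Phi(|X_i|)
    t_hat = bh_threshold(p, alpha)
    # p_i <= t_hat  <=>  |X_i| >= bar-Phi^{-1}(t_hat / 2): a data-driven hard threshold
    return [x if p_i <= t_hat else 0.0 for x, p_i in zip(X, p)]


if __name__ == "__main__":
    X = [5.0, -4.5, 0.1, -0.2, 0.3]               # toy observations
    print(fdr_hard_threshold(X, alpha=0.1))
```

The threshold adapts to the data: the more large |X_i| there are, the larger $\hat t$ and the lower the data-scale cut-off.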
Classification: $X_i \sim \pi_{0,m}\,\mathcal{N}(0,1) + (1-\pi_{0,m})\,\mathcal{N}(\mu_m, 1)$, $1 \le i \le m$, i.i.d., but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
- Training set = null distribution known (one-class classification)
- Classification rule $\hat h_m : \mathbb{R} \to \{0,1\}$
- Risk $R_m(\hat h_m) = (1-\pi_{0,m})^{-1}\, \mathbb{E}\Big[m^{-1} \sum_{i=1}^m \mathbf{1}\{\hat h_m(X_i) \ne H_i\}\Big]$
- Classification boundary in (sparsity, signal) space such that:
  above the boundary, $\exists \hat h_m$: $R_m(\hat h_m) \to 0$ (perfect classification);
  under the boundary, $\forall \hat h_m$, $R_m(\hat h_m) \to 1$ (unclassifiable).
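Classification boundary
Parametrization: $\pi_{0,m} = 1 - m^{-\beta}$, $\mu_m = \sqrt{2 r \log m}$.
(Figure: phase diagram in the (β, r) plane, β ∈ [0.5, 1], r ∈ [0, 1], with the regions “Perfect classification” and “Unclassifiable” separated by the classification boundary.)
- BH classifier: $\hat h^{BH}_m(x) = \mathbf{1}\{x \ge \bar\Phi^{-1}(\hat t)\}$, with BH at level $\alpha_m \propto (\log m)^{-1/2}$
- Classification boundary attained by BH.
- On the boundary: risk of BH ∼ Bayes risk.
[Bogdan et al. (2011)], [Neuvial and R. (2012)]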
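A minimal simulation sketch of the BH classifier in the sparse mixture above. It assumes $\alpha_m = (\log m)^{-1/2}$ (proportionality constant 1, our choice) and an illustrative point (β, r) = (0.6, 0.9); function names are hypothetical. At finite m the empirical risk need not be close to its limit, so this illustrates the objects rather than the asymptotic boundary.

```python
import math
import random


def upper_tail(z):
    """bar-Phi(z) = P(Z >= z), Z ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_threshold(p, alpha):
    """BH threshold t_hat = alpha * k_hat / m."""
    m = len(p)
    k_hat = 0
    for i, p_i in enumerate(sorted(p), start=1):
        if p_i <= alpha * i / m:
            k_hat = i
    return alpha * k_hat / m


def bh_classifier_labels(X, alpha):
    """h^BH(X_i) = 1{X_i >= bar-Phi^{-1}(t_hat)}, i.e. 1{p_i <= t_hat} with p_i = bar-Phi(X_i)."""
    p = [upper_tail(x) for x in X]
    t_hat = bh_threshold(p, alpha)
    return [1 if p_i <= t_hat else 0 for p_i in p]


def empirical_risk(labels, H, pi0):
    """(1 - pi0)^{-1} * m^{-1} * #{i : label_i != H_i}: one draw of the quantity averaged in R_m."""
    m = len(H)
    n_err = sum(lab != h for lab, h in zip(labels, H))
    return n_err / (m * (1.0 - pi0))


def simulate_sparse_mixture(m, beta, r, seed=None):
    """X_i i.i.d. from pi0 N(0, 1) + (1 - pi0) N(mu_m, 1), pi0 = 1 - m^(-beta), mu_m = sqrt(2 r log m)."""
    rng = random.Random(seed)
    pi0 = 1.0 - m ** (-beta)
    mu = math.sqrt(2.0 * r * math.log(m))
    H = [0 if rng.random() < pi0 else 1 for _ in range(m)]
    X = [mu * h + rng.gauss(0.0, 1.0) for h in H]
    return X, H, pi0


if __name__ == "__main__":
    m, beta, r = 50_000, 0.6, 0.9                 # illustrative point of the (beta, r) plane
    X, H, pi0 = simulate_sparse_mixture(m, beta, r, seed=0)
    alpha_m = 1.0 / math.sqrt(math.log(m))        # alpha_m proportional to (log m)^(-1/2), constant 1
    risk = empirical_risk(bh_classifier_labels(X, alpha_m), H, pi0)
    print(f"empirical risk of the BH classifier: {risk:.3f}")
```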
Detection: is there some signal? Same model: $X_i \sim \pi_{0,m}\,\mathcal{N}(0,1) + (1-\pi_{0,m})\,\mathcal{N}(\mu_m, 1)$, $1 \le i \le m$, i.i.d., but $\pi_{0,m} \to 1$ (sparse) and $\mu_m \to \infty$ (compensates sparsity).
- Test $H_0$: “$\mathcal{N}(0, I_m)$” against $H_1$: “mixture”.
- Risk $R_m(T) = \mathbb{P}_{H_0}(T(X) = 1) + \mathbb{P}_{H_1}(T(X) = 0)$
- Detection boundary in (sparsity, signal) space such that:
  above the boundary, $\exists T$: $R_m(T) \to 0$ (perfect detection);
  under the boundary, $\forall T$, $R_m(T) \to 1$ (undetectable).
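Detection boundary
Same parametrization: $\pi_{0,m} = 1 - m^{-\beta}$, $\mu_m = \sqrt{2 r \log m}$.
(Figure: phase diagram in the (β, r) plane, β ∈ [0.5, 1], r ∈ [0, 1], showing the regions “Perfect classification”, “Perfect detection” and “Undetectable”.)
- BH test: $T^{BH} = \mathbf{1}\{\exists i : p_{(i)} \le \alpha_m i/m\}$, with $\alpha_m \propto (\log m)^{-1/2}$
- Detection boundary attained by BH when β ∈ (3/4, 1)
- Better to use “higher criticism”: $\max_i \sqrt{m}\,\dfrac{i/m - p_{(i)}}{\sqrt{p_{(i)}(1 - p_{(i)})}}$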
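A sketch of the two detection statistics on a vector of p-values (function names ours). $T^{BH}$ rejects the global null as soon as BH rejects at least one hypothesis. The higher criticism statistic is taken, as written above, as a maximum over all i with $0 < p_{(i)} < 1$; implementations often restrict the range of i, which this sketch does not do.

```python
import math


def bh_detection_test(p, alpha):
    """T^BH = 1{exists i : p_(i) <= alpha * i / m}: 1 means 'some signal detected'."""
    m = len(p)
    return int(any(p_i <= alpha * i / m for i, p_i in enumerate(sorted(p), start=1)))


def higher_criticism(p):
    """max_i sqrt(m) (i/m - p_(i)) / sqrt(p_(i) (1 - p_(i))), over ordered p-values in (0, 1)."""
    m = len(p)
    best = -math.inf
    for i, p_i in enumerate(sorted(p), start=1):
        if 0.0 < p_i < 1.0:                       # the standardization is undefined at 0 and 1
            z = math.sqrt(m) * (i / m - p_i) / math.sqrt(p_i * (1.0 - p_i))
            best = max(best, z)
    return best


if __name__ == "__main__":
    p = [0.01, 0.5, 0.9]                          # toy p-values
    alpha_m = 0.1                                 # illustrative level
    print(bh_detection_test(p, alpha_m), round(higher_criticism(p), 4))
```

Large values of the higher criticism statistic point to signal; how to calibrate its rejection threshold is not covered here.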
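LASSO and FDR
Regression with orthogonal design: $X \sim \mathcal{N}(\beta, I_m)$.
[Bogdan et al. (2013)]: sorted $\ell_1$ penalized estimator (SLOPE)
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^m} \Big\{ \frac12 \|X - \beta\|^2 + \sum_{k=1}^m \lambda_k |\beta|_{(k)} \Big\},$$
where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ and $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \dots \ge |\beta|_{(m)}$. Selection with $\{i : \hat\beta_i \ne 0\}$:
- $\lambda_k = \lambda = \bar\Phi^{-1}(\alpha/(2m)) \simeq \sqrt{2 \log m}$: Bonferroni
- $\lambda_k = \bar\Phi^{-1}(\alpha k/(2m)) \simeq \sqrt{2 \log(m/k)}$: BH!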
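A sketch of SLOPE in the orthogonal design, under our own implementation choices: the estimator is the proximal operator of the sorted $\ell_1$ norm evaluated at X, computed by sorting |X| in decreasing order, subtracting λ, projecting onto nonincreasing sequences with pool-adjacent-violators, clipping at zero and restoring signs. The two λ sequences are the Bonferroni and BH ones above; function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail_quantile(q):
    """bar-Phi^{-1}(q): the z with P(Z >= z) = q, Z ~ N(0, 1)."""
    return -NormalDist().inv_cdf(q)


def slope_lambdas(m, alpha, kind="BH"):
    """lambda_k = bar-Phi^{-1}(alpha k / (2m)) ("BH") or the constant bar-Phi^{-1}(alpha / (2m))."""
    if kind == "Bonferroni":
        return [upper_tail_quantile(alpha / (2 * m))] * m
    return [upper_tail_quantile(alpha * k / (2 * m)) for k in range(1, m + 1)]


def prox_sorted_l1(y, lam):
    """argmin_b 1/2 ||y - b||^2 + sum_k lam_k |b|_(k), for lam nonincreasing and nonnegative."""
    m = len(y)
    order = sorted(range(m), key=lambda i: -abs(y[i]))     # indices by decreasing |y_i|
    z = [abs(y[i]) - lam[k] for k, i in enumerate(order)]
    blocks = []                                             # each block: [sum of z-values, size]
    for v in z:
        blocks.append([v, 1])
        # pool adjacent blocks while their means fail to be strictly decreasing
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([max(s / c, 0.0)] * c)                # clip the projection at zero
    beta_hat = [0.0] * m
    for k, i in enumerate(order):
        beta_hat[i] = math.copysign(fitted[k], y[i])        # restore sign and original position
    return beta_hat


if __name__ == "__main__":
    y = [4.0, -3.5, 3.0, 0.4, -0.3, 0.2]                    # hypothetical observations X
    for kind in ("Bonferroni", "BH"):
        lam = slope_lambdas(len(y), alpha=0.1, kind=kind)
        beta_hat = prox_sorted_l1(y, lam)
        print(kind, [i for i, b in enumerate(beta_hat) if b != 0.0])   # selected {i : beta_hat_i != 0}
```

With the constant sequence the estimator reduces to soft thresholding at λ, i.e. the LASSO in this orthogonal design; the BH sequence starts at the same λ_1 and then decreases, so it penalizes lower-ranked coordinates less.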
Outlook
Some conclusions for FDR:
⊕ Very simple
⊕ Trade-off between type I errors and power
⊕ Adaptive to sparsity
Some issues:
! Sensitive to the null hypothesis
! Choosing α
! Calibrating test statistics
Main challenge: what about dependence?