Concentration inequalities for order statistics Using the entropy method and Rényi’s representation
S. Boucheron and M. Thomas
LPMA, Université Paris-Diderot
Probability and Harmonic Analysis, Angers, September 2012
S. Boucheron & M. Thomas (LPMA)
Concentration & order statistics
Angers 2012
1 / 30
Motivation
Concentration, asymptotics for order statistics
Background: order statistics
Sample: $X_1, \dots, X_n \sim_{\text{i.i.d.}} F$
Order statistics: $X_{1,n} \ge \dots \ge X_{n,n}$, the non-increasing rearrangement of $X_1, \dots, X_n$.
If $n$ is clear from context, $X_{1,n}, \dots, X_{n,n}$ are denoted by $X_{(1)}, \dots, X_{(n)}$.
$X_{(1)}$: sample maximum
$X_{(n/2)}$: sample median
...
Extreme value theory and classical statistics: asymptotic distributions, convergence of moments, ...
Goal: derive simple, non-asymptotic variance/tail bounds for order statistics
Background: concentration
Concentration of measure phenomenon: any function of many independent random variables that does not depend too much on any of them is concentrated around its mean value.
A new (non-asymptotic) look at independence.
Example: Gaussian concentration (Bonami, Beckner, Nelson, Gross, Borell, Ehrhard, Bobkov, Ledoux, ...)
$X = (X_1, \dots, X_n)$ a standard Gaussian vector
Poincaré’s inequality: $\operatorname{Var} f(X) \le E\|\nabla f\|^2$
Gross logarithmic Sobolev inequality: $\operatorname{Ent}(f(X)^2) \le 2E\|\nabla f\|^2$
Cirelson’s inequality: $P\{f(X) \ge Ef(X) + t\} \le \exp(-t^2/(2L^2))$ if $\|\nabla f\| \le L$
Product spaces: Talagrand’s inequalities
Order statistics are not (usually) sums of independent random variables.
Off the shelf
Off-the-shelf concentration inequalities and order statistics
f (X1 , . . . , Xn ) = max(X1 , . . . , Xn ): a simple function of many independent random variables that does not depend too much on any of them.
Scenario: the $X_i$ are standard Gaussian.
Almost surely, $\|\nabla f\| = 1$.
Poincaré’s inequality $\Rightarrow$ $\operatorname{Var}(f(X_1, \dots, X_n)) \le 1$
Extreme value theory asserts: $\operatorname{Var}(\max(X_1, \dots, X_n)) = O(1/\log n)$
We do not understand (clearly) in which way the maximum is a smooth function of the sample.
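The gap between the Poincaré bound and the true variance of the Gaussian maximum can be seen in a small simulation. This is only a sketch: the function name, seed, sample sizes and number of trials are illustrative choices, not part of the talk.

```python
import math
import random
import statistics

def variance_of_gaussian_max(n, trials, rng):
    """Monte Carlo estimate of Var(max(X_1, ..., X_n)) for standard Gaussians."""
    maxima = [max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    return statistics.variance(maxima)

rng = random.Random(2012)  # fixed seed so the run is reproducible
for n in (10, 100, 1000):
    est = variance_of_gaussian_max(n, trials=2000, rng=rng)
    # The Poincare bound equals 1 for every n; 1/log(n) is printed only as a scale.
    print(f"n={n:5d}  estimated variance={est:.3f}  1/log(n)={1 / math.log(n):.3f}")
```

The estimates stay well below the dimension-free bound 1 and shrink slowly as n grows, which is the behaviour the slide points at.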
Order statistics
Central, intermediate and extreme order statistics
$X_1, \dots, X_n \sim_{\text{i.i.d.}} F$
Order statistics: $X_{1,n} \ge \dots \ge X_{n,n}$, the non-increasing rearrangement of $X_1, \dots, X_n$.
If $n$ is clear from context, $X_{1,n}, \dots, X_{n,n}$ are denoted by $X_{(1)}, \dots, X_{(n)}$.
$(X_{k,n})$ is a sequence of
extreme order statistics if $k$ is fixed and $n \to \infty$;
central order statistics if $k/n \to p \in (0, 1)$ while $n \to \infty$;
intermediate order statistics if $k/n \to 0$ and $k \to \infty$.
Different asymptotics
Central and intermediate order statistics (often): Gaussian
Extreme order statistics (sometimes): Generalized Extreme Value
Variance bounds
Order statistics and spacings
Variance bounds, order statistics and spacings
A connection
The variance (and more generally the higher moments) of the $k$th order statistic can be upper-bounded by moments of the $k$th spacing $X_{(k)} - X_{(k+1)}$.
Lemma (Jackknife bounds)
$$\operatorname{Var}[X_{(k)}] \le k\, E\big[(X_{(k)} - X_{(k+1)})^2\big].$$
Convention: $\Delta_k = X_{(k)} - X_{(k+1)}$
Proof (i)
Theorem (Efron-Stein inequalities, 1981)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be measurable, and let $Z = f(X_1, \dots, X_n)$. Let $Z_i = f_i(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n)$ where $f_i : \mathbb{R}^{n-1} \to \mathbb{R}$ is an arbitrary measurable function. Suppose $Z$ is square-integrable. Then
$$\operatorname{Var}[Z] \le E\Big[\sum_{i=1}^n (Z - Z_i)^2\Big].$$
Efron-Stein inequalities provide a key ingredient in the derivation of Poincaré’s inequality.
$\sum_{i=1}^n (Z - Z_i)^2$ is a jackknife estimate of variance.
Proof (ii)
$Z = X_{(k)}$
Take $Z_i$ to be the rank $k$ statistic from the subsample $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n$:
$Z_i = X_{(k+1)}$ if $X_i \ge X_{(k)}$,
$Z_i = Z$ otherwise.
Jackknife estimate of variance of $X_{(k)}$:
$$\sum_{i=1}^n (Z - Z_i)^2 = \sum_{i : X_i \ge X_{(k)}} (X_{(k)} - X_{(k+1)})^2 = k \Delta_k^2$$
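The identity $\sum_i (Z - Z_i)^2 = k \Delta_k^2$ from Proof (ii) can be checked by brute force on one sample. The sketch below assumes a sample without ties and $k \le n - 1$; the helper names are hypothetical.

```python
import random

def kth_largest(values, k):
    """Return the k-th largest value (k = 1 is the maximum)."""
    return sorted(values, reverse=True)[k - 1]

def jackknife_sum(sample, k):
    """Sum over i of (Z - Z_i)^2, with Z_i the rank-k statistic computed without X_i."""
    z = kth_largest(sample, k)
    total = 0.0
    for i in range(len(sample)):
        z_i = kth_largest(sample[:i] + sample[i + 1:], k)
        total += (z - z_i) ** 2
    return total

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
k = 3
ordered = sorted(sample, reverse=True)
spacing = ordered[k - 1] - ordered[k]  # Delta_k = X_(k) - X_(k+1)
print(jackknife_sum(sample, k), k * spacing ** 2)  # the two values agree
```

Only the $k$ observations at or above $X_{(k)}$ contribute, each with the same squared spacing, which is why the sum collapses to $k \Delta_k^2$.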
Asymptotics for extremes
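Asymptotic assessment for extreme order statistics
Definition (Quantile function)
$F^{\leftarrow}(p) = \inf\{x : F(x) \ge p\}$
Definition (MDA($\gamma$), $\gamma \in \mathbb{R}$)
$F \in \mathrm{MDA}(\gamma)$ if there exists a function $a : \mathbb{R}_+ \to \mathbb{R}_+$ such that
$$P\Big\{\frac{\max(X_1, \dots, X_n) - F^{\leftarrow}(1 - 1/n)}{a(n)} \le x\Big\} \to \exp\big(-(1 + \gamma x)^{-1/\gamma}\big).$$
Three cases, according to the sign of the extreme value index: $\gamma > 0$, $\gamma = 0$, $\gamma < 0$.
...
letting $X_{n+1,n} = 0$,
$$X_{k,n} = \sum_{i=k}^n (X_{i,n} - X_{i+1,n})$$
where the spacings $\Delta_i = X_{i,n} - X_{i+1,n}$, $i = 1, \dots, n$, form an independent family of random variables and $i \times (X_{i,n} - X_{i+1,n}) \sim F$.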
Rényi’s representation
Quantile transformations
Quantile transformation
Definition (Quantile function (bis))
$F^{\leftarrow}(p) = \inf\{x : F(x) \ge p\}$, $p \in (0, 1)$
$U(t) = F^{\leftarrow}(1 - 1/t)$, $t \in (1, \infty)$
Representation for order statistics
If $Y_{(1)}, \dots, Y_{(n)}$ are the order statistics of an exponential sample, then $\big(F^{\leftarrow}(1 - \exp(-Y_{(i)}))\big)_{i=1,\dots,n}$ is distributed as the order statistics of a sample drawn according to $F$.
Representation for order statistics
If $Y_{(1)}, \dots, Y_{(n)}$ are the order statistics of an exponential sample, then $U(e^{Y_{(1)}}) \ge U(e^{Y_{(2)}}) \ge \dots \ge U(e^{Y_{(n)}})$ is distributed as the order statistics of a sample drawn according to $F$.
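The two representations combine into a simple sampler: build exponential order statistics from independent spacings, then push them through the quantile function. The sketch below assumes a standard exponential sample (so that $i \times$ the $i$th spacing is standard exponential) and uses the Gaussian quantile as an example of $F^{\leftarrow}$; function names are hypothetical, and extreme floating-point edge cases (arguments rounding to 0 or 1) are ignored.

```python
import math
import random
from statistics import NormalDist

def exponential_order_statistics(n, rng):
    """Y_(1) >= ... >= Y_(n) via Renyi's representation:
    Y_(k) = sum_{i=k}^{n} E_i / i with E_i i.i.d. standard exponential."""
    spacings = [rng.expovariate(1.0) / i for i in range(1, n + 1)]
    order_stats = [0.0] * n
    running = 0.0
    for k in range(n, 0, -1):  # accumulate from the minimum upwards
        running += spacings[k - 1]
        order_stats[k - 1] = running
    return order_stats

def order_statistics_from(quantile, n, rng):
    """Apply F^{<-}(1 - exp(-y)) to each exponential order statistic."""
    return [quantile(1.0 - math.exp(-y)) for y in exponential_order_statistics(n, rng)]

rng = random.Random(42)
gaussian_quantile = NormalDist().inv_cdf  # F^{<-} for the standard Gaussian
print(order_statistics_from(gaussian_quantile, 10, rng))  # a non-increasing list
```

Because the quantile function is non-decreasing, the ordering of the exponential order statistics is preserved by the transformation.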
Hazard rates
Hazard rate, spacings and order statistics
Definition (Hazard rate)
The hazard rate of a differentiable distribution function $F$ is $F'/\overline{F} = F'/(1 - F)$.
Lemma
The distribution function $F$ has non-decreasing hazard rate iff $U \circ \exp$ is concave.
Lemma
If the distribution function $F$ has non-decreasing hazard rate, then $X_{(k+1)}$ and $\Delta_k = X_{(k)} - X_{(k+1)}$ are negatively associated.
Negative association
For increasing functions $f, g$:
$$E\big[f(X_{(k+1)})\, g(\Delta_k)\big] \le E\big[f(X_{(k+1)})\big]\, E\big[g(\Delta_k)\big]$$
Gaussian hazard rate
$U(t) = \Phi^{\leftarrow}(1 - 1/t)$ for $t > 1$.
[Figure: two panels, $x$ ranging from 0 to 10. Panel "Gaussian distribution": $U(\exp(x))$. Panel "Absolute value of Gaussian random variable": $U(2 \exp(x))$.]
Gaussian and absolute value of Gaussian have non-decreasing hazard rate.
The absolute value of a Gaussian random variable has both non-decreasing hazard rate and bounded inverse hazard rate.
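Since non-decreasing hazard rate is equivalent to concavity of $U \circ \exp$, the claim for the Gaussian and for the absolute value of a Gaussian can be probed numerically through second differences. This is a rough check on a grid, not a proof; the grid, step and function names are illustrative choices.

```python
import math
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # Gaussian quantile function

def u_exp_gaussian(x):
    """U(exp(x)) = Phi^{<-}(1 - exp(-x)) for the Gaussian distribution, x > 0."""
    return PHI_INV(1.0 - math.exp(-x))

def u_exp_abs_gaussian(x):
    """U(2 exp(x)) = Phi^{<-}(1 - exp(-x) / 2), absolute value of a Gaussian, x >= 0."""
    return PHI_INV(1.0 - 0.5 * math.exp(-x))

def max_second_difference(func, start, stop, step):
    """Largest second difference of func on a regular grid; a negative value
    is consistent with concavity on that grid."""
    n = int(round((stop - start) / step))
    values = [func(start + i * step) for i in range(n + 1)]
    return max(values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, n))

print(max_second_difference(u_exp_gaussian, 0.1, 10.0, 0.1))
print(max_second_difference(u_exp_abs_gaussian, 0.0, 10.0, 0.1))
```

Both printed values are negative on this grid, in line with the concavity visible in the two panels.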
Taking advantage of increasing hazard rate
Lemma
If $F$ has non-decreasing hazard rate $h$, then for $1 \le k \le n/2$,
$$\operatorname{Var}[X_{(k)}] \le E V_k \le \frac{2}{k}\, E\Big[\Big(\frac{1}{h(X_{(k+1)})}\Big)^2\Big],$$
where $V_k = k \Delta_k^2$ is the Efron-Stein estimate of the variance of $X_{(k)}$.
Lemma
Let $n \ge 3$, and let $X_{(1)} \ge \dots \ge X_{(n)}$ be the order statistics of absolute values of a standard Gaussian sample. For $1 \le k \le n/2$,
$$\operatorname{Var}[X_{(k)}] \le \frac{1}{k \log 2}\, \frac{8}{\log\frac{2n}{k} - \log\big(1 + \frac{4}{k} \log\log\frac{2n}{k}\big)}.$$
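To get a feel for the size of the Gaussian variance bound, it can be evaluated for a few values of $n$. The sketch reads the bound as $\frac{1}{k \log 2} \cdot 8 / \big(\log\frac{2n}{k} - \log(1 + \frac{4}{k}\log\log\frac{2n}{k})\big)$; that reading of the slide, and the function name, are assumptions.

```python
import math

def abs_gaussian_variance_bound(n, k):
    """Right-hand side of the variance bound for the k-th largest absolute value
    of n standard Gaussians, under the reading stated in the text above."""
    if n < 3 or not 1 <= k <= n / 2:
        raise ValueError("the lemma assumes n >= 3 and 1 <= k <= n/2")
    log_ratio = math.log(2 * n / k)
    denominator = log_ratio - math.log(1 + (4 / k) * math.log(log_ratio))
    return 8 / (k * math.log(2) * denominator)

for n in (10, 1000, 10**6):
    print(n, abs_gaussian_variance_bound(n, 1))
```

For the maximum ($k = 1$) the value decreases as $n$ grows, roughly like a constant over $\log n$, which matches the order of magnitude given by extreme value theory.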
Exponential Efron-Stein inequalities
Modified logarithmic Sobolev inequalities
Goal
Context
If $F$ has increasing hazard rate (more concentrated than exponential), extreme and intermediate order statistics have exponential moments.
Target
Establish exponential Efron-Stein inequalities for order statistics.
Derive Bernstein-like deviation inequalities for order statistics.
Modified logarithmic Sobolev inequalities
Theorem (Modified logarithmic Sobolev inequality. L. Wu, P. Massart, 2000)
Let $\tau(x) = e^x - x - 1$. Then for any $\lambda \in \mathbb{R}$,
$$\operatorname{Ent}\big[e^{\lambda Z}\big] = E\big[e^{\lambda Z} \log e^{\lambda Z}\big] - E\big[e^{\lambda Z}\big] \log E\big[e^{\lambda Z}\big] = \lambda E\big[Z e^{\lambda Z}\big] - E\big[e^{\lambda Z}\big] \log E\big[e^{\lambda Z}\big] \le \sum_{i=1}^n E\big[e^{\lambda Z}\, \tau(-\lambda (Z - Z_i))\big].$$
Remark
Logarithmic Sobolev inequalities and Efron-Stein inequalities are derived in a similar way: the proofs rely on variational representations of variance and entropy.
Entropy, order statistics and spacings
Application to order statistics
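Notation
$\psi(x) = e^x \tau(-x) = 1 + (x - 1) e^x$
Lemma
For all $\lambda \in \mathbb{R}$,
$$\operatorname{Ent}\big[e^{\lambda X_{(k)}}\big] \le k\, E\big[e^{\lambda X_{(k+1)}}\, \psi(\lambda (X_{(k)} - X_{(k+1)}))\big] = k\, E\big[e^{\lambda X_{(k+1)}}\, \psi(\lambda \Delta_k)\big]$$
The proof parallels the variance bounds derived from Efron-Stein inequalities.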
Bernstein bounds, sub-Gamma distributions
Sub-gamma on the right tail with variance factor $v$ and scale parameter $c$:
$$\log E e^{\lambda (X - EX)} \le \frac{\lambda^2 v}{2 (1 - c \lambda)} \quad \text{for every } \lambda \text{ such that } 0 < \lambda < 1/c.$$
Bernstein’s inequality: for $t > 0$,
$$P\big\{X \ge EX + \sqrt{2 v t} + c t\big\} \le \exp(-t).$$
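A concrete instance may help: a standard exponential variable is sub-gamma on the right tail with $v = 1$ and $c = 1$, since $-\log(1 - \lambda) - \lambda = \sum_{j \ge 2} \lambda^j / j \le \lambda^2 / (2(1 - \lambda))$. The sketch below checks both the moment generating function bound and the resulting tail bound in that case; the example distribution and the function names are choices made here, not taken from the talk.

```python
import math

def bernstein_threshold(v, c, t):
    """Deviation sqrt(2 v t) + c t, exceeded with probability at most exp(-t)."""
    return math.sqrt(2 * v * t) + c * t

def exp_log_mgf(lam):
    """log E exp(lam (X - 1)) for X standard exponential, 0 <= lam < 1."""
    return -math.log(1 - lam) - lam

def sub_gamma_bound(lam, v, c):
    """lam^2 v / (2 (1 - c lam)), for 0 < lam < 1/c."""
    return lam ** 2 * v / (2 * (1 - c * lam))

for t in (0.5, 1.0, 5.0):
    # Exact tail P{X >= 1 + threshold} of the standard exponential distribution.
    exact_tail = math.exp(-(1 + bernstein_threshold(1.0, 1.0, t)))
    print(t, exact_tail, math.exp(-t))
```

In each printed row the exact tail probability sits below $\exp(-t)$, as Bernstein’s inequality requires.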
Exponential Efron-Stein inequality for order statistics
$V_k = k \Delta_k^2$: the Efron-Stein estimate of the variance of $X_{(k)}$.
Theorem
If $F$ has non-decreasing hazard rate $h$, then for $\lambda \ge 0$ and $1 \le k \le n/2$,
$$\log E e^{\lambda (X_{(k)} - EX_{(k)})} \le \lambda \frac{k}{2}\, E\big[\Delta_k \big(e^{\lambda \Delta_k} - 1\big)\big] = \lambda \frac{k}{2}\, E\Big[\sqrt{\frac{V_k}{k}} \Big(e^{\lambda \sqrt{V_k / k}} - 1\Big)\Big].$$
Assessment
Does not follow from the exponential Efron-Stein inequality of B., Lugosi and Massart (Ann. Probab. 2003),
$$\log E e^{\lambda (X_{(k)} - EX_{(k)})} \le \frac{\lambda \theta}{1 - \lambda \theta} \log E e^{\lambda V_k / \theta} \quad \text{for } \theta > 0,\ 0 \le \lambda < 1/\theta,$$
as $V_k$ may not have exponential moments!
Sharp (up to constants) for exponential samples.
Works for central, intermediate and extreme order statistics alike.
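For a standard exponential sample both sides of the exponential Efron-Stein inequality for order statistics, $\log E e^{\lambda (X_{(k)} - EX_{(k)})} \le \lambda \frac{k}{2} E[\Delta_k (e^{\lambda \Delta_k} - 1)]$, have closed forms, so they can be compared directly. The closed forms below are worked out here from Rényi’s representation ($X_{(k)} = \sum_{i=k}^n E_i / i$ with $E_i$ i.i.d. standard exponential, so $k \Delta_k$ is standard exponential); they are an illustration under that assumption rather than a formula from the slides.

```python
import math

def exact_log_mgf(lam, n, k):
    """log E exp(lam (X_(k) - E X_(k))) for a standard exponential sample,
    using X_(k) = sum_{i=k}^{n} E_i / i; requires 0 <= lam < k."""
    return sum(-math.log(1 - lam / i) - lam / i for i in range(k, n + 1))

def efron_stein_bound(lam, k):
    """lam (k/2) E[Delta_k (exp(lam Delta_k) - 1)] when k Delta_k is standard
    exponential, i.e. (lam/2) ((1 - lam/k)^(-2) - 1); requires 0 <= lam < k."""
    return 0.5 * lam * ((1 - lam / k) ** -2 - 1)

n = 100
for k in (1, 10, 50):
    lam = 0.5 * k
    print(k, exact_log_mgf(lam, n, k), efron_stein_bound(lam, k))
```

In every row the exact value stays below the bound while remaining of comparable order, which is one way to read the sharpness remark.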
Hazard rates, association, Herbst’s arguments
Proof (i)
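$\psi(x) = 1 + (x - 1) e^x$ is non-decreasing over $\mathbb{R}_+$, and $X_{(k+1)}$ and $\Delta_k$ are negatively associated, so for $\lambda \ge 0$:
$$\operatorname{Ent}\big[e^{\lambda X_{(k)}}\big] \le k\, E\big[e^{\lambda X_{(k+1)}}\, \psi(\lambda \Delta_k)\big] \le k\, E\big[e^{\lambda X_{(k+1)}}\big] \times E[\psi(\lambda \Delta_k)] \le k\, E\big[e^{\lambda X_{(k)}}\big] \times E[\psi(\lambda \Delta_k)].$$
Multiplying both sides by $\exp(-\lambda EX_{(k)})$ leads to
$$\operatorname{Ent}\big[e^{\lambda (X_{(k)} - EX_{(k)})}\big] \le k\, E\big[e^{\lambda (X_{(k)} - EX_{(k)})}\big] \times E[\psi(\lambda \Delta_k)].$$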
Proof (ii) Herbst’s argument
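Let $G(\lambda) = E e^{\lambda \Delta_k}$. Obviously, $G(0) = 1$, and as $\Delta_k \ge 0$, $G$ and its derivatives are increasing on $[0, \infty)$,
$$E[\psi(\lambda \Delta_k)] = 1 - G(\lambda) + \lambda G'(\lambda) = \int_0^\lambda s\, G''(s)\, ds \le G''(\lambda) \frac{\lambda^2}{2}.$$
Hence, for $\lambda \ge 0$,
$$\frac{\operatorname{Ent}\big[e^{\lambda (X_{(k)} - EX_{(k)})}\big]}{\lambda^2\, E\big[e^{\lambda (X_{(k)} - EX_{(k)})}\big]} = \frac{d}{d\lambda}\Big[\frac{1}{\lambda} \log E e^{\lambda (X_{(k)} - EX_{(k)})}\Big] \le \frac{k}{2} \frac{dG'}{d\lambda}.$$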
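The two identities used in Herbst’s argument can be spelled out as follows. The shorthand $Y = X_{(k)} - EX_{(k)}$ and $\Lambda(\lambda) = \log E e^{\lambda Y}$ is introduced here only for readability; the last line combines these identities with the entropy bound from Proof (i).

```latex
\begin{align*}
\frac{d}{d\lambda}\Big[\frac{\Lambda(\lambda)}{\lambda}\Big]
  &= \frac{\lambda \Lambda'(\lambda) - \Lambda(\lambda)}{\lambda^2}
   = \frac{\lambda E[Y e^{\lambda Y}] - E[e^{\lambda Y}] \log E[e^{\lambda Y}]}
          {\lambda^2 E[e^{\lambda Y}]}
   = \frac{\operatorname{Ent}[e^{\lambda Y}]}{\lambda^2 E[e^{\lambda Y}]}, \\
\int_0^\lambda s\, G''(s)\, ds
  &= \big[s\, G'(s)\big]_0^\lambda - \int_0^\lambda G'(s)\, ds
   = \lambda G'(\lambda) - G(\lambda) + 1, \\
\frac{\operatorname{Ent}[e^{\lambda Y}]}{\lambda^2 E[e^{\lambda Y}]}
  &\le \frac{k\, E[\psi(\lambda \Delta_k)]}{\lambda^2}
   \le \frac{k}{\lambda^2} \cdot G''(\lambda) \frac{\lambda^2}{2}
   = \frac{k}{2}\, G''(\lambda).
\end{align*}
```

The first line uses $\Lambda'(\lambda) = E[Y e^{\lambda Y}] / E[e^{\lambda Y}]$, the second is an integration by parts with $G(0) = 1$, and the bound $\int_0^\lambda s\, G''(s)\, ds \le G''(\lambda) \lambda^2 / 2$ uses that $G''$ is non-decreasing on $[0, \infty)$.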
Proof (iii) solving the differential inequality
Integrating both sides, using the fact that
$$\lim_{\lambda \to 0} \frac{1}{\lambda} \log E e^{\lambda (X_{(k)} - EX_{(k)})} = 0,$$
leads to
$$\frac{1}{\lambda} \log E e^{\lambda (X_{(k)} - EX_{(k)})} \le \frac{k}{2} \big(G'(\lambda) - G'(0)\big) = \frac{k}{2}\, E\big[\Delta_k \big(e^{\lambda \Delta_k} - 1\big)\big].$$
Gaussian order statistics
Maxima of Gaussians
Lemma
For $n$ such that the solution $v_n$ of the equation
$$16/x + \log\big(1 + 2/x + 4 \log(4/x)\big) = \log(2n)$$
is smaller than 1, for all $0 \le \lambda < 1/\sqrt{v_n}$,
$$\log E e^{\lambda (X_{(1)} - EX_{(1)})} \le \frac{v_n \lambda^2}{2 (1 - \sqrt{v_n}\, \lambda)}.$$
For all $t > 0$,
$$P\big\{X_{(1)} - EX_{(1)} > \sqrt{v_n}\, (t + \sqrt{2t})\big\} \le e^{-t}.$$
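The variance factor $v_n$ is defined only implicitly, as the solution of $16/x + \log(1 + 2/x + 4\log(4/x)) = \log(2n)$, but the left-hand side is decreasing in $x$ on $(0, 1]$, so a bisection finds it. The sketch below is one possible way to do this; the function names, bracket and tolerance are illustrative choices.

```python
import math

def lhs(x):
    """Left-hand side 16/x + log(1 + 2/x + 4 log(4/x)) of the equation for v_n."""
    return 16 / x + math.log(1 + 2 / x + 4 * math.log(4 / x))

def solve_vn(n, tol=1e-12):
    """Return the solution v_n in (0, 1) of lhs(x) = log(2n), or None when the
    solution is not smaller than 1 (lhs is decreasing on (0, 1])."""
    target = math.log(2 * n)
    if lhs(1.0) >= target:
        return None
    lo, hi = 1e-9, 1.0  # lhs(lo) > target > lhs(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 10**9
v_n = solve_vn(n)
t = 3.0
print(v_n, math.sqrt(v_n) * (t + math.sqrt(2 * t)))  # v_n and the deviation at level exp(-t)
print(solve_vn(1000))  # None: the condition v_n < 1 fails for this n
```

Since the left-hand side exceeds 18 at $x = 1$, the condition $v_n < 1$ only holds for very large $n$ (tens of millions and beyond).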
Median of Gaussians ...
The same approach works for extreme, intermediate and central order statistics.
Lemma
Let $v_n = 8/(n \log 2)$. For all $0 \le \lambda < n/(2 \sqrt{v_n})$,
$$\log E e^{\lambda (X_{(n/2)} - EX_{(n/2)})} \le \frac{v_n \lambda^2}{2 \big(1 - 2 \lambda \sqrt{v_n / n}\big)}.$$
For all $t > 0$,
$$P\big\{X_{(n/2)} - EX_{(n/2)} > \sqrt{2 v_n t} + 2 \sqrt{v_n / n}\; t\big\} \le e^{-t}.$$
Caveat
Assessment
Rényi’s representation: order statistics are functions of sums of independent random variables (spacings of exponential samples).
If the function is concave, concavity may be used twice.
What about plugging in tail bounds for order statistics of exponential samples?
Ad hoc arguments
What can be obtained from Rényi’s representation and exponential inequalities for sums of Gamma-distributed random variables?
Lemma
Let $X_{(1)}$ be the maximum of the absolute values of $n$ independent standard Gaussian random variables, and let $\widetilde{U}(s) = \Phi^{\leftarrow}(1 - 1/(2s))$ for $s \ge 1$. For $t > 0$,
$$P\big\{X_{(1)} - EX_{(1)} \ge t/(3 \widetilde{U}(n)) + \sqrt{t}/\widetilde{U}(n) + \delta_n\big\} \le \exp(-t),$$
where $\delta_n > 0$ and $\lim_n (\widetilde{U}(n))^3 \delta_n = \frac{\pi^2}{12}$.
References
S. Boucheron and M. Thomas. Concentration inequalities for order statistics. http://arxiv.org/abs/1207.7209
P. Massart. Concentration inequalities and model selection. Lecture Notes in Mathematics 1896. Springer, 2006.
L. de Haan and A. Ferreira. Extreme value theory. Springer, 2006.