CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning
Rémi Coulom
November 2011, Advances in Computer Games 13
Introduction The CLOP Algorithm Experiments Conclusion
Problem Definition

Noisy black-box optimization: optimizing a game-playing program
- Heuristic parameters: evaluation, search, ...
- Observation: game outcomes (win or loss (or draw))
- Objective: maximize probability of winning

Two sub-problems:
1. Estimate the optimal x̃
2. Choose the next x

[Figure: win/loss samples over the parameter square (x1, x2) ∈ [-1, 1]^2]
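The black-box setting above can be sketched as follows. The quadratic win-probability surface, its peak location, and the function name are made-up stand-ins for a real engine match; the tuner sees only the win/loss outcome, never the underlying probability:

```python
import random

def play_game(x1, x2):
    """Simulate one game at parameter setting (x1, x2).

    Hypothetical stand-in for running a real engine match: the true
    win probability is hidden from the tuner; only the binary
    outcome is observed.
    """
    # Assumed true (hidden) win probability, peaked at (0.3, -0.2).
    p_win = 0.5 - 0.3 * ((x1 - 0.3) ** 2 + (x2 + 0.2) ** 2)
    return 1 if random.random() < p_win else 0  # 1 = win, 0 = loss

# Each call is one noisy black-box observation.
outcomes = [play_game(0.3, -0.2) for _ in range(100)]
```

Every observation costs one full game, which is why sample efficiency is the central concern.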
Presentation Outline
- The CLOP algorithm
- Experiments on artificial data
The CLOP Algorithm

CLOP: Method for Optimum Estimation

[Figure: probability of winning P vs. parameter value on [-1, 1]]
Step 1: Data

[Figure: 511 win/loss observations; P vs. parameter value]
Step 2: Quadratic Logistic Regression

[Figure: win/loss data with fitted regression curve; P vs. parameter value]
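The regression of Step 2 can be sketched as follows: a minimal, unweighted numpy version of quadratic logistic regression fitted by Newton's method on synthetic 1D win/loss data. The data-generating curve is invented for illustration, and CLOP's actual implementation uses sample weights and confidence machinery omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic win/loss data: true log-odds are quadratic, peaked at x = 0
# (assumed ground truth, for demonstration only).
x = rng.uniform(-1.0, 1.0, size=2000)
true_logit = -x * x
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Design matrix with quadratic features: q(x) = b0 + b1*x + b2*x^2.
X = np.column_stack([np.ones_like(x), x, x * x])

# Maximize the logistic log-likelihood by Newton's method.
b = np.zeros(3)
for _ in range(20):
    p = 1.0 / (1.0 + np.exp(-X @ b))              # predicted win probability
    grad = X.T @ (y - p)                          # gradient of log-likelihood
    hess = (X * (p * (1.0 - p))[:, None]).T @ X   # Fisher information (-Hessian)
    b += np.linalg.solve(hess, grad)              # Newton update

x_star = -b[1] / (2.0 * b[2])  # argmax of the fitted quadratic log-odds
```

When the fitted quadratic coefficient b2 is negative, the log-odds are concave and the maximizer -b1/(2*b2) estimates the optimal parameter value.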
Step 3: Lower Confidence Bound

[Figure: win/loss data, regression, mean, and LCB curves; P vs. parameter value]
Step 4: Discard Samples Below LCB = µ − Hσ

[Figure: win/loss data, regression, mean, and LCB; samples whose regression value falls below the LCB are discarded]
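Per the pseudocode on the extra slide at the end of the deck, the "discard" is soft: a sample at x gets weight e^((q(x)−µ)/(Hσ)), capped at 1 by taking the minimum with the previous weight functions, so points far below the LCB fade out rather than being deleted outright. A small sketch, where µ, σ, and the evaluation points are placeholder values for illustration:

```python
import math

def clop_weight(q_x, mu, sigma, H=3.0):
    """Soft-discard weight for a sample whose regression value is q_x.

    Samples with q_x >= mu keep full weight 1; below the mean the
    weight decays exponentially, reaching 1/e at the LCB mu - H*sigma.
    """
    return min(1.0, math.exp((q_x - mu) / (H * sigma)))

mu, sigma, H = 0.0, 0.1, 3.0
w_at_mean = clop_weight(mu, mu, sigma, H)             # full weight
w_at_lcb = clop_weight(mu - H * sigma, mu, sigma, H)  # 1/e of full weight
```

Larger H makes the decay gentler, discarding less aggressively; the deck's experiments later suggest H = 3 as a good default.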
Step 5: Re-compute Regression with Remaining Samples

[Figure: updated regression, mean, and LCB; P vs. parameter value]
Step 6: Iterate Until Convergence

[Figure: regression, mean, and LCB after convergence; P vs. parameter value]
Sampling Policy: Density = Weight

[Animation over successive frames with N = 1, 3, 7, 15, 31, 63, 127, 255, 511 samples: each frame plots the win/loss data, regression, mean, and LCB against parameter value; new samples accumulate where the weight function is high, concentrating around the optimum.]
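"Density = weight" means the next sample point is drawn with probability density proportional to the current weight function. Because the weight is bounded by 1, plain rejection sampling suffices; the concrete weight function below is a made-up example, not the one from the animation:

```python
import math
import random

def sample_from_weight(weight, lo=-1.0, hi=1.0, rng=random.Random(1)):
    """Draw x with density proportional to weight(x) on [lo, hi].

    Rejection sampling: valid because 0 < weight(x) <= 1 everywhere,
    so weight(x) itself serves as the acceptance probability.
    """
    while True:
        x = rng.uniform(lo, hi)
        if rng.random() < weight(x):
            return x

# Illustrative weight function concentrated near x = 0.
w = lambda x: min(1.0, math.exp(-(x * x) / 0.02))

xs = [sample_from_weight(w) for _ in range(1000)]
```

As the weight function narrows over iterations, the acceptance region shrinks and samples pile up near the estimated optimum, which is exactly the behaviour the animation shows.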
Experiments

Smooth 1D Function

[Figure: P vs. parameter on [-1, +1]]
[Figure: regret vs. number of games (log-log, 10 to 10^7) for CLOP, UCT, CEM]
Rosenbrock

[Figure: function over (x1, x2) ∈ [-1, 1]^2]
[Figure: regret vs. number of games (log-log, 10 to 10^7) for CLOP, UCT, CEM]
Angle

[Figure: P vs. parameter on [-1, +1]]
[Figure: regret vs. number of games (log-log, 10 to 10^7) for CLOP, UCT, CEM]
Discontinuous Function

[Figure: P vs. parameter on [-1, +1]]
[Figure: regret vs. number of games (log-log, 10 to 10^7) for CLOP, UCT, CEM]
Conclusion

Summary of CLOP
- Much faster black-box optimizer than the state of the art in games
- Foolproof: no tricky meta-parameters
- Popular freeware: http://remi.coulom.free.fr/CLOP/

Future Work
- High-dimensional problems: more regularization, sparsity
- Apply to less noisy or noiseless problems (BBOB)
- Apply the CLOP principle to other forms of regression
- Optimization from self-play
- Prove convergence
Extra Slide: Code

procedure QuadraticCLOP(H, x_1, y_1, ..., x_N, y_N)
    w_0 ← λx. 1                          ▷ a function of x that returns 1
    W_0 ← N
    k ← 0
    repeat
        w ← λx. min_{i=0..k} w_i(x)      ▷ weight function
        k ← k + 1
        q_k ← QuadraticLogisticRegression(w, x_1, y_1, ..., x_N, y_N)
        µ_k ← LogisticMean(w, x_1, y_1, ..., x_N, y_N)
        σ_k ← ConfidenceDeviation(w, x_1, y_1, ..., x_N, y_N)
        w_k ← λx. e^((q_k(x) − µ_k)/(H·σ_k))
        W_k ← Σ_{i=1..N} min(w(x_i), w_k(x_i))
    until W_k > 0.99 × W_{k−1}
    x_{N+1} ← Random(w)                  ▷ next sample, distributed like w
    x̃ ← Σ_{i=1..N+1} w(x_i)·x_i / Σ_{i=1..N+1} w(x_i)   ▷ estimated optimal
end procedure
Asymptotic Rate of Convergence (Intuitively)

[Figure: quadratic regression restricted to a window of width δ; the statistical noise of the fit scales as O(N^(-1/2))]

- Bias only: regret = O(δ^4)
- Variance only: regret = O(N^(-1) δ^(-2))
- Optimal asymptotic bias-variance tradeoff: regret = O(N^(-2/3))
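Balancing the bias and variance regret terms above yields the stated rate; a short derivation, with c1 and c2 standing in for unspecified constants:

```latex
% Regret as a function of window size \delta and sample count N:
% bias term O(\delta^4), variance term O(N^{-1}\delta^{-2}).
R(\delta) = c_1\,\delta^4 + c_2\,N^{-1}\delta^{-2}
\quad\Longrightarrow\quad
R'(\delta) = 4c_1\,\delta^3 - 2c_2\,N^{-1}\delta^{-3} = 0
\quad\Longrightarrow\quad
\delta^6 = \frac{c_2}{2c_1}\,N^{-1},
\ \text{so}\ \delta \propto N^{-1/6}
\ \text{and}\ R(\delta) = O(\delta^4) = O(N^{-2/3}).
```

This N^(1/6) scaling of the window is presumably what motivates the adaptive H = 0.8N^(1/6) variant that appears in the meta-parameter experiments later in the deck.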
Effect of Meta-Parameter H
(TODO: name of axes)

[Figure: log-log curves, 10 to 10^7 on the x-axis and 10^(-1) down to 10^(-5) on the y-axis, for H = 12, 10, 8, 6, 4, 3, 2, 1, and H = 0.8N^(1/6)]

Conclusion (of many other experiments): H = 3 works well in practice.
Extra Slide: Many Algorithms

[Figure: regret vs. number of games (log-log, 10 to 10^7, regret 10^0 down to 10^(-6)) for Quadratic CLOP (H = 3), Quadratic CLOP (H = 0.8N^(1/6)), Cubic CLOP (H = 0.8N^(1/4)), RSPSA, SPSA*, CEM (Chaslot et al.), CEM (Hu & Hu), and UCT]
Extra Slide: 1D Problems

[Figure: panels (a) Log, (b) Flat, (c) Power, (d) Angle, (e) Step]
Effect of Meta-Parameter H on Power

[Figure: log-log curves, 10 to 10^7 on the x-axis and 10^(-1) down to 10^(-5) on the y-axis, for H = 12, 10, 8, 6, 4, 3, 2, 1]
Extra Slide: 2D Problems

[Figure: panels (f) Rosenbrock, (g) Correlated, (h) Log2]
Extra Slide: Performance on Many Problems

[Figure: panels (i) Log, (j) Log2, (k) Log5, (l) Flat, (m) Rosenbrock, (n) Rosenbrock2, (o) Rosenbrock5, (p) Power, (q) Correlated, (r) Correlated2, (s) Angle, (t) Step; curves for Quadratic CLOP (H = 3), UCT, and CEM (Hu & Hu)]