The Monte-Carlo Revolution in Go - Remi Coulom

The Monte-Carlo Revolution in Go
Rémi Coulom
Université Charles de Gaulle, INRIA, CNRS, Lille, France

January 2009
JFFoS’2008: Japanese-French Frontiers of Science Symposium

Introduction Monte-Carlo Tree Search History Conclusion

Game Complexity How can we deal with complexity ?

Game Complexity

Game          Complexity*   Status
Tic-tac-toe   10^3          Solved manually
Connect 4     10^14         Solved in 1988
Checkers      10^20         Solved in 2007
Chess         10^50         Programs > best humans
Go            10^171        Programs ≪ best humans

* Complexity: number of board configurations



How can we deal with complexity?

Some formal methods
  - Use symmetries
  - Use transpositions
  - Combinatorial game theory

When formal methods fail
  - Approximate evaluation
  - Reasoning with uncertainty


Dealing with Huge Trees

[Figure: game-tree diagrams carrying the labels below; E marks leaf positions scored by the evaluation function]

  - Full tree
  - Classical approach = depth limit + position evaluation (E) (chess, shogi, ...)
  - Monte-Carlo approach = random playouts


Principle of Monte-Carlo Evaluation Monte-Carlo Tree Search Patterns

A Random Playout


Principle of Monte-Carlo Evaluation

[Figure with three panels: Root Position, Random Playouts, MC Evaluation. The outcomes of the random playouts are added together to give the MC evaluation.]
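The idea of evaluating a position by averaging random playouts can be sketched in a few lines. This is only an illustration under stated assumptions: instead of Go it uses a toy game (players alternately take 1 to 3 stones, and whoever takes the last stone wins), and the names `random_playout` and `mc_evaluate` are invented for this example.

```python
import random

def random_playout(stones, rng):
    """Play uniformly random moves until the game ends.

    Toy game (an assumption, not Go): players alternately take 1 to 3
    stones, and whoever takes the last stone wins. Returns 1 if the
    player to move at the start wins, else 0. Requires stones >= 1.
    """
    starter_to_move = True
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if starter_to_move else 0
        starter_to_move = not starter_to_move

def mc_evaluate(stones, n_playouts, rng):
    """Monte-Carlo evaluation: the mean outcome of n random playouts."""
    total = sum(random_playout(stones, rng) for _ in range(n_playouts))
    return total / n_playouts

rng = random.Random(0)
# Estimated win rate for the player to move with 5 stones left.
print(mc_evaluate(5, 10_000, rng))
```

The evaluation needs no knowledge beyond the rules of the game, which is what makes the approach attractive when a good evaluation function is hard to write.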


Basic Monte-Carlo Move Selection

Algorithm
  - N playouts for every move
  - Pick the best winning rate
  - 5,000 playouts/s on 19x19

Problems
  - Evaluation may be wrong
  - For instance, if all moves lose immediately, except one that wins immediately.

[Figure: root position with three moves, with playout results 9/10, 3/10, and 4/10]
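A minimal sketch of this basic selection rule, assuming the same kind of toy game as a stand-in for Go (take 1 to 3 stones, the last stone wins). The function names and the choice to score an immediately winning move as 1.0 are assumptions made for the example.

```python
import random

def playout_winner(stones, to_move, rng):
    """Random playout of the toy game; returns the player (0 or 1) who
    takes the last stone. Requires stones >= 1."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return to_move
        to_move = 1 - to_move

def flat_mc_select(stones, n_playouts, rng):
    """Run N playouts for every move and pick the best winning rate.

    Returns (best_move, win_rates), from the point of view of player 0,
    who is to move.
    """
    me = 0
    win_rates = {}
    for move in range(1, min(3, stones) + 1):
        remaining = stones - move
        if remaining == 0:
            win_rates[move] = 1.0  # taking the last stone wins at once
            continue
        wins = sum(playout_winner(remaining, 1 - me, rng) == me
                   for _ in range(n_playouts))
        win_rates[move] = wins / n_playouts
    best = max(win_rates, key=win_rates.get)
    return best, win_rates

rng = random.Random(0)
best, rates = flat_mc_select(3, 1000, rng)
print(best, rates)
```

With 3 stones, taking 1 leaves the opponent 2 stones. A correct opponent takes both and wins, yet the random opponent does so only half the time, so the playout win rate for that move comes out near 0.5. This is the kind of wrong evaluation listed under Problems.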


Monte-Carlo Tree Search

Principle
  - More playouts to best moves
  - Apply recursively
  - Under some simple conditions: proven convergence to optimal move when #playouts → ∞

[Figure: root position with three moves, with playout results 9/15, 2/6, and 3/9]
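The slide does not name a rule for sending more playouts to the best moves. The sketch below uses UCB1 selection, as in the UCT algorithm, which is one common choice and an assumption here. The toy game (take 1 to 3 stones, the last stone wins), the `Node` class, and the exploration constant 1.4 are likewise illustrative.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player to move here
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.untried = list(range(1, min(3, stones) + 1))
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def ucb1(child, parent_visits, c=1.4):
    """Win rate plus an exploration bonus for rarely visited children."""
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def random_playout(stones, rng):
    """Return 1 if the player to move wins a uniformly random playout."""
    mover_started = True
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if mover_started else 0
        mover_started = not mover_started

def mcts_best_move(stones, n_playouts, rng):
    root = Node(stones)
    for _ in range(n_playouts):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children,
                       key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one new child if possible.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # Simulation: result is from the view of the player who just moved.
        if node.stones == 0:
            result = 1.0          # that player took the last stone
        else:
            result = 1.0 - random_playout(node.stones, rng)
        # Backpropagation: flip the result at each level going up.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # Play the most visited move.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts_best_move(5, 2000, random.Random(0)))
```

Because the same selection rule is applied at every level of the tree, the opponent's replies are searched selectively too, which addresses the weakness of the flat one-level evaluation.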


Incorporating Domain Knowledge with Patterns

Patterns
  - Library of local shapes
  - Automatically generated
  - Used for playouts
  - Cut branches in the tree

Examples (out of ∼30k)

[Figure: example patterns labeled Good and Bad, with the side to move marked]
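One way patterns could serve both purposes is to attach a weight to each local shape, sample playout moves in proportion to that weight, and drop moves whose weight falls below a threshold. The slide does not say how the pattern library is applied, so this scheme, the pattern names, and every number below are hypothetical.

```python
import random

# Hypothetical pattern weights: higher means the shape is considered better.
PATTERN_WEIGHTS = {"good_shape": 9.0, "neutral": 1.0, "bad_shape": 0.1}

def pattern_weight(name):
    """Weight of a pattern; unknown patterns get the neutral weight."""
    return PATTERN_WEIGHTS.get(name, PATTERN_WEIGHTS["neutral"])

def choose_playout_move(candidates, rng):
    """Pick one move from (move, pattern_name) pairs.

    Moves are sampled with probability proportional to pattern weight,
    so playouts stay random but favor good shapes.
    """
    moves = [move for move, _ in candidates]
    weights = [pattern_weight(name) for _, name in candidates]
    return rng.choices(moves, weights=weights, k=1)[0]

def prune_moves(candidates, min_weight=0.5):
    """Cut branches: keep only moves whose pattern weight is high enough."""
    return [move for move, name in candidates
            if pattern_weight(name) >= min_weight]

candidates = [("A", "good_shape"), ("B", "bad_shape"), ("C", "neutral")]
rng = random.Random(0)
print(choose_playout_move(candidates, rng))
print(prune_moves(candidates))
```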


History (1/2)

Pioneers
  - 1993: Brügmann: first MC program, not taken seriously
  - 2000: The Paris School: Bouzy, Cazenave, Helmstetter

Victories against classical programs
  - 2006: Crazy Stone (Coulom) wins 9 × 9 Computer Olympiad
  - 2007: MoGo (Wang, Gelly, Munos, ...) wins 19 × 19


History (2/2)

Victories against professional players
  - 2008-03: MoGo beats Catalin Taranu (5p) on 9 × 9
  - 2008-08: MoGo beats Kim Myungwan (9p) at H9
  - 2008-09: Crazy Stone beats Kaori Aoba (4p) at H8
  - 2008-12: Crazy Stone beats Kaori Aoba (4p) at H7


Conclusion

Summary of Monte-Carlo Tree Search
  - A major breakthrough for computer Go
  - Works for similar games (Hex, Amazons) and automated planning

Perspectives
  - Path to top-level human Go?
  - Adaptive playouts (far from the root)?

More information: http://remi.coulom.free.fr/CrazyStone/
  - Slides, papers, and game records
  - Demo version of Crazy Stone (soon)