Monte-Carlo Tree Search in Crazy Stone
Rémi Coulom, Université Charles de Gaulle, INRIA, CNRS, Lille, France
November 8th-10th, 2007, UEC and 12th Game Programming Workshop, Japan
Talk Outline
1. Introduction
2. Crazy Stone’s Algorithm
   - Principles of Monte-Carlo Evaluation
   - Tree Search
   - Patterns
3. Playing Style
4. Conclusion
A New Approach to Go
The Challenge of Go
  - strongest programs weaker than amateur humans
Difficulty of Position Evaluation
  - has to be dynamic, unlike quiescence search + static evaluation of western chess
  - local search lacks global understanding
The Monte-Carlo Approach
  - random playouts
  - dynamic evaluation with global understanding
The Monte-Carlo Revolution: Pioneers
1993: Bernd Brügmann (Gobble)
  - Not considered seriously
2000-2005: The Paris School
  - Bernard Helmstetter (Oleg)
  - Tristan Cazenave (Golois)
  - Bruno Bouzy (Indigo)
  - Guillaume Chaslot (Mango), joined in 2005
The Monte-Carlo Revolution: Success
2006: Success on small boards
  - Crazy Stone wins 9 × 9 Computer Olympiad
  - Viking (Magnus Persson), then Crazy Stone, then MoGo (Yizao Wang and Sylvain Gelly) lead 9 × 9 CGOS
2007: Success on all boards
  - MoGo wins 19 × 19 Computer Olympiad
  - Steenvreter (Erik van der Werf) wins 9 × 9
  - Crazy Stone beats KCC Igo with a score of 15-4 on 19 × 19
Principle: Random Playouts
One Playout
  - Play at random
  - Don’t fill up eyes
Position Evaluation
  - Run many playouts
  - Average them
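The playout principle can be illustrated with a minimal sketch. Go is too large for a short example, so this sketch assumes a toy subtraction game (take 1 to 3 stones, whoever takes the last stone wins); the game, the function names and the parameters are illustrative assumptions, not Crazy Stone's code. The Go-specific rule about not filling eyes has no counterpart in the toy game.

```python
import random

def random_playout(stones, rng):
    """Play uniformly random moves until the pile is empty (requires stones >= 1).

    Returns 1 if the player to move at the start takes the last stone, else 0.
    """
    player = 0  # 0 = the player to move in the evaluated position
    while True:
        take = rng.randint(1, min(3, stones))  # random legal move
        stones -= take
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player

def evaluate(stones, n_playouts, rng):
    """Position evaluation: run many playouts and average their outcomes."""
    wins = sum(random_playout(stones, rng) for _ in range(n_playouts))
    return wins / n_playouts

rng = random.Random(0)
print(evaluate(5, 1000, rng))  # estimated win rate for the player to move
```

The average is an estimate of the win rate under random play, not the game-theoretic value of the position.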
Move-Selection Method
Algorithm
  - N playouts for every move
  - pick the best winning rate
Cost
  - accuracy scales like $1/\sqrt{N}$
  - 0.01 precision requires ∼ 10,000 playouts
(Figure: three candidate moves with playout results 9/10, 3/10 and 4/10.)
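The cost figure follows from the $1/\sqrt{N}$ rule of thumb, which the short sketch below works through. The function names are illustrative; `standard_error` uses the usual binomial formula $\sqrt{p(1-p)/N}$, which is at most $1/(2\sqrt{N})$ and therefore of the same order as the rule on the slide.

```python
import math

def playouts_needed(precision):
    """Playouts N such that 1/sqrt(N) equals the requested precision."""
    return round(1.0 / precision ** 2)

def standard_error(p, n):
    """Standard error of a win rate p estimated from n independent playouts."""
    return math.sqrt(p * (1.0 - p) / n)

print(playouts_needed(0.01))        # 10000
print(standard_error(0.5, 10000))   # worst case (p = 0.5) at N = 10,000
```

Since the method spends N playouts on every candidate move, the total cost is N times the number of legal moves.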
Efficient Playout Allocation
Idea
  - more playouts to best moves
UCB: Upper Confidence Bound

  $$\mathrm{UCB}_i = \frac{W_i}{N_i} + c \sqrt{\frac{\log t}{N_i}}$$

  - $W_i$: wins (move i)
  - $N_i$: playouts (move i)
  - $c$: exploration parameter
  - $t$: playouts (all moves)
(Figure: three candidate moves with playout results 14/15, 2/6 and 4/9.)
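As a minimal sketch of how the UCB formula picks the next move to sample, the code below applies it to the counts shown in the figure (14/15, 2/6, 4/9). The values of `c` are assumptions chosen for illustration, since the slide gives no value, and the sketch assumes every move already has at least one playout.

```python
import math

def ucb(wins, playouts, total, c):
    """UCB_i = W_i / N_i + c * sqrt(log(t) / N_i)."""
    return wins / playouts + c * math.sqrt(math.log(total) / playouts)

def select_move(stats, c=1.0):
    """stats: list of (wins, playouts) per move; returns the index with the highest UCB."""
    total = sum(n for _, n in stats)  # t: playouts over all moves
    return max(range(len(stats)),
               key=lambda i: ucb(stats[i][0], stats[i][1], total, c))

stats = [(14, 15), (2, 6), (4, 9)]
print(select_move(stats, c=1.0))  # 0: the move with the best win rate is sampled again
print(select_move(stats, c=3.0))  # 1: a large c favours the least-sampled move
```

A small `c` concentrates playouts on the current best move, while a large `c` spreads them toward moves with few playouts.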
Recursive Tree Search: UCT
  - Apply UCB to every position visited more than $N_0$ times
  - No min-max backup: back up the average outcome
  - Proved convergence to min-max value
  - Best-first tree growth
(Figure: search tree whose root moves have playout results 9/15, 2/6 and 3/9.)
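The recursive scheme can be sketched as follows. This is a toy illustration under stated assumptions, not Crazy Stone's implementation: the game is a subtraction game (take 1 to 3 stones, last stone wins), `N0` and `C` are hypothetical values, and the parent's visit count stands in for t.

```python
import math
import random

N0 = 5    # hypothetical threshold: UCB is applied once a node has more than N0 visits
C = 1.0   # hypothetical exploration parameter

class Node:
    def __init__(self):
        self.visits = 0
        self.wins = 0.0     # wins for the player who moved INTO this node
        self.children = {}  # move -> Node

def legal_moves(stones):
    return range(1, min(3, stones) + 1)

def playout(stones, rng):
    """Random playout; returns 1 if the player to move wins (takes the last stone)."""
    result = 1
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return result
        result = 1 - result
    return 0  # empty pile: the player to move has already lost

def simulate(node, stones, rng):
    """One simulation from node; returns 1 if the player to move here wins, else 0."""
    if stones == 0:
        result = 0                     # the previous player took the last stone
    elif node.visits <= N0:
        result = playout(stones, rng)  # rarely visited position: plain random playout
    else:
        for m in legal_moves(stones):  # create children the first time UCB is applied
            node.children.setdefault(m, Node())

        def score(m):
            child = node.children[m]
            if child.visits == 0:
                return float("inf")    # try every move once
            return (child.wins / child.visits
                    + C * math.sqrt(math.log(node.visits) / child.visits))

        move = max(legal_moves(stones), key=score)
        result = 1 - simulate(node.children[move], stones - move, rng)
    node.visits += 1
    node.wins += 1 - result  # average outcome is backed up, no min-max
    return result

def best_move(stones, n_simulations, seed=0):
    """Most-visited root move; n_simulations must exceed N0 + 1 so the root gets children."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_simulations):
        simulate(root, stones, rng)
    return max(root.children, key=lambda m: root.children[m].visits)

print(best_move(5, 5000))  # taking 1 stone leaves 4, a lost position for the opponent
```

Because UCB sends most simulations to the best-looking move, the tree grows deepest along promising lines, and the averaged values drift toward the min-max value as the visit counts grow.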
Efficiency of Tree Search
Successes
  - gold in Turin Olympiad on 9 × 9
  - 9 × 9 level on KGS: about 10k
  - strength scales with thinking time
  - only domain knowledge: don’t fill eyes, and in atari, extend
Limits
  - Not deep enough, even on 9 × 9
  - Too many moves on 19 × 19
  - 19 × 19 level on KGS: about 30k
Patterns
  - learnt from human games
  - Combine several features:
      - shape (surrounding stones)
      - distance to previous move
      - capture, extension
      - ...
  - Probability distribution over moves
  - Used in playouts
(Figure: move probabilities, with a legend ranging from high probability to low probability.)
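A minimal sketch of how combined features can yield a probability distribution over moves for use in playouts. The feature names and weights are hypothetical (the slide says such values are learnt from human games), and multiplying the weights of a move's features is an assumption made here for illustration, not something the slide states.

```python
import random

# Hypothetical feature weights; real values would be learnt from human games.
WEIGHTS = {"capture": 8.0, "extension": 4.0, "near_previous_move": 2.0}

def move_weight(features):
    """Combine a move's features multiplicatively (an illustrative assumption)."""
    w = 1.0
    for f in features:
        w *= WEIGHTS.get(f, 1.0)  # unknown features are neutral
    return w

def move_distribution(candidates):
    """candidates: dict move -> list of feature names. Returns dict move -> probability."""
    weights = {m: move_weight(fs) for m, fs in candidates.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def sample_move(candidates, rng):
    """Draw one playout move according to the pattern-based distribution."""
    dist = move_distribution(candidates)
    moves = list(dist)
    return rng.choices(moves, weights=[dist[m] for m in moves], k=1)[0]

candidates = {"A": ["capture"], "B": [], "C": ["near_previous_move"]}
print(move_distribution(candidates))  # A gets 8/11, B 1/11, C 2/11
print(sample_move(candidates, random.Random(0)))
```

Replacing the uniform move choice of a plain playout with `sample_move` keeps the playout random while biasing it toward moves that look plausible.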
Random playout with patterns
Comparison 1
(Figure: two panels, "no patterns" and "patterns".)
Comparison 2
(Figure: two panels, "no patterns" and "patterns".)
Progressive Widening
  - Sort moves with patterns
  - Keep best moves only
  - Progressively add more
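The three steps above can be sketched as follows. The widening schedule is a hypothetical choice for illustration (one more move each time the visit count roughly doubles); the slide does not say which schedule Crazy Stone uses, and the move names and scores are made up.

```python
def sort_by_pattern(moves, pattern_score):
    """Sort moves from most to least promising according to a pattern score."""
    return sorted(moves, key=pattern_score, reverse=True)

def width(visits):
    """Hypothetical schedule: 1 + floor(log2(visits + 1)) moves are considered."""
    return (visits + 1).bit_length()

def candidate_moves(sorted_moves, visits):
    """Keep only the best moves; more are added as the node is visited more often."""
    return sorted_moves[:width(visits)]

# Made-up pattern scores for five moves.
scores = {"a": 0.05, "b": 0.40, "c": 0.10, "d": 0.30, "e": 0.15}
ordered = sort_by_pattern(scores, scores.get)
print(candidate_moves(ordered, 0))   # ['b']
print(candidate_moves(ordered, 3))   # ['b', 'd', 'e']
print(candidate_moves(ordered, 100)) # all five moves
```

In a tree search, the move-selection step would then be restricted to `candidate_moves` for the current node instead of all legal moves.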
Playing Strength
  - Stronger than classical programs on 19 × 19
  - Ranked 2k on KGS
Crazy Fuseki
(Game diagram; players: MoGo, Crazy Stone.)
Play in the Center
(Game diagram; players: GNU Go, Crazy Stone.)
Win by 0.5, Lose by a lot
(Game diagram; players: Crazy Stone, Jimmy.)
Speculative Attacks: Provoke Opponent Blunder
(Game diagram; players: Go Intellect, Crazy Stone.)
Speculative Attacks: Another Tricky Move
(Game diagram; players: Miel (human), Crazy Stone.)
Ugly Blunder
(Game diagram; players: Crazy Stone, Human.)
Future of Monte-Carlo Search
Improving Crazy Stone further
  - More knowledge: playouts + progressive widening
  - Adaptive playouts
Adaptive playouts
(Figure labels: adaptive UCB policy, static playout policy.)
Interesting ideas in RLGO (David Silver)
Future of Monte-Carlo Search
Application to Other Domains
  - Other games (Hex, Clobber)
  - Automated book learning (for chess?)
  - Automated Planning in general
If You Wish to Know More
http://remi.coulom.free.fr/Hakone2007/
  - Download these slides
  - Download papers
  - Connect to KGS and play against Crazy Stone