Two Plus Two Poker Forums Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

12-29-2017, 11:55 PM   #126
ChrisV
Carpal 'Tunnel

Join Date: Jul 2004
Posts: 35,455
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Nah that's off base in a number of ways. The process is:

- It seeds its evaluation function with random weightings. This will produce random evaluations of positions and hence random play.
- In each position, it does a Monte-Carlo tree search, using its evaluation function to evaluate future positions.
- When the game ends, if it lost, it alters its weightings so that its evaluation function evaluates the positions from that game more negatively, and the opposite if it won.
- The evaluation function is now very slightly better than random. It plays another game.
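To the best of my understanding, the loop above can be sketched in a few lines of Python. This is purely a toy illustration - a linear evaluation over made-up features standing in for the deep network, not AlphaZero's actual code:

```python
import random

# Toy sketch of the self-play learning loop described above. A linear
# evaluation over invented features stands in for the deep network; the
# update nudges the evaluation of each visited position toward the
# game's final outcome (+1 win, -1 loss).

def evaluate(weights, features):
    """Linear stand-in for the neural evaluation function."""
    return sum(w * f for w, f in zip(weights, features))

def update(weights, visited_positions, outcome, lr=0.01):
    """Shift evaluations of visited positions toward the outcome."""
    new_weights = list(weights)
    for features in visited_positions:
        error = outcome - evaluate(new_weights, features)
        for i, f in enumerate(features):
            new_weights[i] += lr * error * f  # gradient step on squared error
    return new_weights

# Seed with random weightings -> random evaluations and random play.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(3)]

# Pretend one self-play game visited these (feature-encoded) positions
# and was lost; their evaluations drift slightly toward -1.
game_positions = [[1.0, 0.2, -0.5], [0.8, 0.1, -0.3]]
weights = update(weights, game_positions, outcome=-1)
```

After many such games the weights stop being random, which is all "learning" means here.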

It doesn't learn heuristic rules in the way you suggest; it simply improves its evaluation function gradually over time. Nor does it either apply rules or choose a random move: it applies its evaluation function every time, and over time its play gradually becomes less random. It's likely that material advantage is one of the first things it learns to value, because across the set of all possible chess positions, that's one of the most obvious markers of a winning position. But as I say, it's not a heuristic; it's inherent in the weightings of the network. That's difficult to explain conceptually rather than mathematically. The only way to figure out what it is valuing is to change positions slightly and see how its evaluation changes. If you gave it a whole pile of quiet positions, removed a knight from one side and a bishop from the other, and asked it to re-evaluate, you could figure out what it thinks those pieces are worth relative to each other on average - but it's only an average.
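The knight-vs-bishop probe might look like this in code - purely illustrative, with a crude material count standing in for the trained network (whose weights you can't read off directly), and positions reduced to bags of piece letters:

```python
# Sketch of the probing procedure described above: perturb many quiet
# positions and average how the evaluation shifts. "net_eval" is a
# stand-in for the trained network - here just a material count, since
# the real network's valuations aren't directly inspectable.

PIECE_VALUES = {"P": 1.0, "N": 3.1, "B": 3.3, "R": 5.0, "Q": 9.0}

def net_eval(position):
    """Stand-in evaluation: material balance from White's point of view.
    Uppercase letters are White pieces, lowercase are Black."""
    score = 0.0
    for piece in position:
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score

def average_shift(positions, remove_white, remove_black):
    """Average evaluation change after removing one piece per side."""
    shifts = []
    for pos in positions:
        modified = list(pos)
        modified.remove(remove_white)           # e.g. White knight
        modified.remove(remove_black.lower())   # e.g. Black bishop
        shifts.append(net_eval(modified) - net_eval(pos))
    return sum(shifts) / len(shifts)

# Two toy "quiet positions" as bags of pieces.
quiet_positions = [["N", "B", "n", "b", "P", "p"], ["N", "n", "b", "R", "r"]]
# A positive average shift means the removed Black piece was worth more
# than the removed White one under this evaluation.
print(average_shift(quiet_positions, "N", "B"))
```

With a real network you'd feed in thousands of positions and the answer would only ever be an average, as the post says.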

I think e4 and d4 would be preferred quite early because the number of possible moves your pieces have is an easy metric and e4 and d4 open up the queen and bishop.

Quote:
 It also teaches itself how to prune searches. It looks 20 moves ahead each move, which means looking 1 move, 2 moves etc. ahead first. It knows that sometimes "this move looks really ****ty if you look 3 moves ahead, but it's the best move looking 20 moves ahead". But it also knows *under what conditions such sacrifice type plays tend to occur*. So it teaches itself when to consider sacrificing a piece and when not to, and it's much better at recognising this (and pruning) than current engines.
None of this makes any sense. All A0 ever changes is its evaluation function. It has no capacity to change anything about the way it does its tree search. I read the paper and have a better understanding of this now:

Quote:
 Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilises a deep neural network (p, v) = f_θ(s) with parameters θ. This neural network takes the board position s as an input and outputs a vector of move probabilities p with components p_a = Pr(a|s) for each action a, and a scalar value v estimating the expected outcome z from position s, v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search. Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network f_θ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
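As a minimal sketch, the interface in that quote is just a function from a board state to (move probabilities, value). The body here is a random placeholder, not a real network - which also matches the "random weightings" starting point from earlier in the thread:

```python
import random

# Minimal sketch of the (p, v) = f_theta(s) interface from the quote.
# The body is a placeholder: random move probabilities and a random
# value estimate, i.e. what an untrained network effectively produces.

def f_theta(state, legal_moves):
    """Return (p, v): a probability per legal move, and a value in [-1, 1]."""
    raw = [random.random() for _ in legal_moves]
    total = sum(raw)
    p = {move: r / total for move, r in zip(legal_moves, raw)}  # sums to 1
    v = random.uniform(-1, 1)  # estimate of expected outcome, v ~ E[z|s]
    return p, v

# Hypothetical call on the starting position with three candidate moves.
p, v = f_theta("startpos", ["e4", "d4", "Nf3"])
```

Training is then just nudging this function's parameters so p and v stop being random.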
The following all concerns actual play, not learning:

Here's a super contrived example to try to explain this to the best of my understanding. Let's say you have a position A with only two legal moves, 1 and 2. AlphaZero's initial output is a scalar evaluation, let's say +0.5, and a vector of move probabilities, say { 0.75, 0.25 }. But let's say that move 2 leads to a reasonably easily found forced mate. So initially, when it does Monte Carlo rollouts, it's selecting move 1 75% of the time and move 2 25% of the time, but it keeps track of visit counts as well. So if it randomly picks move 1 ten times in a row, move 2's 25% selection probability gets boosted, making it more and more likely to be chosen. The exact details of the move selection algorithm aren't in the preprint, but will presumably be in the full paper.

Each time it selects a move, it plays out a full game from that point, using the same process recursively. The wins and losses in these games form the basis of a score which it gives to that move - what they call "value" in the quote above. Over time, it will notice that move 2 is actually getting a really great value. This then becomes a basis to choose it more often during the Monte Carlo rollouts, basically to investigate more thoroughly and see if the value is legit. (That's what the quote means when it says the three criteria for what moves are selected during MCTS are "low visit count, high move probability and high value" - exactly how these criteria are balanced against each other is unclear). In turn that leads to position A's value rapidly improving. You can see how if position A was actually down the search tree a bit, the good news about the forced win in move 2 would propagate up the tree, gradually making it more and more likely that the sequence of moves leading to it will be selected. So basically AlphaZero makes an initial evaluation of what moves are a good idea, that initial guess guides a series of self-play games, and its opinion of what moves are a good idea is modified based on the results of those games. Those modifications gradually propagate from the leaves of the tree back to the root.
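Since the preprint doesn't spell out the selection formula, here's a sketch using the PUCT rule from the earlier AlphaGo Zero paper, which balances exactly those three criteria (value, prior probability, visit count); the constant c_puct is a guess:

```python
import math

# PUCT-style selection: score each move as Q + U, where Q is its average
# value so far and U is an exploration bonus that grows with the move's
# prior probability and shrinks as its visit count rises. This is a
# plausible sketch, not AlphaZero's confirmed formula or constants.

def select_move(priors, values, visits, c_puct=1.5):
    """Return the move maximizing Q + U at this node."""
    total = sum(visits.values())
    best_move, best_score = None, -float("inf")
    for move in priors:
        q = values[move]  # average value of rollouts through this move
        u = c_puct * priors[move] * math.sqrt(total + 1) / (1 + visits[move])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# Position A from the example: move 1 has prior 0.75, move 2 prior 0.25.
priors = {1: 0.75, 2: 0.25}
values = {1: 0.1, 2: 0.0}
visits = {1: 10, 2: 0}
# After move 1 has soaked up 10 visits, the exploration term makes the
# unvisited move 2 attractive despite its lower prior.
print(select_move(priors, values, visits))
```

If move 2's rollouts then reveal the forced mate, its value rises and it keeps getting selected - which is the "good news propagating up the tree" effect.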

Last edited by ChrisV; 12-30-2017 at 12:09 AM.

01-01-2018, 08:58 AM   #127
Yeti
Abominable

Join Date: Jun 2004
Posts: 22,698
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

AlphaGo doc was just added to Netflix
01-01-2018, 10:36 AM   #128
ChrisV
Carpal 'Tunnel

Join Date: Jul 2004
Location: Adelaide, Australia
Posts: 35,455
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Thanks, will definitely watch, looks interesting. Well reviewed.
01-01-2018, 10:51 AM   #129
ChrisV
Carpal 'Tunnel

Join Date: Jul 2004
Posts: 35,455
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

lol, I looked at one of the reviews on RT. This is the opening couple of sentences:

Quote:
 The game of go is over 3,000 years old. In all that time, it has never been 'solved' as chess has; there are some preferred starting strategies but no optimal overall strategy has ever been developed.
Not off to a good start.

01-01-2018, 12:17 PM   #130
daveopie
old hand

Join Date: Jul 2009
Posts: 1,426
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Quote:
 Originally Posted by Yeti AlphaGo doc was just added to Netflix
Thanks. I didn't know about it, and just watched it. I enjoyed it. There wasn't much in there about the computer science used or specific Go strategy - instead it focused on the human aspect, such as the drama of the outcome and the feelings during the matches of the two men who played against AlphaGo. I guess that's to be expected from a Netflix show, and I thought it was well done.

01-01-2018, 01:45 PM   #131
well named
poorly undertitled

Join Date: Jun 2007
Location: esse est coesse
Posts: 73,640
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Quote:
 Originally Posted by ChrisV No, it starts its evaluation function with random weights, which will produce random play, but the evaluation function is updated as it learns.
Yeah, that sounds right. Thanks for the correction. I need to go back and read the article(s) Google published that I only skimmed before. Still, I'm having trouble with the idea of it getting stuck in some local part of the search space for bridge but not for chess. But at this point it's probably up to me to go read all that again, rather than on you to convince me.

Quote:
 Originally Posted by ChrisV In chess, good moves are good moves even if play thereafter is suboptimal.
Sure, but part of my point was that it doesn't know whether a move is good until the end of the game, as far as closing the loop on training feedback. Obviously the point is to develop the evaluation function which then does have an opinion on moves in isolation, but during training the only feedback is the end of the game.

Quote:
 Originally Posted by ChrisV In bridge bidding, 1S is a better opening bid with a strong hand and spades than 4S, but only with highly specific followups.
It sounds to me like you're saying that a particular opening is good, but only in some variations? But it looked to me like it was able, over time, to work some of that out in training.

Quote:
 Originally Posted by ChrisV By the "easy path" I mean that neural networks will select for whatever gives the largest initial gains, rather than routes that will ultimately produce the best outcomes.
And yet this would seem to imply a fairly large problem for it to learn to play chess well also, but it seems to do much better in deeper or more closed positions like this than other engines do? That was my impression based on a couple of the annotated games I saw.

Anyway, it's all cool stuff. Sorry if I'm just being annoying out of ignorance. When I have more time I'll try to educate myself a little more, especially on bidding in bridge.

01-01-2018, 02:23 PM   #132
Louis Cyphre
Carpal 'Tunnel

Join Date: Jun 2006
Posts: 9,828
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Quote:
 Originally Posted by well named Sure, but part of my point was that it doesn't know whether a move is good until the end of the game, as far as closing the loop on training feedback. Obviously the point is to develop the evaluation function which then does have an opinion on moves in isolation, but during training the only feedback is the end of the game.
Is it possible that A0 can do things like "if I make move x when I see the current pattern, it often leads to pattern y. Pattern y is correlated with winning, therefore I should make move x"?

01-01-2018, 02:55 PM   #133
well named
poorly undertitled

Join Date: Jun 2007
Location: esse est coesse
Posts: 73,640
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Quote:
 Originally Posted by Louis Cyphre Is it possible that A0 can do things like "if I make move x when I see the current pattern, it often leads to pattern y. Pattern y is correlated with winning, therefore I should make move x"?
Yes, as I understand it. Here's how the Arxiv article puts it:

Quote:
 Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network f_θ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
A "state s" means a complete board state, i.e. the position of every piece. As I understand it, pattern recognition is one of the main strengths of artificial neural networks, in the sense that the trained algorithm will favor similar kinds of moves across board states that are distinct but have various commonalities. It will come to analyze those positions as being similar - hence pattern recognition.

That paragraph is also probably pretty essential to the questions I was asking Chris. I'm reading the part about it returning a probability distribution for the next move 'a' as a way of avoiding the kind of pigeon-holing he thinks will occur, but he's pointing out that the probability distribution will become heavily weighted and might effectively prune a lot of options too early.
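The "proportionally or greedily" part of that quote can be shown in a few lines of Python (visit counts invented for illustration):

```python
import random

# Turning root visit counts into a move choice, per the quoted paragraph:
# sample proportionally to visit counts (keeps some exploration alive),
# or pick greedily by maximum count (pure exploitation).

def pick_move(visit_counts, greedy=False, rng=random):
    """Choose a move from the distribution pi implied by visit counts."""
    total = sum(visit_counts.values())
    pi = {move: n / total for move, n in visit_counts.items()}
    if greedy:
        return max(pi, key=pi.get)
    moves, probs = zip(*pi.items())
    return rng.choices(moves, weights=probs)[0]

# Invented counts after a search from the starting position.
counts = {"e4": 620, "d4": 310, "Nf3": 70}
print(pick_move(counts, greedy=True))   # always the most-visited move
```

Proportional sampling would still play Nf3 about 7% of the time here, which is one way the "pigeon-holing" concern gets softened - though as noted, a heavily weighted distribution can still effectively prune options early.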

01-01-2018, 05:17 PM   #134
br3nt00
veteran

Join Date: Jan 2009
Posts: 2,565
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

The AlphaGo documentary was great. The Magnus documentary was not as good, but still good.
01-08-2018, 08:12 PM   #135
Lateralus
journeyman

Join Date: Nov 2005
Posts: 285
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Just bumping this to agree that the AlphaGo Netflix movie was awesome, and is now available in most EU countries too...
01-11-2018, 02:50 AM   #136
DanSmithHolla
newbie

Join Date: Oct 2017
Posts: 16
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Are there any plans of AlphaGo playing in the future? Surely someone must have looked up what it plays against the Najdorf...
01-12-2018, 06:45 PM   #137
jalfrezi
Carpal 'Tunnel

Join Date: May 2012
Location: Long gone, long long gone
Posts: 6,671
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

You mean AlphaZero. It's interesting that we've only been allowed to see 10 of the 100 games. It played 44 million games against itself while learning, which isn't a huge number for a game such as chess with ~10^100 possible games.

Is it possible that most of the other 90 games were substandard in some way that Google found embarrassing (eg drawing technically won endings, playing suboptimal openings), though not sufficiently so for Stockfish to be able to win?

(And as any Najdorf player knows, there's no good way of playing against it. )
