12-29-2017, 11:55 PM   #126
ChrisV
Carpal 'Tunnel
Re: Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

Nah that's off base in a number of ways. The process is:

- It seeds its evaluation function with random weights. This produces random evaluations of positions and hence random play.
- In each position, it runs a Monte Carlo tree search, using its evaluation function to evaluate future positions.
- When the game ends, it adjusts its weights so that the evaluation function scores the positions from that game more negatively if it lost, and more positively if it won.
- The evaluation function is now very slightly better than random. It plays another game, and so on (a minimal sketch of this loop is below).
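
To make the loop concrete, here's a minimal sketch in Python. The "network" is just a linear evaluator over made-up features and the self-play game is a random placeholder; every name and number here is an illustrative assumption, not AlphaZero's actual code:

Code:
import random

NUM_FEATURES = 8
LEARNING_RATE = 0.01
# Seed the evaluation function with random weights -> random play.
weights = [random.uniform(-1, 1) for _ in range(NUM_FEATURES)]

def evaluate(position):
    """Scalar value estimate for a position (higher = better for the mover)."""
    return sum(w * f for w, f in zip(weights, position))

def play_self_play_game():
    """Placeholder self-play game: random positions and a random result z."""
    positions = [tuple(random.random() for _ in range(NUM_FEATURES))
                 for _ in range(40)]
    z = random.choice([-1, 1])  # -1 = loss, +1 = win
    return positions, z

for game in range(1000):
    positions, z = play_self_play_game()
    # Nudge the evaluation toward the actual result for every position seen:
    # more negative after a loss, more positive after a win.
    for pos in positions:
        error = z - evaluate(pos)
        for i, feature in enumerate(pos):
            weights[i] += LEARNING_RATE * error * feature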

It doesn't learn heuristic rules in the way you suggest; it simply improves its evaluation function gradually over time. It doesn't alternate between applying rules and choosing random moves, either: it applies its evaluation function every time, and over time its play gradually becomes less random.

It's likely that material advantage is one of the first things it learns to value, because in the set of all possible chess positions, that's one of the most obvious markers of a winning position. But as I say, it's not a heuristic; it's inherent in the weights of the network. That's difficult to explain in a way that is conceptual rather than mathematical. The only way to figure out what it is valuing is to change positions slightly and see how its evaluation changes. If you gave it a whole pile of quiet positions, removed a knight from one side and a bishop from the other, and asked it to re-evaluate, you could figure out what it thinks those pieces are worth relative to each other on average, but it's only an average.

I think e4 and d4 would be preferred quite early, because the number of possible moves your pieces have is an easy metric to pick up on, and e4 and d4 open lines for the queen and bishop. (A quick way to measure that is sketched below.)
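
As a rough illustration of that mobility idea - this sketch assumes the python-chess library, and the null move is just a trick to hand the move back to White so we can count White's replies:

Code:
import chess  # pip install python-chess

def white_mobility_after(first_move_san: str) -> int:
    """Count White's legal moves immediately after a given first move."""
    board = chess.Board()
    board.push_san(first_move_san)
    board.push(chess.Move.null())  # Black "passes", purely for measurement
    return board.legal_moves.count()

for move in ["e4", "d4", "Nf3", "a4", "h3"]:
    print(f"1. {move}: {white_mobility_after(move)} legal replies for White")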

Quote:
It also teaches itself how to prune searches. It looks 20 moves ahead each move, which means looking 1 move, 2 moves etc. ahead first. It knows that sometimes "This move looks really ****ty if you look 3 moves ahead, but it's the best move looking 20 moves ahead". But it also knows *under what conditions such sacrifice type plays tend to occur*. So it teaches itself when to consider sacrificing a piece and when not to, and it's much better at recognising this (and pruning) than current engines.
None of this makes any sense. All A0 ever changes is its evaluation function. It has no capacity to change anything about the way it does its tree search. I read the paper and have a better understanding of this now:

Quote:
Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilises a deep neural network (p, v) = fθ(s) with parameters θ. This neural network takes the board position s as an input and outputs a vector of move probabilities p with components p_a = Pr(a|s) for each action a, and a scalar value v estimating the expected outcome z from position s, v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.

Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network fθ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
The following all concerns actual play, not learning:

Here's a super contrived example to try to explain this, to the best of my understanding. Say you have a position A with only two legal moves, 1 and 2. AlphaZero's initial evaluation is some scalar value, let's say +0.5, plus a vector of move probabilities, say { 0.75, 0.25 }. But suppose move 2 leads to a reasonably easily found forced mate. So initially, when it does Monte Carlo rollouts, it selects move 1 75% of the time and move 2 25% of the time, but it keeps track of visit counts as well. If it randomly picks move 1 ten times in a row, that 25% selection chance for move 2 gets boosted, making move 2 more and more likely to be chosen. The exact details of the move selection algorithm aren't in the preprint, but will presumably be in the full paper. (The earlier AlphaGo Zero work used a PUCT rule; see the sketch below.)
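
For what it's worth, the AlphaGo Zero paper describes a PUCT-style selection rule that balances exactly those three criteria, and presumably AlphaZero's is similar. A minimal sketch - the constant and the exact form here are assumptions based on that earlier work, not the preprint:

Code:
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.25):
    """Selection score for a move a from state s during MCTS.

    q_value      - mean value of simulations through this move ("high value")
    prior        - the network's move probability ("high move probability")
    child_visits - N(s, a); the 1 + N divisor favors rarely tried moves
                   ("low visit count")
    """
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

# At each node the search descends via the move maximizing this score, e.g.:
# best = max(legal_moves, key=lambda a: puct_score(Q[a], P[a], N_total, N[a]))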

Each time it selects a move, it plays out a full game from that point, using the same process recursively. The wins and losses in these games form the basis of a score it gives to that move - what the quote above calls "value". Over time, it will notice that move 2 is actually getting a really great value. That becomes a basis to choose it more often during the Monte Carlo rollouts, basically to investigate more thoroughly and see if the value is legit. (That's what the quote means when it lists the three criteria for move selection during MCTS as "low visit count, high move probability and high value"; exactly how these criteria are balanced against each other is unclear.) In turn, that leads to position A's value rapidly improving.

You can see how, if position A were actually down the search tree a bit, the good news about the forced win after move 2 would propagate up the tree, gradually making the sequence of moves leading to it more and more likely to be selected. So basically: AlphaZero makes an initial evaluation of which moves are a good idea, that initial guess guides a series of self-play games, and its opinion of which moves are a good idea is modified based on the results of those games. Those modifications gradually propagate from the leaves of the tree back to the root (a sketch of that backup step is below).
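
A minimal sketch of that propagation step. The table names are illustrative, not from the paper; the sign flip reflects that a win for one side is a loss for the other:

Code:
from collections import defaultdict

N = defaultdict(int)    # visit count per (state, move)
W = defaultdict(float)  # total simulation value per (state, move)
Q = defaultdict(float)  # mean value per (state, move)

def backup(path, leaf_value):
    """Propagate one simulation's result from leaf back toward the root.

    path is the list of (state, move) pairs the simulation traversed."""
    value = leaf_value
    for state, move in reversed(path):
        N[(state, move)] += 1
        W[(state, move)] += value
        Q[(state, move)] = W[(state, move)] / N[(state, move)]
        value = -value  # switch perspective one ply up the tree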

Last edited by ChrisV; 12-30-2017 at 12:09 AM.
01-01-2018, 08:58 AM   #127
Yeti
Abominable

AlphaGo doc was just added to Netflix
01-01-2018, 10:36 AM   #128
ChrisV
Carpal 'Tunnel

Thanks, will definitely watch, looks interesting. Well reviewed.
01-01-2018, 10:51 AM   #129
ChrisV
Carpal 'Tunnel

lol, I looked at one of the reviews on RT. Here are the opening couple of sentences:

Quote:
The game of go is over 3,000 years old. In all that time, it has never been 'solved' as chess has; there are some preferred starting strategies but no optimal overall strategy has ever been developed.
Not off to a good start.
01-01-2018, 12:17 PM   #130
daveopie
old hand

Quote:
Originally Posted by Yeti
AlphaGo doc was just added to Netflix
Thanks. I didn't know about it, and just watched it. I enjoyed it. There wasn't much in there about the computer science involved or specific Go strategy - instead it focused on the human aspect, such as the drama of the outcome and the feelings during the matches of the two men who played against AlphaGo. I guess that's to be expected from a Netflix show, and I thought it was well done.
01-01-2018, 01:45 PM   #131
well named
poorly undertitled

Quote:
Originally Posted by ChrisV
No, it starts its evaluation function with random weights, which will produce random play, but the evaluation function is updated as it learns.
Yeah, that sounds right. Thanks for the correction. I need to go back and read the article(s) Google published that I only skimmed before. Still, I'm having trouble with the idea of it getting stuck in some local part of the search space for bridge but not for chess. But at this point it's probably on me to go read all that again, rather than on you to convince me.

Quote:
Originally Posted by ChrisV
In chess, good moves are good moves even if play thereafter is suboptimal.
Sure, but part of my point was that it doesn't know whether a move is good until the end of the game, as far as closing the loop on training feedback goes. Obviously the point is to develop the evaluation function, which then does have an opinion on moves in isolation, but during training the only feedback is the result at the end of the game (see the sketch below).
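
A tiny sketch of what that end-of-game feedback amounts to, with illustrative names: every position in a finished game gets the single final result as its value target, sign-flipped for the side to move:

Code:
def value_targets(game_positions, z):
    """z is the final result from the first player's perspective
    (+1 win, 0 draw, -1 loss). No position gets any other feedback."""
    targets = []
    for ply, position in enumerate(game_positions):
        sign = 1 if ply % 2 == 0 else -1  # flip for the opponent's moves
        targets.append((position, sign * z))
    return targets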

Quote:
Originally Posted by ChrisV
In bridge bidding, 1S is a better opening bid with a strong hand and spades than 4S, but only with highly specific followups.
That sounds to me like saying a particular opening is good, but only with certain follow-ups? But it looked to me like it was able, over time, to work some of that out in training.

Quote:
Originally Posted by ChrisV
By the "easy path" I mean that neural networks will select for whatever gives the largest initial gains, rather than routes that will ultimately produce the best outcomes.
And yet this would seem to imply a fairly large problem for learning to play chess well too, but it seems to do much better in deeper or more closed positions than other engines do? That was my impression based on a couple of the annotated games I saw.

Anyway, it's all cool stuff. Sorry if I'm just being annoying out of ignorance. When I have more time I'll try to educate myself a little more, especially on bidding in bridge.
01-01-2018, 02:23 PM   #132
Louis Cyphre
Carpal 'Tunnel

Quote:
Originally Posted by well named
Sure, but part of my point was that it doesn't know whether a move is good until the end of the game, as far as closing the loop on training feedback goes. Obviously the point is to develop the evaluation function, which then does have an opinion on moves in isolation, but during training the only feedback is the result at the end of the game.
Is it possible that A0 can do things like "if I make move x when I see the current pattern, it often leads to pattern y. Pattern y is correlated with winning, therefore I should make move x"?
01-01-2018, 02:55 PM   #133
well named
poorly undertitled

Quote:
Originally Posted by Louis Cyphre
Is it possible that A0 can do things like "if I make move x when I see the current pattern, it often leads to pattern y. Pattern y is correlated with winning, therefore I should make move x"?
Yes, as I understand it. Here's how the Arxiv article puts it:

Quote:
Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network fθ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
A "state s" means a complete board state, i.e. the position of every piece, and as I understand it pattern recognition is one of the main strengths of artificial neural networks, in the sense that the trained algorithm will favor similar kinds of moves across board states that are distinct but have various commonalities. It will come to analyze those positions as being similar, hence pattern recognition.

That paragraph is also probably pretty essential to the questions I was asking Chris. I'm reading the part about it returning a probability distribution for the next move 'a' as a way of avoiding the kind of pigeon-holing he thinks will occur, but he's pointing out that the probability distribution will become heavily weighted and might effectively prune a lot of options too early.
01-01-2018, 05:17 PM   #134
br3nt00
veteran

AlphaGo documentary was great

Magnus documentary not as good but still good
01-08-2018, 08:12 PM   #135
Lateralus
journeyman

Just bumping this to agree that the AlphaGo Netflix movie was awesome, and is now available in most EU countries too...
01-11-2018, 02:50 AM   #136
DanSmithHolla
newbie

Are there any plans for AlphaGo to play again in the future? Surely someone must have looked up what it plays against the Najdorf...
01-12-2018, 06:45 PM   #137
jalfrezi
Carpal 'Tunnel

You mean AlphaZero.

It's interesting that we've only been allowed to see 10 of the 100 games.

It played 44 million games against itself while learning, which isn't a huge number for a game like chess, with ~10^100 possible games. Is it possible that most of the other 90 games were substandard in some way that Google found embarrassing (e.g. drawing technically won endings, playing suboptimal openings), though not sufficiently so for Stockfish to be able to win?

(And as any Najdorf player knows, there's no good way of playing against it. )
01-22-2018, 11:35 AM   #138
The Yugoslavian
STTF HUC II Winner

Well, it also played like 1200 other games (iirc) vs. Stockfish in various fixed openings. It'd be interesting to see all of them!
