An interesting aspect of how A0 works that hasn't been explained ITT so far is the use of Monte-Carlo Tree Search (MCTS) in contrast to alpha-beta search. From the paper:
Quote:
MCTS and Alpha-Beta Search
For at least four decades the strongest computer chess programs have used alpha-beta search. AlphaZero uses a markedly different approach that averages over the position evaluations within a subtree, rather than computing the minimax evaluation of that subtree. However, chess programs using traditional MCTS were much weaker than alpha-beta search programs, while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions.
AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domain-independent search.
Here's what this means translated into English:
A program like Stockfish seeks to build as close to a complete game tree as it can, evaluating the positions at the leaves of that tree with its rough evaluation function. It assumes that each player will make what it thinks is the best move at every juncture. The value of any immediate move is the evaluation of the position at the bottom of the tree when you follow the tree down, making the best move for each player in turn. Assuming Stockfish has evaluated all the moves, it will therefore correctly evaluate a move as good even if only one specific 15-move combination leads to a positive outcome and all the rest are disastrous. That's what the text means by "alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree". By "approximation errors" they mean evaluation error: the weakness of Stockfish is that its evaluation of the final position it considers could easily be badly wrong. Stockfish hopes that this evaluation lies so many moves ahead that the inaccuracy will not matter.
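To make the minimax idea concrete, here's a toy alpha-beta sketch in Python. This is just an illustration, not Stockfish's actual search (which adds move ordering, transposition tables, quiescence search, and much more); the `Node` class and its fields are invented for the example.

```python
class Node:
    """Toy game-tree node: a static evaluation plus child positions."""
    def __init__(self, value=0.0, children=()):
        self.value = value          # rough leaf evaluation
        self._children = list(children)

    def children(self):
        return self._children


def alphabeta(node, depth, alpha, beta, maximizing):
    """Return the minimax value of `node`, pruning branches that
    cannot affect the result (alpha-beta pruning)."""
    children = node.children()
    if depth == 0 or not children:
        return node.value           # leaf: trust the rough evaluation
    if maximizing:
        best = float("-inf")
        for child in children:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break               # opponent will avoid this line; prune
        return best
    else:
        best = float("inf")
        for child in children:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

The value backed up to the root is exactly the leaf evaluation reached by best play on both sides, so a single badly misjudged leaf on the principal line propagates straight to the root; that is the "approximation error" concern in the quote.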
AlphaZero does not do this. Its tree search is a Monte-Carlo evaluation: it makes what it thinks are "reasonable" moves for each player (perhaps not the best) and sees where that leads. I'm necessarily being vague here with "reasonable", because the details will have to wait for the full paper. It does that many times and averages the results. That's what the text means by "MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree". It's like rollouts at backgammon. That's all well and good, and accounts for the improved positional play of A0, but one would suspect that A0's weakness would be long forcing sequences. Instead, it appeared able to find tactical moves that Stockfish only finds after a long think. That's impressive.
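To make the averaging idea concrete, here's a toy MCTS sketch in Python. This is my own illustration, not AlphaZero's actual algorithm (which also uses a learned policy to guide selection); `MCTSNode`, `select`, and the noisy `evaluate` callback are all invented for the example. The point is the backup step: each node keeps a running average of the values seen below it, so a noisy evaluator's errors tend to cancel out, whereas a minimax backup would latch onto the largest one.

```python
import math

class MCTSNode:
    """Toy MCTS node: visit count and running total of backed-up values."""
    def __init__(self, children=()):
        self.children = list(children)
        self.visits = 0
        self.total_value = 0.0

    def mean_value(self):
        return self.total_value / self.visits if self.visits else 0.0


def select(node):
    """UCB1 selection: balance exploitation (mean value so far)
    against exploration (rarely visited children)."""
    return max(node.children,
               key=lambda c: c.mean_value() +
                   math.sqrt(2 * math.log(node.visits + 1) / (c.visits + 1)))


def simulate(root, evaluate):
    """One MCTS iteration: descend to a leaf, get a (noisy) value
    estimate there, and back it up as a running AVERAGE, not a minimax."""
    path = [root]
    while path[-1].children:
        path.append(select(path[-1]))
    value = evaluate(path[-1])      # noisy leaf estimate
    for node in path:               # averaging backup
        node.visits += 1
        node.total_value += value
    return value
```

With a noisy evaluator, the averaged value at each node converges toward the true value as visits accumulate, which is the error-cancellation effect the paper describes.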
That makes me wonder how strong AlphaZero would be on sheer evaluation alone, without any tree search at all. I have read that at Go, simply evaluating a position with no search whatsoever, it plays at the level of a human professional. I don't know for sure that's true, but it sounds right. I'd love to see how strong A0 would be at chess under those conditions.