Quote:
Originally Posted by lkasigh
So rather than learning to prefer 1.d4 over 1.f4, it would look at patterns in won games and lost games and notice that a configuration of empty f2 square and K on e1 (however that is encoded in the program) is correlated with losses, and thus underweight options leading to this configuration?
It seems counterintuitive that this could be so effective, since the number of potential patterns is astronomically high. But then again it's pretty much how humans learn and play chess, so who knows?
Yeah, that's basically it. The patterns it recognises are probably a fair bit more complex and abstract than that, but in the two games I've watched so far it has smashed Stockfish positionally, so it obviously has very good "understanding" of positional ideas.
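To make the basic idea concrete, here's a minimal toy sketch of my own (nothing like DeepMind's actual code, and the data below is random placeholder): encode positions as raw piece-placement features, label each one with the eventual game result, and let a network work out for itself which configurations correlate with winning and losing.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Placeholder data standing in for real self-play games: 2,000 "positions" encoded
# as 12 piece-placement planes of 64 squares each (flattened to 768 inputs), each
# labelled with the eventual game result (+1 win, 0 draw, -1 loss). The values here
# are random, so nothing real gets learned - this just shows the mechanics.
X = rng.integers(0, 2, size=(2000, 12 * 64)).astype(float)
y = rng.choice([-1.0, 0.0, 1.0], size=2000)

# A small value network: its only job is to map a position to a number predicting
# the result, learned purely from which configurations co-occurred with wins/losses.
value_net = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=100)
value_net.fit(X, y)

# After training, the output for any position is an evaluation with no hand-written
# rules about kings on e1 or empty f2 squares anywhere in it.
print(value_net.predict(X[:1]))
```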
That's not all it does, though; it's still looking at 70,000 positions per second, so it obviously does a fair amount of lookahead. From what they said in the paper, one of its big advantages is that it tends to investigate "candidate moves" deeply and not waste its time on dead ends, which again is more human-like than a conventional engine. It's still basically an engine, but with a vastly improved (though slower) evaluation function.
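The "investigate candidate moves, skip the dead ends" behaviour comes from how the tree search decides which move to look at next. Below is a rough, simplified sketch of a PUCT-style selection rule of the kind AlphaZero-type searches use (my own toy numbers, not the paper's code): the exploration bonus is scaled by the policy network's prior, so the couple of moves the network likes soak up almost all the visits.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    move: str
    prior: float          # probability assigned to this move by the policy network
    visits: int = 0
    value_sum: float = 0.0

def select_child(children, c_puct=1.5):
    """PUCT selection: favour moves with good average results (q), but scale the
    exploration bonus (u) by the network's prior, so moves the network thinks are
    bad hardly ever get expanded."""
    total = sum(c.visits for c in children)
    def score(c):
        q = c.value_sum / c.visits if c.visits else 0.0
        u = c_puct * c.prior * math.sqrt(total + 1) / (1 + c.visits)
        return q + u
    return max(children, key=score)

# Toy run: two candidate moves with high priors soak up essentially all the visits,
# which is the "doesn't waste time on dead ends" behaviour in practice.
children = [Child("e4", 0.45), Child("d4", 0.40), Child("a4", 0.01), Child("h4", 0.01)]
for _ in range(1000):
    c = select_child(children)
    c.visits += 1
    c.value_sum += 0.5    # pretend evaluation; in the real search this comes from the value net
print({c.move: c.visits for c in children})
```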
It's not just how humans learn, by the way; it's also more or less how the brain itself does it. Artificial neural networks were modelled on how the brain works.
In the case of pattern recognition, it breaks the image down into pixels (probably at a coarser resolution than the actual image) and the color values of those pixels are the input to the neural network. "Line detection" is a trivial exercise for a pattern recognition machine, so it doesn't have to be hardcoded in.

While hardcoding stuff in might help the machine, it can also be a hindrance. Anything beyond the basic data you give the machine will tend to shape the outcome. For instance, you could supply information to the machine about weak pawns, but if you do that, it will latch on to that information and never discover the concept of weak pawns itself. That might not be a good thing: it might turn out that pawn structures need to be considered holistically, and that breaking them down reductively into things like "weak pawns" is the wrong way to go about it.
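As a concrete example of the "pixels in, lines out" point, here's a tiny sketch of my own (not from the paper or the discussion): a 3x3 edge-detecting kernel of the kind a network typically ends up learning by itself, applied to a toy image, plus the flattening step that turns an image into a plain vector of pixel values for a fully-connected network.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 6x6 greyscale "image": dark on the left half, bright on the right half.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# A Sobel-style vertical-edge kernel. In a trained network, weight patterns like
# this emerge from the data; nobody has to hardcode "detect lines here".
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

# Strong (large-magnitude) responses appear exactly where the edge sits.
response = convolve2d(image, kernel, mode="valid")
print(response)

# For a plain fully-connected network, the image is simply flattened into a vector
# of pixel color values and fed in as the input layer.
input_vector = image.ravel()
print(input_vector.shape)   # (36,)
```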
More is often not better with neural networks. For instance, if you create a gigantic, extremely high-powered neural network and then give it a training set of 5,000 example letters so it can learn character recognition, a network that powerful will "overtrain" and learn to just recognise all 5,000 images. So instead of being like "a single circular line with a hole in the middle, that's the letter o", it's like "I recognise this, it's image #1,342 of my training set, which is an o". Then when you give it images it hasn't seen before, it has no clue. There's a balancing act to making it learn in a generalized way.
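You can watch that memorisation happen with a few lines of scikit-learn (a quick sketch of my own; the dataset is the built-in 8x8 digits rather than letters, and the exact numbers will vary): train a deliberately oversized network on a tiny training set and compare its accuracy on the images it has seen against images it hasn't.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 digit images flattened into 64 pixel values, standing in for the letters example.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=150, random_state=0)

# A deliberately oversized, unregularised network trained to convergence on a tiny set.
net = MLPClassifier(hidden_layer_sizes=(1024, 1024), alpha=0.0, max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Near-perfect on the images it has seen, noticeably worse on images it hasn't:
# that gap is the "it just recognises image #1,342" failure mode described above.
print("train accuracy:", net.score(X_train, y_train))
print("test accuracy: ", net.score(X_test, y_test))
```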