Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

12-11-2017 , 03:50 AM
That probably sounds like I'm splitting hairs, but engines evaluate moves assuming the opponent plays the best response. So if move A is +0.5 against the best response and +0.6 against the second-best response, and move B is +0.5 against the best response and +3 against the second-best response, the engine will evaluate both of them as +0.5. But I assume (just speculating) that the evaluation against the second-best response would be a tiebreaker and it would play move B.
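To make the tiebreak idea concrete, here is a toy sketch in Python (purely hypothetical; it is not how Stockfish or any real engine selects moves): pick the move that scores best against the opponent's best reply, and break ties on the evaluation against the second-best reply.

Code:
# Toy sketch, not any engine's actual move selection. reply_evals maps each of
# our candidate moves to our evaluations (in pawns) after every opponent reply;
# the opponent is assumed to pick whichever reply is worst for us.
def choose_move(reply_evals):
    def score(move):
        evals = sorted(reply_evals[move])        # ascending: worst for us first
        vs_best_reply = evals[0]                 # primary criterion
        vs_second_best = evals[1] if len(evals) > 1 else evals[0]  # tiebreaker
        return (vs_best_reply, vs_second_best)
    return max(reply_evals, key=score)

# Move A: +0.5 vs the best reply, +0.6 vs the second best.
# Move B: +0.5 vs the best reply, +3.0 vs the second best.
print(choose_move({"A": [0.5, 0.6], "B": [0.5, 3.0]}))   # prints B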
12-11-2017 , 11:08 AM
Would missing the opening book really make that much difference? The general gist I got from watching YouTube channels dissect the games is that both play theoretical lines until A0 does something different (I just take their word for it). I assume Stockfish is 'out of book' by the time that happens anyway.
12-11-2017 , 11:57 AM
Computer opening books are very thorough, and actually getting the program out of book in a way that is favorable is difficult. That said, I did see some chess programmers criticizing the Stockfish book (maybe because it doesn't force Stockfish to only go down lines that lead to positions that favor computers).
12-11-2017 , 07:21 PM
I mean, people have done analysis of the games with a properly running Stockfish. There are some games (e.g. game 4) where Stockfish makes some dubious moves, but there's also a lot A0 does which is not seen by top engines.
12-13-2017 , 03:03 AM
How high does your rating have to be to fully appreciate these super duper unexpected moves that AlphaZero made, even after being told what they were and seeing how they turned out (but not reading a detailed explanation)?
12-13-2017 , 03:22 AM
As high as AlphaZero's, I guess. There's no critical point at which you suddenly move from understanding to non-understanding. I'm not particularly good at chess (rated around 1800) but I'm able to notice obvious differences in the way A0 plays compared to other engines. It's more aggressive and has better positional understanding. Engine-vs-engine games can often look a bit directionless, with each side just playing moves that don't lose and lacking any real long-term plan or understanding. When A0 drops one of its positional binds, you can look back and see that it was working towards that for some time.
12-13-2017 , 04:37 PM
I don't think understanding the individual moves is that hard, especially if you watch some of the videos out there that explain them well, e.g. Jerry from ChessNetwork is doing a good series on it. If there's an overarching theme to AlphaZero's play it appears to include grabbing space and not being afraid to sacrifice material.

Needless to say, understanding the moves after they're made is one thing, and finding them during a game is another.
12-13-2017 , 05:35 PM
I was talking specifically about those few moves that had two exclamation marks and that articles made a big deal of. How good do you need to be to appreciate that specific move if you had been following the game to that point, saw the move (which presumably might look strange), were told it is brilliant, but were not told why?
12-13-2017 , 07:40 PM
The moves that are getting publicity aren't really the point.
[Position diagram omitted]
Here AlphaZero played Bg5!! Now, this move is not hard to understand conceptually. Black's minor pieces are a disaster, huddled on the queenside doing nothing. The knight on b8 is hemming in the rook and has literally no moves. Black's king theoretically should be weak, and Bg5 is trying to get more pieces around it. The threat, if Black does nothing, is Nf6, where gxf6 Bxf6+ is mating, and if the queen moves, say to d3, there's Be4 and it's slaughter time. But Black has various replies to Bg5 to consider (f5, hxg6, Bd3).

So, Black's pieces are not helping the defence of his king, and Bg5 puts more pieces around that king and issues a direct threat. That's all easy. But understanding that the move actually works - working through all the tactics and figuring out that they come out in White's favour - well, other engines take a very long time to find Bg5, which means the tactics are beyond human ability to calculate. So Bg5 can be understood conceptually, but not verified (without the assistance of an engine). But that's true of tactical moves Stockfish makes as well; the tactical capabilities of engines have outstripped humans for ages. The reason people are making a big deal out of it is that one might expect a pattern-recognition approach to have tactical capabilities inferior to those of a brute-force engine, but here it is finding a tactical move which brute-force engines struggle to find.

The impressive thing for me about AlphaZero is its positional understanding, though. And there, you can't pull out single moves. You have to look at a game holistically. When AlphaZero drops one of its positional binds, you can look back through the game and note moves which contributed to A0 getting exactly the pawn structure it wanted, say. And again, while it's easy to pull out features of a move and say "oh yes, see, moving this pawn here establishes control of these squares" - that doesn't mean the moves are really understood. There might be second, third, and fourth more subtle aspects to the move that didn't become apparent in the game.

Last edited by ChrisV; 12-13-2017 at 07:45 PM.
12-13-2017 , 07:56 PM
That position above is actually a good illustration of the way the games went. AlphaZero disdains material and goes for positional advantage instead. White is a pawn down but has huge positional compensation - Stockfish (not seeing Bg5) thinks the position is about equal - and in the ten released games, AlphaZero is always choosing to take the positional-advantage side of these imbalances. It's exciting to see, because the message from engines so far has been "Human abstract positional ideas are subordinate to the concrete moves of a position; lol 'positional advantage', I have an extra pawn and can see 20 moves into the future". Engines grab material in positions where it looks too dangerous to a human and then defend tenaciously, repeatedly finding the only defensive move available. But A0's play is demonstrating that maybe engines are wrong about that and that positional ideas do trump concrete material a lot of the time. Here, the punishment for Stockfish's greediness comes in the form of a tactical shot, but in other games AlphaZero simply keeps improving its positional advantage until it becomes overwhelming and something breaks in Stockfish's position.

Last edited by ChrisV; 12-13-2017 at 08:03 PM.
12-13-2017 , 11:20 PM
Those are nice educational posts, but they didn't answer my general question. I'm just curious how good a chess player you have to be to quickly appreciate a double exclamation point move, regardless of who makes it.
12-13-2017 , 11:27 PM
There's not some kind of standard for awarding exclamation points.
12-13-2017 , 11:41 PM
Quote:
Originally Posted by David Sklansky
Those are nice educational posts, but they didn't answer my general question. I'm just curious how good a chess player you have to be to quickly appreciate a double exclamation point move, regardless of who makes it.
Given ChrisV's post ITT, I'd guess 1800 (USCF Class A)
12-13-2017 , 11:48 PM
"Moves awarded a double exclamation point" isn't a class of things with enough internal consistency to generalise much about. It's an aesthetic award more than anything. An analogy in maths/physics would be that "double exclamation mark breakthroughs" in mathematics would include both Einstein's theories of relativity, which were conceptually revolutionary but not particularly mathematically complex, and something like Andrew Wiles' proof of Fermat's Last Theorem, which features extremely complex pure maths. "How good do I need to be at maths to understand these breakthroughs" isn't a coherent question, depends on the individual instance.
12-14-2017 , 10:18 AM
It depends. Some 1200s could appreciate it. Some 2000s probably wouldn't. It also depends on the type of move.

My guess is the person would have to be a fairly serious chess player (which I consider a 1200 player to be), likely having played in some tournaments and/or studied a decent amount.
12-15-2017 , 02:03 PM
In the paper, it says "To evaluate performance in chess, we used Stockfish version 8 (official Linux release) as a baseline program, along with 64 CPU threads and a hash size of 1GB." I don't see anything in the paper (and Google isn't commenting until it is officially published) about Stockfish being crippled in any other way.

Quoting from Wikipedia, "Stockfish can use up to 512 CPU cores in multiprocessor systems. The maximal size of its transposition table is 1 TB."

I'm not sure if 1 thread = 1 CPU (you can have multi-threaded CPUs), nor if the "hash size" of 1GB is the same thing as the "transposition table." I do think Stockfish could have run on a bigger computer, but I also think they picked hardware good enough so they could have a tough opponent for AlphaZero.
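For what it's worth, those two settings are ordinary UCI options. Here is a minimal sketch of setting them yourself, assuming the python-chess library and a local Stockfish binary (the ./stockfish path is an assumption):

Code:
# Minimal sketch: configure Stockfish with the same options the paper mentions
# (64 search threads, 1 GB hash table). Assumes python-chess is installed and
# that a Stockfish binary exists at ./stockfish (the path is an assumption).
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("./stockfish")
engine.configure({"Threads": 64, "Hash": 1024})   # Hash is specified in MB

board = chess.Board()
result = engine.play(board, chess.engine.Limit(time=1.0))  # think for 1 second
print(result.move)
engine.quit()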

For the evaluation, AlphaZero ran on a platform with 4 TPUs (Tensor Processing Units - specialty hardware designed by Google). However, during the training phase they had AlphaZero using more hardware. "Training proceeded for 700,000 steps (multi-batches of size 4,096) starting from randomly initialized parameters using 5,000 first generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks."

AlphaZero had quite a bit of hardware for the training phase ( the "4 hours of training" claim depends on how much hardware was used), but they limited the hardware during the implementation. I don't know of a good comparison between TPUs and CPUs - they are different and hard to compare fairly. But it does seem that Google could have thrown even more hardware into AlphaZero during the evaluation phase if they really wanted to.

I think the main point Google was trying to show is that it is possible to start with only knowledge of the rules of the game (they are careful to point out their exact assumptions, none of which contained inputting any chess strategy into AlphaZero) and derive a great chess algorithm from scratch.
12-15-2017 , 02:34 PM
1GB hash size is really small, isn't it? I use more than 8GB on my home computer.
12-15-2017 , 05:28 PM
Yes, the hash table is the same as the transposition table. It makes it unnecessary to evaluate the same position multiple times when it occurs via different move orders, which allows greater search depth in the given time. I also use 8GB; I don't see why anyone would use 1GB on today's machines. The chess programmers' group I read says that adding CPU cores beyond just a few can only add 10 Elo at best due to diminishing returns.
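As a rough illustration of what the transposition table buys you, here is a toy sketch using the python-chess library (nothing like Stockfish's real search): positions reached through different move orders hit the same cache entry and so are only searched once.

Code:
# Toy transposition table (not Stockfish's actual code). Positions reached via
# different move orders share one table entry and are evaluated only once.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

table = {}  # position key -> (searched depth, score)

def evaluate(board):
    # Crude material count from the side to move's point of view.
    return sum(PIECE_VALUES[piece.piece_type] * (1 if piece.color == board.turn else -1)
               for piece in board.piece_map().values())

def search(board, depth):
    key = (board.board_fen(), board.turn)     # real engines use a Zobrist hash
    cached = table.get(key)
    if cached is not None and cached[0] >= depth:
        return cached[1]                      # reached via another move order: reuse

    if depth == 0 or board.is_game_over():
        score = evaluate(board)
    else:
        score = -float("inf")
        for move in list(board.legal_moves):  # negamax: opponent's best is our worst
            board.push(move)
            score = max(score, -search(board, depth - 1))
            board.pop()

    table[key] = (depth, score)
    return score

print(search(chess.Board(), 3))               # depth-3 search from the start position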
12-16-2017 , 08:59 AM
Don't overreact to this story! AlphaZero is based on a chess program named Giraffe. Giraffe took 72 hours to calibrate on a workstation. AlphaZero used 64 Google TPUs to do it in 4 hours. During the match AlphaZero used 4 TPUs while Stockfish ran on 64 cores with 1GB hash. The fact that Stockfish drew so many games is the really impressive part.

The whole thing was a nice marketing stunt by Google to gain a few points on the stock market, just like IBM did with the match between Deep Blue and Kasparov.
12-16-2017 , 10:02 AM
If that's satire, nice job. But I don't think it is.
12-16-2017 , 04:05 PM
I appreciate that move, and I am a chess nobody; I play a bit of tactical training on lichess, fluctuating around 1700-1800 (whatever that means for true rating), and play a few games here and there on the app.

Appreciation can come from something you don't understand at all, when you see it turn out to be successful.
12-16-2017 , 07:39 PM
Quote:
Originally Posted by Shandrax
Quote:
As I wrote in the previous article, my Stockfish, casually running on just one core on an Intel i7 4760 3.60 GHz, took roughly 75 minutes to find the star move at depth 41. Hardware is the bottleneck. Just for comparison: Massive hardware upgrades almost doubled the playing-strength of AlphaGo. It simply expands the search-horizon.

Looking at the difference in pure hardware power this reminds me of David vs. Goliath. Running on identical machines, Stockfish should beat AlphaZero easily.
This is just utter nonsense from start to finish. For starters, when he says "massive hardware upgrades almost doubled the playing-strength of AlphaGo", his link says no such thing and he has no way of knowing that. AlphaGo Master played on better hardware than AlphaGo Lee, but it was also a new version of the neural net. From the publicly available facts, there's no way of knowing which of these things caused the improvement.

Secondly, we know what effect better hardware has on Stockfish's rating. Doubling processor speed adds about 60 Elo, doubling cores about 40 Elo. There are huge diminishing returns on brute-force search because the search space expands exponentially. It's certainly true that AlphaZero requires much more processing power than Stockfish does to play at a high level; it would get annihilated if both ran on a desktop computer. It does not follow that AlphaZero's advantage over Stockfish was simply processing power, nor that Stockfish would "beat AlphaZero easily" on similar top-end hardware. It can simply be the case that AlphaZero improves much more than Stockfish does as hardware gets better.
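To put rough numbers on those diminishing returns, here is a back-of-the-envelope sketch using the roughly 40 Elo per core doubling figure above; it is purely illustrative, not a measured benchmark.

Code:
# Back-of-the-envelope sketch using the ~40 Elo per doubling of cores figure
# quoted above; purely illustrative, not a measured benchmark.
import math

def rough_elo_gain(cores, elo_per_doubling=40):
    return elo_per_doubling * math.log2(cores)

for cores in (1, 2, 4, 8, 16, 32, 64):
    print(f"{cores:>2} cores: ~+{rough_elo_gain(cores):.0f} Elo over 1 core")
# 64 cores is six doublings, so only about +240 Elo despite 64x the hardware.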

It may also be the case that the ELO added by hardware improvement doesn't translate to much better performance against AlphaZero specifically. The kind of positions engines don't understand, such as closed positions where one side can't actually use their material advantage, aren't solved by adding additional processing power. It may be that the positional weakness in Stockfish's game which AlphaZero exploits is precisely the aspect of its game that improves least with additional processing power.

The guy writing this blog has absolutely no idea how AlphaZero works btw, from his previous post on the subject:

Quote:
This sounds rather easy in theory, but it’s not that easy to code. While Magriel could make the deliberate decision to play for certain points or use the cube in a certain way, AlphaZero modifies each player based on what? There is certain difference in style between Tal and Petrosian, but how do you formulate this in numbers? In other words, it’s not easy to describe a style in a formal language or as an object. Stockfish is much easier to configure, because you can just give weights to certain positional features and you can modify the value of pieces. I guess the solution to this problem is worth the 400 million dollars that Google paid for DeepMind in 2014.
Why take seriously the writings of someone on AlphaZero when he doesn't know the most basic things about how it functions? His implication that AlphaZero is just a more powerful version of Giraffe is also completely wrong, by the way. The only similarity between the two is that they both use machine learning. Giraffe was taught from human games and its neural nets were structured in a domain-specific way:

Quote:
The story, which described Lai's accomplishment as "a world first," explained the layers of Giraffe's neural network: "The first looks at the global state of the game, such as the number and type of pieces on each side, which side is to move, castling rights and so on. The second looks at piece-centric features such as the location of each piece on each side, while the final aspect is to map the squares that each piece attacks and defends."
AlphaZero learnt from scratch, playing itself, and its neural network was not specifically structured to handle chess. This is a good place to point out that the breakthrough here is not that Google made a chess engine which is stronger than Stockfish. The breakthrough is how it works and how it was done.

Your second article is in English without needing to be translated here. I'm not sure what in it I need to refute though.

Last edited by ChrisV; 12-16-2017 at 07:51 PM.
12-16-2017 , 08:26 PM
To expand a bit on what I mean about a Stockfish Elo increase maybe not mattering: in that position I linked upthread, it's clear that with enough processing power, Stockfish will eventually be able to find Bg5. Finding moves like that will improve its Elo. It's very clear that this is something brute force will be able to achieve.

But then there are things like this game (the one where Stockfish gets its queen trapped). By the time Stockfish realises something is wrong in that game, it's far, far too late. The consequences of AlphaZero's positional manoeuvring are not just a little beyond Stockfish's horizon, they are way beyond it. Additional brute-force processing may not help in that case. Like, maybe it will, but it's not clear to me that that's the case.
12-16-2017 , 09:36 PM
Shandrax gonna Shandrax. But if it leads to more excellent ChrisV posts then I'm all for it!

      