Google's AlphaZero AI Learns Chess in Four Hours; Then Beats Stockfish 28-0-72

12-22-2017 , 01:40 PM
The cards would cover most eventualities & programmers could code most of what isn't explicitly mentioned, but there would be roughly one or two boards a set where some unusual bidding sequence occurs where it's normally "obvious" to top-level humans what a bid "must" mean, but that even the top computers struggle with. I'd like to see Google have a go at interpreting these unusual situations. One method would be to simply feed it a bunch of hands played by elite players & let it decode how to bid; however, AFAIK there are very few elite-level bridge players with standardised convention cards, so getting a big enough data set to "understand" the non-explicitly-specified bidding situations is tricky.
12-24-2017 , 04:32 AM
Would it be possible for the DeepMind folks to look at A0's "brain state" and reverse-engineer an explicit evaluation algorithm?

Would said algorithm be a bunch of tensor operations?
12-24-2017 , 07:51 AM
That's what AlphaZero is. An evaluation function that is a bunch of tensor operations. You can't turn it into something more understandable. One thing you can do is change subtle things about a position and see how the nodes in the network change in response. That gives you a little insight into what the network considers important, but in the case of a deep neural network like A0, not a ton of insight.
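Schematically, the probing idea looks something like this (the net and position encoding below are made-up stand-ins, nothing like the real A0 architecture - it's just to show the technique of perturbing an input and watching which activations move):

Code:
import numpy as np
import torch

# Stand-in for A0's evaluation net: any module mapping an encoded
# position to a scalar value. NOT the real architecture.
net = torch.nn.Sequential(
    torch.nn.Linear(768, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
)

def encode(position):
    # Hypothetical encoding: 12 piece planes x 64 squares, flattened.
    return torch.tensor(position.reshape(-1), dtype=torch.float32)

def probe(pos_a, pos_b):
    """Compare hidden activations for two nearly identical positions."""
    acts = {}
    def hook(name):
        def fn(module, inputs, output):
            acts.setdefault(name, []).append(output.detach())
        return fn
    handles = [m.register_forward_hook(hook(str(i)))
               for i, m in enumerate(net)]
    for pos in (pos_a, pos_b):
        net(encode(pos))
    for h in handles:
        h.remove()
    # Big activation deltas ~ features the net treats as important here.
    return {k: (v[1] - v[0]).abs().mean().item() for k, v in acts.items()}

base = np.zeros((12, 64)); base[0, 12] = 1    # a pawn on some square
tweak = np.zeros((12, 64)); tweak[0, 20] = 1  # same pawn, one push later
print(probe(base, tweak))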
12-24-2017 , 08:51 AM
No but I mean, now that A0 has learned its evaluation function, couldn't that function be used on a normal (CPU-based) computer? And the function itself would be fast (just that function, no search).
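Right - the trained weights are just numbers, and a single evaluation is one forward pass, which a CPU handles fine (slowly compared to a GPU, but you only need one position at a time). A sketch, assuming the weights had been exported somewhere (the file name is hypothetical):

Code:
import torch

# Hypothetical: the trained net exported as a TorchScript file.
# Loading and evaluating it needs no GPU and no tree search.
model = torch.jit.load("a0_chess_net.pt", map_location="cpu")
model.eval()

position = torch.zeros(1, 768)   # an encoded position (placeholder)
with torch.no_grad():
    value = model(position)      # one forward pass = one static eval
print(float(value))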
12-24-2017 , 09:31 AM
Crazy thought: if the rules of Chess can be represented by tensors, is it possible that Chess can be solved by math alone? I would think each tensor operation A0 does is analogous to something happening in a multidimensional geometric space, so one could think about Chess in a purely mathematical way without imagining pieces or a board.
12-25-2017 , 07:11 PM
I have no doubt that this program would crush bridge if it tried to
12-27-2017 , 03:20 PM
I built a supercomputer capable of arguing with Shandrax.
12-27-2017 , 05:03 PM
Quote:
Originally Posted by feedmykids2
I have no doubt that this program would crush bridge if it tried to
For those who don't know - ^^^ is a world class bridge player.

Justin, I'm interested in how you think it would "learn" bidding. I think it would need restrictions - e.g. I just don't think you could just tell it "Here are the rules of bridge, good luck!" & expect whatever came out the other side to "know" how to defend against Aspro, Asptro, Cappelletti etc. You might need to "teach it" SAYC, 2/1, Meckwell etc. (or give it a huge number of sample hands) and let it work out the optimal way to bid.

I think you'd also need to place restrictions on what A0's bids could mean (if you wanted it to compete against humans). The way A0 works at chess, from what I understand, is to develop a really, really complex algorithm to evaluate positions and prune from there. I believe the algorithm might be so complicated that humans would have trouble understanding it? In chess, that's not a problem. Compare it to bridge:

Let's say A0 determines that v. Meckwell (1)-2 should mean a weak two in hearts, or 5-5 in the minors with 5-7 points, or 3-3-3-4 with 10-11 or 16-17 points or ... a list of like seven different things ... when red v. white. When white v. red it means seven completely different things, and when white v. white it means something different which is again different from red v. red. And of course, it probably wouldn't evaluate anything as simple as "points". I'd absolutely love to see what that CC would look like (humans turned out to do really well working out HU LHE & chess - would A0's CC look anything like a human's?) but the CC it produced would surely be illegal by the standards of international bridge? Even a simple 1c opening might mean completely different things in the 64 combinations of total points/rubber/IMPs/MPs, opening position & vulnerability.

It would be impractical for a human pairing to prepare a defence to such a complex bidding system. So if you wanted to build an A0 to compete against humans you'd need to program it so that it could "explain" any bid in a way that was simple enough to meet the standards of, say, the Bermuda Bowl.
12-27-2017 , 05:19 PM
Quote:
Originally Posted by PartyGirlUK
I'm interested in how you think it would "learn" bidding. I think it would need restrictions - e.g. I just don't think you could just tell it "Here are the rules of bridge, good luck!" & expect whatever came out the other side to "know" how to defend against Aspro, Asptro, Cappelletti etc. You might need to "teach it" SAYC, 2/1, Meckwell etc. (or give it a huge number of sample hands) and let it work out the optimal way to bid.
I'm not a world-class (or neighborhood-class even) bridge player but if I understand the algorithm properly I would guess that it's only really a question of how long it would take to learn via self-training. I don't see any reason why the general approach AlphaGoZero uses wouldn't work. It may need a comparatively much larger or smaller number of training games though, I'm not familiar enough with bridge to know. But in effect "a huge number of sample hands" (played against itself) is exactly what it does.
12-28-2017 , 03:47 AM
Forcing A0 to derive bidding conventions on its own is an interesting intellectual exercise, but completely pointless.

That's not really how anyone learns them. They are learned by someone telling you what they are. So if someone had a goal to build a computer to play bridge (either A0-like or stockfish-like) as well as possible, they're going to put them in there. Leaving them out is a restriction that would serve no practical purpose. Why on earth would someone try to build a bot to crush bridge and leave them out?

But it would definitely be interesting to see what happened if you left them out.
12-28-2017 , 03:58 AM
It's entirely possible that the GTO bidding conventions are completely different from the conventions that have evolved. I believe something like this happened in backgammon where computers were able to show that certain conventional opening strategies were inferior. I don't know a ton about bridge but I don't believe that experts believe bidding to be close to solved by the human analysis that's gone into it so far.

The main point of the AlphaZero project was that the program achieved superhuman play without any human input besides the rules of the game. If you were designing a bot to learn to play bridge with a human partner then it would make sense to code it with well-known human conventions, but if the bot is designed to play only with another copy of itself as a partner there would be no need for it to be restricted to only the known human-used conventions.
12-28-2017 , 04:23 AM
Human bidding systems are extremely inefficient. An AI bidding system would be vastly more complex.

I'm pretty sure though that it would be impossible to generate a bidding system with a reinforcement learning approach. Reinforcement learning relies on there being a "right answer" - for character recognition, to train the neural network you have to know what all the letters actually are. What's the "right answer" in bidding? If you reach the wrong contract, then without some existing information on what bids mean, it's not possible to tell where the bidding went wrong or how it should be improved.
12-28-2017 , 06:13 PM
Quote:
Originally Posted by ballin4life
It's entirely possible that the GTO bidding conventions are completely different from the conventions that have evolved. I believe something like this happened in backgammon where computers were able to show that certain conventional opening strategies were inferior. I don't know a ton about bridge but I don't believe that experts believe bidding to be close to solved by the human analysis that's gone into it so far.

The main point of the AlphaZero project was that the program achieved superhuman play without any human input besides the rules of the game. If you were designing a bot to learn to play bridge with a human partner then it would make sense to code it with well-known human conventions, but if the bot is designed to play only with another copy of itself as a partner there would be no need for it to be restricted to only the known human-used conventions.
Just because you tell A0 what they are doesn't mean it has to use them. If it discovers improvements then it can use those. I guess the only problem would be if somehow making it aware of existing conventions hindered the development of more efficient ones, but that seems unlikely.
12-28-2017 , 06:20 PM
Quote:
Originally Posted by ChrisV
I'm pretty sure though that it would be impossible to generate a bidding system with a reinforcement learning approach. Reinforcement learning relies on there being a "right answer", like for character recognition, to train the neural network you have to know what all the letters actually are. What's the "right answer" in bidding?
I am ignorant enough that it's possible you're right here and I don't see it yet, but I'm not convinced this is true from what I understand about the rules of bridge. To me it sort of sounds like saying that reinforcement learning doesn't work for chess because you can't tell immediately after making a move whether or not the move was good, since it depends upon the playing of the rest of the game. It seems to me that the feedback loop is closed in the learning stage based on the scoring of each hand, which is defined in the rules.
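To make that concrete, here's a toy I sketched out - nothing like real bridge or the real A0 training, just the shape of the feedback loop: two copies of the same tables have to agree on a code, and the only training signal is the end-of-hand score.

Code:
import random

# Toy stand-in for bidding: partner holds a hidden suit (0-3), sends
# one "bid", and we must name that suit. Reward arrives only at the
# end, like duplicate scoring - no per-decision "right answer".
N = 4
signal_pref = [[1.0] * N for _ in range(N)]  # hidden suit -> bid prefs
call_pref = [[1.0] * N for _ in range(N)]    # bid -> final call prefs

def sample(prefs):
    r = random.random() * sum(prefs)
    for i, p in enumerate(prefs):
        r -= p
        if r < 0:
            return i
    return len(prefs) - 1

for _ in range(20000):
    suit = random.randrange(N)               # partner's hidden holding
    bid = sample(signal_pref[suit])          # partner "bids"
    call = sample(call_pref[bid])            # we place the "contract"
    reward = 1.0 if call == suit else -1.0   # the hand's score
    # nudge the taken actions toward/away based only on the outcome
    signal_pref[suit][bid] = max(0.01, signal_pref[suit][bid] + 0.1 * reward)
    call_pref[bid][call] = max(0.01, call_pref[bid][call] + 0.1 * reward)

# The two copies usually converge on an arbitrary but shared code.
print([max(range(N), key=lambda b: signal_pref[s][b]) for s in range(N)])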
12-28-2017 , 07:26 PM
Quote:
Originally Posted by well named
I am ignorant enough that it's possible you're right here and I don't see it yet, but I'm not convinced this is true from what I understand about the rules of bridge. To me it sort of sounds like saying that reinforcement learning doesn't work for chess because you can't tell immediately after making a move whether or not the move was good, since it depends upon the playing of the rest of the game. It seems to me that the feedback loop is closed in the learning stage based on the scoring of each hand, which is defined in the rules.
By far the major consideration in whether a bid is "right" or not is that it is understood. Whether or not 2C is a good opening bid with a strong hand depends simply on whether my partner understands what it means, basically to the exclusion of all other factors. During training, the network would latch on to any arbitrary internal consistency and prefer that as the "correct" answer to the exclusion of any other consideration. Any later variation would be reinforced against, since the penalty for bidding misunderstanding is so large and the benefits of incremental improvements in bid meanings are so marginal. You would end up with an internally consistent but inefficient bidding system which can't be fundamentally changed because varying from the established system is always the "wrong answer".

For a real world example of what I'm talking about, look at natural language. Completely arbitrary, complex systems which are inefficient and do not trend towards efficiency over time. The bidding system you would get out of reinforcement learning would be similar.
12-28-2017 , 07:31 PM
Quote:
Originally Posted by ChrisV
By far the major consideration in whether a bid is "right" or not is that it is understood.
I understand. I should have said, I've been assuming that AGZ would always play with itself as a partner (a copy, rather; same trained algorithm, no knowledge it shouldn't have), and wouldn't work as a partner for anyone else. So it wouldn't have any trouble understanding in that sense.
12-28-2017 , 08:10 PM
Right, but it's still going to be stuck in its original arbitrary choices. If through random chance it develops an understanding that opening 1H means you have spades, it's going to build an entire bidding structure around that and it will become impossible to change it. If you imagine a 2D graph of bidding efficiency, with local minima and maxima, it's going to find some local maximum or other, but getting from there to a better local maximum won't be possible because it will involve traversing a region of less efficiency.
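Toy version of that graph: greedy incremental improvement climbs whichever peak is nearest and stops, because every step toward the higher peak makes things worse first.

Code:
# Two peaks: a nearer, lower one and a farther, higher one. A learner
# that only accepts incremental improvements gets stuck on the first.
def efficiency(x):
    return max(0.0, 5 - abs(x - 3)) + max(0.0, 8 - 2 * abs(x - 12))

x = 0.0
while efficiency(x + 0.1) > efficiency(x):   # only accept improvements
    x += 0.1
# Stuck near x=3 (value 5); never reaches x=12 (value 8).
print(round(x, 1), round(efficiency(x), 2))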

Competitive bidding would also be a huge problem. If A0 plays against itself as opponents as well, it will learn how to cope with one system of opposition bidding and nothing else. In fact the whole system of bridge, in which opponents disclose their agreements, is kind of impossible for A0. How do you "disclose agreements" to a set of tensor functions that is already out of its learning phase and into evaluation phase?
12-28-2017 , 08:12 PM
It is not my expectation that AGZ will attach any "meaning" to its choices in the way you are thinking, nor be stuck with anything. It's my expectation that it will try a great deal of randomly chosen options and eventually converge on something that looks like a system, similar to the way it "discovered" various chess openings.
12-28-2017 , 08:28 PM
You'd have to "teach" A0 conventions in order for it to develop any sort of bidding system that could compete with humans.

Bidding is really complicated. When you change the definition of a bid, you by necessity change the range of other bids too. You have to consider how your opponents would react, how it would affect the lead, the play of the hand etc. It just boggles the mind. I'd love to see what A0 came up with.

Q. Let's say you gave A0 some sort of standard SAYC or 2/1 system. E-W had to play that system, N-S were free to develop their own system. You gave A0 a week to do what it could. What do you think A0's system would look like? Would it be symmetric-relayesque? Totally alien? I'm not sure about much, but I'm confident of the following:

i) A0's system would use takeout doubles.
ii) A0's system would use preempts.

Not much else I'm confident on. Would it invent a Blackwoodesque convention? It might. It might not. Would a 1N opener show a balanced hand with a narrowly defined point range? I feel like it would. But maybe it wouldn't. If 1N denoted a balanced hand, would A0 develop Staymanesque & Transferesque conventions? I think there's a good chance it would - those conventions make so much sense there.
12-28-2017 , 08:58 PM
Quote:
Originally Posted by well named
It is not my expectation that AGZ will attach any "meaning" to its choices in the way you are thinking, nor be stuck with anything. It's my expectation that it will try a great deal of randomly chosen options and eventually converge on something that looks like a system, similar to the way it "discovered" various chess openings.
Yes, but what it converges on will be a local maximum, a system based on its initial consensus about what to do. When they have a choice, neural networks will choose the easy path over the best path, and they need to be able to improve incrementally to go from good to better.

As an example, I would guess one of the first things A0 would learn to do in bidding is to just open 4S with a hand with, say, 17+ points and 6+ spades, maybe even 5+ spades. That's usually the contract you want there, and bypassing partner is going to be heavily rewarded, because partner doesn't know how to bid. Then what? How do we make incremental improvement from there? Anytime it tries something other than 4S with those hands, results are immediately worse and that is rejected as an alternative. To make matters worse, its partner A0 is busily learning to bid based on the idea that 4S means 17+, 6+ spades, thus making that even more entrenched as the "correct answer" and even less likely to be changed in future.
12-29-2017 , 09:38 AM
Quote:
Originally Posted by ChrisV
Yes, but what it converges on will be a local maximum, a system based on its initial consensus about what to do. When they have a choice, neural networks will choose the easy path over the best path, and they need to be able to improve incrementally to go from good to better.

As an example, I would guess one of the first things A0 would learn to do in bidding is to just open 4S with a hand with, say, 17+ points and 6+ spades, maybe even 5+ spades. That's usually the contract you want there, and bypassing partner is going to be heavily rewarded, because partner doesn't know how to bid. Then what? How do we make incremental improvement from there? Anytime it tries something other than 4S with those hands, results are immediately worse and that is rejected as an alternative. To make matters worse, its partner A0 is busily learning to bid based on the idea that 4S means 17+, 6+ spades, thus making that even more entrenched as the "correct answer" and even less likely to be changed in future.
What I do not understand is why it is the case that the training is limited by its "initial consensus", or even what that refers to precisely. Remember that it randomizes choices during learning. To me, your example sounds like saying that it won't learn chess well because it will first start playing 1. e4 and thus get locked into never trying other first moves, or that it will get locked into playing the french defense and only discover the best possible subvariations. Could you explain what makes bridge different from chess in this respect? It's also not clear to me why the property of neural nets "choosing the easy path" (I'm not sure that this is actually true, but assuming it is) would be more of a problem in bridge than in chess, either.

My understanding of AGZ's learning algorithm is that how well it works probably comes down to three things:

1) There must be a well defined set of options to choose from at each decision point.

2) There must be a way to close the feedback loop on the outcome associated with a given set of decisions.

3) Properties of the search space (e.g. incomplete information, or just its sheer size) make the learning more or less efficient - e.g. it took less time to reach a high level at chess than at go. I imagine there might be games where the algorithm becomes so inefficient that learning is infeasible.

I believe that bridge satisfies both (1) and (2), but it's not clear to me what challenges might lie in (3). I feel like the conversation about bidding conventions probably misunderstands the way the algorithm would actually learn to play, given the restriction that it could only partner itself, but maybe I'm missing something.
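For what it's worth, (1) and (2) only ask for a pretty minimal interface from the game - something like this (hypothetical sketch, names made up):

Code:
from typing import List, Protocol

class Game(Protocol):
    """The minimum self-play training needs from a game."""
    def legal_actions(self) -> List[int]: ...    # (1) options at each point
    def apply(self, action: int) -> "Game": ...
    def is_terminal(self) -> bool: ...
    def score(self) -> float: ...                # (2) closes the feedback loop

# Bridge seems to fit: legal_actions() enumerates calls/cards, score()
# is the duplicate result once the hand is over. (3) is about how big
# and noisy the space between those two calls turns out to be.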

edit: one more important point. I'm assuming that both copies of A0 playing together had the exact same training. There was only one training run, so no possibility of divergence in the sense you write here.

Quote:
To make matters worse, its partner A0 is busily learning to bid based on the idea that 4S means 17+
12-29-2017 , 10:44 AM
For chess, it's much easier to see if a certain type of move you're making is a mistake. If A0 loses a match it made a mistake along the way*. And each move happens in a vacuum - there's no "language" element.

e.g. let's say A0 thinks it's not bluffing enough on the river. It bluffs more - but this also affects how often it should value bet, how often it gets paid off, how it bets on earlier streets and so on. In chess, that's much less of a factor. Sure, once it improves its endgame play it will play differently in the middlegame. But up to a point those changes are incremental. It's like evolution. First the endgame is slightly improved. Then the middlegame tweaks. And this alters the openings. But it's also extremely incremental. If you ran A0 1,000 times with different random seeds, you'd find after x matches it might like 1. e4 one time, 1. Na3 another time, it might value bishops more than rooks one time and so on, but the incremental nature of progress would imo guarantee the same end product every time.

Bridge is less incremental, it seems. It's harder (imo) to imagine "evolving" from ACOL to Polish Club.

In ChrisV's terms: imagine A0 reaches the local maximum (in bridge). How does it get to the global maximum? Any incremental change would be harmful. Reaching such a dead end in chess is harder to envisage.

One thought: what if you played "Pure A0" (PA0) against an A0 that had been programmed to play 2/1? PA0 could play a large number of hands of duplicate, see what sort of boards it was winning or not winning on, and adjust. Another factor making bridge harder is the variance - a poor result in one game of chess is much more instructive than in one board of bridge.

To my mind, getting A0 to develop a bidding system from scratch is so difficult that it would be smarter for now to focus on cardplay. You could set it to work on minibridge. Or automate the bidding via the algorithm of something like WBridge5 and set A0 to work on cardplay, so it would have to learn how to lead, inferences from the bidding etc. That would be fascinating. An incomplete-information partnership game is a quite different challenge from anything else AI has mastered, AFAIK.

A0 would have to teach itself how to signal, under what circumstances signals mean different things, when to falsecard, the difference between single and double dummy analysis and so on. I'd love to see what it came up with.


*Excluding examples where it lost in 100 moves as Black. But even then...
12-29-2017 , 11:27 AM
I understand what the term "local maximum" means generically, but it's not clear to me how it's more likely to suffer that problem learning bridge than chess. I think part of the difference in opinion involves the idea of there being a "language element" to bridge in the bidding. I'm suggesting that I don't think A0 will have any such element, given the restriction that it only partners an exact duplicate of its trained neural net. I could be wrong about the feasibility of that idea of course, in which case I would agree it's a significant difference and a real challenge.

Bridge being less incremental seems likely to be true, and that seems like a factor which would appear under my (3), something that makes its learning process less efficient. I don't know whether that would make it completely infeasible for A0 to learn bridge, or just make it require a larger amount of resources and a longer training time. I don't know at what point there are diminishing returns on increased training time either.

Anyway, it seems like it's time to write an email to DeepMind suggesting they do bridge next, because now I'm super curious
12-29-2017 , 08:08 PM
Quote:
Originally Posted by well named
What I do not understand is why it is the case that the training is limited by its "initial consensus", or even what that refers to precisely. Remember that it randomizes choices during learning.
No, it starts its evaluation function with random weights, which will produce random play, but the evaluation function is updated as it learns. It wouldn't be possible to learn to play like a grandmaster making random moves, because the outcome from most positions will be near-random doing that. I would assume it sometimes selects second- or third-best moves during training to avoid just playing the same game over and over, but I'm not sure about that. It definitely selects what it thinks are good moves during training though.
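For what it's worth, my reading of the AlphaGo Zero paper is that during self-play training it picks moves in proportion to the MCTS visit counts (with a temperature, plus some noise at the root), rather than always the top choice. Schematically:

Code:
import numpy as np

def select_move(visit_counts, temperature=1.0):
    """Sample a move in proportion to MCTS visit counts.

    temperature=1.0 (used early in training games) keeps exploring
    decent alternatives; temperature -> 0 is greedy argmax play.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature == 0:
        probs = (counts == counts.max()).astype(np.float64)
    else:
        probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

print(select_move([400, 100, 25, 5]))        # usually 0, sometimes not
print(select_move([400, 100, 25, 5], 0.0))   # always 0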

Quote:
Originally Posted by well named
To me, your example sounds like saying that it won't learn chess well because it will first start playing 1. e4 and thus get locked into never trying other first moves, or that it will get locked into playing the french defense and only discover the best possible subvariations. Could you explain what makes bridge different from chess in this respect? It's also not clear to me why the property of neural nets "choosing the easy path" (I'm not sure that this is actually true, but assuming it is) would be more of a problem in bridge than in chess, either.
In chess, good moves are good moves even if play thereafter is suboptimal. To give a simple example, winning your opponent's queen is highly advantageous even if you play poorly after that. The exception is long forcing sequences, which I am surprised are not more of a problem for A0. However, those are special cases which A0 can tackle after it has already become proficient at chess.

In bridge bidding, 1S is a better opening bid with a strong hand and spades than 4S, but only with highly specific followups. 4S only requires partner to pass to be a good bid, whereas almost every sequence after 1S is a disaster. Therefore, if A0 tries to do a Monte-Carlo evaluation of the subtree after 1S and 4S, it will conclude that 1S sucks.

By the "easy path" I mean that neural networks will select for whatever gives the largest initial gains, rather than routes that will ultimately produce the best outcomes. Imagine I have to partner someone at bridge and they know absolutely nothing about either bridge or playing cards; they don't know how the scoring works, can't read numbers and don't know what the suits are. In the long term, it would be better to fix this problem, teach them how the game works, etc. But if I have two seconds to confer with partner before we have to play, what will give the best results is to say "look, just pass: I'll just attempt to place the contract in one bid". Since having a completely hopeless partner is precisely the situation A0 is in when it begins training, what it will learn to do is bypass partner in this way. That's the "easy path". In chess, the things that give the best initial results for beginners (don't hang your queen, don't get mated) are also generally true in master play, so there isn't this distinction between the easy and hard paths.

Quote:
Originally Posted by well named
3) Properties of the search space (e.g. incomplete information, or just the sheer size) make it more or less efficient, i.e. in that it spent less time learning chess to get to a high level than it spent learning go. I imagine there might be games where the algorithm becomes so inefficient that learning becomes infeasible.
I don't think that has anything to do with the search space; it has to do with how strong it is possible to become at simply looking at a position and evaluating it, without tree searching at all, since position evaluation is what A0 is training to do. This is more feasible in Go than in chess; one reason is that single moves in chess can transform the board more radically than in Go. A0 plays at the level of a human professional in Go without doing any tree search at all. I would strongly doubt that is possible in chess.

Quote:
Originally Posted by well named
I understand what the term "local maximum" means generically, but it's not clear to me how it's more likely to suffer that problem learning bridge than chess. I think part of the difference in opinion involves the idea of there being a "language element" to bridge in the bidding. I'm suggesting that I don't think A0 will have any such element, given the restriction that it only partners an exact duplicate of its trained neural net.
The purpose of constructive bidding (other than attempts to place the contract) is to relay information about one's hand, or seek such information from partner. If there's no "language element" in any sense, then no information can be passed and bidding other than to place the contract is pointless.
12-29-2017 , 08:56 PM
I'm under the impression A0 learns chess like this. Someone please correct me if I am wrong.

It starts only knowing the rules of the game & having powerful computing. This is enough to search for mate in (say) 20. If it finds one, it claims victory; if it doesn't, it plays a random move, as long as that move doesn't lead to mate in 20 for the opponent. It plays (say) 1,000 games using this formula, and then does statistical analysis of the end positions where mate in 20 was claimed. A simple regression on the number of pieces left on the board when mate was claimed shows (say) an extra pawn, ceteris paribus, increases a player's chances of victory by 5%. An extra bishop increases it by 20%, etc. So A0 teaches itself that a bishop is worth four times a pawn. Similar estimates are made about the value of other pieces.

It then plays another 1,000 games using the mate-in-20 rule. But if it's faced with a choice of moves, it normally chooses the one that maximises its material edge, according to its calculations from the first 1,000 games. But in order to avoid playing the same game every time, it will select random moves now and then. When it analyses these 1,000 games it updates its value of a bishop to, say, 3.5 pawns. It also notes that players who control the centre of the board early on tend to do well, and doubled pawns tend not to do so well. So, it updates its formula.
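i.e. something like this kind of regression (made-up numbers, and to be clear this is my guess at the scheme, not anything from the paper):

Code:
import numpy as np

# Toy version of the idea: regress game results on material counts
# from self-play games to back out approximate piece values.
rng = np.random.default_rng(0)
n_games = 1000

# Columns: pawn/knight/bishop/rook/queen count difference (W - B)
material = rng.integers(-3, 4, size=(n_games, 5)).astype(float)
true_values = np.array([1.0, 3.0, 3.2, 5.0, 9.0])   # hidden "truth"
# Outcomes: material edge plus a lot of noise from sloppy play
results = material @ true_values + rng.normal(0, 6, n_games)

# Least squares recovers rough piece values despite the noise
learned, *_ = np.linalg.lstsq(material, results, rcond=None)
print(dict(zip("PNBRQ", learned.round(2))))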

By this point, which is probably only a few minutes on the supercomputer, A0 is already a very strong chess player. Maybe 2000 Elo? It never misses a checkmate, is tactically excellent, and has a good idea about the relative value of pieces as well as basic positional concepts. But it sucks at openings & its positional ability is far below that of elite humans.

The first two 1,000-game runs were played at such a low standard that subtle positional concepts were lost in the statistical noise. There was enough data to conclude a rook is worth more than a bishop, but not that 1. e4 is better than 1. Na3 (A0 at this point would select each of the 20 first moves with probability 5%. It has 100 poorly played games with each opening. It is unlikely e4 would show up as better than Na3 with statistical significance). But now A0 plays pretty decently. Over the next 1,000 games it's able to notice that certain openings are better than others. That keeping your pieces mobile is a good thing. And so on.

After 5,000 games A0 is around 2300 Elo. A GM would get an edge in the opening and grind out a victory. A FIDE Master would get an edge in the opening, but his or her higher rate of inaccuracies would make A0 a competitive matchup. It also teaches itself how to prune searches. Looking 20 moves ahead each move means looking 1 move, 2 moves etc. ahead first. It knows that sometimes "this move looks really ****ty if you look 3 moves ahead, but it's the best move looking 20 moves ahead". But it also knows *under what conditions such sacrifice-type plays tend to occur*. So it teaches itself when to consider sacrificing a piece and when not to, and it's much better at recognising this (and pruning) than current engines. A0 continues to improve but at a slower rate. Eventually DeepMind stops running A0, either after a certain period of time or when it detects that A0's rate of improvement has dropped below some threshold.