An interesting aspect of how A0 works that hasn't been explained ITT so far is the use of Monte-Carlo Tree Search (MCTS) in contrast to alpha-beta search. From the paper:
Quote:
MCTS and Alpha-Beta Search
For at least four decades the strongest computer chess programs have used alpha-beta search. AlphaZero uses a markedly different approach that averages over the position evaluations within a subtree, rather than computing the minimax evaluation of that subtree. However, chess programs using traditional MCTS were much weaker than alpha-beta search programs, while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions.
AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree. Using MCTS may allow AlphaZero to effectively combine its neural network representations with a powerful, domain-independent search.
Here's what this means translated into English:
A program like Stockfish seeks to build as close to a complete game tree as it can, evaluating the positions at the leaves of that tree with its rough evaluation function. It assumes that each player will make what it thinks is the best move at every juncture. The value of any immediate move is the evaluation of the position at the bottom of the tree when you follow the tree down, making the best move for each player in turn. Assuming Stockfish has evaluated all the moves, it will therefore correctly evaluate a move as good even if only one specific 15-move combination leads to a positive outcome and all the rest are disastrous. That's what the text means by "alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree". By "approximation errors" they mean evaluation error: the weakness of Stockfish is that its evaluation of the final position it considers could easily be badly wrong. Stockfish hopes that this evaluation lies so many moves ahead that the inaccuracy will not matter.
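To make the minimax idea concrete, here's a toy alpha-beta sketch in Python. This is just an illustration, not Stockfish's actual search (which adds move ordering, transposition tables, quiescence search, and much more); the `Node` class and its fields are invented for the example.

```python
class Node:
    """Toy game-tree node: a static evaluation plus child positions."""
    def __init__(self, value=0.0, children=()):
        self.value = value          # rough leaf evaluation
        self._children = list(children)

    def children(self):
        return self._children


def alphabeta(node, depth, alpha, beta, maximizing):
    """Return the minimax value of `node`, pruning branches that
    cannot affect the result (alpha-beta pruning)."""
    children = node.children()
    if depth == 0 or not children:
        return node.value           # leaf: trust the rough evaluation
    if maximizing:
        best = float("-inf")
        for child in children:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break               # opponent will avoid this line; prune
        return best
    else:
        best = float("inf")
        for child in children:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

The value backed up to the root is exactly the leaf evaluation reached by best play on both sides, so a single badly misjudged leaf on the principal line propagates straight to the root; that is the "approximation error" concern in the quote.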
AlphaZero does not do this. Its tree search is a Monte-Carlo evaluation: it makes what it thinks are "reasonable" moves for each player (perhaps not the best) and sees where that leads. I'm necessarily being vague here with "reasonable", because the details will have to wait for the full paper. It does that many times and averages the results. That's what the text means by "MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree". It's like rollouts at backgammon. That's all well and good, and accounts for the improved positional play of A0, but one would suspect that A0's weakness would be long forcing sequences. Instead, it appeared able to find tactical moves that Stockfish only finds after a long think. That's impressive.
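To make the averaging idea concrete, here's a toy MCTS sketch in Python. This is my own illustration, not AlphaZero's actual algorithm (which also uses a learned policy to guide selection); `MCTSNode`, `select`, and the noisy `evaluate` callback are all invented for the example. The point is the backup step: each node keeps a running average of the values seen below it, so a noisy evaluator's errors tend to cancel out, whereas a minimax backup would latch onto the largest one.

```python
import math

class MCTSNode:
    """Toy MCTS node: visit count and running total of backed-up values."""
    def __init__(self, children=()):
        self.children = list(children)
        self.visits = 0
        self.total_value = 0.0

    def mean_value(self):
        return self.total_value / self.visits if self.visits else 0.0


def select(node):
    """UCB1 selection: balance exploitation (mean value so far)
    against exploration (rarely visited children)."""
    return max(node.children,
               key=lambda c: c.mean_value() +
                   math.sqrt(2 * math.log(node.visits + 1) / (c.visits + 1)))


def simulate(root, evaluate):
    """One MCTS iteration: descend to a leaf, get a (noisy) value
    estimate there, and back it up as a running AVERAGE, not a minimax."""
    path = [root]
    while path[-1].children:
        path.append(select(path[-1]))
    value = evaluate(path[-1])      # noisy leaf estimate
    for node in path:               # averaging backup
        node.visits += 1
        node.total_value += value
    return value
```

With a noisy evaluator, the averaged value at each node converges toward the true value as visits accumulate, which is the error-cancellation effect the paper describes.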
That makes me wonder how strong AlphaZero would be on sheer evaluation alone, without any tree search at all. I have read that at Go, simply evaluating a position with no search whatsoever, it plays at the level of a human professional. I don't know for sure that's true, but it sounds right. I'd love to see how strong A0 would be at chess under those conditions.