Quote:
Originally Posted by well named
What I do not understand is why it is the case that the training is limited by its "initial consensus", or even what that refers to precisely. Remember that it randomizes choices during learning.
No, it starts with an evaluation function with random weights, which produces random play, but the evaluation function is updated as it learns. It couldn't learn to play like a grandmaster by making random moves, because under random play the outcome from most positions is itself near-random. I would assume it sometimes selects second- or third-best moves during training to avoid playing the same game over and over, but I'm not sure about that. It definitely selects what it thinks are good moves during training, though.
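For what it's worth, the usual AlphaZero-style compromise between "play the best move" and "vary your games" is to sample early moves in proportion to the search's visit counts rather than always taking the top choice. A minimal sketch of that selection step (the visit counts here are invented, and the details of A0's actual schedule are an assumption on my part):

```python
import random

def select_move(visit_counts, temperature=1.0):
    """Sample a move from MCTS visit counts.

    temperature=1.0 keeps some exploration (early in training games);
    temperature near 0 just picks the most-visited move.
    """
    moves = list(visit_counts)
    if temperature < 1e-6:
        return max(moves, key=lambda m: visit_counts[m])
    weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
    return random.choices(moves, weights=weights)[0]

# Hypothetical visit counts after a search from some position.
counts = {"e4": 700, "d4": 250, "c4": 50}
print(select_move(counts, temperature=0.0))  # deterministic: e4
```

So even during training, almost all of its moves are ones it currently thinks are good; it is never anywhere near uniformly random.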
Quote:
Originally Posted by well named
To me, your example sounds like saying that it won't learn chess well because it will first start playing 1. e4 and thus get locked into never trying other first moves, or that it will get locked into playing the french defense and only discover the best possible subvariations. Could you explain what makes bridge different from chess in this respect? It's also not clear to me why the property of neural nets "choosing the easy path" (I'm not sure that this is actually true, but assuming it is) would be more of a problem in bridge than in chess, either.
In chess, good moves are good moves even if play thereafter is suboptimal. To give a simple example, winning your opponent's queen is highly advantageous even if you play poorly after that. The exception is long forcing sequences, which I am surprised are not more of a problem for A0. However, those are special cases which A0 can tackle after it has already become proficient at chess.
In bridge bidding, 1S is a better opening bid than 4S with a strong hand and spades, but only with highly specific follow-ups. 4S only requires partner to pass to be a good bid, whereas almost every random continuation after 1S is a disaster. Therefore, if A0 tries to do a Monte-Carlo evaluation of the subtrees after 1S and 4S, it will conclude that 1S sucks.
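You can see the arithmetic of this with a toy rollout model. The numbers below are completely invented (not real bridge scoring): suppose that after 1S only one continuation in a hundred is the expert follow-up, while 4S just needs partner to pass for a decent result.

```python
import random

# Toy illustration (numbers invented): after 1S, only one specific
# follow-up sequence is excellent; every other continuation fails.
# After 4S, partner merely has to pass for a decent result.
def rollout_value(opening, rng):
    if opening == "4S":
        return 8            # partner passes; playable contract reached
    # opening == "1S": random continuations, only 1 in 100 of which
    # is the expert follow-up.
    return 10 if rng.random() < 0.01 else -5

def monte_carlo(opening, n=100_000, seed=0):
    rng = random.Random(seed)
    return sum(rollout_value(opening, rng) for _ in range(n)) / n

print(monte_carlo("1S"))  # roughly -4.85: looks terrible under random play
print(monte_carlo("4S"))  # 8.0: looks clearly better
```

Under random continuations, 1S averages out badly even though its best line is superior, so a rollout-based evaluation will steer the learner toward 4S.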
By the "easy path" I mean that neural networks will select for whatever gives the largest initial gains, rather than routes that will ultimately produce the best outcomes. Imagine I have to partner someone at bridge and they know absolutely nothing about either bridge or playing cards; they don't know how the scoring works, can't read numbers, and don't know what the suits are. In the long term, it would be better to fix this problem, teach them how the game works, etc. But if I have two seconds to confer with partner before we have to play, what will give the best results is to say "look, just pass: I'll just attempt to place the contract in one bid."

Since having a completely hopeless partner is precisely the situation A0 is in when it begins training, what it will learn to do is bypass partner in this way. That's the "easy path". In chess, the things that give the best initial results for beginners (don't hang your queen, don't get mated) are also generally true in master play, so there isn't this distinction between the easy and hard paths.
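The "easy path" problem is just a local maximum in disguise, and you can sketch it with a greedy hill climber on an invented payoff landscape: a small nearby bump (bypass partner, place the contract in one bid) and a much higher but distant peak (a real bidding system). Toy numbers only:

```python
def greedy_hill_climb(f, x0, step=1, n=100):
    """Follow the largest immediate gain; stop when no neighbor improves."""
    x = x0
    for _ in range(n):
        best = max((x - step, x, x + step), key=f)
        if f(best) <= f(x):
            break
        x = best
    return x

# Invented landscape: a small bump at x=2 ("bypass partner") and a far
# higher peak at x=10 ("real bidding system"); everywhere else the
# local gradient points toward the nearby bump.
def payoff(x):
    return {2: 3, 10: 50}.get(x, -abs(x - 2))

print(greedy_hill_climb(payoff, 0))  # -> 2: stuck on the nearby bump
```

The climber settles on the small bump because every single step toward the higher peak initially makes things worse, which is exactly the shape of the bridge-bidding problem as I see it.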
Quote:
Originally Posted by well named
3) Properties of the search space (e.g. incomplete information, or just the sheer size) make it more or less efficient, i.e. in that it spent less time learning chess to get to a high level than it spent learning go. I imagine there might be games where the algorithm becomes so inefficient that learning becomes infeasible.
I don't think that has anything to do with the search space; it has to do with how strong it is possible to become at simply looking at a position and evaluating it, without any tree search at all, since position evaluation is what A0 is training to do. This is more feasible in Go than in chess; one reason is that a single move in chess can transform the board more radically than in Go. A0 plays at the level of a human professional in Go without doing any tree search at all. I strongly doubt that is possible in chess.
Quote:
Originally Posted by well named
I understand what the term "local maximum" means generically, but it's not clear to me how it's more likely to suffer that problem learning bridge than chess. I think part of the difference in opinion involves the idea of there being a "language element" to bridge in the bidding. I'm suggesting that I don't think A0 will have any such element, given the restriction that it only partners an exact duplicate of its trained neural net.
The purpose of constructive bidding (other than attempts to place the contract) is to relay information about one's hand, or to seek such information from partner. If there's no "language element" in any sense, then no information can be passed, and any bidding other than placing the contract is pointless.