(I've introduced some new questions and comments regarding the CFR+ algorithm used by the Cepheus team.)
Quote:
Originally Posted by nburch
I suspect that it _is_ beating a dead horse, but....
You don't like the term "essentially solved" and think it's somehow misleading. We'll just have to disagree. I don't know how you use "essentially" but if I ask someone "are you done?" and their answer is "I'm essentially finished" I don't think they're 100% finished.
If someone said "I'm literally finished" I wouldn't think he or she was 100% done either, but I wouldn't expect to see it used that way in a scientific paper. (Maybe "essentially" means something a little different in Canada? I don't mean this as a joke, eh.)
Quote:
You don't like the idea of using a human lifetime of play. We'll just have to disagree.
...
You talk about chess, and statistical superiority over humans. This misses the point.... Conversely, if someone was to just consider actual human play for poker, HULH was done years ago.
If the point isn't about play against humans, then why use a human lifetime of play?
Quote:
We are aware there are a wide variety of variance reduction techniques. If I've misread what you wrote, you also misread what I wrote. Yes, it's quite true that you can subtract off the expected value of the bot against itself, and this is an unbiased estimator because the expected value of self play when alternating seats is zero (otherwise you need an extra term.) Regarding playing duplicate poker with Cepheus as a partner... If I get x_i on hand i playing as the small blind, and Cepheus gets y_i playing as the small blind with the same cards, you can use sum_i x_i - y_i to estimate your expected winnings. Or, if you were playing a duplicate match with Cepheus as partner, Cepheus-the-big-blind gets -y_i on hand i because the game is zero-sum, so your team estimate is also sum_i x_i - y_i. But you know the expected value of Cepheus' self-play is 0, so that's also an estimate of your own value.
I'm going to use <> to denote expectation. If you are trying to estimate <x - y>, and you know both the overall expectation <y> and the per-hand expectation <y_i> (conditioned on the cards dealt on hand i), which estimator would have lower variance:
(1) sum_i(x_i - <y_i>)/n
or
(2) sum_i(x_i)/n - <y>?
If x_i and <y_i> are correlated then (1) will have lower variance (specifically, whenever 2 Cov(x_i, <y_i>) > Var(<y_i>)).
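To make this concrete, here's a toy simulation (all numbers are invented for illustration, not from any real match): subtracting the bot's known per-hand expectation <y_i> cancels the card luck shared by the duplicate hands, while subtracting the constant <y> does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_hands=5_000, n_trials=500):
    """Compare the two estimators of the hero's true edge (0.05 here)."""
    est1 = np.empty(n_trials)
    est2 = np.empty(n_trials)
    for t in range(n_trials):
        # card luck on hand i, shared by hero and the bot's self-play
        luck = rng.normal(0.0, 1.0, n_hands)
        # hero's per-hand result: small skill edge + luck + independent noise
        x = 0.05 + luck + rng.normal(0.0, 1.0, n_hands)
        # <y_i>: the bot's known expectation on hand i, correlated with x_i
        ybar = luck
        est1[t] = np.mean(x - ybar)   # estimator (1): per-hand control variate
        est2[t] = np.mean(x) - 0.0    # estimator (2): <y> = 0 in self-play
    return est1.std(), est2.std()

s1, s2 = simulate()
print(s1, s2)
```

Both estimators are unbiased for the 0.05 edge, but in this setup (1) should show roughly 1/sqrt(2) the standard deviation of (2), because the shared luck term cancels exactly.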
Quote:
Hopefully, everyone is well aware that there is not (generally) just a single Nash equilibrium for a game.
No, but in the case of HULH and similar games, I think that if there are "multiple equilibria, they [would be] all similar except that some contained dominated strategy choices that were irrelevant against an optimal opponent." [Mathematics of Poker, pg 359. I'm not sure if they were referring to all typical poker games, or just ones they analyzed in their book.]
Quote:
Finding an arbitrary equilibrium that never makes a particular action x does NOT mean that action x is a mistake.
It does if the equilibrium does sometimes reach that action, and the particular action is negative EV against that equilibrium. (You can't gain the EV back by making a change somewhere else.)
Quote:
It just means not playing x is not a mistake. Given how CFR+ with no sampling works, and the choice to use the current strategy, there are no obvious places to prune low probability actions, preflop or otherwise: keep in mind that actions taken with positive probability had better expected value than the previous strategy, and the next-to-last strategy was already very good. If an action is such an obvious mistake, it wouldn't be playing it on the next iteration... It might also be worth noting that even with sampling CFR, using the average strategy (which retains the initial all-random-play strategy from the first iteration), throwing away low probability actions still INCREASES the exploitability, not decreases it. There will surely be some threshold value where this switches, but it's not at all obvious what that threshold is. Purification might seem like a good idea, but it isn't.
Maybe I don't understand CFR+ fully. If an action has negative EV against the previous strategy but still has positive cumulative regret, it will still be taken with positive probability in the new strategy, right? I know you didn't save the intermediate strategies, but did you at least save the preflop strategies? Then you could see how the preflop strategy changed from one iteration to the next. (I'm guessing you would find a certain amount of oscillation in those actions, especially near the end, where your exploitability graph is noisy. If you had kept the average strategy over all iterations, I think it would have significantly lower exploitability.)
If purification is so tough, then maybe it’s a good area to study.
Note that CFR+ does do a lot of purification (compared to CFR or fictitious play), just not of plays that are only slightly -EV.
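For reference, my understanding of the CFR+ update at a single information set is the regret-matching+ rule sketched below (a simplification that ignores the tree structure); if this is right, an action whose cumulative regret is still positive keeps positive probability even on an iteration where it was -EV against the current strategy.

```python
import numpy as np

def regret_matching_plus(cum_regret, inst_regret):
    """One regret-matching+ update at a single information set.

    CFR+ clips accumulated regret at zero, but an action whose cumulative
    regret stays positive is still mixed into the next strategy, even if
    its instantaneous regret this iteration was negative.
    """
    cum_regret = np.maximum(cum_regret + inst_regret, 0.0)
    total = cum_regret.sum()
    if total > 0:
        strategy = cum_regret / total
    else:
        strategy = np.full(len(cum_regret), 1.0 / len(cum_regret))
    return cum_regret, strategy

# action 1 built up regret on earlier iterations...
cum = np.array([5.0, 3.0, 0.0])
# ...and is slightly -EV against the current strategy this iteration
inst = np.array([1.0, -0.5, -2.0])
cum, strat = regret_matching_plus(cum, inst)
print(strat)   # action 1 is still played with positive probability
```

Here action 1's cumulative regret drops from 3.0 to 2.5 but stays positive, so the new strategy still plays it with probability 2.5/8.5, which is the behavior my question above is about.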
Quote:
Just going ahead and assuming that the places where Cepheus makes actions with small non-zero probability are places where it is making a mistake is a theoretically un-justified leap.
Assuming that all of those places are mistakes might be unjustified, but I'd bet that many of them are. I've tested CFR+ on a particular game I've been hired to work on, and close plays don't go away quickly: the current strategy starts oscillating after around 800 iterations and does not converge further. (The average strategy over all iterations does continue to converge, though not as fast as another algorithm I'm using.) (The game is poker-related but currently does not involve betting, so I don't know how much of this carries over to HULH.) Looking at your graph of exploitability versus computation time, it seems you might have some of the same problems near the end of your run. Not storing the average strategy is also "theoretically un-justified" and may have limited the convergence of your algorithm after the first 1000 iterations or so. Did you save the preflop strategies, or do you have any other way of checking for oscillation?
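The oscillation check I have in mind could be as simple as the sketch below (toy data, purely illustrative): track the L1 swing between consecutive current strategies, and the running uniform average, which can keep converging even while the current strategy bounces around.

```python
import numpy as np

def oscillation_and_average(strategies):
    """Diagnostics on a sequence of per-iteration strategy vectors.

    Returns the L1 distance between consecutive current strategies
    (large, non-shrinking values suggest oscillation) and the running
    uniform average of the strategies seen so far.
    """
    strategies = np.asarray(strategies, dtype=float)
    swings = np.abs(np.diff(strategies, axis=0)).sum(axis=1)
    counts = np.arange(1, len(strategies) + 1)[:, None]
    averages = np.cumsum(strategies, axis=0) / counts
    return swings, averages

# toy example: the current strategy flips between two mixtures forever
strats = [[0.6, 0.4], [0.4, 0.6]] * 50
swings, avgs = oscillation_and_average(strats)
print(swings[-1])   # the swing never shrinks: no convergence
print(avgs[-1])     # the running average settles near [0.5, 0.5]
```

In this toy case the current strategy never converges (the per-iteration swing stays at 0.4), while the average strategy converges to the 50/50 mixture, which is the pattern I'd expect to see in saved preflop strategies if oscillation is happening.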
Quote:
So... if you catch it folding the nuts you know there's a mistake, but you had better still include the probability of reaching that situation, and the probability that it actually does so -- there's a good chance it still doesn't show up in 65 million hands.
I don't think any iteration of Cepheus/CFR+ would ever fold the stone-cold nuts, or even fail to raise with it on the river (unless a raise could only ever be called by another nut hand).
Quote:
Finally, as far as the statistics go, I will admit to being confused as to exactly what you're suggesting. Yes, for a 95% confidence test there is a bit better than 50% chance that Cepheus would appear to be an equilibrium with 65M hands playing against a best response. Yes, there's ~5% chance of Cepheus being ahead in terms of money. And, yes, the rate of false negatives and the probability of being ahead in terms of money both go up with a better approximation. But you have now twice talked about having a 95% chance of identifying the proposed strategy as an equilibrium. Using a 95% confidence test, this is exactly and only the case for an exact strategy. Fine: you don't like a >5% chance of winning (50%/5% type I/II error), or 5%/50% error, and you clearly don't like 5%/5% error. Do you have a particular type-I/type-II error level in mind that's achievable before outright solving the game?
I'm saying you should want an X% chance that a test will incorrectly fail to reject the hypothesis (that your strategy is an equilibrium) at Y% confidence, and that X% should be close to Y%, not close to 50%. I did make a mistake in suggesting that X could actually equal Y. The goal is to perform almost as well as a GTO strategy would (say, 95% as well as GTO against a 95% test, so about 90% overall). (You are correct that you can't perform equally well without being GTO.)
Then, you can write "We define a game to be blah-blah-blah-solved if a human lifetime of play is _sufficiently_unlikely_ to statistically differentiate it from being solved at 95% confidence", although you still can't write "_unable_ to statistically differentiate it from being solved at 95% confidence." (You would still need to define "sufficiently unlikely".)