Alberta university Poker 'bot "solves" heads up limit hold 'em

02-04-2015 , 02:13 PM
It does not consider levels. It does not adjust. It doesn't even really pre-adjust. It's not really correct to say that it "learned" anything. It just searched through the space of all possible solutions. It chose a solution, calculated its EV against its nemesis, and then used a special algorithm to choose the next solution to try. That's really all.
02-04-2015 , 02:36 PM
Yeah, I get you now... I didn't realise the method it used was so alien to the human method.
02-04-2015 , 02:38 PM
Quote:
Originally Posted by RustyBrooks
It's not really correct to say that it "learned" anything. It just searched through the space of all possible solutions. It chose a solution, calculated its EV against its nemesis, and then used a special algorithm to choose the next solution to try.
I think using the terms learning and adjusting is correct because the algorithm, if I understand correctly, is choosing its next strategy based on weaknesses in its current strategy. In that sense it's learning from its mistakes and making adjustments, as opposed to just sequentially or randomly selecting a new strategy to try.
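For the curious, here's a minimal Python sketch of regret matching (the update at the heart of CFR-style algorithms like the one behind Cepheus), on rock-paper-scissors rather than poker. The seed regrets and the self-play loop are invented for illustration; this is not the Cepheus code.
Code:
import numpy as np

# PAYOFF[i][j] = row player's payoff for action i vs. action j
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def next_strategy(total_regret):
    """New strategy from positive regrets; uniform if none are positive."""
    pos = np.maximum(total_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1.0 / 3.0)

# Arbitrary nonzero seed so the self-play dynamics move (exactly uniform
# regrets are already the fixed point in this toy game).
total_regret = np.array([1.0, 0.5, 0.0])
avg_strategy = np.zeros(3)
for t in range(10_000):
    strategy = next_strategy(total_regret)
    action_values = PAYOFF @ strategy        # EV of each action vs. current play
    baseline = strategy @ action_values      # EV of the current strategy itself
    total_regret += action_values - baseline # weaknesses drive the next choice
    avg_strategy += strategy

print(avg_strategy / avg_strategy.sum())     # -> approximately [1/3, 1/3, 1/3]

The average of all the strategies it tried converges toward the equilibrium; no single iteration "knows" anything, which is why both "learning" and "searching" are defensible descriptions.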
02-04-2015 , 02:48 PM
Quote:
Originally Posted by NMcNasty
I think using the terms learning and adjusting is correct because the algorithm, if I understand correctly, is choosing its next strategy based on weaknesses in its current strategy. In that sense its learning from its mistakes and making adjustments, as opposed to just sequentially or randomly selecting a new strategy to try.
Learning is probably the correct term of art, but it's a terrible term to use when discussing it among non-specialists, because that word already has a particular meaning to regular people, and it's not the same meaning.

And really, the way we got to the solution is very immaterial when you're talking about what the solution "does" or what it's "like". Any of the ways you arrived at this solution would have the same result, and most of them don't do anything that could even euphemistically be called "learning".
02-04-2015 , 02:50 PM
I'll admit that there are two meanings of "adjusting" in play here though. Normally in poker "adjusting to our opponents" means trying to find the perfect counter-strategy (nemesis), but yeah, Cepheus isn't trying to do that.
02-04-2015 , 02:56 PM
Quote:
Originally Posted by nburch
It's pretty much a certainty that the best response to any current chess program would win.
Haven't they all lost at least one game? If so then it's a 100% certainty. (For that color piece, at least.)
02-04-2015 , 03:19 PM
Quote:
Originally Posted by RustyBrooks
Learning is probably the correct term of art, but it's a terrible term to use when discussing it among non-specialists, because that word already has a particular meaning to regular people, and it's not the same meaning.

And really, the way we got to the solution is very immaterial when you're talking about what the solution "does" or what it's "like". Any of the ways you arrived at this solution would have the same result, and most of them don't do anything that could even euphemistically be called "learning".
Disagree entirely about the semantics of learning; there is no widely agreed-upon meaning among either specialists or non-specialists. Regular Joes and top computer scientists might both say that Netflix "learns" what movies I like based on what movies I watch, in the same way Cepheus learns the next strategy to choose based on how its current strategy does against its nemesis. You could get into some philosophical debate about whether learning requires consciousness or something, but I think that's a little much for right now.

I agree that how we get to a solution is immaterial (though perhaps still interesting) once we get to a solution. If, however, we aren't at a solution, as we aren't now for HUFLHE, wondering whether we can get to the real solution with the current methodology is a legitimate question. Admitting that a perfect solution exists and can be acquired through a different methodology is a pretty important first step. People itt have alluded to the fact that this is the case, but so far it's been ignored by the Alberta team.
02-04-2015 , 07:12 PM
That's disgusting.

Why do you think people who have actually worked on this problem seem to think it's "solved enough" while apparently specialized forum-trolls don't?
02-04-2015 , 08:12 PM
Quote:
Originally Posted by skario
That's disgusting.

Why do you think people who have actually worked on this problem seem to think it's "solved enough" while apparently specialized forum-trolls don't?
Thread is going in circles and we're back to name-calling.
02-04-2015 , 11:52 PM
(I've introduced some new questions and comments regarding the CFR+ algorithm used by the Cepheus team.)

Quote:
Originally Posted by nburch
I suspect that it _is_ beating a dead horse, but....

You don't like the term "essentially solved" and think it's somehow misleading. We'll just have to disagree. I don't know how you use "essentially" but if I ask someone "are you done?" and their answer is "I'm essentially finished" I don't think they're 100% finished.
If someone said "I'm literally finished" I wouldn't think he or she was 100% done either, but I wouldn't expect to see it used that way in a scientific paper. (Maybe "essentially" means something a little different in Canada? I don't mean this as a joke, eh.)

Quote:
You don't like the idea of using a human lifetime of play. We'll just have to disagree.
...
You talk about chess, and statistical superiority over humans. This misses the point.... Conversely, if someone was to just consider actual human play for poker, HULH was done years ago.
If the point isn't about play against humans, then why use a human lifetime of play?

Quote:
We are aware there are a wide variety of variance reduction techniques. If I've misread what you wrote, you also misread what I wrote. Yes, it's quite true that you can subtract off the expected value of the bot against itself, and this is an unbiased estimator because the expected value of self-play when alternating seats is zero (otherwise you need an extra term). Regarding playing duplicate poker with Cepheus as a partner... If I get x_i on hand i playing as the small blind, and Cepheus gets y_i playing as the small blind with the same cards, you can use sum_i (x_i - y_i) to estimate your expected winnings. Or, if you were playing a duplicate match with Cepheus as partner, Cepheus-the-big-blind gets -y_i on hand i because the game is zero-sum, so your team estimate is also sum_i (x_i - y_i). But you know the expected value of Cepheus' self-play is 0, so that's also an estimate of your own value.
I'm going to use <> to denote expectation, and <y_i> for the expected value of Cepheus' self-play conditioned on the cards dealt on hand i. If you are trying to estimate <x - y>, and you know <y> and each <y_i>, which would have lower variance:
(1) sum_i (x_i - <y_i>)/n
or
(2) sum_i (x_i)/n - <y>?

If x_i and <y_i> are correlated then (1) will have lower variance.
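To make that concrete, here's a small Python sketch on synthetic data, where x_i is your per-hand result and cv_i stands in for <y_i>: a baseline correlated with x_i whose overall expectation is known (zero here). All the numbers are invented for illustration.
Code:
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
luck = rng.normal(0.0, 5.0, n)             # shared "card luck" per hand
x = 0.01 + luck + rng.normal(0.0, 2.0, n)  # your result: edge + luck + noise
cv = luck + rng.normal(0.0, 1.0, n)        # correlated baseline, known mean 0

est1 = (x - cv).mean()   # (1): subtract the correlated per-hand baseline
est2 = x.mean() - 0.0    # (2): subtract only the known overall expectation

print(f"(1) {est1:+.4f}, per-hand SD {np.std(x - cv):.2f}")
print(f"(2) {est2:+.4f}, per-hand SD {np.std(x):.2f}")

Both estimators are unbiased for the same 0.01 edge, but (1)'s per-hand SD comes out around 2.2 versus around 5.4 for (2), so (1) needs far fewer hands for the same confidence.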

Quote:
Hopefully, everyone is well aware that there is not (generally) just a single Nash equilibrium for a game.
No, but in the case of HULH and similar games, I think that if there are "multiple equilibria, they [would be] all similar except that some contained dominated strategy choices that were irrelevant against an optimal opponent." [Mathematics of Poker, pg 359. I'm not sure if they were referring to all typical poker games, or just ones they analyzed in their book.]

Quote:
Finding an arbitrary equilibrium that never makes a particular action x does NOT mean that action x is a mistake.
It does if the equilibrium does sometimes reach that action, and the particular action is negative EV against that equilibrium. (You can't gain the EV back by making a change somewhere else.)

Quote:
It just means not playing x is not a mistake. Given how CFR+ with no sampling works, and the choice to use the current strategy, there are no obvious places to prune low probability actions, preflop or otherwise: keep in mind that actions taken with positive probability had better expected value than the previous strategy, and the next-to-last strategy was already very good. If an action is such an obvious mistake, it wouldn't be playing it on the next iteration... It might also be worth noting that even with sampling CFR, using the average strategy (which retains the initial all-random-play strategy from the first iteration), throwing away low probability actions still INCREASES the exploitability, not decreases it. There will surely be some threshold value where this switches, but it's not at all obvious what that threshold is. Purification might seem like a good idea, but it isn't.
Maybe I don't understand CFR+ fully. If an action has negative EV against the previous strategy, but still positive regret, it will still be taken with positive probability in the new strategy, right? I know you didn’t save the intermediate strategies, but did you at least save the preflop strategies? Then you could see how the preflop strategies changed from one iteration to the next. (I'm guessing you would find a certain amount of oscillating actions, especially near the end where your exploitability graph is noisy. If you kept the average strategy of all iterations I think it would have a significantly reduced exploitability.)
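For reference, here's a tiny sketch of the bookkeeping difference as I understand it, with hypothetical numbers: CFR+ floors the accumulated regret-like values at zero after every update, so a bad action's stored regret is reset rather than driven negative.
Code:
import numpy as np

def cfr_update(cum_regret, instant_regret):
    # Plain CFR: regrets accumulate and can go (and stay) negative.
    return cum_regret + instant_regret

def cfr_plus_update(cum_regret, instant_regret):
    # CFR+: accumulated values are floored at zero after every update.
    return np.maximum(cum_regret + instant_regret, 0.0)

# Hypothetical: action 1 has a small positive stored regret but takes a
# large negative instantaneous regret this iteration.
stored = np.array([5.0, 0.2, 3.0])
instant = np.array([0.5, -2.0, 0.1])
print(cfr_update(stored, instant))       # [ 5.5 -1.8  3.1]
print(cfr_plus_update(stored, instant))  # [ 5.5  0.   3.1]

Either way action 1 gets probability 0 on the next iteration (strategies come from the positive part of the regrets), but under CFR+ it re-enters as soon as it accumulates any positive regret instead of first having to climb back from -1.8.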

If purification is so tough, then maybe it’s a good area to study.

Note that CFR+ does do a lot of purification (compared to CFR or fictitious play), just not of plays that are only slightly -EV.

Quote:
Just going ahead and assuming that the places where Cepheus makes actions with small non-zero probability are places where it is making a mistake is a theoretically un-justified leap.
Assuming that all those places are a mistake might be unjustified, but I'd bet that many of them are a mistake. I've tested CFR+ on a particular game I've been hired to work on, and it seems that close plays don't go away quickly because the strategy oscillates after around 800 iterations and does not converge further. (However, the average strategy over all iterations does continue to converge, but not as fast as another algorithm I'm using.) (The game I'm working on is poker-related, but currently does not involve betting, so I don't know how much it applies to HULH.) Looking at your graph of exploitability versus computation time, it seems you might have some of the same problems near the end of your simulation. Not storing the average strategy is "theoretically un-justified" and may have limited the convergence of your algorithm after the initial 1000 iterations or so. Did you save the preflop strategies, or do you have any other way of checking for oscillations?

Quote:
So... if you catch it folding the nuts you know there's a mistake, but you had better still include the probability of reaching that situation, and the probability that it actually does so -- there's a good chance it still doesn't show up in 65 million hands.
I don’t think any iteration of Cepheus/CFR+ would ever fold the stone cold nuts, or even fail to raise on the river (unless it can only be against the nuts).
Quote:
Finally, as far as the statistics go, I will admit to being confused as to exactly what you're suggesting. Yes, for a 95% confidence test there is a bit better than 50% chance that Cepheus would appear to be an equilibrium with 65M hands playing against a best response. Yes, there's ~5% chance of Cepheus being ahead in terms of money. And, yes, the rate of false negatives and the probability of being ahead in terms of money both go up with a better approximation. But you have now twice talked about having a 95% chance of identifying the proposed strategy as an equilibrium. Using a 95% confidence test, this is exactly and only the case for an exact strategy. Fine: you don't like >5% chance of winning (50%/5% type I/II error), or 5%/50% error, and you clearly don't like 5%/5% error. Do you have a particular type-I/type-II error level in mind that's achievable before outright solving the game?
I'm saying you should want X% chance that a test will incorrectly fail to reject the hypothesis (that your strategy is an equilibrium) with Y% confidence, and that X% should be close to Y%, not close to 50%. I did make a mistake suggesting that X could actually equal Y. The goal is to perform almost as well as a GTO strategy would (say, 95% as well as GTO against a 95% test, so about 90%). (You are correct that you can't perform equally as well without being GTO.)

Then, you can write "We define a game to be blah-blah-blah-solved if a human lifetime of play is _sufficiently_unlikely_ to statistically differentiate it from being solved at 95% confidence", although you still can't write "_unable_ to statistically differentiate it from being solved at 95% confidence." (You would still need to define "sufficiently unlikely".)
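To put rough numbers on that trade-off: if a strategy actually loses to its best response at a margin of m per hand, with per-hand standard deviation s, then a one-sided test at 95% confidence over n hands detects the deficit with probability Phi(m*sqrt(n)/s - 1.645). A sketch with made-up but plausible values (the per-hand SD in particular is a guess, not a measured number):
Code:
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # ~1.645

def detection_probability(m_per_hand, sd_per_hand, n_hands):
    """Chance a one-sided 95% test rejects 'this is an equilibrium'."""
    z = m_per_hand * sqrt(n_hands) / sd_per_hand
    return NormalDist().cdf(z - Z95)

# ~1 mbb/hand exploitability, guessed SD of 5 bb/hand, 65M hands:
print(detection_probability(0.001, 5.0, 65_000_000))  # ~0.49

With these particular guesses the test catches the deficit just under half the time, i.e., the strategy appears to be an equilibrium a bit better than half the time, in line with the 50%-ish figure discussed above.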
02-06-2015 , 12:09 PM
Quote:
Why do you think people who have actually worked on this problem seem to think it's "solved enough" while apparently specialized forum-trolls don't?
There is an incentive to claim it's solved enough if you are the one who worked on and published the solution.
02-06-2015 , 01:37 PM
Quote:
Originally Posted by Andy Bloch
Assuming that all those places are a mistake might be unjustified, but I'd bet that many of them are a mistake. I've tested CFR+ on a particular game I've been hired to work on, and it seems that close plays don't go away quickly because the strategy oscillates after around 800 iterations and does not converge further. (However, the average strategy over all iterations does continue to converge, but not as fast as another algorithm I'm using.) (The game I'm working on is poker-related, but currently does not involve betting, so I don't know how much it applies to HULH.) Looking at your graph of exploitability versus computation time, it seems you might have some of the same problems near the end of your simulation. Not storing the average strategy is "theoretically un-justified" and may have limited the convergence of your algorithm after the initial 1000 iterations or so. Did you save the preflop strategies, or do you have any other way of checking for oscillations?
Did you use linearly weighted averaging (weight = iteration number)? It's much faster than CFR-style constant-weight averaging. Note that to save time you can also delay it, that is, start averaging at iteration 800, for example.

We did compute and store a (linearly weighted) average strategy too, but the current strategy was so close that we were able to use it instead. If we were to continue the computation further, we could switch to it at any time in case the current strategy's convergence slowed down or stopped.
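For anyone implementing this, a minimal sketch of linearly weighted averaging with a delay; solve_step, a stand-in for one CFR+ iteration over the whole strategy, is hypothetical and not the team's actual code.
Code:
import numpy as np

def averaged_solve(solve_step, initial_strategy, iterations, delay=800):
    """Run `iterations` steps; linearly average the strategies after `delay`."""
    strategy = initial_strategy               # assumed to be a float ndarray
    weighted_sum = np.zeros_like(initial_strategy)
    total_weight = 0.0
    for t in range(1, iterations + 1):
        strategy = solve_step(strategy)
        if t > delay:
            w = float(t - delay)  # weight grows linearly once averaging starts
            weighted_sum += w * strategy
            total_weight += w
    return weighted_sum / total_weight if total_weight else strategy

Linear weights discount the early, low-quality iterates, and the delay skips them entirely, which is why this converges faster than constant-weight averaging.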
02-06-2015 , 01:48 PM
Here are graphs from my one-card poker demo, using an 800-iteration averaging delay.

02-07-2015 , 04:47 PM
Quote:
Here are graphs from my one-card poker demo, using an 800-iteration averaging delay.
This is curious behavior (not improving for many iterations and then making huge progress). I haven't seen that with real hold'em, or maybe I've just given up too early on some CFR variants displaying similar behavior at the beginning (usually progress stalling for many iterations at some threshold).
02-07-2015 , 08:11 PM
I think he's not starting to calculate an average until at least iteration 800, which is why they're all zeros before that. The dark orange is the averaged strategy; look at the blue lines for non-averaged CFR and CFR+.
02-08-2015 , 10:28 AM
Yes, there's no average strategy before iteration 800, so there should really be nothing in the graph before that. I changed it to use the current strategy (delayed by a half-iteration because of an implementation detail) to make it clearer. Delaying works in hold'em and all the other games we've tried it in, too.

(I also added linear weighting without a delay for comparison.)



The dark blue line is averaged (i.e. the usual vanilla) CFR. Non-averaged CFR does not work at all.
02-20-2015 , 03:14 PM
To everyone:

I am interested in anyone posting their results against the online Cepheus computer program. It is very aggressive, but its weakness appears to be that the program has patterns, which might be exploitable. I have played it now for many thousands of hands. I would ask any math expert out there: if a person plays 10,000 hands against Cepheus and is ahead, would that be enough hands to show this program is beatable?
02-20-2015 , 03:39 PM
No, 10,000 hands is probably not nearly enough, unless you have a very large win rate. I'd be surprised if 100k was enough.
02-20-2015 , 03:48 PM
200 runs of 100K hands each with a win rate of 1 bb/100, using SD = 20 bb/100 as estimated for LHE. Lots of the runs are still underwater after 100K hands.
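Here's a sketch of that simulation in Python, treating each 100-hand block as an independent normal draw (a simplifying assumption):
Code:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
runs, blocks = 200, 1000   # 1000 blocks of 100 hands = 100K hands per run
results = rng.normal(1.0, 20.0, (runs, blocks)).cumsum(axis=1)  # bb totals

plt.plot(np.arange(1, blocks + 1) * 100, results.T, lw=0.3, alpha=0.5)
plt.axhline(0, color="k")
plt.xlabel("hands"); plt.ylabel("cumulative big bets")
print(f"{(results[:, -1] < 0).mean():.0%} of runs underwater after 100K hands")
plt.show()

With a true 1 bb/100 win rate, roughly 5-6% of runs are still losing after 100K hands, and far more dip below zero at some point along the way.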

02-20-2015 , 06:12 PM
Quote:
Originally Posted by TimothyBauer
To everyone:

I am interested in anyone posting their results against the online Cepheus computer program. It is very aggressive, but its weakness appears to be that the program has patterns, which might be exploitable. I have played it now for many thousands of hands. I would ask any math expert out there: if a person plays 10,000 hands against Cepheus and is ahead, would that be enough hands to show this program is beatable?
Why do you fear change so much?
02-20-2015 , 06:45 PM
Quote:
Originally Posted by NewOldGuy
200 runs of 100K hands each with winrate 1bb/100, using SD = 20bb/100 estimated for LHE. Lots of the runs still underwater after 100K hands.

To NewOldGuy:

Thanks for the figures. Can you draw up a chart of Cepheus playing not an expert, but someone whom the computer should beat for 4 big bets per 100 hands? What is the probability this opponent would be ahead of Cepheus after 10,000, 25,000, and 100,000 hands? Thanks if you can do this.

Timothy Bauer
02-20-2015 , 07:18 PM
Quote:
Originally Posted by TimothyBauer
Thanks for the figures. Can you draw up a chart of Cepheus playing not an expert, but someone whom the computer should beat for 4 big bets per 100 hands? What is the probability this opponent would be ahead of Cepheus after 10,000, 25,000, and 100,000 hands? Thanks if you can do this.

Timothy Bauer
This is a straightforward math exercise to calculate the probability. No need to graph it.

If we assume an SD/100 of 20, then for 25,000 hands the SD is 20*sqrt(25000/100) = 316. We expect to be ahead by 250*4 = 1000, so breaking even or worse is 1000/316 = 3.2 standard deviations below expectation. Putting 3.2 into a Z-score calculator gives a probability of .000687, or about 1 chance in 1456.

For 100,000 hands the z-score is 4000/632 = 6.3, which works out to roughly 1 chance in 8 billion.

This should also show that your result will be much more reliable at 100K hands than at 25K. In other words, there is much less chance of getting lucky over 100K hands.
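A quick sketch to check the arithmetic under the same assumptions (blocks of 100 hands treated as normal draws; it uses unrounded z-scores, so the 25K figure lands near 1 in 1300 rather than the rounded 1 in 1456):
Code:
from math import sqrt
from statistics import NormalDist

def p_behind(hands, winrate_per_100=4.0, sd_per_100=20.0):
    """Chance of being breakeven or worse after `hands` hands."""
    blocks = hands / 100
    z = (winrate_per_100 * blocks) / (sd_per_100 * sqrt(blocks))
    return NormalDist().cdf(-z)

for hands in (10_000, 25_000, 100_000):
    p = p_behind(hands)
    print(f"{hands:>7} hands: p = {p:.3g} (about 1 in {1 / p:,.0f})")

At 10,000 hands the 4 bb/100 underdog is still ahead about 1 time in 44, which also answers the earlier question about whether 10,000 hands is enough.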

Last edited by NewOldGuy; 02-20-2015 at 07:46 PM.
02-20-2015 , 08:22 PM
Thank you very much. That was informative.
03-15-2015 , 03:10 AM
This is awful. Stop it, computer scientists, before you ruin poker!
05-27-2015 , 08:47 AM
Here's a new, more technical paper: http://webdocs.cs.ualberta.ca/~games...ai-cfrplus.pdf