WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

05-10-2015 , 04:10 PM
I guess I'll hop in here and answer a few questions and clarify a few points too.

First, the 95% threshold was not chosen arbitrarily. That's the standard generally used in science, and the standard used in the ACPC. I personally would have avoided the term "statistical tie" which does NOT mean the same thing as "tie", and would have instead opted for the equivalent term "not statistically significant", but I didn't write the press release.

Earlier I mentioned that the pros would have needed to win by 10.35 BB/100 for statistical significance. That figure does correctly account for the hands being mirrored. We knew going into this that reaching statistical significance would be tough, but it was definitely possible for one side or the other to reach that threshold. Based on experiments, we estimated beforehand that the required win rate would be ~8.5 BB/100. However, it's impossible to say what the threshold actually is until the competition ends, because it depends on how the pros and the bot play. The bot and humans ended up playing very aggressively, and very differently, which likely pushed up the variance.

Equity chopping did reduce the variance (there was some debate about whether this would actually be the case, since mirrored hands can play out very differently when there's an equity chop), but not by much. Without the equity chop, the needed threshold would have been 10.72 BB/100 (and, interestingly, the pros' overall win rate would have dropped to 7.0 BB/100).
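
For anyone who wants to check the arithmetic: the threshold is just the usual two-sided cutoff of 1.96 standard errors on the mean win rate. Here's a minimal sketch; the per-hand standard deviation is illustrative (back-solved from the threshold above), not our measured value:

Code:
import math

def significance_threshold_bb100(per_hand_sd_bb, num_hands, z=1.96):
    """Win rate (BB/100) needed for two-sided 95% significance,
    treating per-hand results as i.i.d. after variance reduction."""
    se_per_hand = per_hand_sd_bb / math.sqrt(num_hands)  # standard error of the mean, BB/hand
    return z * se_per_hand * 100                         # convert to BB per 100 hands

# A per-hand SD of ~14.9 BB over the ~80,000 hands played reproduces
# the 10.35 BB/100 threshold quoted above.
print(significance_threshold_bb100(14.9, 80_000))  # ~10.33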

I think anyone that looks at this objectively will recognize we did everything we could to achieve statistical significance: 4 humans, mirrored hands, equity chopping, and as many hands as the humans could possibly play. That said, we recognize there are things we could improve if there's a next time (more humans, multitabling, and maybe even more variance-reduction techniques).

200BB was chosen for this competition (and as the ACPC format) because it is more challenging for bots due to the much larger game tree. We trained new strategies for this competition, and we could easily have switched them to a 100BB format. We didn't pick 200BB to make this easy on ourselves, just as we didn't pick big-name poker players who are not so great at heads-up no-limit to make this easy on ourselves.

I'll also say that I think the humans had a stronger edge than the statistics reveal. This comes from personally watching the humans and the bot play over the past two weeks, and it can't be captured in the statistics, which just look at the scores for each hand. There are definitely weaknesses in the bot, and they are weaknesses that can't be fixed with just more memory or more cores. As a researcher, that's very exciting, because it means we need new approaches to address the unique challenges of no-limit games. The next couple of years will be very busy for me!

For those of you that are interested, here's the breakdown of the scores per session (usually 750-800 hands):
Spoiler:
05-10-2015 , 04:10 PM
Quote:
total amount wagered over the two weeks in theoretical dollars was $170 million
Gross cashes; it's not just for MTTs anymore.

Good job, humans. Between all the distractions and boredom it was a very impressive showing. Pretty disgusting how those involved on the other side are representing the results. Agree with others that if the results had been reversed the message would be a tad different. I'd suggest not having anything to do with them in the future.
05-10-2015 , 04:10 PM
Quote:
Originally Posted by watergun7
Doug, I know it's a strat question, but I hope you can answer since it's pretty simple (and doesn't apply to human play).

Why were you raising limps with almost 100% of your range? I thought Claudico should be fairly balanced with its limps (obviously you have the hand histories and can look through the database). Raising limps 100% is exploitative -- so, in your opinion, has Claudico adjusted over time to the 100% iso?
It played pretty weak vs the raise.
05-10-2015 , 04:16 PM
Quote:
Originally Posted by WCGRider
We got the hands after each day, for that day, and could import and review them. However, a real-time HUD would have been way better; this is the first time in easily 5-6 years I've had to play "blind".
Was there some kind of stipulation that you couldn't use a HUD, or was it just the fact that no such HUD exists that prevented you from using one?

If you guys play them again next year I feel like it's something that needs to be present (call me).
05-10-2015 , 04:20 PM
It wasn't going to work given the time constraints on getting everything together. Obviously a HUD would make this a lot better to play with.
05-10-2015 , 04:21 PM
Doug,

I had posted this before, but since I now see you are reading the thread I will do it again: You came across very well in the stream, both on your own behalf as a personable and upbeat person and as a representative of those that play poker for reasons over and above the gamble.

After watching all of your streams at one time or another, you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
05-10-2015 , 04:37 PM
Great response by Noam.
05-10-2015 , 05:34 PM
Quote:
Originally Posted by restorativejustice
Doug,

I had posted this before, but since I now see you are reading the thread I will do it again: You came across very well in the stream, both on your own behalf as a personable and upbeat person and as a representative of those that play poker for reasons over and above the gamble.

After watching all of your streams at one time or another, you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
Thanks, I appreciate the support.
05-10-2015 , 05:45 PM
Doug, I couldn't tell because you had glasses on, but were your eyes rolling when you tried to explain the 99 vs A4 aipf play to the professor?
05-10-2015 , 07:10 PM
Quote:
you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
+1 Huge respect Doug
05-10-2015 , 07:34 PM
Quote:
Originally Posted by NoamBrown
I think anyone that looks at this objectively will recognize we did everything we could to achieve statistical significance: 4 humans, mirrored hands, equity chopping, and as many hands as the humans could possibly play. That said, we recognize there are things we could improve if there's a next time (more humans, multitabling, and maybe even more variance-reduction techniques).
Noam and Sam,

As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?

Quote:
Originally Posted by Sam Ganzfried
For the flop minbet example, he bets 100 into a pot of 500, so x = 0.2. The closest actions we have are A = 0 (for check) and B = 0.25. Plugging these into the formula gives f(x) = 1/6 ≈ 0.167. This is the probability that we map his bet down to 0 and interpret it as a check. So we pick a random number in [0,1]; if it's above 1/6 we interpret the bet as 0.25 pot, and otherwise as a check.
Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?
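
(For anyone following along: Sam's numbers match the pseudo-harmonic action-translation mapping from the CMU papers, f(x) = (B - x)(1 + A) / ((B - A)(1 + x)). A quick sketch of my reading of that formula -- not CMU's actual code:)

Code:
import random

def map_down_probability(x, a, b):
    """Probability of translating an off-tree bet of x pot down to the
    nearest smaller abstraction size a rather than up to the larger b,
    per the pseudo-harmonic mapping f(x) = (b - x)(1 + a) / ((b - a)(1 + x))."""
    return (b - x) * (1 + a) / ((b - a) * (1 + x))

def translate(x, a, b):
    """Randomly map the bet to a or b using the weight above."""
    return a if random.random() < map_down_probability(x, a, b) else b

# Sam's flop example: 100 into 500, so x = 0.2, with a = 0 (check), b = 0.25.
print(map_down_probability(0.2, 0.0, 0.25))  # 0.1667 -> read as a check 1/6 of the time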

Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
05-10-2015 , 08:42 PM
Quote:
Originally Posted by jaytorr
Noam and Sam,

As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?



Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?

Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
G Bucks ITT
05-10-2015 , 08:45 PM
Quote:
Originally Posted by jaytorr
As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?
There are a lot of variance-reduction techniques we'd like to try, such as the one you describe. We decided not to use anything beyond duplicate/AIEV for the competition, since it would be impossible for anyone else to verify the accuracy of more sophisticated approaches that depend on looking at our own strategies.
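
(For readers wondering what the AIEV part does: once both players are all in before the last card, the pot is settled by exact all-in equity rather than by dealing the runout. A toy sketch -- the equity number would come from an exact enumeration of runouts, which isn't shown here:)

Code:
def aiev_settle(pot_bb, hero_equity):
    """All-in EV settlement: award each side its equity share of the pot
    instead of dealing the remaining cards, removing runout variance.
    hero_equity is assumed to come from exact enumeration (not shown)."""
    return pot_bb * hero_equity, pot_bb * (1.0 - hero_equity)

# E.g., AA vs KK all in preflop for a 400BB pot, equity ~0.82:
print(aiev_settle(400, 0.82))  # (328.0, 72.0) every single time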

Quote:
Originally Posted by jaytorr
Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?
I'd need to see the specific hand, but yes, it's likely the opponent bet a size below the lowest size in our abstraction and we happened to map it down to a check, and so weren't allowed to fold. We had a 10%-pot size for the opponent on the river during most of the match, which should have protected us pretty well against this problem. The hand you have in mind likely occurred when we didn't have the 10% size in.

Quote:
Originally Posted by jaytorr
Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
Claudico didn't use much "opponent modeling" per se, though I've worked on opponent modeling in the past for other projects (see AAMAS-11 and TEAC-15 papers on my website). We did make several modifications throughout the competition that were intended to plug some leaks, and possibly counter-exploit some of the humans' exploitative tendencies, and a couple times we used different versions against different humans.

The card abstraction and equilibrium-finding algorithms were the same as for Tartanian7: http://www.cs.cmu.edu/~sganzfri/Tartanian7_AAMAS15.pdf. We decided to switch to an "asymmetric betting abstraction," where we limited the number of bet sizes for ourselves for the first preflop action -- we just had fold, limp, and 2.5x (for Tartanian7 we also had 2x, 3x, and some other bigger ones) -- while keeping a lot of sizes for the opponent in case the humans decided to use them. This cut down the size of the game tree pretty significantly, and I think it led to better convergence.
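
Schematically, the asymmetry looks something like this (an illustrative data structure only; the opponent list is a placeholder, not Claudico's actual configuration beyond the three options named above):

Code:
# First preflop action: few sizes for ourselves, many kept for the opponent.
PREFLOP_FIRST_ACTION = {
    "self":     ["fold", "limp", "raise_2.5x"],
    "opponent": ["fold", "limp", "raise_2x", "raise_2.5x", "raise_3x",
                 "raise_4x", "all_in"],  # placeholder sizes, for illustration
}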

We also incorporated real-time computation for the river, which we didn't include in Tartanian7 due to some last-minute issues right before the computer competition deadline; I think it helped a lot. That approach is described in detail here: http://www.cs.cmu.edu/~sganzfri/Endgame_AAMAS15.pdf. Those were the main differences; there were also several more minor ones that maybe I'll describe later.
05-10-2015 , 09:13 PM
Can we please see the calculation for the standard error?
05-10-2015 , 10:38 PM
OK, about Claudico's bet-size bucketing: when creating the interface for the challenge, it seems pretty likely to me that the developers at least discussed which %-pot shortcuts to use as they related to Claudico's bet-sizing abstraction. Perhaps the developers even put in the %-pot shortcuts specifically to make it more likely that humans would use sizes present in Claudico's abstracted tree, which is in their interest -- the more the humans deviate from Claudico's abstracted tree, the more mistakes Claudico makes.

The humans should have been able to figure this out beforehand and choose sizes that fell directly in the middle of Claudico's sizes, in an attempt to exploit errors arising from size bucketing. (At the very least, after that 5-high call it should have been obvious that Claudico buckets sizes.) It's also quite likely Claudico's sizes correspond to the %-pot shortcuts, so we'd know where to aim: if we see shortcuts for 0.5 pot and 0.75 pot, Claudico's abstraction likely includes those sizes, so choose a bet size of 0.625 pot. I wasn't paying enough attention, though -- maybe they did stuff like this?
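
Assuming Claudico uses the pseudo-harmonic mapping Sam quoted earlier in the thread, you can even compute where the "middle" really is -- it isn't quite the arithmetic midpoint:

Code:
def map_down_probability(x, a, b):
    # Pseudo-harmonic mapping quoted earlier in the thread.
    return (b - x) * (1 + a) / ((b - a) * (1 + x))

a, b = 0.5, 0.75
print(map_down_probability(0.625, a, b))  # ~0.46, so 0.625 pot is not a 50/50 split

# The maximum-confusion size solves f(x) = 1/2:
x_star = (a + b + 2 * a * b) / (2 + a + b)
print(x_star, map_down_probability(x_star, a, b))  # ~0.615 pot -> 0.5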
05-10-2015 , 11:09 PM
Doug
Congrats on the challenge. As others have noted, you personally came across very well, particularly on the streams in the casino chatting with interested patrons.

How tilting was the lengthy river delay by the Bot? I would have been sorely tempted to shove more turns just to stop the ****ing thing from taking so long. LOL.
05-10-2015 , 11:14 PM
Isn't part of GTO finding the optimal bet sizes, not just plugging in several fixed sizes yourself and building the game tree from there?
05-10-2015 , 11:35 PM
Pretty sure manually adjusting the bot's strategy mid-challenge invalidates the study, since the same bot didn't play all the hands. That's Stats 101. And in a practical sense it's cheating: CMU tricked the humans into thinking they were always playing Claudico when in reality they were up against Claudico vX.Y with opaque enhancements.

You would need a separate controlled sample for each manually altered version of the bot. It's a different thing entirely if the bot itself makes these adjustments with no human intervention, but maybe these weak study rules also follow from the bot-vs-bot competition. I don't know. Whatever -- the results are statistically invalid either way!

Conclusion: CMU only made this manual intervention, invalidating the study, because they were concerned about otherwise being embarrassingly and totally crushed.

Secondly, I stand by my assertion that it was a reasonable alternate hypothesis, a priori, that the humans were better than the bot. (After all, this is precisely why bots had not previously challenged humans at NLHE: it was believed bots were significantly worse.) Hence a one-sided significance test was more appropriate for this challenge. Since the humans' result was significant at the 90% two-sided level, it was significant at the 95% one-sided level, so the humans did in fact win with statistical significance.

Anecdotally, I'm sure human fatigue was a factor of practical significance. Under normal grinding conditions (e.g., with a HUD, ~1.5-hour sessions, and sufficient rest between sessions across the whole challenge), the human team would likely have eked out the extra ~1 BB/100 needed for a two-sided 95% significance test.

Next time, I hope the humans negotiate terms better and learn more about the bot's play, or else refuse the challenge -- for example, prepare specifically for the particular bot algorithms they are up against. As Cheet said, he found it difficult not to treat the bot as a human. As an example, on the river with, say, bottom pair, humans can bet just less than 10% pot (more than a minbet) and get paid off, since Claudico treats it as a check and calls 100%.

Either the bot's algorithms should be transparent to the human team, or the humans should get some play time prior to the challenge to figure out exploits. These exploits are always going to exist, because game complexity forces the bot to use a bet abstraction, so it is always a matter of finding the max-exploit points in that abstraction. These should either be disclosed transparently to the humans, or there should be an opportunity for the humans (who next time should have bot-knowledgeable assistants on their team) to learn the bot's weaknesses before the challenge proper begins and as the bot adapts during the challenge.

Also, manual intervention to adjust the bot's algorithms should be specifically forbidden for the whole challenge. Otherwise the humans should have the right to call arbitrary extra rest days for as long as they want, because that's the human equivalent of such cheating by the CMU team.
05-10-2015 , 11:53 PM
Quote:
Originally Posted by tenderloinig
Isn't part of GTO finding the optimal bet sizes, not just plugging in several fixed sizes yourself and building the game tree from there?
Theoretically you are right, but in NLHE the game tree is too big to solve that way, so instead several fixed sizes are used to simplify the game tree, and the simplified game is then solved. This is kind of what humans do too, since humans have to simplify bet sizes because the game is too hard otherwise. The computer is just better than humans at the calculations, but still nowhere near good enough to do the impossible amount of computation a practically infinite game tree would require.

Technically, a Nash equilibrium is found not for NLHE but for the simplified game (call it NLHE') resulting from the bet-sizing abstraction. CMU then uses a bet-size mapping technique (the one Sam describes above) to translate human bets that differ from the bot's fixed sizes back onto its tree.

So CMU has a GTO solution for NLHE' not NLHE.

Hence it is a fundamental principle of any human-vs-bot challenge to do prior study of the bot's algorithms and max-exploit its deviations from full NLHE by, for example, using "funny" bet sizes deliberately targeted at the bot's fuzzy points. Humans should bet bigger for value and smaller with bluffs at the precise turning points of the bot's bet-sizing abstraction. This would max-exploit the bot's weakness of being GTO not for NLHE but only for NLHE'.

By the way, it is commonly expected that a true GTO solution for NLHE would entail many different "funny" bet sizes in different spots as ranges vary by texture, so it is not unlikely that in some spots a true GTO solution would want to use, say, 6 different bet sizes with 10 different parts of its range (6 is arbitrary, chosen just for illustration; the point is way more than one or two sizes).

Hence a Claudico based on bet-size abstraction can never be called GTO for NLHE, only GTO for NLHE'.
05-10-2015 , 11:58 PM
Sam/Noam
What are the max-exploit weaknesses exposed by card bucketing? That is, in what way do the card-bucketing simplifications used to create GTO for NLHE' make the bot susceptible to exploitation in full NLHE?

For example, with K2/K3/K4 bucketed together, the 2, 3, or 4 could become significant with backdoor draws on different runouts. Once hands are bucketed on an earlier street, is a turned draw simply not seen by the bot, or is there re-bucketing on each street depending on the runout?

05-11-2015 , 12:12 AM
Quote:
Originally Posted by TimTamBiscuit
Pretty sure manually adjusting the bot's strategy mid-challenge invalidates the study, since the same bot didn't play all the hands. That's Stats 101. And in a practical sense it's cheating: CMU tricked the humans into thinking they were always playing Claudico when in reality they were up against Claudico vX.Y with opaque enhancements.
I do think it's rather strange to reprogram the bot during the competition -- no true treatment group. I'd be OK with optimizations that didn't affect strategy, like reducing the river tank time, but changing strategy mid-competition is a research design flaw.
05-11-2015 , 12:12 AM
Sam,
Why is the AI's river calculation so slow when there are a few publicly available GTO river-solving tools that are effectively instant using much, much less computing power?
05-11-2015 , 12:16 AM
Another human-negotiated condition I'd like to see in a future challenge is a time limit on decisions: either like online sites, where you save up a time bank for use on tough spots, or like some live tournaments, where the opponent can call the clock and the AI's hand gets auto-folded if it fails to decide within 10 seconds.

The AI's long river tanks, especially on trivial decisions, were very tilting just watching the stream, let alone playing against it.
05-11-2015 , 12:27 AM
I'd be very interested in seeing Claudico's ranges for preflop wars. I'll bet they could easily be exploited if we knew what they were.

For example, change bet sizes strategically to exploit Claudico's bet-size abstraction, making it more correct to call rather than raise (or vice versa); this interacts with Claudico's choice of a more or less polarized/linear range to put the bot into tough postflop spots where its chosen range is suboptimal.

In other words, abuse its bet-size abstraction to force it into suboptimal range choices leading to suboptimal postflop scenarios.

For comparison, the publicly available AI PokerSnowie is terrible at this for the same reasons.

I cannot see how you can get anywhere near a GTO solution preflop without a lot more bet sizes (increments of 1BB in 3-bet sizing across a decently wide range!) without exposing significant max-exploit opportunities to a nemesis strategy that knows the bot's algorithms well. I suspect that many more buckets are needed preflop than postflop.

05-11-2015 , 12:36 AM
Quote:
Originally Posted by boredoo
I do think it's rather strange to reprogram the bot during the competition -- no true treatment group. I'd be OK with optimizations that didn't affect strategy, like reducing the river tank time, but changing strategy mid-competition is a research design flaw.
I think this was not a true study but rather a PR exercise masquerading as a study.

In a true study you strive to eliminate systematic variance, so as to spotlight the variables you are studying, unaffected by variance from uncontrolled variables that should be held constant.

The bot's strategy CANNOT be manually changed during the study, or else it invalidates the study.

Similarly, the Human fatigue as the challenge progressed invalidates the study. Conditions should have been such that Humans were just as fresh for each session.

That is, conditions should be invariant for Humans and AI. This is just as fundamental as each hand always beginning afresh with 200BB.