WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

05-10-2015 , 04:10 PM
I guess I'll hop in here and answer a few questions and clarify a few points too.

First, the 95% threshold was not chosen arbitrarily. That's the standard generally used in science, and the standard used in the ACPC. I personally would have avoided the term "statistical tie" which does NOT mean the same thing as "tie", and would have instead opted for the equivalent term "not statistically significant", but I didn't write the press release.

Earlier I mentioned that the pros would have needed to win by 10.35 BB/100 for statistical significance. That figure does correctly account for the hands being mirrored. We knew going into this that reaching statistical significance would be tough, but it was definitely possible for one side or the other to reach that threshold. Based on experiments, we estimated beforehand that the required win rate would be ~8.5 BB/100. However, it's impossible to say what the threshold actually is until the competition ends, because it depends on how the pros and the bot play. The bot and humans ended up playing very aggressively, and very differently, which likely pushed up the variance.

Equity chopping did reduce the variance (there was some debate about whether this would actually be the case, since mirrored hands can play out very differently when there's an equity chop), but not by much. Without the equity chop, the needed threshold would have been 10.72 BB/100 (and, interestingly, the pros' overall win rate would have dropped to 7.0 BB/100).
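
For anyone who wants to check the arithmetic: the threshold is just the usual two-sided cutoff of 1.96 standard errors on the mean win rate. Here's a minimal sketch; the per-hand standard deviation is illustrative (back-solved from the threshold above), not our measured value:

Code:
import math

def significance_threshold_bb100(per_hand_sd_bb, num_hands, z=1.96):
    """Win rate (BB/100) needed for two-sided 95% significance,
    treating per-hand results as i.i.d. after variance reduction."""
    se_per_hand = per_hand_sd_bb / math.sqrt(num_hands)  # standard error of the mean, BB/hand
    return z * se_per_hand * 100                         # convert to BB per 100 hands

# A per-hand SD of ~14.9 BB over the ~80,000 hands played reproduces
# the 10.35 BB/100 threshold quoted above.
print(significance_threshold_bb100(14.9, 80_000))  # ~10.33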

I think anyone that looks at this objectively will recognize we did everything we could to achieve statistical significance: 4 humans, mirrored hands, equity chopping, and as many hands as the humans could possibly play. That said, we recognize there are things we could improve if there's a next time (more humans, multitabling, and maybe even more variance-reduction techniques).

200BB was chosen for this competition (and as the ACPC format) because it is more challenging for bots due to the much larger game tree. We trained new strategies for this competition, and we could easily have switched them to a 100BB format. We didn't pick 200BB to make this easy on ourselves, just as we didn't pick big-name poker players who are not so great at heads-up no-limit to make this easy on ourselves.

I'll also say that I think the humans had a stronger edge than the statistics reveal. This comes from personally watching the humans and the bot play over the past two weeks, and it can't be captured in the statistics, which just look at the scores for each hand. There are definitely weaknesses in the bot, and they are weaknesses that can't be fixed with just more memory or more cores. As a researcher, that's very exciting, because it means we need new approaches to address the unique challenges of no-limit games. The next couple of years will be very busy for me!

For those of you that are interested, here's the breakdown of the scores per session (usually 750-800 hands):
Spoiler:
05-10-2015 , 04:10 PM
Quote:
total amount wagered over the two weeks in theoretical dollars was $170 million
Gross cashes; it's not just for MTTs anymore.

Good job, humans. Between all the distractions and boredom it was a very impressive showing. Pretty disgusting how those involved on the other side are representing the results. Agree with others that if the results had been reversed the message would be a tad different. I'd suggest not having anything to do with them in the future.
05-10-2015 , 04:10 PM
Quote:
Originally Posted by watergun7
Doug, I know it's a strat question, but I hope you can answer since it's pretty simple (and doesn't apply to human play).

Why were you raising limps with almost 100% of your range? I thought Claudico should be fairly balanced with its limps (obviously you have the hand histories and can look through the database). Raising limps 100% is exploitative -- so, in your opinion, has Claudico adjusted over time to the 100% iso?
It played pretty weak vs the raise.
05-10-2015 , 04:16 PM
Quote:
Originally Posted by WCGRider
We got the hands after each day, for that day, and could import and review them. However, a real-time HUD would have been way better; this is the first time in easily 5-6 years I've had to play "blind".
Was there some kind of stipulation that you couldn't use a HUD, or was it just the fact that no such HUD exists that prevented you from using one?

If you guys play them again next year I feel like it's something that needs to be present (call me).
05-10-2015 , 04:20 PM
It wasn't going to work given the time constraints on getting everything together. Obviously a HUD would make this a lot better to play with.
05-10-2015 , 04:21 PM
Doug,

I had posted this before, but since I now see you are reading the thread I will do it again: You came across very well in the stream, both on your own behalf as a personable and upbeat person and as a representative of those that play poker for reasons over and above the gamble.

After watching all of your streams at one time or another, you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
05-10-2015 , 04:37 PM
Great response by Noam.
05-10-2015 , 05:34 PM
Quote:
Originally Posted by restorativejustice
Doug,

I had posted this before, but since I now see you are reading the thread I will do it again: You came across very well in the stream, both on your own behalf as a personable and upbeat person and as a representative of those that play poker for reasons over and above the gamble.

After watching all of your streams at one time or another, you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
Thanks, I appreciate the support.
05-10-2015 , 05:45 PM
Doug, I couldn't tell because you had glasses on, but were your eyes rolling when you tried to explain the 99 vs A4 aipf play to the professor?
05-10-2015 , 07:10 PM
Quote:
you and your team would have been winners in my book even if the Event had not ended in a +$9/hand over 80,000 hands "tie."
+1 Huge respect Doug
05-10-2015 , 07:34 PM
Quote:
Originally Posted by NoamBrown
I think anyone that looks at this objectively will recognize we did everything we could to achieve statistical significance: 4 humans, mirrored hands, equity chopping, and as many hands as the humans could possibly play. That said, we recognize there are things we could improve if there's a next time (more humans, multitabling, and maybe even more variance-reduction techniques).
Noam and Sam,

As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?

Quote:
Originally Posted by Sam Ganzfried
For the flop minbet example, he bets 100 into a pot of 500, so x = 0.2. The closest actions we have are A = 0 (for check) and B = 0.25. Plugging these into the formula gives f(x) = 1/6 ≈ 0.167. This is the probability that we map his bet down to 0 and interpret it as a check. So we pick a random number in [0,1]; if it's above 1/6 we interpret the bet as 0.25 pot, and otherwise as a check.
Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?
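
(For anyone following along: Sam's numbers match the pseudo-harmonic action-translation mapping from the CMU papers, f(x) = (B - x)(1 + A) / ((B - A)(1 + x)). A quick sketch of my reading of that formula -- not CMU's actual code:)

Code:
import random

def map_down_probability(x, a, b):
    """Probability of translating an off-tree bet of x pot down to the
    nearest smaller abstraction size a rather than up to the larger b,
    per the pseudo-harmonic mapping f(x) = (b - x)(1 + a) / ((b - a)(1 + x))."""
    return (b - x) * (1 + a) / ((b - a) * (1 + x))

def translate(x, a, b):
    """Randomly map the bet to a or b using the weight above."""
    return a if random.random() < map_down_probability(x, a, b) else b

# Sam's flop example: 100 into 500, so x = 0.2, with a = 0 (check), b = 0.25.
print(map_down_probability(0.2, 0.0, 0.25))  # 0.1667 -> read as a check 1/6 of the time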

Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
05-10-2015 , 08:42 PM
Quote:
Originally Posted by jaytorr
Noam and Sam,

As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?



Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?

Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
G Bucks ITT
05-10-2015 , 08:45 PM
Quote:
Originally Posted by jaytorr
As far as variance-reduction techniques, have you considered equity-chopping pots based on Claudico's entire range at showdown? If you saved Claudico's strategy for each session, this could be easily computed and reduce variance considerably, perhaps more so than the duplicate matches?
There are a lot of variance-reduction techniques we'd like to try, such as the one you describe. We decided not to use anything beyond duplicate/AIEV for the competition, since it would be impossible for anyone else to verify the accuracy of more sophisticated approaches that depend on looking at our own strategies.
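
(For readers wondering what the AIEV part does: once both players are all in before the last card, the pot is settled by exact all-in equity rather than by dealing the runout. A toy sketch -- the equity number would come from an exact enumeration of runouts, which isn't shown here:)

Code:
def aiev_settle(pot_bb, hero_equity):
    """All-in EV settlement: award each side its equity share of the pot
    instead of dealing the remaining cards, removing runout variance.
    hero_equity is assumed to come from exact enumeration (not shown)."""
    return pot_bb * hero_equity, pot_bb * (1.0 - hero_equity)

# E.g., AA vs KK all in preflop for a 400BB pot, equity ~0.82:
print(aiev_settle(400, 0.82))  # (328.0, 72.0) every single time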

Quote:
Originally Posted by jaytorr
Does this explain that hand where the bot called a river minbet with 5-hi; it interpreted the minbet as a check and didn't fold any part of its range?
I'd need to see the specific hand, but yes, it's likely the opponent bet a size below the lowest size in our abstraction and we happened to map it down to a check, and so weren't allowed to fold. We had a 10%-pot size for the opponent on the river during most of the match, which should have protected us pretty well against this problem. The hand you have in mind likely occurred when we didn't have the 10% size in.

Quote:
Originally Posted by jaytorr
Would also love to hear more about the abstraction and opponent modeling techniques. Which innovations made this bot much better than Tartanian7?
Claudico didn't use much "opponent modeling" per se, though I've worked on opponent modeling in the past for other projects (see AAMAS-11 and TEAC-15 papers on my website). We did make several modifications throughout the competition that were intended to plug some leaks, and possibly counter-exploit some of the humans' exploitative tendencies, and a couple times we used different versions against different humans.

The card abstraction and equilibrium-finding algorithms were the same as for Tartanian7: http://www.cs.cmu.edu/~sganzfri/Tartanian7_AAMAS15.pdf. We decided to switch to an "asymmetric betting abstraction," where we limited the number of bet sizes for ourselves for the first preflop action -- we just had fold, limp, and 2.5x (for Tartanian7 we also had 2x, 3x, and some other bigger ones) -- while keeping a lot of sizes for the opponent in case the humans decided to use them. This cut down the size of the game tree pretty significantly, and I think it led to better convergence.
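
Schematically, the asymmetry looks something like this (an illustrative data structure only; the opponent list is a placeholder, not Claudico's actual configuration beyond the three options named above):

Code:
# First preflop action: few sizes for ourselves, many kept for the opponent.
PREFLOP_FIRST_ACTION = {
    "self":     ["fold", "limp", "raise_2.5x"],
    "opponent": ["fold", "limp", "raise_2x", "raise_2.5x", "raise_3x",
                 "raise_4x", "all_in"],  # placeholder sizes, for illustration
}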

We also incorporated real-time computation for the river, which we didn't include in Tartanian7 due to some last-minute issues right before the computer competition deadline; I think it helped a lot. That approach is described in detail here: http://www.cs.cmu.edu/~sganzfri/Endgame_AAMAS15.pdf. Those were the main differences; there were also several more minor ones that maybe I'll describe later.
05-10-2015 , 09:13 PM
Can we please see the calculation for the standard error?
05-10-2015 , 10:38 PM
OK, about Claudico's bet-size bucketing: when creating the interface for the challenge, it seems pretty likely to me that the developers at least discussed which %-pot shortcuts to use as they related to Claudico's bet-sizing abstraction. Perhaps the developers even put in the %-pot shortcuts specifically to make it more likely that humans would use sizes present in Claudico's abstracted tree, which is in their interest -- the more the humans deviate from Claudico's abstracted tree, the more mistakes Claudico makes.

The humans should have been able to figure this out beforehand and choose sizes that fell directly in the middle of Claudico's sizes, in an attempt to exploit errors arising from size bucketing. (At the very least, after that 5-high call it should have been obvious that Claudico buckets sizes.) It's also quite likely Claudico's sizes correspond to the %-pot shortcuts, so we'd know where to aim: if we see shortcuts for 0.5 pot and 0.75 pot, Claudico's abstraction likely includes those sizes, so choose a bet size of 0.625 pot. I wasn't paying enough attention, though -- maybe they did stuff like this?
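
Assuming Claudico uses the pseudo-harmonic mapping Sam quoted earlier in the thread, you can even compute where the "middle" really is -- it isn't quite the arithmetic midpoint:

Code:
def map_down_probability(x, a, b):
    # Pseudo-harmonic mapping quoted earlier in the thread.
    return (b - x) * (1 + a) / ((b - a) * (1 + x))

a, b = 0.5, 0.75
print(map_down_probability(0.625, a, b))  # ~0.46, so 0.625 pot is not a 50/50 split

# The maximum-confusion size solves f(x) = 1/2:
x_star = (a + b + 2 * a * b) / (2 + a + b)
print(x_star, map_down_probability(x_star, a, b))  # ~0.615 pot -> 0.5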
05-10-2015 , 11:09 PM
Doug
Congrats on the challenge. As others have noted, you personally came across very well, particularly on the streams in the casino chatting with interested patrons.

How tilting was the lengthy river delay by the Bot? I would have been sorely tempted to shove more turns just to stop the ****ing thing from taking so long. LOL.
05-10-2015 , 11:14 PM
Isn't part of GTO finding the optimal bet sizes, not just plugging in several fixed sizes yourself and building the game tree from there?
05-10-2015 , 11:35 PM
Pretty sure manually adjusting the bot's strategy mid-challenge invalidates the study, since the same bot didn't play all the hands. That's Stats 101. And in a practical sense it's cheating: CMU tricked the humans into thinking they were always playing Claudico when in reality they were up against Claudico vX.Y with opaque enhancements.

You would need a separate controlled sample for each manually altered version of the bot. It's a different thing entirely if the bot itself makes these adjustments with no human intervention, but maybe these weak study rules also follow from the bot-vs-bot competition. I don't know. Whatever -- the results are statistically invalid either way!

Conclusion: CMU only made this manual intervention, invalidating the study, because they were concerned about otherwise being embarrassingly and totally crushed.

Secondly, I stand by my assertion that it was a reasonable alternate hypothesis, a priori, that the humans were better than the bot. (After all, this is precisely why bots had not previously challenged humans at NLHE: it was believed bots were significantly worse.) Hence a one-sided significance test was more appropriate for this challenge. Since the humans' result was significant at the 90% two-sided level, it was significant at the 95% one-sided level, so the humans did in fact win with statistical significance.

Anecdotally, I'm sure human fatigue was a factor of practical significance. Under normal grinding conditions (e.g., with a HUD, ~1.5-hour sessions, and sufficient rest between sessions across the whole challenge), the human team would likely have eked out the extra ~1 BB/100 needed for a two-sided 95% significance test.

Next time, I hope the humans negotiate terms better and learn more about the bot's play, or else refuse the challenge -- for example, prepare specifically for the particular bot algorithms they are up against. As Cheet said, he found it difficult not to treat the bot as a human. As an example, on the river with, say, bottom pair, humans can bet just less than 10% pot (more than a minbet) and get paid off, since Claudico treats it as a check and calls 100%.

Either the bot's algorithms should be transparent to the human team, or the humans should get some play time prior to the challenge to figure out exploits. These exploits are always going to exist, because game complexity forces the bot to use a bet abstraction, so it is always a matter of finding the max-exploit points in that abstraction. These should either be disclosed transparently to the humans, or there should be an opportunity for the humans (who next time should have bot-knowledgeable assistants on their team) to learn the bot's weaknesses before the challenge proper begins and as the bot adapts during the challenge.

Also, manual intervention to adjust the bot's algorithms should be specifically forbidden for the whole challenge. Otherwise the humans should have the right to call arbitrary extra rest days for as long as they want, because that's the human equivalent of such cheating by the CMU team.
05-10-2015 , 11:53 PM
Quote:
Originally Posted by tenderloinig
Isn't part of GTO finding the optimal bet sizes, not just plugging in several fixed sizes yourself and building the game tree from there?
Theoretically you are right, but in NLHE the game tree is too big to solve that way, so instead several fixed sizes are used to simplify the game tree, and the simplified game is then solved. This is kind of what humans do too, since humans have to simplify bet sizes because the game is too hard otherwise. The computer is just better than humans at the calculations, but still nowhere near good enough to do the impossible amount of computation a practically infinite game tree would require.

Technically, a Nash equilibrium is found not for NLHE but for the simplified game (call it NLHE') resulting from the bet-sizing abstraction. CMU then uses a bet-size mapping technique (the one Sam describes above) to translate human bets that differ from the bot's fixed sizes back onto its tree.

So CMU has a GTO solution for NLHE' not NLHE.

Hence it is a fundamental principle of any human-vs-bot challenge to do prior study of the bot's algorithms and max-exploit its deviations from full NLHE by, for example, using "funny" bet sizes deliberately targeted at the bot's fuzzy points. Humans should bet bigger for value and smaller with bluffs at the precise turning points of the bot's bet-sizing abstraction. This would max-exploit the bot's weakness of being GTO not for NLHE but only for NLHE'.

By the way, it is commonly expected that a true GTO solution for NLHE would entail many different "funny" bet sizes in different spots as ranges vary by texture, so it is not unlikely that in some spots a true GTO solution would want to use, say, 6 different bet sizes with 10 different parts of its range (6 is arbitrary, chosen just for illustration; the point is way more than one or two sizes).

Hence a Claudico based on bet-size abstraction can never be called GTO for NLHE, only GTO for NLHE'.
05-10-2015 , 11:58 PM
Sam/Noam
What are the max-exploit weaknesses exposed by card bucketing? That is, in what way do the card-bucketing simplifications used to create GTO for NLHE' make the bot susceptible to exploitation in full NLHE?

For example, with K2/K3/K4 bucketed together, the 2, 3, or 4 could become significant with backdoor draws on different runouts. Once hands are bucketed on an earlier street, is a turned draw simply not seen by the bot, or is there re-bucketing on each street depending on the runout?

05-11-2015 , 12:12 AM
Quote:
Originally Posted by TimTamBiscuit
Pretty sure manually adjusting the bot's strategy mid-challenge invalidates the study, since the same bot didn't play all the hands. That's Stats 101. And in a practical sense it's cheating: CMU tricked the humans into thinking they were always playing Claudico when in reality they were up against Claudico vX.Y with opaque enhancements.
I do think it's rather strange to reprogram the bot during the competition -- no true treatment group. I'd be OK with optimizations that didn't affect strategy, like reducing the river tank time, but changing strategy mid-competition is a research design flaw.
05-11-2015 , 12:12 AM
Sam,
Why is the AI's river calculation so slow when there are a few publicly available GTO river-solving tools that are effectively instant using much, much less computing power?
05-11-2015 , 12:16 AM
Another human-negotiated condition I'd like to see in a future challenge is a time limit on decisions: either like online sites, where you save up a time bank for use on tough spots, or like some live tournaments, where the opponent can call the clock and the AI's hand gets auto-folded if it fails to decide within 10 seconds.

The AI's long river tanks, especially on trivial decisions, were very tilting just watching the stream, let alone playing against it.
05-11-2015 , 12:27 AM
I'd be very interested in seeing Claudico's ranges for preflop wars. I'll bet they could easily be exploited if we knew what they were.

For example, change bet sizes strategically to exploit Claudico's bet-size abstraction, making it more correct to call rather than raise (or vice versa); this interacts with Claudico's choice of a more or less polarized/linear range to put the bot into tough postflop spots where its chosen range is suboptimal.

In other words, abuse its bet-size abstraction to force it into suboptimal range choices leading to suboptimal postflop scenarios.

For comparison, the publicly available AI PokerSnowie is terrible at this for the same reasons.

I cannot see how you can get anywhere near a GTO solution preflop without a lot more bet sizes (increments of 1BB in 3-bet sizing across a decently wide range!) without exposing significant max-exploit opportunities to a nemesis strategy that knows the bot's algorithms well. I suspect that many more buckets are needed preflop than postflop.

05-11-2015 , 12:36 AM
Quote:
Originally Posted by boredoo
I do think it's rather strange to reprogram the bot during the competition -- no true treatment group. I'd be OK with optimizations that didn't affect strategy, like reducing the river tank time, but changing strategy mid-competition is a research design flaw.
I think this was not a true study but rather a PR exercise masquerading as a study.

In a true study you strive to eliminate systematic variance, so as to spotlight the variables you are studying, unaffected by variance from uncontrolled variables that should be held constant.

The bot's strategy CANNOT be manually changed during the study, or else it invalidates the study.

Similarly, the Human fatigue as the challenge progressed invalidates the study. Conditions should have been such that Humans were just as fresh for each session.

That is, conditions should be invariant for Humans and AI. This is just as fundamental as each hand always beginning afresh with 200BB.