Old 05-09-2015, 06:08 PM   #1426
Sam Ganzfried
centurion
 
Join Date: Oct 2014
Posts: 187
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by punter11235 View Post
How do you know how it does empirically if you don't measure how well they do at what they are supposed to be doing?
Quote:
Originally Posted by samooth View Post
so you have an equilibrium-finding algorithm for an abstraction that uses imperfect-recall and thus the algorithm has no guarantee of converging to Nash, but you are confident it does because it works well "empirically"? and with empirically you refer to what exactly?

how can you not see the massive disconnect in that claim when you haven't measured how well the approximation process works?
This paper shows that imperfect-recall abstractions improve performance over perfect-recall ones, http://webdocs.cs.ualberta.ca/~bowli...bstraction.pdf. The analysis uses both exploitability and head-to-head performance metrics. Those results were for limit Texas Hold'em, but based on personal communication with the authors similar improvements have been observed in no-limit also with respect to the head-to-head performance metric. Since computing exploitability in no-limit is currently intractable, there are no results for that metric in that domain yet.
Old 05-09-2015, 06:32 PM   #1427
Sam Ganzfried
centurion
 
Join Date: Oct 2014
Posts: 187
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by mythrilfox View Post
samooth, you're missing the point. obviously there's value in computing exploitability and sam never said otherwise. the question is what value is there in computing exploitability using an abstraction. if we're exploitable for 5 bb/100 using an abstracted tree, what does that translate to in terms of exploitability using the full game tree? i'm guessing from sam's comments that meaningful translation of this sort is nontrivial if not impossible.

punter never really addressed this and instead asked what's the value in creating a strategy without being able to assess how far from equilibrium it is, which imo is a silly question. of course there's value. if, for instance, your algorithm generates successive strategies such that strategy n beats all strategies 1 to n-1, then you're still getting closer to equilibrium with each successive iteration without ever knowing how far you are from it, and inevitably you will eventually arrive at equilibrium with enough iterations.

this relies on testing the value of the strategies empirically (i.e. by actually playing bajillions of hands), so as you got far enough along and started generating successive strategies that were extremely close to equilibrium, the edges would be so small that you would never be able to peg down precisely where equilibrium was. that doesn't mean there still isn't immense value in getting close to equilibrium, or at the very least demonstrably improving. it would be like saying we can't measure how far we are from a complete understanding of the laws of physics, so what value is there in even trying to understand them in the first place? literally all of science is measuring how a method/model compares to our current understanding & how well it performs empirically, not some sort of ground-up value assessment which would require complete omniscience of everything in the universe. generally speaking we only have the luxury of ground-up assessment in small and controlled systems that we humans created.
I definitely think that computing exploitability is important. Many of my papers use exploitability calculations whenever applicable/feasible, e.g., http://www.cs.cmu.edu/~sganzfri/Translation_IJCAI13.pdf, http://www.cs.cmu.edu/~sganzfri/Puri...on_AAMAS12.pdf.

The first approach for computing exploitability in LHE was just developed a few years ago, and involved sophisticated techniques to be feasible, http://webdocs.cs.ualberta.ca/~bowli...ijcai-rgbr.pdf. There are no known approaches for doing this in NL. I suppose the techniques from that paper could be applied for computing exploitability in NL if restricted to one betting size everywhere. Researchers from University of Alberta presented preliminary results of doing this at the AAAI Computer Poker and Imperfect Information Workshop back in 2012 or 2013 on their prior year's NL competition strategies. The exploitability that was reported was extremely high. They did not end up publishing this result, and when I followed up they were still looking into verifying the results to ensure correctness.

It seems like this approach would be totally infeasible for computing exploitability with more than one bet size.

Some posters have talked about computing estimates of exploitability, e.g., fixing preflop strategies and computing "postflop exploitability," computing exploitability within some abstraction, etc. I'm just not sure what the merit of such a value would be. If I say "I computed approximate exploitability using abstraction A and assumptions X, Y, and Z and it was 27.2 BB/100," there are probably two people in the world who would care.

Like I mentioned, I'm also not really sure how feasible these computations would be for strategies based on imperfect-recall abstractions.

It's possible that there is something interesting here, but I just haven't had much time to look into this problem very carefully, and as described above my instinct is that it would not be very valuable scientifically at this point (and would possibly require a lot of time and effort).

What I do think would be valuable would be an efficient general algorithm for computing best responses in games with imperfect recall, which would not be poker specific and could theoretically have broader applications. If I'm able to come up with an algorithm for that, then I would be interested in applying it to NLHE.
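
To make the term concrete for readers: in a small two-player zero-sum matrix game, exploitability can be computed directly by enumerating the opponent's pure best responses. The sketch below uses rock-paper-scissors as a stand-in. The function names and the payoff matrix are illustrative assumptions, not anything from Claudico or the papers linked above, and nothing here addresses how to scale the calculation to NLHE or to imperfect-recall abstractions.

```python
# Toy illustration of exploitability in a zero-sum matrix game.
# payoff[i][j] is the row player's payoff when row plays i and column plays j.
# All names here are hypothetical and exist only for this example.

RPS_PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def value_against_best_response(payoff, row_strategy):
    """Row player's expected payoff when the column player best-responds."""
    n_cols = len(payoff[0])
    column_values = [
        sum(p * payoff[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    ]
    # The opponent picks the column that is worst for the row player.
    return min(column_values)


def exploitability(payoff, row_strategy, game_value=0.0):
    """How much the row player gives up versus a best response, relative to
    the value of the game (0 for rock-paper-scissors)."""
    return game_value - value_against_best_response(payoff, row_strategy)


uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.5, 0.25, 0.25]

print(exploitability(RPS_PAYOFF, uniform))     # equilibrium strategy
print(exploitability(RPS_PAYOFF, rock_heavy))  # loses to always-paper
```

The uniform strategy has exploitability 0, while the rock-heavy one gives up 0.25 per game to an opponent who always plays paper. In a full game tree the same idea requires a best-response traversal of the entire tree, which is where the feasibility problems described above come from.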

Last edited by Sam Ganzfried; 05-09-2015 at 07:01 PM.
Old 05-09-2015, 06:40 PM   #1428
Keruli
adept
 
 
Join Date: Jul 2009
Posts: 703
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

''That sounds like a lot, but over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can't be considered a statistically significant victory.

But a tie is significant in and of itself: It suggests that Claudico is at least a match for human players, at least one-on-one and in this particular game.''

from the NBCnews article. wow...
Old 05-09-2015, 06:46 PM   #1429
Sam Ganzfried
centurion
 
Join Date: Oct 2014
Posts: 187
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by 200zoomgrinder View Post
WOW at tie. I thought Sam was saying that they would accept 95% confidence.
I did not make any comments regarding what confidence interval to use, whether it was a "tie," etc. If other people from CMU or the media want to claim it was a "statistical tie" that is their prerogative.
Old 05-09-2015, 06:49 PM   #1430
RussianRoulette
newbie
 
Join Date: Dec 2014
Posts: 45
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Absolutely hilarious how this was spun as a tie by the CMU team... Oh well, I guess it doesn't truly matter in the long run as computers will inevitably catch up.
Old 05-09-2015, 06:50 PM   #1431
eaglesfaan
grinder
 
Join Date: Oct 2012
Location: Cali
Posts: 456
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by Keruli View Post
''That sounds like a lot, but over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can't be considered a statistically significant victory.
This is just pathetic...
Old 05-09-2015, 07:07 PM   #1432
ArtyMcFly
Carpal 'Tunnel
 
 
Join Date: Dec 2014
Location: Enchantment Under the Sea
Posts: 13,251
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Inb4 the professor says "I'm 95% confident that every hand is fifty-fifty".
Old 05-09-2015, 07:20 PM   #1433
Sam Ganzfried
centurion
 
Join Date: Oct 2014
Posts: 187
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by samooth View Post
Come on, man.

We had this thread in late September last year dedicated to Tartanian7: https://forumserver.twoplustwo.com/29...25/index5.html. In this thread, several posters pointed out that -- since the bot is playing a static strategy that resembles your best approximation of a NE in 200bb deep hunl -- an exploitability number or a lower bound on exploitability is not only useful but imperative for measuring how good the bot/strategy actually is. If you go to the last page of this thread, you see that it died with this exact discussion going on and was eventually left unaddressed.

After a 6 month posting hiatus, you come back and post



Together with you questioning the "scientific value" of exploitability, this post indicates that for some reason, you and your team are resistant to understanding the importance of the issue. punter11235 has made a lot of good posts explaining exploitability and its importance -- it's the most natural test, and it lets you/us actually quantify how good the approximation is rather than relying on relative benchmarks (other bots/humans) with results also being affected by variance. The fact that you beat other bots and that you (probably) lost with Claudico at a 9ish bb/100 rate really tells us nothing other than the obvious (x is better than y over z hands).

I'm not coming here to bring any hate, I have congratulated you on Tartanian7 and I really enjoyed following this challenge as well. It is simply inconceivable to me that you and your team are apparently resistant to understanding the entire concept. It becomes even more inconceivable after reading



How are you going to publish papers based on these poker bots without giving out that number? If the answer is "the referees don't demand it", then it is in line with my own experience in academia (although from an unrelated field). meh

Also, this question still stands:



In contrast however, you have quoted the Cepheus bot and its exploitability number, implying you understand its usefulness -- which brings me to my last point:



You absolutely cannot recommend any technique to other domains like national security or medicine without measuring how well it actually works, so as long as you are not giving out that number...

Looking forward to your reply and all the info on how Claudico works.
Addressed your questions in above post.

I stopped posting in the prior thread because I was being harassed about the identity of the developers of a program we played against. It looks like you tacked on a big post at the end there.

Approaches from research on poker have already been successfully applied to other applications, e.g., robust policies for diabetes management, http://webdocs.cs.ualberta.ca/~bowli...2nips-kofn.pdf.
Old 05-09-2015, 07:32 PM   #1434
200zoomgrinder
journeyman
 
Join Date: Jan 2014
Posts: 366
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

@Sam: Apologies, it was noambrown.
Old 05-09-2015, 07:53 PM   #1435
Sam Ganzfried
centurion
 
Join Date: Oct 2014
Posts: 187
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by samooth View Post
so you have an equilibrium-finding algorithm for an abstraction that uses imperfect-recall and thus the algorithm has no guarantee of converging to Nash, but you are confident it does because it works well "empirically"? and with empirically you refer to what exactly?

how can you not see the massive disconnect in that claim when you haven't measured how well the approximation process works?

One other point I'd like to add in addition to my above posts is that sometimes it's very hard to prove theoretical guarantees for approaches for hard problems. E.g., there are no bounds for any of the abstraction algorithms for bucketing hands together. It's pretty much impossible to prove anything theoretically when mapping 10^160 states down to 10^14. As far as I know the only "lossy" abstraction algorithm with a provable bound is from colleagues at CMU, and scales only to a poker game with a 5-card deck, http://www.cs.cmu.edu/~ckroer/papers...4-extended.pdf.

It's valuable to prove theoretical bounds on approaches, but it's also valuable to come up with agents for hard problems, like NLHE, which might involve some heuristics/approximations without provable guarantees/bounds.

It's pretty common to present theoretical analysis just for smaller domains, and then empirical results for larger "real-world" ones. E.g., a lot of poker papers present exploitability calculations on small variants like Leduc Hold'em as a proof of concept, but head-to-head empirical results for full NLHE, e.g., http://www.cs.cmu.edu/~sganzfri/Translation_IJCAI13.pdf, http://www.cs.cmu.edu/~kwaugh/publications/nips09.pdf.

Hopefully the theoretical bounds and the scalable heuristic approaches will eventually meet somewhere, but this is not always possible.

Last edited by Sam Ganzfried; 05-09-2015 at 08:22 PM.
Old 05-09-2015, 08:09 PM   #1436
cheet
veteran
 
 
Join Date: Jan 2006
Posts: 2,498
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I understand where the CMU public relations team is coming from when they call this a "statistical tie". However, I feel like the degree to which that (and additionally the "win vs money wagered" comparison used) has been pushed in the press release that was distributed to media outlets went too far and is being used to color this more favorably for their self-interest at the expense of the humans' accomplishment here.

I also want to say that coming down the stretch in this challenge my personal results vs Claudico were often held as a trophy to the media as if it were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a "team". I don't recall the term "statistical tie" ever being used then.
Old 05-09-2015, 08:12 PM   #1437
cheet
veteran
 
 
Join Date: Jan 2006
Posts: 2,498
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Also I recorded this for 2p2 because I want to share what I thought was a great talk by Doug at the closing press conference of the challenge. He outlined everything quite well and I agree with everything he had to say.

https://www.youtube.com/watch?v=L10zaLtUabY
Old 05-09-2015, 08:13 PM   #1438
The4thFilm
Carpal 'Tunnel
 
 
Join Date: Jul 2004
Posts: 13,472
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by cheet View Post
I understand where the CMU public relations team is coming from when they call this a "statistical tie". However, I feel like the degree to which that (and additionally the "win vs money wagered" comparison used) has been pushed in the press release that was distributed to media outlets went too far and is being used to color this more favorably for their self-interest at the expense of the humans' accomplishment here.

I also want to say that coming down the stretch in this challenge my personal results vs Claudico were often held as a trophy to the media as if it were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a "team". I don't recall the term "statistical tie" ever being used then.
A University should be held to a higher standard than a beer commercial. Claiming a tie is straight up academic dishonesty/fraud and they should be sanctioned if not fired.
Old 05-09-2015, 08:27 PM   #1439
Wasp
enthusiast
 
Join Date: Feb 2010
Posts: 70
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by tenderloinig View Post
I'm sure you say that in reverse. At the beginning of this you guys said 95% would be considered significant.
If Barcelona beats Bayern 4:3 in football, that would be a tie too, because it is not statistically significant.

Moreover, when Klitschko KO'd Pulev in boxing, that was statistically insignificant. That should be a tie too.

I suggest the poker media not spread the lie about a "tie". Only tell the truth:

Facts:
- 80k mirrored and eq chopped hands have been played,
- 7320 blinds won by the Brains team,
- more than a 9bb/100 winrate against Claudico,
- the CMU team still considers a p=92% win and a 9bb/100 loss a "tie", which is a joke to any poker player.

(Let me be a bit of a demagogue: any of us would play the "This Was A Tie" (sic, the professor said this) game for life at high stakes. For example, at 200 hands/hour at nl2k this game means the human player makes 360 usd/hour pre-rake, which is a nice salary, and Claudico would make -3M USD per year against the best human players when two-tabling, also pre-rake ofc... It is a stupid calculation, but telling lies to poker players about simple results is a stupid thing too.)
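
The winrate and hourly figures in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for the example, and the $20 big blind for nl2k is an assumption (a $2,000 buy-in of 100 big blinds), since the post does not state the blind size.

```python
def winrate_bb_per_100(big_blinds_won, hands_played):
    """Winrate in big blinds per 100 hands."""
    return 100 * big_blinds_won / hands_played


def hourly_rate_usd(winrate_bb100, hands_per_hour, big_blind_usd):
    """Expected hourly win, before rake, at a given winrate and table speed."""
    return winrate_bb100 * (hands_per_hour / 100) * big_blind_usd


# Figures from the post: 7320 blinds won over 80k hands.
print(winrate_bb_per_100(7320, 80_000))

# 9 bb/100 at 200 hands/hour; the $20 big blind is assumed, not stated.
print(hourly_rate_usd(9, 200, 20))
```

With those inputs the winrate comes out at about 9.15 bb/100 and the hourly rate at $360, matching the numbers quoted above.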
Old 05-09-2015, 08:47 PM   #1440
Whirlwind
stranger
 
Join Date: May 2007
Posts: 9
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by NoamBrown View Post
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
Would you please post the math for the 95% confidence interval?
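
The developers' actual calculation is not given in the quoted post, so the following is only a back-of-the-envelope sketch of what the two reported numbers imply. It assumes the interval is a symmetric normal-approximation interval (half-width = critical z times standard error); the function names are hypothetical and this is not the CMU team's method.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def implied_standard_error(half_width, confidence=0.95):
    """Back out the standard error from a reported CI half-width."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return half_width / z_crit


def two_sided_p_value(observed, standard_error):
    """Two-sided p-value for the null hypothesis of a true winrate of zero."""
    z = abs(observed) / standard_error
    return 2 * (1 - STD_NORMAL.cdf(z))


# Figures quoted in the post above, both in bb/100.
se = implied_standard_error(10.35)
p = two_sided_p_value(9.16, se)
print(f"implied standard error: {se:.2f} bb/100")
print(f"two-sided p-value: {p:.3f}")
```

Under these assumptions the implied standard error is a little under 5.3 bb/100 and the two-sided p-value is a little above 0.08, so the result would clear a 90% threshold but not a 95% one. That is in line with the roughly 92% figure quoted elsewhere in this thread.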
Old 05-09-2015, 09:07 PM   #1441
tultfill
veteran
 
 
Join Date: Jan 2009
Location: internet
Posts: 2,928
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I'm disgusted.
Old 05-09-2015, 10:17 PM   #1442
angeles
old hand
 
 
Join Date: Apr 2011
Location: somewhere over the rainbow
Posts: 1,947
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

yeah pretty ridiculous by these profs to consider this a tie.
Old 05-09-2015, 10:59 PM   #1443
Frankie Fuzz
grinder
 
Join Date: Aug 2013
Posts: 534
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Seems like the organizers should have said that the humans won, but that there was an 8% chance that the humans had no actual edge, due to the role of chance. That would have made a lot more sense.
Old 05-09-2015, 11:23 PM   #1444
fityfmi
old hand
 
 
Join Date: Jan 2013
Location: Only the ladder is real.
Posts: 1,696
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

When all the hands are broadcast with holecards up, it should go without saying that the players will not bring their top default game. While 9bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
Old 05-09-2015, 11:57 PM   #1445
David S
enthusiast
 
 
Join Date: Apr 2011
Posts: 83
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by fityfmi View Post
When all the hands are broadcast with holecards up, it should go without saying that the players will not bring their top default game. While 9bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
Would love to hear one of the players confirm this
Old 05-10-2015, 12:11 AM   #1446
Bigoldnit
veteran
 
Join Date: Sep 2010
Posts: 2,397
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I for one am just glad to see that AI developers can be just as delusional about variance, edges, and, "I'm just running bad" as casual players.

Bots are the new fish, imo
Old 05-10-2015, 12:14 AM   #1447
ChicagoRy
 
 
Join Date: Jan 2007
Location: ColoradoRy
Posts: 20,517
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Based on the results, what is the % chance that the players are better than the bot?
Old 05-10-2015, 12:15 AM   #1448
sam1chips
adept
 
 
Join Date: Aug 2012
Posts: 902
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by fityfmi View Post
When all the hands are broadcast with holecards up, it should go without saying that the players will not bring their top default game. While 9bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
You think that the players are actively playing less than optimally?
Old 05-10-2015, 12:34 AM   #1449
tenderloinig
adept
 
Join Date: Dec 2013
Posts: 803
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by cheet View Post

I also want to say that coming down the stretch in this challenge my personal results vs Claudico were often held as a trophy to the media as if it were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a "team". I don't recall the term "statistical tie" ever being used then.
Pretty gross they did this to you, I noticed it throughout the contest myself. Not fair at all especially given what they said prior to hands being played, and after hands being played. You handled it well though Jason, and you handled their bot too
Old 05-10-2015, 12:37 AM   #1450
ike
Pooh-Bah
 
Join Date: Jan 2004
Posts: 5,635
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

If this result doesn't qualify as statistically significant, how likely was this challenge to generate a statistically significant result?
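
One rough way to put a number on this question is a power calculation. The sketch below assumes a normal approximation, backs the standard error out of the +/- 10.35bb/100 interval reported earlier in the thread, and treats the true edge as a hypothetical input, since nobody knows it. The function name is made up for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# Standard error implied by the reported 95% half-width of 10.35 bb/100.
STANDARD_ERROR = 10.35 / STD_NORMAL.inv_cdf(0.975)


def power_two_sided(true_edge, standard_error, alpha=0.05):
    """Probability that a two-sided test at level alpha rejects 'edge = 0'
    when the true edge (in bb/100) is true_edge."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_edge / standard_error
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical true edges in bb/100; none of these is a measured value.
for edge in (0.0, 5.0, 9.16, 15.0):
    print(edge, round(power_two_sided(edge, STANDARD_ERROR), 3))
```

If the true edge were exactly the observed 9.16bb/100, this gives a power of about 0.41, meaning a match of this length would miss 95% significance more often than not. The answer depends entirely on the assumed edge and on the normal approximation.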
