WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

05-09-2015 , 06:08 PM
Quote:
Originally Posted by punter11235
How do you know how it does empirically if you don't measure how well it does at what it is supposed to be doing?
Quote:
Originally Posted by samooth
so you have an equilibrium-finding algorithm for an abstraction that uses imperfect recall, and thus the algorithm has no guarantee of converging to Nash, but you are confident it does because it works well "empirically"? and what exactly do you mean by "empirically"?

how can you not see the massive disconnect in that claim when you haven't measured how well the approximation process works?
This paper shows that imperfect-recall abstractions improve performance over perfect-recall ones: http://webdocs.cs.ualberta.ca/~bowli...bstraction.pdf. The analysis uses both exploitability and head-to-head performance metrics. Those results were for limit Texas Hold'em, but based on personal communication with the authors, similar improvements have also been observed in no-limit with respect to the head-to-head performance metric. Since computing exploitability in no-limit is currently intractable, there are no results for that metric in that domain yet.
05-09-2015 , 06:32 PM
Quote:
Originally Posted by mythrilfox
samooth, you're missing the point. obviously there's value in computing exploitability and sam never said otherwise. the question is what value is there in computing exploitability using an abstraction. if we're exploitable for 5 bb/100 using an abstracted tree, what does that translate to in terms of exploitability using the full game tree? i'm guessing from sam's comments that meaningful translation of this sort is nontrivial if not impossible.

punter never really addressed this and instead asked what's the value in creating a strategy without being able to assess how far from equilibrium it is, which imo is a silly question. of course there's value. if, for instance, your algorithm generates successive strategies such that strategy n beats all strategies 1 to n-1, then you're getting closer to equilibrium with each successive iteration without ever knowing how far you are from it, and with enough iterations you will eventually arrive at equilibrium.

this relies on testing the value of the strategies empirically (i.e. by actually playing bajillions of hands), so as you got far enough along and started generating successive strategies that were extremely close to equilibrium, the edges would be so small that you would never be able to peg down precisely where equilibrium was. that doesn't mean there still isn't immense value in getting close to equilibrium, or at the very least demonstrably improving. it would be like saying we can't measure how far we are from a complete understanding of the laws of physics, so what value is there in even trying to understand them in the first place? literally all of science is measuring how a method/model compares to our current understanding & how well it performs empirically, not some sort of ground-up value assessment which would require complete omniscience of everything in the universe. generally speaking we only have the luxury of ground-up assessment in small and controlled systems that we humans created.
I definitely think that computing exploitability is important. Many of my papers use exploitability calculations whenever applicable/feasible, e.g., http://www.cs.cmu.edu/~sganzfri/Translation_IJCAI13.pdf, http://www.cs.cmu.edu/~sganzfri/Puri...on_AAMAS12.pdf.

The first approach for computing exploitability in LHE was developed just a few years ago and required sophisticated techniques to be feasible: http://webdocs.cs.ualberta.ca/~bowli...ijcai-rgbr.pdf. There are no known approaches for doing this in NL. I suppose the techniques from that paper could be applied to computing exploitability in NL if the game were restricted to one bet size everywhere. Researchers from the University of Alberta presented preliminary results of doing this, on their prior year's NL competition strategies, at the AAAI Computer Poker and Imperfect Information Workshop back in 2012 or 2013. The exploitability that was reported was extremely high. They did not end up publishing the result, and when I followed up they were still verifying it to ensure correctness.

It seems like this approach would be totally infeasible for computing exploitability with more than one bet size.

Some posters have talked about computing estimates of exploitability, e.g., fixing preflop strategies and computing "postflop exploitability," computing exploitability within some abstraction, etc. I'm just not sure what the merit of such a value would be. If I say "I computed approximate exploitability using abstraction A and assumptions X, Y, and Z and it was 27.2 BB/100," there are probably two people in the world who would care.

Like I mentioned, I'm also not really sure how feasible these computations would be for strategies based on imperfect-recall abstractions.

It's possible that there is something interesting here, but I just haven't had much time to look into this problem very carefully, and as described above my instinct is that it would not be very valuable scientifically at this point (and would possibly require a lot of time and effort).

What I do think would be valuable is an efficient general algorithm for computing best responses in games with imperfect recall, which would not be poker-specific and could theoretically have broader applications. If I'm able to come up with such an algorithm, then I would be interested in applying it to NLHE.
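To make the term concrete (a toy illustration only, not Claudico's code; real poker strategies live in huge extensive-form games): exploitability is just the amount a best-responding opponent wins against a fixed strategy. In a small zero-sum matrix game the best response is simply the opponent action with the highest expected payoff, so the whole computation is a few lines:

Code:
# Exploitability of a fixed strategy in a toy zero-sum matrix game
# (rock-paper-scissors; payoffs are for the row player).
payoff = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def best_response_value(row_strategy):
    """Max expected payoff the column player can get vs. a fixed row strategy."""
    # The column player's payoff is the negative of the row player's.
    return max(
        sum(row_strategy[r] * -payoff[r][c] for r in range(3))
        for c in range(3)
    )

print(best_response_value([1/3, 1/3, 1/3]))     # 0.0  -> unexploitable (equilibrium)
print(best_response_value([0.5, 0.25, 0.25]))   # 0.25 -> exploitable

In extensive-form games the same idea requires a walk over the full game tree, and with imperfect recall the standard best-response recursion no longer decomposes cleanly across information sets, which is exactly why a general imperfect-recall best-response algorithm would be worth having.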

Last edited by Sam Ganzfried; 05-09-2015 at 07:01 PM.
05-09-2015 , 06:40 PM
"That sounds like a lot, but over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can't be considered a statistically significant victory.

But a tie is significant in and of itself: It suggests that Claudico is at least a match for human players, at least one-on-one and in this particular game."

from the NBCnews article. wow...
05-09-2015 , 06:46 PM
Quote:
Originally Posted by 200zoomgrinder
WOW at tie. I thought Sam was saying that they would accept 95% confidence.
I did not make any comments regarding what confidence interval to use, whether it was a "tie," etc. If other people from CMU or the media want to claim it was a "statistical tie" that is their prerogative.
05-09-2015 , 06:49 PM
Absolutely hilarious how this was spun as a tie by the CMU team... Oh well, I guess it doesn't truly matter in the long run as computers will inevitably catch up.
05-09-2015 , 06:50 PM
Quote:
Originally Posted by Keruli
"That sounds like a lot, but over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can't be considered a statistically significant victory."
This is just pathetic...
05-09-2015 , 07:07 PM
Inb4 the professor says "I'm 95% confident that every hand is fifty-fifty".
05-09-2015 , 07:20 PM
Quote:
Originally Posted by samooth
Come on, man.

We had this thread in late September last year dedicated to Tartanian7: http://forumserver.twoplustwo.com/29...25/index5.html. In that thread, several posters pointed out that -- since the bot is playing a static strategy that resembles your best approximation of a NE in 200bb-deep HUNL -- an exploitability number, or at least a lower bound on exploitability, is not only useful but imperative for measuring how good the bot/strategy actually is. If you go to the last page of that thread, you can see that it eventually died with this exact discussion still going on, left unaddressed.

After a 6-month posting hiatus, you come back and post



Together with your questioning of the "scientific value" of exploitability, this post indicates that, for some reason, you and your team are unwilling to acknowledge the importance of the issue. punter11235 has made a lot of good posts explaining exploitability and its importance -- it's the most natural test, and it lets you/us actually quantify how good the approximation is, rather than relying on relative benchmarks (other bots/humans) whose results are also affected by variance. The fact that you beat other bots and that you (probably) lost with Claudico at a rate of about 9 bb/100 really tells us nothing beyond the obvious (x is better than y over z hands).

I'm not coming here to bring any hate; I have congratulated you on Tartanian7 and I really enjoyed following this challenge as well. It is simply inconceivable to me that you and your team apparently refuse to engage with the entire concept. It becomes even more inconceivable after reading



How are you going to publish papers based on these poker bots without giving out that number? If the answer is "the referees don't demand it," then that is in line with my own experience in academia (albeit in an unrelated field). meh

Also, this question still stands:



In contrast, however, you have quoted the Cepheus bot and its exploitability number, implying that you understand its usefulness -- which brings me to my last point:



You absolutely cannot recommend a technique for other domains like national security or medicine without measuring how well it actually works, so as long as you are not giving out that number...

Looking forward to your reply and to all the info on how Claudico works.
I addressed your questions in the post above.

I stopped posting in the prior thread because I was being harassed about the identity of the developers of a program we played against. It looks like you tacked on a big post at the end there.

Approaches from research on poker have already been successfully applied to other applications, e.g., robust policies for diabetes management, http://webdocs.cs.ualberta.ca/~bowli...2nips-kofn.pdf.
05-09-2015 , 07:32 PM
@Sam: Apologies, it was noambrown.
05-09-2015 , 07:53 PM
Quote:
Originally Posted by samooth
so you have an equilibrium-finding algorithm for an abstraction that uses imperfect recall, and thus the algorithm has no guarantee of converging to Nash, but you are confident it does because it works well "empirically"? and what exactly do you mean by "empirically"?

how can you not see the massive disconnect in that claim when you haven't measured how well the approximation process works?

One other point I'd like to add to my posts above is that it is sometimes very hard to prove theoretical guarantees for approaches to hard problems. E.g., there are no bounds for any of the abstraction algorithms that bucket hands together. It's pretty much impossible to prove anything theoretically when mapping 10^160 states down to 10^14. As far as I know, the only "lossy" abstraction algorithm with a provable bound is from colleagues at CMU, and it scales only to a poker game with a 5-card deck: http://www.cs.cmu.edu/~ckroer/papers...4-extended.pdf.
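To illustrate what "bucketing hands together" means in practice (a simplified sketch only; the features and clustering used by real abstraction algorithms, including Claudico's, are considerably more sophisticated): each hand is mapped to a small feature vector, such as its expected equity, and the hands are clustered so that strategically similar ones share a bucket; the equilibrium is then computed over buckets rather than over individual hands.

Code:
import random

random.seed(0)

# Hypothetical hand features: (equity, equity variance) for each hand.
# Real systems use richer features (equity distributions, potential-aware
# histograms, etc.); these values are made up for illustration.
hands = [(random.random(), random.random() * 0.1) for _ in range(1000)]

def kmeans(points, k, iters=20):
    """Plain k-means clustering; every point ends up assigned to one of k buckets."""
    centers = random.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((p[d] - centers[c][d]) ** 2 for d in range(2)))
            buckets[nearest].append(p)
        centers = [tuple(sum(p[d] for p in b) / len(b) for d in range(2)) if b else centers[i]
                   for i, b in enumerate(buckets)]
    return centers

centers = kmeans(hands, k=8)
# When solving the abstract game, each hand is replaced by the index of its
# nearest center, collapsing the state space by orders of magnitude.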

It's valuable to prove theoretical bounds on approaches, but it's also valuable to come up with agents for hard problems, like NLHE, which might involve some heuristics/approximations without provable guarantees/bounds.

It's pretty common to present theoretical analysis just for smaller domains, and then empirical results for larger "real-world" ones. E.g., a lot of poker papers present exploitability calculations on small variants like Leduc Hold'em as a proof of concept, but head-to-head empirical results for full NLHE, e.g., http://www.cs.cmu.edu/~sganzfri/Translation_IJCAI13.pdf, http://www.cs.cmu.edu/~kwaugh/publications/nips09.pdf.

Hopefully the theoretical bounds and the scalable heuristic approaches will eventually meet somewhere, but this is not always possible.

Last edited by Sam Ganzfried; 05-09-2015 at 08:22 PM.
05-09-2015 , 08:09 PM
I understand where the CMU public relations team is coming from when they call this a "statistical tie". However, I feel that the degree to which that (and the "winnings vs. money wagered" comparison) was pushed in the press release distributed to media outlets went too far, and that it is being used to color this more favorably in their self-interest, at the expense of the humans' accomplishment here.

I also want to say that, coming down the stretch in this challenge, my personal results vs. Claudico were often held up to the media as a trophy, as if they were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a team one. I don't recall the term "statistical tie" ever being used then.
05-09-2015 , 08:12 PM
Also I recorded this for 2p2 because I want to share what I thought was a great talk by Doug at the closing press conference of the challenge. He outlined everything quite well and I agree with everything he had to say.

https://www.youtube.com/watch?v=L10zaLtUabY
05-09-2015 , 08:13 PM
Quote:
Originally Posted by cheet
I understand where the CMU public relations team is coming from when they call this a "statistical tie". However, I feel that the degree to which that (and the "winnings vs. money wagered" comparison) was pushed in the press release distributed to media outlets went too far, and that it is being used to color this more favorably in their self-interest, at the expense of the humans' accomplishment here.

I also want to say that, coming down the stretch in this challenge, my personal results vs. Claudico were often held up to the media as a trophy, as if they were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a team one. I don't recall the term "statistical tie" ever being used then.
A university should be held to a higher standard than a beer commercial. Claiming a tie is straight-up academic dishonesty/fraud, and they should be sanctioned if not fired.
05-09-2015 , 08:27 PM
Quote:
Originally Posted by tenderloinig
I'm sure you say that in reverse. At the beginning of this you guys said 95% would be considered significant.
If Barcelona beat Bayern 4-3 in football, that would be a "tie" too, because it is not statistically significant.

Likewise, when Klitschko KO'd Pulev in boxing, that was statistically insignificant; it should be a "tie" too.

I suggest the poker media not spread the lie about a "tie." Just tell the truth:

Facts:
- 80k mirrored, equity-chopped hands were played,
- 7,320 big blinds were won by the Brains team,
- that is a winrate of more than 9 bb/100 against Claudico,
- the CMU team still calls a loss of 9 bb/100, significant at roughly the 92% level, a "tie," which is a joke to any poker player.

(Let me be a demagogue for a moment: any of us would happily play this "tie" game (sic, the professor's word) for life at high stakes. For example, at 200 hands/hour at NL2k this winrate means the human player makes about $360/hour pre-rake, which is a nice salary, and Claudico would lose on the order of $3M per year against the best human players playing two tables, also pre-rake... It is a crude calculation, but telling poker players lies about simple results is crude too.)
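For what it's worth, the hourly figure in that parenthetical does follow from its stated assumptions (a quick sanity check, taking NL2k to mean a $20 big blind; the yearly figure additionally depends on volume assumptions the post doesn't spell out):

Code:
big_blind_usd = 20        # NL2k, i.e. $10/$20 blinds (assumption)
winrate_bb_per_100 = 9    # roughly the observed winrate
hands_per_hour = 200

usd_per_hour = winrate_bb_per_100 * big_blind_usd * hands_per_hour / 100
print(usd_per_hour)       # 360.0 USD/hour, pre-rake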
05-09-2015 , 08:47 PM
Quote:
Originally Posted by NoamBrown
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
Would you please post the math for the 95% confidence interval?
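For reference, the standard way such an interval is computed (a sketch under a normal approximation over the mirrored-hand pairs; the sample numbers below are made up, and the actual CMU calculation may differ in detail):

Code:
import math

# Hypothetical per-pair results: bb won by the humans on each mirrored pair
# of hands. In the real match 80,000 hands were played, i.e. 40,000 pairs;
# these five values are made up for illustration.
pair_results_bb = [4.0, -12.5, 0.5, 33.0, -7.25]

n = len(pair_results_bb)
mean_bb = sum(pair_results_bb) / n
var_bb = sum((x - mean_bb) ** 2 for x in pair_results_bb) / (n - 1)
stderr_bb = math.sqrt(var_bb / n)

# Normal approximation: the 95% CI half-width is 1.96 standard errors.
half_width = 1.96 * stderr_bb

# Convert to the usual bb/100 units (each pair is two hands).
winrate_bb100 = mean_bb / 2 * 100
half_width_bb100 = half_width / 2 * 100
print(f"{winrate_bb100:.2f} bb/100 +/- {half_width_bb100:.2f} bb/100 (95% CI)")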
05-09-2015 , 09:07 PM
I'm disgusted.
05-09-2015 , 10:17 PM
yeah pretty ridiculous by these profs to consider this a tie.
05-09-2015 , 10:59 PM
Seems like the organizers should have said that the humans won, but that a result this large would still arise roughly 8% of the time by chance alone even if the humans had no actual edge. That would have made a lot more sense.
05-09-2015 , 11:23 PM
When all the hands are broadcast with hole cards face up, it should go without saying that the players will not bring their top default game. While 9 bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
05-09-2015 , 11:57 PM
Quote:
Originally Posted by fityfmi
When all the hands are broadcast with hole cards face up, it should go without saying that the players will not bring their top default game. While 9 bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
Would love to hear one of the players confirm this
05-10-2015 , 12:11 AM
I for one am just glad to see that AI developers can be just as delusional about variance, edges, and "I'm just running bad" as casual players.

Bots are the new fish, imo
05-10-2015 , 12:14 AM
Based on the results, what is the % chance that the players are better than the bot?
05-10-2015 , 12:15 AM
Quote:
Originally Posted by fityfmi
When all the hands are broadcasted with holecards up, it should go without saying that the players will not bring their top default game. While 9bb/100 is a crushing winrate, I doubt that this was the full potential of the human team.
You think that the players are actively playing less than optimally?
05-10-2015 , 12:34 AM
Quote:
Originally Posted by cheet

I also want to say that, coming down the stretch in this challenge, my personal results vs. Claudico were often held up to the media as a trophy, as if they were evidence that Claudico was beating an individual, despite the format and spirit of this competition being a team one. I don't recall the term "statistical tie" ever being used then.
Pretty gross that they did this to you; I noticed it throughout the contest myself. Not fair at all, especially given what they said before the hands were played and after. You handled it well though, Jason, and you handled their bot too.
05-10-2015 , 12:37 AM
If this result doesn't qualify as statistically significant, how likely was this challenge to generate a statistically significant result?
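A rough answer can be backed out from the numbers NoamBrown posted (a back-of-the-envelope power calculation under a normal approximation, not the organizers' analysis): with a 95% half-width of 10.35 bb/100, the humans' observed winrate would clear the significance threshold less than half the time even if their true edge were the 9.16 bb/100 they actually recorded.

Code:
from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

half_width = 10.35          # reported 95% CI half-width for the 80k-hand match, bb/100
stderr = half_width / 1.96  # implied standard error, about 5.28 bb/100

def power(true_edge_bb100):
    """Chance the humans' observed winrate exceeds the significance threshold,
    assuming a normal sampling distribution with the match's standard error."""
    return 1 - normal_cdf((half_width - true_edge_bb100) / stderr)

print(power(9.16))   # ~0.41: a true 9 bb/100 edge is detected less than half the time
print(power(15.0))   # ~0.81: the match was sized to detect only very large edges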

      