News, Views, and Gossip For poker news, views and gossip.

Old 05-08-2015, 02:56 PM   #1326
NoamBrown
stranger
 
Join Date: May 2015
Posts: 7
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

All-in equity didn't make a big difference because it was a pretty rare occurrence. It also turns out it may have unintentionally increased the variance slightly because hands play out differently between the mirrored pairs. Variance was also likely higher than usual because the bots (and, later, the humans) played very aggressively. There were a lot more all-ins than in usual play.
Old 05-08-2015, 03:04 PM   #1327
UpHillBothWays
adept
 
Join Date: Nov 2013
Posts: 978
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

very interesting stuff here.

question: at what confidence level are the results significant? 92%ish? 90%?

i think the article that was posted there is horrible. as one person said, many players would kill to have this "non-statistically significant" winrate in their accounts. most live pros wouldn't have "statistically significant win rates" at the 95% confidence level.

to me at least, this was a clear win by the pros. does anybody have the breakdown per person? how much each won and session stats?
Old 05-08-2015, 03:05 PM   #1328
restorativejustice
veteran
 
Join Date: Aug 2014
Posts: 2,753
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by potpotpondo View Post
When I saw it was Polk and Donger, I was confident in the humans.

When I saw Bjorn had been added, I grew even more confident.

When I saw Jason Les had been added, I knew humans were in trouble.
Clown-statement, bro.
Old 05-08-2015, 03:07 PM   #1329
Kirbynator
Carpal 'Tunnel
 
Join Date: Aug 2006
Posts: 52,005
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Noam is cool

*brofist*

(your bot lost though - but everyone will agree 9bb/100 isn't getting crushed against some of the best in the world)


Looking forward to the next challenge with roided 2.0 Claudico (maybe even human growth hormones?!)
Old 05-08-2015, 03:10 PM   #1330
Nit Bag
centurion
 
Join Date: May 2012
Posts: 160
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Hope the professor realizes that for any rematch of the same sample size they will probably be unable to ever claim victory within 95% as these players are probably easily within 10bb/100 of playing GTO if needed.
Old 05-08-2015, 03:17 PM   #1331
200zoomgrinder
journeyman
 
Join Date: Jan 2014
Posts: 366
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by Nit Bag View Post
Won't the equity chops and mirrored hands be reflected in actual St Dev observed in 80K hands without need for adjustment?
The equity chops would be, yeah, if they measured std dev (and I don't actually think you would use the per-hand standard deviation in a mirrored match, if my thinking is correct). The effect of the mirrored hands would not be included if you calculated using std deviation per hand.

I'm pretty sure (although I don't have any experience with mirrored matches) that the calculation would go like this. Humans won 7300BB mirrored. The standard deviation we would be measuring is the deviation between winnings for each pair of mirrored hands (40000 trials). Between 2 equal players the mean winnings for each pair is 0. So if Pair(1) was a cooler and humans lost 100BB and in the reverse hand Claudico lost 200BB, the deviation is 100. If Pair(2) was raised pre and BB folds, and in the reverse hand the same thing happened, the deviation is 0. Similarly if Pair(3) was KK vs AA and both groups got it all-in pre, using normal std deviation this would be a massive deviation, but using mirrored std deviation it would be 0. Compute all of these deviations (this would need to be done with a program, but shouldn't be too hard) and then compute the standard deviation from those results, which is very straightforward. Then use that std deviation to find the chances that two evenly matched players would have 1 player end up >= 7300BB ahead.

Writing it out like that, I would guess it would actually be a very significant reduction in std dev and would make it quite unlikely that Claudico is the real winner.
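
To make the three cases above concrete, here is a rough Python sketch. Only the 100BB and 200BB cooler amounts come from the post; the amounts for the fold pair and the all-in pair are invented, and `pairs`, `per_hand` and `pair_nets` are just names picked for the illustration.

```python
from statistics import pstdev

# Each tuple: (humans' result on the deal, humans' result on the mirrored deal), in BB.
# Only the cooler amounts come from the post; the others are invented for illustration.
pairs = [
    (-100.0, 200.0),   # Pair 1: humans lose 100BB, Claudico loses 200BB in the mirror
    (1.0, -1.0),       # Pair 2: raise and fold, then the same thing reversed
    (-200.0, 200.0),   # Pair 3: KK vs AA all-in pre, both ways
]

# Flatten to individual hands, and separately net each mirrored pair.
per_hand = [result for pair in pairs for result in pair]
pair_nets = [first + second for first, second in pairs]

print("net per pair:", pair_nets)
print("per-hand std dev:", round(pstdev(per_hand), 1))
print("per-pair std dev:", round(pstdev(pair_nets), 1))
```

With only three pairs the numbers mean nothing statistically; the point is just that the all-in pair dominates the per-hand spread but drops out of the per-pair spread.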

Edit: Saw some posts above so removed a bit of hyperbole...for now
Old 05-08-2015, 03:23 PM   #1332
Wasp
enthusiast
 
Join Date: Feb 2010
Posts: 70
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by 200zoomgrinder View Post
Doubtful CMU will go through calculating this though, although they probably would if they had won and were trying to publish a paper about it.
I am curious about their Tartanian bot (which won the AI competition handily, as CMU said).

How many bb/100 did it make? Was it "statistically significant"?

(I am a little mad...)
Old 05-08-2015, 03:28 PM   #1333
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by Wasp View Post
I am curious about their Tartanian bot (which won the AI competition handily, as CMU said).

How many bb/100 did it make? Was it "statistically significant"?

(I am a little mad...)
http://www.computerpokercompetition....owall=&start=2
Old 05-08-2015, 03:28 PM   #1334
polarizeddeck
newbie
 
Join Date: Apr 2015
Posts: 38
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

It seems disingenuous to claim that because the winrate was slightly outside the 95 pct confidence interval that it's a "tie." Some quick calcs suggest it was significant at the 92.3% level.

Second, this doesn't take the mirrored matchup into account. I disagree that the mirror format shouldn't be taken into account. For example, imagine if a human got aces vs kings and vice versa for the machine. Given an all-in preflop, this would add to the calculated variance but should really be taken out of consideration in the mirrored game. It's hard to know exactly how much to adjust for the mirror format, but accounting for this, 92.3 seems pretty close...
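
For anyone who wants to redo the quick calc, here is a minimal Python sketch. It assumes the +/- 10.35bb/100 figure is a two-sided 95% interval under a normal approximation, which is my reading of the developers' post and not something they confirmed; the variable names are mine.

```python
from statistics import NormalDist

ci_half_width = 10.35   # bb/100, reported 95% interval (assumed two-sided)
win_rate = 9.16         # bb/100, the pros' margin

normal = NormalDist()
z95 = normal.inv_cdf(0.975)                 # critical value for a two-sided 95% interval
std_error = ci_half_width / z95             # implied standard error, in bb/100
z = win_rate / std_error
two_sided_p = 2 * (1 - normal.cdf(z))
confidence = 1 - two_sided_p                # level at which the result is just significant

print(f"z = {z:.2f}, two-sided p = {two_sided_p:.3f}, significant at {confidence:.1%}")
```

Under those assumptions this comes out at roughly 92%, in the same neighborhood as the figure above. It does not adjust for the mirror format at all.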
Old 05-08-2015, 03:32 PM   #1335
Kirbynator
Carpal 'Tunnel
 
Join Date: Aug 2006
Posts: 52,005
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

speaking of, i don't think there was a single KK vs AA in 80 000 hands o.O.

Or QQ/AK vs AA from what I recall.

Odd
Old 05-08-2015, 03:39 PM   #1336
polarizeddeck
newbie
 
Join Date: Apr 2015
Posts: 38
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I'm not sure if the developers did this, but I think the correct way to calculate the stdev is like this:

Treat each pair of mirror hands as a single game (so there are 40000 games). The net profit or loss from each pair should be used to compute the standard deviation. This would automatically correct for cooler situations like aces vs kings (since the net pnl would usually be 0)
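
A rough Python sketch of that calculation, run on synthetic data since the real hand histories aren't in this thread. Everything numeric here is invented (the size of the card luck, the edge, the noise), so the size of the effect says nothing about the actual match; `z_score` is a hypothetical helper name.

```python
import math
import random
from statistics import mean, stdev

def z_score(samples):
    """z-score for the hypothesis that the true mean of `samples` is 0."""
    std_error = stdev(samples) / math.sqrt(len(samples))
    return mean(samples) / std_error

# Synthetic data only: the luck and edge sizes below are invented.
rng = random.Random(0)
hands, pair_nets = [], []
for _ in range(40_000):
    luck = rng.gauss(0, 15)               # card luck, reversed in the mirrored deal
    first = luck + rng.gauss(0.09, 5)     # humans' result on the deal, in BB
    second = -luck + rng.gauss(0.09, 5)   # humans' result on the mirrored deal
    hands += [first, second]
    pair_nets.append(first + second)      # net pnl for the pair; the luck cancels

z_per_hand = z_score(hands)       # treats all 80,000 hands as independent
z_per_pair = z_score(pair_nets)   # treats each mirrored pair as a single game
print(f"per-hand z = {z_per_hand:.2f}, per-pair z = {z_per_pair:.2f}")
```

The total won is identical in both views; only the estimated standard error changes. How much it shrinks in practice depends on how differently the two mirrored hands actually play out.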
Old 05-08-2015, 03:40 PM   #1337
Joshua
journeyman
 
Join Date: Sep 2002
Location: Monkey Tilt
Posts: 279
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by NoamBrown View Post
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
Thanks for coming here and replying.

Let me first say that Claudico seems like an impressive bot and I enjoyed following this competition.

In the article the professor really makes it sound like the result doesn't mean anything (which is far from the truth). But if that is the case then what can your team learn from this? Given Claudico's high variance play style (with huge all-in overbets) wasn't this result to be expected? Or did you just assume Claudico would crush the opposition?
Old 05-08-2015, 03:42 PM   #1338
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

So in the results for the poker bot competition... (If I am reading results table right)

Tartanian beat hyperborean and prelude by 2bb/100 but did exceed 95% confidence

It beat Slumbot by 3bb/100 but again, exceeded 95% confidence

(I guess they played lots of hands).

Interesting "Claudico 4th place" like math... in the total bankroll challenge (where an exploitative bot gets rewarded for pummeling an exploitable bot) - Tartanian was in 5th, but was awarded gold? (it did beat every other bot - but it didn't win the most in the round robin heads up)
Old 05-08-2015, 03:43 PM   #1339
TimTamBiscuit
veteran
 
Join Date: Oct 2007
Location: cRUSHed!!!!!!
Posts: 2,134
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by NoamBrown View Post
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
With respect, your use of +/- suggests you erred by incorrectly using a two-sided confidence interval. A priori it was more than reasonable to assume the humans are better than the AI. Hence the correct 95% test uses the one-sided hypothesis that Humans are better than AI. This puts the entire 5% rejection region in one tail, yielding a statistically significant Human victory at the 95% confidence level.

It is outrageous for an academic to be quoted in his own press release with such deliberately misleading nonsense as the Prof is quoting himself.
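
The one-sided arithmetic can be checked with a short Python sketch. It assumes the quoted +/- 10.35bb/100 is a two-sided 95% interval under a normal approximation; whether a one-sided test is the right choice is a separate question that really needs to be settled before the match, and the code takes no position on it.

```python
from statistics import NormalDist

normal = NormalDist()
ci_half_width = 10.35   # bb/100, quoted interval (assumed two-sided 95%)
win_rate = 9.16         # bb/100, the pros' margin

std_error = ci_half_width / normal.inv_cdf(0.975)       # implied standard error
one_sided_threshold = normal.inv_cdf(0.95) * std_error   # margin needed one-sided at 95%
one_sided_p = 1 - normal.cdf(win_rate / std_error)

print(f"one-sided 95% threshold = {one_sided_threshold:.2f} bb/100")
print(f"one-sided p = {one_sided_p:.3f}")
```

Under those assumptions the one-sided threshold lands below the 9.16bb/100 the pros won by, which is the point being made above.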
Old 05-08-2015, 03:46 PM   #1340
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Ah - they played 3000 hands per "match".

Please correct me if I am wrong... but they declare >95% confidence with 2bb / 100 WR in 3k hands... but not for -9bb/100 in 80k hands?

Some fuzzy math going on?
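
A quick way to sanity-check this is to scale the quoted interval by sample size, since the half-width shrinks with the square root of the number of hands. The Python sketch below assumes the bot-vs-bot matches have the same per-hand variance as the human match, which is almost certainly not exactly true, so treat the output as an order-of-magnitude guide only. The function names are made up.

```python
import math

REF_HANDS = 80_000
REF_HALF_WIDTH = 10.35   # bb/100, quoted 95% interval at 80,000 hands

def half_width(hands):
    """95% half-width in bb/100 at a different sample size, same variance assumed."""
    return REF_HALF_WIDTH * math.sqrt(REF_HANDS / hands)

def hands_needed(win_rate):
    """Hands needed before a given win rate (bb/100) clears the 95% half-width."""
    return REF_HANDS * (REF_HALF_WIDTH / win_rate) ** 2

print(f"half-width over 3,000 hands: {half_width(3_000):.1f} bb/100")
print(f"hands needed for 2 bb/100: {hands_needed(2):,.0f}")
print(f"hands needed for 9.16 bb/100: {hands_needed(9.16):,.0f}")
```

On these assumptions a 2bb/100 edge cannot be significant over a single 3000-hand match, so a significant 2bb/100 result would have to come from a far larger total number of hands.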
Old 05-08-2015, 03:48 PM   #1341
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

.. and never mind Bjorn Li winning at 24bb/100 over 20k hands
Old 05-08-2015, 03:51 PM   #1342
Wasp
enthusiast
 
Join Date: Feb 2010
Posts: 70
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by bip! View Post
So in the results for the poker bot competition... (If I am reading results table right)

Tartanian beat hyperborean and prelude by 2bb/100 but did exceed 95% confidence

It beat Slumbot by 3bb/100 but again, exceeded 95% confidence

(I guess they played lots of hands).

Interesting "Claudico 4th place" like math... in the total bankroll challenge (where an exploitative bot gets rewarded for pummeling an exploitable bot) - Tartanian was in 5th, but was awarded gold? (it did beat every other bot - but it didn't win the most in the round robin heads up)
As I assumed (they played 300k hands against each other). So 3bb/100 is a huge win if the CMU team wins, but if they lose by 10bb/100 it is just a tie.

I think it doesn't matter much whether the paid players totally agreed with the planned statement. I apologize for my harshness, CMU and Brains team.
Old 05-08-2015, 03:55 PM   #1343
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

http://www.computerpokercompetition..../96-2014-rules
The rules where it states 3000 hands per match

http://www.computerpokercompetition....5-2014-results
The format of the results (1/1000th of bb per hand) - so just take the value in the table /10 to get WR in bb/100 hands.

http://www.computerpokercompetition....n_uncapped.pdf

And the results^
Old 05-08-2015, 04:15 PM   #1344
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I can't figure out how many "matches" were played in the bot competition so I don't know how many hands were played. But for the bot competition they considered each opponent a separate statistical case - as they should do with each human opponent.

But my takeaway is this:

Beat other top bots for 2bb/100?... "Nuclear weapon for poker"

Lose to humans at 9bb/100?... "Claudico tie"...
Old 05-08-2015, 04:21 PM   #1345
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Ok - downloaded the HH files and it seems it played each opponent a differing number of hands. Perhaps to reach a statistical confidence? It seems to have played weak competition as little as 50k hands, but the tough competition as much as nearly 1mil hands?

Anyways /derail - sorry
Old 05-08-2015, 04:24 PM   #1346
feedmykids2
Pooh-Bah
 
Join Date: May 2012
Posts: 4,527
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by Kirbynator View Post
speaking of, i don't think there was a single KK vs AA in 80 000 hands o.O.

Or QQ/AK vs AA from what I recall.

Odd
They really only played 40,000 hands.

It does confuse me how everyone is saying they played 80,000 hands. 40,000 unique hands were played (each hand was played twice). I come from a bridge background (which is always played duplicate at tournaments or at a club), and the way this would be scored in bridge would be something like this:

In a 20,000 hand match, team dong/bjorn were +570k*
In a 20,000 hand match, team wcg/cheet were +200k*

The 2 teams each played 20,000 unique hands, and the amount won or lost by is the difference in their scores. This is the only thing that makes sense to me, as opposed to calculating their winrate over 80k hands.

* I am not sure about the exact final tally so I am estimating, but you get the idea
Old 05-08-2015, 04:26 PM   #1347
nburch
newbie
 
Join Date: Jan 2015
Posts: 47
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by bip! View Post
http://www.computerpokercompetition..../96-2014-rules
The rules where it states 3000 hands per match

http://www.computerpokercompetition....5-2014-results
The format of the results (1/1000th of bb per hand) - so just take the value in the table /10 to get WR in bb/100 hands.

http://www.computerpokercompetition....n_uncapped.pdf

And the results^
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.
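
To illustrate how the cap can change the ordering, here is a small Python sketch with two made-up agents. The win rates are invented and are not taken from the results tables. It uses the 1/1000 bb per hand unit from the results pages, and it assumes the cap is applied to each per-opponent win rate before averaging and that only wins are capped; both of those are my assumptions about the mechanics.

```python
CAP_MBB_PER_HAND = 750  # .75 big blinds/hand, expressed in 1/1000ths of a bb

def to_bb_per_100(mbb_per_hand):
    """Convert 1/1000 bb per hand into bb per 100 hands."""
    return mbb_per_hand / 10

def capped_average(win_rates):
    """Average win rate after capping each per-opponent win at the cap."""
    return sum(min(rate, CAP_MBB_PER_HAND) for rate in win_rates) / len(win_rates)

# Invented win rates (mbb/hand) against three opponents; the last one is a buggy bot.
agent_x = [10, 20, 2000]
agent_y = [40, 50, 700]

for name, rates in (("agent_x", agent_x), ("agent_y", agent_y)):
    uncapped = sum(rates) / len(rates)
    print(name, "uncapped:", to_bb_per_100(uncapped), "bb/100,",
          "capped:", to_bb_per_100(capped_average(rates)), "bb/100")
```

In this toy case agent_x is ahead without the cap purely because of the buggy opponent, while agent_y is ahead once the cap is applied.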
Old 05-08-2015, 04:30 PM   #1348
bip!
Slow Pony
 
Join Date: Oct 2012
Location: not on urban dictionary...
Posts: 13,705
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by nburch View Post
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.

Ah - ty for the clarification. Makes sense.
Old 05-08-2015, 04:37 PM   #1349
Wasp
enthusiast
 
Join Date: Feb 2010
Posts: 70
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

Quote:
Originally Posted by nburch View Post
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.
It is a great measurement. So if we remove the weakest opponents (fishes), Tartanian beat the field by 2 bb/100. We do the same in the Claudico vs Brains competition too, because the Brains vs fishes result doesn't matter either.

And can we say the Brains beat Claudico in at least about the same way as Tartanian beat the field of bots it played against?

If we can, can we say the Brains were like a "nuclear weapon" against CMU's poker artificial intelligence?
Old 05-08-2015, 04:47 PM   #1350
Frankie Fuzz
grinder
 
Join Date: Aug 2013
Posts: 534
Re: WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

I asked earlier in the thread how statistical significance would be calculated for this and nobody responded. Shouldn't this have been stated somewhere prior to the match?

All times are GMT -4.