WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

05-08-2015 , 02:56 PM
All-in equity didn't make a big difference because it was a pretty rare occurrence. It also turns out it may have unintentionally increased the variance slightly because hands play out differently between the mirrored pairs. Variance was also likely higher than usual because the bots (and, later, the humans) played very aggressively. There were a lot more all-ins than in usual play.
05-08-2015 , 03:04 PM
very interesting stuff here.

question: at what p-value are the results significant? 92%ish? 90%?

i think the article there that was posted is horrible. as one person said, many players would kill to have the non-statistically significant winrate's results in their accounts. most live pros wouldn't have "statistically significant win rates" at the 95% confidence level.

to me at least, this was a clear win by the pros. does anybody have the breakdown per person? how much each won and session stats?
05-08-2015 , 03:05 PM
Quote:
Originally Posted by potpotpondo
When I saw it was Polk and Donger, I was confident in the humans.

When I saw Bjorn had been added, I grew even more confident.

When I saw Jason Les had been added, I knew humans were in trouble.
Clown-statement, bro.
05-08-2015 , 03:07 PM
Noam is cool

*brofist*

(your bot lost though - but everyone will agree 9bb100 isnt getting crushed against some of the best in the world)


Looking forward to the next challenge with roided 2.0 Claudico (maybe even human growth hormones?!)
05-08-2015 , 03:10 PM
Hope the professor realizes that for any rematch of the same sample size they will probably be unable to ever claim victory within 95% as these players are probably easily within 10bb/100 of playing GTO if needed.
05-08-2015 , 03:17 PM
Quote:
Originally Posted by Nit Bag
Won't the equity chops and mirrored hands be reflected in actual St Dev observed in 80K hands without need for adjustment?
The equity chops would be yah, if they measured std dev (and I dont actually think you would use the standard deviation in a mirrored match if my thinking is correct). The effect of the mirrored hands would not be included if you calculated using std deviation per hand.

Im pretty sure (although I dont have any experience with mirrored matches) that the calculation would go like this. Humans won 7300BB mirrored. The standard deviation we would be measuring is the deviation between winnings for each paired mirror hands (40000 trials). Between 2 equal players the mean winnings for each hand is 0. So if Pair(1) was a cooler and humans lost 100BB and in the reverse hand Claudico lost 200BB the deviation is 100. If Pair(2) was raised pre and BB folds, and in the reverse hand the same thing happened, the deviation is _0_. Similarly if Pair(3) was KK vs AA and both groups got it allin pre, using normal std deviation this would be a massive deviation, but using mirrored std deviation it would be 0. Compute all of these deviations (this would need to be done with a program, but shouldnt be too hard) and then compute the standard deviation from those results, which is very straightforward. Then use that std deviation to find the chances that two evenly matched players would have 1 player end up >= 7300BB.

Writing it out like that, I would guess it would actually be a very significant reduction in std dev and would make it quite unlikely that Claudico is the true winner.
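
For anyone who wants to try it, here is a rough Python sketch of the paired calculation described above. It is only an illustration under stated assumptions: the function names are made up, the three example pairs use invented numbers in the spirit of Pair(1) to Pair(3), and the last step uses a normal approximation, which is my assumption and not something the developers said they used.

```python
import math
from statistics import NormalDist, stdev

def mirrored_pair_nets(pairs):
    """Net human result (in BB) for each mirrored pair.

    Each pair is (human result on the hand, human result on the mirror hand),
    both from the humans' point of view.
    """
    return [first + mirror for first, mirror in pairs]

def prob_at_least(total_bb, pair_nets):
    """Chance that evenly matched players end up >= total_bb apart.

    Assumes a mean of 0 per pair and a normal approximation for the sum.
    """
    n = len(pair_nets)
    sd_per_pair = stdev(pair_nets)           # spread of the paired results
    sd_total = sd_per_pair * math.sqrt(n)    # spread of the sum over n pairs
    return 1 - NormalDist().cdf(total_bb / sd_total)

# Illustrative pairs only (not real hand histories):
example_pairs = [
    (-100, 200),   # cooler: humans lose 100BB, Claudico loses 200BB in the mirror
    (1, -1),       # raise pre, BB folds, same thing happens in the mirror
    (-200, 200),   # KK vs AA all-in pre, then AA vs KK in the mirror
]
print(mirrored_pair_nets(example_pairs))
```

With the real hand histories you would feed all 40000 pair nets into `prob_at_least(7300, nets)`.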

Edit: Saw some posts above so removed a bit of hyperbole...for now
05-08-2015 , 03:23 PM
Quote:
Originally Posted by 200zoomgrinder
Doubtful CMU will go through calculating this though, although they probably would if they had won and were trying to publish a paper about it.
I am curious about their Tartanian bot (which won the AI competition by a wide margin, as CMU said).

How many bb/100 did it make? Was it "statistically significant"?

(I am a little mad...)
05-08-2015 , 03:28 PM
Quote:
Originally Posted by Wasp
I am curious about their Tartanian bot (which won the AI competition by a wide margin, as CMU said).

How many bb/100 did it make? Was it "statistically significant"?

(I am a little mad...)
http://www.computerpokercompetition....owall=&start=2
05-08-2015 , 03:28 PM
It seems disingenuous to claim that because the winrate fell slightly inside the 95 pct confidence interval around zero, it's a "tie." Some quick calcs suggest it was significant at the 92.3% level.

Second, this doesn't take the mirrored matchup into account. I disagree that the mirror format shouldn't be taken into account. For example, imagine if a human got aces vs kings and vice versa for the machine. Given an all-in preflop, this would add to the calculated variance but should really be taken out of consideration in the mirrored game. It's hard to know exactly how much to adjust for the mirror format but accounting for this, 92.3 seems pretty close...
05-08-2015 , 03:32 PM
speaking of, i dont think there was a single KK vs AA in 80 000 hands o.O.

Or QQ/AK vs AA from what I recall.

Odd
05-08-2015 , 03:39 PM
I'm not sure if the developers did this, but I think the correct way to calculate the stdev is like this:

Treat each pair of mirror hands as a single game (so there are 40000 games). The net profit or loss from each pair should be used to compute the standard deviation. This would automatically correct for cooler situations like aces vs kings (since the net pnl would usually be 0)
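
A toy simulation can show why treating each mirrored pair as one game matters. This is only a sketch with invented numbers: it assumes each hand's result is a card-driven swing (which flips sign in the mirror hand) plus some independent noise from how the hand was played. The function name and all parameter values are hypothetical.

```python
import random
from statistics import stdev

def simulate(n_pairs=20000, card_luck_sd=15.0, play_noise_sd=5.0, seed=0):
    """Compare the naive per-hand spread with the per-pair spread.

    Returns (sd of the total if hands are treated as independent,
             sd of the total if each mirrored pair is one game).
    """
    rng = random.Random(seed)
    per_hand, per_pair = [], []
    for _ in range(n_pairs):
        luck = rng.gauss(0, card_luck_sd)             # swing caused by the cards
        first = luck + rng.gauss(0, play_noise_sd)    # humans hold the good side
        mirror = -luck + rng.gauss(0, play_noise_sd)  # same cards, seats swapped
        per_hand += [first, mirror]
        per_pair.append(first + mirror)               # card swing cancels here
    naive_total_sd = stdev(per_hand) * (2 * n_pairs) ** 0.5
    paired_total_sd = stdev(per_pair) * n_pairs ** 0.5
    return naive_total_sd, paired_total_sd

naive, paired = simulate()
print(f"naive: {naive:.0f}  paired: {paired:.0f}")
```

In this toy model the paired figure comes out much smaller because the card swing cancels inside each pair. How big the effect is in the real match depends on how differently the mirrored hands actually played out.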
05-08-2015 , 03:40 PM
Quote:
Originally Posted by NoamBrown
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
Thanks for coming here and replying.

Let me first say that Claudico seems like an impressive bot and I enjoyed following this competition.

In the article the professor really makes it sound like the result doesn't mean anything (which is far from the truth). But if that is the case then what can your team learn from this? Given Claudico's high variance play style (with huge all-in overbets) wasn't this result to be expected? Or did you just assume Claudico would crush the opposition?
05-08-2015 , 03:42 PM
So in the results for the poker bot competition... (If I am reading results table right)

Tartanian beat hyperborean and prelude by 2bb/100 but did exceed 95% confidence

It beat Slumbot by 3bb/100 but again, exceeded 95% confidence

(I guess they played lots of hands).

Interesting "Claudico 4th place" like math... in the total bankroll challenge (where an exploitative bot gets rewarded for pummeling an exploitable bot) - Tartanian was in 5th, but was awarded gold? (it did beat every other bot - but it didn't win the most in the round robin heads up)
05-08-2015 , 03:43 PM
Quote:
Originally Posted by NoamBrown
Hey all,

I'm one of the Claudico developers and I'm sitting here chatting with the pros and Sam now. I thought I'd clear up some confusion about the statistics. We calculated the 95% confidence interval based on the 80,000 mirrored hands that were played and it was +/- 10.35bb/100. The pros won by 9.16bb/100. That's a pretty strong lead, but the result is not statistically significant at 95%.

We discussed this with the pros before we made the announcement and we all were pretty satisfied with how things were phrased. The title says that the pros finished ahead in chips, but the subheader says it was not a statistically significant result.
With respect, your use of +/- suggests you erred by using a two-sided confidence interval. A priori it was more than reasonable to assume the humans are better than the AI. Hence the correct 95% test uses the one-sided hypothesis that humans are better than the AI. This halves the p-value, yielding a statistically significant human victory at the 95% confidence level.
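
The two-sided vs one-sided point can be checked from the two numbers in the quoted post. A rough sketch, assuming the +/- 10.35bb/100 is a symmetric normal-approximation interval (the post does not say how it was computed) and using a made-up function name:

```python
from statistics import NormalDist

def significance(win_rate, ci_half_width, ci_level=0.95):
    """Recover the z-score and p-values from a reported confidence interval.

    Assumes the interval is win_rate +/- z_crit * standard_error.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + ci_level / 2)   # about 1.96 for a 95% interval
    std_err = ci_half_width / z_crit          # implied standard error, bb/100
    z = win_rate / std_err
    p_two_sided = 2 * (1 - nd.cdf(abs(z)))    # "either side could be better"
    p_one_sided = 1 - nd.cdf(z)               # "humans are better" only
    return z, p_two_sided, p_one_sided

# Figures from the quoted post: pros won by 9.16bb/100, interval +/- 10.35bb/100.
z, p_two, p_one = significance(9.16, 10.35)
print(f"z = {z:.2f}, two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```

Under those assumptions the two-sided p-value is roughly 0.08 (not significant at 95%) and the one-sided p-value is roughly 0.04 (significant at 95%), so the disagreement really is about which test is appropriate.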

It is outrageous for an academic to be quoted in his own press release with such deliberately misleading nonsense as the Prof is quoting himself.
05-08-2015 , 03:46 PM
Ah - they played 3000 hands per "match".

Please correct me if I am wrong... but they declare >95% confidence with 2bb / 100 WR in 3k hands... but not for -9bb/100 in 80k hands?

Some fuzzy math going on?
05-08-2015 , 03:48 PM
.. and nevermind bjorn Li winning at 24bb/100 over 20k hands
05-08-2015 , 03:51 PM
Quote:
Originally Posted by bip!
So in the results for the poker bot competition... (If I am reading results table right)

Tartanian beat hyperborean and prelude by 2bb/100 but did exceed 95% confidence

It beat Slumbot by 3bb/100 but again, exceeded 95% confidence

(I guess they played lots of hands).

Interesting "Claudico 4th place" like math... in the total bankroll challenge (where an exploitative bot gets rewarded for pummeling an exploitable bot) - Tartanian was in 5th, but was awarded gold? (it did beat every other bot - but it didn't win the most in the round robin heads up)
As I assumed (they played 300k hands against each other). So 3bb/100 is a huge win if the CMU team wins, but if they lose by 10bb/100 it is just a tie.

I think it doesn't matter much whether the paid players totally agreed with the planned statement. I apologize for my harshness, CMU and Brains team.
05-08-2015 , 03:55 PM
http://www.computerpokercompetition..../96-2014-rules
The rules where it states 3000 hands per match

http://www.computerpokercompetition....5-2014-results
The format of the results (1/1000th of bb per hand) - so just take the value in the table /10 to get WR in bb/100 hands.

http://www.computerpokercompetition....n_uncapped.pdf

And the results^
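
The unit conversion mentioned above (table values in 1/1000th of a bb per hand, divided by 10 to get bb/100) as a tiny sketch. The function name is made up and the sample value is only an example, not a number read from the results table.

```python
def mbb_per_hand_to_bb_per_100(mbb_per_hand):
    """Convert milli-big-blinds per hand to big blinds per 100 hands.

    mbb/hand divided by 1000 gives bb/hand, times 100 hands gives bb/100,
    so overall it is just a division by 10.
    """
    return mbb_per_hand / 10

# Example value only: a table entry of 20 would be a 2bb/100 win rate.
print(mbb_per_hand_to_bb_per_100(20))
```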
05-08-2015 , 04:15 PM
I can't figure out how many "matches" were played in the bot competition so I don't know how many hands were played. But for the bot competition they considered each opponent a separate statistical case - as they should do with each human opponent.

But my takeaway is this:

Beat other top bots for 2bb/100?... "Nuclear weapon for poker"

Lose to humans at 9bb/100?... "Claudico tie"...
05-08-2015 , 04:21 PM
Ok - downloaded the HH files and it seems it played each opponent a differing number of hands. Perhaps to reach a statistical confidence? It seems to have played weak competition as little as 50k hands, but the tough competition as much as nearly 1mil hands?

Anyways /derail - sorry
05-08-2015 , 04:24 PM
Quote:
Originally Posted by Kirbynator
speaking of, i dont think there was a single KK vs AA in 80 000 hands o.O.

Or QQ/AK vs AA from what I recall.

Odd
They really only played 40,000 hands.

It does confuse me how everyone is saying they played 80,000 hands. 40,000 unique hands were played (each hand was played twice). I come from a bridge background (which is always played duplicate at tournaments or at a club), and the way this would be scored in bridge would be something like this:

In a 20,000 hand match, team dong/bjorn were +570k*
In a 20,000 hand match, team wcg/cheet were +200k*

The 2 teams each played 20,000 unique hands, and the amount won or lost by is the difference in their scores. This is the only thing that makes sense to me, as opposed to calculating their winrate over 80k hands.

* I am not sure about the exact final tally so I am estimating, but you get the idea
05-08-2015 , 04:26 PM
Quote:
Originally Posted by bip!
http://www.computerpokercompetition..../96-2014-rules
The rules where it states 3000 hands per match

http://www.computerpokercompetition....5-2014-results
The format of the results (1/1000th of bb per hand) - so just take the value in the table /10 to get WR in bb/100 hands.

http://www.computerpokercompetition....n_uncapped.pdf

And the results^
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.
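
To make the cap concrete, here is a small sketch of capped scoring. It is my reading of the description above, not the competition's actual scoring code: the names are made up, the win rates are invented, and taking a plain average over opponents is an assumption.

```python
CAP_BB_PER_HAND = 0.75  # cap stated above

def always_fold_loss():
    """Average loss per hand of an agent that always folds, heads-up.

    It gives up the small blind (0.5bb) half the time and the
    big blind (1bb) the other half.
    """
    return (0.5 + 1.0) / 2

def capped_average(win_rates_bb_per_hand, cap=CAP_BB_PER_HAND):
    """Average win rate over opponents, with each matchup capped."""
    capped = [min(rate, cap) for rate in win_rates_bb_per_hand]
    return sum(capped) / len(capped)

# Invented win rates (bb/hand) against three opponents; the second opponent
# is a broken bot that gets crushed far beyond the cap.
rates = [0.1, 2.0, -0.05]
print(capped_average(rates))   # the 2.0 counts as 0.75
print(sum(rates) / len(rates)) # uncapped average, dominated by the broken bot
```

The cap is why the uncapped table can rank agents differently from the official total bankroll result.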
05-08-2015 , 04:30 PM
Quote:
Originally Posted by nburch
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.

Ah - ty for the clarification. Makes sense.
05-08-2015 , 04:37 PM
Quote:
Originally Posted by nburch
There's a distinction between capped and uncapped results, and there are two events with slightly different competitors (for example feste_iro and feste_tbr are two different agents for the two events, from the same competitor.)

The instant run-off event just does the obvious thing. The total bankroll event has had a cap on winnings for a few years: an agent's win rate against any other agent is capped at .75 big blinds/hand -- always fold. The cap was added because the entire competition was being decided by how everyone did against a single agent (often one with a bug that would make it do something like always call...)

So... Tartanian won the instant run-off event by not losing to anyone (requiring many hundreds of thousands of hands to be able to make that distinction.) That's http://www.computerpokercompetition....ts_2pn_iro.pdf

Tartanian won the total bankroll event by having the best average --capped-- payout against other bots. That's http://www.computerpokercompetition....ts_2pn_tbr.pdf

bip! linked to http://www.computerpokercompetition....n_uncapped.pdf which shows what would have happened --without-- the cap, which is not an official event. That summary is included because a bunch of people want to see the results without the cap. There are agents which do quite a bit better than tartanian at beating up the very weak bots at the bottom of the table.
It is a great measurement. So if we remove the weakest opponents (fish), Tartanian beat the field by 2 bb/100. We do it in the Claudico vs Brains competition too, because a Brains vs fish result doesn't matter either.

And can we say the Brains beat Claudico at least about the same way as Tartanian beat the field of bots it played against?

If we can, can we say Brains was like a "nuclear weapon" against CMU's poker artificial intelligence?
05-08-2015 , 04:47 PM
I asked earlier in the thread how statistical significance would be calculated for this and nobody responded. Shouldn't this have been stated somewhere prior to the match?