Quote:
Originally Posted by nburch
Ignoring everything but the math, they are different situations. With only 40k pairs of duplicate hands, all you can say is that, with 95% confidence, the expected value of humans vs. Claudico is between -1.19bb/100hand and +19.51bb/100hand (I think there's an assumption of stationarity here too, but... what else are you going to do?) The computer poker results have a smaller observed edge but much less variance, so that for example you can say with 95% confidence that the expected value of Tartanian7 vs. Prelude is between +0.398bb/100hand and +3.554bb/100hand: all positive.
TimTamBiscuit's point about 1-sided vs 2-sided is something else again. If we've assumed things are normally distributed and 9.16 +/- 10.35 is one 95% confidence interval, then >= 9.16-8.77 is another 95% confidence interval, and that one is all positive...
edit: (just to note the "we" here is an academic writing tic -- it's "we" as in all you readers, plus me. I'm not involved with the match in any way...)
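To make the one-sided vs. two-sided numbers above concrete, here is a rough Python sketch. It assumes the normal approximation and that 9.16 +/- 10.35 bb/100 is the two-sided 95% interval; the implied standard error is backed out from that, so small differences from the 9.16 - 8.77 figure in the post come down to rounding of the z-values:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Quoted two-sided 95% interval for humans vs. Claudico: 9.16 +/- 10.35 bb/100
mean, half_width = 9.16, 10.35

# Implied standard error under the normal approximation (z = 1.96 two-sided)
se = half_width / 1.96            # roughly 5.28 bb/100

# A one-sided 95% bound uses z = 1.645 instead of 1.96
one_sided_lower = mean - 1.645 * se
print(f"one-sided 95% lower bound: {one_sided_lower:+.2f} bb/100")
```

The point of the sketch is just that switching from a two-sided to a one-sided 95% interval moves the lower bound above zero, exactly as described in the post.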
Well, that seems logical. But: if it is a competition, then there has to be a winner sometimes, and if you use too high a confidence criterion there won't be one.
(If it is not a competition, there is only one conclusion we can draw: in a realistic scenario we can say, at 95% confidence, that the Claudico we saw would never be able to beat the rake against top regulars.)
But I am optimistic, and I think the event was what it claimed to be: a competition between near-top human players and the best poker AI of today. So we need to lower our requirements on confidence, and we can do it easily: let's compute a 90% or 80% confidence criterion when we play fewer hands against tougher competition. On a smaller sample this is a fair, scientifically correct, and acceptable way to decide who the winner is, if I see it right.
For example we can say: despite the small sample size, the humans were better than Claudico, but "only" with 90% confidence.
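A rough way to check that 90% figure, under the same normal approximation as above (the standard error here is a hypothetical value backed out of the quoted 9.16 +/- 10.35 interval, not a published number):

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Quoted two-sided 95% interval: 9.16 +/- 10.35 bb/100
mean = 9.16
se = 10.35 / 1.96                 # implied standard error, ~5.28 bb/100

z = mean / se                     # how many standard errors above zero, ~1.73

# Highest confidence level at which "humans > Claudico" still holds
one_sided_conf = norm_cdf(z)              # roughly 0.96
two_sided_conf = 2.0 * norm_cdf(z) - 1.0  # roughly 0.92

print(f"one-sided: {one_sided_conf:.3f}, two-sided: {two_sided_conf:.3f}")
```

So the observed human edge is significant at about 92% two-sided (or about 96% one-sided), which is consistent with saying the humans won "with 90% confidence" but not with the usual 95% two-sided standard.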
(I am not even mentioning that if you integrate the win-rate/probability curve, you get a much higher value for the human team than Tartanian achieved against the bots, not to mention the human-unfriendly environment, etc.)
Lowering the confidence criterion for a smaller sample size and tougher competition is logical, and the only approach that can be correct if we want to see a fair competition, right?