How Many Trials to be Statistically Significant?

01-04-2009 , 05:53 AM
I'm asking this because I play on a major online site as you may suspect. I felt that I was losing way too many races but didn't know if maybe I was just remembering only those that I lost, so I thought the only way to decide was to do some real number crunching.

I'm not going to mention the site and this is not a bad beat whining post.

I'm trying to test the hypothesis: am I losing more races than I should? i.e., is a fair coin being tossed?

For this purpose, a trial is a race either pre-flop all-in or all-in after the flop, where there are cards yet to come, i.e. you are not drawing dead when the chips are all in.

I'm going through hand histories, most recent to oldest, looking for hands that were a race situation as mentioned.

Next, I'm using Pokerstove to calculate the probability of winning the hand. This gives me the expected outcome.

Next, I'm assigning a value of 0 to a lost race and 1 to a won race and .5 to a chop.

Then, I calculate Actual minus Expected.

I sum these over all trials (races) to get my result. Overall, given a fair game, the sum of all of the outcomes should be zero; i.e., races will even out over the long run within expected deviations.

Then, I calculated the standard deviation of these trials (races).
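For anyone who wants to replicate the bookkeeping in code rather than a spreadsheet, here's a minimal Python sketch; the equities and outcomes are made-up placeholders, not real hand data.

Code:
# Minimal sketch of the tabulation described above (placeholder data).
import statistics

equity = [0.55, 0.48, 0.62, 0.51]   # Pokerstove win probability per race
actual = [0.0, 1.0, 0.0, 0.5]       # 0 = lost, 1 = won, 0.5 = chop

diffs = [a - e for a, e in zip(actual, equity)]
print(sum(diffs))                   # should hover near 0 in a fair game
print(statistics.stdev(diffs))      # standard deviation of the trials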

This is where things get interesting.

Results

Trials 38 (this is over less than a week's hand histories)
Sum of Actual minus expected over 38 trials = -6.26
Std Deviation over 38 trials = .5

The result is that I am 12.52 std deviations off the expected value of 0. Anyone who knows statistics knows that this is off the charts.

I have a better chance of winning a drawing where everyone in the world gets one ticket and I get the winning one.

I was a math major at one time, but am many years removed from my statistics course. But given the low number of trials, can I conclude that the game is not fair?

I created a new i.d. for this post so that it could not be tied to my online handle.
01-04-2009 , 12:57 PM
You already have a good start for a chi-square test.

Chi-square is calculated as follows:

sum ( ( (observed - expected)^2 ) / expected )

So all you need to do is square the differences you have already calculated and divide those squared differences by the expected value. Then sum these up. The resulting value is distributed as chi-square with degrees of freedom equal to K - 1, where K is the number of hands in your sample.

You can look up the probability of obtaining this value in a chi-square table, or you can compute it using Excel: =chitest(observed range, expected range).

The result of the excel formula is the probability of getting the observed result if in fact there is no bias.
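If you'd rather script it than use Excel, the same calculation is a few lines of Python (a sketch assuming scipy is installed; the equities and outcomes are hypothetical placeholders):

Code:
# The chi-square recipe above, in Python (placeholder data).
from scipy.stats import chi2

expected = [0.55, 0.48, 0.62, 0.51]   # per-hand win probabilities
observed = [0.0, 1.0, 0.0, 0.5]       # 1 = won, 0 = lost, 0.5 = chop

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # K - 1 degrees of freedom
p = chi2.sf(stat, df)                 # right-tail p, as CHITEST/CHIDIST return
print(stat, p)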

As a forewarning though, doing a chi-square test isn't really a fair hypothesis test... you already suspected you were running bad over this small sample of hands, which is what motivated you to do the test. A more appropriate test would include all hands you have played on said site.

Sherman
01-04-2009 , 04:08 PM
Quote:
Originally Posted by Sherman
The result of the excel formula is the probability of getting the observed result if in fact there is no bias.
Sherman
Thanks for the feedback; I've Googled to get back up to speed on the chi-square test.

I'm getting a value of .998, which would say that the hypothesis that the game is fair can be rejected with a good amount of confidence, if I understand this correctly.

I'm going to continue populating the table (I'm up to 42 trials, which produced the .998).

Updated results
Trials 42
Result -6.4
Std Dev .49
Chi Dist .998

I'll continue to update as I populate the table more.

Comments?
01-04-2009 , 05:36 PM
I'm not really sure if your methodology is right. Why is the expected value of the game equal to 0? If you roughly win as much as you lose (EV=0), then in your reasoning the EV must be 0.5, since losing = 0 and winning = 1. So there's something wrong in your analysis, in my opinion.

I assume a fair game (like tossing a coin) is a game with an expected value of 0. For example, when you win 49% of the time, lose 49% of the time, and a tie occurs 2% of the time, the EV is (0.49)(+1) + (0.49)(-1) + (0.02)(0) = 0.

If I were you, I would do this:

winning = 1
losing = -1
tie = 0

This is logical, because when you win, you win value (+1). When you lose, you lose value (-1) and when it's a tie, both parties gain/lose the same or stay in the same condition (0).

To get the mean: take the sum of all outcomes and divide by the number of trials. To get the standard deviation: subtract the mean you just calculated from every outcome, then take the square of this difference. Sum up all these squared differences, and divide them by the number of trials minus 1. Take the square root of the number you obtain, and this is the standard deviation. You can just fill in all the outcomes (-1, 1 and 0) in Excel and use AVERAGE() and STDEV(), which will be way faster...
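The same recipe, step by step, as a Python sketch (standard library only; the outcome list is a made-up placeholder):

Code:
# Mean and sample standard deviation, exactly as described above.
import math

outcomes = [1, -1, -1, 0, 1, -1]     # +1 = win, -1 = loss, 0 = tie

mean = sum(outcomes) / len(outcomes)                          # AVERAGE()
var = sum((x - mean) ** 2 for x in outcomes) / (len(outcomes) - 1)
stdev = math.sqrt(var)                                        # STDEV()
print(mean, stdev)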

This mean or average should be close to 0 if the game is fair and the number of trials is not ridiculously small. Also, you need more than 30 trials to apply the confidence intervals I state later on.

Since I don't have your original data, I will just simulate 40 trials where you lose a bit more than you should expect to lose: you lost 23 times, won 15 times, and it was a tie 2 times.

The mean of these trials is -0.2 and the standard deviation is 0.96609.

Now you are testing the hypothesis that this game is fair, or formally, that the mean of this game is equal to 0. If we can reject H0, we can say that this game is not fair.

H0: µ = 0 (the game is fair)
H1: µ ≠ 0 (the game is unfair)

So you need a two-tailed test here... The simplest thing you can do is construct a 95% or 99% confidence interval... These intervals allow you to say: "I'm 95/99% sure that the population mean lies in this interval".

CI(95%) = mean ± 1.96 stdev/sqrt(n)
CI(99%) = mean ± 2.57 stdev/sqrt(n)

If the value 0 is not in that confidence interval, you can reject H0 and say that the game is unfair at the given confidence level (and, for example, take appropriate legal action).

So let’s construct those intervals:

CI(95%) = -0.2 ± 1.96 * (0.96609/sqrt(40))
CI(99%) = -0.2 ± 2.57 * (0.96609/sqrt(40))

CI(95%) = [ -0.4994, 0.0994 ]
CI(99%) = [ -0.5926, 0.1926 ]
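Here is a sketch that reproduces those intervals from the simulated sample (standard library only):

Code:
# Confidence intervals for the simulated 40 trials
# (23 losses, 15 wins, 2 ties).
import math, statistics

outcomes = [-1] * 23 + [1] * 15 + [0] * 2
n = len(outcomes)
mean = statistics.mean(outcomes)                 # -0.2
se = statistics.stdev(outcomes) / math.sqrt(n)   # standard error

for z, label in [(1.96, "CI(95%)"), (2.57, "CI(99%)")]:
    print(label, (round(mean - z * se, 4), round(mean + z * se, 4)))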

Since the value 0 is in both intervals, we can't reject H0, neither at the 95% nor at the 99% confidence level. So based on my simulated data (where you lost a bit more than expected), I still can't say it's an unfair game.

The number of trials doesn't really matter, but bigger is obviously better. Just make sure you have more than 30 trials, because otherwise you have to use a Student t-distribution instead of the normal distribution, and that just complicates matters too much...

This method I described will always give you a statistically valid answer (as long as trials > 30) at the 95% or 99% confidence level. But the conclusion could of course be that the game is actually fair. Maybe if you are really, really sure this game is unfair, you can do more trials to reduce the standard error (which is stdev / sqrt(n)). The more trials you have, the smaller the intervals become, so the more accurate your prediction becomes. Maybe then you can reduce an interval to one that does not contain the value zero. It is pretty unlikely, though, that you will find an interval that does not contain 0 if the game is actually fair.

So do this with your own data, and see what you get...
Let us know!
01-04-2009 , 06:32 PM
Quote:
Originally Posted by Riverdale27
I'm not really sure if your methodology is right. Why is the expected value of the game equal to 0? If you roughly win as much as you lose (EV=0), then in your reasoning the EV must be 0.5, since losing = 0 and winning = 1. So there's something wrong in your analysis, in my opinion.
Let me clarify and you'll see why I expect 0. We'll use the coin toss as an example:

Trial   Expected   Actual   Actual - Expected
1       .5         1         .5
2       .5         0        -.5
3       .5         1         .5
4       .5         0        -.5

So, in this coin example with four trials, the actual minus expected totals zero, showing a fair coin is being used. I'm using an expectation of zero for the sum over all trials of actual minus expected. The expected is my probability of winning as calculated by Pokerstove after all-in, with cards on the board yet to come and no one drawing dead.

The mean of actual minus expected should approach zero.
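A quick simulation makes the same point (a sketch; a fair coin is assumed):

Code:
# For a fair coin, the mean of (actual - expected) approaches 0.
import random

random.seed(1)                                 # reproducible
n = 100_000
diffs = [(1 if random.random() < 0.5 else 0) - 0.5 for _ in range(n)]
print(sum(diffs) / n)                          # close to 0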

I'll modify my spreadsheet to add the CI formulas you show and report back.

Given the clarification do you agree with my methodology?
01-04-2009 , 06:40 PM
Aha yes, now I can see what you mean. Then indeed the expected difference should be 0.

But why the detour of subtracting the EV from the outcome? You can also just average the outcomes themselves (0.5, 0 and 1 for tie/loss/win), which should have an average of 0.5 in a fair game. Then you just test:

H0: µ = 0.5
H1: µ ≠ 0.5

The detour of first subtracting the EV from every outcome and then taking the average of the (outcome-EV) numbers is not really efficient. Not that it really matters that much with small calculations like these, of course...

But yeah, I'm interested in the conclusions, so keep us posted
01-04-2009 , 06:49 PM
So, after 68 trials tabulated and using the confidence interval approach, I get:

mean = -.09
std dev = 0.46
n = 68

95% ci [-.20, .02]
99% ci [-.24, .05]

Which would indicate that I cannot say at this point the game is unfair.
01-04-2009 , 10:40 PM
I'm not sure what your .998 value is exactly, but if that is your chi-square value, there is no way that is statistically significant on any number of degrees of freedom, let alone 47. Further, if .998 is the value given by =chitest in Excel, then the .998 represents the probability of the data if the null hypothesis were true... in that case we have a very high probability and cannot conclude that the game is unfair.

Sherman
01-05-2009 , 12:18 AM
I'm taking a break from tabulating tonight. So far I have

n = 93
mean = -0.127
std dev = 0.461

95% ci [-0.221, -0.033]
99% ci [-0.250, -0.004]

This is telling me that what I've been suspecting has something to it. It is not reasonable to expect to lose so many races.


Sherman - I need to study the chi-square test, as I obviously do not understand what it means at this point. But the result I'm now getting from

Chitest(observed range, expected range) is .9999982

and as expected, I got the same value by doing the calculation manually and looking it up with the chidist(value, df) function.

What is this chi-square test telling me? If I understand you, it says we cannot conclude the game is unfair, even though the confidence intervals would indicate otherwise?
01-05-2009 , 01:15 AM
Quote:
Originally Posted by ElvisPresley
I'm taking a break from tabulating tonight. So far I have

n = 93
mean = -0.127
std dev = 0.461

95% ci [-0.221, -0.033]
99% ci [-0.250, -0.004]

This is telling me that what I've been suspecting has something to it. It is not reasonable to expect to lose so many races.


Sherman - I need to study the chi-square test, as I obviously do not understand what it means at this point. But the result I'm now getting from

Chitest(observed range, expected range) is .9999982

and as expected, I got the same value by doing the calculation manually and looking it up with the chidist(value, df) function.

What is this chi-square test telling me? If I understand you, it says we cannot conclude the game is unfair, even though the confidence intervals would indicate otherwise?
Then the value you are computing is in fact the p-value. The p-value is the probability of getting the data you got (or more extreme) if the sample came from a population that is unbiased (null population). In my opinion, it is pretty overwhelming evidence that you are in fact not running "unprobabilistically" worse than expectation.

Sherman
01-05-2009 , 11:54 AM
Quote:
Originally Posted by Riverdale27
But why the detour of subtracting the EV from the outcome? You can also just average the outcomes themselves (0.5, 0 and 1 for tie/loss/win), which should have an average of 0.5 in a fair game. Then you just test:
Because he's not testing if the game is "fair" in the sense that you mean (each player having a 50% chance of winning).

He's testing if the chance of winning a hand in reality is the same as the advertised chance of winning (based on the visible cards). He wants to know if the game is rigged.

OP's methodology of computing (Actual - Expected) is great for this purpose. I like the confidence interval methodology.

Chi-Square is a minefield of chances to get the methodology wrong in obscure ways. Also, any time you get a p-value larger than .99, it's very likely there's been a methodological error, because that p-value shows that the data fit the expected values TOO well to be believed. A p-value of .9999982 is like rolling a 6-sided die 6 thousand times and finding that you have rolled each value exactly one thousand times. It's too perfect.

ElvisPresley: if you PM me your table (or email me a spreadsheet at daniel@pokersleuth.com), I'd be happy to take a look at it.
01-05-2009 , 12:46 PM
Quote:
Originally Posted by Agthorr
Because he's not testing if the game is "fair" in the sense that you mean (each player having a 50% chance of winning).

He's testing if the chance of winning a hand in reality is the same as the advertised chance of winning (based on the visible cards). He wants to know if the game is rigged.

OP's methodology of computing (Actual - Expected) is great for this purpose. I like the confidence interval methodology.

Chi-Square is a minefield of chances to get the methodology wrong in obscure ways. Also, any time you get a p-value larger than .99, it's very likely there's been a methodological error, because that p-value shows that the data fit the expected values TOO well to be believed. A p-value of .9999982 is like rolling a 6-sided die 6 thousand times and finding that you have rolled each value exactly one thousand times. It's too perfect.

ElvisPresley: if you PM me your table (or email me a spreadsheet at daniel@pokersleuth.com), I'd be happy to take a look at it.
I agree with you regarding the size of the p-value, but calling chi-square a minefield for methodological error...? It is one of the most straightforward inferential statistics there is... but I suppose we could consider all inferential statistics a minefield for methodological error.

Sherman
01-05-2009 , 01:04 PM
Quote:
Originally Posted by Sherman
It is one of the most straightforward inferential statistics there is...but I suppose we could consider all inferential statistics a minefield for methodological error.
Well, for starters, the chi-square test is unreliable if any of the expected frequencies are less than 5. In the methodology you outlined (as I understand it), all of the expected outcomes are always less than or equal to 1.

Ergo, the gibberish p-value.
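To illustrate: feed perfectly fair outcomes into the per-hand chi-square recipe and the p-value still lands near 1, because every expected value is at most 1 (a sketch assuming scipy; the equities are made up):

Code:
# Fair data, yet a "too perfect" p-value from the per-hand recipe.
import random
from scipy.stats import chi2

random.seed(7)
expected = [random.uniform(0.3, 0.7) for _ in range(93)]            # fake equities
observed = [1.0 if random.random() < e else 0.0 for e in expected]  # fair game

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2.sf(stat, len(expected) - 1))   # typically > 0.99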

There are certain problems where Chi-square is great and it has a long history in certain disciplines (such as comparing a new drug with a placebo). For other problems, it's terrible.
01-05-2009 , 09:51 PM
Quote:
Originally Posted by Agthorr
ElvisPresley: if you PM me your table (or email me a spreadsheet at daniel@pokersleuth.com), I'd be happy to take a look at it.
Thanks for the offer. I'll email it to you.
01-06-2009 , 11:14 AM
Quote:
Originally Posted by ElvisPresley
Thanks for the offer. I'll email it to you.
Thanks. After playing around with various approaches for the confidence intervals, I decided to use the sledgehammer approach. I took your expected probability of winning each of the 93 hands and ran a simulation that generated random numbers to determine a simulated win or loss for each. I then ran 10,000 such simulations. Only 0.47% of the simulations ran worse than you; 99.53% did better.
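For reference, a sketch of that simulation (the equities and the actual win total are placeholders standing in for the real spreadsheet data; chops are ignored for simplicity):

Code:
# Monte Carlo version of the "sledgehammer" approach described above.
import random

random.seed(42)
equities = [random.uniform(0.3, 0.7) for _ in range(93)]  # stand-ins
actual_wins = 38.0    # placeholder for the player's actual total

worse = 0
sims = 10_000
for _ in range(sims):
    sim_wins = sum(1 for e in equities if random.random() < e)
    if sim_wins < actual_wins:
        worse += 1
print(worse / sims)   # fraction of simulations that ran worse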

Before you jump to any conclusions, I'd check the following:
  • Did you miss any all-in hands from your spreadsheet?
  • Did you type all of the relevant players and their hands into the spreadsheet correctly?
  • Did you enter the information correctly into Poker Stove (or similar tool)?
  • Did you copy the results correctly from Poker Stove (or similar tool)?

Maybe someday I'll write a tool to compute all of this stuff automatically from hand histories, to eliminate the possibility of typographical errors. Have to finish my HUD first, though.
01-06-2009 , 02:17 PM
Could I have that data too please?

kurt_verstegen@hotmail.com

Thank you!
01-06-2009 , 07:29 PM
Quote:
Originally Posted by Agthorr
Before you jump to any conclusions, I'd check the following:
  • Did you miss any all-in hands from your spreadsheet?
  • Did you type all of the relevant players and their hands into the spreadsheet correctly?
  • Did you enter the information correctly into Poker Stove (or similar tool)?
  • Did you copy the results correctly from Poker Stove (or similar tool)?
I'll go back and double-check the data, but I'm pretty detail-oriented, so I expect that it will be accurate. It may take a few days to do a thorough audit.

But this is consistent with what made me take on this exercise to begin with. Your results seem to be consistent with the 99% CI test as well.


Quote:
Originally Posted by Riverdale27
Could I have that data too please?

kurt_verstegen@hotmail.com

Thank you!
It's on its way.
01-07-2009 , 04:31 AM
Thanks for the data,

Now I get your point...

You have a series of hand matchups, where you have a certain equity, a priori. Then, a posteriori, you measure equity by just using 1 for a win, 0 for a loss, and 0.5, 0.33, 0.25, 0.2, etc. for a tie (depending on how many people you tie with).

Then by subtracting the result from the equity, you get a value that states how much under or over your expectation you ran. In the long run, this should converge to 0... because you are expected to equal the equity in the long run. So A (actual) - E (expected) converges to 0 over a long run...

So we test:

H0: A-E = 0
H1: A-E ≠ 0

In your data I was quite surprised that the 95% confidence interval of A-E was [ -0.22093831, -0.033441626 ]. So we can reject that A-E = 0 at the 95% confidence level. At the 99% confidence level, the interval is [ -0.25011509, -0.004264846 ]; it doesn't include 0 either... So either the game is unfair (with 99% confidence), or your sample is one that occurs very rarely... When the true population value of A-E is 0 and you get a confidence interval that does not contain 0, it seems to me you have been very unlucky by observing a sample that lies in the tail of the distribution...

I think it is pretty likely that the latter is the case here... Why? Well let's reason a bit.

If you are unlucky, then that means other players are very lucky. What could be the reason for this? The poker site earns rake from the buy-in of such SNGs, so there is absolutely no advantage for them in letting some players win and some others lose. The only thing that makes sense to me is that they have crazy matchups of hands, non-random matchups, that lead to a preflop all-in in cash games, for instance AA vs KK, or maybe TT vs 99 on a T99 flop or something like that... and that they maybe (out of laziness) use the same hand generator in SNGs (because they didn't want to correct the faulty one). With such a non-random matchup, the pot grows big and they get the maximum rake in that cash game. But letting one player be lucky and the other one be unlucky doesn't make sense to me. This only increases their downside risk (lawsuits, ...) without increasing the upside (extra big profits for the poker site)...

So for this reason I believe that you just had a really, really bad run... (which I've had too, sometimes...)

Last edited by Riverdale27; 01-07-2009 at 04:36 AM.
01-07-2009 , 09:28 AM
Quote:
Originally Posted by Riverdale27
Thanks for the data,

Now I get your point...

You have a series of hand matchups, where you have a certain equity, a priori. Then, a posteriori, you measure equity by just using 1 for a win, 0 for a loss, and 0.5, 0.33, 0.25, 0.2, etc. for a tie (depending on how many people you tie with).

Then by subtracting the result from the equity, you get a value that states how much under or over your expectation you ran. In the long run, this should converge to 0... because you are expected to equal the equity in the long run. So A (actual) - E (expected) converges to 0 over a long run...

So we test:

H0: A-E = 0
H1: A-E ≠ 0

In your data I was quite surprised that the 95% confidence interval of A-E was [ -0.22093831, -0.033441626 ]. So we can reject that A-E = 0 at the 95% confidence level. At the 99% confidence level, the interval is [ -0.25011509, -0.004264846 ]; it doesn't include 0 either... So either the game is unfair (with 99% confidence), or your sample is one that occurs very rarely... When the true population value of A-E is 0 and you get a confidence interval that does not contain 0, it seems to me you have been very unlucky by observing a sample that lies in the tail of the distribution...

I think it is pretty likely that the latter is the case here... Why? Well let's reason a bit.

If you are unlucky, then that means other players are very lucky. What could be the reason for this? The poker site earns rake from the buy-in of such SNGs, so there is absolutely no advantage for them in letting some players win and some others lose. The only thing that makes sense to me is that they have crazy matchups of hands, non-random matchups, that lead to a preflop all-in in cash games, for instance AA vs KK, or maybe TT vs 99 on a T99 flop or something like that... and that they maybe (out of laziness) use the same hand generator in SNGs (because they didn't want to correct the faulty one). With such a non-random matchup, the pot grows big and they get the maximum rake in that cash game. But letting one player be lucky and the other one be unlucky doesn't make sense to me. This only increases their downside risk (lawsuits, ...) without increasing the upside (extra big profits for the poker site)...

So for this reason I believe that you just had a really, really bad run... (which I've had too, sometimes...)
Thanks for the feedback. Yes, there are two possible conclusions: the game is unfair, or the game is fair and I've had extremely bad luck, cumulatively worse than getting hit by a one-outer on the river.

I don't rule out any possibility yet, as I don't underestimate greed (most recently, Madoff comes to mind), even in a cash-cow enterprise or its insiders. It just means that I need to keep digging and add to my worksheet to see if results normalize. I guess I wouldn't worry so much if it was a deviation the other way, but it's not.

If the game is not fair, what could be happening? A few possibilities come to mind:
1. The commonly put forth idea that poorer players are allowed to win to keep them playing.
2. There are insider-owned accounts, playing at all levels, that are given an unfair edge for obvious reasons. It's hard to detect small frequent hits, as opposed to the big greedy hits at high-stakes cash.
3. Other?

I can't prove 1 or 2; all I can do is analyze data and draw a conclusion about whether I'm throwing away my money or not.

What to do?
- Audit the current worksheet for accuracy
- Keep adding to the worksheet to see if results normalize to expected
- Write to the company and demand an explanation? Would my results suddenly turn better? The predictable response is that I'm only on a bad run and this is bound to happen given the large number of players and hands played, etc...
- Other?

I'd like to see if others out there have seen similar things and can back it up with data. To begin to assert anything, there would have to be similar results from several other players.
01-07-2009 , 10:15 AM
By the way, change one 0 to a 1 and the 99% confidence interval will contain zero (maybe you need to change 2 0's to 1's, in case the first 0 was in a situation where your equity was really high)...
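A sketch of that sensitivity check (the three diffs are tiny placeholders standing in for the real 93 values):

Code:
# Flip one loss to a win (adds +1 to that hand's actual-minus-expected
# entry) and recompute the 99% interval.
import math, statistics

def ci99(diffs):
    m = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return (m - 2.57 * se, m + 2.57 * se)

diffs = [0.5 - 0.55, 0.0 - 0.48, 1.0 - 0.62]     # placeholder sample
flipped = [diffs[0], diffs[1] + 1.0, diffs[2]]   # second hand now a win
print(ci99(diffs), ci99(flipped))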

So I just think you had some real bad luck...
01-07-2009 , 02:13 PM
As I mentioned in the beginning, any hypothesis test (and a confidence interval is a form of hypothesis test) on this data isn't really fair. You already had reason to suspect you were running bad because you had already "seen the data".

A more appropriate test would be to only include data that came after you suspected you were running bad.

Sherman
01-07-2009 , 02:44 PM
Huh?

Aren't you just testing whether A-E can be zero when you correct for sampling error? You have some sample with a number of values for A-E, and since you can't test them all, you just use sampling theory?

One's expectation will not change the data being observed, will it? Changing your expectation does not change the data or the conclusions drawn from it...
01-07-2009 , 03:21 PM
Quote:
Originally Posted by Riverdale27
Aren't you just testing that A-E can be 0 zero when you correct for sampling error? You have some sample with a couple of values for A-E, and since you can't test em all, you just use sampling theory?
Sampling theory works if you choose the samples uniformly at random. If you gather samples only when you feel like you've been running bad, then you don't have random samples.

I agree with Sherman.