Sample Size and Statistics Question

05-24-2017 , 02:11 AM
Okay, this is a quick post for the actuaries on the board.
My only exposure to statistics has been a survey course in financial statistics, so I know the bare minimum.

In determining a winning player

I've heard a lot of people on the forum saying "you need well over X hands to confidently infer Y." The number varies from person to person and seems quite arbitrary, with little to no evidence to back the claims.

My Question

How is a proper sample size determined? How is our population parameter determined? Finally, do required sample sizes differ between live and online play, and if so, why?

Any help would be greatly appreciated.

#crush
05-24-2017 , 11:10 AM
DS?
05-24-2017 , 11:22 AM
Well, a player's poker results are assumed to have an average (the win rate) and a standard deviation, which measures how much the results swing up and down around that average over time.

You can measure the mean and standard deviation of any trending line. From that you can determine what confidence you have after x hands that the true mean is greater than zero. The more hands, the more confidence you have.

It's a statistical trick. Google "introduction to confidence intervals" and find an article or video that resonates with you. I'm sure it's explained there a lot better than I could explain it.
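A minimal sketch of the confidence-interval idea in Python, assuming each result (per hand or per session, in big blinds) is an independent draw from one unchanging distribution. The function name `winrate_ci` and the ten session results are made up for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev

def winrate_ci(results_bb, confidence=0.95):
    """Normal-approximation confidence interval for the true mean result.

    Assumes results are independent draws from one unchanging distribution.
    """
    n = len(results_bb)
    mean = fmean(results_bb)
    std_err = stdev(results_bb) / math.sqrt(n)      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical results of ten sessions, in big blinds.
sessions = [12, -30, 45, 5, -8, 60, -22, 15, 0, 33]
low, high = winrate_ci(sessions)
print(f"95% CI for bb/session: {low:.1f} to {high:.1f}")
```

Here the interval still includes zero, so ten sessions say very little. The width shrinks only with the square root of the number of results, and with this few data points a Student t quantile would be more appropriate than the normal z.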

Underlying that is the assumption that the events creating the data behave in generally the same way over the long run. That's a reasonable assumption most of the time. Occasionally it's unreasonable - for example if someone is a tiltmonkey sometimes or if the games change. But even then the variance of the variance will get less and less important over time, as you eventually capture all playing conditions and character traits.
05-24-2017 , 11:31 AM
Just push it man.

Have the best cards, the best bluffs, and the best semibluffs.
05-24-2017 , 10:54 PM
How much do you spend? That's how you know.
05-28-2017 , 05:04 PM
Quote:
Originally Posted by ToothSayer

You can measure the mean and standard deviation of any trending line. From that you can determine what confidence you have after x hands that the true mean is greater than zero.
Actually, you can determine the chance that you could get that lucky given that you are a break-even player. It's not quite the same thing. So if you got a result of one percent, it doesn't always mean that you are 99% to be a winning player. To see this, suppose you were wondering whether you had the ability to influence coin flips, and you proceeded to get seven flips in a row to do your bidding. That doesn't mean there is a 127/128 chance that you have this power. (In the poker example, suppose you were playing in a game with a 15% rake.)
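A sketch of this point in Python using exact fractions. The prior (how plausible the "power" was before any flips) is an input you have to choose, and the model assumes, purely for illustration, that someone with the power wins every flip while everyone else wins half of them. The function names are hypothetical.

```python
from fractions import Fraction

def p_lucky(n_flips):
    """Chance that an ordinary person wins n fair flips in a row by luck."""
    return Fraction(1, 2) ** n_flips

def p_power(prior, n_flips):
    """Bayes' rule: P(power | won n flips in a row).

    Assumes someone with the power always wins, so P(data | power) = 1.
    """
    luck = p_lucky(n_flips)
    return prior / (prior + luck * (1 - prior))

print(p_lucky(7))                              # 1/128
print(p_power(Fraction(1, 2), 7))              # 128/129
print(float(p_power(Fraction(1, 10**6), 7)))   # about 0.000128
```

Even a fifty-fifty prior gives 128/129 rather than 127/128, and a one-in-a-million prior leaves the "power" at roughly 0.013% after seven straight wins. The 1/128 only measures how unusual the streak would be for an ordinary flipper.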
05-28-2017 , 09:36 PM
Quote:
Originally Posted by Snomys27
How is a proper sample size determined? How is our population parameter determined? Finally, do required sample sizes differ between live and online play, and if so, why?

I have a classmate who is doing his Capstone project for his MS in Data Analytics (he just presented it in our spring Data Mining class). I thought about doing something similar, but the topic seems more complicated.

As for your question directly (and I'm no expert), here is what I can offer. The population parameters are unknown; we can only do our best to estimate them. This can be done by simulation or by collecting LOTS of data on hands played. Here is where it can get ugly. First, you have to define what the population is. Is it all hands played in limit or no-limit games? Then, are we talking about high limit, low limit, number of players, etc.? And this completely ignores player types. In practice, the population of interest should be clearly defined.

For example, this classmate has used his presentation in two different classes, and from what I understand he is still defining variables, but he has a huge data set: if I remember correctly, hundreds of thousands of hands played.

As for appropriate sample sizes, that is easy to work out once you know the margin of error you are comfortable with and have an estimate of the spread (a sample proportion and its standard error, or a standard deviation); you also, of course, choose the confidence level you want. In your question, I assume we would be estimating the probability of player A winning.
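The standard textbook formulas can be sketched in Python: n = z²·p(1−p)/E² for estimating a proportion, and n = (z·σ/E)² for estimating a mean such as a win rate. The function names and the 70bb/100 standard deviation in the example are illustrative assumptions, not measured values.

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def n_for_proportion(p_guess, margin, confidence=0.95):
    """Sample size to estimate a proportion within +/- margin: n = z^2 p(1-p) / E^2."""
    z = z_for(confidence)
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

def hands_for_winrate(sd_bb_per_100, margin_bb_per_100, confidence=0.95):
    """Hands to estimate a win rate within +/- margin (bb/100): (z*sd/E)^2 blocks of 100 hands."""
    z = z_for(confidence)
    return math.ceil((z * sd_bb_per_100 / margin_bb_per_100) ** 2) * 100

print(n_for_proportion(0.5, 0.05))   # 385
print(hands_for_winrate(70, 2))      # 470600
```

Using p = 0.5 is the conservative choice when you have no estimate yet, since p(1−p) is largest there. Halving the margin of error quadruples the required sample.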

As for online versus live: most random number generators are not truly random; they follow an algorithm. Usually they are seeded, meaning the generator starts from an initial state chosen unpredictably and simply follows the algorithm's sequence in order from there. If the generator is not seeded unpredictably, someone could figure out the pattern (especially with inside knowledge of the algorithm used). With a live game, there is statistical evidence that once a deck has been riffle-shuffled about 7 times (I would have to find the book to verify, but a simple Google search would also answer it), there is little real improvement from shuffling beyond that.
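To illustrate the seeding point, a tiny Python sketch. Python's `random` module uses the Mersenne Twister, which is not cryptographically secure, so this is not how a real site should deal (one would expect something like the `secrets` module or OS entropy instead); it only shows that a known seed means a known shuffle. The function name is hypothetical.

```python
import random

def shuffled_deck(seed):
    """Shuffle a 52-card deck with Python's (non-cryptographic) Mersenne Twister."""
    ranks = "23456789TJQKA"
    suits = "cdhs"
    deck = [r + s for r in ranks for s in suits]
    rng = random.Random(seed)   # same seed -> same sequence of "random" numbers
    rng.shuffle(deck)
    return deck

# Anyone who knows (or can guess) the seed can reproduce the exact shuffle.
assert shuffled_deck(12345) == shuffled_deck(12345)
print(shuffled_deck(12345)[:5])
```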

Well, that's my two cents.
05-29-2017 , 09:15 PM
You might try asking this on the Probability Forum. I think there are people there who have already given this question a lot of thought.


PairTheBoard
05-29-2017 , 10:28 PM
Quote:
Originally Posted by David Sklansky
Actually, you can determine the chance that you could get that lucky given that you are a break-even player. It's not quite the same thing. So if you got a result of one percent, it doesn't always mean that you are 99% to be a winning player. To see this, suppose you were wondering whether you had the ability to influence coin flips, and you proceeded to get seven flips in a row to do your bidding. That doesn't mean there is a 127/128 chance that you have this power.
This is a good point. It's very easy to plug in a bunch of hands and calculate a standard deviation and a 95% confidence interval, but you should at least do a quasi-Bayesian reality check using information we know about the game.

Like, if someone who's never played poker before sits down at a high-stakes game and beats it over 10k hands at a 95% confidence level, the most likely explanation is that he's just on a sick heater. The prior probability that a total noob could beat the game is far less than the 5% chance that a non-winning player runs that well by luck. Similarly, some guy who flips 12 heads in a row is probably just incredibly lucky rather than an expert coin flipper.

I think Bill Chen's book discusses this in detail.
05-30-2017 , 06:35 AM
It is a good point, but following it also leads to another type of error - confirmation bias. It's not like Sklansky's view is an error-free way of viewing reality. It comes with its own costs.

That's not obvious because he picked a great example (influencing coin tosses) where we know the priors are ridiculously skewed toward the null hypothesis.
05-30-2017 , 07:22 AM
Just do a Bayes calculation. Assume (but research this) that only 5% of the poker players who start out are winners, and that winners average, say, 0.02bb/h (big blinds per hand) at the level in question. Then calculate the chance that you are a winner given the results you got, assuming also that the average non-winning player is at, say, -0.05bb/h (rake etc. takes care of all this). I am making the numbers up, but in principle you can research them for a given level.

Then you can redo the Bayes calculation after a given number of hands, and when it gets above some probability you are comfortable with, you have shown you are a good player within that error level.

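P(winner | results) = P(results | winner) * P(winner) / P(results)
where P(results) = P(results | winner) * P(winner) + P(results | non-winner) * P(non-winner)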
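Here is a minimal sketch of that calculation in Python, using the made-up numbers above (5% winners, +0.02bb per hand for winners, -0.05bb per hand for everyone else) plus an assumed 7bb-per-hand standard deviation. Collapsing every player into one of two exact win rates is a simplification for illustration, and `p_winner` is a hypothetical name.

```python
import math
from statistics import NormalDist

def p_winner(total_bb, hands, prior=0.05, mu_win=0.02, mu_lose=-0.05, sd_per_hand=7.0):
    """P(winner | results) via Bayes' rule for a two-type player pool.

    Illustrative assumptions: every winner has true rate mu_win bb/hand, every
    non-winner mu_lose bb/hand, and per-hand results are independent with
    standard deviation sd_per_hand, so the total over `hands` hands is roughly
    normal with sd = sd_per_hand * sqrt(hands).
    """
    sd_total = sd_per_hand * math.sqrt(hands)
    like_win = NormalDist(mu_win * hands, sd_total).pdf(total_bb)    # P(results | winner)
    like_lose = NormalDist(mu_lose * hands, sd_total).pdf(total_bb)  # P(results | non-winner)
    evidence = like_win * prior + like_lose * (1 - prior)            # P(results)
    return like_win * prior / evidence

# Up exactly 0.02 bb/hand after 10k hands and after 100k hands:
print(round(p_winner(200, 10_000), 3))    # about 0.08
print(round(p_winner(2000, 100_000), 3))  # about 0.89
```

With these assumptions, running at the winners' rate leaves only about an 8% chance of being a winner after 10k hands, but about 89% after 100k hands: the low prior dominates until the sample is large.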


You can of course just calculate your standard deviation and your win rate; if the z-score, i.e. (result - avg)/sd, is large under the assumption of a break-even (0 EV) player, then you are likely a good one.
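The z-score check can be sketched in Python as follows, assuming independent hands so that the sd of the total grows with the square root of the number of hands. Keep Sklansky's caveat in mind: the p-value is the chance a break-even player does at least this well, not the chance that you are a break-even player. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_vs_breakeven(total_bb, hands, sd_per_hand):
    """z-score of a total result under the null hypothesis of a 0 EV player."""
    return total_bb / (sd_per_hand * math.sqrt(hands))

def p_this_lucky(total_bb, hands, sd_per_hand):
    """One-sided p-value: chance a break-even player does at least this well."""
    return 1 - NormalDist().cdf(z_vs_breakeven(total_bb, hands, sd_per_hand))

z = z_vs_breakeven(1000, 10_000, 7.0)     # +1000bb over 10k hands, sd 7bb/hand
print(round(z, 2), round(p_this_lucky(1000, 10_000, 7.0), 3))
```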

See also estimation of parameters for the normal distribution. Of course, make sure your sessions are reasonably similar, against a good mix of opponents representative of the stakes; this will become evident over time from how they feel to you.

https://en.wikipedia.org/wiki/Normal_distribution (look under estimation of parameters and then confidence intervals)

In my opinion, a strong player at lower stakes winning over 0.1bb/h will very rarely have a down month (let alone a down 3 months) if they play 300 hands per day or more.

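Basically 300*30 ~ 10,000 hands a month, so with an sd of 7bb/h, after 10k hands the sd is 7*sqrt(10,000) = 7*100 = 700bb and the expected win is 0.1*10,000 = 1000bb. A month is negative only if results come in 1000/700 ~ 1.43 sd below expectation, so a winning player with these numbers has a down month only ~8% of the time.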
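The same arithmetic as a small Python function, assuming, as the post does, that the total over a stretch of hands is roughly normal. The function name is hypothetical; the 9,000-hand line uses 300*30 exactly, and 27,000 hands approximates 3 months.

```python
import math
from statistics import NormalDist

def p_down_period(winrate_bb_per_hand, sd_bb_per_hand, hands):
    """Normal approximation to P(total result < 0) over a stretch of hands."""
    mean = winrate_bb_per_hand * hands
    sd = sd_bb_per_hand * math.sqrt(hands)
    return NormalDist(mean, sd).cdf(0)

print(round(p_down_period(0.1, 7, 10_000), 3))  # the post's rounded month
print(round(p_down_period(0.1, 7, 9_000), 3))   # 300 hands/day * 30 days
print(round(p_down_period(0.1, 7, 27_000), 3))  # roughly 3 months
```

This gives about 8% for the 10k-hand month, about 9% for 9,000 hands, and under 1% for a 3-month stretch.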

Last edited by masque de Z; 05-30-2017 at 07:36 AM.
06-01-2017 , 01:33 AM
Quote:
Originally Posted by masque de Z
Just do a Bayes calculation. Assume (but research this) that only 5% of the poker players who start out are winners, and that winners average, say, 0.02bb/h (big blinds per hand) at the level in question. Then calculate the chance that you are a winner given the results you got, assuming also that the average non-winning player is at, say, -0.05bb/h (rake etc. takes care of all this). I am making the numbers up, but in principle you can research them for a given level.

You can of course just calculate your standard deviation and your win rate; if the z-score, i.e. (result - avg)/sd, is large under the assumption of a break-even (0 EV) player, then you are likely a good one.

In my opinion, a strong player at lower stakes winning over 0.1bb/h will very rarely have a down month (let alone a down 3 months) if they play 300 hands per day or more.

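Basically 300*30 ~ 10,000 hands a month, so with an sd of 7bb/h, after 10k hands the sd is 7*sqrt(10,000) = 7*100 = 700bb and the expected win is 0.1*10,000 = 1000bb. A month is negative only if results come in 1000/700 ~ 1.43 sd below expectation, so a winning player with these numbers has a down month only ~8% of the time.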
I doubt that most players' distributions of returns look even remotely like a normal distribution, even if you are looking at win rate per hour (or per 8-hour session). Maybe if the player is 8-tabling it approaches a normal distribution per hour. You'd have to either use a test for normality or just eyeball it. You are likely to need some nonparametric statistics, but if a particular player's distribution looks normal, then you can proceed with parametric tools.

If you were going to do a Bayesian analysis, you should (imo) start with an initial assumption that you lose your proportion of the rake being taken off the table.* This is calculable from a hand history. It would be incredibly difficult to estimate the percentage of players who are profitable.

I strongly disagree with the bolded above. Even the very best players have large heavy-tailed deviations from their average returns. Even DS cannot be described as a particle.

*You could improve this by calculating the proportion of rake contributed according to preflop VPIP%, I think.
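A rough sketch of that starting assumption in Python. The function names and numbers are hypothetical, and splitting the rake by VPIP share is only one reading of the footnote, not an established method.

```python
def rake_share_prior(total_rake_bb, hands, players):
    """Baseline prior mean (bb/hand): you pay an equal 1/players share of the rake."""
    return -total_rake_bb / hands / players

def vpip_weighted_prior(total_rake_bb, hands, my_vpip, table_vpips):
    """Refinement (one reading of the footnote): weight your rake share by your
    VPIP relative to the whole table's, on the idea that players who enter
    more pots pay more of the rake."""
    share = my_vpip / sum(table_vpips)
    return -total_rake_bb / hands * share

# Hypothetical 6-max history: 10,000 hands with 3,000bb of rake taken.
print(rake_share_prior(3000, 10_000, 6))  # about -0.05
print(vpip_weighted_prior(3000, 10_000, 0.20, [0.20, 0.35, 0.40, 0.25, 0.30, 0.50]))  # about -0.03
```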
06-01-2017 , 01:38 AM
Quote:
Originally Posted by ToothSayer
It is a good point, but following it also leads to another type of error - confirmation bias. It's not like Sklansky's view is an error-free way of viewing reality. It comes with its own costs.

That's not obvious because he picked a great example (influencing coin tosses) where we know the priors are ridiculously skewed toward the null hypothesis.
Bayesian analysis is neutral on the confirmation bias scale.

I'm assuming that the person doing the calculations isn't throwing out data ("doesn't count, I was on tilt").
06-01-2017 , 06:52 AM
Quote:
Originally Posted by BrianTheMick2
I doubt that most players' distributions of returns look even remotely like a normal distribution, even if you are looking at win rate per hour (or per 8-hour session). Maybe if the player is 8-tabling it approaches a normal distribution per hour. You'd have to either use a test for normality or just eyeball it. You are likely to need some nonparametric statistics, but if a particular player's distribution looks normal, then you can proceed with parametric tools.

If you were going to do a Bayesian analysis, you should (imo) start with an initial assumption that you lose your proportion of the rake being taken off the table.* This is calculable from a hand history. It would be incredibly difficult to estimate the percentage of players who are profitable.

I strongly disagree with the bolded above. Even the very best players have large heavy-tailed deviations from their average returns. Even DS cannot be described as a particle.

*You could improve this by calculating the proportion of rake contributed according to preflop VPIP%, I think.
That's right, they do not look normal in the near term, like an hour or even more, because you can easily be down 3 buy-ins in 10 hands (an event a 10bb/h sd will never anticipate) in a super BS situation against luckboxes who have also tilted the hell out of you: you get cold-decked too, you push all in very frequently to destroy them, you get called by gutshot-straight BS or weak kickers that make two pair later, and the losing makes you play very aggressively even if it is still correct. But the idea here is that after many sessions, the accumulated results will be near normal even if the individual distributions are not, as long as they are not very, very wild and belong to some distribution. Even if you have a different distribution every session, their sum will eventually converge to normal as long as each has a finite sd and average.
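A quick Python simulation of this convergence argument, assuming a made-up and very lopsided per-hand distribution (lose 1bb 90% of the time, win 5bb 9%, win 50bb 1%). Single hands are extremely skewed; totals over blocks of 500 hands come out much closer to symmetric, as the central limit theorem predicts when the mean and sd are finite. The distribution and names are illustrative only.

```python
import random
import statistics

def hand_result(rng):
    """One made-up hand: lose 1bb (90%), win 5bb (9%), or win 50bb (1%)."""
    u = rng.random()
    if u < 0.90:
        return -1.0
    if u < 0.99:
        return 5.0
    return 50.0

def skewness(xs):
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(2017)  # fixed seed so the run is reproducible
singles = [hand_result(rng) for _ in range(100_000)]
blocks = [sum(hand_result(rng) for _ in range(500)) for _ in range(1_000)]
print(f"skewness of single hands:    {skewness(singles):.2f}")
print(f"skewness of 500-hand totals: {skewness(blocks):.2f}")
```

Skewness shrinks roughly with the square root of the block size, so short stretches can still look far from normal, which is consistent with both posts.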

Also, on the Bayesian thing, I was suggesting a line of thinking, not using any real numbers, as I said.

Also notice that in the bolded example I still gave an 8% chance of a down month. Personally, I never had a negative month playing poker at the stakes I always played. But I never played really high. The worst I ever had was 3 weeks down (that is also biased, because I tended to play relentlessly more hands when down, trying to recover and have my way with the process, which is a bit stupid if on tilt, but whatever; I was in control and more madly persistent than on a downward spiral). I also had a small volatility, like 6-7bb/h (which makes terrible variance harder to experience), as a player who always played carefully so as not to chase too much BS, and who wasn't terribly wild even when bluffing. Even when building a loose, wild image, I was careful not to overdo it when it made limited sense, choosing semibluffs over pure bluffs, plus playing tight preflop against reraises if the table was idiotic and the attention was on other players.
06-01-2017 , 11:13 AM
Quote:
Originally Posted by masque de Z
Just do a Bayes calculation. Assume (but research this) that only 5% of the poker players who start out are winners, and that winners average, say, 0.02bb/h (big blinds per hand) at the level in question. Then calculate the chance that you are a winner given the results you got, assuming also that the average non-winning player is at, say, -0.05bb/h (rake etc. takes care of all this). I am making the numbers up, but in principle you can research them for a given level.

You say that like it's easy, but in practical terms, you'd need prior information about the game that usually isn't readily available if you wanted to do a Bayesian analysis of an actual hand history.
06-01-2017 , 11:53 AM
In reality, we screw up the analysis because we usually set our Bayesian priors for our own skill tilted well toward the "skilled" level.
06-01-2017 , 03:59 PM
It's OK; one can even assume a variety of profiles and see that the Bayesian calculation yields over a 50% probability of being a winner when starting from a prior of, say, 5% or 10%. For certain good starts, it will be hard to argue it's luck if the sd is not huge and the win rate is big. That, of course, is true only at microstakes, or live where the rake is smaller than in some places.

I would buy that over 50% of the ultimate winners started as winners over their first many thousands of hands. Lower that even to 30% or 25% of winners, and it will still give you confidence if you have a good start over a deep enough sample.

Of course, it's hard to get such data, but not impossible.

I started as a winner, for example. Initially I never had a down period longer than a few days; then it became weeks, and it maxed out at 3 weeks years later. It's very hard to have a bad start if you are tight initially and play carefully, learning fast to become a bit balanced and unpredictable. Even if you are predictable, it works at levels where people do not pay attention.

Frankly, a good player can feel whether they have been lucky enough to invalidate a good win-rate estimate. For example, even if you are up, you can know you have played badly if you recall your all-ins: how many times you got lucky on bad all-ins, or how often you read people wrong and then got lucky. You can also keep track of your big preflop all-ins and see how often your AA held up, how often your lower pairs beat higher pairs, how often you flopped sets, etc. All it takes is a few lucky all-ins to change the direction, but a person who cares will remember that. In reality, you know within minutes whether the table is easy or tough, because you observe not only your own hands but also how others play when you are not involved. You can recognize their style fast.
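Tracking that luck can be made concrete by comparing each all-in's actual result with its expected result given your equity when the money went in (often called "all-in EV"). A minimal Python sketch; the record fields and the sample hands are hypothetical, and split pots are ignored.

```python
from dataclasses import dataclass

@dataclass
class AllIn:
    pot_bb: float       # final pot size in big blinds
    invested_bb: float  # your own chips in that pot
    equity: float       # your equity when the money went in (0 to 1)
    won: bool           # whether you actually won the pot

def luck_bb(all_ins):
    """Actual minus expected result summed over all-ins; > 0 means you ran above EV."""
    total = 0.0
    for a in all_ins:
        expected = a.equity * a.pot_bb - a.invested_bb
        actual = (a.pot_bb if a.won else 0.0) - a.invested_bb
        total += actual - expected
    return total

history = [
    AllIn(pot_bb=200, invested_bb=100, equity=0.82, won=False),  # big favourite, lost
    AllIn(pot_bb=200, invested_bb=100, equity=0.18, won=True),   # big underdog, won
    AllIn(pot_bb=100, invested_bb=50, equity=0.55, won=True),    # slight favourite, won
]
print(round(luck_bb(history), 1))  # 45.0
```

Subtracting the luck total from your actual results gives an all-in-adjusted figure with less of the short-term noise, although luck in hands that never reach an all-in is not captured.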

Last edited by masque de Z; 06-01-2017 at 04:10 PM.
06-01-2017 , 09:34 PM
Quote:
Originally Posted by masque de Z
I started as a winner, for example. Initially I never had a down period longer than a few days; then it became weeks, and it maxed out at 3 weeks years later. It's very hard to have a bad start if you are tight initially and play carefully, learning fast to become a bit balanced and unpredictable. Even if you are predictable, it works at levels where people do not pay attention.

I pretty much only play LO8 and am a nit. I'm pretty sure that you are underestimating how big variance is.

If you were playing at the beginning of the poker boom, then your results may vary. Winning was easy back then. You could simply nut peddle and win. That doesn't work anymore.
06-01-2017 , 09:37 PM
Quote:
Originally Posted by Trolly McTrollson
You say that like it's easy, but in practical terms, you'd need prior information about the game that usually isn't readily available if you wanted to do a Bayesian analysis of an actual hand history.
All you need is an estimate of how much money is taken off the table per unit of time or hands and the number of players.
06-01-2017 , 11:48 PM
This site might help: http://www.whatsmywinrate.com/
06-02-2017 , 04:08 AM
Quote:
Originally Posted by BrianTheMick2
I pretty much only play LO8 and am a nit. I'm pretty sure that you are underestimating how big variance is.

If you were playing at the beginning of the poker boom, then your results may vary. Winning was easy back then. You could simply nut peddle and win. That doesn't work anymore.
I never played LO8, only PLO8, but mostly NL Hold'em. Pot-limit Omaha is very volatile because people like to pot-reraise preflop with all kinds of hands, equities in all-ins are close, and if you see flops, opponents can have anything postflop, so it gets ugly quickly with stupid forced all-ins where you have to go for it. But Hold'em is very different. You can control volatility easily there.

The reason I do not play poker as much anymore is that the ahole republicants closed the system down and I lost all my online accounts, and to play live I need to drive to San Jose for tournaments and 1-2 or higher games. I can make more money per hour, like 10x more, working easy, enjoyable jobs that continue to educate me at the same time, and trading the market when I get good buy or sell signals.


When you go to high stakes, the win rate in NL drops to 0.02bb/h and volatility goes above 8bb/h for good players, and then it's ridiculously hard to beat volatility consistently.
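To put a number on "ridiculously hard", a quick Python sketch of how many hands it takes before a winner's expected profit sits two standard deviations above break-even, assuming independent hands and using the win rates and volatilities quoted in this thread. The two-sd threshold and the function name are arbitrary illustrative choices.

```python
def hands_to_separate(winrate_bb_per_hand, sd_bb_per_hand, k=2.0):
    """Hands until the expected total is k standard deviations above zero.

    Solves winrate * n = k * sd * sqrt(n)  =>  n = (k * sd / winrate) ** 2.
    """
    return (k * sd_bb_per_hand / winrate_bb_per_hand) ** 2

print(round(hands_to_separate(0.02, 8)))  # 640000 (high-stakes figures)
print(round(hands_to_separate(0.10, 7)))  # 19600 (lower-stakes figures)
```

Because n grows with the square of sd/win rate, cutting the win rate from 0.1 to 0.02bb/h multiplies the required sample by about 33 here, even with only slightly higher volatility.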

I never had volatility bigger than 7bb/h, and I consistently enjoyed 0.05-0.15bb/h win rates for long periods.

Sure, I have not played a lot recently, but who cares. If it's not worth it, it's BS to play for the rake criminals.


My argument is essentially this: I knew I could beat the game every time I played because I knew who I was playing against within minutes. There is no point playing against very advanced opponents; change tables. Within minutes you know whether you can win there or not. You know it because you see others make a ton of idiotic mistakes, which instantly means you have a better win rate.

A good player also recognizes their own luck. Like hell will I close my eyes to getting multiple AA or sets and winning big with them. That is not standard, and it means I got very good results based on luck (same with draws getting there more often than they should, or mistakes not getting punished). But a good player can make good luck work better than others can. You know you can beat the game even if you lose in the near term, because the stupid plays continue to exist and good reactions to them will pay off eventually.
06-02-2017 , 09:21 AM
Quote:
Originally Posted by BrianTheMick2
All you need is an estimate of how much money is taken off the table per unit of time or hands and the number of players.
I believe you would need a distribution of win rates at whatever stakes you're playing.
06-02-2017 , 09:23 AM
Quote:
Originally Posted by Trolly McTrollson
I believe you would need a distribution of win rates at whatever stakes you're playing.
Incidentally, does this data exist? Has anyone ever gotten the player records of a poker site? Would be fascinating to look at.
06-04-2017 , 10:43 AM
Quote:
Originally Posted by masque de Z
I never played LO8, only PLO8, but mostly NL Hold'em. Pot-limit Omaha is very volatile because people like to pot-reraise preflop with all kinds of hands, equities in all-ins are close, and if you see flops, opponents can have anything postflop, so it gets ugly quickly with stupid forced all-ins where you have to go for it. But Hold'em is very different. You can control volatility easily there.
LO8 is probably the lowest variance form of poker.
06-04-2017 , 10:21 PM
Quote:
Originally Posted by SublettingProblems
LO8 is probably the lowest variance form of poker.
That's why old retired farts on fixed incomes pollute all the tables; it's a damn geriatric ward most of the time. Smart casinos would change the high hand jackpot/bad beat to 55-gallon drums of industrial strength Viagra instead of money. It would draw more players.