Sample Size and Statistics Question - Science, Math and Philosophy Forum

Two Plus Two Forums Other Topics Science, Math, and Philosophy

Sample Size and Statistics Question

Page 1 of 2

05-24-2017 , 02:11 AM

Snomys27

newbie

Join Date: May 2017 Posts: 48

Okay, this is a quick post for the actuaries on the board.
My only exposure to statistics has been a survey course in financial statistics, so I know literally the bare minimum.

In determining a winning player

I've heard a lot of people on the forum say "you need well over X hands to confidently infer Y." The number often varies from person to person, and it seems quite arbitrary, with little to no evidence to back the claims.

My Question

How is the proper sample size determined? How is our population parameter determined? Finally, do required sample sizes differ between live and online play, and if so, why?

Any help would be greatly appreciated.

#crush

05-24-2017 , 11:10 AM

plaaynde

Poker Historian

Join Date: Jan 2010 Posts: 20,056

DS?

05-24-2017 , 11:22 AM

ToothSayer

Carpal \'Tunnel

Join Date: Jul 2015 Posts: 13,237

Well, poker results are assumed to have an average and a standard deviation, which measures how much the results jump up and down around that average over time.

You can measure the mean and standard deviation of any trending line. From that you can determine what confidence you have after x hands that the true mean is greater than zero. The more hands, the more confidence you have.

It's a statistical trick. Google "introduction to confidence intervals" and find an article or video that resonates with you. I'm sure it's explained there a lot better than I could do it.

Underlying that is the assumption that the events creating the data behave in generally the same way over the long run. That's a reasonable assumption most of the time. Occasionally it's unreasonable, for example if someone is sometimes a tiltmonkey or if the games change. But even then the variance of the variance will become less and less important over time, as you eventually capture all playing conditions and character traits.
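A minimal sketch of that calculation in Python. It assumes results are logged as big blinds won per 100-hand block (the block size and the numbers below are made up), treats blocks as independent, and uses a normal approximation for the interval. As the replies point out, an interval that sits above zero is not the same thing as the probability of being a winner.

```python
from statistics import NormalDist, mean, stdev

def winrate_interval(block_results, confidence=0.95):
    """Normal-approximation confidence interval for the true mean per block.

    block_results: big blinds won in each block of hands (e.g. per 100 hands).
    """
    n = len(block_results)
    m = mean(block_results)
    se = stdev(block_results) / n ** 0.5              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical results in bb per 100 hands, one entry per 100-hand block.
blocks = [12.0, -35.5, 48.0, 3.5, -20.0, 61.0, -8.0, 15.5, 27.0, -4.5]
low, high = winrate_interval(blocks)
print(f"95% CI for bb/100: {low:.1f} to {high:.1f}")
```

More blocks shrink the standard error like 1/sqrt(n), which is the "more hands, more confidence" point in code form.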

05-24-2017 , 11:31 AM

plaaynde

Poker Historian

Join Date: Jan 2010 Posts: 20,056

Just push it, man.

Have the best cards, the best bluffs, and the best semibluffs.

05-24-2017 , 10:54 PM

Number Zero

self-banned

Join Date: Mar 2017 Posts: 46

How much do you spend? That's how you know.

05-28-2017 , 05:04 PM

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,073

Quote:

Originally Posted by ToothSayer

You can measure the mean and standard deviation of any trending line. From that you can determine what confidence you have after x hands that the true mean is greater than zero.

Actually, what you can determine is the chance you could get that lucky given that you are a break-even player. It's not quite the same thing. So if you got a result of one percent, it doesn't always mean that you are 99% to be a winning player. To see this, suppose you were wondering whether you had the ability to influence coin flips and you proceeded to get seven flips in a row to do your bidding. That doesn't mean there is a 127/128 chance that you have this power. (In the poker example, suppose you were playing in a game with a 15% rake.)
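To put numbers on the coin example: seven called flips in a row has probability (1/2)^7 = 1/128 under pure luck, but the chance you actually have the power depends on how plausible it was beforehand. A sketch with Bayes' rule, where the one-in-a-million prior and the assumption that a real "power" wins every flip are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def posterior_power(streak, prior, p_hit_if_power=Fraction(1)):
    """P(power | streak of correct calls) via Bayes' rule."""
    like_power = p_hit_if_power ** streak     # P(streak | power)
    like_luck = Fraction(1, 2) ** streak      # P(streak | fair coin, no power)
    evidence = like_power * prior + like_luck * (1 - prior)
    return like_power * prior / evidence

luck = Fraction(1, 2) ** 7
print("P(7 in a row by luck):", luck)         # 1/128
print("Naive 'confidence':", 1 - luck)        # 127/128
post = posterior_power(7, prior=Fraction(1, 1_000_000))
print("Posterior P(power):", float(post))
```

With a tiny prior the posterior stays tiny even though a break-even (no-power) flipper would match the streak only 1 time in 128.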

05-28-2017 , 09:36 PM

1hitrquitr

stranger

Join Date: May 2017 Posts: 1

Quote:

Originally Posted by Snomys27

#crush

I have a classmate who is doing his Capstone project for his MS in Data Analytics (he just presented it in our Data Mining class this spring). I thought about doing something similar, but the topic seems more complicated.

As far as your question directly goes (and I'm not an expert), here is what I can provide. The population parameters are unknown; we can only do our best to estimate them. This can be done by simulation, or by collecting LOTS of data on hands played. Here is where it can get ugly. First you have to define what the population is. Is it all hands played in a limit or no-limit game? Then are we talking about high limit, low limit, number of players, etc.? And this completely ignores the types of players. In practice the population of interest should be clearly defined.

For example, this classmate has used his presentation in two different classes, and from what I understand he is still defining variables, but he has a very large data set: in the hundreds of thousands of hands played, if I remember correctly.

As for appropriate sample sizes, those are easy to compute once you know the margin of error you are comfortable with, have a sample proportion and standard error, and have chosen the confidence level you want. In your question I assume we would be estimating the probability of player A winning.

As for online vs. live: most random number generators are not truly random; they follow an algorithm. Usually they are seeded, meaning they pick a random point in the algorithm's sequence to begin and simply follow on in order from there. If the generator is not seeded, someone could figure out the pattern (especially with inside knowledge of the algorithm used). With a live game, there is statistical evidence that once a deck has been shuffled 4 times (maybe it's 7, but I'm leaning towards 4; I would have to find the book to verify, though I'm sure a simple Google search would also answer it), there is no real improvement from shuffling beyond that.

Well, that's my two cents.
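The standard textbook formulas behind that answer, as a sketch: to estimate a mean winrate to within a margin E at a confidence level with critical value z, n = (z*sigma/E)^2; for a proportion p, n = z^2 * p(1 - p) / E^2. The 80 bb/100 standard deviation below is an assumed, illustrative value, not a measured one.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """Samples needed so the CI half-width for a mean is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

def n_for_proportion(p, margin, confidence=0.95):
    """Samples needed to estimate a proportion p to within +/- margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Winrate in bb/100 with an assumed SD of 80 bb/100:
# how many 100-hand blocks to pin the winrate down to +/- 2 bb/100?
n_blocks = n_for_mean(sigma=80, margin=2)
print(n_blocks, "blocks, i.e.", n_blocks * 100, "hands")
# Worst case (p = 0.5) for a win/lose proportion within +/- 3 points.
print(n_for_proportion(0.5, 0.03))
```

Halving the margin quadruples the sample, which is one reason the "X hands" figures people quote vary so much.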
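The "how many shuffles" question has a well-known exact treatment for riffle shuffles (Bayer and Diaconis's analysis of the Gilbert-Shannon-Reeds model). A sketch that computes the total variation distance between a 52-card deck after k riffle shuffles and a perfectly random deck, assuming real shuffles follow that model. It relies on two facts: after k shuffles a permutation with r rising sequences has probability C(2^k + n - r, n) / 2^(nk), and the number of such permutations is an Eulerian number.

```python
import math
from fractions import Fraction

def eulerian_row(n):
    """Eulerian numbers A(n, d): permutations of n items with d descents."""
    row = [1]                                  # n = 1
    for size in range(2, n + 1):
        new = [0] * size
        for d in range(size):
            left = row[d - 1] if d >= 1 else 0
            right = row[d] if d < len(row) else 0
            new[d] = (size - d) * left + (d + 1) * right
        row = new
    return row

def riffle_distance(k, n=52):
    """Total variation distance from uniform after k GSR riffle shuffles."""
    a = 2 ** k
    uniform = Fraction(1, math.factorial(n))
    total = Fraction(0)
    for descents, count in enumerate(eulerian_row(n)):
        # r rising sequences <-> r - 1 descents (of the inverse permutation)
        rising = descents + 1
        p = Fraction(math.comb(a + n - rising, n), a ** n)
        total += count * abs(p - uniform)
    return total / 2

for k in range(1, 11):
    print(k, round(float(riffle_distance(k)), 3))
```

Under this model the distance stays essentially at 1 through four shuffles and only falls to roughly a third at seven, which is where the often-quoted "seven shuffles" figure comes from.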

05-29-2017 , 09:15 PM

PairTheBoard

Carpal \'Tunnel

Join Date: Dec 2003 Posts: 10,037

You might try asking this on the Probability Forum. I think there are people there who have already given this question a lot of thought.

PairTheBoard

05-29-2017 , 10:28 PM

Trolly McTrollson

Carpal \'Tunnel

Join Date: Oct 2010 Posts: 36,712

Quote:

Originally Posted by David Sklansky

This is a good point. It's very easy to plug in a bunch of hands and calculate a standard deviation and a 95% confidence interval, but you should at least do a quasi-Bayesian reality check using information we know about the game.

Like, if someone who's never played poker before sits down at a high-stakes game and beats it over 10k hands with a 95% confidence level, the most likely explanation is that he's just on a sick heater. The probability that a total noob could beat the game is far less than the 5% chance that the player is on a lucky streak. Similarly, some guy who flips 12 heads in a row is probably just incredibly lucky rather than an expert coin flipper.

I think Bill Chen's book discusses this in detail.

05-30-2017 , 06:35 AM

#10

ToothSayer

Carpal \'Tunnel

Join Date: Jul 2015 Posts: 13,237

It is a good point, but following it also leads to another type of error: confirmation bias. It's not like Sklansky's view is an error-free way of viewing reality. It comes with its own costs.

That's not obvious because he picked a great example (influencing coin tosses) where we know the priors are ridiculously skewed toward the null hypothesis.

05-30-2017 , 07:22 AM

#11

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

Just do a Bayes calculation. Assume that in general (but research this) only 5% of the poker players who start are winners, and also that winners average, say, a 0.02 bb/h (big blinds per hand) winrate at the level in question. Then calculate the chance that you are a winner given the results you got, given also that the average non-winning player has, say, -0.05 bb/h (the rake etc. takes care of all this). I am making these numbers up, but in principle you can research them for a given level.

Then you can do a Bayes calculation after a given number of hands, and when it gets above some probability you feel is good, you have shown you are a good player within that error level.

P(winner | results) = P(results | winner) * P(winner) / P(results)

You can of course just calculate your standard deviation and your winrate, and if the z-score, (result - avg)/sd, is big under the assumption of a break-even (0 EV) player, then you are likely a good one.

See also estimation of parameters for the normal distribution. Of course, make sure your sessions are reasonably similar, against well-mixed opponents who are representative of the stakes, which will become evident over time by how they feel to you.

https://en.wikipedia.org/wiki/Normal_distribution (look under estimation of parameters and then confidence intervals)

In my opinion, a strong player at lower stakes with a winrate over 0.1 bb/h will very rarely have a down month, or a down 3 months, if they play 300 hands per day or more.

Basically 300*30 ≈ 10,000 hands a month, so with an sd of 7 bb/h, after 10k hands the sd of the total is 7*sqrt(10,000) = 7*100 = 700 bb, while the expected win is 0.1*10,000 = 1,000 bb, about 1.43 sd above zero. So a winning player with these numbers has a down month only about 8% of the time.
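A sketch of that Bayes calculation in Python, using the made-up numbers from this post (5% of players are winners, winners average 0.02 bb per hand, everyone else -0.05) plus an assumed standard deviation of 7 bb per hand, and treating the total result over N hands as normally distributed. All of these inputs are placeholders to be replaced with researched values.

```python
from statistics import NormalDist

def p_winner(total_bb, hands, prior=0.05, win_rate=0.02,
             lose_rate=-0.05, sd_per_hand=7.0):
    """P(winner | results) = P(results | winner) P(winner) / P(results)."""
    spread = sd_per_hand * hands ** 0.5                      # SD of the total
    like_win = NormalDist(win_rate * hands, spread).pdf(total_bb)
    like_lose = NormalDist(lose_rate * hands, spread).pdf(total_bb)
    evidence = like_win * prior + like_lose * (1 - prior)   # P(results)
    return like_win * prior / evidence

# Hypothetical: up 1,000 bb after 10,000 hands.
print(round(p_winner(1000, 10_000), 3))
```

Rerunning it with prior=0.10 or prior=0.25 shows how much the answer leans on the starting assumption, which is exactly what the replies argue about.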
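The down-month arithmetic as a short sketch, under the post's assumptions (0.1 bb per hand winrate, 7 bb per hand standard deviation, about 10,000 hands a month) and a normal approximation for the monthly total. Read the other way, the same z-score gives the one-sided chance that a break-even player would post a result this good.

```python
from statistics import NormalDist

def p_losing_stretch(winrate_per_hand, sd_per_hand, hands):
    """Chance the total over `hands` ends below zero (normal approximation)."""
    expected = winrate_per_hand * hands        # e.g. 0.1 * 10,000 = 1,000 bb
    spread = sd_per_hand * hands ** 0.5        # e.g. 7 * 100 = 700 bb
    return NormalDist(expected, spread).cdf(0)

print(round(p_losing_stretch(0.1, 7.0, 10_000), 3))
# Three months at the same volume:
print(round(p_losing_stretch(0.1, 7.0, 30_000), 4))
```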

Last edited by masque de Z; 05-30-2017 at 07:36 AM.

06-01-2017 , 01:33 AM

#12

BrianTheMick2

Long way to go and a short time to get there.

Join Date: May 2012 Posts: 19,410

Quote:

Originally Posted by masque de Z

Just do a Bayes calculation. Assume that in general (but research this) only 5% of the poker players who start are winners, and also that winners average, say, a 0.02 bb/h (big blinds per hand) winrate at the level in question. Then calculate the chance that you are a winner given the results you got, given also that the average non-winning player has, say, -0.05 bb/h (the rake etc. takes care of all this). I am making these numbers up, but in principle you can research them for a given level.

Then you can do a Bayes calculation after a given number of hands, and when it gets above some probability you feel is good, you have shown you are a good player within that error level.

P(winner | results) = P(results | winner) * P(winner) / P(results)

You can of course just calculate your standard deviation and your winrate, and if the z-score, (result - avg)/sd, is big under the assumption of a break-even (0 EV) player, then you are likely a good one.

See also estimation of parameters for the normal distribution. Of course, make sure your sessions are reasonably similar, against well-mixed opponents who are representative of the stakes, which will become evident over time by how they feel to you.

https://en.wikipedia.org/wiki/Normal_distribution (look under estimation of parameters and then confidence intervals)

In my opinion, a strong player at lower stakes with a winrate over 0.1 bb/h will very rarely have a down month, or a down 3 months, if they play 300 hands per day or more.

Basically 300*30 ≈ 10,000 hands a month, so with an sd of 7 bb/h, after 10k hands the sd of the total is 7*sqrt(10,000) = 7*100 = 700 bb, while the expected win is 0.1*10,000 = 1,000 bb, about 1.43 sd above zero. So a winning player with these numbers has a down month only about 8% of the time.

I doubt that most players' distributions of returns look even remotely like a normal distribution, even if you are looking at winrate per hour (or winrate per 8-hour session). Maybe if the player is 8-tabling it approaches a normal distribution per hour. You'd have to either use a test for normality or just eyeball it. You are likely to need some nonparametric statistics, but if the particular player's distributions look normal, then you can proceed with parametric tools.

If you were going to do a Bayesian analysis, you should (imo) start with the initial assumption that you lose your share of the rake being taken off the table.* This is calculable from a hand history. It would be incredibly difficult to estimate the percentage of players who are profitable.

I strongly disagree with the bolded above. Even the very best players have large, heavy-tailed deviations from their average returns. Even DS cannot be described as a particle.

*You could improve this by calculating the proportion of rake contributed according to preflop VPIP%, I think.
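One nonparametric option, offered here as an illustration rather than what the poster necessarily had in mind, is a percentile bootstrap: resample the session results with replacement many times and read the confidence interval off the resampled means, with no normality assumption. The session numbers are made up and deliberately heavy-tailed.

```python
import random

def bootstrap_ci(results, confidence=0.95, reps=10_000, seed=1):
    """Percentile bootstrap confidence interval for the mean of `results`."""
    rng = random.Random(seed)
    n = len(results)
    # Mean of each resample drawn with replacement, sorted for percentiles.
    means = sorted(sum(rng.choices(results, k=n)) / n for _ in range(reps))
    lo_idx = int((1 - confidence) / 2 * reps)
    hi_idx = int((1 + confidence) / 2 * reps) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical hourly results in bb, heavy-tailed on purpose.
sessions = [5, -3, 12, -40, 8, 2, 65, -6, 4, -18, 9, 1, -2, 30, -55, 7]
print(bootstrap_ci(sessions))
```

The bootstrap still assumes sessions are independent draws from one stable distribution, so it does not escape the changing-conditions caveat raised earlier in the thread.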
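A sketch of that rake-based starting point. The equal-share default and the VPIP scaling are hypothetical simplifications in the spirit of the footnote (a player who enters fewer pots pays less rake), and the rake and table numbers are made up.

```python
def prior_winrate(rake_per_hand, players, vpip=None, table_avg_vpip=None):
    """Prior expected bb per hand for a player who is otherwise break-even.

    Default: every seat pays an equal share of the rake.
    If VPIP figures are supplied, scale that share by how often this
    player voluntarily enters pots relative to the table average.
    """
    share = rake_per_hand / players
    if vpip is not None and table_avg_vpip:
        share *= vpip / table_avg_vpip
    return -share

# Hypothetical: 0.6 bb raked per hand at a 6-handed table.
print(prior_winrate(0.6, 6))                                    # equal shares
print(prior_winrate(0.6, 6, vpip=0.18, table_avg_vpip=0.30))    # a tighter player
```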

06-01-2017 , 01:38 AM

#13

BrianTheMick2

Long way to go and a short time to get there.

Join Date: May 2012 Posts: 19,410

Quote:

Originally Posted by ToothSayer

Bayesian analysis is neutral on the confirmation bias scale.

I'm assuming that the person doing the calculations isn't throwing out data ("doesn't count, I was on tilt").

06-01-2017 , 06:52 AM

#14

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

Quote:

Originally Posted by BrianTheMick2

That's right, they do not look normal in the near term, like an hour or even more, because you can easily be down 3 buy-ins in 10 hands (an event a 10 bb/h sd will never anticipate). That happens in a super bs situation against luckboxes who have also tilted the hell out of you: you get cold-decked, you push all in very frequently to destroy them, you get called by gutshot straights or weak kickers that make two pair later, and you lose, which makes you play very aggressively even if still correctly. But the idea here is that after many sessions the accumulated results will be near normal, even if the individual distributions are not, as long as they are not extremely wild and belong to some distribution. Even if every session has a different distribution, their sum will converge to normal eventually as long as each has a finite sd and average.

Also, on the Bayesian point, I was suggesting a line of thinking, not using any real numbers, as I said.

Also notice that in the bolded example I still gave 8% for a down month. Personally, I never had a negative month playing poker at the stakes I always played, but I never played really high. The worst I ever had was 3 weeks down (and that is also biased, because I tended to play relentlessly more hands when down, trying to recover and have my way with the process, which is a bit stupid if on tilt, but I was in control and more madly persistent than on a downward spiral). I also had low volatility, around 6-7 bb/h (which makes terrible variance harder to experience), because I always played carefully so as not to chase too much bs, and even when bluffing I was not terribly wild at it. Even when building a loose, wild image, I was careful not to overdo it when it made limited sense, choosing semibluffs over pure bluffs, and playing tight preflop against reraises if the table was idiotic and the attention was on other players.
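A quick simulation of that convergence argument, assuming a made-up, strongly skewed per-hand result (small frequent losses, rare big wins). Skewness measures lopsidedness; for independent hands the skewness of a sum of n hands equals the single-hand skewness divided by sqrt(n), so session totals look more normal the longer the sessions get.

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Sample skewness: the average cubed z-score."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(7)

def one_hand():
    # Hypothetical per-hand result in bb: usually -1, occasionally +20.
    return 20.0 if rng.random() < 0.05 else -1.0

def session_totals(hands_per_session, sessions=4000):
    """Simulated totals for many independent sessions of a given length."""
    return [sum(one_hand() for _ in range(hands_per_session))
            for _ in range(sessions)]

for hands in (1, 10, 100):
    print(hands, round(skewness(session_totals(hands)), 2))
```

Heavier tails than this toy example slow the convergence down, which is the substance of the disagreement above.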

06-01-2017 , 11:13 AM

#15

Trolly McTrollson

Carpal \'Tunnel

Join Date: Oct 2010 Posts: 36,712

Quote:

Originally Posted by masque de Z

You say that like it's easy, but in practical terms, you'd need prior information about the game that usually isn't readily available if you wanted to do a Bayesian analysis of an actual hand history.

06-01-2017 , 11:53 AM

#16

ToothSayer

Carpal \'Tunnel

Join Date: Jul 2015 Posts: 13,237

In reality we screw up the analysis because we usually set our Bayesian priors for our own skill tilted well toward the "skilled" level.

06-01-2017 , 03:59 PM

#17

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

It's OK; one can even assume a variety of profiles and see that the Bayesian calculation yields an over-50% winner probability even if it started at, say, 5% or 10%. For certain good starts it will be hard to argue it's luck if the sd is not huge and the winrate is big. That of course is true only at microstakes, or live with a smaller rake than some places have.

I would buy that over 50% of the ultimate winners started as winners in the first many thousands of hands. Take it down to 30% or 25% of the winners and it will still give you confidence if you have a good start that is deep enough.

Of course it's hard to get such data, but not impossible.

I started as a winner, for example. I never had a down period of more than a few days initially; then it became weeks, and it maxed out at 3 weeks years later. It's very hard to have a bad start if you are tight initially and play carefully, learning fast to become a bit balanced and not predictable. Even being predictable works at levels where people do not pay attention.

Frankly, a good player has a feel for whether they have been lucky enough to invalidate a good winrate estimate. For example, even if up, you can know you have played badly if you recall your all-ins and how many times you got lucky on bad all-ins, or that you often read people wrong and then got lucky. You can also keep track of your big preflop all-ins and see how often your AA held, how often your lower pairs beat higher pairs, how often you flopped sets, etc. All it takes is a few lucky all-ins to change the direction, but a person who cares will remember that. In reality you know whether the table is easy or tough within minutes, because you observe not only your own hands but also how they play without you involved. You can recognize their style fast.

Last edited by masque de Z; 06-01-2017 at 04:10 PM.

06-01-2017 , 09:34 PM

#18

BrianTheMick2

Long way to go and a short time to get there.

Join Date: May 2012 Posts: 19,410

Quote:

Originally Posted by masque de Z

I pretty much only play LO8 and am a nit. I'm pretty sure that you are underestimating how big variance is.

If you were playing at the beginning of the poker boom, then your results may vary. Winning was easy back then. You could simply nut peddle and win. That doesn't work anymore.

06-01-2017 , 09:37 PM

#19

BrianTheMick2

Long way to go and a short time to get there.

Join Date: May 2012 Posts: 19,410

Quote:

Originally Posted by Trolly McTrollson

All you need is an estimate of how much money is taken off the table per unit of time or hands and the number of players.

06-01-2017 , 11:48 PM

#20

Pokerlogist

veteran

Join Date: Jul 2005 Posts: 3,302

This site might help: http://www.whatsmywinrate.com/

06-02-2017 , 04:08 AM

#21

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

Quote:

Originally Posted by BrianTheMick2

I've never played LO8, only PLO8, but mostly NL hold'em. Pot-limit Omaha is very volatile because people like to reraise pot bets preflop with all kinds of hands, plus equities in all-ins are close, and then if you see flops they can have anything postflop, and it gets ugly quickly because of stupid forced all-ins, since you have to go for it when forced. But hold'em is very different. You can control volatility easily there.

The reason I do not play poker as much anymore is that the ahole republicants closed the system down and I lost all my online accounts, and to play live I need to drive to San Jose for tournaments and 1-2 or higher games. I can make more money per hour, like 10x more, working easy, enjoyable jobs that continue to educate me at the same time, and trading the market when I get good buy or sell signals.

When you go to high stakes, the winrate in NL drops to 0.02 bb/h and volatility goes above 8 bb/h for good players, and then it's ridiculously hard to beat volatility consistently.

I never had volatility bigger than 7 bb/h, and I consistently enjoyed winrates of 0.05-0.15 bb/h for long periods.

Sure, I do not play a lot recently, but who cares. If it's not worth it, it's bs to play for the rake criminals.

My argument is essentially this. I knew I could beat the game every time I played because I knew who I was playing against within minutes. There's no point playing against very advanced opponents; change tables. Within minutes you know whether you can win there or not. You know it because you see a ton of idiotic mistakes others make, and that instantly means you have a better winrate.

A good player also recognizes their own luck. Like hell I will close my eyes to getting multiple AA or sets and winning big with them. That is not standard, and it means I got very good results based on luck (same with draws getting there more than they deserved, or mistakes not getting punished). But a good player can make good luck work better than it does for others. You know you can beat the game even if you lose in the near term, because the stupid things continue to exist and good reactions to them will be paid off eventually.
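Rough arithmetic behind "ridiculously hard to beat volatility", as a sketch: with a true winrate w and standard deviation sd (both in bb per hand, as used in this thread), the expected total after N hands sits z standard deviations above zero once N = (z*sd/w)^2, under a normal approximation. The choice of z = 2 is an assumption.

```python
import math

def hands_to_separate(winrate_per_hand, sd_per_hand, z=2.0):
    """Hands until expected total = z * SD of total, i.e. N = (z*sd/w)^2."""
    return math.ceil((z * sd_per_hand / winrate_per_hand) ** 2)

print(hands_to_separate(0.1, 7.0))    # smaller-stakes figures from this thread
print(hands_to_separate(0.02, 8.0))   # high-stakes figures from this post
```

Because N grows with the square of sd/w, cutting the winrate by a factor of five while volatility rises pushes the required sample up by well over an order of magnitude.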

06-02-2017 , 09:21 AM

#22

Trolly McTrollson

Carpal \'Tunnel

Join Date: Oct 2010 Posts: 36,712

Quote:

Originally Posted by BrianTheMick2

All you need is an estimate of how much money is taken off the table per unit of time or hands and the number of players.

I believe you would need a distribution of win rates at whatever stake you're playing at.
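Given such a distribution of winrates, one standard way to use it is a normal prior on the true winrate combined with a normal likelihood for the observed winrate (a conjugate normal-normal update). The prior mean of -5 bb/100 with spread 5, and the 90 bb/100 standard deviation, are placeholder assumptions, not real population data.

```python
import math
from statistics import NormalDist

def update_winrate(prior_mean, prior_sd, observed, sd_per_100, hands):
    """Posterior for the true winrate (bb/100) given an observed winrate."""
    blocks = hands / 100
    data_var = sd_per_100 ** 2 / blocks        # variance of the observed winrate
    prior_var = prior_sd ** 2
    post_var = 1 / (1 / prior_var + 1 / data_var)
    post_mean = post_var * (prior_mean / prior_var + observed / data_var)
    return post_mean, math.sqrt(post_var)

m, s = update_winrate(prior_mean=-5, prior_sd=5, observed=8,
                      sd_per_100=90, hands=50_000)
p_winning = 1 - NormalDist(m, s).cdf(0)
print(f"posterior winrate {m:.2f} +/- {s:.2f} bb/100, P(> 0) = {p_winning:.2f}")
```

The posterior mean lands between the population prior and the observed winrate, moving toward the observation as the hand count grows.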

06-02-2017 , 09:23 AM

#23

ToothSayer

Carpal \'Tunnel

Join Date: Jul 2015 Posts: 13,237

Quote:

Originally Posted by Trolly McTrollson

I believe you would need a distribution of win rates at whatever stake you're playing at.

Incidentally, does this data exist? Has anyone ever gotten the player records of a poker site? Would be fascinating to look at.

06-04-2017 , 10:43 AM

#24

SublettingProblems

centurion

Join Date: Mar 2017 Posts: 189

Quote:

Originally Posted by masque de Z

LO8 is probably the lowest variance form of poker.

06-04-2017 , 10:21 PM

#25

Zeno

Le Misanthrope

Join Date: Sep 2002 Posts: 22,354

Quote:

Originally Posted by SublettingProblems

LO8 is probably the lowest variance form of poker.

That's why old retired farts on fixed incomes pollute all the tables; it's a damn geriatric ward most of the time. Smart casinos would change the high hand jackpot/bad beat jackpot to 55-gallon drums of industrial-strength Viagra instead of money. It would draw more players.
