A primer on the statistics behind variance

01-27-2008 , 02:08 PM
Quote:
Originally Posted by bottomset
the other tricky part is, that unless your name is fgators most people won't play a million hands with the same exact style, without trying to improve

just sit back, realize the longrun doesn't exist, and play your best every session and work on your game basically
100% ACK.

This is something I have had in mind for some time now, whenever one of these "in the long run it is all skill" debates comes up.

Still, so many people seem not to realize or consider it at all that I always thought it might just be my naive penny-table/poker-newbie mentality.
01-27-2008 , 09:02 PM
Quote:
Originally Posted by Albert Moulton
Isn't it likely that the curve representing a poker player's win rate per 100 hands is not normally distributed? There are other distributions besides normal distributions, and the "skill" factor should mean that, on average, a skilled player who has the same normal distribution of starting hands and flops (normal distribution based on the RNG's allocation of cards) will win more and lose less than a less skilled player. The card and hand distribution should be normal, but the amount of wins and losses might not be.

Is there a way to plot several winning and losing players' PT results regarding win rate/100 distributions over 200K hands to see if the "curve" looks normal vs. some other, skewed distribution curve? It is likely, I think, that the curves are skewed, with the bad players losing more money more often with the same cards than the good ones.
Yes, the distribution function for a given player's results over a given sample is not a normal distribution, but it probably is somewhat close if the sample size is at least hundreds of hands. This isn't to say that all players have the same normalish distribution; each one (if they always played the exact same strategy in the same types of games) would have their own "true" winrate and SD. Using fiaca's graphs as an illustration, better players would have a larger slope for the pink line representing their expectation, and players with a larger SD are likely to have more variation about this line.

Different players have different winrates and SDs, so they have different distributions for their results, so I don't think that having lots of players submit results for 200K hands and plotting them would answer the question you're interested in. What would be effective (if I understand your question right) would be having one player record their result every 100 hands, and plotting, say, 1000 of these results and seeing how close they approximate a normal distribution (or if we want the result to wind up looking a little more like a normal distribution then we'd use samples of maybe 500 hands instead of 100). The key is that we want each of these 100 hand samples to be generated by the same distribution (so from the same player, with the same playing style, etc. Ideally even the same opponents/etc.), so that when we plot enough of them we start to see the shape of that underlying distribution.
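The experiment described above can be tried on simulated data before anyone collects real results. This is only a sketch: the per-hand distribution below (mostly small pots, with an occasional stack-sized swing) and every number in it are made up for illustration, and hands are assumed to be independent and identically distributed, which real poker hands are not.

```python
import random
import statistics

def block_results(n_blocks, block_size, rng):
    """Sum simulated per-hand results (in big blinds) into blocks of hands."""
    def one_hand():
        # Hypothetical per-hand result: these probabilities and sizes are
        # invented for illustration, not taken from any real database.
        u = rng.random()
        if u < 0.005:
            return 100.0       # win a full stack
        if u < 0.010:
            return -100.0      # lose a full stack
        return rng.gauss(0.03, 3.0)
    return [sum(one_hand() for _ in range(block_size)) for _ in range(n_blocks)]

def excess_kurtosis(xs):
    """Excess kurtosis of a sample; roughly 0 for normally distributed data."""
    m = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=m)
    fourth = statistics.fmean([(x - m) ** 4 for x in xs])
    return fourth / var ** 2 - 3.0

rng = random.Random(0)
for size in (100, 500):
    blocks = block_results(1000, size, rng)
    print(size,
          round(statistics.fmean(blocks), 2),
          round(statistics.pstdev(blocks), 2),
          round(excess_kurtosis(blocks), 2))
```

Under these assumptions the excess kurtosis of the block totals should sit closer to 0 for 500-hand blocks than for 100-hand blocks, which is the sense in which larger blocks "look more normal". A histogram of `blocks` would show the same thing visually.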
01-27-2008 , 09:12 PM
Quote:
Originally Posted by CallCallCall
Will need to read this a few times (and the rest) to take it in.

.......


Interesting view bottomset, I've always tried to approach poker in the "long term" view. Yet you believe there to be no long term game.

Has my mind been corrupted by poker books?
bottomset's point is just that you don't reach the "long term" (as we saw above, if you define reaching the long term as earning within .5 ptBB/100 of your true winrate, it'll take you a couple million hands to be 95% sure you're there, and your game and the game conditions change way before you reach that). Of course he still agrees that what you should be worrying about is making good decisions that maximize your expectation (or best satisfy whatever your goals are - maybe you intentionally sacrifice a teeny amount of EV in order to lower your variance) since you can't control your luck and it should be irrelevant to your decisions.
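The "couple million hands" figure can be reproduced with a short calculation. The sketch below assumes results per 100 hands are independent and normally distributed, and it assumes an SD of 35 ptBB per 100 hands (the figure used elsewhere in this thread); a different SD gives a different answer.

```python
from statistics import NormalDist

def hands_needed(sd_per_100, margin, confidence=0.95):
    """Hands required so the observed winrate is within `margin` (ptBB/100)
    of the true winrate with the given confidence, under a normal model."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    blocks = (z * sd_per_100 / margin) ** 2          # number of 100-hand blocks
    return 100 * blocks

# Assumed inputs: SD = 35 ptBB/100, margin = 0.5 ptBB/100.
print(round(hands_needed(35, 0.5)))
```

With these inputs the result is a little under two million hands. Because the requirement scales with the square of SD/margin, halving the margin quadruples the number of hands.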
01-28-2008 , 05:07 AM
Quote:
Originally Posted by holdem2000
While anywhere from 30 to 100 events summed together is typically considered enough for the Central Limit Theorem to apply, poker hands are a bit more extreme, with large events (+/- 100 big blinds) occurring frequently enough that I don't believe 100-hand samples are very close to normally distributed. The SD of 100-hand samples provided by PokerTracker underestimates our true variance. For example, using the numbers used later in this post, the probability of winning 4 buyins (200 ptBB) in 100 hands is approximately 1 in 5 million.
That isn't the only reason PokerTracker underestimates our true variance. PT uses the old B&M methodology for calculating s.d. Before the internet it wasn't possible to easily track every poker hand. Therefore the session by session method was used. Sessions tend to flatten true variance. Our quitting strategy affects the variance.
Today online it's possible to calculate s.d. using the hand by hand method. This usually produces a variance about 20% higher than the other method.
03-10-2008 , 02:47 AM
Bump for awesomeness.
03-10-2008 , 04:19 AM
Very interesting post. I've done a bit of statistics, so it's good to see some more in-depth poker-related statistical analysis. Has anybody done a similar thing with regard to winrate in tournament poker?
03-30-2008 , 12:42 PM
Okay,

A friend of mine who is a statistics major says this is flawed because you are assuming the possible winrates to be infinity to -infinity. I'm not a statistics guy, but is he correct?
03-30-2008 , 04:21 PM
Quote:
Originally Posted by fiaca
if i didn't mess up the math, here are some simulations with a winrate of 3 and standard deviation of 35 over 50k hands:


100k hands:



Nice and very informative. How does one do these?
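One way graphs like these could be produced (not necessarily how fiaca made them) is to treat each 100-hand block as an independent normal draw and accumulate the results. The sketch assumes the quoted winrate of 3 and SD of 35 are both in ptBB per 100 hands; the function name and seed are arbitrary.

```python
import random

def simulate_path(winrate, sd, n_hands, rng):
    """Cumulative winnings (ptBB), one point per 100-hand block.

    Assumes each block is an independent normal draw with the given
    mean (winrate) and standard deviation (sd), both per 100 hands.
    """
    total, path = 0.0, [0.0]
    for _ in range(n_hands // 100):
        total += rng.gauss(winrate, sd)
        path.append(total)
    return path

rng = random.Random(1)
paths = [simulate_path(3, 35, 50_000, rng) for _ in range(10)]
for p in paths:
    print(round(p[-1], 1))   # final result of each simulated 50k-hand run
```

Plotting each `path` against the block index, together with the straight expectation line `3 * block_index`, gives a picture of how far individual runs can wander from expectation. Any plotting library will do; none is required to generate the numbers.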
04-08-2008 , 05:59 AM
Quote:
Originally Posted by Mike Kelley
Okay,

A friend of mine who is a statistics major says this is flawed because you are assuming the possible winrates to be infinity to -infinity. I'm not a statistics guy, but is he correct?
Sorry I took so long to respond to this...

Your friend's point is another reason that the outcomes of samples of poker hands cannot be precisely normally distributed. I think the flaws introduced by poker hands not being independent and identically distributed outweigh the ones introduced by the assumption above, and I think the point made by jogsxyz about how PT computes variance outweighs all of these minor flawed assumptions.
04-11-2008 , 08:49 AM
Quote:
Originally Posted by jogsxyz
That isn't the only reason PokerTracker underestimates our true variance. PT uses the old B&M methodology for calculating s.d. Before the internet it wasn't possible to easily track every poker hand. Therefore the session by session method was used. Sessions tend to flatten true variance. Our quitting strategy affects the variance.
Today online it's possible to calculate s.d. using the hand by hand method. This usually produces a variance about 20% higher than the other method.
Hello,

What's exactly the "old B&M Methodology"? I've never heard of it before.

I've been searching the net and it seems to be some kind of simulation algorithm attributed to Barraquand & Martineau (B&M, of course).

Anyway, how does PT calculate the standard deviation? I do not really understand the term std. dev. per hour/100 hands...

If you have any link explaining it, please share it with me.

corp
04-11-2008 , 08:55 AM
Quote:
Originally Posted by Mike Kelley
Okay,

A friend of mine who is a statistics major says this is flawed because you are assuming the possible winrates to be infinity to -infinity. I'm not a statistics guy, but is he correct?
No, it isn't correct.

No normal-modeled variable in nature can reach -Inf to +Inf values, and this point does not impide us to deal it as if it was normal.

I have doubts on the distribution of gains (and its normality), but not related to what your "statistics major" friend says.

corp
04-11-2008 , 10:08 AM
Understanding variance is the first step to understanding God.
04-11-2008 , 04:03 PM
Quote:
Originally Posted by corpcd
No, it isn't correct.

No normal-modeled variable in nature can reach -Inf to +Inf values, and this point does not impide us to deal it as if it was normal.

I have doubts on the distribution of gains (and its normality), but not related to what your "statistics major" friend says.

corp

This is an incoherent thought and impide isn't a word.
04-11-2008 , 07:55 PM
Quote:
Originally Posted by corpcd
What's exactly the "old B&M Methodology"? I've never heard of it before.
B&M stands for Brick and Mortar - as in real life buildings.
04-12-2008 , 03:58 AM
Quote:
Originally Posted by Mike Kelley
This is an incoherent thought
Why? I am only saying that, in practice, a variable does not need to reach -Inf/+Inf values for us to model it as if it were normal.

Quote:
Originally Posted by Mike Kelley
and impide isn't a word.
Sorry, I used Spanish without even noticing. I meant "prevent"...

"this point does not prevent us from treating it as if it were normal."

corp

Last edited by corpcd; 04-12-2008 at 04:10 AM.
04-12-2008 , 04:09 AM
Quote:
Originally Posted by holdem2000
B&M stands for Brick and Mortar - as in real life buildings.
Are you kidding?

So the term B&M methodology is a "colloquial sentence", or something like that :-)

Then, does anyone know how the std. dev. is calculated?

It should be:

N hands.
$g_i$ = gain in hand $i$.
$m_g$ = mean gain.

$sd = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (g_i - m_g)^2}$

But the std. dev. used by PT is something related to gains per 100 hands. I suppose they add up the gains in blocks of 100 hands so that the result is more stable...

N hands.
N/100 sets of 100 hands.
$g_j$ = sum of all the gains in set $j$: $g_j = \sum_{i=100(j-1)+1}^{100j} g_i$

sd_poker_tracker = std. dev. of the $g_j$ variable.

¿Am I right?
corp
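The two definitions in the post above can be written out directly. This is a sketch of those two formulas only; whether PokerTracker actually computes its figure this way is not established in this thread, and the simulated per-hand gains are made up and assumed independent.

```python
import math
import random
import statistics

def per_hand_sd(gains):
    """Population SD of per-hand gains (divides by N, as in the first formula)."""
    return statistics.pstdev(gains)

def block_sd(gains, block=100):
    """SD of the totals of consecutive `block`-hand sets (second formula).
    An incomplete final block is dropped."""
    n_blocks = len(gains) // block
    sums = [sum(gains[j * block:(j + 1) * block]) for j in range(n_blocks)]
    return statistics.pstdev(sums)

rng = random.Random(2)
# Hypothetical data: 200,000 independent hands with invented mean and SD.
gains = [rng.gauss(0.03, 3.5) for _ in range(200_000)]

print(round(per_hand_sd(gains) * math.sqrt(100), 2))  # per-hand SD scaled to 100 hands
print(round(block_sd(gains), 2))                      # SD of 100-hand block totals
```

For independent hands the two printed numbers should be close, since the SD of a sum of 100 independent results is the per-hand SD times sqrt(100). Real hands are not independent, and session-based methods group hands differently, so on real data the two need not agree.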
04-14-2008 , 07:39 AM
Quote:
Originally Posted by corpcd
Why? I am only saying that, in practice, a variable does not need to reach -Inf/+Inf values for us to model it as if it were normal.
Why would you, though, when there is a clear limit in a table-stakes game on what you can win or lose in 100 hands? If it's HU, the most you can win is 100x100bb's or 10,000 bb's, which is a long way from infinity last time I checked. If it's full ring, the most you can lose per 100 hands is 100x100 or 10,000 bb's. The most you could win is 80,000 bb's.
04-14-2008 , 07:41 AM
Quote:
Originally Posted by corpcd
Are you kidding?

So the term B&M methodology is a "colloquial sentence", or something like that :-)

Then, does anyone know how the std. dev. is calculated?

It should be:

N hands.
$g_i$ = gain in hand $i$.
$m_g$ = mean gain.

$sd = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (g_i - m_g)^2}$

But the std. dev. used by PT is something related to gains per 100 hands. I suppose they add up the gains in blocks of 100 hands so that the result is more stable...

N hands.
N/100 sets of 100 hands.
$g_j$ = sum of all the gains in set $j$: $g_j = \sum_{i=100(j-1)+1}^{100j} g_i$

sd_poker_tracker = std. dev. of the $g_j$ variable.

¿Am I right?
corp
Check out www.pokertracker.com

They have a forum. I'm sure it has probably been discussed how this and everything else in the world is calculated. Please report back with findings.
04-14-2008 , 08:15 AM
Quote:
Originally Posted by Mike Kelley
Why would you, though, when there is a clear limit in a table-stakes game on what you can win or lose in 100 hands? If it's HU, the most you can win is 100x100bb's or 10,000 bb's, which is a long way from infinity last time I checked. If it's full ring, the most you can lose per 100 hands is 100x100 or 10,000 bb's. The most you could win is 80,000 bb's.
Because the normal distribution is well understood. If variables are normally distributed then we can make perfect predictions about the likelihoods of various events; moreover, even when variables aren't normally distributed, for some types of distributions, the central limit theorem assures us that the mistakes we make by using a normal approximation become arbitrarily small as we increase our sample size.

Incidentally, if our winrate were normally distributed with average 0 and SD per 100 hands of 35 ptBBs, then winning more than 80,000 bbs in 100 hands would have a mind-bogglingly tiny probability. Without taking the time to actually calculate it, I would estimate this probability to be approximately 1 in 10^(300,000). The flaw in approximating poker winnings as normally distributed does not come, to any significant degree, from allowing arbitrarily large winrates. With the above approximation, winning 560 bbs (280 ptBBs, or 8 SDs) in a 100-hand sample would occur with probability around 1 in 1.5 quadrillion (if you had played a hand once a second since the universe began, you would expect this to have happened about twice). The normal approximation underestimates the likelihood of extreme events rather than overestimating it. By extreme, I mean things in the rare but possible region; once you get to variations so large that the actual probability is 0, the approximation still indicates that these events are possible, but so rare as to not affect anything.
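These tail probabilities can be checked numerically. The sketch assumes the same normal model (mean 0, SD 35 ptBB per 100 hands) and takes 1 ptBB = 2 big blinds, as implied by "4 buyins (200 ptBB)" earlier in the thread. For the 80,000 bb case the probability underflows ordinary floating point, so its base-10 logarithm is computed from the standard large-z approximation P(Z > z) ≈ φ(z)/z.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def log10_upper_tail(z):
    """log10 of P(Z > z), using the large-z approximation phi(z) / z."""
    ln_p = -0.5 * z * z - math.log(z) - 0.5 * math.log(2 * math.pi)
    return ln_p / math.log(10)

sd = 35.0                       # assumed SD in ptBB per 100 hands
z_small = (560 / 2) / sd        # 560 bb = 280 ptBB, i.e. 8 SDs
z_large = (80_000 / 2) / sd     # 80,000 bb = 40,000 ptBB

print(f"{1 / upper_tail(z_small):.2e}")      # "1 in N" odds for the 560 bb win
print(round(log10_upper_tail(z_large)))      # exponent for the 80,000 bb win
```

Under these assumptions the 8-SD event comes out at roughly 1 in 1.6 × 10^15, and the 80,000 bb event has an exponent in the neighbourhood of -280,000, the same order of magnitude as the off-the-cuff estimate of 1 in 10^(300,000).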
04-14-2008 , 01:02 PM
Interesting. Thank you.
04-14-2008 , 05:06 PM
I think there is a fundamental misunderstanding here by some posters--even though others got it right, let me try to rephrase this.

Most certainly it's true that the random variable, let's call it X, that describes the outcome of a single game is not normal (e.g., if we're not playing HU, the maximum loss is your buy-in, but the maximum profit is larger than the buy-in, assuming everybody has equal stacks; so, unlike a normal, X's distribution is asymmetric).

However, that doesn't matter for the winrate at all, if the winrate is calculated as the average win/loss over a sequence of realizations of that random variable X (say, over 100 realizations). Asymptotic theory tells us that those averages over samples of size 100 are approximately normal, even if X is not, that's the beauty of it.

So that's not a problem. However, because each game is different (position, your opponents, possibly even the same opponents endogenously switching their strategy in response to your actions), we potentially cannot observe more than one realization of X; the next game is likely a different random variable.

For practical purposes, however, I don't think that matters like these are terribly important for several reasons:

1.) Yes, position matters a LOT in a single game, but there are only 9 different positions. Without trying to make a precise statistical argument here, think about doing the OP's experiment separately for each position. Well, if each of those stats is approx. normal, I would wildly guess that there is a theorem out there telling us that the average over those 9 sample stats is approx. normal as well.

2.) At the micros, there is a constant influx of new players (and exit of old players), who may be one of several types, but, in practice, there are only so many non-trivially distinct types. So my hunch is that the fact that opponents are changing will average out as well. Also, my guess would be that your endogenous response to the kind of table you are facing (loose, tight, etc.) and other players' endogenous response to your play should average out over a number of sessions.

3.) Yeah, maybe over the first 20K or 50K hands or so you play (not sure about a reasonable number here), the changes in your own strategy should probably be visible. But after that (caveat: I am a total n00b), changes in your strategy should be pretty much low-frequency and I doubt it'll matter much if other factors are held constant (e.g., we don't move up levels)

Now, you could argue, due to these factors we need huge samples. I doubt it. Yes, it's true that we need quite a few more observations than if we were to observe realizations of X repeatedly, but nowhere near the 30k-50k that ppl here and the literature are suggesting.

So are they wrong? No, of course not. What is left then? I would argue, the *decisive* factor, and I think one poster above mentioned it, is the fact that when playing Hold'em, we have a large number of hands that are of second order for the winrate (because we fold those hands early). On the other hand, a good part of our winrate and its SD are driven by tail events (doubling up, stacking off,...). From all the econometrics I know, it is usually *much* harder to estimate using data from a distribution with fat tails. Roughly speaking, you need a lot more data before approximations from asymptotic theory become useful. All my intuition tells me that this is what necessitates the large sample sizes we need.

I'm just a beginner, just my naive take here. Experts, correct me if I'm wrong...
04-15-2008 , 02:31 PM
Quote:
Originally Posted by Mike Kelley
Why would you, though, when there is a clear limit in a table-stakes game on what you can win or lose in 100 hands?
In 100 hands of Texas Holdem NL, there is no gain limit.

If you want to be exact: if you double your stack in every hand, the maximum is 2^100 BB over a run of 100 hands, assuming you beat a single villain each time.

2^100=1.267651e+30, which is a high amount, higher than expected for a normal-modelled variable.

corp
04-15-2008 , 02:32 PM
Quote:
Originally Posted by Mike Kelley
Check out www.pokertracker.com

They have a forum. I'm sure it has probably been discussed how this and everything else in the world is calculated. Please report back with findings.
I checked, I found no explanation on it...

If you have the link, I'd appreciate it if you shared it with me.

corp
04-15-2008 , 02:35 PM
holdem, how do you interpret the SD for 100 hands?

The SD of a 100 hand sample only OR
The SD of gains in 100 hands, using all the data??

corp
04-15-2008 , 04:42 PM
Quote:
Originally Posted by corpcd
In 100 hands of Texas Holdem NL, there is no gain limit.

If you want to be exact: if you double your stack in every hand, the maximum is 2^100 BB over a run of 100 hands, assuming you beat a single villain each time.

2^100=1.267651e+30, which is a high amount, higher than expected for a normal-modelled variable.

corp
Pretty dense. You can't double your stack every time because your opponent cannot rebuy for more than 100bbs