Reliability of HUD stats

03-06-2012 , 06:06 AM
First of all, let me say that I'm not very good at statistics and probability theory etc etc.

I have a two-part question. Let's say we have a stat like VPIP, so we have a bunch of 0/1 boolean values, one for each time the player "could VPIP"; the value is 1 when he "did VPIP".

First of all, from a math perspective, how would one figure out the reliability (like a percentage 0..100) of a HUD stat given the input boolean values? Some tools (PokerSleuth) also calculate an interval; for example, 34±12 would mean the VPIP (probably) lies between 22 and 46. How is this done?

Now, what if we also had some prior information. For example, we know the average VPIP for the global player pool is X. Could this be used to improve the reliability calculation?
03-06-2012 , 10:29 AM
Moved to Probability Forum under general gambling; a more appropriate place for the OP.

-Zeno
03-06-2012 , 11:16 AM
Let's say you have a set of 100 0s and 1s for VPIP as you suggest and the mean of this set is .25 (25 1s, 75 0s).

Now let's say we want to have 95% confidence about this person's true VPIP. First, we need to estimate the error of the measurement. The error of the measurement (sometimes called the standard error) is given by SD / sqrt(N), where N is the total number of data points and SD is the standard deviation.

Because our data are dichotomous, we can estimate the standard deviation as follows:

Variance = p*(1-p), where p is the estimated probability (.25 in this case). So we would get a variance of .25*(1-.25) = .1875.

The standard deviation is the square root of the variance so sqrt(.1875) = .433013.

The standard error is then .433013 / sqrt(100) = .0433013.

We can compute a 95% confidence interval by looking up the two-tailed value from the t-distribution with an alpha (1 - confidence level) of .05. To do this, in Excel type =TINV(.05, 98), where 98 is N - 2. This gives a result of approximately 1.98 (note that in the absence of the ability to look up such a number, many people use either 1.96 or 2.00 as approximations).

Our margin of error for a 95% confidence interval is then 1.98 * .0433013 = .08593.

So if our person's true VPIP is .25 (which is our best estimate thus far), we can be 95% confident that his observed VPIP will fall between .25 - .08593 = .16407 and .25 + .08593 = .33593 for each set of 100 hands that he plays.

Or more directly, if a person with a true VPIP of .25 plays 100 hands, we would expect that 95% of the time his VPIP would be between .16407 and .33593 in those 100 hands. This is almost certainly the calculation your software is using (or trying to use).
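In case anyone wants to play with the numbers, here is a rough Python sketch of the same steps (SciPy's t.ppf is doing the job of Excel's TINV here; nothing about it is specific to any particular software):

import math
from scipy import stats

def vpip_confidence_interval(did, could, confidence=0.95):
    # confidence interval for an observed VPIP of did/could
    p = did / could                      # observed VPIP, e.g. 25/100 = .25
    sd = math.sqrt(p * (1 - p))          # SD of a 0/1 outcome
    se = sd / math.sqrt(could)           # standard error = SD / sqrt(N)
    t = stats.t.ppf(1 - (1 - confidence) / 2, could - 2)   # two-tailed t value, like =TINV(.05, N-2)
    margin = t * se
    return p - margin, p + margin

print(vpip_confidence_interval(25, 100))   # roughly (0.164, 0.336)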

Now...turning to your second question where we are in the advantaged position of knowing the prior population distribution of all VPIPs, we likely want to do something different. In this case, we change our question slightly from, "What is the probability of a person with a true VPIP of .25 playing X hands with VPIP of Y?" to "What is the probability this person's true VPIP is .25 given the X hands I have seen?"

This requires using Bayes' theorem, which requires having a prior distribution of results. One of the clearer and better discussions of how you would apply this to poker data is here: http://archives1.twoplustwo.com/show...0#Post10933141

Finally, we can ask another question. What is my opponent's most likely VPIP given the information I have?

Let's say that we happen to know the population VPIP is .30 with a standard deviation of .10 and we have seen our opponent have a VPIP of .25 over 100 hands. As I showed, the standard deviation of that measurement is .433013. We can use this information to make a new estimate by doing the following:

Numerator = .30 / .10^2 + .25 / .433013^2 = 31.333
Denominator = 1 / .10^2 + 1 / .433013^2 = 105.333

31.333 / 105.333 = .297468.

This method of "shrinking" moves our observed value closer to the population value depending on how well we have measured the observed value (i.e. sample size) and known population variation. Note that in this case we moved a lot closer to the population value (almost all the way there).

I discuss using this method in an MTT context here: http://forumserver.twoplustwo.com/25...cture-1161363/ and Aaron Brown offers his thoughts on it.
03-06-2012 , 11:39 AM
TYVM for the answer! I'll go through it later and see if there's anything I want to add, but from skimming through it it looks like you covered it all. Thanks again!

Quote:
Originally Posted by Sherman
Or more directly, if a person with a true VPIP of .25 plays 100 hands, we would expect that 95% of the time his VPIP would be between .16407 and .33593 in those 100 hands. This is almost certainly the calculation your software is using (or trying to use).
I noticed just now PokerSleuth does use Bayes' theorem for this actually (source).
03-06-2012 , 02:28 PM
Quote:
Originally Posted by idonteven
I noticed just now PokerSleuth does use Bayes' theorem for this actually (source).
In that case, they probably follow something similar to the archived uNL post I linked. Pretty cool that they do that. Actually, it shouldn't be that hard for other software (e.g. HEM, PT3) to include that as well.
03-08-2012 , 09:38 AM
Quote:
Originally Posted by Sherman
Now...turning to your second question where we are in the advantaged position of knowing the prior population distribution of all VPIPs, we likely want to do something different. In this case, we change our question slightly from, "What is the probability of a person with a true VPIP of .25 playing X hands with VPIP of Y?" to "What is the probability this person's true VPIP is .25 given the X hands I have seen?"

This requires using Bayes' theorem, which requires having a prior distribution of results. One of the clearer and better discussions of how you would apply this to poker data is here: http://archives1.twoplustwo.com/show...0#Post10933141
This was a good read, thanks. If I understood correctly, the generalized form is this:
P(plays did out of could | true VPIP = V) = C(could, did) * V^did * (1 - V)^(could - did)
(where C(could, did) is the binomial coefficient "could choose did")

However, in his examples he only uses two groups of players (loose and tight). How would one generalize this to more groups (with varying VPIP ranges and relative frequencies)? I'm guessing the denominator would be the sum of the probabilities of each group (multiplied by the frequency)?

Moreover, what would be the best way to construct those groups from a list of players and their VPIPs? Should we divide them into, say, 5 groups, each with the same number of players in them, or divide evenly based on the VPIP range (for 5 groups that would be 0-20, 20-40, 40-60, 60-80 and 80-100), with a varying number of players in each one?

I think I'm going to run some tests with Excel, fortunately HEM allows exporting the data from the Players view in CSV format for stuff like this.

To clarify once more, here's the question I'm asking: Given we know the VPIPs of the entire player population, what is the most likely true VPIP of the observed player given we have seen him play "did_vpip" out of "could_vpip" hands so far?
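In code terms, here's a rough Python sketch of the calculation I have in mind (the bins, their frequencies and the did/could counts are made-up placeholders, and I'm just using one representative VPIP per bin):

from math import comb

# made-up population bins: (representative VPIP of the bin, fraction of players in it)
bins = [(0.10, 0.20), (0.20, 0.35), (0.30, 0.25), (0.45, 0.15), (0.65, 0.05)]

def posterior(did, could, bins):
    # P(player belongs to each bin | he VPIP'd "did" times out of "could"), via Bayes' theorem
    likelihoods = [freq * comb(could, did) * v**did * (1 - v)**(could - did)
                   for v, freq in bins]
    total = sum(likelihoods)             # denominator: sum over all groups
    return [l / total for l in likelihoods]

post = posterior(25, 100, bins)
for (v, _), p in zip(bins, post):
    print(f"true VPIP {v:.2f}: probability {p:.3f}")
estimate = sum(v * p for (v, _), p in zip(bins, post))   # posterior-mean estimate
print("estimate:", round(estimate, 3))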

Quote:
Finally, we can ask another question. What is my opponent's most likely VPIP given the information I have?

Let's say that we happen to know the population VPIP is .30 with a standard deviation of .10 and we have seen our opponent have a VPIP of .25 over 100 hands. As I showed, the standard deviation of that measurement is .433013. We can use this information to make a new estimate by doing the following:

Numerator = .30 / .10^2 + .25 / .433013^2 = 31.333
Denominator = 1 / .10^2 + 1 / .433013^2 = 105.333

31.333 / 105.333 = .297468.

This method of "shrinking" moves our observed value closer to the population value depending on how well we have measured the observed value (i.e. sample size) and known population variation. Note that in this case we moved a lot closer to the population value (almost all the way there).
This I don't understand: why would the value move toward the population value if the sample size grows? Also, I don't see how this accounts for the sample size, because the sample size wasn't used when calculating the standard deviation.
03-08-2012 , 09:58 AM
Quote:
Originally Posted by idonteven
This I don't understand: why would the value move toward the population value if the sample size grows? Also, I don't see how this accounts for the sample size, because the sample size wasn't used when calculating the standard deviation.
I'll let someone else chime in (hopefully) for your other question because I don't use that Bayesian approach very often. I am aware of it and understand the logic of the equation, but I think the answer to that question does require you to determine "bins" or "groups" of VPIPs for your population of players to fall into...I'm not sure though.

Anyhow, as for this question, the formula will actually move you closer to your own sample as the sample size gets larger. This is because your own variance (or SD) gets smaller as N grows: the formula for the variance (or SD) has N in the denominator, so as N (the sample size) gets larger, the variance (SD) gets smaller. If you make your own variance smaller in the equation I gave above, you will see that there is less movement towards the population mean.

For example, if your sample SD is equal to the population SD (in this case a made up number of .10), your "shrunken" estimate will fall exactly between the population mean and the sample mean.
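To make that last point concrete with the numbers from before: if the sample SD were also .10, the numerator would be .30/.10^2 + .25/.10^2 = 30 + 25 = 55 and the denominator would be 1/.10^2 + 1/.10^2 = 200, giving 55/200 = .275, exactly halfway between .30 and .25.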
03-08-2012 , 04:35 PM
Quote:
Originally Posted by Sherman
Anyhow, as for this question, the formula will actually move you closer to your own sample as the sample size gets larger. This is because your own variance (or SD) gets smaller as N grows: the formula for the variance (or SD) has N in the denominator, so as N (the sample size) gets larger, the variance (SD) gets smaller. If you make your own variance smaller in the equation I gave above, you will see that there is less movement towards the population mean.

For example, if your sample SD is equal to the population SD (in this case a made up number of .10), your "shrunken" estimate will fall exactly between the population mean and the sample mean.
My "grudge" was that the variance (SD) was calculated using p*(1-p), which doesn't take the sample size in account. So I assume when you say the above stuff you're talking about the actual SD, not this estimate?
03-09-2012 , 09:01 AM
Quote:
Originally Posted by idonteven
My "grudge" was that the variance (SD) was calculated using p*(1-p), which doesn't take the sample size in account. So I assume when you say the above stuff you're talking about the actual SD, not this estimate?
Oh, I see. The issue here is that we have a binomial outcome, so I gave you the formula for the SD using a shortcut. Here is the more general formula:

Variance = sum[ (X - M)^2 ] / N
SD = sqrt(Variance)

Some formulas have N-1 in the denominator of the Variance equation (N-1 is used when we have a sample and want to estimate the population variance; if we have the whole population, we just use N).

In that equation, X is a single observed score, M is the mean of all scores, and N is the number of scores.

As it turns out though, the p*(1-p) equation gives the same result as the longer equation above when you are dealing with a binomial outcome. (Although not quite the same as the version with N-1 in the denominator.)

In any case, if we think of standard deviation as that equation, you can see that indeed sample size is taken into account.
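If you want to convince yourself that the shortcut matches the long formula for 0/1 data, here is a quick Python check (nothing more than the two formulas side by side):

data = [1] * 25 + [0] * 75                            # 25 ones, 75 zeros, as in the VPIP example
n = len(data)
m = sum(data) / n                                     # mean = .25

long_variance = sum((x - m) ** 2 for x in data) / n   # sum[ (X - M)^2 ] / N
shortcut = m * (1 - m)                                # p * (1 - p)
print(long_variance, shortcut)                        # both print 0.1875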
03-13-2012 , 03:28 PM
I made a couple of mistakes in here and rather than try to identify them and correct them, I am just going to write the right thing here.

Skill = Numerator / Denominator

Numerator = PopMean / PopSD^2 + ObsMean / ObsSE^2
Denominator = 1 / PopSD^2 + 1 / ObsSE^2

Where PopMean is the known population mean and PopSD is the known population standard deviation. ObsMean is the mean observed from the data so far and ObsSE is the standard error of the observed data. The standard error is defined as the observed standard deviation (SD) divided by the square root of the sample size (N).

This method properly takes sample size into account.
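And in case it helps, here is the same calculation as a quick Python sketch (the .30/.10 population numbers are the same made-up values from earlier in the thread):

import math

def shrunken_vpip(obs_mean, n, pop_mean=0.30, pop_sd=0.10):
    # precision-weighted ("shrunken") estimate: the observed mean is pulled toward
    # the population mean according to how precisely it has been measured
    obs_sd = math.sqrt(obs_mean * (1 - obs_mean))   # SD of a 0/1 outcome
    obs_se = obs_sd / math.sqrt(n)                  # standard error = SD / sqrt(N)
    numerator = pop_mean / pop_sd**2 + obs_mean / obs_se**2
    denominator = 1 / pop_sd**2 + 1 / obs_se**2
    return numerator / denominator

print(shrunken_vpip(0.25, 100))   # about .258: stays close to the observed value
print(shrunken_vpip(0.25, 10))    # about .283: pulled much closer to the population .30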