Two Plus Two Publishing LLC
 

Two Plus Two Poker Forums > General Gambling > Probability

07-31-2012, 06:33 AM   #16
MApoker (adept)
Join Date: Jun 2005
Location: Playin' It Smart
Posts: 742
Re: Survey Probabilities

OK, I get your point. I don't think the error in my approximation is quite as horrendous as you do, but whatever.

In any case, I finally crunched the numbers, and I worked out the proper solution using the multinomial approach, which properly accounts for the covariance. Here it is, for the benefit of the OP:

As explained above, we want to find Var(G3 - G2).

By the properties of variance, we know that for two dependent random variables G3 and G2:

Var(G3 - G2) = Var(G3) + Var(G2) - 2*Cov(G2,G3).

The variance calculations are just like those from the binomial formula, which OP knows:

Var(G2) = (.21*(1-.21))/100 = 0.001659
Var(G3) = (.12*(1-.12))/100 = 0.001056

Now for the covariance, using the multinomial formula we get:

Cov(G2,G3) = (-1*p2*p3)/n = -1*(.21*.12)/100 = -0.000252

Now:

Var(G3) + Var(G2) - 2*Cov(G2,G3) = 0.001056 + 0.001659 + 2*0.000252 = 0.003219.

To get SE(G3 - G2), I just take the square root: sqrt(0.003219) = 0.05673623.

Finally, I use the normal approximation to get Pr((G3-G2) < -.095). Using the calculator here, I plug in a mean of 0 and an SD of 0.05673623, and find that the area below -0.095 is 4.75%.
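
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the variable names and the `normal_cdf` helper are illustrative, not something from the thread, and the CDF is computed with `math.erfc` at full precision instead of the online calculator. With full precision the tail should come out near 4.70% rather than 4.75%.

```python
import math

n = 100                # survey sample size
p2, p3 = 0.21, 0.12    # probabilities of answers G2 and G3

# Binomial-style variances of the two sample proportions
var_g2 = p2 * (1 - p2) / n
var_g3 = p3 * (1 - p3) / n

# Multinomial covariance between the two sample proportions
cov_g2_g3 = -p2 * p3 / n

# Var(G3 - G2) = Var(G3) + Var(G2) - 2*Cov(G2,G3)
var_diff = var_g3 + var_g2 - 2 * cov_g2_g3
se_diff = math.sqrt(var_diff)


def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal CDF computed from the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))


# Area below -0.095 for a normal with mean 0 and SD = se_diff
tail = normal_cdf(-0.095, mean=0.0, sd=se_diff)

print(f"Var = {var_diff:.6f}  SE = {se_diff:.8f}  tail = {tail:.2%}")
```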


Voila, that's the way to do it using the proper (multinomial) approach.

I've double-checked all this with my own simulations.
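
For anyone who wants to run a similar check, here is a minimal Monte Carlo sketch in Python. It is not the simulation mentioned above. It assumes four answer groups with probabilities .65, .21, .12 and .02, a sample of 100 drawn with replacement, and that the event of interest is G3 > G2; the function name, trial count and seed are arbitrary choices.

```python
import random


def simulate(trials=20_000, n=100, seed=12345):
    """Estimate Pr(G3 > G2) by drawing `trials` surveys of n students each."""
    rng = random.Random(seed)
    labels = ("G1", "G2", "G3", "G4")
    probs = (0.65, 0.21, 0.12, 0.02)
    hits = 0
    for _ in range(trials):
        # One simulated survey: n independent answers (sampling with replacement)
        sample = rng.choices(labels, weights=probs, k=n)
        if sample.count("G3") > sample.count("G2"):
            hits += 1
    return hits / trials


estimate = simulate()
print(f"Estimated Pr(G3 > G2) = {estimate:.4f}")
```

With 20,000 trials the standard error of the estimate is roughly 0.15 percentage points, so a much larger trial count is needed to tell 4.70% from 4.75%.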

So... Hopefully this will be BruceZ approved!

Last edited by MApoker; 07-31-2012 at 06:46 AM.

07-31-2012, 09:06 AM   #17
BruceZ (Carpal 'Tunnel)
Join Date: Sep 2002
Posts: 8,895
Re: Survey Probabilities

That's the way to do it, and the OP should ignore all previous approximations. The above takes hardly any more computations, and it is very accurate. Actually, there seems to be a bug with that normal distribution calculator because I get 4.70% in Excel using exactly the same input numbers, and that matches the simulation value too.

I also misinterpreted my approximation. When I said that I was adding 2*1 for the covariance, that was close to the right number to add, but that's not because G1+G2 and G1+G3 are highly correlated. Perfectly correlated would be a correlation of 1. This would be a covariance of -1 which would mean that they are slightly negatively correlated just as G2 and G3 are slightly negatively correlated. I'm in different units from you because I'm computing means and standard deviations in students, not percentages, so the variances are 100p(1-p). So when I say that I have a covariance of -1, that's -1/10000 for you. When I said to use 0.5 and -9, for you that would be 0.005 and -0.09. Anyway, disregard the whole discussion about assuming perfect correlation.
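
The change of units described above (student counts versus proportions) can be sketched as follows. The helper names are made up for illustration; the only assumption is n = 100, so means and offsets are divided by n and variances and covariances by n squared.

```python
n = 100
p2, p3 = 0.21, 0.12

# Working in student counts: variance is n*p*(1-p), covariance is -n*p2*p3
var_g2_counts = n * p2 * (1 - p2)
cov_counts = -n * p2 * p3

# Working in proportions: divide variances and covariances by n**2
var_g2_prop = var_g2_counts / n**2
cov_prop = cov_counts / n**2


def count_to_proportion(value):
    """Convert a mean or offset from student counts to a proportion."""
    return value / n


def count_variance_to_proportion(value):
    """Convert a variance or covariance from counts**2 to proportion units."""
    return value / n**2


print(count_to_proportion(0.5), count_to_proportion(-9))   # continuity correction, mean
print(count_variance_to_proportion(-1))                    # a covariance of -1 in count units
```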

Last edited by BruceZ; 07-31-2012 at 11:51 AM.

07-31-2012, 01:34 PM   #18
MApoker

You're right about the bug in that calculator... sheesh. If I'd run my simulations with a large enough N I would have spotted that too.

Any other error can be attributed to the normal approximation, including the fact that we're trying to approximate a discrete distribution with a continuous one.

OP -- if you want the really, truly exact answer, you can do it by using the probability mass function for the multinomial distribution, and calculating the probability of each outcome wherein G3 > G2.

For example, for G1 = 65%, G2 = 10%, G3 = 20% and G4 = 5%, you go to this calculator, plug in 4 for the number of outcomes, plug in 65, 10, 20, and 5 for the frequencies, and plug in .65, .21, .12 and .02 for the probabilities. The output will give you the probability of that set of outcomes. (Actually, you'll need to find a calculator with more than five significant digits, since that one simply shows it as 0.00000.)

Then you add up these probabilities for ALL possible combinations such that G3 > G2. Obviously, this requires calculating the probability of a buttload of combinations, which is why we usually stick to the normal approximation. (Or you could use R to calculate it with a for-loop.)

* Adding: You can reduce the number of computations by combining G1 and G4, and using 3 as the number of outcomes, but I don't want to confuse you too much more!
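
Here is a rough sketch of that exact calculation, written in Python rather than R. It uses the three-outcome shortcut (G2, G3, and everything else lumped together) and sums the multinomial probability mass function over every outcome with G3 > G2. The function name and defaults are illustrative only.

```python
import math


def exact_prob_g3_beats_g2(n=100, p2=0.21, p3=0.12):
    """Sum the trinomial pmf over all (g2, g3) with g3 > g2."""
    p_rest = 1.0 - p2 - p3              # G1 and G4 combined
    total = 0.0
    for g2 in range(n + 1):
        # g3 must exceed g2 and still leave a non-negative remainder
        for g3 in range(g2 + 1, n - g2 + 1):
            rest = n - g2 - g3
            # Multinomial coefficient n! / (g2! * g3! * rest!)
            ways = math.comb(n, g2) * math.comb(n - g2, g3)
            total += ways * p2**g2 * p3**g3 * p_rest**rest
    return total


exact = exact_prob_g3_beats_g2()
print(f"Exact Pr(G3 > G2) = {exact:.4%}")
```

Only a couple of thousand terms are needed with the three-outcome version, so the double loop runs almost instantly.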

Last edited by MApoker; 07-31-2012 at 01:44 PM.

07-31-2012, 04:55 PM   #19
MApoker

One more post for the true stats nerds amongst you:

Let's drop the assumption that we're drawing our sample with replacement, and let's make the sample size n=100 without replacement from a population size N=300.

Now we need to resort to the multivariate hypergeometric distribution as referenced in the footnote of my first post.

Exercise:

(1) Calculate Var(G3-G2), and use the normal approximation to estimate Pr(G3>G2). Q: Is the normal approximation accurate here? Can you show why or why not?

(2) Using R, calculate the exact value of Pr(G3>G2) based on the probability mass function for the multivariate hypergeometric distribution. Confirm your result with Monte Carlo simulations.

Extra credit: Explain the connection between the Pythagorean Theorem and the calculation of the standard error of the difference of two independent random variables. Then extend your explanation to explain the calculation for the difference of two dependent random variables. (Hint.)
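
As a starting point for the exercise, here is one possible sketch in Python rather than R. It rests on an assumption the post does not state: that the population of 300 has exactly the same proportions as before, i.e. 63 students in G2 and 36 in G3. The normal approximation uses the finite population correction (N-n)/(N-1) and a continuity correction of 0.5 students; the exact value sums the multivariate hypergeometric pmf with G1 and G4 lumped together. The Monte Carlo confirmation is left out.

```python
import math
from fractions import Fraction

N, n = 300, 100
K2, K3 = 63, 36            # ASSUMED population counts: 21% and 12% of 300
K_rest = N - K2 - K3       # G1 and G4 combined

# Normal approximation in student counts, with finite population correction
p2, p3 = K2 / N, K3 / N
fpc = (N - n) / (N - 1)
var_diff = n * fpc * (p2 * (1 - p2) + p3 * (1 - p3) + 2 * p2 * p3)
mean_diff = n * (p3 - p2)
z = (0.5 - mean_diff) / math.sqrt(var_diff)     # continuity-corrected z-score
approx = 0.5 * math.erfc(z / math.sqrt(2))      # upper-tail probability

# Exact value: count the samples with g3 > g2, then divide by C(N, n)
ways = 0
for g2 in range(min(K2, n) + 1):
    for g3 in range(g2 + 1, min(K3, n - g2) + 1):
        rest = n - g2 - g3
        ways += math.comb(K2, g2) * math.comb(K3, g3) * math.comb(K_rest, rest)
exact = Fraction(ways, math.comb(N, n))

print(f"normal approx = {approx:.4%}, exact = {float(exact):.4%}")
```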

Last edited by MApoker; 07-31-2012 at 05:15 PM.

07-31-2012, 05:46 PM   #20
BruceZ

Quote:
Originally Posted by MApoker
You're right about the bug in that calculator... sheesh. If I'd run my simulations with a large enough N I would have spotted that too.
I told them about it. Back in March someone else said that their normal distribution calculator was completely wrong, and the admin replied that it was fixed.

07-31-2012, 07:25 PM   #21
BruceZ

I see what's wrong with that calculator. It only uses 2 decimal places for (x-u)/sigma. It's what you would get if you used a printed table of the standard normal distribution where the entries for x only have 2 decimal places, and you didn't do any interpolation. In our case (x-u)/sigma is 1.674421884... They just used 1.67 which gives 4.75%. It doesn't change until you give it 1.675 which they round to 1.68 and then it gives 4.65%. If you give it 1.67499999 it still gives 4.75%. If they had done linear interpolation, they would have gotten 4.70%.
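
The rounding effect is easy to reproduce. The following Python sketch is only a guess at what the calculator does internally (rounding the z-score to two decimals before the lookup); `phi` is a full-precision standard normal CDF built on `math.erfc`, and the last lines do the linear interpolation between the table entries on either side of z.

```python
import math


def phi(z):
    """Standard normal CDF at full precision."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


se = math.sqrt(0.003219)
z = 0.095 / se                        # about 1.6744

tail_full = phi(-z)                   # what Excel-style full precision gives
tail_rounded = phi(-round(z, 2))      # lookup at z = 1.67, no interpolation

# Linear interpolation between the two neighboring table entries
z_lo = math.floor(z * 100) / 100      # 1.67
z_hi = math.ceil(z * 100) / 100       # 1.68
frac = (z - z_lo) / (z_hi - z_lo)
tail_interp = phi(-z_lo) + frac * (phi(-z_hi) - phi(-z_lo))

print(f"full = {tail_full:.4%}, rounded = {tail_rounded:.4%}, interpolated = {tail_interp:.4%}")
```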

Last edited by BruceZ; 08-04-2012 at 11:53 AM. Reason: Revised linear interpolation

08-04-2012, 11:57 AM   #22
BruceZ

Quote:
Originally Posted by BruceZ
I see what's wrong with that calculator. It only uses 2 decimal places for (x-u)/sigma. It's what you would get if you used a printed table of the standard normal distribution where the entries for x only have 2 decimal places, and you didn't do any interpolation. In our case (x-u)/sigma is 1.674421884... They just used 1.67 which gives 4.75%. It doesn't change until you give it 1.675 which they round to 1.68 and then it gives 4.65%. If you give it 1.67499999 it still gives 4.75%. If they had done linear interpolation, they would have gotten 4.70%.
This is a revised paragraph. To fix this they only need to do linear interpolation between the values for 1.67 and 1.68, and in fact that would even be accurate in the 0.001% place.