Two Plus Two Poker Forums Survey Probabilities



07-31-2012, 06:33 AM   #16
adept

Join Date: Jun 2005
Location: Playin' It Smart
Posts: 742
Re: Survey Probabilities

OK, I get your point. I don't think the error in my approximation is quite as horrendous as you do, but whatever. In any case, I finally crunched the numbers, and I worked out the proper solution using the multinomial approach, which properly accounts for the covariance. Here it is, for the benefit of the OP:

As explained above, we want to find Var(G3 - G2). By the properties of variance, we know that for two dependent random variables G3 and G2:

Var(G3 - G2) = Var(G3) + Var(G2) - 2*Cov(G2,G3)

The variance calculations are just like those from the binomial formula, which OP knows:

Var(G2) = (.21*(1-.21))/100 = 0.001659
Var(G3) = (.12*(1-.12))/100 = 0.001056

Now for the covariance, using the multinomial formula we get:

Cov(G2,G3) = (-1*p2*p3)/n = -1*(.21*.12)/100 = -0.000252

Now:

Var(G3) + Var(G2) - 2*Cov(G2,G3) = 0.001659 + 0.001056 + 2*0.000252 = 0.003219

To get SE(G3 - G2), I just take the square root:

SE(G3 - G2) = sqrt(0.003219) = 0.05673623

Finally, I use the normal approximation to get Pr((G3-G2) < -.095). Using the calculator here, I plug in a mean of 0, an SD of 0.05673623, and find the area below -0.095 = 4.75%.

Voila, that's the way to do it using the proper (multinomial) approach. I've double-checked all this with my own simulations.

So... Hopefully this will be BruceZ approved!

Last edited by MApoker; 07-31-2012 at 06:46 AM.
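For anyone who wants to reproduce the arithmetic in post #16 without an online calculator, here is a minimal Python sketch. It assumes n = 100, p2 = .21 and p3 = .12 as in the post; the function name `normal_cdf` and the variable names are illustrative choices, not something from the thread.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal CDF computed from the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

n = 100                 # sample size used in the thread
p2, p3 = 0.21, 0.12     # probabilities for G2 and G3 used in the thread

# Binomial-style variances of the two sample proportions
var_g2 = p2 * (1 - p2) / n
var_g3 = p3 * (1 - p3) / n

# Multinomial covariance between the two sample proportions (negative)
cov_g2_g3 = -p2 * p3 / n

# Var(G3 - G2) = Var(G3) + Var(G2) - 2*Cov(G2, G3)
var_diff = var_g3 + var_g2 - 2 * cov_g2_g3
se_diff = math.sqrt(var_diff)

# Normal approximation: area below -0.095 with mean 0 and SD = se_diff
prob = normal_cdf(-0.095, mean=0.0, sd=se_diff)

print(f"Var(G3 - G2) = {var_diff:.6f}")
print(f"SE(G3 - G2)  = {se_diff:.8f}")
print(f"Pr((G3 - G2) < -0.095) ~ {prob:.2%}")
```

Because the CDF is evaluated at full precision, the last line should come out near the 4.70% that Excel reports rather than the calculator's 4.75%.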
07-31-2012, 09:06 AM   #17
Carpal \'Tunnel

Join Date: Sep 2002
Posts: 8,895
Re: Survey Probabilities

Quote:
 Originally Posted by MApoker OK, I get your point. I don't think the error in my approximation is quite as horrendous as you do, but whatever. In any case, I finally crunched the numbers, and I worked out the proper solution using the multinomial approach, which properly accounts for the covariance. Here it is, for the benefit of the OP: As explained above, we want to find Var(G3 - G2). By the properties of variance, we know that for two dependent random variables G3 and G2: Var(G3 - G2) = Var(G3) + Var(G2) - 2*Cov(G2,G3). The variance calculations are just like those from the binomial formula, which OP knows: Var(G2) = (.21*(1-.21))/100 = 0.001659 Var(G3) = (.12*(1-.12))/100 = 0.001056 Now for the covariance, using the multinomial formula we get: Cov(G2,G3) = (-1*p1*p2)/n = -1*(.21*.12)/100 = -0.000252 Now: Var(G3) + Var(G2) - 2*Cov(G2,G3) = 0.001659 + 0.001056 + 2*0.000252 = 0.003219. Now to get SE(G3 - G2), I just take the square root = sqrt(0.003219) = 0.05673623. Finally, I use the normal approximation to get the Pr((G3-G2) < -.095). Using the calculator here, I plug in a mean of 0, an SD of 0.05673623, and find the area below -0.095 = 4.75%. Voila, that's the way to do it using the proper (multinomial) approach. I've double-checked all this with my own simulations. So... Hopefully this will be BruceZ approved!
That's the way to do it, and the OP should ignore all previous approximations. The above takes hardly any more computations, and it is very accurate. Actually, there seems to be a bug with that normal distribution calculator because I get 4.70% in Excel using exactly the same input numbers, and that matches the simulation value too.

I also misinterpreted my approximation. When I said that I was adding 2*1 for the covariance, that was close to the right number to add, but that's not because G1+G2 and G1+G3 are highly correlated. Perfectly correlated would be a correlation of 1. Adding 2*1 corresponds to a covariance of -1, which would mean that they are slightly negatively correlated, just as G2 and G3 are slightly negatively correlated. I'm in different units from you because I'm computing means and standard deviations in students, not percentages, so the variances are 100p(1-p). So when I say that I have a covariance of -1, that's -1/10000 for you. When I said to use 0.5 and -9, for you that would be 0.005 and -0.09. Anyway, disregard the whole discussion about assuming perfect correlation.

Last edited by BruceZ; 07-31-2012 at 11:51 AM.

07-31-2012, 01:34 PM   #18
adept

Join Date: Jun 2005
Location: Playin' It Smart
Posts: 742
Re: Survey Probabilities

You're right about the bug in that calculator... sheesh. If I'd run my simulations with a large enough N I would have spotted that too. Any other error can be attributed to the normal approximation, including the fact that we're trying to approximate a discrete distribution with a continuous one.

OP -- if you want the really, truly exact answer, you can get it from the probability mass function for the multinomial distribution, by calculating the probability of each outcome wherein G3 > G2. For example, for G1 = 65%, G2 = 10%, G3 = 20% and G4 = 5%, you go to this calculator, plug in 4 for the number of outcomes, plug in 65, 10, 20, and 5 for the frequencies, and plug in .65, .21, .12 and .02 for the probabilities. The output will give you the probability of that set of outcomes. (Actually, you'll need to find a calculator with more than five significant digits, since that one simply shows it as 0.00000.) Then you add up these probabilities for ALL possible combinations such that G3 > G2.

Obviously, this requires calculating the probability of a buttload of combinations, which is why we usually stick to the normal approximation. (Or you could use R to calculate it with a for-loop.)

* Adding: You can reduce the number of computations by combining G1 and G4, and using 3 as the number of outcomes, but I don't want to confuse you too much more!

Last edited by MApoker; 07-31-2012 at 01:44 PM.
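The exact summation described in post #18 can be sketched in a few lines. The post suggests R; the version below uses Python instead, and it pools G1 and G4 into a single "rest" category as the post's footnote proposes. The function names `trinomial_pmf` and `prob_g3_beats_g2` are hypothetical, and the defaults n = 100, p2 = .21, p3 = .12 are the values used in the thread.

```python
from math import comb

def trinomial_pmf(g2, g3, n, p2, p3):
    """Multinomial probability of g2 students in G2, g3 in G3,
    and the remaining n - g2 - g3 in the pooled G1+G4 category."""
    rest = n - g2 - g3
    coef = comb(n, g2) * comb(n - g2, g3)   # n! / (g2! g3! rest!)
    return coef * p2**g2 * p3**g3 * (1.0 - p2 - p3)**rest

def prob_g3_beats_g2(n=100, p2=0.21, p3=0.12):
    """Exact Pr(G3 > G2): add up every outcome with g3 > g2."""
    return sum(
        trinomial_pmf(g2, g3, n, p2, p3)
        for g2 in range(n + 1)
        for g3 in range(g2 + 1, n - g2 + 1)
    )

exact = prob_g3_beats_g2()
print(f"Exact Pr(G3 > G2) = {exact:.4%}")
```

Pooling the two categories that do not matter keeps the loop at a few thousand terms, so this runs instantly and can be compared directly against the normal approximation.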
07-31-2012, 04:55 PM   #19
adept

Join Date: Jun 2005
Location: Playin' It Smart
Posts: 742
Re: Survey Probabilities

One more post for the true stats nerds amongst you:

Let's drop the assumption that we're drawing our sample with replacement, and let's make the sample size n=100 without replacement from a population size N=300. Now we need to resort to the multivariate hypergeometric distribution as referenced in the footnote of my first post.

Exercise:

(1) Calculate Var(G3-G2), and use the normal approximation to estimate Pr(G3>G2). Q: Is the normal approximation accurate here? Can you show why or why not?

(2) Using R, calculate the exact value of Pr(G3>G2) based on the probability mass function for the multivariate hypergeometric distribution. Confirm your result with Monte Carlo simulations.

Extra credit: Explain the connection between the Pythagorean Theorem and the calculation of the standard error of the difference of two independent random variables. Then extend your explanation to explain the calculation for the difference of two dependent random variables. (Hint.)

Last edited by MApoker; 07-31-2012 at 05:15 PM.
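One possible way to set up the exercise in post #19 is sketched below, in Python rather than the R the post asks for. It rests on an assumption the post does not state: that the population of 300 contains exactly 21% in G2 and 12% in G3, i.e. 63 and 36 students. The finite population correction (N-n)/(N-1) is the standard one for sampling without replacement; all names are illustrative.

```python
from math import comb, erfc, sqrt

N, n = 300, 100            # population and sample size from the exercise
K2, K3 = 63, 36            # ASSUMED population counts: 21% and 12% of 300
K_REST = N - K2 - K3       # everyone else (G1 and G4 pooled)

def mvhyper_pmf(g2, g3):
    """Multivariate hypergeometric probability of drawing g2 from G2,
    g3 from G3 and the remaining n - g2 - g3 from the pooled rest."""
    rest = n - g2 - g3
    return comb(K2, g2) * comb(K3, g3) * comb(K_REST, rest) / comb(N, n)

# (1) Normal approximation with the finite population correction
p2, p3 = K2 / N, K3 / N
fpc = (N - n) / (N - 1)
var_diff = (p3 * (1 - p3) + p2 * (1 - p2) + 2 * p2 * p3) / n * fpc
se_diff = sqrt(var_diff)
z = 0.095 / se_diff                    # same 0.005 continuity correction as before
approx = 0.5 * erfc(z / sqrt(2.0))     # upper-tail area beyond z

# (2) Exact value: add up every sample with g3 > g2
exact = sum(
    mvhyper_pmf(g2, g3)
    for g2 in range(min(K2, n) + 1)
    for g3 in range(g2 + 1, min(K3, n - g2) + 1)
)

print(f"SE(G3 - G2) = {se_diff:.6f}")
print(f"Normal approximation = {approx:.4%}")
print(f"Exact                = {exact:.4%}")
```

Comparing the two printed probabilities is one way to start on the "is the normal approximation accurate here?" question; a Monte Carlo check is left out of this sketch.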
07-31-2012, 05:46 PM   #20
Carpal \'Tunnel

Join Date: Sep 2002
Posts: 8,895
Re: Survey Probabilities

Quote:
 Originally Posted by MApoker You're right about the bug in that calculator... sheesh. If I'd run my simulations with a large enough N I would have spotted that too.
I told them about it. Back in March someone else said that their normal distribution calculator was completely wrong, and the admin replied that it was fixed.

07-31-2012, 07:25 PM   #21
Carpal \'Tunnel

Join Date: Sep 2002
Posts: 8,895
Re: Survey Probabilities

I see what's wrong with that calculator. It only uses 2 decimal places for (x-u)/sigma. It's what you would get if you used a printed table of the standard normal distribution where the entries for x only have 2 decimal places, and you didn't do any interpolation. In our case (x-u)/sigma is 1.674421884... They just used 1.67 which gives 4.75%. It doesn't change until you give it 1.675 which they round to 1.68 and then it gives 4.65%. If you give it 1.67499999 it still gives 4.75%. If they had done linear interpolation, they would have gotten 4.70%.

Last edited by BruceZ; 08-04-2012 at 11:53 AM. Reason: Revised linear interpolation
08-04-2012, 11:57 AM   #22
Carpal \'Tunnel

Join Date: Sep 2002
Posts: 8,895
Re: Survey Probabilities

Quote:
 Originally Posted by BruceZ I see what's wrong with that calculator. It only uses 2 decimal places for (x-u)/sigma. It's what you would get if you used a printed table of the standard normal distribution where the entries for x only have 2 decimal places, and you didn't do any interpolation. In our case (x-u)/sigma is 1.674421884... They just used 1.67 which gives 4.75%. It doesn't change until you give it 1.675 which they round to 1.68 and then it gives 4.65%. If you give it 1.67499999 it still gives 4.75%. If they had done linear interpolation, they would have gotten 4.70%.
This is a revised paragraph. To fix this they only need to do linear interpolation between the values for 1.67 and 1.68, and in fact that would even be accurate in the 0.001% place.



All times are GMT -4.
