Disregard previous question. I got it straightened out and revised this yet again.
Quote:
Originally Posted by David Sklansky
You can estimate the answer this way:
A particular country is even money to be chosen after about 135 days. The chance that a country will be picked after 135x8=1080 tries is 255/256. 70% of 255 is 178. So in 1080 tries it's a little better than even money that one of the 195 countries will be unpicked.
1080 days gives a probability of 53.5% that some country will remain unpicked. The actual median is 1099. If we had 178 countries, the median would be 987, and your method gives 984 (not what I said it gave before).
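For anyone who wants to check exact figures like these, here is a minimal sketch of one way to do it. It assumes one country is picked uniformly at random each day, independently of earlier days, and it tracks the distribution of how many distinct countries have been picked so far. The function names are made up for this illustration, and this is not necessarily how the numbers above were produced.

```python
def advance_one_day(dist, n):
    """One day's update of dist[j] = P(exactly j distinct countries picked so far)."""
    new = [0.0] * (n + 1)
    for j, p in enumerate(dist):
        new[j] += p * j / n                # today's pick repeats an earlier country
        if j < n:
            new[j + 1] += p * (n - j) / n  # today's pick is a new country
    return new


def prob_some_unpicked(n, days):
    """P(at least one of n countries is still unpicked after `days` picks)."""
    dist = [1.0] + [0.0] * n
    for _ in range(days):
        dist = advance_one_day(dist, n)
    return 1.0 - dist[n]


def first_day_at_least_even(n):
    """First day on which P(every country has been picked) reaches 1/2."""
    dist = [1.0] + [0.0] * n
    day = 0
    while dist[n] < 0.5:
        dist = advance_one_day(dist, n)
        day += 1
    return day


if __name__ == "__main__":
    print(f"P(some country unpicked after 1080 days) = {prob_some_unpicked(195, 1080):.3f}")
    print("195 countries, first day at even money or better:", first_day_at_least_even(195))
    print("178 countries, first day at even money or better:", first_day_at_least_even(178))
```

The day counts this prints can differ by one from a quoted median, depending on which side of the 50% crossing you round to.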
David's 70% comes from ln(2), which is about 0.693. We know that the probability of not getting a particular country in N days is exactly (194/195)^N. That is about 1/2 when N is 135, and about 1/256 when N is 8*135 = 1080. That's still for 1 country, so if we treat all the countries as independent, the probability that N given countries have all been picked is (255/256)^N (N now counts countries, not days), and (255/256)^N = 0.5 when N = 177. He got 178 by multiplying 255 by 0.7, and that gives 49.8%. The 177 actually comes from multiplying 255 by ln(2), or 256 by ln(2). Anyway, the basis for this estimate is
(255/256)^N = 0.5
(256/255)^N = 2
N = ln(2) / ln(256/255)
= ln(2) / ln(1 + 1/255)
=~ ln(2) / (1/255)
=~ ln(2) * 255
=~ 0.693 * 255 =~ 176.7 or 177
=~ 0.7 * 255 = 178.5
Since we have 195 countries and not 178, the probability of not having picked all of them in 1080 days will be a little higher than 50%, and it is in fact 53.5%.
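The arithmetic behind David's estimate can be checked in a few lines. This is only a sketch of the steps described above. The variable and function names are invented for the illustration, and `eight_half_lives` is just my shorthand for "8 times the even-money time".

```python
import math


def half_life(n):
    """Days d at which ((n-1)/n)^d = 1/2, i.e. one given country is even money."""
    return math.log(2) / -math.log(1 - 1 / n)


def eight_half_lives(n):
    """David's rule of thumb: after 8 half-lives a country is missed ~1/256 of the time."""
    return 8 * half_life(n)


# Chance a given country is still unpicked after 1080 days, to compare with 1/256.
miss_one = (194 / 195) ** 1080

# How many independent 255/256 events multiply to 1/2: exact, and the two shortcuts.
k_exact = math.log(2) / math.log(256 / 255)
k_ln2 = math.log(2) * 255
k_david = 0.7 * 255

if __name__ == "__main__":
    print(f"half-life for 195 countries: {half_life(195):.1f} days")
    print(f"(194/195)^1080 = {miss_one:.6f}, 1/256 = {1 / 256:.6f}")
    print(f"k exact = {k_exact:.1f}, 255*ln(2) = {k_ln2:.1f}, 0.7*255 = {k_david:.1f}")
    print(f"(255/256)^177 = {(255 / 256) ** 177:.4f}")
    print(f"(255/256)^178 = {(255 / 256) ** 178:.4f}")
    print(f"8 half-lives for 178 countries: {eight_half_lives(178):.0f} days")
```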
Note that we could have estimated the median as 1103 (off by just 4) by solving for N in just the first 3 terms of the inclusion-exclusion formula:
195*(194/195)^N - C(195,2)*(193/195)^N + C(195,3)*(192/195)^N = 0.5
But David's estimate doesn't require a calculator.
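If you do have a calculator handy, the 3-term formula above can be solved by simply scanning N upward until the left side drops to 0.5. This is a rough sketch with made-up function names. It relies on the 3-term sum staying above 0.5 until the crossing, which holds here but is not a general property of truncated inclusion-exclusion sums.

```python
from math import comb


def three_term(days, n=195):
    """First three inclusion-exclusion terms for P(some country unpicked after `days`)."""
    return sum(
        (-1) ** (k + 1) * comb(n, k) * ((n - k) / n) ** days
        for k in (1, 2, 3)
    )


def solve_three_term(n=195, target=0.5):
    """Smallest N at which the 3-term sum has fallen to `target` or below."""
    days = 1
    while three_term(days, n) > target:
        days += 1
    return days


if __name__ == "__main__":
    print("3-term estimate of the median:", solve_three_term())
    print(f"3-term value at 1080 days: {three_term(1080):.3f}")
```

The scan should land close to the 1103 quoted above. At 1080 days the 3-term value overshoots the exact 53.5% a bit, because the dropped fourth term is negative.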
I had some stuff here before about how this is similar to the way I estimate the median for streaks without a streak calculator, and I had a demonstration of how accurate that is, and some R scripts to do that. But David's is a little different because he is applying the method a second time over the countries, plus you can estimate it with nothing but multiplications. For the streak calculation the probabilities are usually too big to approximate the log that way. What I said about the streak calculation wasn't anything I hadn't posted before, so it's gone now.
Last edited by BruceZ; 12-17-2012 at 08:49 PM.