
07-14-2010 , 12:22 PM
Quote:
Originally Posted by Enrique
Wouldn't the expected number for the A_n's when n > 1 be a bit different than 479.5? I guess that is a good approximation though. So then the first one takes urn 1 and the rest take urn 2. Would the probabilists have an edge with this technique? Since the first player has expected value .5 and the rest have .49, whereas on the other team all of them have EV .49.
If I was the first player, and if I drew 479 white stones and 520 black stones, and if, on the basis of only this information, I had to unilaterally decide which urn my teammates and I would choose, then yes, Urn 1 for me and Urn 2 for everyone else would be the choice that maximizes our EV (or rather, our conditional EV given my information).

But if I was permitted to use my information to give my teammates conditional advice, rather than unilaterally deciding their urn, then clearly I would tell them, "If you draw 478 or 479 white stones, then choose Urn 1, but if you draw 480 white stones, then choose Urn 2." If my calculations are correct, the probability that they would select a white stone, given my information, would then be 60.69%.
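
(A quick script to check that 60.69% figure. This is just a sketch; it assumes the setup discussed upthread, namely that each of Urn 1's 1000 stones is independently white with probability 1/2, that a player who stays with Urn 1 draws the one stone he did not peek at, and that Urn 2 is 49% white.)

Code:
# Given my peek of 479 white among 999, Urn 1 holds 479 or 480 white
# stones, each with posterior probability 1/2 (i.i.d. fair stones).
p2 = 0.49  # chance of a white stone from Urn 2

total = 0.0
for k, p_k in [(479, 0.5), (480, 0.5)]:
    # A teammate's unseen stone is uniform over the 1000 stones: he sees
    # k-1 white with probability k/1000 (his unseen stone is white) and
    # k white otherwise (his unseen stone is black).
    for w, p_w, p_white_if_stay in [(k - 1, k / 1000, 1.0),
                                    (k, 1 - k / 1000, 0.0)]:
        stay = w in (478, 479)  # the conditional advice above
        total += p_k * p_w * (p_white_if_stay if stay else p2)

print(total)  # 0.6069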

Incidentally, as a follow-up to my previous response to this topic, there is another, simpler principle of probability that this puzzle seems to violate: the additivity of EV. Namely, we all know that if X1, ..., Xn are random variables, then

E[X1 + ... + Xn] = E[X1] + ... + E[Xn],

and it does not matter whether the random variables are independent or not. For indicator variables 1_{A_1}, ..., 1_{A_n}, this means that the expected number of outcomes is the sum of the probabilities:

E[1_{A_1} + ... + 1_{A_n}] = P(A_1) + ... + P(A_n).
In the puzzle, this identity seems to become 479.5 = 500. However, making the information explicit, we see that the mathematically correct identity is

E[1_{A_1} + ... + 1_{A_n} | I] = P(A_1 | I) + ... + P(A_n | I),

with the same information I on both sides, and the mathematically incorrect identity, which leads to 479.5 = 500, is

E[1_{A_1} + ... + 1_{A_n}] = P(A_1 | I) + ... + P(A_n | I),

which equates an unconditional expectation on the left to conditional probabilities on the right. (Actually, as Enrique pointed out, this fallacious identity leads to 479.5205 = 500.)
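
(A small Monte Carlo sketch of the difference, under the same assumed model as above, with I taken to be the first player's peek at 999 stones. The exact decimals depend on the model, so treat the numbers as illustrative.)

Code:
import random

trials = 100_000
total = 0.0
for _ in range(trials):
    k = bin(random.getrandbits(1000)).count("1")     # white stones, Binomial(1000, 1/2)
    w = k - int(random.random() < k / 1000)          # white stones in player 1's peek
    # Sum of P(A_i | I): player 1 draws his unseen stone (white w.p. 1/2),
    # and each teammate's stone is white w.p. E[k | I]/1000 = (w + 0.5)/1000.
    total += 0.5 + 999 * (w + 0.5) / 1000

print(total / trials)            # about 500: the conditional sums average out to E[sum 1_{A_i}]
print(0.5 + 999 * 479.5 / 1000)  # about 479.6: the realized value when player 1 sees w = 479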
07-14-2010 , 12:53 PM
Quote:
Originally Posted by jason1990
I have not. But I do know at least one seasoned, professional probabilist who has. If you would like to read his thoughts on it, they are here.

Thanks for the link, I just read it.

I would say the review is, on balance, pretty positive. He has some complaints about the book, but most of them pertain to rhetoric and to overemphasizing the importance of the phenomena the author is interested in.

I was particularly interested to read this:

"I am always puzzled that writers on financial mathematics (Taleb included) tend to ignore what strikes me as the most important insight that mathematics provides. [the Kelly formula]"


I'm not sure if you have much of an interest in finance, but the absence of Kelly from the financial and economic literature is quite striking considering what a powerful tool it is.
07-14-2010 , 04:40 PM
Quote:
Originally Posted by PairTheBoard
The natural criteria described above would have a probabilist switch if he sees 488 white balls or less. He could then conclude the urn has no more than 489 white balls and is a strictly worse urn than the 490 white ball urn. With this criteria, if the probabilists' urn has 490 or more white balls their group average results will be the same as if they never switch. If their urn contains 488 or fewer white balls then they will all switch to the 490 white ball urn and their group average will be better than if they never switched. So the only place where the EV can be lost with this switching criteria is if their urn contains exactly 489 white balls.

...

So what happens when the probabilist urn contains exactly 489 white balls? What happens is that 48.9% of the probabilists will see 488 white balls in their 999-ball peek and switch to the 490 white ball urn, where they will pick a white ball 49% of the time. However, 51.1% of the probabilists will peek at 489 white balls and by the criteria will not switch urns. All of these 51.1% of the probabilists will choose the last black ball in the urn. In the 489 white ball urn case, the EV for the group average - and for pre-peek individual probabilists - will be (0.489)(.49) = 0.23961. That's a huge reduction from the .489 EV for never switching in this case. It's huge under this criteria, for example, compared to the EV improvement from 0.48 to 0.49 in the case of a 480 white ball probabilists' urn.

Unless I'm mistaken, the above discussion shows the following strict inequality to hold:


(2) Sum{k=1...488} [ C(1000,k)(.5)^1000 * (.49 - k/1000) ] < 0.489 - (0.489 * 0.49)

The left hand side calculates the improvement in EV under this criteria for urns with 488 or fewer white balls. The right hand side calculates the loss of EV in the 489 case, where 48.9% of the probabilists switch under this criteria instead of getting their certain white ball had they not switched.
I'm afraid I missed a factor in (2) above. The inequality should be:

(2) Sum{k=1...488} [ C(1000,k)(.5)^1000 * (.49 - k/1000) ] < C(1000,489)(.5)^1000 * [0.489 - (0.489 * 0.49)]
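
(A quick numerical check of (2), taking the C(1000,k)(.5)^1000 weights at face value; a sketch, not a proof.)

Code:
from math import comb

def pmf(k):  # P(urn has k white balls) under the Binomial(1000, 1/2) weights
    return comb(1000, k) * 0.5**1000

lhs = sum(pmf(k) * (0.49 - k / 1000) for k in range(1, 489))
rhs = pmf(489) * (0.489 - 0.489 * 0.49)
print(lhs, rhs, lhs < rhs)  # roughly 0.0025, 0.0050, True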


PairTheBoard
07-14-2010 , 05:41 PM
Quote:
Originally Posted by PairTheBoard
I think this is quite a bit different from something like the prisoner's paradox. ... Suppose, going into the experiment, they get together and agree to operate in such a way as to maximize their results as a group.
What do you mean by "their results"? Your post suggests that you would have the probabilists maximize their expected team score. Why? I would prefer to have them maximize the probability that they beat the slots players (who are all picking from Urn 2). I would interpret the final outcome of the game as binary: either the probabilists win or they lose. (Suppose draws are decided by a coin toss, if necessary.) They should try to maximize the probability that they win.

As an individual probabilist, I might reason that for each fixed set of choices and outcomes for my teammates and the opposing team, I can only affect whether we win or lose in the cases where the scores differ by at most 1. In those cases, I maximize the probability (given what I know) that we will win by maximizing the probability that I choose a white stone, and thereby score a point. I do that by choosing from Urn 1.

Following this strategy (which amounts to every probabilist selecting from Urn 1), the probability that the probabilists will win, given the information they have prior to the game, is a little less than 64.3%. (Calculation sketch is below.)

My claim is that the probabilists can instead choose, prior to the game, to act in such a way so that the probability they will win, given the information they have prior to the game, is a little more than 67.3%. In fact, we may have them act according to your suggestion here:

Quote:
Originally Posted by PairTheBoard
The natural criteria described above would have a probabilist switch if he sees 488 white balls or less.
Let us sketch the calculations that lead to the numbers I claimed above. I will use normal approximations. To the skeptics who think the qualitative result will be reversed if the calculations are done exactly, I will leave it to them to do the exact calculations on their own machines.

Let us first consider Strategy A, which is to always draw from Urn 1. All probabilities, unless otherwise noted, will be conditioned on what is known prior to anyone drawing from Urn 1.

Let k be the number of white stones in Urn 1. Let p = k/1000. The probabilists' score will be approximately k + sU, where s^2 = 1000p(1 - p) and U ~ N(0,1). The slots players' score will be approximately 490 + tV, where t^2 = 1000(0.49)(0.51) and V ~ N(0,1). Note that U, V, and k are all independent.

Their score difference is therefore D = k - 490 + (s^2 + t^2)^(1/2) Z, where Z ~ N(0,1). The probability the probabilists win, given k, is thus

P(D > 0 | k) = Φ( (k - 490) / (s^2 + t^2)^(1/2) ),

where Φ is the cumulative distribution function of a standard normal random variable. We therefore have

P(the probabilists win) = Sum_{k=0...1000} C(1000,k)(.5)^1000 Φ( (k - 490) / (s^2 + t^2)^(1/2) ).

According to Mathematica, this evaluates to 0.642506.

Now let us consider Strategy B, which is to draw from Urn 1 if and only if we see 489 or more white stones after drawing 999 stones from the urn. In this case, if k < 489, then all probabilists will draw from Urn 2. Under the normal approximation, the probability of a draw is zero, so the probabilists win with probability 1/2. (Or, if draws are decided by a coin toss, then the probability the probabilists win is 1/2 without any normal approximation.)

If k = 489, then the probabilists who see 488 white stones will switch to Urn 2, while the rest will stay and each draw the one black stone they did not see; the team's expected score is then only about 489(0.49) ≈ 240, and they essentially never win, so this case contributes nothing. If k > 489, then all probabilists will draw from Urn 1, and P(D > 0 | k) will be the same as above.

Therefore,

P(the probabilists win) = (1/2) Sum_{k=0...488} C(1000,k)(.5)^1000 + Sum_{k=490...1000} C(1000,k)(.5)^1000 Φ( (k - 490) / (s^2 + t^2)^(1/2) ).

According to Mathematica, this evaluates to 0.673515.
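
(For anyone without Mathematica handy, here is a short script reproducing both figures; it simply evaluates the two sums above, normal approximation and all.)

Code:
from math import comb, erf, sqrt

def Phi(x):  # standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

pmf = [comb(1000, k) * 0.5**1000 for k in range(1001)]
t2 = 1000 * 0.49 * 0.51

def p_win(k):  # P(D > 0 | k) under the normal approximation
    s2 = 1000 * (k / 1000) * (1 - k / 1000)
    return Phi((k - 490) / sqrt(s2 + t2))

A = sum(pmf[k] * p_win(k) for k in range(1001))
B = (0.5 * sum(pmf[k] for k in range(489))
     + sum(pmf[k] * p_win(k) for k in range(490, 1001)))
print(A, B)  # about 0.6425 and 0.6735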
07-14-2010 , 11:46 PM
What if you consider the strategy where the first guy always draws from the first urn, while the rest follow the strategy of picking the first urn only if they draw less than 489? It probably won't change much, but it seems the probability might change in the later decimal places.
07-15-2010 , 01:22 PM
Quote:
Originally Posted by jason1990
What do you mean by "their results"? Your post suggests that you would have the probabilists maximize their expected team score. Why? I would prefer to have them maximize the probability that they beat the slots players (who are all picking from Urn 2). I would interpret the final outcome of the game as binary: either the probabilists win or they lose. (Suppose draws are decided by a coin toss, if necessary.) They should try to maximize the probability that they win.

As an individual probabilist, I might reason that for each fixed set of choices and outcomes for my teammates and the opposing team, I can only affect whether we win or lose in the cases where the scores differ by at most 1. In those cases, I maximize the probability (given what I know) that we will win by maximizing the probability that I choose a white stone, and thereby score a point. I do that by choosing from Urn 1.

Following this strategy (which amounts to every probabilist selecting from Urn 1), the probability that the probabilists will win, given the information they have prior to the game, is a little less than 64.3%. (Calculation sketch is below.)

My claim is that the probabilists can instead choose, prior to the game, to act in such a way so that the probability they will win, given the information they have prior to the game, is a little more than 67.3%. In fact, we may have them act according to your suggestion here:


Let us sketch the calculations that lead to the numbers I claimed above. I will use normal approximations. To the skeptics who think the qualitative result will be reversed if the calculations are done exactly, I will leave it to them to do the exact calculations on their own machines.

Let us first consider Strategy A, which is to always draw from Urn 1. All probabilities, unless otherwise noted, will be conditioned on what is known prior to anyone drawing from Urn 1.

Let k be the number of white stones in Urn 1. Let p = k/1000. The probabilists' score will be approximately k + sU, where s^2 = 1000p(1 - p) and U ~ N(0,1). The slots players' score will be approximately 490 + tV, where t^2 = 1000(0.49)(0.51) and V ~ N(0,1). Note that U, V, and k are all independent.

Their score difference is therefore D = k - 490 + (s^2 + t^2)^(1/2) Z, where Z ~ N(0,1). The probability the probabilists win, given k, is thus

P(D > 0 | k) = Φ( (k - 490) / (s^2 + t^2)^(1/2) ),

where Φ is the cumulative distribution function of a standard normal random variable. We therefore have

P(the probabilists win) = Sum_{k=0...1000} C(1000,k)(.5)^1000 Φ( (k - 490) / (s^2 + t^2)^(1/2) ).

According to Mathematica, this evaluates to 0.642506.

Now let us consider Strategy B, which is to draw from Urn 1 if and only if we see 489 or more white stones after drawing 999 stones from the urn. In this case, if k < 489, then all probabilists will draw from Urn 2. Under the normal approximation, the probability of a draw is zero, so the probabilists win with probability 1/2. (Or, if draws are decided by a coin toss, then the probability the probabilists win is 1/2 without any normal approximation.)

If k = 489, then the probabilists who see 488 white stones will switch to Urn 2, while the rest will stay and each draw the one black stone they did not see; the team's expected score is then only about 489(0.49) ≈ 240, and they essentially never win, so this case contributes nothing. If k > 489, then all probabilists will draw from Urn 1, and P(D > 0 | k) will be the same as above.

Therefore,

P(the probabilists win) = (1/2) Sum_{k=0...488} C(1000,k)(.5)^1000 + Sum_{k=490...1000} C(1000,k)(.5)^1000 Φ( (k - 490) / (s^2 + t^2)^(1/2) ).

According to Mathematica, this evaluates to 0.673515.

Cool. I hadn't thought of looking at the goal of maximizing the probability the probabilists beat the slots players rather than maximizing the expected value of the total white balls picked by the probabilists. So this is another example where EV isn't everything. By playing this strategy, even though it reduces their expected total white balls, the probabilists give their team a better chance of beating the slots players.

We see something similar with the Kelly criterion. If two equally advantaged gamblers with equal bankrolls independently play with the goal of beating the other player after, say, 30 gambles, and one player plays to maximize the EV of his bankroll while the other plays the Kelly criterion, the Max-EV player would practically never win. He would play double or nothing each time and practically always be broke after 30 gambles, even though he maximizes the expected value of his bankroll this way. The Kelly player accepts a lower bankroll EV in exchange for a very high probability of being ahead after 30 gambles. I wonder if Kelly maximizes the probability of beating the other gambler after a large number of gambles? If so, I don't think I've ever seen Kelly presented that way.
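
(A minimal simulation sketch of that duel, assuming for concreteness a 60% chance of winning each even-money gamble, so the Kelly fraction is 2p - 1 = 0.2.)

Code:
import random

P = 0.6        # assumed win probability per even-money gamble
N = 30         # number of gambles
F = 2 * P - 1  # Kelly fraction for an even-money bet

def kelly_beats_max_ev():
    max_ev, kelly = 1.0, 1.0
    for _ in range(N):
        max_ev *= 2.0 if random.random() < P else 0.0     # all-in every time
        kelly *= 1 + F if random.random() < P else 1 - F  # bet the fraction F
    return kelly > max_ev

trials = 100_000
print(sum(kelly_beats_max_ev() for _ in range(trials)) / trials)
# The Max-EV player survives all 30 gambles only with probability
# 0.6**30 (about 2e-7), so this prints 1.0 in almost every run.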

We also saw an example where "EV isn't everything" in a recent thread about equally advantaged servers in ping pong. With standard rules, does the first server have an advantage in the game? Some people in that thread found it very hard to believe that the 1st server having a higher EV for his total points scored does not translate to a greater probability of winning the game.

Thinking about your post, I've been looking at what's going on and what's making it work. I think the key is the large number of players, 1000 on each team. It's coincidental that the number of players on each team is the same as the number of balls in each urn. Since they are picking with replacement, we could have a million players on each team just as easily. What happens with a large number of players is that, for each k, the distribution for total white balls picked is concentrated relatively close to its mean. So even a small advantage in individual chances of picking a white ball produces a relatively sharp separation between the respective delta-like Team-Total distributions, and therefore a high probability for a team win.

For example, take k=488. The criteria-players make a small gain in EV from 0.488 to 0.49 (times the number of players). This small gain compares unfavorably to the large loss in EV for the criteria-players when k=[489]. But when the number of players is large enough, the slots players with 0.49 individual EV will nearly always beat the non-switchers with EV 0.488. With a large enough number of players and k=488, the switchers improve their probability of winning from close to zero to 1/2. That's a huge improvement. Furthermore, unlike in the EV measure, the criteria-players lose practically nothing in winning chances when k=[489]. When k=[489] and there is a large number of players, the non-switchers have close to zero chance of beating the slots players, while the criteria-players have a nearly impossible, even closer to zero chance. No big loss compared to the huge gains when k=488. When k is less than 488 the effect is even greater.
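
(A rough normal-approximation sketch of how team size drives that separation, here for non-switchers at 0.488 per pick against slots players at 0.49 per pick.)

Code:
from math import erf, sqrt

def p_beat(p1, p2, n):
    # Normal approximation to P(Binomial(n, p1) > Binomial(n, p2)), ignoring ties
    mu = n * (p1 - p2)
    sd = sqrt(n * (p1 * (1 - p1) + p2 * (1 - p2)))
    return 0.5 * (1 + erf(mu / (sd * sqrt(2))))

for n in (1000, 10_000, 100_000, 1_000_000):
    print(n, p_beat(0.488, 0.49, n))  # about 0.46, 0.39, 0.19, 0.002

So at 1000 players a side the 0.002 gap at k=488 is barely felt; the near-certain separations only kick in at much larger team sizes, or at smaller k where the gap is wider.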

So playing to beat the Slots team, with a large number of players it's vital for the probabilists to pick from an urn that's no worse than the 490 slots urn. It's well worth it for them to just concede when k=[489] if they can avoid picking from an urn even slightly worse than the slots' urn when k is less than [489]. That's essentially what you've done in your last calculation by leaving the k=[489] case out on the right.

I'm wondering, if we denote N = number of players on each team, how large does N need to be for the probabilists to improve their winning chances by adopting the switch criteria? For N=1 we are back to maximizing EV by never switching. You've shown for N=1000 criteria-players do better. So how small can N be for criteria-play? Maybe N=2 is enough. In the case of the huge EV sink when k=[489], I'm thinking as N>1 increments the criteria-players' chances of winning go quickly to zero while the non-switchers' chances of winning go to zero more slowly. So things might get worse before they get better. I don't know. It also might be interesting to see what the limit is for criteria-player improvement compared to non-switchers as N goes to infinity. I guess in the limit it just boils down to binomial calculations for the Urn-1 composition.

At any rate, I think this puzzle touches on a lot of interesting concepts and intuitions. As always, the first-class probability analysis you generously provide brings remarkable clarity to the situation.



PairTheBoard
07-15-2010 , 01:56 PM
Past time to edit my last post. I repeatedly refer to the case k=499 when I meant k=489.


PairTheBoard
07-15-2010 , 09:23 PM
Thanks for doing this, jason. I am going to be a senior at UIUC doing a math and econ degree (separate degrees). I am going to run out of college before I get to take some of the classes I would like. I am currently taking Grad Analysis, which covers Lebesgue measure and such. I may have time to take Real 2, which covers abstract measure theory. With this background, what books would you recommend for stochastic processes and stochastic calculus? If you have texts you recommend for practitioners vs. pure math in both of these topics, I would appreciate any comments on who each text is better for.

Thanks again for doing this thread.
07-17-2010 , 01:55 AM
Quote:
Originally Posted by myammy
With this background, what books would you recommend for stochastic processes and stochastic calculus?
Grimmett and Stirzaker for stochastic processes without measure theory, Durrett for probability with measure theory, Rogers and Williams vol II for stochastic calculus.

If you want to try your hand at stochastic calculus without a really strong probability background, then L.C. Evans has some nice lecture notes available on his website.
07-17-2010 , 06:29 AM
Consider the following type of problem. I just roughly describe it, but I'm hoping someone recognizes it as something familiar and studied.

Suppose for row vectors (or linear functionals) a_1, ..., a_m and (complex) numbers b_1, ..., b_m we have

a_i x ≈ b_i, for all i,

what can we know about the unknown column vector x?

(If `≈' were `=' this would just be linear algebra.)

Here `≈' is meant to stand for something like `is probably approximately equal to' and we'd like to pin down x to similar standards.

E.g. for some ε and δ, the meaning of u ≈ v could be ||u - v|| < ε with probability at least 1 - δ, though one could consider any other reasonable interpretations.

So we'd like to find a probably approximate solution x* with a_i x* ≈ b_i for all i (perhaps with bigger ε and δ). Here ||·|| is some appropriate norm.

What is known about this kind of thing?

For example, suppose the unknown is a (degree at most d) polynomial f, and we are given some probably approximate values f(z_i) ≈ b_i, for all i. Then what can we say about f? To what extent can we pin down evaluations of f and coefficients of f from the given type of information?
07-19-2010 , 10:06 PM
Quote:
Originally Posted by Enrique
What if you consider the strategy where the first guy always draws from the first urn, while the rest follow the strategy of picking the first urn only if they draw less than 489? It probably won't change much, but it seems the probability might change in the later decimal places.
To really determine the change, I think we would have to do the calculations exactly, without the normal approximations. But I agree that the change should be small.
07-19-2010 , 10:12 PM
Quote:
Originally Posted by PairTheBoard
I wonder if Kelly maximizes the probability of beating the other gambler after a large number of gambles?
Quote:
Originally Posted by PairTheBoard
I'm wondering, if we denote N = number of players on each team, how large does N need to be for the probabilists to improve their winning chances by adopting the switch criteria? For N=1 we are back to maximizing EV by never switching. You've shown for N=1000 criteria-players do better. So how small can N be for criteria-play? Maybe N=2 is enough. In the case of the huge EV sink when k=[489], I'm thinking as N>1 increments the criteria-players' chances of winning go quickly to zero while the non-switchers' chances of winning go to zero more slowly. So things might get worse before they get better. I don't know. It also might be interesting to see what the limit is for criteria-player improvement compared to non-switchers as N goes to infinity. I guess in the limit it just boils down to binomial calculations for the Urn-1 composition.
These are all great questions. Maybe I will pass them on to a student.

Quote:
Originally Posted by PairTheBoard
So this is another example where EV isn't everything.
EV is overrated.
07-19-2010 , 10:33 PM
Quote:
Originally Posted by myammy
Thanks for doing this, jason. I am going to be a senior at UIUC doing a math and econ degree (separate degrees). I am going to run out of college before I get to take some of the classes I would like. I am currently taking Grad Analysis, which covers Lebesgue measure and such. I may have time to take Real 2, which covers abstract measure theory. With this background, what books would you recommend for stochastic processes and stochastic calculus? If you have texts you recommend for practitioners vs. pure math in both of these topics, I would appreciate any comments on who each text is better for.

Thanks again for doing this thread.
Of course, one must learn probability theory before stochastic calculus. blah_blah makes some good recommendations. I will add these:

Lighter treatment:
http://books.google.com/books?id=OK_...page&q&f=false
http://books.google.com/books?id=kXw...sendal&f=false

More heavy-handed coverage:
http://books.google.com/books?id=9tv...page&q&f=false
http://books.google.com/books?id=ATN...page&q&f=false

Online notes:
http://www.math.wisc.edu/~kurtz/735/main735.pdf
http://www.math.wisc.edu/~seppalai/sa-book/notes.pdf
07-19-2010 , 10:53 PM
Quote:
Originally Posted by thylacine
Consider the following type of problem. I just roughly describe it, but I'm hoping someone recognizes it as something familiar and studied.
If we capture the notion of "probably approximately equal to" by specifying that a_i x = b_i + ε_i, where ε_i is a random variable with a small variance, then this just looks like linear regression. There must be many references on linear regression in infinite dimensions. If this is what you are looking for, then maybe you could start here:

http://www.springerlink.com/content/f4732136107224v7/
http://www.jstor.org/pss/2100831
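
(For the polynomial example in the original question, here is a tiny sketch of the regression viewpoint; the data, degree, and noise level are made up purely for illustration.)

Code:
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 50
coef_true = rng.normal(size=d + 1)              # the unknown polynomial f
z = rng.uniform(-1, 1, size=m)                  # evaluation points z_i
A = np.vander(z, d + 1)                         # row a_i = (z_i^d, ..., z_i, 1)
b = A @ coef_true + 0.01 * rng.normal(size=m)   # noisy values: a_i x = b_i + eps_i

coef_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.abs(coef_hat - coef_true).max())       # coefficients pinned down to ~0.01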
07-20-2010 , 06:57 AM
Quote:
Originally Posted by jason1990
If we capture the notion of "probably approximately equal to" by specifying that a_i x = b_i + ε_i, where ε_i is a random variable with a small variance, then this just looks like linear regression. There must be many references on linear regression in infinite dimensions. If this is what you are looking for, then maybe you could start here:

http://www.springerlink.com/content/f4732136107224v7/
http://www.jstor.org/pss/2100831
Ah, of course, it's linear regression (slaps forehead). So actually I need to start here http://en.wikipedia.org/wiki/Linear_regression which also leads me here http://en.wikipedia.org/wiki/Polynomial_regression which is enough to get me started. Thanks.
09-10-2010 , 06:56 PM
I was playing WoW the other day and was messing around with the /roll 0-100 command (rolls a fair 101-sided die from 0-100).

The game I made up played out something like this:

roll 0-100 : 35
roll 0-35 : 29
roll 0-29 : 4
roll 0-4 : 2
roll 0-2 : 1
roll 0-1 : 1
roll 0-1 : 1
roll 0-1 : 0 (stop)
Total rolls: 8

Would it be possible to calculate the mean number of rolls it would take to get to "0" using this method? And possibly the standard deviation?

Would an "average" game play out something like:

100->50->25->12->6->3->2->1->0? Something tells me that this isn't correct.

09-10-2010 , 10:17 PM
Quote:
Originally Posted by imjoshsizemore
I was playing WoW the other day and was messing around with the /roll 0-100 command (rolls a fair 101-sided die from 0-100).

The game I made up played out something like this:

roll 0-100 : 35
roll 0-35 : 29
roll 0-29 : 4
roll 0-4 : 2
roll 0-2 : 1
roll 0-1 : 1
roll 0-1 : 1
roll 0-1 : 0 (stop)
Total rolls: 8

Would it be possible to calculate the mean number of rolls it would take to get to "0" using this method? And possibly the standard deviation?

Would an "average" game play out something like:

100->50->25->12->6->3->2->1->0? Something tells me that this isn't correct.
So you mean you roll the 101-sided die 35 times and then a 36-sided die 29 times and so on?
09-10-2010 , 10:33 PM
Quote:
Originally Posted by Enrique
So you mean you roll the 101-side die 35 times and then a 36 sided dice 29 times and so on?
No, he rolls the 101-sided die once and it shows x. He then rolls an (x+1)-sided die labeled 0 to x, and so on and so forth. This continues until he eventually rolls a 0.
09-10-2010 , 10:54 PM
Quote:
Originally Posted by imjoshsizemore
Would it be possible to calculate the mean number of rolls it would take to get to "0" using this method?
The mean is 1 + H_100 ≈ 6.19, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number.

Let x_n be the mean number of rolls it takes, starting with a roll of a die numbered 0-n. Then x_1 = 2. For general n, it takes an average of (n + 1)/n rolls to first roll something between 0 and n - 1, and that first such roll is uniform on {0, ..., n - 1}. This leads to the recursion

x_n = (n + 1)/n + (1/n)(x_1 + ... + x_{n-1}).

If we let s_n = x_1 + ... + x_n, then this gives

s_n = s_{n-1} + x_n = ((n + 1)/n)(s_{n-1} + 1).

Rearranging and dividing by n + 1 gives

s_n/(n + 1) = s_{n-1}/n + 1/n.

From here, it follows easily that

s_n/(n + 1) = s_1/2 + (1/2 + 1/3 + ... + 1/n) = H_n,

so that s_n = (n + 1)H_n. Thus,

x_n = s_n - s_{n-1} = (n + 1)H_n - nH_{n-1} = 1 + H_n.

A similar, but more difficult, computation can be done for the variance, but I will leave that one to someone with more spare time.
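
(A quick check of the recursion and the closed form, plus a simulation of the game itself.)

Code:
import random
from math import fsum

# Exact recursion x_n = (n + 1)/n + (x_1 + ... + x_{n-1})/n
x = [0.0, 2.0]  # x[1] = 2
s = 2.0         # running sum of the x_j computed so far
for n in range(2, 101):
    x.append((n + 1) / n + s / n)
    s += x[n]
print(x[100], 1 + fsum(1 / j for j in range(1, 101)))  # both about 6.18738

# Monte Carlo check of the game as described
def game():
    n, rolls = 100, 0
    while True:
        r = random.randint(0, n)  # roll 0-n
        rolls += 1
        if r == 0:
            return rolls
        n = r

print(sum(game() for _ in range(100_000)) / 100_000)   # about 6.19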
09-11-2010 , 01:57 AM
Can I just say how much I've enjoyed this thread for the past two years, despite not really getting any of the answers. I've come back to it countless times. Thanks jason.
09-11-2010 , 09:11 AM
Awesome, thanks
09-11-2010 , 12:21 PM
The approach to the problem seems similar to the problem of finding the expected number of rolls before you get all six numbers on a normal 6-sided die.
The answer is 1/1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6.
09-11-2010 , 01:39 PM
Quote:
Originally Posted by Enrique
The approach to the problem seems similar to the problem of finding the expected number of rolls before you get all six numbers on a normal 6-sided die.
The answer is 1/1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6.
Surely 6 times that, right?
09-11-2010 , 10:37 PM
Quote:
Originally Posted by thylacine
Surely 6 times that, right?
Oops, indeed. 6/6 + 6/5 + ... + 6/1.
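
(A quick check, exact and by simulation.)

Code:
import random
from math import fsum

print(fsum(6 / i for i in range(1, 7)))  # 6/6 + 6/5 + ... + 6/1 = 14.7

def rolls_to_see_all_six():
    seen, rolls = set(), 0
    while len(seen) < 6:
        seen.add(random.randint(1, 6))
        rolls += 1
    return rolls

print(sum(rolls_to_see_all_six() for _ in range(100_000)) / 100_000)  # about 14.7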
05-25-2011 , 04:46 PM
For all I know this has been raised in the thread before, but what the hey.

I'm told that by conservative estimates, there are at least 200bn stars in our galaxy, and at least 100bn galaxies in the universe. Assuming our galaxy is broadly typical, this means there are at least 2 x 10^22 stars in the universe. Let's say that only one star in a hundred thousand has a planet capable of supporting life. Let's say only one planet in a thousand capable of supporting life goes on to develop it. Let's say that only one planet in a thousand that develops life goes on to develop life-forms capable of broadcasting radio signals into outer space.

That leaves 2 x 10^11 species in the universe broadcasting radio signals into outer space.

Now consider the birthday problem.

When there are 23 people in the room, the probability that two of them share a birthday is >50%.
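
(A two-line check of that threshold.)

Code:
from math import prod

def p_shared_birthday(n, days=365):
    return 1 - prod((days - i) / days for i in range(n))

print(p_shared_birthday(22), p_shared_birthday(23))  # about 0.476 and 0.507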

What I'm wondering is whether there's any way to calculate a rough probability that somewhere, at some time, two species have made contact with one another. Looking at the 2 x 10^11 figure, and considering the implications of the birthday problem, intuitively it seems like a near certainty, but there are differences between this case and the birthday problem - clustering into separate galaxies seems like an obvious one, as well as the fact that time as well as space must be considered - Technological Society A can't make contact with Technological Society B if B emerges half a billion years after A goes extinct.

So I suppose a separate issue would be the probability that somewhere in the universe at least one species has received radio transmissions from another - whether the transmitting species still exists or not.

Looking at the 2 x 10^11 figure, it seems absurdly high, so I'm worried I've made some laughable mistake in my figgerin. But even if my figures are completely out of whack, does anyone have any estimates?

My suspicion is that it is indeed probable that two alien species have contacted each other, which makes the fact that we're very unlikely ever to make contact with aliens all the more depressing. Somewhere out there is a party - and we're not invited.