Ask a probabilist

10-14-2009 , 05:51 PM
Quote:
Originally Posted by Private Message
I'm trying to intuitively grasp what's going on in the [Borel-Cantelli] lemma at the moment but I'm having a bit of trouble and I was wondering whether you could provide some help
The Borel-Cantelli lemma states that if
∑_{j≥1} P(Aj) < ∞,
then
P(lim sup_{n→∞} An) = 0.
In this post, I will try to explain what this means. First of all,
lim sup_{n→∞} An = lim_{n→∞} sup_{j≥n} Aj.
Working from the inside out, we first need to understand what the sup of a collection of sets is.

Subsets of the sample space can be partially ordered by inclusion. The supremum of a collection of elements in a partially ordered set is its least upper bound. So
Bn = sup_{j≥n} Aj
should be an upper bound for all the sets Aj, j ≥ n. In other words, Bn is a set, and it should satisfy Aj ⊂ Bn for all j ≥ n.

Also, it should be the least upper bound. So it should be the smallest set with this property. From here, it is not hard to work out that
Bn = ∪_{j≥n} Aj.
So we have determined that
lim sup_{n→∞} An = lim_{n→∞} ∪_{j≥n} Aj.
Next, what does it mean to take a limit of a sequence of sets? Well, in general, this may not be a well-defined concept without further specification. But this is a special case, because the sequence Bn happens to be a monotone decreasing sequence. That is,
B1 ⊃ B2 ⊃ B3 ⊃ ...
That means that the limit is just the greatest lower bound (aka, the infimum). In exactly the same way as above, we can work out that the infimum is the intersection of all the Bn's. So in the end, we have
lim sup_{n→∞} An = ∩_{n≥1} ∪_{j≥n} Aj.
Finally, I will leave it as an exercise to prove that
∩_{n≥1} ∪_{j≥n} Aj = {ω : ω ∈ Aj for infinitely many j}.
In other words, this is the event that infinitely many of the Aj's occur. Another way to say this is that the Aj's happen "infinitely often". In probability, we abbreviate this with "i.o.", so that
lim sup_{n→∞} An = {An i.o.}.
In words, then, the Borel-Cantelli lemma says that if the probabilities of the Aj's are summable, then the probability that they happen infinitely often is 0.

----------

Incidentally, a collection of declarative sentences can be partially ordered by logical implication. In that case, the supremum of two sentences, A and B, is the sentence "A or B"; and the infimum is the sentence "A and B". This is exactly why, in probability, union means "or" and intersection means "and". If, in this context, {An} is a sequence of such sentences, then the limsup of this sequence is the assertion that infinitely many of the sentences A1, A2, ... are true.
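As a quick numerical illustration of the lemma (my own sketch, not part of the original post): with P(An) = 1/n² the probabilities are summable, so only finitely many events should occur along a sample path, while with the non-summable P(An) = 1/n the (independent) events keep occurring, by the second Borel-Cantelli lemma.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def count_occurrences(p, n_events):
    """Simulate independent events A_1, ..., A_N with P(A_n) = p(n),
    and count how many of them occur along one sample path."""
    return sum(random.random() < p(n) for n in range(1, n_events + 1))

# Summable case: sum of 1/n^2 converges, so by Borel-Cantelli only
# finitely many A_n occur, almost surely. The count stays small.
summable = [count_occurrences(lambda n: 1 / n**2, 100_000) for _ in range(5)]

# Non-summable case: sum of 1/n diverges. Since the events here are
# independent, the second Borel-Cantelli lemma says infinitely many
# occur, almost surely; the count grows like log(N).
divergent = [count_occurrences(lambda n: 1 / n, 100_000) for _ in range(5)]

print("P(An) = 1/n^2:", summable)
print("P(An) = 1/n:  ", divergent)
```

The first counts are dominated by the early events (∑ 1/n² ≈ 1.64), while the second keep accumulating as N grows.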
Ask a probabilist Quote
10-15-2009 , 01:54 AM
The PM was sent by me; I wasn't sure if people would find it interesting enough.

But thanks so much Jason, it's a lot clearer to me now. Thanks a lot for taking the time to write that down. I'm confident now that I can start writing a decent bachelor thesis.
Ask a probabilist Quote
10-20-2009 , 03:59 PM
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
Ask a probabilist Quote
10-20-2009 , 07:05 PM
Here is a statistical question which I thought about.

You know the height of all the people in population A.
You know that the average height of the people in population A is 1.500 meters.

Inside population A, you have 10000 people who eat at least one kiwi a day. Let's call this sub-population A(k). You know the height of all the people in A(k). The average height of A(k) is 1.540 meters.

Inside population A, you have 25 people who eat at least one orange a day. Let's call this sub-population A(o). You know the height of all the people in A(o). The average height of A(o) is 1.543 meters.


Now, you have a population B, whose heights are known and whose average is also 1.500 meters. If you were to take the sub-population B(k) or B(o) in order to maximize the average height, which one would you choose?



Thanks a lot.
Ask a probabilist Quote
10-20-2009 , 07:24 PM
Quote:
Originally Posted by mastertop101
Here is a statistical question which I thought about.

You know the height of all the people in population A.
You know that the average height of the people in population A is 1.500 meters.

Inside population A, you have 10000 people who eat at least one kiwi a day. Let's call this sub-population A(k). You know the height of all the people in A(k). The average height of A(k) is 1.540 meters.

Inside population A, you have 25 people who eat at least one orange a day. Let's call this sub-population A(o). You know the height of all the people in A(o). The average height of A(o) is 1.543 meters.


Now, you have a population B, whose heights are known and whose average is also 1.500 meters. If you were to take the sub-population B(k) or B(o) in order to maximize the average height, which one would you choose?



Thanks a lot.
I think you need to give the standard deviation of the height within those populations for him to be able to give a mathematical answer. If the variance is 0 (which I assume it isn't), then you'd obv take B(o).

Regardless, my intuition would suggest definitely taking B(k) to reliably maximise height, assuming some reasonable standard deviation of heights.
Ask a probabilist Quote
10-20-2009 , 07:32 PM
Quote:
Originally Posted by lastcardcharlie
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
wat.

EDIT: Well, obviously you can't flip a coin that lands in between heads and tails. So I suppose the answer to your question is yes. But you can get into some very philosophical arguments about mathematics right about now.
Ask a probabilist Quote
10-20-2009 , 08:40 PM
Quote:
Originally Posted by olliepa
I think you need to give the standard deviation of the height within those populations for him to be able to give a mathematical answer. If the variance is 0 (which I assume it isn't), then you'd obv take B(o).

Regardless, my intuition would suggest definitely taking B(k) to reliably maximise height, assuming some reasonable standard deviation of heights.
Thanks for your answer. The standard deviation can be arbitrary.
Ask a probabilist Quote
10-21-2009 , 01:01 AM
Quote:
Originally Posted by lastcardcharlie
Give him one of two envelopes, one containing twice the amount of money as the other, and ask him if he wants to swap.
Present him with Newcomb's Paradox IMO.

edit: Give him $5 and say that if you refuse it and I predicted that you refuse it, I put $50 in this card (hold out a sealed envelope w/ a card); if you accept it and I predicted that you accept it, you only get the $5 and I didn't put $50 in this card (hold out that same sealed envelope). Then, obviously don't put the $50 in the envelope since he'll accept the $5...then tell him that you knew he'd do that.

Last edited by durkadurka33; 10-21-2009 at 01:15 AM.
Ask a probabilist Quote
10-22-2009 , 08:33 AM
Quote:
Originally Posted by lastcardcharlie
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
I think it indicates how important it is to remember that all probabilities are conditional.

In your example, the sample space is Ω = {H, T}, and the probability measure is P(H) = P(T) = 0.5. From my perspective, the probability of heads is
P(H | Ω) = 0.5.
From yours, if you saw heads, it is
P(H | H) = 1.
And if you saw tails, it is
P(H | T) = 0.
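These three conditional probabilities can be checked mechanically from the definition P(A | B) = P(A ∩ B) / P(B). A small Python sketch (my own illustration, not from the post):

```python
def cond_prob(event, given, p):
    """P(event | given) = P(event ∩ given) / P(given) on a finite
    sample space, with p mapping each outcome to its probability."""
    num = sum(q for w, q in p.items() if w in event and w in given)
    den = sum(q for w, q in p.items() if w in given)
    return num / den

p = {"H": 0.5, "T": 0.5}  # the fair-coin measure on Omega = {H, T}
omega = {"H", "T"}

p_no_info = cond_prob({"H"}, omega, p)  # P(H | Omega): you saw nothing
p_saw_h = cond_prob({"H"}, {"H"}, p)    # P(H | H): I saw heads
p_saw_t = cond_prob({"H"}, {"T"}, p)    # P(H | T): I saw tails

print(p_no_info, p_saw_h, p_saw_t)  # 0.5 1.0 0.0
```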
Ask a probabilist Quote
10-22-2009 , 07:58 PM
What do you think about Bourbaki? I have heard the treatment of probability there is among the worst of all the topics covered.
Ask a probabilist Quote
10-24-2009 , 10:46 AM
Quote:
Originally Posted by mastertop101
Here is a statistical question which I thought about.
Many additional assumptions would need to be made before this could be a well-posed question. But here is some information which you may find interesting/helpful.

Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.

Let us suppose that X and Y are independent.

Then P(X < Y) = P(Y - X > 0). The random variable Y - X is Gaussian with mean 0.003 and variance n^(-1)·σ1² + m^(-1)·σ2². No matter what this variance happens to be, since the mean is positive, we have P(Y - X > 0) > 0.5. In other words, Y is more likely to be larger than X, and this does not depend on the relative magnitudes of n and m.

(The assumption that X and Y are independent may be dropped, and replaced with the assumption that X and Y are jointly Gaussian.)
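To make the computation concrete, here is a short Python sketch of P(Y - X > 0) for the numbers in the question; the common standard deviation of 0.1 m is my own assumption, since the post leaves σ1 and σ2 arbitrary.

```python
from math import erf, sqrt

def p_y_exceeds_x(mu_x, sigma1, n, mu_y, sigma2, m):
    """P(Y > X) for independent Gaussian sample means
    X ~ N(mu_x, sigma1^2 / n) and Y ~ N(mu_y, sigma2^2 / m).
    Y - X is N(mu_y - mu_x, sigma1^2/n + sigma2^2/m), so
    P(Y - X > 0) = Phi(mean / sd), with Phi the standard normal CDF."""
    mean = mu_y - mu_x
    sd = sqrt(sigma1**2 / n + sigma2**2 / m)
    return 0.5 * (1 + erf(mean / (sd * sqrt(2))))  # Phi via the error function

# The numbers from the question, with sigma1 = sigma2 = 0.1 m (my assumption):
p = p_y_exceeds_x(1.54, 0.1, 10_000, 1.543, 0.1, 25)
print(p)  # slightly above 0.5, as the positive mean guarantees
```

With only 25 oranges-eaters, the advantage is slim: the variance of Y dominates, so P(Y > X) sits just above one half.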
Ask a probabilist Quote
10-24-2009 , 11:13 AM
Quote:
Originally Posted by Max Raker
What do you think about Bourbaki? I have heard the treatment of probability there is among the worst of all the topics covered.
I know nothing of Bourbaki, but from what little I just read on Wikipedia, I think I am not missing much. It sounds as if they think there are no non-Radon probability measures worth discussing. Of course, that's completely ridiculous.
Ask a probabilist Quote
10-24-2009 , 12:45 PM
Quote:
Originally Posted by jason1990
Many additional assumptions would need to be made before this could be a well-posed question. But here is some information which you may find interesting/helpful.

Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.

Let us suppose that X and Y are independent.

Then P(X < Y) = P(Y - X > 0). The random variable Y - X is Gaussian with mean 0.003 and variance n^(-1)·σ1² + m^(-1)·σ2². No matter what this variance happens to be, since the mean is positive, we have P(Y - X > 0) > 0.5. In other words, Y is more likely to be larger than X, and this does not depend on the relative magnitudes of n and m.

(The assumption that X and Y are independent may be dropped, and replaced with the assumption that X and Y are jointly Gaussian.)
Interesting stuff, thank you. But I am quite surprised by your conclusion:
Suppose the following case: you want to know which player has the bigger true winrate: player A, whose winrate is 6 bb/100 hands over 250 hands, or player B, whose winrate is 5.5 bb/100 hands over 1,000,000 hands.
Surely, player B rates to have the bigger winrate.
How can you explain this?
By the way, you say that Y is more likely to be larger than X, but does that really necessarily mean that Y's average is more likely to be larger than X's average, considering that the data might not be normally distributed?
Ask a probabilist Quote
10-24-2009 , 02:14 PM
Quote:
Originally Posted by mastertop101
Suppose the following case: you want to know which player has the biggest true winrate: a player A whose winrate is 6bb/100hands over 250 hands and a player B whose winrate is 5.5bb/100hands over 1 000 000 hands. Surely, player B rates to have a bigger winrate. How can you explain this?
This can be modeled by a Bayesian analysis using a prior winrate distribution that puts the bulk of its mass much lower than 5.5. See here for details.
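For a rough sense of what such an analysis does, here is a minimal conjugate normal-normal sketch with invented numbers: a N(0, 3²) prior on the true winrate in bb/100 and an assumed per-100-hands standard deviation of 80 bb. Neither number comes from the post or the linked thread; they only illustrate why the million-hand sample dominates.

```python
def posterior_mean(prior_mu, prior_sd, obs_mean, obs_se):
    """Conjugate normal-normal update: posterior mean of the true
    winrate, given a N(prior_mu, prior_sd^2) prior and an observed
    sample mean with standard error obs_se."""
    w_prior = 1 / prior_sd**2
    w_data = 1 / obs_se**2
    return (w_prior * prior_mu + w_data * obs_mean) / (w_prior + w_data)

SD_PER_100 = 80.0  # assumed per-100-hands standard deviation, in bb

def stderr(hands):
    """Standard error of an observed winrate over a given number of hands."""
    return SD_PER_100 / (hands / 100) ** 0.5

a = posterior_mean(0, 3, 6.0, stderr(250))        # player A: 250 hands
b = posterior_mean(0, 3, 5.5, stderr(1_000_000))  # player B: 1M hands

print(round(a, 2), round(b, 2))  # A's 250 hands barely move the prior,
                                 # while the 1M-hand sample pulls B near 5.5
```

The huge sample gives player B a posterior mean close to the observed 5.5, while player A's tiny sample leaves his posterior near the prior mean of 0, matching the intuition that B rates to have the bigger true winrate.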

Quote:
Originally Posted by mastertop101
By the way, you say that Y is more likely to be larger than X, but does it really necessarily mean that Y's average is more likely to be larger than X's average
Y and X are the averages.

Quote:
Originally Posted by jason1990
Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.
Ask a probabilist Quote
10-25-2009 , 05:40 PM
Very good thread.
Ask a probabilist Quote
10-26-2009 , 12:46 AM
Quote:
Originally Posted by jason1990
This can be modeled by a Bayesian analysis using a prior winrate distribution that puts the bulk of its mass much lower than 5.5. See here for details.


Y and X are the averages.
Thank you so much, but shouldn't we have used this system for my initial problem? (My prior opinion was that eating kiwis or oranges would have no effect on average height, i.e. average height should be 1.500 meters.)
This should give the most reliable mean, right?
Ask a probabilist Quote
10-27-2009 , 08:59 AM
Quote:
Originally Posted by mastertop101
Thank you so much, but, shouldn't we have used this system for my initial problem?
Your original question was not well-posed.

Quote:
Originally Posted by mastertop101
This should give the most reliable mean, right?
Bayesian methods give answers that are only as reliable as our priors, just like logical arguments give conclusions that are only as reliable as our premises.
Ask a probabilist Quote
10-27-2009 , 01:10 PM
Jason, what's your current research topic? And how is your research going? (I'm sorry if this has been asked before.)
Ask a probabilist Quote
10-28-2009 , 09:27 PM
Quote:
Originally Posted by Styhn
Jason, what's your current research topic? And how is your research going? (I'm sorry if this has been asked before.)
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Ask a probabilist Quote
10-28-2009 , 09:59 PM
Quote:
Originally Posted by jason1990
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Interesting. Can the particles be stars?
Ask a probabilist Quote
10-28-2009 , 10:19 PM
Quote:
Originally Posted by jason1990
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Are you modeling classical distinguishable particles (which is usually higher energy if you are talking about things like atoms) or identical quantum mechanical particles (bosons/fermions)?
Ask a probabilist Quote
10-30-2009 , 09:18 AM
Quote:
Originally Posted by thylacine
Interesting. Can the particles be stars?
Good question. I have never thought about this. In principle, I suppose so, although I have never heard of anyone using the models in this way.
Ask a probabilist Quote
10-30-2009 , 09:26 AM
Quote:
Originally Posted by Max Raker
Are you modeling classical distinguishable particles (which is usually higher energy if you are talking about things like atoms) or identical quantum mechanical particles (bosons/fermions)?
To be clear, I am primarily a pure mathematician. The objects I am studying are stochastic processes (often Markov processes) that take values in certain metric or topological spaces. So they are just (very complicated) functions. My knowledge of physics is actually quite limited.

That said, I do not know of any applications in quantum theory for these models. The applications I am familiar with are all classical.
Ask a probabilist Quote
10-30-2009 , 11:57 AM
I have little to add. I just want to say that if I were to name "best thread ever in SMP" then this would be the one I named. It's awesome.
Ask a probabilist Quote
10-30-2009 , 02:02 PM
Quote:
Originally Posted by tame_deuces
I have little to add. I just want to say that if I were to name "best thread ever in SMP" then this would be the one I named. It's awesome.
+1
Ask a probabilist Quote

      