Ask a probabilist

10-14-2009 , 05:51 PM
Quote:
Originally Posted by Private Message
I'm trying to intuitively grasp what's going on in the [Borel-Cantelli] lemma at the moment but I'm having a bit of trouble and I was wondering whether you could provide some help
The Borel-Cantelli lemma states that if
∑_{j≥1} P(Aj) < ∞,
then
P(lim sup_{n→∞} An) = 0.
In this post, I will try to explain what this means. First of all,
lim sup_{n→∞} An = lim_{n→∞} sup_{j≥n} Aj.
Working from the inside out, we first need to understand what the sup of a collection of sets is.

Subsets of the sample space can be partially ordered by inclusion. The supremum of a collection of elements in a partially ordered set is its least upper bound. So
Bn = sup_{j≥n} Aj
should be an upper bound for all the sets Aj, j ≥ n. In other words, Bn is a set, and it should satisfy Aj ⊂ Bn for all j ≥ n.

Also, it should be the least upper bound. So it should be the smallest set with this property. From here, it is not hard to work out that
Bn = ∪_{j≥n} Aj.
So we have determined that
lim sup_{n→∞} An = lim_{n→∞} ∪_{j≥n} Aj.
Next, what does it mean to take a limit of a sequence of sets? Well, in general, this may not be a well-defined concept without further specification. But this is a special case, because the sequence Bn happens to be a monotone decreasing sequence. That is,
B1 ⊃ B2 ⊃ B3 ⊃ ...
That means that the limit is just the greatest lower bound (aka, the infimum). In exactly the same way as above, we can work out that the infimum is the intersection of all the Bn's. So in the end, we have
lim sup_{n→∞} An = ∩_{n≥1} ∪_{j≥n} Aj.
Finally, I will leave it as an exercise to prove that
∩_{n≥1} ∪_{j≥n} Aj = {ω : ω ∈ Aj for infinitely many j}.
In other words, this is the event that infinitely many of the Aj's occur. Another way to say this is that the Aj's happen "infinitely often". In probability, we abbreviate this with "i.o.", so that
lim sup_{n→∞} An = {An i.o.}.
In words, then, the Borel-Cantelli lemma says that if the probabilities of the Aj's are summable, then the probability that they happen infinitely often is 0.

----------

Incidentally, a collection of declarative sentences can be partially ordered by logical implication. In that case, the supremum of two sentences, A and B, is the sentence "A or B"; and the infimum is the sentence "A and B". This is exactly why, in probability, union means "or" and intersection means "and". If, in this context, {An} is a sequence of such sentences, then the limsup of this sequence is the assertion that infinitely many of the sentences A1, A2, ... are true.
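As a quick numerical illustration of the lemma (my own sketch, not part of the original post): with P(An) = 1/n² the probabilities are summable, so only finitely many events should occur along a sample path, while with the non-summable P(An) = 1/n the (independent) events keep occurring, by the second Borel-Cantelli lemma.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def count_occurrences(p, n_events):
    """Simulate independent events A_1, ..., A_N with P(A_n) = p(n),
    and count how many of them occur along one sample path."""
    return sum(random.random() < p(n) for n in range(1, n_events + 1))

# Summable case: sum of 1/n^2 converges, so by Borel-Cantelli only
# finitely many A_n occur, almost surely. The count stays small.
summable = [count_occurrences(lambda n: 1 / n**2, 100_000) for _ in range(5)]

# Non-summable case: sum of 1/n diverges. Since the events here are
# independent, the second Borel-Cantelli lemma says infinitely many
# occur, almost surely; the count grows like log(N).
divergent = [count_occurrences(lambda n: 1 / n, 100_000) for _ in range(5)]

print("P(An) = 1/n^2:", summable)
print("P(An) = 1/n:  ", divergent)
```

The first counts are dominated by the early events (∑ 1/n² ≈ 1.64), while the second keep accumulating as N grows.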
Ask a probabilist Quote
10-15-2009 , 01:54 AM
The PM was sent by me; I wasn't sure if people would find it interesting enough.

But thanks so much Jason, it's a lot clearer to me now. Thanks a lot for taking the time to write that down. I'm confident now that I can start writing a decent bachelor thesis.
Ask a probabilist Quote
10-20-2009 , 03:59 PM
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
Ask a probabilist Quote
10-20-2009 , 07:05 PM
Here is a statistical question which I thought about.

You know the height of all the people in population A.
You know that the average height of the people in population A is 1.500 meters.

Inside population A, you have 10000 people who eat at least one kiwi a day. Let's call this sub-population A(k). You know the height of all the people in A(k). The average height of A(k) is 1.540 meters.

Inside population A, you have 25 people who eat at least one orange a day. Let's call this sub-population A(o). You know the height of all the people in A(o). The average height of A(o) is 1.543 meters.


Now, you have a population B, whose heights are known and whose average is also 1.500 meters. If you were to take the sub-population B(k) or B(o) in order to maximize the average height, which one would you choose?



Thanks a lot.
Ask a probabilist Quote
10-20-2009 , 07:24 PM
Quote:
Originally Posted by mastertop101
Here is a statistical question which I thought about.

You know the height of all the people in population A.
You know that the average height of the people in population A is 1.500 meters.

Inside population A, you have 10000 people who eat at least one kiwi a day. Let's call this sub-population A(k). You know the height of all the people in A(k). The average height of A(k) is 1.540 meters.

Inside population A, you have 25 people who eat at least one orange a day. Let's call this sub-population A(o). You know the height of all the people in A(o). The average height of A(o) is 1.543 meters.


Now, you have a population B, whose heights are known and whose average is also 1.500 meters. If you were to take the sub-population B(k) or B(o) in order to maximize the average height, which one would you choose?



Thanks a lot.
I think you need to give the standard deviation of the height within those populations for him to be able to give a mathematical answer. If the variance is 0 (which I assume it isn't), then you'd obv take B(o).

Regardless, my intuition would suggest definitely taking B(k) to reliably maximise height, assuming some reasonable standard deviation of heights.
Ask a probabilist Quote
10-20-2009 , 07:32 PM
Quote:
Originally Posted by lastcardcharlie
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
wat.

EDIT: Well, obviously you can't flip a coin that lands in between heads and tails. So I suppose the answer to your question is yes. But you can get into some very philosophical arguments about mathematics right about now.
Ask a probabilist Quote
10-20-2009 , 08:40 PM
Quote:
Originally Posted by olliepa
I think you need to give the standard deviation of the height within those populations for him to be able to give a mathematical answer. If the variance is 0 (which I assume it isn't), then you'd obv take B(o).

Regardless, my intuition would suggest definitely taking B(k) to reliably maximise height, assuming some reasonable standard deviation of heights.
Thanks for your answer. The standard deviation can be arbitrary.
Ask a probabilist Quote
10-21-2009 , 01:01 AM
Quote:
Originally Posted by lastcardcharlie
Give him one of two envelopes, one containing twice the amount of money as the other, and ask him if he wants to swap.
Present him with Newcomb's Paradox IMO.

edit: Give him $5 and say that if you refuse it and I predicted that you refuse it, I put $50 in this card (hold out a sealed envelope w/ a card); if you accept it and I predicted that you accept it, you only get the $5 and I didn't put $50 in this card (hold out that same sealed envelope). Then, obviously don't put the $50 in the envelope since he'll accept the $5...then tell him that you knew he'd do that.

Last edited by durkadurka33; 10-21-2009 at 01:15 AM.
Ask a probabilist Quote
10-22-2009 , 08:33 AM
Quote:
Originally Posted by lastcardcharlie
Suppose I flip a coin and I can see the outcome and you can't. For me, P(H) = 0 or 1. For you, P(H) = 0.5. Does that indicate that probability is a measure not of the world as it is but of what we know about the world?
I think it indicates how important it is to remember that all probabilities are conditional.

In your example, the sample space is Ω = {H, T}, and the probability measure is P(H) = P(T) = 0.5. From my perspective, the probability of heads is
P(H | Ω) = 0.5.
From yours, if you saw heads, it is
P(H | H) = 1.
And if you saw tails, it is
P(H | T) = 0.
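These three conditional probabilities can be checked mechanically from the definition P(A | B) = P(A ∩ B) / P(B). A small Python sketch (my own illustration, not from the post):

```python
def cond_prob(event, given, p):
    """P(event | given) = P(event ∩ given) / P(given) on a finite
    sample space, with p mapping each outcome to its probability."""
    num = sum(q for w, q in p.items() if w in event and w in given)
    den = sum(q for w, q in p.items() if w in given)
    return num / den

p = {"H": 0.5, "T": 0.5}  # the fair-coin measure on Omega = {H, T}
omega = {"H", "T"}

p_no_info = cond_prob({"H"}, omega, p)  # P(H | Omega): you saw nothing
p_saw_h = cond_prob({"H"}, {"H"}, p)    # P(H | H): I saw heads
p_saw_t = cond_prob({"H"}, {"T"}, p)    # P(H | T): I saw tails

print(p_no_info, p_saw_h, p_saw_t)  # 0.5 1.0 0.0
```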
Ask a probabilist Quote
10-22-2009 , 07:58 PM
What do you think about Bourbaki? I have heard the treatment of probability there is among the worst of all the topics covered.
Ask a probabilist Quote
10-24-2009 , 10:46 AM
Quote:
Originally Posted by mastertop101
Here is a statistical question which I thought about.
Many additional assumptions would need to be made before this could be a well-posed question. But here is some information which you may find interesting/helpful.

Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.

Let us suppose that X and Y are independent.

Then P(X < Y) = P(Y - X > 0). The random variable Y - X is Gaussian with mean 0.003 and variance n^(-1)·σ1² + m^(-1)·σ2². No matter what this variance happens to be, since the mean is positive, we have P(Y - X > 0) > 0.5. In other words, Y is more likely to be larger than X, and this does not depend on the relative magnitudes of n and m.

(The assumption that X and Y are independent may be dropped, and replaced with the assumption that X and Y are jointly Gaussian.)
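To make the computation concrete, here is a short Python sketch of P(Y - X > 0) for the numbers in the question; the common standard deviation of 0.1 m is my own assumption, since the post leaves σ1 and σ2 arbitrary.

```python
from math import erf, sqrt

def p_y_exceeds_x(mu_x, sigma1, n, mu_y, sigma2, m):
    """P(Y > X) for independent Gaussian sample means
    X ~ N(mu_x, sigma1^2 / n) and Y ~ N(mu_y, sigma2^2 / m).
    Y - X is N(mu_y - mu_x, sigma1^2/n + sigma2^2/m), so
    P(Y - X > 0) = Phi(mean / sd), with Phi the standard normal CDF."""
    mean = mu_y - mu_x
    sd = sqrt(sigma1**2 / n + sigma2**2 / m)
    return 0.5 * (1 + erf(mean / (sd * sqrt(2))))  # Phi via the error function

# The numbers from the question, with sigma1 = sigma2 = 0.1 m (my assumption):
p = p_y_exceeds_x(1.54, 0.1, 10_000, 1.543, 0.1, 25)
print(p)  # slightly above 0.5, as the positive mean guarantees
```

With only 25 oranges-eaters, the advantage is slim: the variance of Y dominates, so P(Y > X) sits just above one half.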
Ask a probabilist Quote
10-24-2009 , 11:13 AM
Quote:
Originally Posted by Max Raker
What do you think about Bourbaki? I have heard the treatment of probability there is among the worst of all the topics covered.
I know nothing of Bourbaki, but from what little I just read on Wikipedia, I think I am not missing much. It sounds as if they think there are no non-Radon probability measures worth discussing. Of course, that's completely ridiculous.
Ask a probabilist Quote
10-24-2009 , 12:45 PM
Quote:
Originally Posted by jason1990
Many additional assumptions would need to be made before this could be a well-posed question. But here is some information which you may find interesting/helpful.

Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.

Let us suppose that X and Y are independent.

Then P(X < Y) = P(Y - X > 0). The random variable Y - X is Gaussian with mean 0.003 and variance n^(-1)·σ1² + m^(-1)·σ2². No matter what this variance happens to be, since the mean is positive, we have P(Y - X > 0) > 0.5. In other words, Y is more likely to be larger than X, and this does not depend on the relative magnitudes of n and m.

(The assumption that X and Y are independent may be dropped, and replaced with the assumption that X and Y are jointly Gaussian.)
Interesting stuff, thank you. But I am quite surprised by your conclusion:
Suppose the following case: you want to know which player has the bigger true winrate: player A, whose winrate is 6 bb/100 hands over 250 hands, or player B, whose winrate is 5.5 bb/100 hands over 1,000,000 hands.
Surely, player B rates to have the bigger winrate.
How can you explain this?
By the way, you say that Y is more likely to be larger than X, but does that really necessarily mean that Y's average is more likely to be larger than X's average, considering that the data might not be normally distributed?
Ask a probabilist Quote
10-24-2009 , 02:14 PM
Quote:
Originally Posted by mastertop101
Suppose the following case: you want to know which player has the biggest true winrate: a player A whose winrate is 6bb/100hands over 250 hands and a player B whose winrate is 5.5bb/100hands over 1 000 000 hands. Surely, player B rates to have a bigger winrate. How can you explain this?
This can be modeled by a Bayesian analysis using a prior winrate distribution that puts the bulk of its mass much lower than 5.5. See here for details.
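For a rough sense of what such an analysis does, here is a minimal conjugate normal-normal sketch with invented numbers: a N(0, 3²) prior on the true winrate in bb/100 and an assumed per-100-hands standard deviation of 80 bb. Neither number comes from the post or the linked thread; they only illustrate why the million-hand sample dominates.

```python
def posterior_mean(prior_mu, prior_sd, obs_mean, obs_se):
    """Conjugate normal-normal update: posterior mean of the true
    winrate, given a N(prior_mu, prior_sd^2) prior and an observed
    sample mean with standard error obs_se."""
    w_prior = 1 / prior_sd**2
    w_data = 1 / obs_se**2
    return (w_prior * prior_mu + w_data * obs_mean) / (w_prior + w_data)

SD_PER_100 = 80.0  # assumed per-100-hands standard deviation, in bb

def stderr(hands):
    """Standard error of an observed winrate over a given number of hands."""
    return SD_PER_100 / (hands / 100) ** 0.5

a = posterior_mean(0, 3, 6.0, stderr(250))        # player A: 250 hands
b = posterior_mean(0, 3, 5.5, stderr(1_000_000))  # player B: 1M hands

print(round(a, 2), round(b, 2))  # A's 250 hands barely move the prior,
                                 # while the 1M-hand sample pulls B near 5.5
```

The huge sample gives player B a posterior mean close to the observed 5.5, while player A's tiny sample leaves his posterior near the prior mean of 0, matching the intuition that B rates to have the bigger true winrate.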

Quote:
Originally Posted by mastertop101
By the way, you say that Y is more likely to be larger than X, but does it really necessarily mean that Y's average is more likely to be larger than X's average
Y and X are the averages.

Quote:
Originally Posted by jason1990
Suppose I take n independent Gaussians with mean 1.54 and standard deviation σ1. Then their average, call it X, would also be Gaussian with mean 1.54 and standard deviation n^(-1/2)·σ1.

Suppose I then take m independent Gaussians with mean 1.543 and standard deviation σ2. Then their average, call it Y, would also be Gaussian with mean 1.543 and standard deviation m^(-1/2)·σ2.
Ask a probabilist Quote
10-25-2009 , 05:40 PM
Very good thread.
Ask a probabilist Quote
10-26-2009 , 12:46 AM
Quote:
Originally Posted by jason1990
This can be modeled by a Bayesian analysis using a prior winrate distribution that puts the bulk of its mass much lower than 5.5. See here for details.


Y and X are the averages.
Thank you so much, but shouldn't we have used this system for my initial problem? (My prior opinion was that eating kiwis or oranges would have no effect on average height, i.e. average height should be 1.500 meters.)
This should give the most reliable mean, right?
Ask a probabilist Quote
10-27-2009 , 08:59 AM
Quote:
Originally Posted by mastertop101
Thank you so much, but, shouldn't we have used this system for my initial problem?
Your original question was not well-posed.

Quote:
Originally Posted by mastertop101
This should give the most reliable mean, right?
Bayesian methods give answers that are only as reliable as our priors, just like logical arguments give conclusions that are only as reliable as our premises.
Ask a probabilist Quote
10-27-2009 , 01:10 PM
Jason, what's your current research topic? And how is your research going? (I'm sorry if this has been asked before.)
Ask a probabilist Quote
10-28-2009 , 09:27 PM
Quote:
Originally Posted by Styhn
Jason, what's your current research topic? And how is your research going? (I'm sorry if this has been asked before.)
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Ask a probabilist Quote
10-28-2009 , 09:59 PM
Quote:
Originally Posted by jason1990
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Interesting. Can the particles be stars?
Ask a probabilist Quote
10-28-2009 , 10:19 PM
Quote:
Originally Posted by jason1990
I am currently working on models of interacting particle systems. For example, what does the behavior of the system as a whole and/or the behavior of individual particles look like when the total number of particles is very large? Is there a law of large numbers and a central limit theorem for these systems? If so, what do these theorems look like, and how can we use them to model these particle systems? These are some of the questions I am working on. These projects are relatively new for me, so they are just in the early stages.
Are you modeling classical distinguishable particles (which is usually higher energy if you are talking about things like atoms) or identical quantum mechanical particles (bosons/fermions)?
Ask a probabilist Quote
10-30-2009 , 09:18 AM
Quote:
Originally Posted by thylacine
Interesting. Can the particles be stars?
Good question. I have never thought about this. In principle, I suppose so, although I have never heard of anyone using the models in this way.
Ask a probabilist Quote
10-30-2009 , 09:26 AM
Quote:
Originally Posted by Max Raker
Are you modeling classical distinguishable particles (which is usually higher energy if you are talking about things like atoms) or identical quantum mechanical particles (bosons/fermions)?
To be clear, I am primarily a pure mathematician. The objects I am studying are stochastic processes (often Markov processes) that take values in certain metric or topological spaces. So they are just (very complicated) functions. My knowledge of physics is actually quite limited.

That said, I do not know of any applications in quantum theory for these models. The applications I am familiar with are all classical.
Ask a probabilist Quote
10-30-2009 , 11:57 AM
I have little to add. I just want to say that if I were to name "best thread ever in SMP" then this would be the one I named. It's awesome.
Ask a probabilist Quote
10-30-2009 , 02:02 PM
Quote:
Originally Posted by tame_deuces
I have little to add. I just want to say that if I were to name "best thread ever in SMP" then this would be the one I named. It's awesome.
+1
Ask a probabilist Quote

      