Quote:
Originally Posted by Bonecrusher Smith
why do we divide by the d.o.f. in the sample standard deviation
This is actually two questions. The first is: why do we divide by n - 1 in the sample variance? The second is: why is n - 1 the number of degrees of freedom for the residuals, X_i - X¯? I do not believe these questions have much to do with one another. I will address them here, but if your background in probability theory is weak, the answers will be difficult to understand.
Strictly speaking, variance is a property of a random variable, not of a data set. A random variable is a function from a probability space to the reals. When we speak of a data set as a "population," we usually mean that the data set itself is the sample space, and the probability measure on the sample space is typically taken to be the uniform measure. The identity function on this probability space is then a random variable which represents a randomly chosen element from this "population." If we calculate the variance of this random variable, then we must divide by n, because that is the definition of variance. This is what is meant by the "population variance."
When we speak of a data set as a "sample," we usually mean that we are modeling this data set as a particular realization of a finite sequence of independent random variables, each one described as above. In this case, when we talk about the sample "variance," it is not really a variance at all. It is an estimate of the population variance. If we divide by n, we get one possible estimate. If we divide by n - 1, we get another possible estimate. We often choose to divide by n - 1, because then the resulting estimate is "unbiased."
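The unbiasedness claim can be checked numerically. Here is a small simulation (my own illustration, not part of the original post, using NumPy) that repeatedly draws small samples from a distribution whose true variance is known, and averages the divide-by-n and divide-by-(n - 1) estimates. The divide-by-n estimate systematically undershoots the true variance; the divide-by-(n - 1) estimate centers on it.

```python
import numpy as np

# Compare the divide-by-n and divide-by-(n-1) variance estimates.
# We draw many samples of size n from N(0, 2^2), so true_var = 4.0.
rng = np.random.default_rng(0)
true_var = 4.0
n, trials = 5, 200_000  # small n makes the bias easy to see

samples = rng.normal(0.0, 2.0, size=(trials, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar) ** 2).sum(axis=1)  # sum of squared residuals

biased = (ss / n).mean()          # averages near true_var * (n-1)/n = 3.2
unbiased = (ss / (n - 1)).mean()  # averages near true_var = 4.0

print(biased, unbiased)
```

The factor (n - 1)/n in the biased estimate's expectation is exactly what dividing by n - 1 instead of n corrects for.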
Now consider degrees of freedom. Suppose I have a sample of size n. Then my data set is a particular realization of n independent random variables, X_1, ..., X_n. I then define the sample mean to be
X¯ = (X_1 + ... + X_n)/n.
Note that this is also a random variable. The residuals are X_i - X¯, which are also random variables. Random variables are functions, and real-valued functions form a vector space, so we can talk about the dimension of the space spanned by these residuals, which is at most n, since there are n of them.
These residuals are linearly dependent, since they add up to the 0 random variable. So they span a space of dimension at most n - 1. In most cases, in fact, they span a space of dimension exactly n - 1, and this is what we mean when we say there are n - 1 degrees of freedom.
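Both facts are easy to see numerically. The sketch below (my own illustration, not part of the original post) generates many realizations of X_1, ..., X_n, forms the residuals for each, and checks that (a) every residual vector sums to zero, and (b) the residual vectors collectively span a space of dimension n - 1, not n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
draws = rng.normal(size=(100, n))  # 100 realizations of X_1, ..., X_n

# Residuals: subtract each row's sample mean from that row.
residuals = draws - draws.mean(axis=1, keepdims=True)

sums_to_zero = np.allclose(residuals.sum(axis=1), 0.0)  # True
rank = np.linalg.matrix_rank(residuals)                 # n - 1 = 5

print(sums_to_zero, rank)
```

The single linear constraint (the residuals add up to 0) removes exactly one dimension, which is the "n - 1 degrees of freedom."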
Quote:
Originally Posted by Bonecrusher Smith
I understand that d.o.f. is the number of independent quantities that can vary. The argument in most textbooks seems to be that since the sample mean is known, only n-1 of the quantities can vary, so we divide the sum of the squared deviations by the d.o.f.
This is a (hand-wavy) explanation for why there are n - 1 degrees of freedom. But it does not explain why we should divide by n - 1 in the sample variance.
Quote:
Originally Posted by Bonecrusher Smith
It seems to me that this explanation would also apply to the population. If we knew n-1 values, and the population mean, shouldn't we be dividing by the d.o.f. too?
As mentioned above, when we speak of a data set as a "population," we are thinking of the data set as being the sample space itself. It is then just a collection of numbers, and not a sequence of random variables. The real line has dimension 1, so the linear span of any set of real numbers has dimension either 0 or 1. Strictly speaking, it is therefore not correct to say that a population of size n has n - 1 degrees of freedom.
Quote:
Originally Posted by Bonecrusher Smith
As a sub question, why do we need to divide by the d.o.f. at all? It is my understanding that in calculating the variance, we are getting an average squared deviation. To get an average don't we need to divide by n?
Correct. By the definition of variance, we must divide by n. As mentioned above, when we calculate the so-called sample "variance," we are not actually calculating a variance. We are calculating an estimate of the population variance.
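The distinction shows up directly in NumPy's `np.var`, whose `ddof` ("delta degrees of freedom") parameter selects the divisor n - ddof. This short example (mine, not part of the original post) computes both quantities on the same data:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean is 5.0
n = len(data)

pop_var = np.var(data, ddof=0)     # divide by n: the population variance
sample_var = np.var(data, ddof=1)  # divide by n - 1: the unbiased estimate

print(pop_var, sample_var)  # 4.0 and 32/7 ≈ 4.5714
```

Treating the data as the whole population, its variance is pop_var; treating the same numbers as a sample from a larger population, sample_var is the usual unbiased estimate of that population's variance.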