Quote:
Originally Posted by TomCowley
What gives you the right to claim all the symmetries to use in indifference? They really seem plucked right out of thin air in this case. If there's a synopsis somewhere that justifies this while excluding legal configuration indifference, please link it- it'll save both of us some time.
The question in the OP is whether the 2-ball problem is well-posed, in the context of objective Bayesianism. I have tried to explain that context and I think you understand it quite well. You simply do not agree with it. That is fine, of course, but it is quite irrelevant to the question in the OP.
If you really want the full details, you will not get them from some linked synopsis. You will have to read quite a bit. To point you in the right direction: what I am describing here is the philosophy of objective Bayesianism, as developed on the basis of the Cox-Pólya-Jaynes desiderata. Read at least the first two chapters of
this book. The indifference principle is derived in Chapter 2, and problems similar to the ones in this thread are considered in Chapter 6. The derivation of the indifference principle is based on Desideratum (IIIc), which says,
"Equivalent states of knowledge must be always represented by equivalent probability assignments. That is, if in two problems our state of knowledge is the same (except perhaps for the labeling of the propositions), then we must assign the same probabilities in both."
Quote:
Originally Posted by TomCowley
Ok, so let's just define the problem as "J = We have an urn with 3 balls numbered 1-3. The urn either contains WBB, WWB, BWB, BWW, or BBW."
All right. We begin the proof of well-posedness by first observing that J is logically equivalent to J1 & J, where J1 = "We have an urn with 3 balls numbered 1-3. The urn either contains WBW, WBB, WWB, BWB, BWW, or BBW." The set of configurations in J1 is invariant under permutations of the ball labels, so, as before, we may apply the indifference principle to J1, and then combine the two information sets using the laws of probability. QED
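For concreteness, the two steps can be worked out numerically (a minimal sketch; the configuration sets are the ones quoted above, and the variable names are mine):

```python
from fractions import Fraction

# J1: all 3-ball configurations with at least one white and one black
# ball.  This set is invariant under permutations of the ball labels
# 1-3, so the indifference principle assigns each configuration 1/6.
J1 = ["WBW", "WBB", "WWB", "BWB", "BWW", "BBW"]
prior = {c: Fraction(1, 6) for c in J1}

# J removes WBW.  Conditioning on J is an ordinary application of the
# laws of probability: restrict to J and renormalize.
J = ["WBB", "WWB", "BWB", "BWW", "BBW"]
Z = sum(prior[c] for c in J)
posterior = {c: prior[c] / Z for c in J}

for c in J:
    print(c, posterior[c])  # each of the 5 configurations gets 1/5
```

The point of the sketch is that the second step is not an extra appeal to indifference; it is just renormalization.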
This proof is 100% logically correct. You seem to want a proof that uses only the indifference principle, as though the laws of probability were out of bounds, or somehow invalid. That is, of course, ridiculous from an objective Bayesian standpoint. The indifference principle and the laws of probability are both derived from the same desiderata, and no objective Bayesian believes that all probabilities can be calculated from the indifference principle alone.
Quote:
Originally Posted by TomCowley
This looks like a straw man
This example is actually very germane to the topic. One of the applications of this subject is in machine learning. For example, imagine there is an urn with 2500 balls, some white and some black. A machine is going to sample from the urn with replacement, and use those samples to try to learn about the actual contents of the urn. The machine uses Bayes' theorem to update its estimates after each sample. To get started, the machine needs a prior. The prior should be an
uninformative prior, since the machine initially has no information about the contents of the urn.
The independence assumption is virtually useless in Bayesian machine learning because it produces a prior on the proportion of white balls that is very nearly a delta function. In informal language, the machine would start off so strongly convinced that half the balls are white that it would take an enormous number of samples to convince it otherwise. Even if its first 200 samples were all white, its estimate of the proportion of white balls would only increase to about 54%.
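The 54% figure is easy to check numerically (a sketch under the stated assumptions: 2500 balls, each independently white with probability 1/2, and 200 white draws with replacement; the computation is done in log space to avoid underflow):

```python
from math import lgamma, log, exp

N = 2500   # balls in the urn
k = 200    # consecutive white samples observed (with replacement)

def log_weight(n):
    # log of prior(n) * likelihood(n): the independence assumption puts
    # a Binomial(N, 1/2) prior on the number of white balls n, and k
    # white draws with replacement have likelihood (n/N)^k.
    log_prior = lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1) + N * log(0.5)
    return log_prior + k * log(n / N)

ns = range(1, N + 1)           # n = 0 has zero likelihood
lw = [log_weight(n) for n in ns]
m = max(lw)
w = [exp(x - m) for x in lw]   # shift by the max before exponentiating
Z = sum(w)
mean_frac = sum((n / N) * wi for n, wi in zip(ns, w)) / Z
print(f"posterior mean fraction white: {mean_frac:.3f}")  # about 0.54
```

The prior has standard deviation of only about 25 balls around 1250, which is why 200 straight white samples barely move it.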
If it samples without replacement, the situation is even worse. In that case, it cannot learn anything at all about the unsampled balls. In the above Jaynes book, this is called the binomial monkey prior, and is discussed in Section 6.7. It is nearly the exact opposite of what an uninformative prior is supposed to be. No serious Bayesian would ever use it in an application.
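The without-replacement pathology can be verified in a toy case (a sketch of my own, not from the book: 3 balls with the independent fair-coin prior, one ball sampled and observed white):

```python
from itertools import product
from fractions import Fraction

# Independence assumption: each of 3 balls is white with probability
# 1/2, so all 8 color configurations are equally likely a priori.
configs = list(product("WB", repeat=3))
prior = {c: Fraction(1, 8) for c in configs}

# Sample ball 0 without replacement and observe that it is white.
consistent = [c for c in configs if c[0] == "W"]
Z = sum(prior[c] for c in consistent)
post = {c: prior[c] / Z for c in consistent}

# Probability that the unsampled ball 1 is white: unchanged at 1/2,
# because under the independent prior the observation carries no
# information about the other balls.
p_ball1_white = sum(p for c, p in post.items() if c[1] == "W")
print(p_ball1_white)  # 1/2
```

However many balls are drawn, the posterior on the undrawn balls stays exactly at the prior, which is what "cannot learn anything at all" means here.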
In practice, the most commonly used prior in this situation is the one that is uniform on the number of white balls. In the 2-ball example, that would give P(BB) = P(WW) = 1/3 and P(BW) = P(WB) = 1/6. My question is whether this, or any other distribution, can be derived objectively (i.e. from the desiderata, or if you cannot be bothered to read Jaynes, from the principles and methods I have described in this thread).
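For concreteness, here is the uniform-on-count construction written out for the 2-ball case (a minimal sketch; the resulting distribution is the one stated above):

```python
from fractions import Fraction
from math import comb

# Prior uniform on the number of white balls k in {0, 1, 2}: each k
# gets mass 1/3, split equally among the C(2, k) configurations with
# that many white balls.
n = 2
configs = ["BB", "BW", "WB", "WW"]
prior = {c: Fraction(1, n + 1) / comb(n, c.count("W")) for c in configs}

for c in configs:
    print(c, prior[c])  # BB 1/3, BW 1/6, WB 1/6, WW 1/3
```

Note that this is exactly the distribution one gets by first applying indifference to the count and then splitting each count's mass by indifference over configurations; the open question in the post is whether such a derivation is licensed by the desiderata.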
Incidentally, if we were permitted to permute things other than labels, then we could simply permute the number of white balls to derive P(BB) = 1/3. In fact, I believe this illustrates one reason why we should not be permitted to permute things other than labels. Labels do not carry information. They are simply bookkeeping devices that index our hypotheses. By restricting ourselves to label permutations, I believe we entirely avoid situations where one kind of permutation leads to one distribution, while another permutation leads to something different.
Last edited by jason1990; 01-29-2009 at 11:10 AM.