Open Side Menu Go to the Top
Register
three weird probabilities three weird probabilities

11-10-2017 , 01:40 AM
The idea for this thread comes from a recent SMP thread. I thank that forum for letting me "borrow" the idea.

Let me break a large problem into three smaller problems.

(1) Given two vectors V1 and V2 of length N1 and N2, respectively, each consisting of unique integers from 1 to T (note that uniqueness only applies to each individual vector, there may be overlap between the two vectors) chosen at random.

What is the probability that the union of V1 and V2 contain all of the integers from 1 to T?

I believe that this is straightforward to derive. I have derived what I think is the correct formula. But I will not post anything at this time.

(2) Given three vectors V1, V2, and V3 of length N1, N2, and N3, respectively, each consisting of unique integers from 1 to T (note that uniqueness only applies to each individual vector, there may be overlap between the three vectors) chosen at random.

What is the probability that the union of V1, V2, and V3 contain all of the integers from 1 to T?

(3) Same problem as (2) except that we now further require that no union of any pair of vectors contain all of the integers from 1 to T. That is, V1 & V2 do not cover T; V1 & V3 do not cover T; and V2 & V3 do not cover T.

Perhaps there are people who may find this problem(s) interesting.
three weird probabilities Quote
11-10-2017 , 05:53 AM
(1) easy solutions first: p=0 if N1 + N2 < T
p=1 if max(N1, N2) = T

for the problem to make sense, this inequality needs to hold:
max(N1, N2) < T <= N1+N2

suppose for example N1=7, N2=4, T=9. first you pick min(N1,N2) = 4 elements from vector T, all will be unique. that leaves 5 unique elements inside. second pick will have 7 elements, you'll need to take 5 unique and 2 already contained.

in this case p = (9-4)C(9-4) * 4C(7- (9-4)) /(9C7) = 1/6
or in general p = min(N1,N2)C(max(N1,N2) - (T-min(N1,N2))) / (TC(max(N1, N2)))

edit: looks like that was the easy one maybe I'll look into 2) sometimes when I have time

Last edited by md46135; 11-10-2017 at 06:04 AM.
three weird probabilities Quote
11-10-2017 , 07:54 AM
for the three vector case;

As md46135 stated, max(N1, N2, N3) < T <= N1+N2+N3.

So we have three vectors, and we want to fill them with T integers.

Fill V1 randomly, there will now be T-N1 integers left to fill into V2 and V3.


So something along the lines of;

we need to find T-N1 unique elements in V2 and V3. There is a (N1+N3)/T (wrong!!) chance a randomly selected element in V2 will be globally unique and a (N1+N2)/T(wrong!!) chance an element in V3 will be globally unique.

Perhaps something like,

Sum i=0 to (T-N1)

Prob = prob+ (N2 choose i)*(binomial stuff) + (N3 choose (T-N1-i))*(binomial stuff)

Last edited by akkopower1; 11-10-2017 at 08:09 AM.
three weird probabilities Quote
11-10-2017 , 08:20 AM
Still need to figure out the probability that a randomly selected element in N2 is not in N1 or N3. That would just be, (T-number of unique elements in N1 U N3)/T.

or, (T-N1+number of elements in V3 that arent in V1)/T

(T-N1)/T or (1-N1/T) is the probability a randomly selected element in V3 isnt in V1

So, on average there would be (1-N1/T)*N3 elements in V3 that arent in V1.

So the probability that a randomly selected integer in V2 is not in V1 or V3 would be;

(T-N1+(1-N1/T)*N3)/T

which becomes,

(T-N1)*(T+N3)/T^2
three weird probabilities Quote
11-10-2017 , 09:06 AM
Last couple of lines were wrong,

So the probability that a randomly selected integer in V2 is not in V1 or V3 would be;

(T-N1-(1-N1/T)*N3)/T

which becomes,

(T-N1)*(T-N3)/T^2
three weird probabilities Quote
11-10-2017 , 12:23 PM
2) the condition for the range of T is as akko wrote.
suppose we pick N1 numbers first, all will be unique. after that we pick N2 with k unique items and N2-k items that were already included in the first pick. this alone would be written as:

sum(k=0, k=N2) binomial(T-N1, k)*binomial(N1, N2-k). from this arise 2 conditions for binomial coeffs to be valid: N2-k<=N1 and k<=T-N1. if those 2 aren't fulfilled, you don't sum.

next up is the 3rd choice of N3 items. T-k-N1 of these are unique and they all must be picked. of already picked items (which there are N1+k), we can pick N3-T+k+N1.

so the equation now looks like this:
sum(k=0, k=N2) binomial(T-N1, k)*binomial(N1, N2-k)*binomial(N1+k, N3-T+k+N1)

third binomial factor leads to condition N3-T+k+N1 <= N1+k, or just N3<=T which isn't really a problem since it's covered by the initial basic condition.

now the denominator: in the equation we're choosing 2 times - N2 items out of T and then N3. so denominator is binomial(T,N2)*binomial(T,N3)


and the final equation is
p = sum(k=0, k=N2) binomial(T-N1, k)*binomial(N1, N2-k)*binomial(N1+k, N3-T+k+N1) / (binomial(T,N2)*binomial(T,N3))

with the conditions for existence of binomial coeffs. I ran a simulation and this actually works.
three weird probabilities Quote
11-10-2017 , 05:07 PM
I have only skimmed the above posts so that I can post my own derivations without being accused of copying.

Without loss of generality, I assume T>N1>=N2>=N3>0.

Quote:
(1) Given two vectors V1 and V2 of length N1 and N2, respectively, each consisting of unique integers from 1 to T (note that uniqueness only applies to each individual vector, there may be overlap between the two vectors) chosen at random.

What is the probability that the union of V1 and V2 contain all of the integers from 1 to T?
Choose any of the V1 vectors. How many of the possible V2 vectors will fill in all the missing holes to achieve total coverage of 1 to T?

Clearly there are T-N1 holes to be filled. So T-N1 slots of V2 are spoken for.

This means that there are N2-(T-N1) slots available for duplication. And these available slots can take on T-(T-N1) values.

So, doing a little bit of algebra, this implies that this Prob:

= C(N1,N1+N2-T) / C(T,N2)

Quote:
(2) Given three vectors V1, V2, and V3 of length N1, N2, and N3, respectively, each consisting of unique integers from 1 to T (note that uniqueness only applies to each individual vector, there may be overlap between the three vectors) chosen at random.

What is the probability that the union of V1, V2, and V3 contain all of the integers from 1 to T?
This can be derived by considering choosing [V1,V2] at random, and then add [V3] using the formula derived above.

The number of duplicated elements in U[V1,V2] = D can take values 0,1,...,N2.

It is clear that the probability of two randomly chosen V1 and V2 vectors having D duplicated elements is:

P(D) = C(N1,D)*C(T-N1,N2-D) / C(T,N2)

Now let U[V1,V2] be denoted as U. And further let UU be the unique elements of U. Clearly, the number of elements of UU is N1+N2-D.

Using the same formula derived above, we see that the prob of having UU and V3 cover T to be:

= C(N1+N2-D,N1+N2-D+N3-T) / C(T,N3)

Combining these two formulas, that is, weighting these probs by the prob of each D, yields the overall prob:

= Sum[D=0,...,N2] C(N1,D)*C(T-N1,N2-D)/C(T,N2) * C(N1+N2-D,N1+N2+N3-T-D)/C(T,N3)

Quote:
(3) Same problem as (2) except that we now further require that no union of any pair of vectors contain all of the integers from 1 to T. That is, V1 & V2 do not cover T; V1 & V3 do not cover T; and V2 & V3 do not cover T.
I believe that I have correctly derived this formula, but I am not 100% sure. My method is to start with the prob derived in (2) and multiply it by the prob of having no pairwise coverage.

Of course, the pairwise coverage cases are as follows:

1. [V1,V2] only cover
2. [V1,V3] only cover
3. [V2,V3] only cover
4. [V1,V2] and [V1,V3] only cover
5. [V1,V2] and [V2,V3] only cover
6. [V1,V3] and [V2,V3] only cover
7. [V1,V2] and [V1,V3] and [V2,V3] cover

We want to find the probability of the converse case, of course.

The prob for each of the individual pairwise cases was derived in (1) above. I believe that independence can be used to show the prob of the "twin" pairwise cases is simply the product of the two individual pairwise probs. However, I believe that the "triple" pairwise case is not independent of the other cases.

I believe that the prob of no pairwise cases is equal to the following:

= 1 - [(1-P[(V1,V2) do not cover])*(1-P[(V1,V3) do not cover])*(1-P(V2,V3) do not cover]) + P[(V1,V2) cover]*P[(V1,V3) cover]*P[(V2,V3) cover]]

Call the above PNPC (for Prob of No Pairs Cover).

Then, the prob for (3) is:

= PNPC * Prob for (2) derived above.
three weird probabilities Quote

      
m