lottery probability - Gambling and Probability

Two Plus Two Forums Other Topics Probability

lottery probability

Page 1 of 5

04-21-2012 , 01:29 PM

agentbartowski

stranger

Join Date: Apr 2012 Posts: 6

In the 6/55 lottery, what is the probability that a random draw will give a combination that has 2 numbers from 1 to 18, 2 numbers from 19 to 36 and another 2 numbers from 37 to 55.

04-21-2012 , 01:45 PM

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by agentbartowski

In the 6/55 lottery, what is the probability that a random draw will give a combination that has 2 numbers from 1 to 18, 2 numbers from 19 to 36 and another 2 numbers from 37 to 55.

Oops, disregard. See next post.

C(18,2)*C(18,2)*19 / C(55,6)

=~ 1.53%

Last edited by BruceZ; 04-21-2012 at 02:29 PM.

04-21-2012 , 01:49 PM

agentbartowski

stranger

Join Date: Apr 2012 Posts: 6

Quote:

Originally Posted by BruceZ

C(18,2)*C(18,2)*19 / C(55,6)

=~ 1.53%

why is it just "*19" on the third multiplication term? isn't it supposed to be *C(19,2)?

04-21-2012 , 02:27 PM

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by agentbartowski

why is it just "*19" on the third multiplication term? isn't it supposed to be *C(19,2)?

Oh yeah, sorry.

Trying to do 2 things at once.

C(18,2)*C(18,2)*C(19,2) / C(55,6)

=~ 13.8%
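A quick sanity check on the corrected figure, sketched in R (the language used later in this thread): evaluate the formula directly and compare it with a simulated draw of 6 numbers from 55 without replacement. The zone boundaries 1-18, 19-36 and 37-55 come from the question; the variable names, the seed and the simulation size are arbitrary choices.

```r
# Exact: 2 numbers from each zone (18, 18 and 19 numbers), out of all C(55,6) draws
p_exact <- choose(18, 2) * choose(18, 2) * choose(19, 2) / choose(55, 6)

# Monte Carlo check: draw 6 of 55 without replacement and count the numbers in each zone
set.seed(1)
two_per_zone <- replicate(100000, {
  draw <- sample(55, 6)
  zone <- cut(draw, breaks = c(0, 18, 36, 55), labels = FALSE)  # zone index 1, 2 or 3
  all(tabulate(zone, nbins = 3) == 2)
})
p_sim <- mean(two_per_zone)

c(exact = p_exact, simulated = p_sim)
```

With 100,000 simulated draws the simulated value should sit well within a percentage point of the exact 13.8%, but it will not match it digit for digit.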

04-23-2012 , 08:22 AM

zemlik

newbie

Join Date: Jan 2012 Posts: 24

how do you work out the probability of 3 consecutive numbers occurring in 6/49 lottery ?

04-23-2012 , 03:36 PM

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by zemlik

how do you work out the probability of 3 consecutive numbers occurring in 6/49 lottery ?

It's trickier than it looks.

[C(46,3) + 46*C(45,3) - C(44,2)] / C(49,6)

= 666974/13983816 =~ 4.769614%

If you're not careful, you will double count the cases where there are 2 sets of 3 consecutive numbers, or count multiple times the cases where there are 4,5, or 6 consecutive numbers. The trick is to count the number of ways to choose the FIRST 3 consecutive numbers, and then count the ways to choose the remaining numbers.

Say the numbers go from 1-49. There are 47 numbers on which the 3 consecutive can start since they can't start on 48 or 49. If the first 3 consecutive start at 1, then there are C(46,3) ways to choose the other 3. If the first 3 consecutive start on one of the other 46 numbers, the remaining numbers can't contain the number immediately prior to these 3 to make 4 consecutive because then there would be 3 consecutive that occur first. So the other numbers must be chosen from 45 possible numbers. There are C(45,3) ways to choose 3 numbers; however, some of these will still contain a separate group of 3 consecutive numbers that occur first. So we have to subtract these cases. But the number of such cases will depend on what number the 3 consecutive that we want to be first will start.

If they start at 1, 2, 3, or 4, then there are no ways that 3 consecutive can occur first. Remember, when the first 3 consecutive start at 4, we have already excluded 123, which is the first part of the trick. If they start at 5, there is 1 way (123). If they start at 6, there are 2 ways (123, 234), and in general, if they start at n, up to n = 47, there are n-4 ways that 3 consecutive can occur first. So the number of cases that we need to subtract is

1 + 2 + 3 + ... + 43 = 43*44/2 = C(44,2).

So we subtract this term, and divide everything by C(49,6) total ways to choose the numbers to get the probability.
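The same formula evaluated term by term in R, as a plain arithmetic check of the argument above (the variable names are only labels for the three terms, not anything from the thread):

```r
# Term-by-term evaluation for 6 numbers out of 49
first_run_at_1  <- choose(46, 3)        # run is 1,2,3; any 3 of the other 46 numbers
first_run_later <- 46 * choose(45, 3)   # run starts at 2..47; the number just before it is excluded
overcount       <- sum(1:43)            # cases where a separate run of 3 comes first
stopifnot(overcount == choose(44, 2))   # 1 + 2 + ... + 43 = 43*44/2

count <- first_run_at_1 + first_run_later - overcount
count                                   # 666974
count / choose(49, 6)
```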

I hadn't considered this problem before. It is trickier than counting the number of ways to make a straight with 7 cards because of the possibility of making 2 separate "straights". I wrote a quick R program to verify the above calculation by enumerating every possible set of 6 numbers and counting the ones that contain 3 consecutive. It counted exactly 666974 just as we computed above.

Code:

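count = 0
# i < j < k < m < n < o are the 6 numbers on one ticket, in ascending order
for (i in 1:44) {
  for (j in (i+1):45) {
    for (k in (j+1):46) {
      for (m in (k+1):47) {
        for (n in (m+1):48) {
          for (o in (n+1):49) {
            # two numbers 2 places apart differ by exactly 2 only when
            # they and the number between them are consecutive
            if ((k-i == 2) || (m-j == 2) || (n-k == 2) || (o-m == 2)) {
              count = count + 1
            }
          }
        }
      }
    }
  }
}

count
count/choose(49,6)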

Output:

> count
[1] 666974
> count/choose(49,6)
[1] 0.04769614

Last edited by BruceZ; 04-23-2012 at 04:01 PM. Reason: 666974, not 999674
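As an independent cross-check using a different technique (a standard gap or "stars and bars" count, not the method above), one can count the complement: tickets with no 3 consecutive numbers. The 43 unchosen numbers leave 44 gaps, and the chosen numbers that fall in the same gap are consecutive, so a ticket avoids 3 consecutive exactly when every gap holds at most 2 chosen numbers. Choosing j gaps that hold a pair and 6 - 2j gaps that hold a single number gives the sum below; the variable names are illustrative.

```r
n <- 49
k <- 6
gaps <- n - k + 1                     # 44 gaps around the 43 unchosen numbers
j <- 0:floor(k / 2)                   # j = number of gaps holding 2 adjacent chosen numbers
no_run3 <- sum(choose(gaps, j) * choose(gaps - j, k - 2 * j))
with_run3 <- choose(n, k) - no_run3
with_run3                             # 666974
with_run3 / choose(n, k)
```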

04-24-2012 , 10:02 AM

zemlik

newbie

Join Date: Jan 2012 Posts: 24

Quote:

Originally Posted by BruceZ

> count
[1] 666974
> count/choose(49,6)
[1] 0.04769614

thanks Brucez, I'm going to have to go over that a few times, and "R" looks like an interesting tip.
I have wondered if the "lucky dip" tickets that one gets in the UK might be biased to generate likely losing numbers, as I often seem to get tickets with 3 consecutive numbers, and I imagine these have less chance of winning?
zem

04-24-2012 , 02:01 PM

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by zemlik

thanks Brucez, I'm going to have to go over that a few times, and "R" looks like an interesting tip.

R is great, and I use it a lot, but for this type of problem you're probably better off using almost anything else. It took me 7 minutes just to run through the 14 million possible tickets. They tell you that for loops are slow in R, and that to use it optimally you have to take a "whole object view" by operating on whole vectors and matrices and the like. So I wrote the extremely short program below to compute this probability. That's the whole program - those 2 statements. It uses no for loops. Now it takes over half an hour! I think I'll use Sage from now on unless someone can show me a faster way to do this. Are you reading this Sherman?

Code:

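# TRUE if the sorted 6-number ticket x contains 3 consecutive numbers:
# elements 3:6 of the difference are x[3]-x[1], x[4]-x[2], x[5]-x[3] and x[6]-x[4]
consec = function(x) {
  any((c(x,0,0) - c(0,0,x))[3:6] == 2)
}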

sum(apply(combn(1:49,6), 2, consec)) / choose(49,6)

This code uses the combn function to generate all possible tickets, placing each one in a column of a 2-D array. That function takes a lot of time, much longer than my for loops which were already slow. Then I treat each ticket as a vector, and subtract the vector from a version of itself that is shifted by 2. Then any 3 consecutive will appear as a difference of 2 for some element 3 to 6 of the result vector.
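If the per-ticket apply() call is part of the cost, the same shift-and-subtract test can be run on the whole combn() matrix at once, with no per-ticket function calls. This is a sketch that has not been timed here (count_three_consec is a hypothetical name): it still builds the full combination matrix, so for 6/49 it needs on the order of a gigabyte of memory at peak, and whether it beats the nested loops will depend on the machine.

```r
# Hypothetical helper: count k-number tickets from 1..n that contain 3 consecutive numbers,
# applying the shift-and-subtract test to the whole combn() matrix at once (no apply()).
count_three_consec <- function(n, k) {
  m <- combn(n, k)                       # one sorted ticket per column
  # row i minus row i-2 equals 2 exactly when numbers i-2, i-1 and i are consecutive
  hits <- (m[3:k, , drop = FALSE] - m[1:(k - 2), , drop = FALSE]) == 2
  sum(colSums(hits) > 0)                 # tickets with at least one such triple
}

count_three_consec(20, 6)
```

For 6 numbers out of 20 this gives 10095, which matches the counting argument from earlier in the thread adapted to 20 numbers: C(17,3) + 17*C(16,3) - C(15,2).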

Quote:

I have wondered if the "lucky dip" tickets that one gets in the UK might be biased to generate likely losing numbers, as I often seem to get tickets with 3 consecutive numbers, and I imagine these have less chance of winning?

Your tickets have 3 consecutive, or the winning numbers have 3 consecutive? As long as the winning numbers are drawn randomly, any combination of 6 numbers is as likely as any other to win a prize.

If there is a prize for matching say 3 numbers, and you play multiple tickets, then the probability of winning a prize will depend on how you choose your tickets, but the EV will always be the same for a given number of tickets because the times that you have a smaller probability of winning will be exactly offset by the times that you win on more than 1 ticket.
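The point about probability versus EV can be checked by brute force on a small hypothetical lottery (3 numbers drawn from 1..10, a prize for matching 2 or more, every prize assumed to be worth the same). The sketch enumerates every draw and compares two ticket pairs, one overlapping and one disjoint; all names are illustrative.

```r
# Hypothetical toy lottery: 3 numbers drawn from 1..10, prize for matching 2 or more.
draws <- combn(10, 3)                          # all 120 equally likely draws, one per column

# For one ticket: does it win a prize on each possible draw?
wins_on_each_draw <- function(ticket) {
  apply(draws, 2, function(d) sum(ticket %in% d) >= 2)
}

summarise_tickets <- function(tickets) {
  wins <- sapply(tickets, wins_on_each_draw)   # 120 x (number of tickets) logical matrix
  c(p_any_prize     = mean(rowSums(wins) > 0), # chance of winning at least one prize
    expected_prizes = mean(rowSums(wins)))     # average number of winning tickets per draw
}

overlapping <- list(c(1, 2, 3), c(1, 2, 4))
disjoint    <- list(c(1, 2, 3), c(4, 5, 6))
res <- rbind(overlapping = summarise_tickets(overlapping),
             disjoint    = summarise_tickets(disjoint))
res
```

Both pairs average the same number of prizes per draw (44/120), but the overlapping pair wins at least one prize less often (34/120 versus 44/120), because when it wins it tends to win on both tickets at once.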

Last edited by BruceZ; 04-24-2012 at 02:35 PM.

04-24-2012 , 02:47 PM

Sherman

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 7,766

Quote:

Originally Posted by BruceZ

Yes. I am reading this. I'm not sure if I can come up with a faster way to do it or not though. Largely because I don't understand the general problem that is trying to be answered. But looking at your code Bruce, usually going the apply route is way better than using loops (I used to write everything in loops until I learned how to use the apply family of functions and then I had to go back and rewrite a bunch of things). I'm not sure what to say about the slowness of this particular function though.

The consec function you build isn't clear to me. Why are you selecting elements 3:6 from the resultant vector and then checking to see if any of them is equal to 2? Again, this probably reflects my lack of understanding of the problem.

Is it just your way of checking to see if there are consecutive numbers next to each other? If so there is a function called "rle" which might be useful for you here. I'm not sure if it will speed things up for you or not though.

04-24-2012 , 03:06 PM

#10

zemlik

newbie

Join Date: Jan 2012 Posts: 24

Without wanting to extend this unduly, let me try to be clear.
The UK lottery picks 6 random numbers out of 49 (plus a bonus ball, which we can ignore).
When you buy a ticket at a kiosk or online, you can select a "lucky dip", where the machine selects your numbers for you.
I noticed that recently I, more often than not, got 3 consecutive numbers among the 6 numbers on my ticket.
I wondered if the draw was less likely to have 3 consecutive numbers than not.
As they sell more tickets when the prize rolls over, it occurred to me that, if that was true, there might be a bias in the "lucky dip" to select consecutive numbers.
When the Irish Lotto started, the 3 guys who won it by buying loads of tickets certainly didn't buy 1,2,3,4,5,6, for example, as far as I know.

04-24-2012 , 03:24 PM

#11

Sherman

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 7,766

Quote:

Originally Posted by zemlik

Of course it is less likely for a randomly drawn set of 6 out of 49 numbers (with replacement) to share 3 numbers than to "not share" 3 numbers. But that doesn't have any effect on your probability of winning. The probability of any combination of numbers, shared or unshared, is the same as any other combination of numbers.

To make this clear:

What is the probability of the numbers coming:

1, 1, 1, 1, 1, 1?

Answer = 1/49^6

What is the probability of the numbers coming:

1, 1, 1, 17, 35, 22?

Answer = 1/49^6

What is the probability of the numbers coming:

8, 14, 3, 29, 17, 12?

Answer = 1/49^6

Any combination of numbers is equally likely so long as the drawing is random. So the machine isn't screwing you. Sorry.

04-24-2012 , 03:45 PM

#12

zemlik

newbie

Join Date: Jan 2012 Posts: 24

yes I know that any combination is equally likely, and I can toss a coin and get 20 heads in a row, put it in my pocket, go on holiday, toss it a week later, and it would still be 50/50. But if, "say", balls 2, 3, 4 and 5 are already out, the chance that the other two are going to be 6 and 7 is less than for any other combination of numbers, it seems to me.

04-24-2012 , 04:03 PM

#13

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by zemlik

It better not be less or the drawing is unfair. The probability of every combination is exactly the same.

You should get 3 consecutive less than 1 time in 20, both on your random ticket, and in the random drawing. If you get 3 consecutive "more often than not", then something is wrong.
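To see what "less than 1 time in 20" looks like in practice, here is a sketch that simulates random tickets and counts how often they contain 3 consecutive numbers. It assumes a lucky dip behaves like sample(49, 6), i.e. uniform without replacement; that is the assumption under discussion, not a known property of the UK machines. The seed and the number of tickets are arbitrary.

```r
set.seed(2012)
# TRUE if a ticket contains 3 consecutive numbers (sort first: sample() returns them unordered)
has_three_consec <- function(ticket) any(diff(sort(ticket), lag = 2) == 2)

tickets <- replicate(100000, sample(49, 6))   # 6 x 100000 matrix, one random ticket per column
freq <- mean(apply(tickets, 2, has_three_consec))
freq   # should be near 666974 / 13983816, i.e. a bit under 0.048
```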

04-24-2012 , 04:21 PM

#14

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by Sherman

Yes. I am reading this. I'm not sure if I can come up with a faster way to do it or not though. Largely because I don't understand the general problem that is trying to be answered.

We want the probability of 3 consecutive numbers when 6 numbers are drawn out of 49. The draw is without replacement, so you can't have duplicate numbers.

Quote:

The consec function you build isn't clear to me. Why are you selecting elements 3:6 from the resultant vector and then checking to see if any of them is equal to 2? Again, this probably reflects my lack of understanding of the problem.

A 2 in positions 3:6 after the shift and subtract means that there were 3 consecutive numbers in the original ticket. The numbers in the original ticket are in ascending order, so a difference of 2 means that the nth and n+2nd number differed by 2 for some n, so the nth, n+1st, and n+2nd numbers were consecutive. For example:

> c(1,3,4,5,7,9,0,0) - c(0,0,1,3,4,5,7,9)
[1] 1 3 3 2 3 4 -7 -9

The 2 is there because of the 3,4,5. The 0's are added to the first one to make the arrays the same length. I tried not adding the 0's to the first one to see if it would be faster, but it is much slower, and it generates warnings. I suppressed the warnings, but it was still much slower.
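For what it's worth, base R's diff() with lag = 2 computes the same four differences directly, without padding with zeros. A small sketch using the example ticket above (whether it is any faster was not measured here):

```r
x <- c(1, 3, 4, 5, 7, 9)
shifted <- (c(x, 0, 0) - c(0, 0, x))[3:6]   # shift-and-subtract, keeping elements 3:6
lagged  <- diff(x, lag = 2)                 # x[3]-x[1], x[4]-x[2], x[5]-x[3], x[6]-x[4]
identical(shifted, lagged)                  # TRUE
any(lagged == 2)                            # TRUE, because of 3, 4, 5
```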

Quote:

Is it just your way of checking to see if there are consecutive numbers next to each other? If so there is a function called "rle" which might be useful for you here. I'm not sure if it will speed things up for you or not though.

That counts runs of the same number. We want 3 consecutive numbers. I could subtract a single shifted version and look for runs of 1's of length 2 or more in the result, but I doubt that would be faster than what I have. The combn function is taking most of the time. It's supposed to be a fast version of it too.

I used to wonder how they expect you to get around the main for loop in a Monte Carlo sim. I didn't think they would expect you to allocate an array the size of your sim until I realized that a for loop from 1:n actually allocates an array of size n. If you make n too big, it complains that you are out of memory. You can make nested loops of size n1 and n2, and then it only allocates n1+n2, not n1*n2 even though it does n1*n2 iterations.

Last edited by BruceZ; 04-24-2012 at 05:21 PM.

04-24-2012 , 04:25 PM

#15

zemlik

newbie

Join Date: Jan 2012 Posts: 24

Quote:

Originally Posted by BruceZ

I noticed it a few months ago, when I got 3 consecutive numbers on 5 days in a row, which made me aware of the phenomenon. I noticed I am "always" given 2 consecutive numbers among the 6, "more often than not" 3 consecutive numbers, and the other week 4 consecutive numbers.
I don't know if others in the UK have noticed this phenomenon?

04-24-2012 , 06:07 PM

#16

Sherman

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 7,766

Quote:

Originally Posted by BruceZ

Not enough time to read everything here very carefully, but I see what you mean about rle not applying here. What about diff()? The diff function computes the difference between consecutive numbers in an array and returns the differences. You could combine that result with rle() looking for run lengths of 2 zeros which would indicate that three consecutive numbers were equal. I'm not sure if that would help speed things up here or not because you basically need to first create a matrix of every possible combination of orders.
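Sherman's diff()/rle() suggestion, adapted as BruceZ describes (look for runs of 1s in the differences, since consecutive numbers differ by 1, not 0), could look like the sketch below. has_run_rle is a hypothetical name. A side benefit is that the run length r becomes a parameter; on the other hand it makes two function calls per ticket, so it should not be expected to speed things up.

```r
# Sketch of the diff()/rle() idea: r consecutive numbers show up as r - 1 differences of 1 in a row.
has_run_rle <- function(x, r = 3) {
  gaps <- rle(diff(sort(x)))
  any(gaps$values == 1 & gaps$lengths >= r - 1)
}

has_run_rle(c(1, 3, 4, 5, 7, 9))        # TRUE: 3, 4, 5
has_run_rle(c(1, 3, 5, 7, 9, 11))       # FALSE
has_run_rle(c(2, 3, 4, 5, 20, 30), 4)   # TRUE: 2, 3, 4, 5
```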

04-25-2012 , 12:10 AM

#17

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by Sherman

I'd have to pick out the length corresponding to 1 somehow, and it isn't always in the same place in the data frame. The longest run is listed first, and that doesn't always correspond to 1. Just those 2 functions rle and diff seem to take much longer than what I had. diff serves a similar purpose to my shift and subtract, but rle is more involved than the any function. But even if I do absolutely nothing in that function, it still takes longer than the for loops by almost a factor of 3. The combn function alone is taking up over half the time. The only way to make it faster is to find a way to do it without creating the matrix of all combinations using the combn function. Of course the fastest way to do it is to just solve the problem analytically like I did. Still, this doesn't seem to be something that R is good at, and the idea that using apply is faster than using for loops doesn't seem to always be true. Certainly using apply to execute a function repeatedly is faster than calling the same function in for loops, but using the combn function with apply is not as fast as the for loops without the combn function. The combn function has to work for any inputs, while my loops are set up for a specific number (6 loops for 6 numbers), so that isn't a fair comparison. The question is whether there is a faster way to process these specific inputs without using loops. I doubt it.

The combn function is just written in R, not compiled, and contains a while loop which I haven't found to be any more efficient than for loops; in fact I've measured them as being slower. There should be a compiled or fast hand optimized routine for combn. Perhaps I'll try to write one in assembly language. I'll bet I could do this in seconds rather than minutes. If I'm going to play with the lotto covering problem, I'll need something like this anyway. Plus it gives me an excuse to build a high speed multi-core multi-processor machine. This problem is highly parallelizable and could probably be done in a fraction of a second on such a machine. I'm used to using processors with hardware do loops and pipelining that could generate almost 1 combination per clock cycle per core.

EDIT: I found a document detailing how to write a C++ routine for this very purpose! http://www.biostatisticien.eu/textes/rc0408.pdf under "A C++ Combn Function". They complain about how slow the R version is too and say that theirs is much faster.

Last edited by BruceZ; 04-25-2012 at 08:04 AM.

04-25-2012 , 11:10 AM

#18

Sherman

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 7,766

Quote:

Originally Posted by BruceZ

This is where I was going next if you were still having trouble, but it seems you already have it. I almost wrote that the problem is the combn() function in my first post, but I thought it better to try some other solutions to see if they would help first.

I see the while loop in the combn function (now that I actually looked at it) and that is almost certainly the problem. Loops in R are just bad and need to be avoided whenever possible. But of course, as you pointed out above there are times where loops are necessary and work faster (e.g. resampling and simulation studies).

It would be nice if someone were to build the C++ version you now have into R itself so that combn would run faster. Unfortunately I don't know anything about how to even go about doing that.

Last edited by BruceZ; 04-25-2012 at 06:41 PM.

04-25-2012 , 12:10 PM

#19

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

You can often do better than a generic comb() function by making your own. As a dumb example, if you are choosing 2 from 3 you know that whatever your list is, you'll want this combination of elements:
01 02 12
so you can pre-store that as a list or matrix or whatever is convenient in R (I don't know R, but its programming paradigm seems similar to MATLAB, where you want to operate on vectors or matrices and not do loops)

Then your comb function just needs to apply the indices to the list iteratively (or jointly if it's supported, which it probably is in R)

In your case it may actually be that the combn function is getting called multiple times - if that's the case you'd be way better off (probably) calling it once yourself, storing the result in a variable, and using that.
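RustyBrooks's suggestion translates to R fairly directly: compute the index matrix once with combn() and then subscript each new list with it. A sketch (apply_pattern is a hypothetical helper name):

```r
# Work out the index pattern once (here: all ways to pick 2 positions out of 3) ...
idx <- combn(3, 2)                     # columns (1,2), (1,3), (2,3)

# ... then reuse it on any length-3 vector with plain subscripting, no further combn() calls.
apply_pattern <- function(v, idx) matrix(v[idx], nrow = nrow(idx))

apply_pattern(c(10, 20, 30), idx)      # columns: 10 20 | 10 30 | 20 30
apply_pattern(c("a", "b", "c"), idx)
```

In BruceZ's scripts combn() is already called only once, so this mainly pays off when the same pattern is applied to many different lists.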

12-14-2012 , 12:59 PM

#20

Mirapep

enthusiast

Join Date: Dec 2012 Posts: 53

Hello BruceZ, I ask for your kind help. If the combinations are C(n,k) and I want at least x consecutive numbers, with k-x > x, how do I calculate the combinations to subtract? For example, C(20,10) with at least three consecutive is C(17,7) + 17*C(16,7) - ????? Thanks for your help, and forgive my grammar; I'm Italian and I live in Rome.

12-15-2012 , 03:24 PM

#21

Mirapep

enthusiast

Join Date: Dec 2012 Posts: 53

Help me

12-21-2012 , 06:07 AM

#22

Mirapep

enthusiast

Join Date: Dec 2012 Posts: 53

Please

12-21-2012 , 07:57 AM

#23

BruceZ

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,877

Quote:

Originally Posted by Mirapep

You can use the R script below to count all of them for 3 consecutive. For C(20,10) it is about 87.0%. If you want to do more than 3 consecutive, the consec function would have to be modified, but that's easy. For a large number of numbers like 49, this can take a long time to run, and it would be better to use separate loops as I showed earlier in this thread. It could also be simulated easily.

Counting the number to subtract looks very difficult. With C(20,10) you can have 3 groups of 3 consecutive. The position of the 3 of them changes the number of possible positions for other groups. Why do you need this?

R is free to download and only takes a minute to install.

Code:

n = 20
k = 10

# TRUE if the sorted ticket x contains 3 consecutive numbers; checks every
# pair x[i], x[i+2] for i = 1..k-2, not just the first 6 numbers
consec = function(x) {
  any((c(x,0,0) - c(0,0,x))[3:length(x)] == 2)
}

combs = combn(n, k)
sum(apply(combs, 2, consec))/choose(n,k)

Output:

[1] 0.8697309

Last edited by BruceZ; 12-21-2012 at 08:05 AM.

12-21-2012 , 11:51 AM

#24

Mirapep

enthusiast

Join Date: Dec 2012 Posts: 53

Hello BruceZ,
thanks for your kind reply. I need to solve this problem for an application in money management. There is an Excel spreadsheet, which can be downloaded via Google, for betting that is also used for trading: its name is multimasaniello. In a nutshell, you set the initial bankroll you are willing to risk, enter the odds of each bet, then decide how many events you need to win, and the program calculates the overall return and the size of each progressive bet. Now, to increase the yield, I want to add a constraint, for example that k or more consecutive winning bets are not allowed.
To solve the problem of the number to subtract, I had thought of a recursive formula for the further groups of consecutive numbers that occur earlier. Do you think it's possible to get this?
Thank you again,
Giuseppe

12-21-2012 , 11:53 AM

#25

Mirapep

enthusiast

Join Date: Dec 2012 Posts: 53

Hello BruceZ,
thanks for your kind reply. I need to solve this problem for an application in money management. There is an Excel spreadsheet, which can be downloaded via Google, for betting that is also used for trading: its name is multimasaniello. In a nutshell, you set the initial bankroll you are willing to risk, enter the odds of each bet, then decide how many events you need to win, and the program calculates the overall return and the size of each progressive bet. Now, to increase the yield, I want to add a constraint, for example that k or more consecutive winning bets are not allowed.
To solve the problem of the number to subtract, I had thought of a recursive formula for the further groups of consecutive numbers that occur earlier. Do you think it's possible to get this?
Thank you again,
Giuseppe
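A recursive formula does exist if, instead of subtracting over-counted groups, one counts the complement directly. The sketch below is not from this thread: it is a standard dynamic-programming recurrence, written with hypothetical function names count_no_run() and count_with_run(). A ticket is treated as a 0/1 string of length n (1 = number chosen), and f(m, s), the number of strings of length m with s ones and no run of r ones, is split on the length j (0 to r-1) of the final block of ones: f(m, s) = sum over j of f(m - j - 1, s - j), plus 1 when the whole string is m < r ones.

```r
# Hypothetical helper: number of k-subsets of 1..n with NO run of r or more consecutive numbers.
count_no_run <- function(n, k, r) {
  f <- matrix(0, nrow = n + 1, ncol = k + 1)    # f[m + 1, s + 1] stores f(m, s)
  f[1, 1] <- 1                                  # empty string: length 0, nothing chosen
  for (m in seq_len(n)) {
    for (s in 0:k) {
      total <- 0
      for (j in 0:min(r - 1, m, s)) {           # j = length of the trailing run of chosen numbers
        if (j < m) {
          total <- total + f[m - j, s - j + 1]  # position m - j is unchosen: add f(m - j - 1, s - j)
        } else if (s == m) {
          total <- total + 1                    # all m numbers chosen, allowed because m < r
        }
      }
      f[m + 1, s + 1] <- total
    }
  }
  f[n + 1, k + 1]
}

# Tickets with AT LEAST r consecutive numbers = all tickets minus those with none.
count_with_run <- function(n, k, r) choose(n, k) - count_no_run(n, k, r)

count_with_run(49, 6, 3)                  # compare with 666974 earlier in the thread
count_with_run(20, 10, 3) / choose(20, 10)
```

The run length r is just a parameter, so "at least 4 consecutive" is count_with_run(n, k, 4). As a sanity check, with r = 2 the "no run" count reduces to the familiar C(n - k + 1, k).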
