The Riddler

03-04-2016 , 12:03 PM
This is from a weekly column on FiveThirtyEight called The Riddler. How would one go about solving this?
Two players go on a hot new game show called “Higher Number Wins.” The two go into separate booths, and each presses a button, and a random number between zero and one appears on a screen. (At this point, neither knows the other’s number, but they do know the numbers are chosen from a standard uniform distribution.) They can choose to keep that first number, or to press the button again to discard the first number and get a second random number, which they must keep. Then, they come out of their booths and see the final number for each player on the wall. The lavish grand prize — a case full of gold bullion — is awarded to the player who kept the higher number.

Which number is the optimal cutoff for players to discard their first number and choose another? Put another way, within which range should they choose to keep the first number, and within which range should they reject it and try their luck with a second number?
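
For anyone who wants to experiment before reading further, here is a minimal Python sketch of the game as stated. The function names and the 200,000-trial count are arbitrary illustrative choices, not anything from the column.
Code:
import random

def final_number(cutoff):
    # Keep the first draw if it is at least `cutoff`, otherwise discard it
    # and keep a fresh uniform draw.
    first = random.random()
    return first if first >= cutoff else random.random()

def p1_win_rate(cutoff1, cutoff2, trials=200_000):
    # Monte Carlo estimate of the chance that player 1's kept number
    # beats player 2's (ties have probability zero).
    wins = sum(final_number(cutoff1) > final_number(cutoff2) for _ in range(trials))
    return wins / trials

print(p1_win_rate(0.5, 0.5))   # identical strategies, so about 0.5
print(p1_win_rate(0.5, 0.6))   # try any pair of cutoffs you like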
03-04-2016 , 01:04 PM
based only on intuition and logic i'd say you have to keep the number if it's > (or maybe =) 0.5. if you get 0.49, you have a better chance of drawing a better number on the second try than a worse one, so you should discard 0.49. if you get 0.51, the second try is more likely to give you a worse number, so you would keep it. i don't know if this is correct, but it makes sense to me.

0.5 is kind of 'whatever', since the chance for a bigger number is the same as for a lower one.
03-04-2016 , 01:45 PM
The other guy gets to replace also. Can he beat your strategy?
03-04-2016 , 02:20 PM
if what i posted is GTO solution for this problem, opponent has to play exactly the same and it would be EV=0 for both players. i think it can also be EV=0 for both players if they play exactly the same way non-GTO, for example both change when it's lower than 0.6.

but if two players play differently, one has to have an edge. let's say that one player changes the number when it's lower than 0.6. he should then have a bigger quantity of low numbers on a large sample than the guy playing GTO and he would be EV-. GTO strategy would be EV+ against that player, but there would exist a strategy that's even better than GTO against that player, to exploit him. that strategy would have a limit lower than 0.5 for us keeping the number we get.
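
To put a number on that kind of exploitation, one rough approach is a grid search by simulation: fix the opponent's cutoff at 0.6 (the example above) and try every hero cutoff. This is only a sketch; the grid, seed, and trial count are arbitrary choices.
Code:
import numpy as np

rng = np.random.default_rng(0)

def kept(cutoff, trials):
    # Final numbers for a player who redraws anything below `cutoff`.
    first = rng.random(trials)
    return np.where(first >= cutoff, first, rng.random(trials))

def win_rate(hero_cutoff, villain_cutoff, trials=200_000):
    return float(np.mean(kept(hero_cutoff, trials) > kept(villain_cutoff, trials)))

villain = 0.6                               # the example cutoff from the post
hero_grid = np.linspace(0, 1, 101)
rates = [win_rate(h, villain) for h in hero_grid]
best = hero_grid[int(np.argmax(rates))]
print(f"most profitable reply to {villain}: ~{best:.2f}, win rate {max(rates):.4f}")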
03-04-2016 , 03:50 PM
Quote:
Originally Posted by md46135
if what i posted is GTO solution for this problem, opponent has to play exactly the same and it would be EV=0 for both players. i think it can also be EV=0 for both players if they play exactly the same way non-GTO, for example both change when it's lower than 0.6.

but if two players play differently, one has to have an edge. let's say that one player changes the number when it's lower than 0.6. he should then have a bigger quantity of low numbers on a large sample than the guy playing GTO and he would be EV-. GTO strategy would be EV+ against that player, but there would exist a strategy that's even better than GTO against that player, to exploit him. that strategy would have a limit lower than 0.5 for us keeping the number we get.
Nobody is doubting your "if-then" logic. But simply saying that x=0.5 is the best strategy is highly doubtful.

Consider if you knew the other player's cutoff = Y. Would you still choose your own cutoff X = 0.5? If your cutoff is X and your opponent's cutoff is Y, what is the probability that you'll win?

Anyway, I will post my answer in the next post.
03-04-2016 , 04:03 PM
Spoiler:
I worked through all the cases ({(discard, discard), (discard, keep), (keep, discard), and (keep, keep)}) using simple probability. In each case you need to calculate the prob of the case occurring and the prob that you win.

By doing so, if you knew your opponent's cutoff is Y and your cutoff is X, it is straightforward to calculate your probability of winning P[(X,Y)] using simple calculus. Then it is a bit more simple calculus to determine your best-response cutoff X*(Y) that maximizes your probability of winning. It takes a bit of care, but it is pretty easy to do. Of course, this gives your probability of winning P[(X*(Y),Y)].

Finally, your opponent can now counter-respond by changing his cutoff to Y*(X*(Yo)) where the Yo denotes his initial cutoff. Of course, this response - counter-response dance continues until an equilibrium is reached which is easy to calculate.

I am hesitant to post my answer for two reasons: (i) I don't want to stifle anyone else from working on this neat problem; and (ii) I may have made a silly math error which would be totally embarrassing (lol). Of course, I have double-checked my answer so I am quite confident that it is correct.

I will leave it at that for now.

Last edited by whosnext; 03-04-2016 at 04:58 PM.
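
Here is a Python sketch of that best-response iteration, for anyone who wants to follow along. The win probability is computed by numerically integrating the kept-number distributions implied by the rules: a cutoff-c player's kept number has density c + 1 above c and density c below it, hence CDF c*x + max(0, x - c). That little derivation is worked out here rather than quoted from any post, so treat it as an assumption to check.
Code:
import numpy as np

def win_prob(hero_cut, villain_cut, n=20_000):
    # P(hero's kept number > villain's): integrate hero's density times
    # villain's CDF over [0, 1] with a midpoint sum.
    x = (np.arange(n) + 0.5) / n
    hero_density = hero_cut + (x >= hero_cut)
    villain_cdf = villain_cut * x + np.maximum(0.0, x - villain_cut)
    return float(np.mean(hero_density * villain_cdf))

def best_response(villain_cut, grid=np.linspace(0, 1, 501)):
    # Hero cutoff that maximizes win_prob against a fixed villain cutoff.
    return float(grid[np.argmax([win_prob(h, villain_cut) for h in grid])])

# The response / counter-response "dance": take turns re-optimizing.
cutoff = 0.5
for _ in range(20):
    cutoff = best_response(cutoff)
print(cutoff)   # should settle near the equilibrium cutoff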
03-04-2016 , 05:34 PM
interesting, it's not as simple as it first seemed
i wrote a matlab program to test it. i tried it for cutoff 0.5 and here are the results.
Spoiler:

hero cutoff is fixed at 0.5. on the x axis you have opponent's cutoff from 0 to 1 in steps of 0.01. for every opponent cutoff (101 of them total) i ran 100000 trials and plotted hero's chance of winning on the y axis. if you compare it to the red 50% line, you can see hero's probability of winning drops below 50% when the opponent's cutoff is somewhere near 0.6.

here are the exact numbers
Spoiler:
Columns 1 through 20

62.3560 61.8290 61.7730 61.4030 60.9380 60.7110 60.4990 59.6530 59.5600 59.4150 59.1760 58.7550 58.1140 57.8450 58.0640 57.4120 57.2210 57.0540 56.4340 56.2590

Columns 21 through 40

55.6770 55.5510 55.3180 55.3690 55.0120 54.6390 54.1940 53.8670 54.0030 53.7710 53.6380 52.9920 52.9180 52.8590 52.6610 52.4830 52.1000 51.9020 51.6750 52.0130

Columns 41 through 60

51.1090 51.1840 50.9040 50.9480 50.8780 50.6550 50.5010 50.2440 50.2880 49.9700 50.0770 49.9090 49.8290 49.6250 49.5610 49.8000 49.4220 49.6140 49.5240 49.2890

Columns 61 through 80

49.4970 49.6930 49.5600 49.7350 49.6600 49.7910 50.1050 49.9830 50.1250 50.3910 50.5350 50.4720 50.8750 51.1520 51.4420 51.4980 52.2020 52.1480 52.4000 52.3470

Columns 81 through 100

52.7550 53.7550 53.7530 53.9320 54.2190 54.7270 55.2310 55.7060 56.1420 56.5390 56.9240 57.8190 57.9410 58.5980 59.0590 59.6020 60.0890 60.7640 61.1280 61.7290

Column 101

62.6480


it does worst against opponent cutoff 0.59, so i ran it for 0.59. results in spoiler.
Spoiler:


you can see there are fewer values below the red line and they're not far below it.
again, exact results
Spoiler:
Columns 1 through 20

62.2500 62.0370 61.2830 60.9140 60.6090 60.3220 59.9540 59.7220 59.1110 58.8640 58.5090 58.3660 57.9490 57.5870 57.3900 57.1410 56.8850 56.3510 56.4140 55.9940

Columns 21 through 40

55.5700 55.6020 54.8370 54.7780 54.6560 54.8100 54.2450 54.1160 53.8060 53.5220 53.4700 53.3480 52.9750 53.0200 52.8000 52.3020 52.3110 52.2600 52.2540 51.6910

Columns 41 through 60

51.7050 51.3710 51.4170 51.2630 51.0530 51.5080 51.0080 50.7350 50.4230 50.5270 50.5900 50.2870 50.1430 50.1870 50.1190 50.3860 50.2630 49.6850 49.9140 49.9500

Columns 61 through 80

50.0990 49.8170 49.8210 50.1750 50.1610 50.1380 50.1010 50.1070 50.3060 50.3830 50.5700 50.6860 50.8320 51.4340 51.1740 51.5430 51.7750 52.1320 52.3940 52.5230

Columns 81 through 100

52.9820 53.1670 53.4450 53.7020 54.2690 54.4140 55.1620 55.3590 55.7100 56.1310 56.6400 57.3370 57.8630 58.0890 58.6040 59.4350 59.8280 60.2820 60.9090 61.5890

Column 101

62.3720


this one does worst against 0.57. finding the minimum of that curve should give an exact result.
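
the same sweep is easy to reproduce without MATLAB; here is a rough Python equivalent. it is a re-creation of the experiment described above, not the original script, and the seed, trial count, and grid are arbitrary.
Code:
import numpy as np

rng = np.random.default_rng(1)

def kept(cutoff, trials):
    # Final numbers for a player who redraws anything below `cutoff`.
    first = rng.random(trials)
    return np.where(first >= cutoff, first, rng.random(trials))

hero_cutoff = 0.5
villain_cutoffs = np.linspace(0, 1, 101)        # step 0.01, as in the post
rates = [float(np.mean(kept(hero_cutoff, 100_000) > kept(v, 100_000)))
         for v in villain_cutoffs]
worst = villain_cutoffs[int(np.argmin(rates))]
print(f"hero at {hero_cutoff} does worst vs villain cutoff ~{worst:.2f} "
      f"(win rate {min(rates):.4f})")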
03-04-2016 , 06:43 PM
You are in the right ballpark.
03-05-2016 , 12:41 AM
Quote:
Originally Posted by md46135
0.5 is kind of 'whatever', since the chance for a bigger number is the same as for a lower one.
But you are playing against a thinking opponent. There is an equilibrium somewhere but it likely isn't 0.5.
03-05-2016 , 06:24 AM
i think i have found an exact solution for this problem. it seems to match the simulated numbers pretty well. anyway, here it is:

suppose you have two players - hero and villain. let h be hero's cutoff point (meaning that if a generated number is lower than that point, it gets discarded). also, let v be villain's cutoff point.

like whosnext said, there are 4 possible scenarios: both players keep, hero discards villain keeps, hero keeps villain discards and both discard. i used geometric probability (calculus) and a bit of conditional probability.

1) both players keep
(1-h)*(1-v) is the chance for both of them to keep. chance that hero's number is greater is 0.5*(1-v^2) - v*(1-v), but this has to be divided by (1-h)*(1-v) because it's conditional probability and (1-h)*(1-v) is the total observed geometric area. i hope that my writing makes sense

so, the chance for both players to keep and hero to win is simply 0.5*(1-v^2) - v*(1-v)


2) hero keeps villain discards
again, similar thinking. (1-h)*v for that scenario to occur multiplied by 0.5*(1-h^2) for hero to win divided by (1-h) because of conditional probability.

probability for this case v*0.5*(1-h^2)

3) hero discards villain keeps
h*(0.5* (1-h^2) - v*(1-h) )


4) both discard
h*v*0.5

total probability for hero to win is the sum of all 4 of those, and that's
(h*v)/2 + v*(v - 1) - v^2/2 + h*(v*(h - 1) - h^2/2 + 1/2) - (v*(h^2 - 1))/2 + 1/2
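
one way to sanity-check a closed form like this is to type it in next to a brute-force simulation and compare a few (h, v) pairs. the pairs below are arbitrary, and agreement at a handful of points obviously doesn't prove the derivation.
Code:
import random

def posted_formula(h, v):
    # The expression from the post above, typed in verbatim.
    return ((h * v) / 2 + v * (v - 1) - v**2 / 2
            + h * (v * (h - 1) - h**2 / 2 + 1 / 2)
            - (v * (h**2 - 1)) / 2 + 1 / 2)

def simulated(h, v, trials=300_000):
    # Direct Monte Carlo estimate of hero's win probability.
    wins = 0
    for _ in range(trials):
        x = random.random()
        if x < h:
            x = random.random()
        y = random.random()
        if y < v:
            y = random.random()
        wins += x > y
    return wins / trials

for h, v in [(0.5, 0.5), (0.5, 0.6), (0.6, 0.5), (0.62, 0.62), (0.7, 0.4)]:
    print(h, v, round(posted_formula(h, v), 4), round(simulated(h, v), 4))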
03-05-2016 , 06:34 AM
some results:
if v=0.5, this is p(h).
Spoiler:


exact maximum: probability to win = 0.5070, h=0.607625
check my post from yesterday, seems correct.


also, it seems the equilibrium point is for both players to use cutoff 0.618034

Last edited by md46135; 03-05-2016 at 06:48 AM.
03-05-2016 , 08:54 AM
Spoiler:
I think you have some errors in your probability derivations. For one, hero's probability of winning should depend upon whether h>v or h<v in one or more of the four cases.
03-05-2016 , 09:22 AM
Quote:
Originally Posted by md46135
1) both players keep
(1-h)*(1-v) is the chance for both of them to keep. chance that hero's number is greater is 0.5*(1-v^2) - v*(1-v), but this has to be divided by (1-h)*(1-v) because it's conditional probability and (1-h)*(1-v) is the total observed geometric area. i hope that my writing makes sense
How did you get this value?
03-05-2016 , 09:53 AM
i only said the numbers looked close to simulation results, i'm not an expert (obviously) and i don't know if it's 100% correct. i derived the first case the way i've written it on this image.
Spoiler:

(1-h)*(1-v) is the probability that you are somewhere in that green rectangle, meaning that both players keep their numbers. you have to multiply that with p(v<h) to get the probability for hero to win.

p(v<h) should be red area (triangle) divided by green area (rectangle). when you calculate that, h just disappears from the formula.

Last edited by md46135; 03-05-2016 at 10:00 AM.
03-05-2016 , 10:22 AM
I haven't done the math yet, just came up with an intuitive guess, one that's too simple for me to be very confident in.

Pick a cutoff g that, once reached, gives Villain a 50% chance of beating g when aiming for g.

1 - g^2 = .5
g = sqrt(.5) =~ .707

If Villain has any higher cutoff, say .75, then Hero profits because of the times Villain starts with something like .73 and then squanders it (which results in Villain surpassing g <50% of the time).

If Villain has any lower cutoff, say .65, then Hero profits because some of the 50% of the time Hero doesn't reach g, Hero will still surpass .65, so overall Hero will surpass Villain >50% of the time.

I'll be more confident once I do the math. I think a proof will require some basic calculus.
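
A quick way to stress-test the guess before doing the math: fix one player at sqrt(0.5) and simulate against a handful of other cutoffs. This is just a sketch; the list of opposing cutoffs, the seed, and the trial count are arbitrary.
Code:
import math
import numpy as np

rng = np.random.default_rng(2)

def kept(cutoff, trials):
    # Final numbers for a player who redraws anything below `cutoff`.
    first = rng.random(trials)
    return np.where(first >= cutoff, first, rng.random(trials))

g = math.sqrt(0.5)                                  # the guessed cutoff, ~0.707
for v in (0.55, 0.60, 0.62, 0.65, 0.75):
    rate = float(np.mean(kept(g, 500_000) > kept(v, 500_000)))
    print(f"cutoff {g:.3f} vs {v:.2f}: win rate {rate:.4f}")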
03-05-2016 , 11:47 AM
I get the same number as md46135, which is actually 1/phi, phi being the golden ratio,
but in a different way which can probably be formalised.

Let a be the cutoff. The expected value is E(X)=E(A)+E(B) where A is drawing twice (so P(A)=a) and B is drawing once. E(A)=a*0.5 as the second try has expectation 0.5.
Also E(B)=(1-a)*(1+a)/2 since P(B)=1-a and (1+a)/2 is the "average" (This can be formalised by considering uniform distribution and integrating from a to 1)

then E(X)=-0.5*(a^2-a-1)

Here is where I got stuck, as I thought you have to find the maximum, which leads to a=0.5. But you should instead set E(X)=a, which leads to the familiar (cf. Fibonacci sequence) equation a^2+a-1=0 and the answer above

So getting the same result as md and the (inverse) golden ratio makes me more confident. I still need to read in more depth md's answer, mine could maybe have some oversimplifications
03-05-2016 , 01:23 PM
Seems to me in GTO it would be that we lose (1-X) of the time on try 1, with X being where we chose to hold; then on his second try, if we are both rational, he will win (1-X)X. So our opponent's total probability is (1-X)+(1-X)X, or 1-X^2.
03-05-2016 , 02:15 PM
Spoiler:
md46135, your evaluations of the integrals are not correct in some cases. I gave a hint in an earlier spoiler. Plus I think there is another mistake in another integral.

kapw7, this is a clever solution technique. If I am following you are essentially saying that at the equilibrium point, a player's average value will be equal to the equilibrium cutoff point. I haven't thought of it that way.

Can anybody shed light on why this condition is true (I am not doubting that it is true, I just need some enlightenment)?

And, yes, (sqrt(5)-1)/2 is the equilibrium solution I found too.
03-05-2016 , 02:27 PM
the paint image i posted is valid only for the first case, others are different. i don't know, but it would be very lucky to just stumble onto the right solution with 4 different integrals...
03-05-2016 , 07:26 PM
Quote:
Originally Posted by whosnext
kapw7, this is a clever solution technique. If I am following you are essentially saying that at the equilibrium point, a player's average value will be equal to the equilibrium cutoff point. I haven't thought of it that way.

Can anybody shed light on why this condition is true (I am not doubting that it is true, I just need some enlightenment)?
You are right, it is handwavy. This (that it is a necessary AND sufficient condition) has to be proved, and I could not think of a convincing argument for it. You probably have to go through double integrals etc., but then it doesn't look that charming anymore. For me the best part of the problem is that maximizing the final score is not the best strategy. Hero can choose a=0.5 and maximize his final score, but then villain can still beat him by having a higher number of wins. So hero will be winning more comfortably the times he is ahead, but villain will be making more wins overall (narrower ones).
03-06-2016 , 09:46 PM
If my opponent and I have the same strategy, cutoff Q, and if it is unexploitable, that means that when I flip against his unknown number I will have the same chance of winning as when I stand those times I am exactly at the cutoff and choose to stand. When I stand with Q, my chances are the probability he needs to flip, which is Q, times the probability my cutoff hand will beat his flip, which is also Q.

So my chances are Q squared

When I have a flipping hand, my chances are 1/2 those Q times he flips (thus Q/2), plus half the value of the distance between Q and 1 (e.g. .15 if Q was .7), which is [(1-Q)/2], times the probability he stands pat (which is 1-Q)

So my chances by flipping are (Q + 1 - 2Q + Qsquared)/2

But that has to equal the chance of winning if you are dealt exactly Q and stand. Do you see why?

Thus Qsquared = (1-Q+ Qsquared)/2

Turning into Qsquared + Q-1=0

Which means that Q is the [(square root of 5)-1]/2
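
The indifference condition above is a one-liner to check symbolically. This uses sympy, which is not part of the thread, just a convenient checker.
Code:
from sympy import Eq, solve, symbols

Q = symbols("Q", positive=True)

stand = Q**2                        # win by standing on exactly Q
redraw = (Q + 1 - 2*Q + Q**2) / 2   # win by flipping from exactly Q, as derived above

solution = solve(Eq(stand, redraw), Q)
print(solution)                 # [-1/2 + sqrt(5)/2]
print(solution[0].evalf())      # 0.618033988749895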
03-08-2016 , 05:31 PM
Damn. That's awesome. Well done.
03-11-2016 , 10:18 AM
The Answer from FiveThirtyEight
Let C be the optimal cutoff the players use. The key observation is that if the first number revealed is exactly C, then the probability of winning by keeping C equals the probability of winning by pressing the button again — you are indifferent. We can compute each of these probabilities, keeping in mind that the other player is also using C as their cutoff.

probability player 1 wins by keeping C = probability player 2 gets a number below C for both presses = C⋅C

probability player 1 wins by pressing again = (probability player 2 presses again) * (probability player 1 wins by pressing again | player 2 presses again) + (probability player 2 keeps first number) * (probability player 1 wins by pressing again | player 2 keeps first number) = (C)⋅(1/2)+(1−C)⋅((1−C)⋅1/2)

As noted above, the above two are equal, so

C⋅C = (C)⋅(1/2) + (1−C)⋅((1−C)/2), which simplifies to

C^2 + C − 1 = 0. The quadratic formula gives the solution.

C = (√5 − 1)/2 = 0.618034…

Note that this cutoff is the golden ratio minus one, known as the golden ratio conjugate. So using the golden ratio gives the best chance to win the gold bullion!

Looks a lot like Sklansky's answer to me!
03-11-2016 , 02:50 PM
Quote:
Originally Posted by Didace
The Answer from FiveThirtyEight
Let C be the optimal cutoff the players use. The key observation is that if the first number revealed is exactly C, then the probability of winning by keeping C equals the probability of winning by pressing the button again — you are indifferent. We can compute each of these probabilities, keeping in mind that the other player is also using C as their cutoff.

probability player 1 wins by keeping C = probability player 2 gets a number below C for both presses = C⋅C

probability player 1 wins by pressing again = (probability player 2 presses again) * (probability player 1 wins by pressing again | player 2 presses again) + (probability player 2 keeps first number) * (probability player 1 wins by pressing again | player 2 keeps first number) = (C)⋅(1/2) + (1 − C)⋅((1 − C)⋅1/2)

As noted above, the above two are equal, so

C⋅C = (C)⋅(1/2) + (1 − C)⋅((1 − C)/2), which simplifies to

C^2 + C − 1 = 0. The quadratic formula gives the solution.

C = (√5 − 1)/2 = 0.618034…

Note that this cutoff is the golden ratio minus one, known as the golden ratio conjugate. So using the golden ratio gives the best chance to win the gold bullion!

Looks a lot like Sklansky's answer to me!
I made a couple of corrections and did a little reformatting in the posted solution above.

You missed the following from the article:
...I’m also proud to report that David Sklansky, poker god, correctly solved this problem, although he was not the randomly chosen winner. Sorry, David.