Monkeys playing Wordle - Gambling and Probability

Two Plus Two Forums Other Topics Probability

Monkeys playing Wordle

Post Reply Subscribe

...

09-29-2022 , 05:23 PM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

A monkey has to guess a five-letter word that can include repeat letters. The monkey's guesses are uniformly random 5-letter strings. The monkey is allowed an unlimited number of guesses. When a letter in an exact position is correct, that position is solved and the monkey's next guesses only apply to the remaining unsolved positions. For instance, if the word is POKER and the monkey guesses ZZKXY, then the monkey's next guesses will automatically have K in the middle position.

What is the expected number of guesses for the monkey to solve the entire word?

Quote

09-29-2022 , 05:44 PM

Didace

Carpal \'Tunnel

Join Date: Nov 2009 Posts: 19,888

Are we assuming that all letters appear in all places randomly? Or are we going for actual words?

Quote

09-29-2022 , 06:20 PM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

The monkey is basically a random letter generator except for positions it already got right. It needs to guess a specific string. Originally I had in mind that the string is a real word, but instead let's say it's a 5-letter password (which may or may not be a real word).

Quote

09-30-2022 , 07:21 AM

Numen

centurion

Join Date: Sep 2022 Posts: 142

Let's assume for simplicity that we are dealing with a single letter word and let E be the expected value of guesses to get the letter right and p be the probability of getting the letter right on any single trial. Note that expected value starting from the 2nd, 3rd or 100th trial is still E and independent of previous guesses, since the monkey can randomly guess the same wrong letter over and over again.

Since the probability of E being 1 is p and E being E + 1 (it sounds a bit paradoxical) is 1 - p, we can derive the following equation:

E = 1 * p + (E + 1) * (1 - p)

E = p + E + 1 - pE - p

E = 1 / p

So, the expected value of random guesses monkey will make till it gets the correct letter is 1 / 1 / 26 = 26.

The probability that monkey makes a correct guess within 26 trials is not 50% though, even though our initial intuition might trick us to think that way.

The probability that monkey makes a correct guess within the expected value is 1 - P(all wrong) = 1 - (1 - p) ^ E = 1 - (25 / 26) ^ 26 = 64%.

If the question is tweaked so that the monkey doesn't do the trials simultaneously, but one by one for each letter in the word, the expected value of total guesses till it gets all the letters correct would be 5 * 26 = 130. Therefore, when done simultaneously, the answer must definitely be less than 130.

Let's think about a two letter word. Is the expected value of the number of two simultaneous and independent guesses the same as the expected value for a single letter? If so, then adding more letters should not change the answer and the expected number of guesses until the monkey gets all the letters correct in a five letter word or a thousand letter word should be 26. This is not the case since this time we need to find the expected value of the maximum guesses for the five positions, which should be more than the expected value for a single position (26), but less than the expected value of one by one guesses for each position (130). So, the answer to this question must be between 26 and 130.

The question is essentially E(max(L1, L2, L3, L4, L5)) where L1 denotes the number of guesses till the monkey finds the first letter, which has an expected value of 26, but the expected value of the maximum of multiple independent variables with the same expected value must be greater than the expected value of a single variable.

Suppose in a two letter word, probability of getting a letter correct within E guesses is x. Not getting the other letter correct within E guesses will have a probability of 1 - x. By using the same logic above, we can say that previous failures are inconsequential and the expected value of guesses till the correct letter is found after any number of failures is still E.

Therefore, E(max(L1, L2)) = Ex + (1 - x) * 2E = Ex + 2E - 2Ex = 2E - Ex = E * (2 - x) where x = 1 - ((E - 1) / E) ^ E and in our case it is 1 - (25 / 26) ^ 26 = 63.9%.

So for a two letter word my answer would be 26 * (2 - 64%) = 35.4

For a three letter word, E(max(L1, L2, L3)) = E(max(E * (2 - x), L3) = E(max(35.4, L3)) = 35.4x + (1 - x) * (35.4 + 26) where x = 1 - (25 / 26) ^ 35.4 = 75.1%

So for a three letter word, my answer would be 35.4 * 75.1% + 24.9% * 61.4 = 41.9

For a four letter word, E(max(L1, L2, L3, L4)) = E(max(41.9, L4) = 41.9x + (1 - x) * (41.9 + 26) where x = 1 - (25 / 26) ^ 41.9 = 80.7%

So for a four letter word, my answer would be 41.9 * 80.7% + 19.3% * 67.9 = 46.9

Finally for a five letter word, E(max(L1, L2, L3, L4, L5)) = E(max(46.9, L5) = 46.9x + (1 - x) * (46.9 + 26) where x = x = 1 - (25 / 26) ^ 46.9 = 84.1%

So for a five letter word, my answer would be 46.9 * 84.1% + 15.9% * 72.9 = 51.0

Quote

09-30-2022 , 04:56 PM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

The answer is about 58.7175

For a 2-letter word it's about 38.7451

Quote

10-01-2022 , 06:08 AM

Numen

centurion

Join Date: Sep 2022 Posts: 142

Could you please briefly explain your method for the 2-letter word?

Quote

10-01-2022 , 02:49 PM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

We're interested in the expected number of guesses needed to observe the intersection of events A="1st letter correct" and B="2nd letter correct". However, in this scenario, the intersection needn't occur on the same guess.

We need E(# of guesses until A∩B), for which we can apply a rearrangement of the inclusion-exclusion principle:

E(# of guesses until A∩B) = E(# of guesses until A) + E(# of guesses until B) - E(# of guesses until A∪B) = 26 + 26 - 1/[1-(25/26)²] = 1976/51

Your idea of E[max(L1,L2)] is also valid, but you botched the execution. If you go that route, I'm not sure you can avoid setting it up as an infinite series. Here, rather than add k*P(max=k), I'd add the survival function S(n) = P(max>n) = 2(25/26)^n - (25/26)^(2n). Evaluate that from n=0 to ∞. Just now, I evaluated to a large n in Julia and it spat out the correct answer, but maybe you can find the limit analytically.

Quote

10-05-2022 , 08:06 AM

Numen

centurion

Join Date: Sep 2022 Posts: 142

For the two letter case, if it was given that both letters were guessed correctly within 26 tries, what would be the expected number of tries it takes to guess both letters correctly? I was thinking about it and couldn't really come up with an answer. Is such a calculation possible?

Quote

10-07-2022 , 08:19 AM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

The straight-forward way is to take the sum of n*P(max=n) from n=1 to 26 and divide by P(max<27). I get an answer of 10.4 that way.

I'm away on a trip; when I return in a few days, I'll check if the shortcut from earlier works on the conditional expectation as well.

Quote

10-08-2022 , 07:11 PM

#10

Numen

centurion

Join Date: Sep 2022 Posts: 142

Quote:

Originally Posted by heehaww

The answer is about 58.7175

For a 2-letter word it's about 38.7451

So I have reached the same result for the 2-letter word with a different method.

My logic is as follows:

Our goal is to find E(max(L1,L2)), the expected value of the maximum of the number of tries it takes for monkey to find the first and the second letters, denoted by L1 and L2, respectively.

Let p be the probability of monkey guessing the correct letter within E tries, E being the expected number of guesses that monkey makes until it finds the correct letter.

In case of a 26 letter alphabet, p = 1 - (25/26)^26 = 63.9% and E = 26.

For a two letter word, the probability of monkey...

guessing both letters correctly within E tries is p^2 = 40.9%,
guessing both letters correctly after more than E tries is (1 - p)^2 = 13.0%,
guessing one of the letters within E tries and the other after more than E tries is 1 - p^2 - (1 - p)^2 = 2p - 2p^2 = 46.1%.

Let Ea, Eb and Ec be the respective expected number of tries under the conditions stated above.

Ea = 15.51, calculated by the function Σ Σ max(x,y) * P(x) * P(y) / p^2 where 1 ≤ x & y ≤ 26, so the summation of a total of 26 * 26 = 676 combinations.
Eb = E + E(max(L1, L2)), as both letters are not found within E tries and the expected number of tries it takes to guess both correctly is independent of the past tries.
Ec = 2E since we know one of the letters is found within E tries and the other one is not; the expected number of tries after E past tries is still E.

I will use max instead of E(max(L1, L2)) for the sake of brevity:

max = Ea * p^2 + Eb * (1 - p)^2 + Ec * (2p - 2p^2)
max = Ea * p^2 + (E + max) * (1 - p)^2 + 2E * (2p - 2p^2)
max - max * (1 - p)^2 = Ea * p^2 + E * (1 - p)^2 + 2E * (2p - 2p^2)
max = (Ea * p^2 + E * (1 - p)^2 + 2E * (2p - 2p^2)) / (1 - (1 - p)^2)
max = (15.51 * 40.9% + 26 * 13.0% + 52 * 46.1%) / (1 - 13.0%) = 38.74

Quote

10-10-2022 , 06:22 PM

#11

Numen

centurion

Join Date: Sep 2022 Posts: 142

For a three letter word, I use a similar method:

max3 is E(max(L1, L2, L3))
max2 is E(max(L1, L2)) and already found to be 38.74,
E is the expected number of guesses until monkey finds a one letter word and already found to be 26.

max3 = p^3 * Σ Σ Σ [max(x,y,z) * P(x) * P(y) * P(z)] / p^3 + 3 * (p^2) * (1 - p) * 2E + 3 * p * (1 - p)^2 * (E + max2) + (1 - p)^3 * (E + max3)
max3 - (1 - p)^3 * max3 = p^3 * Σ Σ Σ [max(x,y,z) * P(x) * P(y) * P(z)] / p^3 + 3 * (p^2) * (1 - p) * 2E + 3 * p * (1 - p)^2 * (E + max2) + (1 - p)^3 * E
max3 = [p^3 * Σ Σ Σ [max(x,y,z) * P(x) * P(y) * P(z)] / p^3 + 3 * (p^2) * (1 - p) * 2E + 3 * p * (1 - p)^2 * (E + max2) + (1 - p)^3 * E] / [1 - (1 - p)^3]
max3 = [Σ Σ Σ [max(x,y,z) * P(x) * P(y) * P(z)] + 6 * (p^2) * (1 - p) * E + 3 * p * (1 - p)^2 * (E + max2) + (1 - p)^3 * E] / [1 - (1 - p)^3]

After making the following substitutions:

p = 63.9%
E = 26
max2 = 38.74
Σ Σ Σ [max(x,y,z) * P(x) * P(y) * P(z)] = 4.65 where 1 ≤ x & y & z ≤ 26

We get max3 = 47.24

Quote

10-10-2022 , 06:42 PM

#12

Numen

centurion

Join Date: Sep 2022 Posts: 142

For a four letter word:

max4 = p^4 * Σ Σ Σ Σ [max(w,x,y,z) * P(w) * P(x) * P(y) * P(z)] / p^4 + 4 * p * (1 - p)^3 * (E + max3) + 6 * p^2 * (1 - p)^2 * (E + max2) + 4 * p^3 * (1 - p) * 2E + (1 - p)^4 * (E + max4)
max4 = Σ Σ Σ Σ [max(w,x,y,z) * P(w) * P(x) * P(y) * P(z)] + 4 * p * (1 - p)^3 * (E + max3) + 6 * p^2 * (1 - p)^2 * (E + max2) + 8 * p^3 * (1 - p) * E + (1 - p)^4 * (E + max4)
max4 - (1 - p)^4 * max4 = Σ Σ Σ Σ [max(w,x,y,z) * P(w) * P(x) * P(y) * P(z)] + 4 * p * (1 - p)^3 * (E + max3) + 6 * p^2 * (1 - p)^2 * (E + max2) + 8 * p^3 * (1 - p) * E + (1 - p)^4 * E
max4 = [Σ Σ Σ Σ [max(w,x,y,z) * P(w) * P(x) * P(y) * P(z)] + 4 * p * (1 - p)^3 * (E + max3) + 6 * p^2 * (1 - p)^2 * (E + max2) + 8 * p^3 * (1 - p) * E + (1 - p)^4 * E] / [1 - (1 - p)^4]

After making the following substitutions:

p = 63.4%
E = 26
max2 = 38.74
max3 = 47.24
Σ Σ Σ Σ [max(w,x,y,z) * P(w) * P(x) * P(y) * P(z)] = 3.22

We get max4 = 53.62

Quote

10-10-2022 , 07:00 PM

#13

Numen

centurion

Join Date: Sep 2022 Posts: 142

Finally for a five letter word:

max5 = p^5 * Σ Σ Σ Σ Σ [max(v,w,x,y,z) * P(v) * P(w) * P(x) * P(y) * P(z)] / p^5 + 5 * p * (1 - p)^4 * (E + max4) + 10 * p^2 * (1 - p)^3 * (E + max3) + 10 * p^3 * (1 - p)^2 * (E + max2) + 5 * p^4 * (1 - p) * 2E + (1 - p)^5 * (E + max5)
max5 = [Σ Σ Σ Σ Σ [max(v,w,x,y,z) * P(v) * P(w) * P(x) * P(y) * P(z)] + 5 * p * (1 - p)^4 * (E + max4) + 10 * p^2 * (1 - p)^3 * (E + max3) + 10 * p^3 * (1 - p)^2 * (E + max2) + 10 * p^4 * (1 - p) * E + (1 - p)^5 * E] / [1 - (1 - p)^5]

After making the following substitutions:

p = 63.9%
E = 26
max2 = 38.74
max3 = 47.24
max4 = 53.62
Σ Σ Σ Σ Σ [max(v,w,x,y,z) * P(v) * P(w) * P(x) * P(y) * P(z) = 2.21

We get max5 = 58.75

Quote:

Originally Posted by heehaww

The answer is about 58.7175

For a 2-letter word it's about 38.7451

Both answers seem to be in line with what heehaww said.

Note: I didn't calculate Σ Σ Σ Σ Σ [max(v,w,x,y,z) * P(v) * P(w) * P(x) * P(y) * P(z) = 2.21, I extrapolated by using the previous four summations.

Quote

10-13-2022 , 07:05 AM

#14

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

Re: conditional expectation, my method was right but I plugged something in wrong. The answer is ~15.5 (not 10.4)

The pmf is p(x) = 2(1-(25/26)^x)(1/26)(25/26)^(x-1) - (1/26^2)(25/26)^(2x-2)
The cdf is F(x) = [1-(25/26)^x]^2

[sum of k*p(k) from k=1 to 26] / F(26) = 15.51277

Quote

10-13-2022 , 07:44 AM

#15

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

@Numen I just read your post for the 2-letter case and I like what you did, it's creative! You've reduced an infinite series to a 26-term series* and I agree with all the reasoning.

*For Ea=15.51, a double summation involving 26^2 iterations isn't necessary as I showed in my last post.

Quote

Post Reply Subscribe

...