Nash Equilibrium 100 coin game - Page 2 - Science, Math and Philosophy Forum

Using masque's continuous model, suppose 2 players both choose randomly according to the probability density p(x). And suppose the 3rd player chooses according to pdf q(x). How would you calculate the EV for the three players in terms of p(x) and q(x)? Some kind of integrals?

PairTheBoard

Yeah. Conceptually, I think it's easier to think of the discrete case (which leads to discrete sums), and then proceed to increase the subdivisions (which turns the discrete sums into integrals).

Quote

10-12-2016 , 07:26 AM

#27

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

Try this as a first guess;

1 and 2 play a and 1-a with chance p each and 1/2 with chance 1-2p.

3 now plays b and 1-b with chance q each and 1/2 with chance 1-2q.

Under this model express the equity E of the 3rd guy as function of a,p,b,q

Now consider the minimization of E with respect to a, p (regardless what q, b are). That will create some Em(q,b). Then consider the maximization of Em(q,b) with respect to q, b. See where that gets you.

3^3=27 terms thing likely.

Then maybe try a variational calculus problem with 2 symmetric continuous distributions, one for the first 2 and one for the 3rd guy say f, g.

Last edited by masque de Z; 10-12-2016 at 07:31 AM.

Quote

10-12-2016 , 10:12 AM

#28

plexiq

veteran

Join Date: Apr 2007 Posts: 2,557

Should be straightforward to solve for a equilibrium using fictitious-play, cfr etc?

May have been discussed already, but what happens if two players have equal distance (and/or take the same number)? I'd assume they split? [With the alternative, each gets awarded full, all players picking the same number 100% of the time would be a NE.]

Quote

10-12-2016 , 12:12 PM

#29

plexiq

veteran

Join Date: Apr 2007 Posts: 2,557

For the 100 coin game (1...100), if ties split then the following is a perfect NE:
1..25: 0%
26..75: 2% [All these numbers have an EV of exactly 100/3]
76..100: 0%

If two players play this strategy, then the third player can do no better than playing it himself.

Quote

10-12-2016 , 01:51 PM

#30

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

What if the first 2 always play 0.25 and 0.75 (or whatever a and 1-a maximizes things for them) as in colluding. Then the 3rd guy is in trouble not? Of course in playing that way 1 and 2 can get exploited by each other if they defect but why the hell would they not cooperate implicitly if they got that they can destroy the third guy by making sure they do not kill each other each time?

All the 3rd can do now is punish systematically one of the 2 by always choosing 0.25 or 0.75 (a or 1-a) too. Unless of course the 2 can always switch in every round using some algorithm that randomly chooses who has 0.25 and who has 0.75 but never both the same. They can use quantum mechanics ie entangled particles for that lol. (this is in the case there is an order in choosing still secretly, all the first need to do is think this and then guess the same source of 2 entangled particles that naturally is available to both leaving the other guy out. If they were AI they might think in some interesting way why they need to choose the same without communicating anything to each other just the thinking is so close that they can unavoidably reach the same source. Then the 3rd doesn't know who to punish and will start guessing at random to punish one but the 2 first then will always avg then 1/2*(1/4+1/2)=3/8>1/3 continuing to implicitly "cooperate" that way).

Last edited by masque de Z; 10-12-2016 at 02:05 PM.

Quote

10-12-2016 , 02:05 PM

#31

plexiq

veteran

Join Date: Apr 2007 Posts: 2,557

The strat i posted satisfies the NE condition, if all three players use that strategy then none of them can improve their EV by unilaterally deviating (while the other two stick to the strat). Of course, collusion by two players may still be possible, but the OP only asked for a NE

As far as i can tell the 0:100 (101 coin) game doesn't have a nice-form NE, it sticks to a similar pattern but the frequencies are not uniform.

NE solution for 101 coin (0:100) for giggles:

Code:

#,Played,EV
0,0.0000000,21.2911941
1,0.0000000,21.7931098
2,0.0000000,22.2911941
3,0.0000000,22.7931098
4,0.0000000,23.2911941
5,0.0000000,23.7931098
6,0.0000000,24.2911941
7,0.0000000,24.7931098
8,0.0000000,25.2911941
9,0.0000000,25.7931098
10,0.0000000,26.2911941
11,0.0000000,26.7931098
12,0.0000000,27.2911941
13,0.0000000,27.7931098
14,0.0000000,28.2911941
15,0.0000000,28.7931098
16,0.0000000,29.2911941
17,0.0000000,29.7931098
18,0.0000000,30.2911941
19,0.0000000,30.7931098
20,0.0000000,31.2911941
21,0.0000000,31.7931098
22,0.0000000,32.2911941
23,0.0000000,32.7931098
24,0.0000000,33.2911941
25,0.0100332,33.6666667
26,0.0293591,33.6666667
27,0.0104629,33.6666667
28,0.0289169,33.6666667
29,0.0109184,33.6666667
30,0.0284471,33.6666667
31,0.0114039,33.6666667
32,0.0279447,33.6666667
33,0.0119249,33.6666667
34,0.0274033,33.6666667
35,0.0124886,33.6666667
36,0.0268146,33.6666667
37,0.0131053,33.6666667
38,0.0261663,33.6666667
39,0.0137894,33.6666667
40,0.0254407,33.6666667
41,0.0145633,33.6666667
42,0.0246092,33.6666667
43,0.0154640,33.6666667
44,0.0236212,33.6666667
45,0.0165625,33.6666667
46,0.0223681,33.6666667
47,0.0180295,33.6666667
48,0.0204845,33.6666667
49,0.0205608,33.6666667
50,0.0182348,33.6666667
51,0.0205608,33.6666667
52,0.0204845,33.6666667
53,0.0180295,33.6666667
54,0.0223681,33.6666667
55,0.0165625,33.6666667
56,0.0236212,33.6666667
57,0.0154640,33.6666667
58,0.0246092,33.6666667
59,0.0145633,33.6666667
60,0.0254407,33.6666667
61,0.0137894,33.6666667
62,0.0261663,33.6666667
63,0.0131053,33.6666667
64,0.0268146,33.6666667
65,0.0124886,33.6666667
66,0.0274033,33.6666667
67,0.0119249,33.6666667
68,0.0279447,33.6666667
69,0.0114039,33.6666667
70,0.0284471,33.6666667
71,0.0109184,33.6666667
72,0.0289169,33.6666667
73,0.0104629,33.6666667
74,0.0293591,33.6666667
75,0.0100332,33.6666667
76,0.0000000,33.2911941
77,0.0000000,32.7931098
78,0.0000000,32.2911941
79,0.0000000,31.7931098
80,0.0000000,31.2911941
81,0.0000000,30.7931098
82,0.0000000,30.2911941
83,0.0000000,29.7931098
84,0.0000000,29.2911941
85,0.0000000,28.7931098
86,0.0000000,28.2911941
87,0.0000000,27.7931098
88,0.0000000,27.2911941
89,0.0000000,26.7931098
90,0.0000000,26.2911941
91,0.0000000,25.7931098
92,0.0000000,25.2911941
93,0.0000000,24.7931098
94,0.0000000,24.2911941
95,0.0000000,23.7931098
96,0.0000000,23.2911941
97,0.0000000,22.7931098
98,0.0000000,22.2911941
99,0.0000000,21.7931098
100,0.0000000,21.2911941

Last edited by plexiq; 10-12-2016 at 02:22 PM.

Quote

10-12-2016 , 02:32 PM

#32

masque de Z

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 9,961

Thanks plexiq for this work by the way. Did it converge fast to that? I want to see what the continuous distributions variational problem also gives out and it might be very close (hopefully it doesn't look monstrous or there is an easy way to simplify it with smart arguments. I also had imagined that the 2 distributions would be symmetric and go up to some point near the corners but not at the corners, so its nice to see that).

Notice i didn't ask them to collude by communicating anything explicitly. I suggested a possible way they could get there naturally rationally. It is not exactly unethical or illegal either. In a real life war of 2 AI vs a human it might happen. And it better happen in a 2 human vs 1 AI war lol. In real life it is never true that all players are equally rational or identically versed in everything possible but with AI they may be close enough.

Quote

10-12-2016 , 02:53 PM

#33

plexiq

veteran

Join Date: Apr 2007 Posts: 2,557

Re convergence: It only took a few seconds because the game is so simple, but that was around 25k iterations of CFR+ to get a full equilibrium (epsilon <1e-12).

Quote

10-12-2016 , 06:12 PM

#34

whosnext

Carpal \'Tunnel

Join Date: Mar 2009 Posts: 6,734

From my personal experience I know that back in the day this was an interview question used by McKinsey and other management consulting firms for their entry-level analyst position. An MBA was highly desirable but not a requirement for that position so exposure to game theory concepts was fair game.

One of the first things interviewees were asked to do was to write down the expression for how many points (coins, votes, etc.) each player received given the three different players' positions. This is conceptually straightforward but interviewees weak in math could stumble here. And the issue of ties could arise here.

Equilibrium strategies were then discussed. The better interviewees could reason that the strategy each player would employ would have to be a mixed strategy. And it would need to be symmetrical around the midpoint (50.5). If needed, they were told to focus upon uniform strategies to make the problem simpler and more tractable in an interview setting.

Many interviewees knew (or were coached) to find the set of equilibrium strategies, you can "fix" two players' strategies, and then find the third player's best-response strategy. At equilibrium, the third player's best-response will be identical to the opponents' identical strategy (ignoring any indifference).

So the object is to find the optimal "range". Good interviewees would be able to see that there is an inherent tension between the opponents being too "tight" and being too "loose".

If their range is too tight (say 40-61), then the third player can simply pick a number, say one less than the bottom of the range and get more than 1/3. If their range is too loose (say 10-91), then the third player can simply pick a number in the middle, say 50, and get more than 1/3.

Some interviewees mistakenly conclude that 34-67 is the optimal range but this is incorrect since probabilistically the "edges" aren't visited often enough to make the third player playing 33, say, unprofitable.

The very best interviewees were able to find (or guess) that the optimal range is 26-75. It's been so many years that I don't remember if there is some "trick" of logic or math that leads to that solution but I think there is.

Those were the days.

Quote

10-18-2016 , 10:18 PM

#35

Atlantis1

adept

Join Date: Mar 2013 Posts: 1,006

Sorry guys I totally forgot about this topic as it didnt get any replies in the time frame I "needed". Thanks a lot for the replies. It has been very interesting to read.

Unfortunately I am unable to phrase the question in a more appropiate manner, I more or less copied it exactly how someone told me it.

Quote

10-20-2016 , 02:03 AM

#36

gaming_mouse

Carpal \'Tunnel

Join Date: Oct 2004 Posts: 13,786

Quote:

Originally Posted by plexiq

Re convergence: It only took a few seconds because the game is so simple, but that was around 25k iterations of CFR+ to get a full equilibrium (epsilon <1e-12).

nice. can you post the code?

Quote

10-20-2016 , 07:16 AM

#37

plexiq

veteran

Join Date: Apr 2007 Posts: 2,557

That getScores method is pretty ugly, but only used for initializing. Meh

Code:

import java.util.Arrays;

public class CoinGameCfr {
  
  static final int coins = 101;
  static final double[][][] scoreCache = new double[coins][coins][coins];

  public static double[] getStrategy(double[] regret){
    double t = 0;
    for(double d : regret)
      t+=Math.max(d,0);
    double[] s = new double[regret.length];
    if(t==0.0){
      Arrays.fill(s, 1.0/regret.length);
      return s;
    }
    for(int i=0;i<s.length;i++)
      s[i]=Math.max(regret[i], 0)/t;
    return s;
  }
  
  public static double[] getScores(int[] otherPicks){
    double[] d = new double[coins];
    for(int i=0;i<coins;i++){
      for(int j=0;j<coins;j++){
        int count = 0;
        for(int p : otherPicks){
          int c = Integer.compare(Math.abs(p-j),Math.abs(i-j));
          if(c<0){
            count = -1; break;
          }else if(c==0){
            count ++;
          }            
        }        
        if(count>=0)
          d[i]+=1.0/(count+1);
      }
    }
    
    return d;      
  }
  
  public static double getStratValue(double[] evs, double[] strategy){
    double d = 0;
    for(int i=0;i<evs.length;i++)
      d+=evs[i]*strategy[i];
    return d;
  }
  
  public static void updateRegret(double[] evs, double[] regrets, double[] strategy){
    double stratVal = getStratValue(evs, strategy);
    for(int i=0;i<evs.length;i++)
      regrets[i]=Math.max(regrets[i]+evs[i]-stratVal, 0);
  }
  
  public static double[] getPickEvs(double[] strat){
    double[] s = new double[strat.length];
    for(int i=0;i<coins;i++)
      for(int j=0;j<coins;j++){
        double weight = strat[i]*strat[j];
        if(weight>0.0){
          double[] scores = scoreCache[i][j];
          for(int k=0;k<s.length;k++)
            s[k]+=weight*scores[k];
        }
      }
    return s;    
  }
  
  public static void main(String[] args){
    
    for(int i=0;i<coins;i++)
      for(int j=0;j<coins;j++)
        scoreCache[i][j]=getScores(new int[]{i,j});

    double[] regret = new double[coins];
    double[] strat=null, evs=null;
    int iter=0;
    
    double epsilon;
    do{
      strat = getStrategy(regret);
      evs = getPickEvs(strat);
      updateRegret(evs, regret, strat);
      epsilon = Arrays.stream(evs).max().getAsDouble()-coins/3.0;
      if(++iter % 1000 == 0)        
        System.out.println(String.format("%d\t%E", iter, epsilon));
    }while(epsilon>1e-12);

    for(int idx = 0; idx < coins; idx++)
      System.out.println(String.format("%d,%.7f,%.7f", (idx), strat[idx], evs[idx]));

  }
}

Last edited by plexiq; 10-20-2016 at 07:28 AM.