3 player symmetric zero-sum game with negative payoff at NE - Poker Theory

Two Plus Two Forums Poker Strategy Poker Theory & GTO

3 player symmetric zero-sum game with negative payoff at NE

Post Reply Subscribe

...

10-07-2014 , 08:26 AM

ambientsilence

journeyman

Join Date: Oct 2014 Posts: 391

Here's a small game I came up where there's a negative payoff for one player at equilibrium, despite the game being symmetric:

This game is played by 3 players. Each player secretly chooses a number from the set {1,2,3}. Payoffs are as follows:

If all players choose the same number all payoffs are 0.

If exactly 2 players choose the same number they get -3 and the third player gets 6.

If all players choose different numbers the player choosing the number 3 gets -2 and the other 2 players get 1.

The set of (pure) strategies Player1 chooses 1, Player2 chooses 2, Player3 chooses 3 is a Nash equilibrium, with payoffs of 1,1,-2 It's easy to see that no player can improve their payoff by deviating, making this a Nash Equilibrium. There are other NEs too (any permutation of (1,2,3) actually)

I sort of knew that this was possible but haven't see a specific example before. Hope some of you find this interesting.

Quote

10-08-2014 , 09:11 AM

Clicken

journeyman

Join Date: Aug 2010 Posts: 243

It seems to me that what's of most interest here, is that although this might be a Nash Equilibrium, it's arguably not one that could ever arise between rational agents.

Specifically, the expectation of the pure strategy "choose 3" is negative, assuming the other players' strategies are random. No rational agent - prior to knowing the other players' strategies - would thus ever select such a strategy to begin with.

Quote

10-08-2014 , 04:41 PM

ambientsilence

journeyman

Join Date: Oct 2014 Posts: 391

But against players who randomly and independently choose between 1 and 2 3 would have an EV of +2. Which suggests that 3 should be assigned a positive probability to start with at least in a non-repeated game (or a memory-less game).

Quote

10-08-2014 , 05:39 PM

Clicken

journeyman

Join Date: Aug 2010 Posts: 243

Quote:

Originally Posted by ambientsilence

Restricting player 1 and player 2 to a choice between "1" and "2" (and not also "3") would be an arbitrary constraint which would result in a numerically distinct game from the one you previously defined. You did for instance stipulate that:

Quote:

Each player secretly chooses a number from the set {1,2,3}.

In such a game the expectation of choosing "3" is roughly -0.44. Clearly - as this is a zero sum game - any strategy with a negative expectation is sub-optimal, and would never be chosen by a rational agent.

Quote

10-08-2014 , 05:53 PM

ambientsilence

journeyman

Join Date: Oct 2014 Posts: 391

Sure, it doesn't make sense against completely random agents. I agree with that. I'm saying it might be considered against other rational agents.

Quote

10-11-2014 , 01:07 PM

AaronBrown

Pooh-Bah

Join Date: May 2005 Posts: 4,240

This is an interesting example. It's also interesting to think about how it would actually be played. Assume the players are isolated, learn everyone's picks after each round and there are a large number of rounds.

Looking at the outcome, it's clear that the only fair one is for all players to pick the same number. People are very good at coordinating in these kinds of situations, the obvious way is for everyone to pick one.

If everyone picks one, the game might continue on that way forever. But say on the first round on player picks 2, either out of greed or mistake. Now the other two players want to punish him, which they can do by picking 1 and 2. At least one of the vengeance seekers will be player 1 or 2, so he will pick his player number, and if the other player is 3, he will pick the other of 1 or 2. It might take a few turns to coordinate this, but once it's in place, they can punish forever.

This is such a terrible fate for a defector, that no one should rationally defect. However, it could happen. The defector's likely strategy is to pick on one of the punishers by always picking the same 1 or 2, even though this costs him more than picking 3. Eventually the player suffering payback revenge will decide to save some money by picking 3. If the first defector has peaceable intentions, he would have started by picking 2; once the other player picking 2 goes to 3; the original defector signals his desire for peace by changing his pick to 1. Everyone switches to 1 and Pax Unum reigns.

Anyway, there's lots of scope for tit-for-tat and misunderstanding in this game. It would be an interesting subject of an evolutionary tournament.

Quote

10-11-2014 , 01:57 PM

Bob148

Carpal \'Tunnel

Join Date: May 2012 Posts: 11,972

If we allow the order of turns to be randomly selected, does this even out the payoffs?

Quote

10-11-2014 , 03:55 PM

pasita

adept

Join Date: Mar 2007 Posts: 1,101

Quote:

The defector's likely strategy is to pick on one of the punishers by always picking the same 1 or 2, even though this costs him more than picking 3.

Why would that be?
1st round: defector picks 2: D=+6
2nd round: defector anticipates revenge, picks 3: D=6-2==4
3rd round: defector signals peace, picks 1. WORST case (if others stick to 1,2): D=4-3==1
So the original defector is up 1 (worst case) after three rounds. Unless, of course, the others come up with a way of preventing this.

Is there an analytical, rather than anecdotal way of solving a game like this?

Bob: if the actions are taken in turn (with randomly selected ordering), then each guy just picks his number, it's even, fair and NE. Also somewhat boring after a few rounds, I'd quess.

Quote

10-11-2014 , 09:18 PM

AaronBrown

Pooh-Bah

Join Date: May 2005 Posts: 4,240

There are two interesting ways to analyze actual game behavior. One is to run tournaments among competing strategies. The other is to study how actual people play.

Quote

10-12-2014 , 02:57 PM

#10

StMisbehavin

centurion

Join Date: Sep 2010 Posts: 151

If anyone cares, the symmetric Nash equilibrium is at approximately (0.31274,0.31274,0.37452). In other words, if two players play this strategy, the third one can't improve over playing it as well.

Quote

10-13-2014 , 12:30 AM

#11

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by StMisbehavin

I'm not getting that as a Nash Equilibrium. If two players play that strategy I am getting EV's for the 3rd player's pure strategies being
1 = +0.373
2 = +0.373
3 = -0.623

I am getting (0.375, 0.375, 0.25) as a NE.

Last edited by bobf; 10-13-2014 at 12:36 AM.

Quote

10-13-2014 , 01:17 AM

#12

bobf

grinder

Join Date: Feb 2008 Posts: 635

FWIW... I ran this through a regret minimization algorithm, starting with random strategies for each player. After 1000 trials it converged each time to each player playing a different pure strategy from the other players... i.e. it converged to (1,2,3) or (1,3,2) or (3,1,2) etc.

Quote

10-13-2014 , 08:20 AM

#13

Clicken

journeyman

Join Date: Aug 2010 Posts: 243

Quote:

Originally Posted by bobf

Could you clarify this please?

Are you saying that for any sequence of trials, every 1000+N trial had the same player choosing "3" and getting a -2 payoff?

If so, it seems odd to me that the best that player could do is get locked into a negative pay-off for all eternity.

Quote

10-13-2014 , 10:32 AM

#14

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by Clicken

Could you clarify this please?

I run a regret minimization algorithm. In a single run of the algorithm each player's strategy converges to something after many iterations. I am talking about what it converges to, not what each iteration is.

In general, it could converge to some mixed or pure strategy for each player. In this case, in each of 1000 runs, the algorithm converged to each player playing a pure strategy.

Quote:

Are you saying that for any sequence of trials, every 1000+N trial had the same player choosing "3" and getting a -2 payoff?

After some large number of iterations, one of the players is picking "3" and getting a -2 payoff and it won't ever change after that. I tried this 1000 times and each time the same result occurred, but with a different player (random) ending up being the one to pick "3".

For example, here might be the result

Run 1
Player 1 converges to picking 3 always
Player 2 converges to picking 1 always
Player 3 converges to picking 2 always

Run 2
Player 1 converges to picking 1 always
Player 2 converges to picking 3 always
Player 3 converges to picking 2 always

Run 3
Player 1 converges to picking 3 always
Player 2 converges to picking 2 always
Player 3 converges to picking 1 always

etc.

Quote:

If so, it seems odd to me that the best that player could do is get locked into a negative pay-off for all eternity.

Regret minimization approaches "the best you could do" in hindsight against the strategy your opponents played. But you might do better by choosing a strategy that coerces your opponent's into changing their strategy.

Last edited by bobf; 10-13-2014 at 10:46 AM.

Quote

10-13-2014 , 12:58 PM

#15

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

I'm glad you clarified at the end of your post bobf. I wouldn't expect CFRM to be that useful or practical here since no player can regret as soon as the pure strategies have been assigned and the agents become aware of their regret values through trials. It is interesting how the forced loser is determined randomly or how agents find a way to place themselves in the forced loss category to begin with. How many trials did it take for the pure strategies to emerge?

I think Aaron's incites are much more interesting for this particular game structure.

Quote

10-13-2014 , 01:42 PM

#16

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by just_grindin

I just did a single run where I am displaying strategies on each iteration.

After 3 or 4 iterations I could tell which player is going to be what.
After 100 iterations - each player was above 99% pure.
After 2600 iterations - each player was pure to 3 decimal places.

Using CFR+ it takes 72 iterations to become pure to 3 decimal places.

Quote:

It is interesting how the forced loser is determined randomly or how agents find a way to place themselves in the forced loss category to begin with. How many trials did it take for the pure strategies to emerge?

It looks like whichever player has the most advantage playing a particular number relative to the other players ends up converging to that pure strategy. Like if you eyeball the EV of each pure strategy for each player based on the initial random strategies, you can make a good guess who is going to end up what.

Quote:

I think Aaron's incites are much more interesting for this particular game structure.

Yeah, I just wanted to see what would happen, because, for one, I don't understand why CFR is not guaranteed to converge for mutliplayer (>2) games. Wanted to see if it would work on this game.

Quote

10-13-2014 , 01:55 PM

#17

Clicken

journeyman

Join Date: Aug 2010 Posts: 243

Thank you for the clarification!

I've only an inkling as to how exactly a regret minimization algorithm might work, although this has been bolstered by your explanation. Given these results however, it seems to me that it is maximising the payoff (or minimising the loss) for the next iteration of the game. If it's known (or probable enough) that for the next iteration the opponents' strategies are (1,2) then clearly, strategy (3) is optimal for that iteration, even if the pay-off is negative.

Quote:

Originally Posted by bobf

But you might do better by choosing a strategy that coerces your opponent's into changing their strategy.

This remark appears to reflect some thoughts I've been having about the game / NE in this thread, and about NEs in general.

Specifically, when

the number of future iterations is unknown or at least some large enough number

and, opponents' strategies are not fixed - for they are rational agents capable of adjusting.

then the optimal "plan" to maximise pay-offs over all future iterations, will sometimes be to play a strategy which minimises the pay-off for the next iteration.

The idea of a rational agent actually playing this game, and getting locked into a negative pay-off for every single iteration after a certain point seems absurd after all.

Quote

10-14-2014 , 03:07 AM

#18

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by Clicken

CFR is looking only at the past in order to determine it's current strategy.

Here is roughly how CFR works:

A CFR agent playing this game has a cumulative regret value for each action. Positive regret means "I wish I had played this action more". Negative regret means "I wish I had played this less".

Current strategy of an agent is a probability for each action proportional to cumulative regret. Negative regret gets 0 probability.

On each trial, a CFR agent compares two values:
1. The payoff for each action given the actions of opponents
2. The EV of the current strategy given the actions of opponents

The cumulative regret values for each action are updated by adding [1] - [2]

What regret minimization guarantees is that after some large number of trials, an agent, looking back at the history of opponent actions and holding those constant, will see that no other strategy would have done better (by much).

Quote:

Yes, I think that is true. We might want to take a negative payoff now to gain in the future. But I think this assumes we can somehow model the opponent.

Quote

Post Reply Subscribe

...