Quote:
Originally Posted by TomCowley
Yeah I just skimmed and assumed it was the usual martingale but it has basically the same property (an arbitrarily large positive payoff that grows as fast as the chance of receiving it decreases which the measure-theory analysis isn't a good model of IMO). In the PD passing you're literally making the value of defecting equal to 0 in the limit, and at least for some class of strategies that can be expressed discretely and in the limit, the relative values seem to be continuous and it's the GTO strategy designation that changes wildly. It's not strange for THAT to happen, but it is kind of odd, at least as far as I've seen, for the GTO VALUE to change that much.
I've been puzzling over your comments for several days trying to clarify my thoughts about what's going on here.
First, I'd like to rename the model I proposed as the Continuous Time Prisoner's Dilemma (CTPD), so it won't be confused with already established models like the one linked to by river_tilt.
I'd like to denote the Iterated Prisoner's Dilemma with a known, fixed number of iterations by IPD(N), where N is the number of iterations.
Throughout this discussion the IPD(N) will always be normalized. The two players will play iterations I(1),...,I(N) of the PD with payoffs for I(k) determined by the PD matrix multiplied by the normalization factor 1/N. A player's decision on I(k+1) may depend on the decisions made by both players on I(1), ... , I(k). In other words, player strategies can be dynamic.
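To make the normalization concrete, here is a minimal Python sketch of IPD(N) as described above. The payoff values T=5, R=3, P=1, S=0 are an assumption (no PD matrix is fixed in this post), and the names play_ipd, always_defect and tit_for_tat are just illustrative. A strategy is a function of the history so far, which is what lets it be dynamic.

```python
from typing import Callable, List, Tuple

# ASSUMED payoffs (not from the post): any T > R > P > S would do.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

History = List[Tuple[str, str]]                # (my move, opponent's move)
Strategy = Callable[[History, int, int], str]  # (history, k, N) -> "C" or "D"

def always_defect(history: History, k: int, n: int) -> str:
    return "D"

def tit_for_tat(history: History, k: int, n: int) -> str:
    # Cooperate on I(1), then copy the opponent's previous move.
    return "C" if not history else history[-1][1]

def play_ipd(a: Strategy, b: Strategy, n: int) -> Tuple[float, float]:
    """Play I(1)..I(N) and return both players' normalized payoffs."""
    hist_a: History = []
    hist_b: History = []
    total_a = total_b = 0
    for k in range(1, n + 1):
        # The decision on I(k) may depend on everything played on I(1)..I(k-1).
        move_a = a(hist_a, k, n)
        move_b = b(hist_b, k, n)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    # Applying 1/N once at the end equals scaling each iteration by 1/N.
    return total_a / n, total_b / n
```

With these assumed payoffs, play_ipd(always_defect, always_defect, n) gives (1.0, 1.0) and play_ipd(tit_for_tat, tit_for_tat, n) gives (3.0, 3.0) for any N, so the normalized totals stay on the scale of a single PD no matter how many iterations are played.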
For the GTO I'm going by this explanation of it in post #1 by The Bryce here:
http://forumserver.twoplustwo.com/94...holdem-245479/
------------------
Game Theory Optimal (GTO): A strategy that yields the highest possible EV (or: “is optimal”) if your opponent always chooses the best possible counter-strategy. In a game of rock-paper-scissors the GTO strategy is to choose randomly from an equal distribution of paper, scissors, and rocks. If you play rock less often than paper, you will have less than ½ equity against an all scissors strategy. Similarly, you must play paper at least as often as you play scissors, and scissors at least as often as you play rock. As a result, you must play paper, scissors, and rocks with equal frequency to guarantee ½ equity against all strategies.
So long as your opponent always chooses the optimal counter-strategy to whatever strategy you choose no strategy on your part can have a higher EV than this.
------------------
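The rock-paper-scissors claim in that definition can be checked directly. The following is a small sketch, assuming equity is scored as win = 1, tie = 1/2, loss = 0 (which is what "½ equity" suggests). The function names and the example mix light_on_rock are made up for illustration.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def equity(mine: str, theirs: str) -> Fraction:
    """Equity of one throw against another: win = 1, tie = 1/2, loss = 0."""
    if mine == theirs:
        return Fraction(1, 2)
    return Fraction(1) if BEATS[mine] == theirs else Fraction(0)

def equity_vs_pure(mix: dict, opp_move: str) -> Fraction:
    """Expected equity of a mixed strategy against one fixed throw."""
    return sum(p * equity(m, opp_move) for m, p in mix.items())

def guaranteed_equity(mix: dict) -> Fraction:
    # Worst case over the three pure counters. Checking pure counters is
    # enough because equity is linear in the opponent's mix.
    return min(equity_vs_pure(mix, o) for o in MOVES)

uniform = {m: Fraction(1, 3) for m in MOVES}
# Hypothetical mix that plays rock less often than paper.
light_on_rock = {"rock": Fraction(1, 5),
                 "paper": Fraction(2, 5),
                 "scissors": Fraction(2, 5)}

print(guaranteed_equity(uniform))                  # 1/2
print(equity_vs_pure(light_on_rock, "scissors"))   # 2/5
```

The uniform mix gets exactly 1/2 against every counter, while the mix that is light on rock drops to 2/5 against all-scissors, matching the quoted explanation.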
Not the easiest concept, but if I understand it, Always-Defect is not GTO for IPD(N) in a general population of opponents. It's non-exploitable, and any other strategy does worse against it than another Always-Defect does, but that doesn't make it GTO. You have to compare its EV against another Always-Defect to the other candidates' EVs against their optimal counter-strategies.
For example, look at Tit-for-Tat as an alternative candidate. While Always-Defect is an exploitive counter-strategy to Tit-for-Tat, it is not the optimal one. The optimal counter-strategy must do at least as well as TFT-Mod, a modified Tit-for-Tat that differs from Tit-for-Tat only by always defecting on I(N). TFT-Mod also exploits Tit-for-Tat, and in doing so gains a much higher EV than Always-Defect gains against another Always-Defect.
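A rough numerical check of the Tit-for-Tat comparison, as a sketch only: the payoffs T=5, R=3, P=1, S=0 and N=10 are assumptions, since the post fixes neither, and the function names are illustrative.

```python
# ASSUMED payoffs (not from the post): any T > R > P > S would do.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def always_defect(history, k, n):
    return "D"

def tit_for_tat(history, k, n):
    # history holds (my move, opponent's move) pairs for I(1)..I(k-1).
    return "C" if not history else history[-1][1]

def tft_mod(history, k, n):
    # Identical to Tit-for-Tat except that it always defects on I(N).
    return "D" if k == n else tit_for_tat(history, k, n)

def play_ipd(a, b, n):
    """Normalized payoffs (1/N times the summed PD payoffs) for a and b."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for k in range(1, n + 1):
        move_a, move_b = a(hist_a, k, n), b(hist_b, k, n)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return total_a / n, total_b / n

N = 10
ad_vs_ad = play_ipd(always_defect, always_defect, N)[0]   # 10/10 = 1.0
ad_vs_tft = play_ipd(always_defect, tit_for_tat, N)[0]    # 14/10 = 1.4
mod_vs_tft = play_ipd(tft_mod, tit_for_tat, N)[0]         # 32/10 = 3.2
```

Under these assumptions, Always-Defect earns 1.4 against Tit-for-Tat, TFT-Mod earns 3.2 against Tit-for-Tat, and Always-Defect earns only 1.0 against another Always-Defect.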
Always-Defect is only GTO in the highly restricted population of opponents who always defect on I(2), ... , I(N). Of course, that population consists of only 2 strategies: Cooperate or Defect on I(1). In other words, it is GTO only when the game has been reduced to a one-off PD.
So where does the idea that Always-Defect is somehow GTO for IPD(N) come from? It comes from the fact that, given adequate freedom for a population of IPD(N) strategies to evolve over multiple runs of playing against each other, there is evolutionary pressure for the population to evolve toward strategies that look progressively more like Always-Defect, starting at I(N) and working backwards through I(N-1), I(N-2), ... , I(1).
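The backwards unraveling can be illustrated with a toy model. This is not an evolutionary simulation: it is a sketch of repeated best responses inside one restricted, hypothetical family of strategies, defect_from(j), which plays Tit-for-Tat before I(j) and defects on I(j), ..., I(N). In that family j = N+1 is plain Tit-for-Tat, j = N is TFT-Mod, and j = 1 is Always-Defect. The payoffs T=5, R=3, P=1, S=0 are again an assumption.

```python
# ASSUMED payoffs (not from the post): any T > R > P > S would do.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def defect_from(j):
    """Hypothetical family: Tit-for-Tat before I(j), defect on I(j)..I(N)."""
    def strategy(history, k, n):
        if k >= j:
            return "D"
        return "C" if not history else history[-1][1]
    return strategy

def play_ipd(a, b, n):
    """Normalized payoffs (1/N times the summed PD payoffs) for a and b."""
    hist_a, hist_b = [], []
    total_a = total_b = 0
    for k in range(1, n + 1):
        move_a, move_b = a(hist_a, k, n), b(hist_b, k, n)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return total_a / n, total_b / n

def best_response_index(j_opp, n):
    """Index j in 1..N+1 whose strategy earns the most against defect_from(j_opp)."""
    return max(range(1, n + 2),
               key=lambda j: play_ipd(defect_from(j), defect_from(j_opp), n)[0])

def unravel(n):
    """Start from Tit-for-Tat and keep switching to the best response."""
    path = [n + 1]
    while True:
        nxt = best_response_index(path[-1], n)
        if nxt == path[-1]:      # fixed point: nothing in the family does better
            return path
        path.append(nxt)

print(unravel(6))  # [7, 6, 5, 4, 3, 2, 1]
```

Each step moves the first defection one iteration earlier, from I(N) back to I(1), and the process only stops at Always-Defect, which is its own best response within this family.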
As this post is getting too long I'll continue in the next post.
PairTheBoard