November and December NC/LC Thread - Page 5 - Medium Stakes Poker Forum

We are looking for a GTO solution if we are assuming our opponent will always adjust optimally. Keep in mind GTO will not be as profitable as an exploitative line vs. someone who does not adjust correctly.

How do either of us determine what the other is doing? On Throw2? On Throw100?

Regardless, how do you open?

12-04-2014 , 05:54 PM

#102

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

Also keep in mind that just because he has restrictions does not mean he can't play GTO within the confines of the new rules. We are not trying to exploit him, we are trying to determine the GTO strategy for a new game. We will win in the long run not because we are playing better but because the rules of the game favor us.

12-04-2014 , 05:56 PM

#103

jdr0317

Carpal \'Tunnel

Join Date: Jan 2012 Posts: 8,231

Quote:

Originally Posted by CrazyLond

I'm not saying we can't adjust to exploit his counterstrategy but as soon as we do it opens us to getting exploited by a new counterstrategy (the 40/60 JL suggested would be the correct new counterstrategy vs. the 1/0/0 you mentioned). The solution has to be the best strategy where he can't adjust to improve his odds.

We are looking for a GTO solution if we are assuming our opponent will always adjust optimally. Keep in mind GTO will not be as profitable as an exploitative line vs. someone who does not adjust correctly.

Yes, but players will continue to adjust to one another until they converge on a Nash equilibrium. As long as one player can profitably deviate to exploit a weakness in another player's strategy, NE is not reached. I didn't say that player 2's counter-strategy to player 1's strategy is not exploitable (as playing [1, 0, 0] clearly is exploitable), but that player 1 formed a strategy in response to player 2 ([1, 0, 0] 40%, [0, 1/3, 2/3] 60%), and that player 2 was able to form yet another counter-strategy to exploit player 1's attempted adjustment.

12-04-2014 , 06:26 PM

#104

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

Ok unrestricted player goes 1/3 rock 2/3 paper as I suggested before.

Restricted guy goes 40% rock, 26.66667% paper, 33.33333% scissors.

I think that's the Nash Equilibrium if I understand the concept correctly.

12-04-2014 , 07:16 PM

#105

jdr0317

Carpal \'Tunnel

Join Date: Jan 2012 Posts: 8,231

Quote:

Originally Posted by CrazyLond

Unrestricted strategy of #2: [0, 4/9, 5/9]

P(1 wins) = 1/3 * 1/3 + 2/3 * 2/5 = 1/9 + 4/15 = 17/45
P(2 wins) = 4/15 * 1/3 + 1/3 * 2/3 = 14/45

EV(1) = 1/15

If I go [0, 1/3, 2/3] unrestricted, or [2/5, 1/5, 2/5] overall

P(1 wins) = 1/3 * 2/5 + 2/3 * 2/5 = 2/15 + 4/15 = 6/15
P(2 wins) = 1/5 * 1/3 + 2/5 * 2/3 = 1/15 + 4/15 = 5/15

So they are equivalent. Which means both players seem to have found stability. So, I'd accept this as an answer (and am now retesting my code to see if it works as expected, since I missed by quite a bit apparently).
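
For anyone who wants to check that arithmetic, here is a minimal Python sketch using exact fractions. It assumes the numbering used in the post above (player 1 is unrestricted at [1/3, 2/3, 0], player 2 is forced to throw rock 40% of the time). The helper names win_probs and overall are made up for illustration and are not from the simulation mentioned above.

```python
from fractions import Fraction as F

# Strategies are (rock, paper, scissors) probabilities.

def win_probs(a, b):
    """Return (P(a wins), P(b wins)) for a single throw."""
    # rock beats scissors, paper beats rock, scissors beats paper
    a_wins = a[0] * b[2] + a[1] * b[0] + a[2] * b[1]
    b_wins = b[0] * a[2] + b[1] * a[0] + b[2] * a[1]
    return a_wins, b_wins

def overall(free_mix, forced_rock=F(2, 5)):
    """Blend the forced-rock share (assumed 40%) with the mix used on free throws."""
    free = 1 - forced_rock
    return (forced_rock + free * free_mix[0], free * free_mix[1], free * free_mix[2])

player1 = (F(1, 3), F(2, 3), F(0))  # unrestricted player, per the post's numbering

results = []
for free_mix in [(F(0), F(4, 9), F(5, 9)), (F(0), F(1, 3), F(2, 3))]:
    player2 = overall(free_mix)
    p1_wins, p2_wins = win_probs(player1, player2)
    results.append((player2, p1_wins, p2_wins, p1_wins - p2_wins))
    print(player2, p1_wins, p2_wins, p1_wins - p2_wins)
```

Both free-throw mixes should leave player 1 with the same edge, which is the indifference the post is pointing at.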

12-04-2014 , 08:41 PM

#106

thesilverbail

old hand

Join Date: Aug 2009 Posts: 1,659

We will guess at the form of the answer and then verify that it is actually correct. For player 1 (P1), since playing rock 40% is already "more than optimal", we will assume that when he is allowed to choose he will choose R 0% of the time. Therefore his strategy reduces to choosing some mixture of P and S when he is allowed to choose. Let the probability of P and S in his optimal strategy then be X and 0.6-X respectively.

For Player 2 we will make another intuitive leap and set S=0. The justification for this is to remember that since Player 1 is playing R at least 40%, any value we gain from playing S would have to come from player 1 playing P > 40% of the time, in which case player 1 plays S < 20% of the time. But then Player 2 can switch to playing 100% P and improve unilaterally. Therefore playing S for player 2 at all is a dominated option.

Thus player 2's strategy can be represented as playing Y percentage of P and 1-Y percentage of R.

Now we can easily set up indifference equations. For player 1, he must make player 2 indifferent between choosing 100% R and 100% P:

X-(0.6-X) = -0.4+(0.6-X)
X = 0.8/3

For player 2, he must make player 1 indifferent between choosing 60% P and 60% S:
(1-Y)0.6 + Y(0.4-0.6) = -(1-Y)(0.6) + Y(0.4)
Y = 2/3

And we are done, the equilibrium strategies (R,P,S) are:
player 1: (0.4, 0.8/3, 1/3)
player 2: (1/3, 2/3, 0)

(Note that this is not a proof yet, since we had to guess the form of the solution first. What we would need to do to complete the proof is to verify that there is no better counter-strategy for either player. You should verify this, but it is tedious)
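
The tedious check can be sketched in a few lines of Python. Since each player's EV is linear in his own mix, it should be enough to test pure deviations only: one fixed throw for player 2, and one fixed throw for player 1's free 60%. The payoff matrix A and the helper ev are illustrative names, and the 40% forced rock is taken from the game as described in this thread.

```python
from fractions import Fraction as F

# A[i][j]: payoff to the row thrower for throw i vs throw j (0=R, 1=P, 2=S).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def ev(row, col):
    """Expected payoff to the row player for mixed strategies row and col."""
    return sum(row[i] * col[j] * A[i][j] for i in range(3) for j in range(3))

FORCED_ROCK = F(2, 5)                 # assumption: player 1 must throw rock 40%
p1 = (F(2, 5), F(4, 15), F(1, 3))     # candidate for the restricted player
p2 = (F(1, 3), F(2, 3), F(0))         # candidate for the unrestricted player

value = ev(p1, p2)                    # player 1's EV at the candidate pair

# Player 1's pure deviations: forced rock plus one throw for the free share.
p1_devs = []
for k in range(3):
    mix = [FORCED_ROCK, F(0), F(0)]
    mix[k] += 1 - FORCED_ROCK
    p1_devs.append(tuple(mix))

# Player 2's pure deviations: always rock, always paper, always scissors.
p2_devs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

p1_ok = all(ev(d, p2) <= value for d in p1_devs)   # player 1 cannot gain
p2_ok = all(ev(p1, d) >= value for d in p2_devs)   # player 2 cannot push player 1 lower

print(value, p1_ok, p2_ok)
```

If both flags come out True, neither side has a profitable unilateral deviation from the candidate pair.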

12-04-2014 , 08:50 PM

#107

thesilverbail

old hand

Join Date: Aug 2009 Posts: 1,659

Aah, I see CrazyLond got there first. Well done!

jdr, about why you seem to get the wrong answer, are you sure you implemented the correct iteration? Finding NEs by simulation is a little tricky, and many "obvious" ways of doing it don't converge or converge to the wrong answer. The best method in this case, I think, would be to use fictitious play:

http://people.csail.mit.edu/costis/6896sp10/lec3.pdf
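
A rough sketch of what fictitious play could look like for this game, in Python. This is not jdr's code and is not taken from the linked notes. It assumes simultaneous updates, arbitrary opening throws (both rock), and first-index tie-breaking. Each round, each player best-responds to the other's empirical frequencies so far. The restricted player only picks the throw for his free 60%, so his payoffs are folded into a matrix B.

```python
def fictitious_play(rounds=50_000, forced_rock=0.4):
    """Return (restricted player's overall mix, unrestricted player's mix)."""
    # A[i][j]: payoff to the row thrower for throw i vs throw j (0=R, 1=P, 2=S).
    A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    free = 1 - forced_rock
    # B[k][j]: restricted player's EV when his free throws are all k and
    # the opponent throws j (forced rock share is an assumption: 40%).
    B = [[forced_rock * A[0][j] + free * A[k][j] for j in range(3)]
         for k in range(3)]

    counts1 = [1, 0, 0]   # restricted player's free-throw history (opens with rock)
    counts2 = [1, 0, 0]   # unrestricted player's history (opens with rock)
    for _ in range(rounds):
        # Cumulative payoff of each pure choice against the opponent's history.
        u1 = [sum(B[k][j] * counts2[j] for j in range(3)) for k in range(3)]
        u2 = [-sum(B[k][j] * counts1[k] for k in range(3)) for j in range(3)]
        k = max(range(3), key=lambda i: u1[i])   # best response, first index on ties
        j = max(range(3), key=lambda i: u2[i])
        counts1[k] += 1
        counts2[j] += 1

    n = rounds + 1
    free_mix = [c / n for c in counts1]
    overall = [forced_rock * (i == 0) + free * free_mix[i] for i in range(3)]
    p2 = [c / n for c in counts2]
    return overall, p2

overall, p2 = fictitious_play()
print(overall, p2)
```

Convergence of the empirical frequencies is slow, so expect them to land only roughly near the equilibrium mixes even after tens of thousands of rounds.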

12-04-2014 , 11:24 PM

#108

callipygian

slowrolled by tpiranha!

Join Date: Mar 2009 Posts: 19,880

Finding NEs by simulation is a great way to confuse global extrema with NEs.

Note that for P1's optimal play - {0.4, 0.8/3, 1/3} - the EV is the same whether P2 plays {1, 0, 0} or {0, 1, 0} or {1/3, 2/3, 0}. But only the latter is the Nash Equilibrium.
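
To see the difference in numbers, here is a small Python sketch (helper names are made up for illustration). It computes P2's EV against P1's equilibrium mix for each of the three P2 strategies, and then what is left of that EV once P1 re-optimizes his free 60% against each one, assuming the 40% forced rock from this thread.

```python
from fractions import Fraction as F

# A[i][j]: payoff to the row thrower for throw i vs throw j (0=R, 1=P, 2=S).
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def ev(row, col):
    """Expected payoff to the row player for mixed strategies row and col."""
    return sum(row[i] * col[j] * A[i][j] for i in range(3) for j in range(3))

def p1_options(forced_rock=F(2, 5)):
    """P1's pure choices: forced rock (assumed 40%) plus one throw for the rest."""
    opts = []
    for k in range(3):
        mix = [forced_rock, F(0), F(0)]
        mix[k] += 1 - forced_rock
        opts.append(tuple(mix))
    return opts

p1_eq = (F(2, 5), F(4, 15), F(1, 3))
candidates = [(F(1), F(0), F(0)), (F(0), F(1), F(0)), (F(1, 3), F(2, 3), F(0))]

report = []
for p2 in candidates:
    p2_ev_now = ev(p2, p1_eq)                               # vs P1's equilibrium mix
    p2_ev_after = min(ev(p2, opt) for opt in p1_options())  # after P1 best-responds
    report.append((p2_ev_now, p2_ev_after))
    print(p2, p2_ev_now, p2_ev_after)
```

Only the strategy whose EV survives P1's best response is part of an equilibrium. The pure strategies hold their EV only as long as P1 does not adjust.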

Thesilverbail's approach is correct, and highlights both the non-obviousness of Nash Equilibria and the woeful misuse of "GTO."

12-04-2014 , 11:38 PM

#109

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

how was GTO misused? Maybe I misunderstand the concept

12-05-2014 , 05:52 PM

#110

jdr0317

Carpal \'Tunnel

Join Date: Jan 2012 Posts: 8,231

Quote:

Originally Posted by thesilverbail

I am almost positive I made an error in my simulation: namely that the best counter to [R, P, S] would be [P, S, R]. Also may be a simple convergence issue (by nature of convergence rules, scissors probably didn't get suppressed enough).

12-05-2014 , 07:17 PM

#111

thesilverbail

old hand

Join Date: Aug 2009 Posts: 1,659

Quote:

Originally Posted by CrazyLond

how was GTO misused? Maybe I misunderstand the concept

I think Cally is talking about how GTO in Poker is mistaken for just being "balanced" when it is a lot more complicated than that. e.g. many of the intuitive guesses people had upthread for this game seem reasonable at first based on balance and symmetry but in the end were wrong.

I see a lot of threads where people seem to be trying to use some simplified notion of GTO almost as an excuse for not wanting to think too hard about the opponent's range. This is wrong-headed for 2 reasons:

1. In a certain sense the GTO strategy is indifferent to opponent's range. But the process of finding out what the GTO strategy is depends vitally on the opponent's range. In fact there is no such thing as a GTO strategy. Nash equilibrium strategies come in pairs.

2. Because of the complications and non-obvious behavior in GTO, I feel more and more like there is not as much value in trying to find it, but a better use of our time would be to focus on inferring how our opponents play and what the best counter-strategy is (which is itself often non-obvious). If you look at the top posters like Jon/OTR etc that is what they are most concerned about.

12-05-2014 , 09:03 PM

#112

callipygian

slowrolled by tpiranha!

Join Date: Mar 2009 Posts: 19,880

What thesilverbail said, basically.

Nash Equilibria are useful in the sense that they tell you which side of EV mountain they're on, but unless both players are super skilled and have perfect memories and play literally a billion hands against each other, you're not going to find the summit.

GTO means, "play as best as you can given the information about your opponent." If your opponent is imperfect, GTO means Not Nash Equilibrium.

12-05-2014 , 11:00 PM

#113

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

I've always thought of GTO as unexploitable. You seem to be describing the theoretical 'nemesis' that I read about in The Intelligent Poker Player

12-06-2014 , 03:33 AM

#114

callipygian

slowrolled by tpiranha!

Join Date: Mar 2009 Posts: 19,880

Quote:

Originally Posted by CrazyLond

I've always thought of GTO as unexploitable.

It is, in a sense. You're basically saying that when you play perfectly, you win the most. It's a meaningless statement.

Unexploitable implies if there were an exploit, your opponent would find it. That is, you're assuming you play an infinitely skilled opponent.

Nash Equilibria are what two infinitely skilled opponents would do to each other. Nash Equilibria do not guarantee a positive win rate - a "GTO" HUHU match will result in both players losing half the rake, for instance. Nash Equilibria also do not guarantee you win the most - the NE solution is the "maximum minimum" solution: no matter what your opponent does, you cannot win less (or lose more) than the NE EV. That isn't to say that you cannot win more by changing your play.

Take for example a game where people just call with every hand on every board. If you play NE (bluff > 0%), you will win more than the minimum (which assumes opponents are perfect), but less than exploitative (bluff = 0%) play.

So when you want to play "optimally," you take into account your opponents' mistakes, and adjust away from the NE, because the NE only makes sense in the context of a perfect opponent. Optimal play is not necessarily Nash Equilibrium play. You can make exploitable deviations from NE if your opponents do not exploit them.

Play millions of hands online against the same 15 people? Put some effort into finding NE. Play 200 hours a year of 20/40 live with a 200 person player pool? You're not going to reach NE ever.

12-06-2014 , 03:59 AM

#115

Jon_locke

Pooh-Bah

Join Date: May 2012 Posts: 4,180

hilarious conversation today at the tables. First see this post from few months back. http://forumserver.twoplustwo.com/sh...6&postcount=50

Skip ahead today where we are playing same game and somebody talks about having quads. Joboy says, I had quads once, I didn't win, immediately another player says yea but you were really lucky he was all in. I crack up.

12-09-2014 , 04:11 PM

#116

boc4life

Carpal \'Tunnel

Join Date: Jun 2005 Posts: 6,910

You guys sure care a lot about trying to break even

12-09-2014 , 05:17 PM

#117

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

GTO in regular RPS is breaking even but GTO is not breaking even in the modified RPS, nor in poker (unless your opponent is also playing GTO).

12-09-2014 , 05:25 PM

#118

CrazyLond

veteran

Join Date: May 2007 Posts: 2,931

I think that in a way, playing poker in an exploitative way essentially changes the rules of the game so that optimal play involves adjusting to the new rules and then making the GTO play within that context.

For example if my opponent calls too much, my optimal bluffing frequency decreases, but there is still an optimal bluffing frequency. So in a way, we are playing a different game at each decision point and trying to find the GTO decision for that new game.

12-09-2014 , 11:26 PM

#119

AlanBostick

Carpal \'Tunnel

Join Date: Sep 2002 Posts: 11,485

Umm, people should remember what the O in "GTO" stands for.

12-09-2014 , 11:48 PM

#120

DougL

Too helpful for this post

Join Date: Sep 2002 Posts: 21,810

Yeah, all those strong mathy concepts (say some of the existence theorem thingys) might just go away when we redefine GTO from assigning context to the game saying "we're here now, let's define a new game of this and find GTO". Maybe someone has a name for that sort of "doing the best from here on out" thing so we don't have to re-use GTO?

Quote:

what if the ranges that got us here make it impossible to be balanced when optimizing on future streets?

12-10-2014 , 01:02 AM

#121

OnTheRail15

Pooh-Bah

Join Date: Dec 2005 Posts: 4,339

Because when you guys say gto you mean Nash equilibrium or (more often) "balanced" play. Afaik those aren't the same.

12-10-2014 , 01:08 AM

#122

callipygian

slowrolled by tpiranha!

Join Date: Mar 2009 Posts: 19,880

Quote:

Originally Posted by OnTheRail15

Because when you guys say gto you mean Nash equilibrium or (more often) "balanced" play. Afaik those aren't the same.

Nash Equilibrium is a very specific subset of balanced play. "GTO" is a meaningless phrase, so it can mean whatever people want it to mean, and it often does.