A Little Tweak Could Cause Big GTO Ulcers - Poker Theory




11-24-2022 , 03:44 AM

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

There are lots of variations of a thought I recently had that is both interesting theoretically and could be the makings of an actual game.

Say two near-perfect GTO computers were playing heads-up holdem, 1-2 blinds, 200 chip freezeout, except that Computer A always has the button and presumably a nice edge. The twist is that Computer B secretly gets to know the "bottom" two cards of the deck before there is any action, cards that he knows will be out of play. It now recalculates its strategy with that knowledge. But it doesn't calculate the GTO strategy of the two players using the new 50 card deck. Instead it calculates rather the exploitive strategy given the makeup of the deck along with the assumption that the unwitting button computer is playing the GTO strategy for a normal deck. Are you with me so far?

One interesting question would be "who has the edge?" I'm pretty sure that it's Computer B in the blind. And I also am quite sure that his altered strategies, depending on the cards he sees, are not that difficult for supercomputers to deduce.

But the more interesting questions arise if Computer A does know that Computer B has seen two out of play cards. How does he come up with a GTO strategy without knowing what those cards are? And what makes this question even more difficult is that if it is known by both that B sees two cards, then B can no longer use the simplistic technique of using the optimum exploitive strategy against the "original GTO" that A would use if he was unaware of B's "two card xray vision". And B would have to take THAT into account.

So let's see someone come up with the two GTO strategies for this simple game with this simple tweak.


11-24-2022 , 06:10 AM

Yadulla

newbie

Join Date: Nov 2022 Posts: 40

Quote:

Originally Posted by David Sklansky

it doesn't calculate the GTO strategy of the two players using the new 50 card deck. Instead it calculates rather the exploitive strategy given the makeup of the deck along with the assumption that the unwitting button computer is playing the GTO strategy for a normal deck.

There is no such thing as one ideal exploitative strategy. The correct statement was “…Instead it calculates the GTO strategy for the 50 card deck along with the assumption that the unwitting button computer is playing the GTO strategy for a normal deck.”


11-24-2022 , 10:12 AM

Haizemberg93

Carpal 'Tunnel

Join Date: Sep 2016 Posts: 7,247

In the first version: Even good human players could make good approximations for BB's strategy. There are a lot of indifferent spots in NL and if you know 2 dead cards all of those become clear decisions. I don't think this would provide enough edge for BB to start winning, but I'm not sure.

Calculation for the second version is much more complicated because you have to sum over all of the possible combinations of 2 dead cards. For example, let's say the river brings a 3rd heart on the board, BB bets pot, and BTN has some weak bluff catcher. If BB saw a heart among the dead cards he would bluff more, and vice versa. From the button's perspective he doesn't know what the dead cards are, so in order to calculate whether or not he has a +EV call he has to sum BB's bluff frequency over 1225 combinations of 2 dead cards. My guess is that the GTO solution for this would be that BB bluffs 33% of the time after you sum over all possible combinations, but he will over- or under-bluff depending on the dead cards. Button does not know what the dead cards are, so he can't possibly exploit this.

Even this game is better for BB than regular HU.
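The "sum over all dead card combinations" step can be sketched in a few lines. This is only an illustration: the button's hand and the three bluff frequencies are made-up placeholders, board cards and BB's own hole cards are ignored, and every dead pair is weighted equally. The 1225 is simply every pair of cards outside the button's two hole cards.

```python
from itertools import combinations
from math import comb

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

# Hypothetical button holding; any two cards give the same count.
button_hand = {"Ah", "Kd"}
unseen = [card for card in DECK if card not in button_hand]

# Every pair of unseen cards is a candidate for the two dead cards.
dead_pairs = list(combinations(unseen, 2))
print(len(dead_pairs), comb(50, 2))  # 1225 1225


def bb_bluff_freq(dead_pair):
    """Made-up BB rule: bluff more often the more hearts are dead."""
    dead_hearts = sum(card[1] == "h" for card in dead_pair)
    return {0: 0.30, 1: 0.36, 2: 0.42}[dead_hearts]


# The button cannot see the dead cards, so what he faces is the average.
average_bluff_freq = sum(bb_bluff_freq(p) for p in dead_pairs) / len(dead_pairs)
print(round(average_bluff_freq, 3))
```

In a real solve the per-combination frequencies would come out of the equilibrium itself rather than a lookup table; the point here is only that the button responds to the aggregate.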


11-24-2022 , 04:09 PM

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Quote:

Originally Posted by Yadulla

If your opponent is playing the GTO strategy for a 52 card deck and you are aware of that strategy, then you could theoretically devise a perfect counter strategy if the actual deck is missing two cards that you know. Whether the name of that counter strategy is "exploitive" or "GTO" isn't something that concerns me


11-24-2022 , 04:20 PM

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Quote:

Originally Posted by Haizemberg93

Actually, I would venture to say that the version where the button is aware that the other guy knows two cards would be a BIGGER edge for the big blind even though it logically can't be if both players are perfect. (Do you see why?) But in real life I would suspect that most humans would be sufficiently flustered by their lack of knowledge of the two cards that it would hurt their game more than if they were kept in the dark.

Meanwhile notice that this wrinkle can be applied in a myriad of ways to virtually any poker game, especially online. And, as stated earlier, I think the GTO strategy, especially for the player without the info, might be beyond present day computers. I just want my 1%.


11-24-2022 , 05:57 PM

Yadulla

newbie

Join Date: Nov 2022 Posts: 40

Quote:

Originally Posted by David Sklansky

I… errrm… hate to be anal… but... you are wrong again lol. You didn't account for the fact that the opponent will likely adjust his strategy depending on what he sees us do.


11-24-2022 , 06:49 PM

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Quote:

Originally Posted by Yadulla

I… errrm… hate to be anal… but... you are wrong again lol. You didn't account for the fact that the opponent will likely adjust his strategy depending on what he sees us do.

What? Do you realize that the version you are discussing is the one where he doesn't know you saw two cards? His adjustments to what we do will be whatever 52 card GTO tells him. If he strays from that because of odd ways we played previous hands, it would no longer fit the conditions of the OP.

In any case I don't want to stray from the main aspects of my post. How to come up with the button's GTO strategy and what type of games besides heads up holdem would be amenable to showing some out of play cards to some but not all of the players.


11-24-2022 , 08:15 PM

Yadulla

newbie

Join Date: Nov 2022 Posts: 40

Quote:

Originally Posted by David Sklansky

Quote:

Originally Posted by David Sklansky

In any case I don't want to stray from the main aspects of my post. How to come up with the button's GTO strategy and what type of games besides heads up holdem would be amenable to showing some out of play cards to some but not all of the players.

To come up with the GTO strategy for the button during the second scenario, do we even need to consider the likelihood of the different cards being seen? We’ll never see the extra cards anyway, so do they really matter to our strategy? Can’t we just feed the bb’s strategy in the first scenario into a normal solver and let it tell us the counter strategy?

I obviously understand the winrate the solver will give us will be wrong, but will that be the only mistake the solver makes?

… I like your idea of negating the advantage gained by position with an extra element, like being able to see some hidden cards. I don’t see why it wouldn’t work in most versions of Poker. However, I think it’s more important nowadays to create versions of the game that negate a player’s ability to use GTO, which means the game would need elements that constantly change. I came up with a version which requires a second deck of cards: 1 card from this deck is dealt face up at the start of each hand, and on the card is written a handicap for the table, such as “Only pot size bets allowed”. I think these handicaps are a very effective way of obliterating the hold GTO has over the tables whilst not actually introducing any new elements to further complicate the game. These handicaps actually make Poker simpler, but the interchangeable nature of them will make the GTO strategy impossibly complicated.

Last edited by Yadulla; 11-24-2022 at 08:21 PM.


11-25-2022 , 01:03 AM

tombos21

veteran

Join Date: Sep 2018 Posts: 2,045

Quote:

Originally Posted by David Sklansky

Fun question!

This reminds me of the theory of collusion. Imagine two players are sitting next to each other and know each other's hole cards. Even if one player folds, the other knows certain cards are removed and can use that information to their advantage.

Whether or not that card removal information is enough to negate the positional edge is another question. Jesolver can solve with certain cards removed from the deck - although it assumes both players know the card removal. It wouldn't be too hard to craft a strategy that max exploits A in this scenario.

Quote:

Originally Posted by David Sklansky

But the more interesting questions arise if Computer A does know that Computer B has seen two out of play cards. How does he come up with a GTO strategy without knowing what those cards are? And what makes this question even more difficult is that if it is known by both that B sees two cards, then B can no longer use the simplistic technique of using the optimum exploitive strategy against the "original GTO" that A would use if he was unaware of B's "two card xray vision". And B would have to take THAT into account.

So let's see someone come up with the two GTO strategies for this simple game with this simple tweak.

This is harder to solve, but I think the standard algorithm would still solve it... It would just take much longer to converge. The simple method of letting both players exploit each other back and forth until neither can improve further would result in the equilibrium strategy for this game. For every flop there are 49C2=1176 combinations of removed cards, and each of those needs to be solved separately. A long and painful solve to be sure!
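One standard way to formalize "exploit each other back and forth" is fictitious play, where each side best-responds to the opponent's average strategy so far (best-responding only to the latest strategy can cycle). The sketch below is not a poker solver: the 2x2 payoff matrix is a hypothetical zero-sum game chosen so the equilibrium is easy to check by hand (both players mix 40/60), and the round count is arbitrary.

```python
from math import comb

# Two-card removal combinations once a three-card flop is known.
print(comb(49, 2))  # 1176


def fictitious_play(payoff, rounds=20000):
    """Approximate a zero-sum equilibrium; payoff[r][c] is the row player's gain."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary opening actions
    for _ in range(rounds):
        # Row player maximizes against the column player's empirical mix.
        row_values = [
            sum(payoff[r][c] * col_counts[c] for c in range(n_cols))
            for r in range(n_rows)
        ]
        best_row = max(range(n_rows), key=row_values.__getitem__)
        # Column player minimizes the row player's gain against the row mix.
        col_values = [
            sum(payoff[r][c] * row_counts[r] for r in range(n_rows))
            for c in range(n_cols)
        ]
        best_col = min(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = rounds + 1
    return [x / total for x in row_counts], [x / total for x in col_counts]


# Hypothetical toy game; solving the indifference equations by hand gives 0.4 / 0.6.
toy_payoff = [[2, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(toy_payoff)
print([round(x, 2) for x in row_mix], [round(x, 2) for x in col_mix])
```

Production solvers use faster variants (CFR and relatives), but the idea of converging through repeated counter-adjustment is the same.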


11-25-2022 , 01:20 AM

#10

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Quote:

Originally Posted by Yadulla

Yeah, I realize we are speaking of that version. I didn’t stray from the conditions of the OP. I wasn’t speaking of the previous hands we played but instead about how the opponent will react to the current strategy that we choose to employ. You said that there was a perfect exploit possible, but as we don’t know exactly how the opponent will react to our play, it is impossible to judge the perfect exploit.

This is largely irrelevant to the OP, but it is important, and it’s not just you that misunderstands it. People often say “If two rational players exploited and counter exploited they are destined to reach GTO” but that is not true. This statement also neglects to appreciate that our opponents usually change their strategy based on what we do right now, which enables us to take a measure of control over the way the opponent will act in the future. This ability to control the opponents future reactions enables us to lure our opponent from GTO to make them MORE exploitable over time.

To come up with the GTO strategy for the button during the second scenario, do we even need to consider the likelihood of the different cards being seen? We’ll never see the extra cards anyway, so do they really matter to our strategy? Can’t we just feed the bb’s strategy in the first scenario into a normal solver and let it tell us the counter strategy?

I obviously understand the winrate the solver will give us will be wrong, but will that be the only mistake the solver makes?

… I like your idea of negating the advantage gained by position with an extra element, like being able to see some hidden cards. I don’t see why it wouldn’t work in most versions of Poker. However, I think it’s more important nowadays to create versions of the game that negate a player’s ability to use GTO, which means the game would need elements that constantly change. I came up with a version which requires a second deck of cards: 1 card from this deck is dealt face up at the start of each hand, and on the card is written a handicap for the table, such as “Only pot size bets allowed”. I think these handicaps are a very effective way of obliterating the hold GTO has over the tables whilst not actually introducing any new elements to further complicate the game. These handicaps actually make Poker simpler, but the interchangeable nature of them will make the GTO strategy impossibly complicated.

At this point I am going to wait until others weigh in.


11-25-2022 , 05:40 AM

#11

itsyaboi

grinder

Join Date: Jun 2022 Posts: 678

If we think about a particular river: when a proportion of 2 card combos are removed (making villain fold too often), the player who knows can always bluff indifferent hands and maybe some -EV hands, and on other removed combos underbluff a lot. Maybe there are some neutral removal combos too.

Then when villain (who doesn't know the removed cards) knows this strategy, the immediate exploit is to look at the aggregate of removed combos and work out whether he is being bluffed too much or too little (e.g. if a load of combos were making him overfold a bit and a few combos make him overfold a ton, they would be bluffing too often on average).

I think the final iteration when they know each other's strategies would have to keep villain indifferent with his bluff catchers, but shifting the bluffs to when different cards are removed so you gain EV with that knowledge but they can't tell if you're over or underbluffing without knowing the missing cards themselves.

And similarly across the whole gametree you still want to keep them indifferent when they don't know what cards are removed.

(sorry if this was hard to follow)


11-25-2022 , 07:15 AM

#12

tombos21

veteran

Join Date: Sep 2018 Posts: 2,045

Let's try reducing this to a river toy game and see if that's solvable.

Pot = 1, stack = 2

BTN has: 44-77, 99-AA
BB has: 88

The GTO strategy in a vacuum is just for BTN to shove everything. 88 has 40% equity and is indifferent between calling/folding.

Now let's imagine BB can see two cards on the bottom of the deck. For simplicity, we'll assume these two cards just remove one of BTN's hands.

In Case 1) BTN doesn't know this, and continues to shove everything. BB simply folds when bluffs are removed, and calls for a small gain when value is removed.

In Case 2) BTN is aware that BB sees these dead cards. What's BTN's best strategy here?

---

There's a "safe" strategy where BTN just assumes the worst, and bluffs less often to compensate for the scenario where one of their value hands is removed. That loses less money than range-shoving. But there might be a better solution?
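The baseline and Case 1 numbers can be checked directly. This is a rough sketch that follows the post's simplification: each pocket pair counts as one equally likely "hand", and the dead cards remove exactly one of them.

```python
from fractions import Fraction

POT, BET = 1, 2  # pot = 1, BTN shoves the 2-chip stack


def call_ev(value_hands, bluff_hands):
    """EV of 88 calling the shove, relative to folding (EV 0)."""
    total = value_hands + bluff_hands
    win = Fraction(bluff_hands, total)  # 88 only beats 44-77
    return win * (POT + BET) - (1 - win) * BET


baseline = call_ev(6, 4)        # 99-AA vs 44-77: 40% equity
value_removed = call_ev(5, 4)   # BB sees that one value hand is dead
bluff_removed = call_ev(6, 3)   # BB sees that one bluff is dead

print(baseline, value_removed, bluff_removed)  # 0 2/9 -1/3
```

So against a range-shoving BTN, BB calls for +2/9 of a pot when a value hand is dead and folds (saving 1/3) when a bluff is dead, which is the small gain described in Case 1.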


11-25-2022 , 11:58 AM

#13

Jarretman

veteran

Join Date: Jun 2010 Posts: 2,937

Don't have any math to back it up but I'd be willing to bet a lot of money that always having the BTN at 100bb effective is much more valuable than being able to see 2 dead cards (regardless if either player is aware or unaware)

This switches at some point when getting shallower


11-25-2022 , 01:23 PM

#14

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Obviously if the button knew the GTO strategy for each of the two card combinations he could play a sort of average of all those strategies. Not sure exactly how that would work. Or he could just deal out a quintillion hands and let the computer coalesce to near perfect strategy. But it wouldn't be hard to come up with games where even those techniques would be beyond present day computers. For instance, Pot limit Omaha. Before the river bet the big blind can, if he chooses, match the pot which would force the button to expose three of his four cards of the button's choice. Try that one out Univ of Alberta!.


11-25-2022 , 01:53 PM

#15

Haizemberg93

Carpal 'Tunnel

Join Date: Sep 2016 Posts: 7,247

Maybe we need an even simpler game.
Let's do the AKQ game, but with 2 A, 2 Q and one K. Pot is 1 and stacks are 1. Player1 has a range of 2 A and 2 Q, and Player2 has K.

Normal GTO
Player1 jams every A and half of his Q. Player2 calls half of the time. EV of this game for Player1 is 0.75.

Player2 can see one dead card but Player1 does not know this

In that case Player2 calls every time he sees A and folds every time he sees Q. In that case EV for Player1 is
If he has an A in his hand
EV1=1/3*2+2/3*1=4/3
If he bluffs with a Q
EV2=-2/3*1+1/3*1=-1/3
He still bluffs only half of his Q, so in total EV for Player1 is EV=1/2*EV1+1/2*1/2*EV2=7/12=0.58

Player2 can see one dead card and Player1 does know this
In order to keep Player2's call indifferent when he sees an A, Player1 must bluff less with his Q. Player1 should now bet 1/4 of his Q. Now when Player2 sees A, the betting range is 1 combo of A and 0.5 combos of Q, which makes his call 0 EV.
In order to keep Player1's bluffs indifferent, Player2 should call 3/4 of the time when he sees A and fold every time he sees Q. Now when Player1 bets Q, 1/3 of the time the dead card is Q and he gets a fold, and 2/3 of the time the dead card is A and he gets called 3/4 of the time. In total this gives 3/4*2/3=1/2 calling frequency, which keeps Q as a 0 EV bluff.
What is Player1's EV now?
When he has Q it is zero, and for A it's
EV=1/3*(3/4*2+1/4*1)+2/3*1=5/4
He has A half of the time, so total EV is 5/8=0.625! Close to the 7/12 from the case (if math is correct) where he didn't know Player2 could see the dead card. Idk if this is generally true, but a bit surprising nonetheless.
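The three cases can be checked by brute force over Player1's card and the dead card. A minimal sketch, assuming the payoffs used in the post (a called A wins 2, a fold wins the pot of 1, a called bluff loses 1, a checked Q wins nothing); the function name and argument names are my own.

```python
from fractions import Fraction as F

P1_CARDS = ["A", "A", "Q", "Q"]  # Player2 always holds the K


def player1_ev(bluff_freq, call_vs_dead_a, call_vs_dead_q):
    """Player1's EV when he bets every A and bluff_freq of his Q.

    Player2's calling frequency depends only on the dead card he sees.
    """
    total = F(0)
    for i, hand in enumerate(P1_CARDS):
        others = P1_CARDS[:i] + P1_CARDS[i + 1:]
        for dead in others:  # each of the 3 other cards is equally likely dead
            call = call_vs_dead_a if dead == "A" else call_vs_dead_q
            if hand == "A":
                ev = call * 2 + (1 - call) * 1
            else:
                ev = bluff_freq * (call * -1 + (1 - call) * 1)
            total += F(1, 4) * F(1, 3) * ev
    return total


ev_normal = player1_ev(F(1, 2), F(1, 2), F(1, 2))  # nobody sees anything
ev_secret = player1_ev(F(1, 2), F(1), F(0))        # Player2 peeks, Player1 unaware
ev_known = player1_ev(F(1, 4), F(3, 4), F(0))      # both adjust

print(ev_normal, ev_secret, ev_known)  # 3/4 7/12 5/8
```

Under these assumptions Player1 recovers a little of his loss once he knows about the peek (5/8 against 7/12) but stays well short of the 3/4 he makes in the normal game.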


11-25-2022 , 06:34 PM

#16

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,075

Quote:

Originally Posted by Haizemberg93

Idk if this is generally true.

Of course it's not. The simplest way to realize this is to notice that if player 2 sees all three dead cards he makes .75 if it's a secret and .50 if it isn't.


11-25-2022 , 08:27 PM

#17

Haizemberg93

Carpal 'Tunnel

Join Date: Sep 2016 Posts: 7,247

Good point.


11-29-2022 , 05:09 AM

#18

Yadulla

newbie

Join Date: Nov 2022 Posts: 40

Quote:

Originally Posted by David Sklansky

Obviously if the button knew the GTO strategy for each of the two card combinations he could play a sort of average of all those strategies.

This isn’t going to be enough. Each action the BB takes changes the likelihood he saw particular cards. Eg. If he makes a 3bet, it becomes more likely that he saw AA, and less likely he saw 72o. You then need to match that up to the different boards, so, if he bets preflop and then the flop is AAA the chance he saw an A will have diminished, the chance he saw the K will have improved in likelihood, etc.
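The "each action changes the likelihood" point is just Bayes' rule. As a small sketch, the numbers below are borrowed from the AKQ toy game earlier in the thread (Player1 holds a Q, Player2 calls 3/4 of the time after seeing a dead A and never after seeing a dead Q); applying them here is my own illustration, not part of the holdem game.

```python
from fractions import Fraction as F


def posterior(prior, likelihood):
    """Bayes update: P(dead card | action) from P(dead card) and P(action | dead card)."""
    joint = {card: prior[card] * likelihood[card] for card in prior}
    total = sum(joint.values())
    return {card: weight / total for card, weight in joint.items()}


# Player1 holds a Q, so the three unseen cards are A, A, Q.
prior = {"A": F(2, 3), "Q": F(1, 3)}
fold_prob = {"A": F(1, 4), "Q": F(1)}  # Player2 folds 1/4 vs dead A, always vs dead Q

after_fold = posterior(prior, fold_prob)
print(after_fold["A"], after_fold["Q"])  # 1/3 2/3
```

A fold flips the odds from 2:1 in favor of a dead A to 2:1 in favor of a dead Q. In the full game the same update would have to run over every dead pair at every decision point.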

The more I think about this, the more I think that the BB is playing a game very similar to Omaha - he can see 4 cards but can only use two of them. As the button has only holdem cards, you’d expect the solution for this game would be easier to calculate than the GTO strategy for Omaha (that doesn’t mean it’ll be easy to calculate). I suspect you could merge and slightly tweak the solvers used for both holdem and Omaha to find this solution.

I still think the obvious approach is to find the strategy for the first scenario and then just feed that into a normal solver to find the answer to the second scenario. The third scenario is where it starts getting tough, however, I suspect the same programme that was used to find the answer to the first scenario could be tweaked very slightly to find the answer to this. If you had the solution to the first scenario you could use this to find the winrate advantage gained by seeing the cards too… All this does seem very straight forward though. I’m guessing you already knew everything I said here before I said it.

I’m struggling to understand exactly what you're after. Perhaps that is due to my own inadequacies, as this question is far from my specialty subject of exploitative theory. I must say though, that apart from giving us a chance to showcase our understanding of theory, I don't see much point in solving this problem. I think your time would be much better spent considering the game I designed to enhance and then eventually replace trade!! I’m going to take over the economic world in the next 10 years by using a simple game that reverses the negative effects of GTO on trade, but, as others in my field are busy trying to solve problems like this one, I am still forced to work alone. This is slowing me down massively. In short, I just don't get it.


12-10-2022 , 09:28 PM

#19

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

Improving a bit on Tombos exercise, OOP always has 88 and IP has 99-QQ and 44-77. Same board texture 22223. Pot 100, Eff stacks 200.

GTO is for IP to value shove everything and bluff shove each bluffing combo 66.7% of the time.

OOP calls 33.3% of the time, folds 66.7% (we want to make IP bluffs indifferent, thus they have to work 66.7%).

88 EV is 16.667 (it wins when IP checks obv)

Now OOP knows the bottom 2 cards, assuming per tombos example they always remove IP's range.

3 possible scenarios:

A) The 2 cards are 9-Q: 24.46%
B) The 2 cards are 4-7: 24.46%
C) 1 card is 9-Q and the other 4-7 (or vice-versa): 51.08% of the time

IP is still playing the same strategy, so:

A) IP is overbluffing and OOP always calls.
B) IP is underbluffing and OOP always folds.
C) Instead of 24 value bets and 16 bluffs, now it will be 21 value bets to 14 bluffs, so 88 still has 40% equity and calls 33.3% of the time and folds 66.7%.

OOP calling frequency will be 0.2446 + 0 + [0.333 * 0.5108] = 41.47%.
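These frequencies can be reproduced with exact fractions. One assumption on my part: the post does not say where 24.46% comes from, but it matches treating the 48 IP combos as 48 equally likely "cards" and drawing two without replacement (24/48 * 23/47), so that is what the sketch uses.

```python
from fractions import Fraction as F

POT, BET = 100, 200
VALUE_COMBOS = BLUFF_COMBOS = 24  # 99-QQ and 44-77, six combos per pair

# OOP calls 200 to win 300, so IP's betting range can hold 40% bluffs.
bluff_share = F(BET, POT + 2 * BET)
bluffs_bet = VALUE_COMBOS * bluff_share / (1 - bluff_share)
bluff_freq = bluffs_bet / BLUFF_COMBOS

# A bluff risks 200 to win 100, so OOP must fold 2/3 and call 1/3.
call_freq = 1 - F(BET, POT + BET)

# 88 only wins the pot when IP gives up with a bluff combo.
ev_88 = (BLUFF_COMBOS - bluffs_bet) / (VALUE_COMBOS + BLUFF_COMBOS) * POT

# Assumed model for the scenario odds (see note above).
p_both_value = F(24, 48) * F(23, 47)
p_both_bluff = p_both_value
p_mixed = 1 - p_both_value - p_both_bluff

# Always call in A, never in B, baseline frequency in C.
overall_call = p_both_value * 1 + p_both_bluff * 0 + p_mixed * call_freq

print(bluffs_bet, float(bluff_freq), float(ev_88))   # 16 0.666... 16.66...
print(float(p_both_value), float(p_mixed))           # about 0.2447 and 0.5106
print(overall_call, float(overall_call))             # 39/94, about 0.4149
```

The small gap to the post's 41.47% is rounding (0.333 instead of 1/3).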

If IP knows that OOP cheats and knows the final 2 cards (but he still doesn't know what was removed), IP's new strategy should be to never bluff.

But then, obviously, OOP never calls any bet from IP.

Last edited by FazendeiroBH; 12-10-2022 at 09:53 PM.


12-10-2022 , 10:03 PM

#20

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

Both cards removing the same hand (both being Q's for example) makes a difference in the frequencies and will impact OOP's final EV.

Continuing on the conclusion of my last post that IP never bluffs.

Scenario A.1) Both cards are 9-Q and they have same face value 3.19% of the time
Scenario A.2) Both cards are 9-Q and they are different cards 21.27% of the time
Scenario B.1) Both cards are 4-7 and they have same face value 3.19% of the time
Scenario B.2) Both cards are 4-7 and they are different cards 21.27% of the time
Scenario C) Same 51.08%

A.1) 19 99-QQ, 24 44-77, OOP EV is 0.5581 * 100 = 55.81
A.2) 18 99-QQ, 24 44-77, OOP EV is 0.5714 * 100 = 57.14
B.1) 24 99-QQ, 19 44-77, OOP EV is 0.4418 * 100 = 44.18
B.2) 24 99-QQ, 18 44-77, OOP EV is 0.4285 * 100 = 42.85
C) 21/14, 0.4 * 100 = 40

Easy to see OOP gained EV (and IP lost) by cheating.

*Key assumption we should not forget: the sizing is always the all in for 2x pot.


12-10-2022 , 10:15 PM

#21

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

I think this happens because the all in sizing is not optimal anymore.

What if instead of 200 bb eff stacks, we have 141.1382, so even if the only sizing for IP is still the all in, his bluffs should now work only 58.53% (OOP calling 41.47% would be the GTO?)
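The stack size follows from the break-even condition for a pure bluff: risking s to win the pot needs folds s / (s + pot) of the time. A quick sketch (the function name is my own):

```python
def shove_size_for_fold_freq(pot, fold_freq):
    """Bet size whose pure bluff breaks even at the given fold frequency."""
    return pot * fold_freq / (1 - fold_freq)


# OOP calling 41.47% means folding 58.53%.
size = shove_size_for_fold_freq(100, 1 - 0.4147)
print(round(size, 2))  # 141.14

# Sanity check against the original 2x pot shove, which needs 2/3 folds.
print(round(shove_size_for_fold_freq(100, 2 / 3), 6))  # 200.0
```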


12-10-2022 , 10:23 PM

#22

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

A) 0.2446 * 100 = 24.46
B) 0
C) 0.5108 * 41.47 = 21.18

45.64%, so still same conclusion, IP never bluffs. The difference seems smaller (45.64 - 41.47 < 41.47 - 33.33), so it might converge somewhere a little above the actual numbers??

Going to sleep, so if anyone finds this interesting to keep expanding, or find that I made some wrong calculations earlier and everything that came later is wrong, or wrong assumptions all over the place, feel free


12-11-2022 , 07:50 AM

#23

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

The number seems to be 50%, so a stack to pot ratio of 1.

24.46 + 0.5108x = x
0.4892x = 24.46
x = 50

I think everyone knows the GTO here, but to remember, shove all value and bluffs 50/50, 88 calls 50/50.

With the cheating game and our scenarios:

A) 0.2446 * 100 = 24.46
B) 0
C) 0.5108 * 50 = 25.54
50%
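The steps from 33.33 to 41.47 to 45.64 are a fixed-point iteration of x = 24.46 + 0.5108x, and the closed form confirms where it lands. A short sketch using the post's rounded percentages:

```python
P_ALWAYS_CALL = 24.46   # scenario A, in percent
P_MIXED = 0.5108        # scenario C weight


def next_call_freq(x):
    """Overall OOP calling % when the baseline calling % is x."""
    return P_ALWAYS_CALL + P_MIXED * x


x = 100 / 3  # start from the 2x pot baseline of 33.33%
for _ in range(200):
    x = next_call_freq(x)

closed_form = P_ALWAYS_CALL / (1 - P_MIXED)
print(round(x, 4), round(closed_form, 4))  # 50.0 50.0
```

Each pass shrinks the gap to 50 by a factor of 0.5108, which is why the differences in the earlier posts kept getting smaller.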


12-11-2022 , 08:05 AM

#24

FazendeiroBH

self-banned

Join Date: Oct 2014 Posts: 2,514

My EV calculations on post #20 are way off though
