on maximally exploitive strategies - Poker Theory

01-27-2019 , 10:44 AM

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Hypothesis: the maximally exploitive strategy vs a player that deviates from Nash equilibrium by some imaginary distance ≥ (X) is a pure strategy. If the opponent deviates by a distance < (X), the maximally exploitive strategy will be a mixed strategy.

How can we define (X)?

01-27-2019 , 07:18 PM

TheUntiltable

newbie

Join Date: Nov 2018 Posts: 40

It will be impossible to prove this hypothesis for any particular game unless the game is solved.

You cannot hope to work on this for, for example, the whole game of HUNL.

It is not clear for what game you are proposing your hypothesis. But if you are making a hypothesis about games generally it is untrue, since not all maximally exploitative strategies are pure, nor are all equilibrium strategies mixed.

01-28-2019 , 11:29 AM

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

since not all maximally exploitative strategies are pure, nor are all equilibrium strategies mixed.

The first part is correct, which is proven by the fact that the maximally exploitive strategy vs Nash equilibrium is a mixed strategy (Nash equilibrium maximally exploits Nash equilibrium heads up).

The second part is incorrect, at least for poker.

I'm positing that at some undefined measure of deviation from Nash, the maximally exploitive strategy will be a pure strategy.

01-28-2019 , 02:44 PM

TheUntiltable

newbie

Join Date: Nov 2018 Posts: 40

Whilst I think that it is the case that the equilibrium strategy for HUNL will be mixed at different points, this is undemonstrable in an unsolved game like HUNL.

I think your idea is reasonable but untestable. It depends very much on what spots you are talking about, since not all spots will be mixed even if some are.

01-28-2019 , 04:12 PM

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

since not all spots will be mixed even if some are.

Such a strategy is said to be mixed. A strategy is only said to be a pure strategy if every single decision is made at 100% frequency.

01-28-2019 , 08:23 PM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Your X just sounds like epsilon-equilibrium solutions.

Basically a strategy that can be proven to be within epsilon (some arbitrary value) of the true equilibrium. I think it's typically proven to be within epsilon by playing the nemesis strategy against it, but I'm not 100% sure.
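
To make the epsilon idea concrete, here is a minimal sketch using Rock Paper Scissors as a stand-in game (my choice of example, not something from a solver). The function names `nemesis_value` and `epsilon` are made up for illustration, and the game value of 0 is specific to RPS. It measures how much a strategy loses when the opponent plays the best possible counter-strategy against it.

```python
from fractions import Fraction as F

# Row player's payoff in Rock Paper Scissors (rows/cols: rock, paper, scissors).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def nemesis_value(payoff, strategy):
    """EV of `strategy` (row player) when the column player best-responds to it."""
    n_cols = len(payoff[0])
    column_evs = [
        sum(p * payoff[r][c] for r, p in enumerate(strategy))
        for c in range(n_cols)
    ]
    # The nemesis picks the column that is worst for the row player.
    return min(column_evs)

def epsilon(payoff, strategy, game_value=0):
    """How far below the game value the strategy falls against its nemesis."""
    return game_value - nemesis_value(payoff, strategy)

nash = [F(1, 3), F(1, 3), F(1, 3)]
leaky = [F(2, 5), F(3, 10), F(3, 10)]  # assumed example: slightly too much rock

print(epsilon(RPS, nash))   # 0
print(epsilon(RPS, leaky))  # 1/10
```

A strategy whose epsilon is at most some chosen threshold is what the term "epsilon-equilibrium" refers to here; the 40/30/30 strategy gives up 0.1 units per game to its nemesis.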

01-28-2019 , 09:28 PM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

To clarify, I just mean the distance X from Nash resembles the notion of epsilon-equilibrium, not your overall hypothesis.

01-30-2019 , 12:43 AM

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

Originally Posted by TheUntiltable

It will be impossible to prove this hypothesis for any particular game unless the game is solved.

I think this is true.

Quote:

You cannot hope to work on this for, for example, the whole game of HUNL.

No, but I can recognize opponents that I should play pure strategies against, which leads me to think that the hypothesis may be true. Basically, the further from correct play the opponent deviates, the more likely it is to be true.

Quote:

It is not clear for what game you are proposing your hypothesis. But if you are making a hypothesis about games generally it is untrue, since not all maximally exploitative strategies are pure, nor are all equilibrium strategies mixed.

It's certainly true of Rock Paper Scissors, provided that there is no fear of counter exploitation, which is an example of a small (X) value; the slightest deviation from 1/3 rock, 1/3 paper, 1/3 scissors will create a weakness that is best exploited by a pure strategy.
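
The Rock Paper Scissors claim can be checked with a few lines of arithmetic. This is only a sketch: the helper names `action_evs` and `best_pure_responses` and the 1/1000 deviation are illustrative assumptions, and the opponent is assumed never to adjust.

```python
from fractions import Fraction as F

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def action_evs(opponent):
    """EV of each pure action (win = +1, loss = -1) against a fixed opponent mix."""
    evs = {}
    for a in ACTIONS:
        win = opponent[BEATS[a]]
        lose = sum(p for b, p in opponent.items() if BEATS[b] == a)
        evs[a] = win - lose
    return evs

def best_pure_responses(opponent):
    """Return the pure actions with the highest EV, plus that EV."""
    evs = action_evs(opponent)
    top = max(evs.values())
    return [a for a in ACTIONS if evs[a] == top], top

tiny = F(1, 1000)  # assumed size of the deviation
opponent = {"rock": F(1, 3) + tiny, "paper": F(1, 3), "scissors": F(1, 3) - tiny}
print(best_pure_responses(opponent))

nash = {a: F(1, 3) for a in ACTIONS}
print(best_pure_responses(nash))
```

Against the exact 1/3, 1/3, 1/3 mix all three actions tie at 0 EV, so any mix is a best response. With the small shift toward rock, paper alone is best. Not every deviation gives a unique answer, though: against rock 1/3 + d, paper 1/3 - d, scissors 1/3, rock and paper tie at EV d, so mixes of those two are also best responses, although a pure best response is still available.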

------

If we come at it from the counter angle, we could ask how good does my opponent have to be to make a mixed strategy correct?

01-31-2019 , 06:10 AM

plexiq

veteran

Join Date: Apr 2007 Posts: 2,554

Quote:

Originally Posted by Bob148

Against any given strategy (no matter the distance to a NE) there exists a maximally exploitative strategy (ie a best response) that is pure. So the first part is true for any (X).

There's no guarantee that a mixed best response exists at all, even if the other player is still exactly playing a NE strategy. NE strategies for poker are not guaranteed to be mixed in general, unless you are talking about a specific game setup. But even assuming that the NE is mixed: If it's possible to deny a mixed best response, then I think it can be done with arbitrarily small deviations from Nash (in terms of exploitability). Since mixed best responses only exist when two or more actions have exactly the same EV, even minuscule changes to the strategy can throw off that balance.
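
The point about equal EVs can be sketched with a toy river spot. Everything here is an assumption made for illustration: a pot-sized bet, a caller holding a pure bluff-catcher that beats every bluff and loses to every value hand, and the hypothetical helper names `call_ev` and `best_call_frequencies`.

```python
from fractions import Fraction as F

def call_ev(bluff_fraction, pot=1, bet=1):
    """EV of calling with a pure bluff-catcher; folding is the 0 EV baseline."""
    # Win pot + bet when the bettor is bluffing, lose the bet otherwise.
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet

def best_call_frequencies(bluff_fraction, pot=1, bet=1):
    """Return (low, high): the range of calling frequencies that are best responses."""
    ev = call_ev(bluff_fraction, pot, bet)
    if ev > 0:
        return (1, 1)  # always call
    if ev < 0:
        return (0, 0)  # always fold
    return (0, 1)      # indifferent: any calling frequency is a best response

balanced = F(1, 3)  # bluff fraction that makes the caller indifferent to a pot-sized bet
nudge = F(1, 10**6)
for b in (balanced - nudge, balanced, balanced + nudge):
    print(b, best_call_frequencies(b))
```

Only at exactly 1/3 bluffs is a mixed calling frequency a best response. A one-in-a-million shift in either direction leaves a single pure best response (always fold or always call), even though the bettor's strategy is barely exploitable.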

[Maybe you don't have maximally exploitative strategies in mind? There's a related concept of calculating defensive exploitative responses that aren't allowed to deviate from Nash by more than (X), ie exploit a given strategy but limit the amount that your exploiting strategy could be counter-exploited.]

01-31-2019 , 11:17 AM

#10

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

[Maybe you don't have maximally exploitative strategies in mind? There's a related concept of calculating defensive exploitative responses that aren't allowed to deviate from Nash by more than (X), ie exploit a given strategy but limit the amount that your exploiting strategy could be counter-exploited.]

I'm familiar with this; I've watched a few gtorangebuilder videos on "minimally exploitive strategies."

Quote:

Against any given strategy (no matter the distance to a NE) there exists a maximally exploitative strategy (ie a best response) that is pure. So the first part is true for any (X).

Thus any deviation creates a vulnerability that is best exploited by a pure strategy. This seems unlikely to me, particularly vs strategies that stray from Nash only slightly, perhaps by mixing at frequencies that are just slightly incorrect, but I've been wrong before.

Quote:

There's no guarantee that a mixed best response exists at all, even if the other player is still exactly playing a NE strategy.

Ok thanks, I was under the impression that this was guaranteed for poker.

01-31-2019 , 01:11 PM

#11

TheUntiltable

newbie

Join Date: Nov 2018 Posts: 40

Plexiq has partly reiterated what I wrote, but whilst we both state that there is no guarantee that NE poker strategy (e.g. for 100bb HUNL) is mixed, it is almost inconceivable that it is not mixed at some points at least.

Further, any deviation at all will, as Plexiq says, be likely to require a pure response, but in any real game of poker this is obviously crazy; if you notice, eg, that someone is calling rather than folding 10% too much to river barrels you don’t want to start barrelling 100% of rivers, since you want to be able to continue to exploit them and not “teach” villain to counter-adjust. Against a computer that never adapted, the strategy of 100% barrel (which is a pure strategy) would be better than sometimes folding (which could also be a pure strategy but resembles a mixed one in the correct ways to make my point), but against any adapting player this would be to forgo a continued future edge by making a smaller adjustment.

02-13-2019 , 09:31 AM

#12

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

But even assuming that the NE is mixed: If it's possible to deny a mixed best response, then I think it can be done with arbitrarily small deviations from Nash (in terms of exploitability).

Ok thanks. Does this imply that the only maximally exploitive strategy which is mixed is Nash vs Nash?

Is (X) different depending on the type of deviation, which determines the best pure response? For example, my bluffcatching range should expand vs a player that bluffs this street too much (or too frequently as part of a mixed strategy); this is not analogous to making my betting range value or bluff heavy. The two adjustments are not mutually exclusive (it's possible to maximally exploit both mistakes with one strategy, specifically because checking and betting ranges are independent of each other when a pure strategy is used).

Of course, there are irrational ways that our opponent could deviate from Nash, but I'm mostly concerned with mistakes on the margins, as these have the smallest (X) value.

02-14-2019 , 12:23 PM

#13

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

Further, any deviation at all will, as Plexiq says, be likely to require a pure response, but in any real game of poker this is obviously crazy

I agree and would like to explore the options we have:

strategy A: Nash equilibrium, or some multiway situation that has reduced to heads up play which requires a comaximally exploitive strategy, otherwise default strategy vs unknowns.

strategy B: best pure strategy available given a read of marginal deviation.

strategy C: minimally exploitive strategy (assumed to be mixed?) given a read of marginal deviation.

strategy D: a strategy that is mixed, but slightly more profitable than (C), and thus slightly more exploitable given a read of marginal deviation.

Profitability from best to worst would be: B,D,C,A vs a non adjusting opponent, but I think either D or C would be a prudent choice vs anyone playing for significant stakes today.

Exploitability of course would decrease as: B,D,C,A.

I don't currently have access to solutions, so I shoot for D when I have a read, but I doubt I get anywhere close to the ideal ev of D, which is potentially as profitable as the best pure response, or extremely close to it.

02-14-2019 , 12:33 PM

#14

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Also, I should note that I assume Nash equilibrium strategy for no limit holdem is a mixed strategy because the solution to limit holdem (which is a structurally much simpler game than no limit, but played with the same number of cards and betting rounds) is quite mixed. I think this provides the insight necessary to assume that the no limit holdem solution is mixed.

If we take this a step further into multiway poker, the naturally tighter ranges involved should nearly always produce a mixed strategy when play reduces to heads up, but I think that multiway poker solutions should be pure until there are only two players left (speculation).

02-16-2019 , 08:10 AM

#15

plexiq

veteran

Join Date: Apr 2007 Posts: 2,554

The NE for any reasonably complex form of poker is likely mixed, yes. That's also the case for games with more than 2 players, even in simple push-or-fold games we can see plenty of mixed plays 3+ way, it doesn't need to be down to a 2 player subgame.

Regarding exploitative ranges, I'm not sure if there is a conceptual difference between C and D. Seems like the same thing, you just allow your exploitative range to be exploitable to a different degree? And as you widen/tighten that exploit target you can also get B/A. (No limit = B, zero exploitable = A)

As for what's a reasonable approach for real games, it might be useful to look at the ratio of EV gained by exploiting vs potential EV lost by being exploited. (I vaguely remember that there's a section on various approaches in the Johanson thesis? Can't look it up atm though.)
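
A rough way to see that ratio is to dial between Nash and the pure exploit in Rock Paper Scissors. This is a simple linear blend chosen for illustration, not the method from any thesis or solver; the 40/30/30 opponent model and the helper names `ev_against`, `exploitability`, and `blend` are assumptions.

```python
from fractions import Fraction as F

# Row player's payoff in Rock Paper Scissors (order: rock, paper, scissors).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def ev_against(strategy, opponent):
    """EV of our mixed strategy against a fixed opponent mix."""
    return sum(
        p * q * RPS[r][c]
        for r, p in enumerate(strategy)
        for c, q in enumerate(opponent)
    )

def exploitability(strategy):
    """EV lost to a perfectly counter-adjusting opponent (RPS game value is 0)."""
    worst = min(
        sum(p * RPS[r][c] for r, p in enumerate(strategy))
        for c in range(3)
    )
    return -worst

def blend(weight):
    """Hypothetical dial: `weight` on the pure exploit (paper), the rest on Nash."""
    nash = [F(1, 3)] * 3
    exploit = [0, 1, 0]
    return [(1 - weight) * n + weight * e for n, e in zip(nash, exploit)]

model = [F(2, 5), F(3, 10), F(3, 10)]  # assumed read: opponent throws too much rock
for weight in (F(0), F(1, 4), F(1, 2), F(1)):
    s = blend(weight)
    print(weight, ev_against(s, model), exploitability(s))
```

Weight 0 corresponds to A and weight 1 to B, with C and D somewhere in between. In this toy case every step toward the exploit gains 0.1 units per unit of weight if the read is right and risks 1 unit per unit of weight if the opponent counter-adjusts perfectly.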

02-16-2019 , 09:10 AM

#16

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

That's also the case for games with more than 2 players, even in simple push-or-fold games we can see plenty of mixed plays 3+ way, it doesn't need to be down to a 2 player subgame.

I didn't know this, thanks.

Quote:

Regarding exploitative ranges, I'm not sure if there is a conceptual difference between C and D. Seems like the same thing, you just allow your exploitative range to be exploitable to a different degree?

yes.

02-16-2019 , 12:33 PM

#17

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

“for what's a reasonable approach for real games, it might be useful to look at the ratio of EV gained by exploiting vs potential EV lost by being exploited.”

If our decision closes the action for the entire hand then there’s no counter exploit. All other decisions will have varying exploitability values.

I prefer to adjust on the margins with more money to win or lose in the hand, but I’ll go for extreme exploitive value closing the hand with a good read.

02-16-2019 , 01:26 PM

#18

plexiq

veteran

Join Date: Apr 2007 Posts: 2,554

Quote:

Originally Posted by Bob148

If our decision closes the action for the entire hand then there’s no counter exploit. All other decisions will have varying exploitability values.

That depends how you look at it. Sure, it's correct if you take your read of your opponent's range as 100% accurate.

But what if there is even a small chance that your opponent is anticipating your adjustment and is actually playing a different range than you think, maximizing against your exploitative range? If there is uncertainty about your opponent's range then it may still make sense to skip hands with the worst exploiting/exploitable ratio.

02-16-2019 , 01:42 PM

#19

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

True. The assumption is that the read is good when going for extreme adjustment. The better the read and the less exploitable the adjustment, the better our (exploit:exploitability) ratio is.

This is where I like to draw a line in the sand between (exploitive strategy) vs (exploitive strikes). The former entails more vulnerability to counter exploitation, while the latter allows less counter exploitation, even approaching zero counter exploitability (lest our opponents be psychic).

02-17-2019 , 09:11 AM

#20

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

Quote:

what if there is even a small chance that your opponent is anticipating your adjustment and is actually playing a different range than you think, maximizing against your exploitative range?

This is a liability for any exploitive opportunity, but there is a distinct difference between these two situations:

a) our opponent reveals his or her strategy to us, while there is still more money to be won or lost in the hand (poor exploit:exploitability ratio).

b) our opponent reveals his or her strategy to us, and our next action ends the hand (zero counter exploitability = ridiculously fantastic exploit:exploitability ratio).

Despite having complete information about our opponent's strategy, we should probably pass on opportunity (a) and seize opportunity (b).

Of course, we don't have complete strategic details about our opponent's strategy, and he or she may in fact be one step ahead of us in adjustment. However, this distinction is important to me; it dictates which exploits I should or should not seize.
