Game theory questions - Poker Theory - General Poker Theory Forum


Game theory questions


05-27-2020 , 10:03 AM

2outstwice

newbie

Join Date: May 2020 Posts: 15

A little background - I was a SS cash player pre-BF whilst studying mathematics at university. Got back into the game recently but was always reluctant due to the endless 'poker is dead' and GTO threads, and expected tables full of perfectly balanced 'GTO' opponents. To my surprise, the games are as good as I remember (could be due to lockdowns) which got me interested in the current GTO approach.

I have been reading a few papers and have some (probably basic) questions about this theoretical approach to the game.

1) Is there a formal definition (in mathematical terms) of GTO? Discussions in the forum seem to conflate a number of concepts (Nash equilibria, unexploitability). It seems that most players take GTO to just mean an unexploitable strategy that will not lose to any other strategy (although it is not necessarily the most profitable against a specific strategy). This paper defines GTO poker as 'non-exploitable'. This seems incomplete to me, as a Nash Equilibrium is composed of a pair of strategies (in a HU game). As I understand it, a NE approximation was found for HULHE by having a bot play itself and constantly adjust its strategy until equilibrium was approached. So could there be more than one NE strategy pair? I think the confusion arises from drawing analogies with simple games like RPS, in which an unexploitable strategy is obvious (play each option 1/3 of the time at random), whereas the situation is more like the Coordination game, in which there are multiple NE strategies, depending on the other player's strategy.

2) A Nash Equilibrium exists for HUNL due to the fixed-point theorem and the fact that NL with discrete bet sizes is a finite game (durrrr seems to be arguing against this in this thread?) Could there be multiple NE strategies for HUNL?

3) How do solvers like PIOSolver work on a technical level? I have watched videos and understand how they are used but how exactly do they work? I know that they calculate the EV of every possible choice in the game tree (given the parameters and the estimated ranges). Is this basically like a PokerStove calculation but applied to an entire game tree rather than a single node?


05-27-2020 , 10:42 AM

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

1. Yes, an equilibrium is a pair of strategies. However, if (A,B) and (X,Y) are equilibrium strategy pairs, then so is (A,X) etc. Otherwise, neither pair would be an equilibrium because a player would have an incentive to deviate.


05-27-2020 , 11:18 AM

2outstwice

newbie

Join Date: May 2020 Posts: 15

I might be misunderstanding but for example, in the driving game, the NE strategies are [L1,L2] & [R1,R2], but [L1,R2] wouldn't be a NE as you state?


05-27-2020 , 11:41 AM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by 2outstwice

1) Is there a formal definition (in mathematical terms) of GTO? Discussions in the forum seem to conflate a number of concepts (Nash equilibria, unexploitability). It seems that most players take GTO to just mean an unexploitable strategy

GTO in the poker population at large is not well defined. Most people have their own idea of what it means, but there is no formal definition.

In the domain of serious theory and academic research, I believe Nash Equilibrium is typically used.

You can read the sticky at the top of the theory section for more terminology and their definitions.

Quote:

Originally Posted by 2outstwice

3) How do solvers like PIOSolver work on a technical level? I have watched videos and understand how they are used but how exactly do they work? I know that they calculate the EV of every possible choice in the game tree (given the parameters and the estimated ranges). Is this basically like a PokerStove calculation but applied to an entire game tree rather than a single node?

Very high level here.

The game is abstracted to a manageable size (i.e. so that it can be evaluated in a 'reasonable' amount of time with a 'reasonable' amount of resources).

Then there is some sort of data structure that keeps track of EV at different decision points. At each decision point an algorithm, typically counterfactual regret minimization (CFR), evaluates the EV and tries to minimize regret/maximize return. This process continues until the change in EV falls below some arbitrary threshold, so that the algorithm doesn't continue indefinitely.
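As a rough illustration of that loop, here is a minimal sketch of regret matching, the update rule CFR applies at each decision point, run in self-play on rock-paper-scissors. It is not how PioSolver or any particular solver is implemented. RPS has a single decision point, so this is not full CFR either, and every name in it (`regret_matching`, `solve_rps`, `exploitability`) is made up for the example.

```python
# Row player's payoff in rock-paper-scissors (order: rock, paper, scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)

def action_values(opponent_mix, as_row):
    """EV of each pure action against the opponent's mixed strategy."""
    if as_row:
        return [sum(PAYOFF[a][b] * opponent_mix[b] for b in range(3)) for a in range(3)]
    return [sum(-PAYOFF[a][b] * opponent_mix[a] for a in range(3)) for b in range(3)]

def solve_rps(iterations=20000):
    # Start from lopsided regrets so the run does not begin at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        mixes = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            values = action_values(mixes[1 - player], as_row=(player == 0))
            current_ev = sum(m * v for m, v in zip(mixes[player], values))
            for a in range(3):
                # Regret: how much better action a would have done than the current mix.
                regrets[player][a] += values[a] - current_ev
                sums[player][a] += mixes[player][a]
    # The average strategy over all iterations is what approaches equilibrium.
    return [[s / iterations for s in total] for total in sums]

def exploitability(row_mix):
    """How much a best-responding column player wins against row_mix."""
    return max(action_values(row_mix, as_row=False))

average = solve_rps()
print(average[0], exploitability(average[0]))
```

The stopping rule described above corresponds to ending the loop once `exploitability` drops below a chosen threshold instead of after a fixed number of iterations.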


05-27-2020 , 11:56 AM

2outstwice

newbie

Join Date: May 2020 Posts: 15

Quote:

Originally Posted by just_grindin

This is what I was thinking - the general usage seems to imply people think there is some 'fixed' strategy (i.e. a mixed strategy with static probabilities) that is unexploitable and indifferent to opponent's strategy (equivalent to playing 1/3 of each in RPS), but a strategy in NE is in some way dependent on your opponent's strategy. Of course in reality, human players have leaks and exploitable strategies, implying an exploitative strategy will maximise EV?

Quote:

Originally Posted by just_grindin

The abstractions being limited bet sizings on each street right? I'm reading up on the algorithm but is it convergent to equilibrium in a finite number of iterations?


05-27-2020 , 03:28 PM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by 2outstwice

I think what you described comes from the idea that, by definition, if you are playing one of the Nash equilibrium strategies from the strategy set, then the other player can do no better (and most likely very much worse) than the EV of the other equilibrium strategy in a 2 player constant sum game.

So taken to the extreme it could mean that you are guaranteed some minimum EV from players deviating (i.e. if their deviation is lower EV than the other strategy in the Nash equilibrium strategy set then you should gain that EV). However it's possible that if an opponent deviates you could also deviate and gain even more EV, with the risk of opening yourself up to exploitation as well.

Yes, in practice the goal is maximizing our own EV, so if you can exploit players for more EV than you would get playing the Nash equilibrium strategy vs their non-equilibrium strategy, then you should do that. In reality solvers are showing us that Nash equilibrium strategies in poker are so complex that humans could never implement them without computer assistance. So exploiting in human vs human play is almost the only way to play, though online players can probably approximate Nash equilibria better than live players given the tools available.
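The trade-off can be made concrete in rock-paper-scissors. This is a small sketch in which the opponent's leak (`scissors_heavy`) and the function names are invented for the example: the equilibrium mix breaks even against the leak and cannot be exploited, while the maximal exploit wins more but can itself be exploited.

```python
# Hero's payoff in rock-paper-scissors (order: rock, paper, scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def ev(hero, villain):
    """Hero's expected payoff when both players use the given mixed strategies."""
    return sum(hero[a] * villain[b] * PAYOFF[a][b]
               for a in range(3) for b in range(3))

def exploitability(hero):
    """EV a best-responding villain wins against hero's mix."""
    return max(sum(-PAYOFF[a][b] * hero[a] for a in range(3)) for b in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]    # equilibrium strategy
scissors_heavy = [0.2, 0.2, 0.6]   # hypothetical leaky opponent
always_rock = [1.0, 0.0, 0.0]      # maximal exploit of that leak

print(ev(uniform, scissors_heavy), exploitability(uniform))          # about 0 and 0
print(ev(always_rock, scissors_heavy), exploitability(always_rock))  # about 0.4 and 1
```

RPS is a degenerate case in that the equilibrium mix wins nothing from any deviation. In poker many deviations are outright mistakes that lose EV to the equilibrium strategy, which is the "guaranteed minimum" described above.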

Quote:

Originally Posted by 2outstwice

The abstractions being limited bet sizings on each street right? I'm reading up on the algorithm but is it convergent to equilibrium in a finite number of iterations?

Betsizing is one game abstraction but my understanding is that there are others to limit the number of strategically different card configurations which further cuts down on the game tree size. Suit isomorphism is an example of this.
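To give a sense of scale for suit isomorphism, here is a small sketch that collapses the preflop hole-card combinations into suit-independent classes. It only covers the preflop case, since postflop the equivalent suits depend on the board, and the `canonical` helper is a made-up name, not anything taken from a solver.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def canonical(card_a, card_b):
    """Map two hole cards to a suit-independent label such as 'AKs', 'AKo' or 'TT'."""
    hi, lo = sorted((card_a, card_b), key=lambda c: RANKS.index(c[0]), reverse=True)
    if hi[0] == lo[0]:
        return hi[0] + lo[0]
    return hi[0] + lo[0] + ("s" if hi[1] == lo[1] else "o")

deck = [rank + suit for rank in RANKS for suit in SUITS]
combos = list(combinations(deck, 2))

# Count how many raw combinations fall into each strategically distinct class.
class_sizes = Counter(canonical(a, b) for a, b in combos)

print(len(combos), len(class_sizes))  # 1326 raw combos, 169 classes
print(class_sizes["AA"], class_sizes["AKs"], class_sizes["AKo"])  # 6, 4, 12
```

Because each class is weighted by its size, a solver working on the 169 classes loses no information relative to the 1326 raw combinations, while the tree it has to store is much smaller.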

I am not sure about the convergence guarantees of the algorithm. I assume any convergence guarantees in the academic literature would be expressed in terms of polynomial or non-polynomial time, in which case it would likely be polynomial time.


05-27-2020 , 04:25 PM

2outstwice

newbie

Join Date: May 2020 Posts: 15

Quote:

Originally Posted by just_grindin

So taken to the extreme it could mean that you are guaranteed some minimum EV from players deviating (i.e. if their deviation is lower EV than the other strategy in the Nash equilibrium strategy set then you should gain that EV). However it's possible that if an opponent deviates you could also deviate and gain even more EV, with the risk of opening yourself up to exploitation as well.

Why do you say taken to the extreme, is it not the case that any strategy besides the NE strategy will yield a lower EV (or equal at best) in a zero-sum game by definition (assuming both are playing a NE strategy)? I see that second bit, seems intuitive if you consider an exploitative RPS strategy vs a non-equilibrium strategy (e.g. by playing rock vs a player overplaying scissors you are now open to be exploited).

Quote:

Originally Posted by just_grindin

Thanks, that makes sense regarding the abstractions. This paper states CFR has a theoretical convergence bound of O(1/√T) FWIW.
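For reference, the bound is usually stated along the following lines. This is a sketch from memory of the standard CFR result, not a quotation from the paper mentioned above, so treat the constants as an assumption. Here $R_i^T$ is player $i$'s average overall regret after $T$ iterations, $\Delta_{u,i}$ is the range of that player's payoffs, $|\mathcal{I}_i|$ is the number of information sets, $|A_i|$ is the largest number of actions at any of them, and $\bar{\sigma}^T$ is the average strategy profile.

```latex
\begin{align*}
R_i^T &\le \frac{\Delta_{u,i}\,|\mathcal{I}_i|\,\sqrt{|A_i|}}{\sqrt{T}}, \\
R_1^T \le \varepsilon \ \text{and}\ R_2^T \le \varepsilon
  &\implies \bar{\sigma}^T \ \text{is a } 2\varepsilon\text{-Nash equilibrium (two-player zero-sum)}.
\end{align*}
```

Read this way, the algorithm is not guaranteed to land exactly on an equilibrium after finitely many iterations. The average strategy gets within any chosen $\varepsilon$ after on the order of $1/\varepsilon^2$ iterations, which fits with solvers stopping at a chosen accuracy threshold rather than running to an exact answer.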

One thing that is still unclear to me is that the solver is calculating EV based on our opponent's (assumed) range (and our own obviously). So all EV calculations on an identical board with our range identical, but our opponent's changed, would be different. My point is, that there is no 'fixed' equilibrium strategy (i.e. one independent of my opponent's strategy). A number of posts in the game theory thread seem to imply the opposite.


05-27-2020 , 09:04 PM

ArtyMcFly

Carpal \'Tunnel

Join Date: Dec 2014 Posts: 13,256

You seem to have a good idea of how it all works, but there are some sticky threads at the top of this forum that answer some of your original questions.

Quote:

Originally Posted by 2outstwice

My point is, that there is no 'fixed' equilibrium strategy (i.e. one independent of my opponent's strategy).

The Nash Equilibrium, by definition, requires that all participants are rational and are trying to maximize their own EV, given that they believe all other participants are also rational and trying to maximize EV.

If your opponent is just clicking random buttons, or using a random range, or is misapplying what he saw in Pio, he's not being rational, so there wouldn't be an "equilibrium" in the general sense, as such.
To maximally exploit an opponent, you have to know what he's doing. If he's playing irrationally, then following the GTO solution or playing the Nash equilibrium still means he won't beat you, however, because your strategy is unexploitable. He won't find an "exploit" randomly, because there isn't one. His random choices in games like poker will lose money. If you're playing according to the GTO solution, then villain should do likewise, since it is the best response. That's what makes it an equilibrium. There isn't a way to beat it. There's just a "defence" against it, and that defence is part of the equilibrium, arrived at when both players are being rational.


05-28-2020 , 06:13 AM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by 2outstwice

Because I think people take the argument too far with multihanded poker games

Quote:

Originally Posted by 2outstwice

The solutions you're describing would only be equilibrium solutions for the small subgame described by the parameters you feed into the solver.

There is a solution that describes play on all streets for both opponents and would be based on all game states; that would be the full Nash equilibrium for poker.

The subgame Nash equilibria might not even show up in the full solution, depending on what information is provided to the solver.


05-28-2020 , 09:14 AM

#10

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

Quote:

Originally Posted by 2outstwice

I might be misunderstanding but for example, in the driving game, the NE strategies are [L1,L2] & [R1,R2], but [L1,R2] wouldn't be a NE as you state?

The driving game isn't zero-sum; poker is. In HU poker, coordinating with your villain is not beneficial to you. Any benefit to you must come at your villain's expense.

The bottom line is that if you find an unexploitable strategy in poker (a strategy part of a NE), it's unexploitable against any strategy. If it weren't, then it wouldn't be equilibrium to begin with because your villain could start using whichever strategy exploits it. If anyone has an incentive to adjust, that's not an equilibrium.
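The difference between the two cases can be checked mechanically. The sketch below looks at pure strategies only (poker equilibria are mixed, but interchangeability is a known property of all two-player zero-sum games). The 3x3 zero-sum matrix and the helper names are invented for the example.

```python
from itertools import product

def pure_equilibria(row_payoff, col_payoff):
    """Return all pure-strategy Nash equilibria (r, c) of a two-player matrix game."""
    rows, cols = range(len(row_payoff)), range(len(row_payoff[0]))
    found = []
    for r, c in product(rows, cols):
        # Neither player may gain by unilaterally switching to another action.
        row_ok = all(row_payoff[r][c] >= row_payoff[alt][c] for alt in rows)
        col_ok = all(col_payoff[r][c] >= col_payoff[r][alt] for alt in cols)
        if row_ok and col_ok:
            found.append((r, c))
    return found

def interchangeable(equilibria):
    """True if a row strategy from any equilibrium pairs with a column strategy from any other."""
    eq = set(equilibria)
    return all((r1, c2) in eq for (r1, _), (_, c2) in product(eq, eq))

# Driving game: both drivers get 1 if they pick the same side, 0 otherwise (not zero-sum).
driving = [[1, 0], [0, 1]]
driving_eq = pure_equilibria(driving, driving)

# Hypothetical zero-sum game: the column player's payoff is minus the row player's.
zero_sum = [[2, 2, 3], [2, 2, 4], [0, 1, 5]]
negated = [[-x for x in row] for row in zero_sum]
zero_sum_eq = pure_equilibria(zero_sum, negated)

print(driving_eq, interchangeable(driving_eq))    # [(0, 0), (1, 1)] False
print(zero_sum_eq, interchangeable(zero_sum_eq))  # [(0, 0), (0, 1), (1, 0), (1, 1)] True
```

In the zero-sum example every equilibrium also has the same value (2 for the row player), which is why mixing and matching strategies from different equilibria costs nothing.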


05-28-2020 , 11:19 AM

#11

2outstwice

newbie

Join Date: May 2020 Posts: 15

Quote:

Originally Posted by heehaww

Makes sense, is this a theorem for zero-sum games then? Or self-evident/tautological?

Quote:

Originally Posted by just_grindin

So the holy grail for GTO poker? Are there any estimates of the feasibility of eventually finding this? That is, is it a matter of computational difficulty due to a massive game tree, or are there any theoretical hurdles to finding this solution?


05-28-2020 , 01:07 PM

#12

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by 2outstwice

So the holy grail for GTO poker?

Not sure what you mean?

Quote:

Originally Posted by 2outstwice

Are there any estimates of the feasibility of eventually finding this? That is, is it a matter of computational difficulty due to a massive game tree, or are there any theoretical hurdles to finding this solution?

I think heads up limit was declared 'essentially solved' many years ago, when an incredibly low exploitability number was reached.

I think most academics agree that heads up no limit is very close, but yes, computational power is mostly the issue.

My understanding is that there may be issues with solving 3+ player games using the same methods we use for heads up games, but I am just a hobbyist and couldn't say for sure exactly what those difficulties might be, or whether they really exist or it's just a matter of the rapid increase in complexity of the game tree.

I think there were concerns that in 3+ player games collusion amongst agents would increase complexity further or cause issues with convergence, but I'm not sure whether that was a valid concern.


07-17-2020 , 02:47 AM

#13

Roxyyy03

stranger

Join Date: Jul 2020 Posts: 2

I was playing this after playing Gacha Life and terraria mobile vs pc cause sometimes my chips are not enough to play and i need to wait another day to spin again so that i receive some of the free chips from the spin


07-17-2020 , 05:05 AM

#14

Haizemberg93

Carpal \'Tunnel

Join Date: Sep 2016 Posts: 7,254

1/2. In two-player zero-sum games, any combination of NE strategies is still a NE. I think the NE in HUNL is unique, so GTO is that strategy; if it's not unique, GTO means the set of all NE.

3. The solver starts with a random strategy, then at each step calculates the EV of each decision and increases the frequency of the higher-EV ones. Repeating this process reaches the NE or gets close to it, but for a game with far fewer bet sizes than real NL.


07-17-2020 , 08:38 AM

#15

MicroDonkYT

adept

Join Date: Jan 2020 Posts: 818

It seems like most people conflate GTO and NE to mean the same thing. In my mind, solvers spit out the NE for the parameters given, but a GTO strategy will always maximally exploit. NE ends up being the result of 2 known ranges exploiting each other.


07-17-2020 , 12:36 PM

#16

Bob148

Carpal \'Tunnel

Join Date: May 2012 Posts: 11,972

I’ve seen lots of confusion around the whole breakeven/defensive type of play that gets talked about often. Whether you’re talking about GTO, Nash equilibrium, or even a flawed strategy that performs well vs humans by seizing exploitive EV when possible, all of these labels describe strong strategies. So I think it’s important to talk about what strong strategies do.

Strong strategies gain ev in limited ways. Either you win when your opponent folds or you win showdown. That’s it.

What strong strategies do is maximize total EV by creating ranges for checking, calling, betting and raising that benefit each other in a way that maximizes (or nearly maximizes) the EV of every combination in the range that continues in the hand. This is true at every decision point in the game tree.

Good ranges represent a system of ev maximization that gives a product that is greater than the sum of its parts. Think about your winrate with AA per hand; it would not be as profitable without the rest of the hands to create the game of imperfect information.

Likewise, that flush draw you hit sometimes? It wouldn’t be as profitable if you never had missed draws and bluffcatchers in your range.

Every hand is benefiting from some other hand(s) in your range.
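A toy river spot shows the effect in numbers. This is a hedged sketch under strong simplifying assumptions that are not from the post: the bettor's range is only nut hands and pure bluffs, there is a single bet size, and the caller holds only bluffcatchers and best-responds to the bettor's range. The function names are made up for the example.

```python
def call_frequency(value_combos, bluff_combos, pot, bet):
    """Bluffcatcher's best-response calling frequency against a polarized betting range."""
    # Calling wins pot + bet against a bluff and loses bet against a value hand.
    call_ev = bluff_combos * (pot + bet) - value_combos * bet
    if call_ev > 1e-9:
        return 1.0
    if call_ev < -1e-9:
        return 0.0
    # Indifferent: any frequency is a best response, so use the equilibrium one,
    # which makes the bettor's bluffs break even.
    return pot / (pot + bet)

def value_hand_ev(value_combos, bluff_combos, pot, bet):
    """EV of betting a nut hand: the pot when villain folds, pot + bet when called."""
    return pot + call_frequency(value_combos, bluff_combos, pot, bet) * bet

def balanced_bluffs(value_combos, pot, bet):
    """Number of bluff combos that makes calling break even for the bluffcatcher."""
    return value_combos * bet / (pot + bet)

pot, bet, value_combos = 1.0, 1.0, 10
no_bluffs = value_hand_ev(value_combos, 0, pot, bet)
with_bluffs = value_hand_ev(value_combos, balanced_bluffs(value_combos, pot, bet), pot, bet)
print(no_bluffs, with_bluffs)  # 1.0 with no bluffs in range, 1.5 with balanced bluffs
```

In this model the nut hand earns 50% more once the range contains the right number of bluffs, even though the bluffs themselves only break even.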

