Two Plus Two Publishing LLC
Poker Theory General poker theory

Old 05-27-2020, 10:03 AM   #1
2outstwice
newbie
 
Join Date: May 2020
Posts: 15
Game theory questions

A little background - I was a SS cash player pre-BF whilst studying mathematics at university. Got back into the game recently but was always reluctant due to the endless 'poker is dead' and GTO threads, and expected tables full of perfectly balanced 'GTO' opponents. To my surprise, the games are as good as I remember (could be due to lockdowns) which got me interested in the current GTO approach.

I have been reading a few papers and have some (probably basic) questions about this theoretical approach to the game.

1) Is there a formal definition (in mathematical terms) of GTO? Discussions in the forum seem to conflate a number of concepts (Nash equilibria, unexploitability). It seems that most players take GTO to just mean an unexploitable strategy that will not lose to any other strategy (although it will not necessarily be the most profitable against a specific strategy). This paper defines GTO poker as 'non-exploitable'. This seems incomplete to me, as a Nash Equilibrium is composed of a pair of strategies (in a HU game). As I understand it, a NE approximation was found for HULHE by having a bot play itself and constantly adjust its strategy until equilibrium was approached. So might there be more than one NE strategy pair? I think the confusion arises from drawing analogies with simple games like RPS, in which an unexploitable strategy is obvious (play 1/3 of each at random), whereas the situation is more like the Coordination game, in which there are multiple NE strategies, depending on the other player's strategy.

2) A Nash Equilibrium exists for HUNL due to the fixed-point theorem and the fact that NL with discrete bet sizes is a finite game (durrrr seems to be arguing against this in this thread?) Could there be multiple NE strategies for HUNL?

3) How do solvers like PIOSolver work on a technical level? I have watched videos and understand how they are used but how exactly do they work? I know that they calculate the EV of every possible choice in the game tree (given the parameters and the estimated ranges). Is this basically like a PokerStove calculation but applied to an entire game tree rather than a single node?
Old 05-27-2020, 10:42 AM   #2
heehaww
Pooh-Bah
 
Join Date: Aug 2011
Location: Tacooos!!!!
Posts: 4,807
Re: Game theory questions

1. Yes, an equilibrium is a pair of strategies. However, if (A,B) and (X,Y) are equilibrium strategy pairs, then so are (A,Y) and (X,B). Otherwise, neither pair would be an equilibrium because a player would have an incentive to deviate.
Old 05-27-2020, 11:18 AM   #3
2outstwice
newbie
 
Join Date: May 2020
Posts: 15
Re: Game theory questions

I might be misunderstanding but for example, in the driving game, the NE strategies are [L1,L2] & [R1,R2], but [L1,R2] wouldn't be a NE as you state?
Old 05-27-2020, 11:41 AM   #4
just_grindin
Pooh-Bah
 
Join Date: Dec 2007
Posts: 5,263
Re: Game theory questions

Quote:
Originally Posted by 2outstwice View Post
1) Is there a formal definition (in mathematical terms) of GTO? Discussions in the forum seem to conflate a number of concepts (Nash equilibria, unexploitability). It seems that most players take GTO to just mean an unexploitable strategy
GTO in the poker population at large is not well defined. Most people have their own idea of what it means, but there is no formal definition.

In the domain of serious theory and academic research, I believe Nash equilibrium is typically used.

You can read the sticky at the top of the theory section for more terminology and their definitions.





Quote:
Originally Posted by 2outstwice View Post
3) How do solvers like PIOSolver work on a technical level? I have watched videos and understand how they are used but how exactly do they work? I know that they calculate the EV of every possible choice in the game tree (given the parameters and the estimated ranges). Is this basically like a PokerStove calculation but applied to an entire game tree rather than a single node?
Very high level here.

The game is abstracted to a manageable size (i.e. so that it can be evaluated in a 'reasonable' amount of time with a 'reasonable' amount of resources).

Then there is some sort of data structure that keeps track of EV at different decision points. At each decision point an algorithm, typically counterfactual regret minimization (CFR), evaluates the EV and tries to minimize regret/maximize return. This process continues until the change in EV between iterations falls below some chosen threshold, so that the algorithm doesn't run indefinitely.
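
As a rough illustration of the regret-minimization idea, here is a minimal sketch of regret matching in self-play on rock-paper-scissors. It is not PIOSolver's implementation and not full CFR (there is no game tree, only a single decision). The names `train`, `strategy_from_regrets` and `exploitability` are made up for this example, and the starting bias toward rock is an arbitrary assumption so that there is something to correct.

```python
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b] is the payoff to the player choosing a against b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS


def exploitability(strategy):
    """Best EV an opponent can achieve against a fixed mixed strategy."""
    return max(
        sum(strategy[a] * PAYOFF[b][a] for a in range(ACTIONS))
        for b in range(ACTIONS)
    )


def train(iterations):
    """Run self-play and return the average strategy of each player."""
    # Hypothetical starting point: player 0 begins by always playing rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            opponent = strategies[1 - player]
            # EV of each pure action against the opponent's current mix.
            action_values = [
                sum(opponent[b] * PAYOFF[a][b] for b in range(ACTIONS))
                for a in range(ACTIONS)
            ]
            current_value = sum(
                strategies[player][a] * action_values[a] for a in range(ACTIONS)
            )
            for a in range(ACTIONS):
                # Regret: how much better action a would have done.
                regrets[player][a] += action_values[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    average = train(20000)
    print("average strategy:", average[0])
    print("exploitability:", exploitability(average[0]))
```

The strategy played on any single iteration can swing around; it is the average strategy that drifts toward 1/3 each, and the exploitability figure is the kind of number a stopping rule could be based on.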
Old 05-27-2020, 11:56 AM   #5
2outstwice
newbie
 
Join Date: May 2020
Posts: 15
Re: Game theory questions

Quote:
Originally Posted by just_grindin View Post
GTO in the poker population at large is not well defined. Most people have their own idea of what it means, but there is no formal definition.

In the domain of serious theory and academic research, I believe Nash equilibrium is typically used.
This is what I was thinking - the general usage seems to imply people think there is some 'fixed' strategy (i.e. a mixed strategy with static probabilities) that is unexploitable and indifferent to opponent's strategy (equivalent to playing 1/3 of each in RPS), but a strategy in NE is in some way dependent on your opponent's strategy. Of course in reality, human players have leaks and exploitable strategies, implying an exploitative strategy will maximise EV?


Quote:
Originally Posted by just_grindin View Post
Very high level here.

The game is abstracted to a manageable size (i.e. so that it can be evaluated in a 'reasonable' amount of time with a 'reasonable' amount of resources).

Then there is some sort of data structure that keeps track of EV at different decision points. At each decision point an algorithm, typically counterfactual regret minimization (CFR), evaluates the EV and tries to minimize regret/maximize return. This process continues until the change in EV between iterations falls below some chosen threshold, so that the algorithm doesn't run indefinitely.
The abstractions being limited bet sizings on each street right? I'm reading up on the algorithm but is it convergent to equilibrium in a finite number of iterations?
Old 05-27-2020, 03:28 PM   #6
just_grindin
Pooh-Bah
 
Join Date: Dec 2007
Posts: 5,263
Re: Game theory questions

Quote:
Originally Posted by 2outstwice View Post
This is what I was thinking - the general usage seems to imply people think there is some 'fixed' strategy (i.e. a mixed strategy with static probabilities) that is unexploitable and indifferent to opponent's strategy (equivalent to playing 1/3 of each in RPS), but a strategy in NE is in some way dependent on your opponent's strategy. Of course in reality, human players have leaks and exploitable strategies, implying an exploitative strategy will maximise EV?
I think what you described comes from the idea that, by definition, if you are playing one of the Nash equilibrium strategies from the strategy set, then the other player can do no better (and most likely very much worse) than the EV of the other equilibrium strategy in a 2-player constant-sum game.

So taken to the extreme it could mean that you are guaranteed some minimum EV from players deviating (i.e. if their deviation is lower EV than the other strategy in the Nash equilibrium strategy set then you should gain that EV). However it's possible that if an opponent deviates you could also deviate and gain even more EV, with the risk of opening yourself up to exploitation as well.


Yes, in practice the goal is maximizing our own EV, so if you can exploit players for more EV than you would get playing the Nash equilibrium strategy vs their non-equilibrium strategy, then you should do that. In reality, solvers are showing us that Nash equilibrium strategies in poker are so complex that humans could never implement them without computer assistance. So exploiting in human vs human play is almost the only way to play, though online players can probably approximate Nash equilibria better than live players given the tools available.


Quote:
Originally Posted by 2outstwice View Post
The abstractions being limited bet sizings on each street right? I'm reading up on the algorithm but is it convergent to equilibrium in a finite number of iterations?
Betsizing is one game abstraction but my understanding is that there are others to limit the number of strategically different card configurations which further cuts down on the game tree size. Suit isomorphism is an example of this.
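
To make the suit isomorphism idea concrete, here is a small sketch for preflop hold'em hands only. The function name `canonical_preflop` and the two-character card strings such as 'Ah' are assumptions made for this example; real solvers also need board-aware isomorphisms postflop, which this does not attempt.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def canonical_preflop(card1, card2):
    """Map two cards such as 'Kh', 'Ah' to a strategic class such as 'AKs'."""
    (r1, s1), (r2, s2) = card1, card2  # each card is rank + suit
    # Put the higher rank first; only suit equality matters afterwards.
    if RANKS.index(r1) < RANKS.index(r2):
        r1, r2 = r2, r1
    if r1 == r2:
        return r1 + r2  # pocket pair
    return r1 + r2 + ("s" if s1 == s2 else "o")


deck = [rank + suit for rank in RANKS for suit in SUITS]
all_hands = list(combinations(deck, 2))
classes = {canonical_preflop(a, b) for a, b in all_hands}

if __name__ == "__main__":
    print(len(all_hands), "raw starting hands")   # 1326
    print(len(classes), "strategically distinct classes")  # 169
```

Before the flop, which specific suits you hold does not matter, only whether they match, so 1326 two-card combinations collapse to 169 classes.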

I am not sure about the convergence guarantees of the algorithm. I assume any convergence guarantees in the academic literature would be expressed in terms of polynomial or non-polynomial time and so on, in which case it would likely be polynomial time.
Old 05-27-2020, 04:25 PM   #7
2outstwice
newbie
 
Join Date: May 2020
Posts: 15
Re: Game theory questions

Quote:
Originally Posted by just_grindin View Post
So taken to the extreme it could mean that you are guaranteed some minimum EV from players deviating (i.e. if their deviation is lower EV than the other strategy in the Nash equilibrium strategy set then you should gain that EV). However it's possible that if an opponent deviates you could also deviate and gain even more EV, with the risk of opening yourself up to exploitation as well.
Why do you say taken to the extreme, is it not the case that any strategy besides the NE strategy will yield a lower EV (or equal at best) in a zero-sum game by definition (assuming both are playing a NE strategy)? I see that second bit, seems intuitive if you consider an exploitative RPS strategy vs a non-equilibrium strategy (e.g. by playing rock vs a player overplaying scissors you are now open to be exploited).
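
The RPS intuition can be checked with a few lines of arithmetic. This is only a sketch: the mixed strategies below (for example `scissors_heavy` at 20/20/60) are made-up numbers for illustration, and `ev` is a hypothetical helper name.

```python
# PAYOFF[a][b] is hero's payoff for playing a against b.
# Order of actions: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def ev(hero, villain):
    """Expected payoff to hero when both play the given mixed strategies."""
    return sum(
        hero[a] * villain[b] * PAYOFF[a][b]
        for a in range(3)
        for b in range(3)
    )


uniform = [1 / 3, 1 / 3, 1 / 3]
scissors_heavy = [0.2, 0.2, 0.6]  # villain overplays scissors
all_rock = [1.0, 0.0, 0.0]        # hero's maximally exploitative reply
all_paper = [0.0, 1.0, 0.0]       # villain's counter-adjustment

if __name__ == "__main__":
    print("equilibrium vs leak:   ", ev(uniform, scissors_heavy))   # about 0
    print("exploit vs leak:       ", ev(all_rock, scissors_heavy))  # about +0.4
    print("exploit vs counter:    ", ev(all_rock, all_paper))       # -1
    print("equilibrium vs counter:", ev(uniform, all_paper))        # about 0
```

The equilibrium mix earns about zero against everything here, while all-rock wins more against the scissors-heavy player but loses the maximum once that player switches to paper.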


Quote:
Originally Posted by just_grindin View Post

Betsizing is one game abstraction but my understanding is that there are others to limit the number of strategically different card configurations which further cuts down on the game tree size. Suit isomorphism is an example of this.

I am not sure about the convergence guarantees of the algorithm. I assume any convergence guarantees in the academic literature would be expressed in terms of polynomial or non-polynomial time and so on, in which case it would likely be polynomial time.
Thanks, that makes sense regarding the abstractions. This paper states CFR has a theoretical convergence bound of O(1/√T) FWIW.

One thing that is still unclear to me is that the solver is calculating EV based on our opponent's (assumed) range (and our own obviously). So all EV calculations on an identical board with our range identical, but our opponent's changed, would be different. My point is, that there is no 'fixed' equilibrium strategy (i.e. one independent of my opponent's strategy). A number of posts in the game theory thread seem to imply the opposite.
Old 05-27-2020, 09:04 PM   #8
ArtyMcFly
Carpal 'Tunnel
 
Join Date: Dec 2014
Location: Enchantment Under the Sea
Posts: 13,232
Re: Game theory questions

You seem to have a good idea of how it all works, but there are some sticky threads at the top of this forum that answer some of your original questions.
Quote:
Originally Posted by 2outstwice View Post
My point is, that there is no 'fixed' equilibrium strategy (i.e. one independent of my opponent's strategy).
The Nash Equilibrium, by definition, requires that all participants are rational and are trying to maximize their own EV, given that they believe all other participants are also rational and trying to maximize EV.

If your opponent is just clicking random buttons, or using a random range, or is misapplying what he saw in Pio, he's not being rational, so there wouldn't be an "equilibrium" in the general sense, as such.
To maximally exploit an opponent, you have to know what he's doing. If he's playing irrationally, then following the GTO solution or playing the Nash equilibrium still means he won't beat you, however, because your strategy is unexploitable. He won't find an "exploit" randomly, because there isn't one. His random choices in games like poker will lose money. If you're playing according to the GTO solution, then villain should do likewise, since it is the best response. That's what makes it an equilibrium. There isn't a way to beat it. There's just a "defence" against it, and that defence is part of the equilibrium, arrived at when both players are being rational.
Old 05-28-2020, 06:13 AM   #9
just_grindin
Pooh-Bah
 
Join Date: Dec 2007
Posts: 5,263
Re: Game theory questions

Quote:
Originally Posted by 2outstwice View Post
Why do you say taken to the extreme, is it not the case that any strategy besides the NE strategy will yield a lower EV (or equal at best) in a zero-sum game by definition (assuming both are playing a NE strategy)?
Because I think people take the argument too far with multihanded poker games.





Quote:
Originally Posted by 2outstwice View Post
Thanks, that makes sense regarding the abstractions. This paper states CFR has a theoretical convergence bound of O(1/√T) FWIW.

One thing that is still unclear to me is that the solver is calculating EV based on our opponent's (assumed) range (and our own obviously). So all EV calculations on an identical board with our range identical, but our opponent's changed, would be different. My point is, that there is no 'fixed' equilibrium strategy (i.e. one independent of my opponent's strategy). A number of posts in the game theory thread seem to imply the opposite.
The solutions you're describing would only be equilibrium solutions for the small subgame described by the parameters you feed into the solver.


There is a solution that describes play on all streets for both players across all game states; that would be the full Nash equilibrium for poker.

The subgame Nash equilibria might not even show up in the full solution, depending on what information is provided to the solver.
Old 05-28-2020, 09:14 AM   #10
heehaww
Pooh-Bah
 
Join Date: Aug 2011
Location: Tacooos!!!!
Posts: 4,807
Re: Game theory questions

Quote:
Originally Posted by 2outstwice View Post
I might be misunderstanding but for example, in the driving game, the NE strategies are [L1,L2] & [R1,R2], but [L1,R2] wouldn't be a NE as you state?
The driving game isn't zero-sum; poker is. In HU poker, coordinating with your villain is not beneficial to you. Any benefit to you must come at your villain's expense.

The bottom line is that if you find an unexploitable strategy in poker (a strategy that is part of a NE), it's unexploitable against any strategy. If it weren't, then it wouldn't be an equilibrium to begin with, because your villain could start using whichever strategy exploits it. If anyone has an incentive to adjust, that's not an equilibrium.
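
The contrast between the driving game and a zero-sum game can be checked by brute force on small payoff matrices. This is a sketch restricted to pure strategies; the helper names `pure_equilibria` and `interchangeable` and the example zero-sum matrix are assumptions chosen for illustration, not anything taken from poker.

```python
def pure_equilibria(payoff1, payoff2):
    """Return all (row, col) pairs where neither player gains by deviating."""
    rows, cols = len(payoff1), len(payoff1[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            # Row player cannot improve by switching rows.
            row_ok = all(payoff1[r][c] >= payoff1[i][c] for i in range(rows))
            # Column player cannot improve by switching columns.
            col_ok = all(payoff2[r][c] >= payoff2[r][j] for j in range(cols))
            if row_ok and col_ok:
                found.append((r, c))
    return found


def interchangeable(equilibria):
    """True if (A,B) and (X,Y) being equilibria implies (A,Y) and (X,B) are too."""
    found = set(equilibria)
    return all(
        (a, y) in found and (x, b) in found
        for (a, b) in equilibria
        for (x, y) in equilibria
    )


# Driving game: both drivers get 1 if they pick the same side, 0 otherwise.
driving = [[1, 0], [0, 1]]
driving_eq = pure_equilibria(driving, driving)

# A made-up zero-sum game: the column player's payoff is the negative.
zero_sum = [[1, 1, 2], [1, 1, 3], [0, 0, 4]]
negated = [[-v for v in row] for row in zero_sum]
zero_sum_eq = pure_equilibria(zero_sum, negated)

if __name__ == "__main__":
    print(driving_eq, interchangeable(driving_eq))
    print(zero_sum_eq, interchangeable(zero_sum_eq))
```

In the driving game the two equilibria cannot be mixed and matched, whereas in the zero-sum example every combination of equilibrium strategies is itself an equilibrium with the same value.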
Old 05-28-2020, 11:19 AM   #11
2outstwice
newbie
 
Join Date: May 2020
Posts: 15
Re: Game theory questions

Quote:
Originally Posted by heehaww View Post
The driving game isn't zero-sum; poker is. In HU poker, coordinating with your villain is not beneficial to you. Any benefit to you must come at your villain's expense.

The bottom line is that if you find an unexploitable strategy in poker (a strategy that is part of a NE), it's unexploitable against any strategy. If it weren't, then it wouldn't be an equilibrium to begin with, because your villain could start using whichever strategy exploits it. If anyone has an incentive to adjust, that's not an equilibrium.
Makes sense, is this a theorem for zero-sum games then? Or self-evident/tautological?

Quote:
Originally Posted by just_grindin View Post
The solutions you're describing would only be equilibrium solutions for the small subgame described by the parameters you feed into the solver.


There is a solution that describes play on all streets for both players across all game states; that would be the full Nash equilibrium for poker.

The subgame Nash equilibria might not even show up in the full solution, depending on what information is provided to the solver.
So the holy grail for GTO poker? Are there any estimates of the feasibility of eventually finding this? That is, is it a matter of computational difficulty due to a massive game tree, or are there any theoretical hurdles to finding this solution?
Old 05-28-2020, 01:07 PM   #12
just_grindin
Pooh-Bah
 
Join Date: Dec 2007
Posts: 5,263
Re: Game theory questions

Quote:
Originally Posted by 2outstwice View Post
So the holy grail for GTO poker?
Not sure what you mean?

Quote:
Originally Posted by 2outstwice View Post
Are there any estimates of the feasibility of eventually finding this? That is, is it a matter of computational difficulty due to a massive game tree, or are there any theoretical hurdles to finding this solution?
I think heads up limit was declared 'essentially solved' many years ago, when an incredibly low exploitability number was reached.

I think most academics agree that heads up no limit is very close, but yes, computational power is mostly the issue.

My understanding is that there may be issues with solving 3+ player games using the same methods we use for heads up games, but I am just a hobbyist and couldn't say for sure exactly what those difficulties might be, whether they really exist, or whether it's just a matter of the rapid increase in complexity of the game tree.

I think there were concerns that collusion among agents in 3+ player games would increase complexity further or cause issues with convergence, but I'm not sure whether that was a valid concern.


All times are GMT -4.

