GTO Equations

11-08-2018 , 05:29 AM
I'm interested in the idea of making a GTO solver. Are the equations needed out there somewhere? I couldn't find much so far.
11-08-2018 , 07:14 AM
There is this:

https://www.husng.com/content/will-t...video-pack-2-0
https://m.youtube.com/watch?v=Gm4DXFZLEew
Using 'fictitious play'

Also, research 'counterfactual regret minimization'.
This is probably the way you want to go.

Make it open source.

Last edited by outfit; 11-08-2018 at 07:19 AM.
11-08-2018 , 12:16 PM
Quote:
Originally Posted by outfit
There is this:

https://www.husng.com/content/will-t...video-pack-2-0
https://m.youtube.com/watch?v=Gm4DXFZLEew
Using 'fictitious play'

Also, research 'counterfactual regret minimization'.
This is probably the way you want to go.

Make it open source.
For anyone interested in a working implementation of counterfactual regret minimization: I found a Python implementation of Libratus on GitHub, written by a college student as the final project for their Applied Machine Learning course.

https://github.com/mp3242/coms4995-finalproj


Also, here is a nice video from MIT OpenCourseWare (Matt Hawrilenko is the guest speaker), although it is mostly about postflop decision making: https://youtu.be/kn92WXcKr0M

Last edited by alkimia; 11-08-2018 at 12:21 PM.
11-08-2018 , 01:18 PM
OP wants a solver, not an AI.

Also, if anyone has access to an implementation of MODICUM that would be much appreciated.
11-08-2018 , 02:09 PM
If it is an implementation of counterfactual regret minimization and open source, he should have no problem turning it into a solver.
P.S. Make it open source.
P.P.S. Make sure it runs on Linux too.

Last edited by outfit; 11-08-2018 at 02:16 PM.
11-09-2018 , 11:01 AM
Quote:
Originally Posted by robert_utk
OP wants a solver, not an AI.

Also, if anyone has access to an implementation of MODICUM that would be much appreciated.
Are you referring to the agent described in the Carnegie Mellon paper "Depth-Limited Solving for Imperfect-Information Games"? If so, I've been unable to find an implementation, although unlike the Libratus paper, that paper does contain more of the information you'd actually need to write one. This has sparked my interest and I think I'm going to attempt to implement it. When I'm finished I'll update this post with a link to a GitHub repo with the source (MIT license). I always thought the computing requirements for Libratus were extreme and somewhat unnecessary; a depth-limited approach would seem to solve that issue while giving similar end results.
11-09-2018 , 11:04 AM
Quote:
Originally Posted by alkimia
Are you referring to the agent described in the Carnegie Mellon paper "Depth-Limited Solving for Imperfect-Information Games"? If so, I've been unable to find an implementation, although unlike the Libratus paper, that paper does contain more of the information you'd actually need to write one. This has sparked my interest and I think I'm going to attempt to implement it. When I'm finished I'll update this post with a link to a GitHub repo with the source (MIT license). I always thought the computing requirements for Libratus were extreme and somewhat unnecessary; a depth-limited approach would seem to solve that issue while giving similar end results.


Oh yes, absolutely. Trying to run Libratus on a single PC would be pointless, imo.
11-12-2018 , 04:22 AM
Thanks all.
12-15-2018 , 08:14 PM
Quote:
Originally Posted by alkimia
Are you referring to the agent described in the Carnegie Mellon paper "Depth-Limited Solving for Imperfect-Information Games"? If so, I've been unable to find an implementation, although unlike the Libratus paper, that paper does contain more of the information you'd actually need to write one. This has sparked my interest and I think I'm going to attempt to implement it. When I'm finished I'll update this post with a link to a GitHub repo with the source (MIT license). I always thought the computing requirements for Libratus were extreme and somewhat unnecessary; a depth-limited approach would seem to solve that issue while giving similar end results.
I agree. It takes relatively less processing power, or human thinking power, for a master to compute decisions in real time than for an expert; the expert has to spend more energy to reach the same decision. It's fair to say that Libratus is currently a superhuman expert. By that logic, a pre-trained master would use far less computing power in real time. I have a few ideas about a depth-limited approach that I'd like to unpack, and I'm interested in collaborating. My interest is also sparked!
12-16-2018 , 05:27 PM
I'm also interested in making my own solver, but I'm too lazy, sorry.
12-18-2018 , 05:11 PM
Will Tipton has an open source solver on GitHub and sells a set of videos that walk through it step by step. It is written in Python and has no GUI, but it could easily be customized. I plan to use it but have not yet incorporated it into my Java application.
05-06-2019 , 05:10 PM
I've been working on this for some time and have made some real progress. I haven't released anything yet, but in the process I noticed that one of the largest bottlenecks is equity estimation. Most people probably use Monte Carlo simulations, which can drastically slow things down (hence the need for depth-limited searches). I wrote a JavaScript-based simulator that runs a Monte Carlo simulation against 2 to 10 weighted ranges and produces output similar to Equilab/PokerStove. The issue is that running 100k simulations against 5 weighted ranges can take 3+ seconds in a browser, and probably half that in Node.js. I'm sure I could optimize things a bit for a small improvement, but I don't think it would be much. So what I did was create a dataset of 2-10 random ranges and their associated equities, then built a standard Keras LSTM model and trained it on that dataset. I still need to generate more samples, but so far the results are extremely promising. Hopefully this gives results nearly identical to a 100k Monte Carlo simulation while only needing a single pass through the model: instead of taking 3 seconds for one result, it should produce thousands or more in the same time frame. I'll clean up the source and release it on GitHub in the next couple of days.
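To give a feel for the approach, here is a minimal heads-up sketch of the Monte Carlo side in Python (my actual simulator is JavaScript and handles 2-10 weighted ranges; the range format and helper names here are just illustrative, and it leans on the treys hand evaluator):

```python
# Minimal heads-up Monte Carlo all-in equity sketch.
# Assumes the `treys` evaluator; the range format is illustrative.
import random
from treys import Card, Evaluator

RANKS, SUITS = '23456789TJQKA', 'shdc'
FULL_DECK = [r + s for r in RANKS for s in SUITS]
evaluator = Evaluator()

def equity(hero, villain_range, trials=100_000):
    """hero: e.g. ['As', 'Kd']; villain_range: list of (combo, weight)
    pairs, e.g. [(('Qh', 'Qc'), 1.0), (('Jh', 'Th'), 0.5), ...]."""
    combos, weights = zip(*villain_range)
    wins = ties = total = 0
    for _ in range(trials):
        villain = random.choices(combos, weights)[0]  # sample by weight
        if set(villain) & set(hero):                  # card removal
            continue
        dead = set(hero) | set(villain)
        board = [Card.new(c) for c in
                 random.sample([c for c in FULL_DECK if c not in dead], 5)]
        h = evaluator.evaluate(board, [Card.new(c) for c in hero])
        v = evaluator.evaluate(board, [Card.new(c) for c in villain])
        total += 1
        if h < v:    # in treys, a lower score is a stronger hand
            wins += 1
        elif h == v:
            ties += 1
    return (wins + ties / 2) / total

print(equity(['As', 'Ad'], [(('Ks', 'Kd'), 1.0), (('Qs', 'Qd'), 0.5)]))
```

The slow part is exactly this inner loop, which is why I want to replace it with a single forward pass through a trained model.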
05-09-2019 , 01:28 PM
I know a little about GTO, somewhat more about poker, but I know a lot about modeling, model usage, and particularly model validation. If you are thinking about building a model, the first thing you should do is define model purpose. Is this an academic exercise, something you are doing for fun, are you going to use the model to analyze particular hands post-mortem, or do you intend to use this model to play poker? Maybe everyone who has commented implicitly understands what you want to do, but... maybe not.

In general, until you define model purpose, comments about methodology are premature. I also wonder whether everyone has a common definition of "GTO". To me, GTO means creating an unexploitable strategy in a game with rational players. If that isn't the common definition, it would be helpful to know what it is, so we are all talking about the same thing.

I have serious reservations about the ability to parameterize any model in a way that will provide practical advantages, and question whether it is worthwhile as anything other than an analytical exercise. Having said that, if you want to do it, you should do it with some sound concepts about how to design, implement, and test a model.
05-11-2019 , 02:29 PM
Quote:
Originally Posted by pot_committed
I know a little about GTO, somewhat more about poker, but I know a lot about modeling, model usage, and particularly model validation. If you are thinking about building a model, the first thing you should do is define model purpose. Is this an academic exercise, something you are doing for fun, are you going to use the model to analyze particular hands post-mortem, or do you intend to use this model to play poker? Maybe everyone who has commented implicitly understands what you want to do, but... maybe not.

In general, until you define model purpose, comments about methodology are premature. I also wonder whether everyone has a common definition of "GTO". To me, GTO means creating an unexploitable strategy in a game with rational players. If that isn't the common definition, it would be helpful to know what it is, so we are all talking about the same thing.

I have serious reservations about the ability to parameterize any model in a way that will provide practical advantages, and question whether it is worthwhile as anything other than an analytical exercise. Having said that, if you want to do it, you should do it with some sound concepts about how to design, implement, and test a model.
What I was referring to was a model that estimates all-in equities, similar to PokerStove/Equilab but with weighted ranges. As a model it's a rather simple sequence-to-sequence encoding problem, for which an LSTM is a great fit. Since a neural network is, at its base, a universal function approximator, in theory it should be able to learn to approximate a Monte Carlo simulation. The input would be up to 10 weighted ranges (i.e. a shape of [10, 1326]) with an output of shape [10], one equity per range. Using masking it is rather trivial to allow variable input lengths (i.e. 2-10 input ranges). As far as testing, it is a classic supervised learning problem, so the SOP is the same (cross-validation, etc.).

Basically, this would just make one part of a solver (equity estimation) much faster.
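In sketch form, the model I have in mind looks something like this in Keras (layer sizes here are illustrative, not my actual hyperparameters):

```python
# Sketch of the masked seq-to-seq equity model: up to 10 weighted
# ranges (1326 combo weights each) in, one equity per range out.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MAX_PLAYERS, N_COMBOS = 10, 1326

model = keras.Sequential([
    # Unused player slots are zero-padded; the mask propagates through
    # the LSTM and keeps padded steps out of the loss.
    layers.Masking(mask_value=0.0, input_shape=(MAX_PLAYERS, N_COMBOS)),
    layers.LSTM(256, return_sequences=True),
    # One equity in [0, 1] per range; a softmax across players would
    # instead enforce that the equities sum to 1.
    layers.TimeDistributed(layers.Dense(1, activation='sigmoid')),
])
model.compile(optimizer='adam', loss='mse')

# Toy input: 2 active ranges, 8 padded slots.
x = np.zeros((1, MAX_PLAYERS, N_COMBOS), dtype='float32')
x[0, :2] = np.random.rand(2, N_COMBOS)
print(model.predict(x).shape)  # (1, 10, 1)
```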
05-11-2019 , 04:49 PM
Quote:
Originally Posted by pot_committed
I also wonder whether everyone has a common definition of "GTO". To me, GTO means creating an unexploitable strategy in a game with rational players. If that isn't the common definition, it would be helpful to know what it is, so we are all talking about the same thing.
GTO is short for "Game Theory Optimal". Game theory is a branch of mathematics in which a game is defined as any interaction between two or more rational individuals. It was traditionally focused on zero-sum games specifically, although it can also be applied to non-zero-sum games.

Poker is a sequential, imperfect-information, zero-sum game that is usually represented by a decision/game tree. Because poker is an imperfect-information game, parallel subgames affect how we should play in the subgame we're currently in. For example, in a real-world situation you should be thinking about all of the hands you would have in that spot and how your current hand ranks within that range relative to the situation (board, etc.).

The current state of the art for this problem is counterfactual regret minimization (CFR), where we assume two or more players each playing a rational strategy. Let's consider a simple game like rock, paper, scissors, where the "GTO" strategy is to choose each option randomly one third of the time. The agent learns by taking actions and observing both its regret and its utility; during the learning process it minimizes regret, and the final strategy after many iterations is the normalized sum of the per-iteration strategies. Here is a simple example written in JavaScript: https://pastebin.com/2tr4jNGZ

Here is a fantastic video from one of the lead researchers on the Libratus project: https://youtu.be/McV4a6umbAY
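And for anyone who doesn't want to click through, here is a minimal standalone regret-matching sketch in Python along the same lines as the linked class (the structure and names are mine):

```python
# Two regret-matching agents self-play RPS and converge on the
# 1/3, 1/3, 1/3 equilibrium (as their *average* strategies).
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

class RegretMatcher:
    def __init__(self):
        self.regret_sum = [0.0] * ACTIONS
        self.strategy_sum = [0.0] * ACTIONS

    def strategy(self):
        # Play in proportion to positive accumulated regret.
        pos = [max(r, 0.0) for r in self.regret_sum]
        total = sum(pos)
        s = [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS
        for i in range(ACTIONS):
            self.strategy_sum[i] += s[i]
        return s

    def update(self, my_action, opp_action):
        # Regret for a = what a would have earned minus what we earned.
        earned = payoff(my_action, opp_action)
        for a in range(ACTIONS):
            self.regret_sum[a] += payoff(a, opp_action) - earned

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [round(s / total, 3) for s in self.strategy_sum]

p1, p2 = RegretMatcher(), RegretMatcher()
for _ in range(100_000):
    a1 = random.choices(range(ACTIONS), p1.strategy())[0]
    a2 = random.choices(range(ACTIONS), p2.strategy())[0]
    p1.update(a1, a2)
    p2.update(a2, a1)

print(p1.average_strategy())  # ~[0.333, 0.333, 0.333]
```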

Last edited by alkimia; 05-11-2019 at 05:01 PM.
05-12-2019 , 11:53 AM
Quote:
Originally Posted by alkimia
GTO is short for "Game Theory Optimal". Game theory is a branch of mathematics in which a game is defined as any interaction between two or more rational individuals. It was traditionally focused on zero-sum games specifically, although it can also be applied to non-zero-sum games.

Poker is a sequential, imperfect-information, zero-sum game that is usually represented by a decision/game tree. Because poker is an imperfect-information game, parallel subgames affect how we should play in the subgame we're currently in. For example, in a real-world situation you should be thinking about all of the hands you would have in that spot and how your current hand ranks within that range relative to the situation (board, etc.).

The current state of the art for this problem is counterfactual regret minimization (CFR), where we assume two or more players each playing a rational strategy. Let's consider a simple game like rock, paper, scissors, where the "GTO" strategy is to choose each option randomly one third of the time. The agent learns by taking actions and observing both its regret and its utility; during the learning process it minimizes regret, and the final strategy after many iterations is the normalized sum of the per-iteration strategies. Here is a simple example written in JavaScript: https://pastebin.com/2tr4jNGZ

Here is a fantastic video from one of the lead researchers on the Libratus project: https://youtu.be/McV4a6umbAY
Thanks for the valuable background info. I thought the original request for "the equations" for a "GTO solver" was a bit vague, but maybe not to you guys.

A couple of comments about your R, P, S example. The first is that the GTO strategy is zero EV. That is kind of the point of a GTO strategy in a zero-sum game; both "rational" players ultimately adopt the same strategy, so they are each unexploitable.

The GTO strategy is not just zero EV against another rational player who uses the same strategy; it is zero EV against ALL strategies, even the dumbest. It is zero EV against a player who chooses rock every time: 1/3(0) + 1/3(+1) + 1/3(-1) = 0.

A pure GTO strategy is similarly zero EV in poker (negative EV with the rake), so the value of studying GTO strategy is:

a) to detect when opponents deviate from GTO so we can exploit them, and

b) to understand when we deviate from GTO so we are not exploited.

In any model design I always believe in starting with intended uses. I appreciate your earlier explanation that you are looking for an all-in equity estimator; that is more meaningful to me than "the equations for a GTO solver". If the intended use is insight into theoretically perfect play, for research and advancing the state of play at the highest level, I am all for it. Otherwise I would suggest proceeding with extreme caution.
05-12-2019 , 01:23 PM
Well, GTO for 3+ player sequential zero-sum games is probably not the best route in the first place. In either the video I first linked to, or a related one by the same speaker, he goes into some detail about why the same system will sort of work for 3+ players but is not ideal.

GTO is really about finding a Nash equilibrium, i.e. minimizing your own exploitability. However, CFR can easily be used to find the most profitable strategy against any fixed strategy that isn't the equilibrium; in the simple CFR class I linked to, that is the "train" method. The problem with doing that in real life is that opponents are unlikely to have static strategies and will adjust at some point. For example, in RPS, if an opponent throws rock 5 times in a row, we might conclude they always throw rock and switch to always throwing paper, at which point they could throw scissors and beat us.
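To make that concrete, here is the same regret-matching loop run against a fixed, rock-heavy strategy; it converges to the maximally exploiting response (always paper) rather than the equilibrium:

```python
# Regret matching vs a *fixed* strategy finds the best response,
# not the equilibrium. The 0.8/0.1/0.1 opponent mix is arbitrary.
import random

def payoff(a, b):  # 0 = rock, 1 = paper, 2 = scissors
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

regret_sum, strategy_sum = [0.0] * 3, [0.0] * 3
for _ in range(100_000):
    pos = [max(r, 0.0) for r in regret_sum]
    s = [p / sum(pos) for p in pos] if sum(pos) > 0 else [1 / 3] * 3
    strategy_sum = [t + u for t, u in zip(strategy_sum, s)]
    me = random.choices(range(3), s)[0]
    opp = random.choices(range(3), [0.8, 0.1, 0.1])[0]  # rock-heavy
    regret_sum = [r + payoff(a, opp) - payoff(me, opp)
                  for a, r in zip(range(3), regret_sum)]

print([round(t / sum(strategy_sum), 3) for t in strategy_sum])  # ~[0, 1, 0]
```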

A lot of this was also covered in that video.
05-12-2019 , 02:26 PM
Quote:
Originally Posted by pot_committed

A pure GTO strategy is similarly zero EV in poker (negative EV with the rake), so the value of studying GTO strategy is:

a) to detect when opponents deviate from GTO so we can exploit them, and

b) to understand when we deviate from GTO so we are not exploited.
In theory that is correct, but in practice, in a heads-up game, a player who could actually play an equilibrium strategy would make money. This is because poker is a very complex game and a human opponent is very likely to play a sub-optimal strategy. The reason for using an equilibrium-based strategy is the guarantee that you at least won't lose in expectation, and in practice you will probably win in expectation against most opponents.

Also, exploitation is difficult to model correctly, especially in 3+ player games. The main issue is that if I adjust to a sub-optimal strategy in order to take advantage of one opponent, the other players can adjust as well, or can simply play an equilibrium strategy and beat me, or the player I adjusted for can adjust to my adjustment. Basically, adjusting to exploit another player drastically increases our own exploitability. This ultimately leads to the same back-and-forth we see in the training process of the CFR class I linked to previously, where the agents adjust to each other until they converge on the equilibrium.

Last edited by alkimia; 05-12-2019 at 02:50 PM.
05-12-2019 , 03:25 PM
For modeling exploit-based play, what I've been working on is breaking players into groups based on frequencies/style of play. So far I've found 21 groups of players in my relatively small dataset (25 million hands of online 6-max small stakes games). For each group I then generate a near-optimal strategy for its frequencies: I look at the group's average VPIP, PFR, 3-bet, etc. stats for each position and train a rather simple genetic algorithm to approximate its play. Since the goal is not to develop a truly optimal strategy, but rather one that would be optimal given those specific frequencies, we should get fairly close to how those players play in general. The plan is then to use CFR to find the best strategy against our estimated opponent strategies. This can be done in the background fairly quickly, and it is part of the reason I'm developing the equity estimation model. It would increase our own exploitability, but the profit gained from exploiting player-pool weaknesses would likely more than cover any losses from that, at least at low stakes; as the skill level of the player pool rises, the more we'll want to lean towards equilibrium-based play. I imagine this is very similar to how pros look at it.
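Roughly, the grouping step looks like the sketch below. The clustering method here (k-means on positional frequency vectors) is just one plausible way to do it, and the data is a random stand-in for real database stats:

```python
# Cluster players into groups by positional frequency stats.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per player: e.g. [VPIP, PFR, 3bet] for each of 6 positions,
# flattened to an 18-dim vector (pulled from a hand-history database).
rng = np.random.default_rng(0)
players = rng.random((5000, 18))  # placeholder for real stats

X = StandardScaler().fit_transform(players)  # put all stats on one scale
groups = KMeans(n_clusters=21, n_init=10, random_state=0).fit_predict(X)

# Each group's average frequencies seed the opponent model that the
# genetic algorithm approximates and CFR then best-responds to.
centroids = np.array([players[groups == g].mean(axis=0) for g in range(21)])
print(centroids.shape)  # (21, 18)
```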

I still think that neither equilibrium-based play nor the approach above is the best possible style for 3+ player no-limit games. I have an idea that I think would probably beat the current state of the art in 3+ player games, using a lot of the same tech used in NLP. Basically, I would take hand histories, abstract some things like bet sizes, and then tokenize them using standard text tokenization tools. Then I'd train an embedding layer on millions of those hand histories, and use that embedding combined with CapsNet in a standard reinforcement-learning agent trained through multiplayer self-play. A similar concept could be applied to building models that approximate opponent play; instead of reinforcement learning we could use something similar to a generative adversarial network.
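For example, the bet-size abstraction and tokenization might look something like this (the token scheme is just a rough illustration, not a spec):

```python
# Abstract a hand history into a short token sequence that an
# embedding layer could consume.
def bucket_bet(bet, pot):
    """Map a raw bet size to a coarse pot-fraction token."""
    frac = bet / pot
    for label, ceiling in [('SMALL', 0.4), ('HALF', 0.7), ('POT', 1.2)]:
        if frac <= ceiling:
            return f'BET_{label}'
    return 'BET_OVER'

def tokenize(actions, pot=1.0):
    """actions: [(position, action, size), ...]; pot tracking is rough."""
    tokens = []
    for pos, act, size in actions:
        tokens.append(pos)
        tokens.append(bucket_bet(size, pot) if act in ('bet', 'raise')
                      else act.upper())
        pot += size
    return tokens

print(tokenize([('UTG', 'raise', 2.5), ('BTN', 'call', 2.5), ('BB', 'fold', 0)]))
# ['UTG', 'BET_OVER', 'BTN', 'CALL', 'BB', 'FOLD']
```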
05-12-2019 , 06:17 PM
Quote:
Originally Posted by alkimia
The reason for using an equilibrium-based strategy is the guarantee that you at least won't lose in expectation, and in practice you will probably win in expectation against most opponents.

Also, exploitation is difficult to model correctly ...
This makes sense, especially after watching the Hawrilenko video, which I found fascinating. The RPS analogy threw me off because it is zero EV, and because there are better exploitation strategies against any non-GTO strategy.

There are a couple of differences in the case of poker. Most importantly, in RPS there are no bad decisions at any particular decision point. In poker, a player who opens too wide UTG is making a negative EV decision. I don't have to "exploit" that decision; my +EV will emerge just by playing a GTO strategy. There may be a superior exploitation strategy, but as Hawrilenko notes, those marginal changes are secondary to "know yourself", and focusing on your own ranges.

The second difference is the sequential nature of poker, and the "nested subgame" nature across streets. The implications here are a lot less intuitive, but clearly different than RPS.

Thanks to those who posted additional references. Lots to think about.
05-12-2019 , 08:41 PM
Quote:
Originally Posted by pot_committed
This makes sense, especially after watching the Hawrilenko video, which I found fascinating. The RPS analogy threw me off because it is zero EV, and because there are better exploitation strategies against any non-GTO strategy.

There are a couple of differences in the case of poker. Most importantly, in RPS there are no bad decisions at any particular decision point. In poker, a player who opens too wide UTG is making a negative EV decision. I don't have to "exploit" that decision; my +EV will emerge just by playing a GTO strategy. There may be a superior exploitation strategy, but as Hawrilenko notes, those marginal changes are secondary to "know yourself", and focusing on your own ranges.

The second difference is the sequential nature of poker, and the "nested subgame" nature across streets. The implications here are a lot less intuitive, but clearly different than RPS.

Thanks to those who posted additional references. Lots to think about.

That was a fantastic summary!

For anyone who may be interested, the Hawrilenko video at MIT is here: https://www.youtube.com/watch?v=kn92WXcKr0M


Assuming that's the one you're referring to; if not, I'd love to see any other video from him on the subject.
05-12-2019 , 10:01 PM
Quote:
Originally Posted by Sciolist
Are the equations needed out there somewhere?
haha this is great GL
05-13-2019 , 04:42 PM
Quote:
Originally Posted by alkimia
That was a fantastic summary!

For anyone who may be interested, the Hawrilenko video at MIT is here: https://www.youtube.com/watch?v=kn92WXcKr0M


Assuming that's the one you're referring to; if not, I'd love to see any other video from him on the subject.
You posted that video earlier in this thread. I tried an application similar to his AA hand in the thread below (p. 2). I would appreciate your thoughts on whether I applied the theory properly.

https://forumserver.twoplustwo.com/1...o-btn-1743150/
05-13-2019 , 04:59 PM
I realized there is an error on one of the slides in the video at 45:50. His calculation should be 1/(1.5+1). I initially miscalculated in the other thread and had to go back and edit. If you use the wrong calculation, you get a higher call percentage the bigger the raise, which makes no sense.
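To spell it out (assuming, as I read the slide, this is the minimum-defense / bluff-indifference number): facing a bet of size b into a pot of size p, you defend p/(p+b) of your range, so the bigger the raise, the less you call.

```python
# Minimum defense frequency: defend pot/(pot + bet) so that a
# zero-equity bluff is exactly indifferent.
def mdf(pot, bet):
    return pot / (pot + bet)

print(mdf(1.0, 1.5))  # 0.4 = 1/(1.5 + 1), matching the corrected slide
print(mdf(1.0, 3.0))  # 0.25: bigger raises demand less defense, as expected
```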
07-02-2019 , 08:17 AM
Quote:
Originally Posted by alkimia
I've been working on this for some time and have made some real progress. I haven't released anything yet...
Anything to release, @alkimia? I have tinkered a tiny bit myself, but probably far from the depth you have. Did you end up using Python, JS, or a combination?

      