Quote:
Originally Posted by alkimia
GTO is short for "Game Theory Optimal". Game theory is a branch of mathematics in which a game is any interaction between two or more rational individuals. It was traditionally applied to zero-sum games, although it can also be used for non-zero-sum games.
Poker is a sequential, imperfect-information, zero-sum game that is usually represented by a decision/game tree. Because information is imperfect, parallel subgames affect how we should play in the subgame we are currently in. For example, in a real-world situation you should think about all of the hands you could hold in that spot and where your current hand ranks within that range, given the situation (board, etc.).
The current state of the art for this problem is counterfactual regret minimization (CFR). It assumes two or more players who each play a rational strategy. Let's consider a simple game like rock, paper, scissors, where the "GTO" strategy is to choose each option at random 1/3 of the time. The agent learns by taking actions and observing both the utility it received and its regret, i.e. how much better each alternative action would have done. During learning the agent plays actions in proportion to their accumulated positive regret, which drives its average regret toward zero. The final strategy after many iterations is the normalized sum of the strategies played on each iteration (the average strategy). Here is a simple example written in javascript: https://pastebin.com/2tr4jNGZ
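The learning loop described above can be sketched in Python as regret matching in self-play. This is not the pastebin code; it is a minimal illustration under stated assumptions: the names PAYOFF, strategy_from_regrets and train are made up for this example, the update uses expected utilities rather than sampled actions so that the run is deterministic, and player 0 is given an arbitrary initial bias toward rock so that the dynamics are not trivially stuck at the uniform strategy.

```python
# Regret matching in self-play for rock, paper, scissors (illustrative sketch).
# PAYOFF[a][b] is the utility to a player choosing action a against action b.
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regret_sum):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regret_sum)] * len(regret_sum)


def train(iterations):
    n = 3
    # Assumption for the demo: player 0 starts with some regret for rock.
    regret_sum = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * n for _ in range(2)]
    for _ in range(iterations):
        # Both players fix their strategy before either one updates.
        strategies = [strategy_from_regrets(r) for r in regret_sum]
        for p in range(2):
            opp = strategies[1 - p]
            # Expected utility of each pure action against the opponent.
            action_utils = [sum(PAYOFF[a][b] * opp[b] for b in range(n))
                            for a in range(n)]
            expected = sum(strategies[p][a] * action_utils[a] for a in range(n))
            for a in range(n):
                regret_sum[p][a] += action_utils[a] - expected
                strategy_sum[p][a] += strategies[p][a]
    # The reported strategy is the average of the per-iteration strategies.
    return [[s / iterations for s in sums] for sums in strategy_sum]


if __name__ == "__main__":
    for player, avg in enumerate(train(100_000)):
        print(player, [round(x, 3) for x in avg])
```

The per-iteration strategies keep cycling; it is only the average over all iterations that should settle near 1/3 for each action.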
Here is a fantastic video from one of the lead researchers on the libratus project: https://youtu.be/McV4a6umbAY
Thanks for the valuable background info. I thought the original request for "the equations" for a "GTO solver" was a bit vague, but maybe not to you guys.
A couple of comments about your R, P, S example. The first is that the GTO strategy is zero EV. That is kind of the point of a GTO strategy in a zero sum game; both "rational" players ultimately adopt the same strategy so they are each unexploitable.
The GTO strategy is not just zero EV against another rational player who uses the same strategy; it is zero EV against ALL strategies, even the dumbest. It is zero EV against a player who chooses rock every time.
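The zero EV claim for R, P, S is easy to check by hand or with a few lines of Python. The sketch below is only an illustration; the names PAYOFF and expected_value and the example opponent strategies are made up for it, with a win counted as +1, a loss as -1 and a tie as 0.

```python
from fractions import Fraction

# PAYOFF[a][b] is the utility to the player choosing a against b.
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_value(hero, villain):
    """EV for hero when both players mix independently over the 3 actions."""
    return sum(hero[a] * villain[b] * PAYOFF[a][b]
               for a in range(3) for b in range(3))


gto = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(expected_value(gto, always_rock))  # 0
print(expected_value(gto, rock_heavy))   # 0
```

Against the always-rock player the uniform strategy ties 1/3 of the time, wins 1/3 and loses 1/3, so the wins and losses cancel exactly.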
A pure GTO strategy is similarly zero EV in poker when the opponent also plays GTO (negative EV with the rake), so the value of studying GTO strategy is:
a) to detect when opponents deviate from GTO so we can exploit them, and
b) to understand when we deviate from GTO so we are not exploited.
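Both points can be illustrated in R, P, S with a best-response calculation. This is a hedged sketch, not a poker tool: the names PAYOFF, ACTIONS and best_response and the example "rock-heavy" opponent are assumptions made up for the illustration.

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]
# PAYOFF[a][b] is the utility to the player choosing a against b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def best_response(villain):
    """Return (action, EV) of the pure action that does best against villain.

    The EV is also how exploitable villain's strategy is: zero means there
    is nothing to gain, anything positive is what a deviation gives away.
    Ties are broken in favour of the first action in ACTIONS.
    """
    evs = [sum(PAYOFF[a][b] * villain[b] for b in range(3)) for a in range(3)]
    best = max(range(3), key=lambda a: evs[a])
    return ACTIONS[best], evs[best]


gto = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(best_response(rock_heavy))  # ('paper', Fraction(1, 4))
print(best_response(gto))         # ('rock', Fraction(0, 1))
```

The same function covers both uses: run it on an opponent's observed frequencies to find the exploit, or on our own frequencies to see how much a deviation gives away.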
In any model design I always believe in starting with intended uses. I appreciate your earlier explanation that you are looking for an all-in equity estimator. That is more meaningful to me than "the equations for a GTO solver". If the "intended use" is insight into theoretically perfect play for research and advancing the state of play at the highest level, I am all for it. Otherwise I would suggest proceeding with extreme caution.