Quote:
Originally Posted by statmanhal
I thought I was beginning to understand GTO but this statement casts some doubt. You often see statements made that GTO does not account for villain’s strategy – you can even tell villain what the GTO strategy is and it would not affect the long term outcome. RPS is the simplest example of this. So, I assumed that for every LHE information set (CPRG’s term for the hand history up to the decision point excluding what villain holds since that is unknown) there is a fixed strategy for fold, call, or raise.
Now, one of the foremost CPRG researchers tells us that the ‘essentially solved’ GTO strategy for LHE does account for opponent strategy.
Does this mean that the strategy includes some type of hand/range reading? If so, is the strategy still fixed? If so, does that mean that the fixed strategy includes in some way the various possibilities of what villain may have? Also, if fixed, does that mean that the GTO strategy doesn’t do any learning for use in future play?
Help.
Ha. Well, it's certainly possible I've accidentally misused or misunderstood the term range, but I think I can explain the apparent contradiction.
I'm assuming a player's hand range at some point in the game is, for every hand, the probability that the player would hold that hand at that point in the game.
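To make that definition concrete, here's a rough sketch of how a range falls out of a fixed strategy by Bayes' rule: weight each hand by the chance of being dealt it times the chance the strategy takes the observed actions with it, then normalise. The three-card deck, the bet frequencies, and the name `hand_range` are all made up for illustration; none of this is Cepheus code or data.

```python
def hand_range(prior, action_prob):
    """Probability of each hand, given that the observed actions were taken.

    prior:       hand -> probability of being dealt that hand
    action_prob: hand -> probability the strategy takes the observed
                 action sequence when holding that hand
    """
    weights = {hand: prior[hand] * action_prob[hand] for hand in prior}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the strategy never takes these actions")
    return {hand: weight / total for hand, weight in weights.items()}


# Hypothetical toy game: one card each from a three-card deck.
prior = {"J": 1 / 3, "Q": 1 / 3, "K": 1 / 3}
# Hypothetical strategy: how often each hand bets (numbers are invented).
bet_prob = {"J": 0.25, "Q": 0.0, "K": 0.75}

range_after_bet = hand_range(prior, bet_prob)
print(range_after_bet)
```

Note that the range depends only on the deal and on the player's own strategy, which is why a fixed strategy implies a fixed range at every point in the game.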
I think the bigger problem was a sloppy use of "it." I meant the exploitability computation used to verify correctness, and the CFR/CFR+ algorithm. Both have to use an opponent hand range. The end result, Cepheus, is a static strategy, and it never considers a particular opponent.
The other problem is probably "opponent." Keep in mind that a Nash equilibrium is actually a set of strategies, one for each player: small blind and big blind. Each strategy is the other's opponent.
Any time you want to correctly compute how well you do against a specific opponent in some situation (betting, board, and hand), you DO need to make use of how likely each possible opponent hand is. In the case of computing exploitability, we're using Cepheus' hand range when choosing an action that maximises our value. In CFR and CFR+, each step forward towards a better strategy makes use of the opponent's current strategy. The big blind strategy is updated by looking at the small blind's hand range, and vice versa.
So... all the way through producing and checking an (approximate) Nash equilibrium, you need to consider the opponent's strategy -- it's just that the "opponent" here is the other part of the equilibrium, not some other person who might be playing it.
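Since RPS came up, here's a minimal sketch of the same idea in that game. It is not CFR on poker: it's plain regret matching, the update rule CFR applies at each decision point, run in self-play on rock-paper-scissors. Each player's update uses the other player's current strategy, and the exploitability check takes a best response against the finished, fixed average strategy. The function names, the iteration count, the non-uniform starting regrets, and the convention of summing both players' best-response values are my own choices for the example.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def action_values(player, opp_strategy):
    """Expected payoff of each pure action against the opponent's mix."""
    if player == 0:
        return [sum(PAYOFF[a][b] * opp_strategy[b] for b in range(3))
                for a in range(3)]
    # The column player's payoff is the negative of the row player's.
    return [sum(-PAYOFF[a][b] * opp_strategy[a] for a in range(3))
            for b in range(3)]


def regret_matching(regrets):
    """Play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1 / 3, 1 / 3, 1 / 3]


def self_play(iterations=10000):
    # Arbitrary non-uniform start so the players have something to learn.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        current = [regret_matching(r) for r in regrets]
        for p in (0, 1):
            # The update for player p looks at the OTHER player's strategy.
            values = action_values(p, current[1 - p])
            expected = sum(v * s for v, s in zip(values, current[p]))
            for a in range(3):
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += current[p][a]
    # The average strategy is the output; it is static from here on.
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(profile):
    """Sum of what a best responder wins against each fixed strategy."""
    return sum(max(action_values(p, profile[1 - p])) for p in (0, 1))


average_profile = self_play()
print(average_profile)
print(exploitability(average_profile))
```

The opponent's strategy shows up twice, once inside the training loop and once in the check, but the thing that comes out the end is just a table of probabilities that doesn't change no matter who sits down against it.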