Alberta university Poker 'bot "solves" heads up limit hold 'em - Page 4 - Poker Theory

Simple question: If the computer doesn't lose money to the opponent, will it lose money to the rake?

So far every book on GTO, every study and simulation seems to ignore the rake.

Depends on the opponent. If his deviation from GTO costs him more than the rake, then the bot makes money.


01-21-2015 , 07:29 PM

#77

bachfan

adept

Join Date: Nov 2005 Posts: 888

Quote:

Originally Posted by Shandrax

Simple question: If the computer doesn't lose money to the opponent, will it lose money to the rake?

1 - HUHU LHE with rake is not the same game as HUHU LHE with no rake. Any claims about the strength of Cepheus' strategy in a raked environment are speculative at this point. (I strongly suspect it would still beat all humans, or at least lose less than they do, but that's just a guess. I also suspect it would win enough to overcome the rake against a typical good-but-not-world-class $2/$4 player, but I'm less confident there.)

2 - Imagine an AI were somehow created that found an exact Nash equilibrium for a game with rake. It would still have to lose money to the rake in the worst case. This becomes obvious if the "don't lose to rake" strategy faces off against itself: how would it be possible for neither player to lose to the rake?
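The self-play argument in point 2 is just arithmetic. A minimal sketch, assuming the rake is a fixed amount per hand and that the game is zero-sum before rake; `after_rake` and its `share_a` parameter are hypothetical names used only for illustration.

```python
def after_rake(pre_rake_edge_a: float, rake: float, share_a: float = 0.5):
    """Return (EV_a, EV_b) per hand after rake.

    pre_rake_edge_a: A's expected win per hand before rake; B's is the
    negative of it, because the unraked game is zero-sum.
    share_a: hypothetical fraction of the rake that A pays on average.
    """
    ev_a = pre_rake_edge_a - share_a * rake
    ev_b = -pre_rake_edge_a - (1.0 - share_a) * rake
    return ev_a, ev_b


# Self-play: identical strategies, so no pre-rake edge and an even rake split.
ev_a, ev_b = after_rake(0.0, rake=0.1)
print(ev_a, ev_b)    # both players are negative
print(ev_a + ev_b)   # the pair together always loses exactly the rake
```

Whatever the edge and the split, the two results sum to `-rake`, so at least one player loses at least half the rake per hand. With a large enough pre-rake edge, e.g. `after_rake(0.2, 0.1)`, the stronger player stays positive, which matches the point that the bot profits when the opponent's deviations cost more than the rake.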


01-22-2015 , 10:59 AM

#78

skario

enthusiast

Join Date: Jan 2007 Posts: 87

Quote:

Originally Posted by ArtyMcFly

That figure was based on their "old" method of storing the solution. Tammelin's compression method got it down from over 500 terabytes to about 12 terabytes and from a predicted 100k CPU years to around 900 (just 70 human days at the facility used for Cepheus).
As GPUs get more powerful and cheaper, and compression algorithms are improved, future "brute force" solutions to games will no doubt be even quicker.

Check out the Thinking Poker Podcast (interview starts at about 44 minutes in) for the story so far, and the team's plans for the future.

A minor correction: it was the '+' in CFR+ that really made it faster, not the compression. CFR+ is 10+ times faster than regular CFR. Also, the way it works (clamping negative regrets to zero and traversing the whole tree on every iteration, without any sampling) keeps the range of regret values limited and makes them change "smoothly" across similar boards and hole cards, and that's what makes compression easy. The biggest challenge with compression was making the ratio high while still performing reasonably well (70 days was bad enough). I think there's still room for improvement there; I'd be interested to hear if anyone can do better (you can find the source code online).
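The regret update difference described above can be sketched on a single two-action decision. This is a minimal illustration assuming hypothetical instantaneous regrets; it shows only the regret-matching+ clamp, and the full CFR+ algorithm has other ingredients (such as how the average strategy is weighted) that are not shown here.

```python
def regret_matching(cum_regrets):
    """Current strategy: play each action in proportion to its positive regret."""
    positives = [max(r, 0.0) for r in cum_regrets]
    total = sum(positives)
    n = len(cum_regrets)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret anywhere: play uniformly
    return [p / total for p in positives]


def update_cfr(cum_regrets, inst_regrets):
    """Vanilla CFR: cumulative regrets can drift arbitrarily negative."""
    return [r + d for r, d in zip(cum_regrets, inst_regrets)]


def update_cfr_plus(cum_regrets, inst_regrets):
    """CFR+ (regret-matching+): floor each cumulative regret at zero after every update."""
    return [max(r + d, 0.0) for r, d in zip(cum_regrets, inst_regrets)]


# Hypothetical instantaneous regrets: action 0 looks bad for a while,
# then becomes the better action.
history = [[-5.0, 5.0]] * 10 + [[5.0, -5.0]] * 3

cfr, cfr_plus = [0.0, 0.0], [0.0, 0.0]
for inst in history:
    cfr = update_cfr(cfr, inst)
    cfr_plus = update_cfr_plus(cfr_plus, inst)

print(cfr, regret_matching(cfr))            # still stuck on action 1
print(cfr_plus, regret_matching(cfr_plus))  # already shifting to action 0
```

Because the clamped values never go below zero, vanilla CFR's large negative "debt" never builds up, so the stored numbers stay in a narrower range, which is the property the post credits for making compression easier.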


01-22-2015 , 11:23 AM

#79

Shandrax

Pooh-Bah

Join Date: Mar 2005 Posts: 4,818

Quote:

Originally Posted by bachfan

The reason I am asking is this: Most theory assumes "clean" conditions without a rake. It is like analyzing red/black in Roulette while ignoring the zero.

The rake is the exact reason why games dry out, so it must be taken into account somehow. Beating the Dealer is what Blackjack is all about, but nobody seems to think about the Dealer in Poker. So in the end, the University of Alberta has at best solved home-game limit HU.


01-22-2015 , 03:33 PM

#80

Donkem

journeyman

Join Date: Feb 2014 Posts: 359

Quote:

Originally Posted by TakenItEasy

Is GTO proven?
I'm not saying a convergence to 0 from a positive value vs an adaptive exploitative player is enough to necessarily define a GTO state...
... but I don't see how proving GTO is possible. For instance, both players could fall into the same local minimum where neither is optimal, yet the learning trials could still appear to converge, and convergence towards $0 against an opponent with the same flaw would also be satisfied.

Yes, this. GTO doesn't exist in any form of poker imo, unless you decide to consider multiple adaptation strategies as the GTO solution. If GTO existed in poker, then players' gameplay would quickly converge towards it.

Quote:

Originally Posted by Shandrax

Simple question: If the computer doesn't lose money to the opponent, will it lose money to the rake?

They say Cepheus is self-taught; however, they claim it plays GTO, so I guess it doesn't really adjust to the specific player it faces but plays according to its whole database. If this is true, then it is definitely not the mathematical nemesis, but probably a strategy that beats a large number of strategies without necessarily being unbeatable.

If this is really "GTO", then I'm guessing it could have good results (beat the rake). It will always be a question of how skillful the opponents are: certain opponents will lose more to GTO, some more to a nemesis, because the nemesis might be capable of exploiting some specific spots better. Depends on the field, I'd say. But imo GTO doesn't exist. People perceive temporarily good strategies as GTO strategies, but u can always adapt.


01-22-2015 , 05:50 PM

#81

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

Quote:

Originally Posted by Donkem

If GTO existed in poker then players' gameplay would quickly converge towards it.

How can you be confident about that? How quick is quickly and what counts as convergence?

Quote:

But imo GTO doesn't exist.

Not even for heads-up poker? How would it be possible for every strategy to be exploitable in HU?


01-22-2015 , 06:10 PM

#82

bachfan

adept

Join Date: Nov 2005 Posts: 888

Quote:

Originally Posted by Donkem

But imo GTO doesn't exist.

There exists at least one strategy for heads-up limit hold'em with a 4-bet cap and no rake that cannot be beaten. This is not a matter of opinion.
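For any finite two-player zero-sum game, von Neumann's minimax theorem guarantees a (possibly mixed) strategy that secures the game's value no matter what the opponent does. HU limit hold'em is far too large to show, so here is a minimal sketch on a hypothetical 2x2 game (the payoffs are made up), using the standard closed-form solution for 2x2 games without a pure saddle point.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d):
    """Maximin mix for the row player in the zero-sum game [[a, b], [c, d]].

    Payoffs are to the row player; assumes no pure saddle point, so both
    players mix. Returns (p, v): play row 0 with probability p to guarantee v.
    """
    denom = a - b - c + d
    p = Fraction(d - c, denom)
    v = Fraction(a * d - b * c, denom)
    return p, v


def payoff(p, q, a, b, c, d):
    """Row's expected payoff when row plays row 0 w.p. p and column plays column 0 w.p. q."""
    return (p * q * a + p * (1 - q) * b
            + (1 - p) * q * c + (1 - p) * (1 - q) * d)


# Hypothetical toy game (integer payoffs to the row player).
a, b, c, d = 2, -1, -1, 1
p, v = solve_2x2(a, b, c, d)
print(p, v)  # 2/5 1/5
for q in (Fraction(0), Fraction(1, 3), Fraction(1)):
    print(q, payoff(p, q, a, b, c, d))  # the same value whatever the column does
```

The row player's mix earns exactly the game value against every column strategy, so no opponent can push it below that value; this is the sense in which an equilibrium strategy "cannot be beaten".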

Last edited by bachfan; 01-22-2015 at 06:16 PM. Reason: added "and no rake"


01-22-2015 , 06:13 PM

#83

kaby

Carpal \'Tunnel

Join Date: Jan 2007 Posts: 6,230

Quote:

Originally Posted by Donkem

Yes, this. GTO doesn't exist in any form of poker imo.

If you can prove this, all you have to do is go and pick up the Nobel. Go for it!


01-22-2015 , 06:53 PM

#84

Donkem

journeyman

Join Date: Feb 2014 Posts: 359

Quote:

Originally Posted by heehaww

How can you be confident about that? How quick is quickly and what counts as convergence?

Thousands of heads-up games have been played every day for years now, and players face the same opponents over and over, which accelerates the learning process. Yet after so many years of online poker there's still no definitive approach to it, and very little can be said with much certainty about winning strategies.

Poker is an incomplete-information game with a large number of variables, most of them taking discrete values, which makes finding solutions much harder than for continuous functions. In mathematics, multidimensional systems usually have multiple solutions, and some problems can't even be solved with existing knowledge. As a multidimensional system, poker is unlikely to have a single solution.

Quote:

Originally Posted by heehaww

Not even for heads-up poker? How would it be possible for every strategy to be exploitable in HU?

Especially HU poker, where u play most hands. U can always narrow and widen ranges.

"How would it be possible for a strategy not to be exploitable in HU?" is more like it.


01-22-2015 , 07:37 PM

#85

plexiq

veteran

Join Date: Apr 2007 Posts: 2,554

I'd like to suggest a containment thread for Nash existence skepticism posts. Please mods?


01-23-2015 , 12:37 AM

#86

ArtyMcFly

Carpal \'Tunnel

Join Date: Dec 2014 Posts: 13,256

Quote:

Originally Posted by Donkem

If GTO existed in poker then players gameplay would quickly converge towards it.

Humans are not good at solving complex games, partly because they don't have the time or resources.
In chess, as in poker, there is an optimal move for every situation. Chess has been played and studied for hundreds of years, and yet humans still don't know for sure whether e4 or d4 is the best opening. Chess computers also don't know for sure, because the game tree is so massive, but they can perfectly solve endgames with only a few pieces left. This is why top players do post-game analysis using chess bots: to find out where they made mistakes. (A freeware chess bot can do stuff like find a forced mate in 12 moves. Most humans can't visualize more than a handful of boards at once, in the same way that humans can't visualize every possible board runout in hold'em. A program like Cepheus has a database it can look in to find the optimal move in every situation, i.e. it has the solution.)

Quote:

Originally Posted by Donkem

Cepheus has played more hands (trillions!) than every human in the history of the world added together. Simply put, it learned more about poker in 70 days than humanity learned in 70 years. This means it's better than anyone, obviously. Is that really hard for you to understand or believe, or did you simply not bother reading the literature?


01-23-2015 , 07:56 AM

#87

QuadZeros

enthusiast

Join Date: Dec 2012 Posts: 81

Quote:

Originally Posted by David Sklansky

Obviously, if the rules allow implicit collusion against you, your best (but still losing) strategy would change from what it would be if collusion weren't allowed.

Right, but a player at a three-handed table doesn't necessarily know that the other players are colluding.

The U of Alberta team that solved Limit Hold'em were just on the Thinking Poker Podcast again. This time, they went into a lot of detail about how they ran this collusion experiment: they had a simplified 3-handed LHE game running at a Nash equilibrium (meaning no player could benefit by changing only his own strategy). They changed one player's strategy to "always raise" and kept the other two at their 3-handed Nash equilibrium strategies. The "always raise" player lost a little, the player to his right lost a LOT, and the player to his left won a LOT.

Here's a link to the new podcast (episode 110). Lots of fascinating stuff that cuts through the gibberish in these 2+2 threads. The 3-handed collusion material starts at maybe 1:30 or so. They also talk about the scale of the challenge of HU NLHE (the game space is something like 10^150 times larger than HU LHE).
http://www.thinkingpoker.net/2015/01...-solves-hulhe/

NB: Collusion to them doesn't mean sharing cards; they call that cheating. Collusion means taking an action that is not optimally beneficial to you, but choosing it because it benefits another player.

Last edited by QuadZeros; 01-23-2015 at 07:57 AM. Reason: spacing for readability


01-23-2015 , 08:13 AM

#88

Donkem

journeyman

Join Date: Feb 2014 Posts: 359

Quote:

Originally Posted by ArtyMcFly

Humans are not good at solving complex games, partly because they don't have the time or resources.
In chess, as in poker, there is an optimal move for every situation. Chess has been played and studied for hundreds of years...A program like Cepheus has a database it can look in to find the optimal move in every situation. i.e. it has the solution)...Cepheus has played more hands (trillions!) than every human in the history of the world added together...This means it's better than anyone, obviously. Is that really hard for you to understand or believe, or did you simply not bother reading the literature?

I'm not knowledgeable about chess, but FL HU, for instance, is supposedly simpler: 10^14 states compared to chess's 10^50. But yeah, I guess the convergence thing is debatable.

But about the Nash push/fold equilibrium: when u r short-stacked and forced into shoving, u r pretty much at a boundary of the system. The variables whose values come from villain's play become irrelevant, so the game temporarily becomes a complete-information one, because u can calculate the best solution regardless of the villain.

An example:

U have a 1bb stack and u r in the SB, so u only have 0.5bb left: either u fold, or u call and are automatically all-in. U r getting 3-to-1 pot odds (u need 25% equity) against a full range, so u should call with everything, because u'll have at least 29% equity even with 23o against that range. Yet this doesn't prove that there has to be a general solution when deep.
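The pot-odds arithmetic in this example can be checked directly. A minimal sketch assuming no rake, blinds of 0.5bb/1bb, and the post's own 29% equity figure for 23o (not computed here); the function names are hypothetical.

```python
def required_equity(to_call: float, pot_before_call: float) -> float:
    """Break-even equity for a call: the call divided by the pot after calling."""
    return to_call / (pot_before_call + to_call)


def call_ev_vs_fold(equity: float, to_call: float, pot_before_call: float) -> float:
    """Chips (in bb) gained by calling instead of folding, ignoring rake."""
    return equity * (pot_before_call + to_call) - to_call


# SB has posted 0.5bb of a 1bb stack; BB has posted 1bb.
pot = 0.5 + 1.0   # 1.5bb in the middle
to_call = 0.5     # the SB's last 0.5bb, which puts him all-in
print(required_equity(to_call, pot))        # 0.25
print(call_ev_vs_fold(0.29, to_call, pot))  # about +0.08bb
```

Any hand with more than 25% equity against BB's full range is a profitable call here, which is why the whole range calls.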

In a deep-stack game, since u have no info on your villain, the game is an underdetermined system: u need to attribute values to your opponent's actions (ranges), or u can't find a strategy that solves the problem. So every change in ranges changes the system and therefore the possible solutions.

So in terms of strategies, instead of getting:

GTO > A, B, C, D, E, ...

U get:

A > B, C
B > C, D, F
C > D, E
D > A
...
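A cyclic ordering like the one above (each strategy beaten by some other) is the structure of rock-paper-scissors. A minimal sketch, using RPS as a stand-in for such a strategy set, measures how much a best-responding opponent can win against a given mix; the names are hypothetical.

```python
from fractions import Fraction

# Rock-paper-scissors payoffs to the player choosing the row (0=rock, 1=paper, 2=scissors).
# The cycle: paper beats rock, scissors beats paper, rock beats scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def value_vs(action, mix):
    """Expected payoff of a pure action against an opponent playing `mix`."""
    return sum(PAYOFF[action][j] * mix[j] for j in range(3))


def exploitability(mix):
    """What a best-responding opponent wins per game against `mix`.

    The game is symmetric, so the opponent's best pure action scores value_vs(i, mix).
    """
    return max(value_vs(i, mix) for i in range(3))


third = Fraction(1, 3)
print(exploitability([1, 0, 0]))                                        # 1: paper crushes pure rock
print(exploitability([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))  # 1/4
print(exploitability([third, third, third]))                            # 0: nothing to exploit
```

A cycle among pure strategies rules out an unbeatable *pure* strategy, but the uniform mix has exploitability 0. For two-player zero-sum games, the minimax theorem guarantees that such an unexploitable mixed strategy exists.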


01-23-2015 , 09:10 AM

#89

Wolfram

Carpal \'Tunnel

Join Date: Jan 2006 Posts: 15,066

this has to be the most elaborate troll ever


01-23-2015 , 09:40 AM

#90

Donkem

journeyman

Join Date: Feb 2014 Posts: 359

Forgot to say something: just because Cepheus has played more hands than a human doesn't mean it makes better use of them. Intelligent thinking in humans will largely outclass Cepheus'. In between hands, a person will analyse possible strategies, possible future hands and spots, and the best approaches to take, so from a certain perspective a human may learn more per hand. He may at least draw more conclusions (which for bad players won't exactly be a plus in comparison to Cepheus).


01-23-2015 , 11:45 AM

#91

RustyBrooks

Carpal \'Tunnel

Join Date: Feb 2006 Posts: 24,647

Donkem,

Please visit, at the very least, the sticky on terminology. I don't think you know what a Nash equilibrium is, or what people mean by GTO.

Love,
Rusty


01-23-2015 , 12:59 PM

#92

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

Quote:

Originally Posted by Donkem

...so from a certain perspective a human may learn more per hand.

Maybe, but it's irrelevant since no human alive can beat Cepheus. (Or can you name someone?)


01-23-2015 , 02:06 PM

#93

droller

adept

Join Date: Jan 2009 Posts: 929

Always disliked this group... I can't see how this could ever be good for the poker world, especially with regard to legalization, regulation and getting governments on board. All the decision makers see is that a computer program has been created to outplay humans and take all their money. Perfect....

Way to go U of A... You "solved" a game practically no one plays anymore staged in a magical fantasy world with no rake!

Last edited by droller; 01-23-2015 at 02:14 PM.


01-23-2015 , 02:23 PM

#94

skario

enthusiast

Join Date: Jan 2007 Posts: 87

Quote:

Originally Posted by plexiq

I'd like to suggest a containment thread for Nash existence skepticism posts. Please mods?

Yes, this is supposed to be the poker theory forum. Could the nonsense be moved elsewhere?


01-23-2015 , 04:00 PM

#95

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by plexiq

+1, would be very interested in details about this. IIRC one of their papers said there will be a separate paper about the compression used. (I think it was Tammelin / "Solving Large Imperfect Information Games Using CFR+".)

Fwiw, you don't really need efficient random access for CFR+. You are traversing the game tree in a very ordered fashion, so it's possible to efficiently compress/decompress and even disk-swap large chunks of the tree as needed. The really impressive part was the huge reduction they achieved; I believe 500TB->11TB was mentioned in one of the sources?

@Rusty: They did canonicalize the cards/boards, but I think the compression mentioned worked on the remaining regret data.

Yup, suit-identical hands/boards were grouped together to make the game smaller. With 8-byte values this would be 262TB (~500TB for traditional CFR, as it uses two 8-byte values for each choice). Using a single 4-byte value per choice still requires 131TB. Compression sits on top of that, getting down to 10.9TB using an average of ~2.6 bits per value.
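The storage figures above are consistent with one another, as a quick back-of-the-envelope check shows. This sketch assumes decimal terabytes (10^12 bytes) and derives the number of stored values from the 262TB figure.

```python
TB = 10**12  # assumption: decimal terabytes

num_values = 262 * TB / 8                     # one 8-byte value per choice
vanilla_tb = 2 * 8 * num_values / TB          # vanilla CFR: two 8-byte values per choice
single_4_byte_tb = 4 * num_values / TB        # one 4-byte value per choice
bits_per_value = 10.9 * TB * 8 / num_values   # average cost after compression

print(f"{num_values:.3e} values")
print(vanilla_tb, single_4_byte_tb)           # about 524 and 131
print(round(bits_per_value, 2))               # about 2.66 bits per value
```

So the "~500TB" and "~2.6 bits" numbers are rounded versions of about 524TB and about 2.66 bits per value.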

The compression uses other boards and hands to predict values (compression is very much about prediction.) We get better compression by ordering the boards and the hands: similar hands on similar boards are likely to have similar amounts of regret for actions.
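The idea that compression is about prediction can be illustrated with generic predictive coding: predict each value from its neighbour in a good ordering and store only the error. This is a minimal sketch with hypothetical quantized regret values, not the codec Cepheus used; a real compressor would also entropy-code the residuals.

```python
def residuals(values):
    """Predict each value from the previous one and keep only the prediction error."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def reconstruct(res):
    """Invert residuals(): add the errors back onto the running prediction."""
    prev, values = 0, []
    for r in res:
        prev += r
        values.append(prev)
    return values


def signed_bits(x):
    """Bits needed to store x as a signed integer."""
    return abs(x).bit_length() + 1


# Hypothetical regrets for similar hands on similar boards, stored in a sorted order.
values = [1200, 1203, 1199, 1205, 1203, 1201, 1206, 1204]
res = residuals(values)
print(res)
print(max(signed_bits(v) for v in values), "bits raw")
print(max(signed_bits(r) for r in res[1:]), "bits per residual after the first")
```

The better the ordering groups similar hands and boards, the smaller the residuals and the fewer bits each value costs.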

As far as random access goes, it's terrible, but as plexiq said, it doesn't matter. We don't need to worry because there's no sampling, and we visit the tree in the same order every time: decompress old values as we need them, update things, recompress, and move onwards.
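The fixed-order "decompress, update, recompress, move on" loop can be sketched as follows. This is a minimal illustration assuming the table is split into chunks of 32-bit integers; zlib stands in for the real codec, and `pack`, `unpack`, `one_pass` and the toy update are hypothetical names.

```python
import struct
import zlib


def pack(values):
    """Compress a chunk of 32-bit signed integers (zlib stands in for a real codec)."""
    return zlib.compress(struct.pack(f"<{len(values)}i", *values))


def unpack(blob):
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


def one_pass(chunks, update):
    """Visit every chunk in the same fixed order: decompress, update, recompress.

    Only the current chunk is ever uncompressed, and no random access is needed.
    """
    for k in range(len(chunks)):
        values = unpack(chunks[k])
        chunks[k] = pack([update(k, i, v) for i, v in enumerate(values)])


# Hypothetical regret table split into 4 chunks of 1000 values each.
chunks = [pack([0] * 1000) for _ in range(4)]


def step(k, i, v):
    # Toy update: add +1/-1 and floor at zero, like a CFR+-style regret update.
    return max(v + (1 if i % 2 else -1), 0)


for _ in range(2):
    one_pass(chunks, step)
print(unpack(chunks[0])[:4], sum(len(c) for c in chunks), "compressed bytes")
```

Because there is no sampling, each pass touches the chunks in the same sequence, so the poor random-access performance of the compressed format never matters.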


01-23-2015 , 04:06 PM

#96

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by TakenItEasy

I forgot to mention that if it doesn't take "Range Removal" into account when doing its calculations, then I'd say that an identical bot that did take range removal into account would have to beat it by a small amount, given the slightly improved accuracy of its probabilities.

Please don't confuse this with card removal when adjusting for the removal of known cards from the stub. It's based on the fact that the opponent's cards are no longer random once he has acted, and each action thereafter defines his range more and more.

When removing a "range" from the stub, you end up with a deck of weighted averages that sum up to the stub-2 instead of integers which can be a hassle for factorials but for a computer, it's a trivial distinction.

Of course HU limit is the least effective application for range removal, but with all other things being equal, any added edge regardless of how small will determine the winner.

It definitely takes the opponent strategy into account. The numbers it uses are always the expected value it would get for a hand, given how the opponent plays. So, card removal because of my hand and the board, and a weighted average across possible opponent hands given the opponent strategy.
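The weighted average described above can be sketched as follows. This is a minimal illustration with hypothetical card strings, range weights and payoffs; it is not Cepheus code, and a real implementation would use a hand evaluator and every possible holding.

```python
def weighted_ev(my_cards, board, opp_range, payoff):
    """Expected value of a hand/action, averaged over the opponent's possible holdings.

    opp_range: {holding (tuple of two cards): probability that the opponent's
               strategy reaches this point with that holding}
    payoff:    function(holding) -> our value if the opponent holds `holding`
    """
    dead = set(my_cards) | set(board)
    total_weight = 0.0
    total_value = 0.0
    for holding, weight in opp_range.items():
        if dead & set(holding):   # card removal: this holding is impossible
            continue
        total_weight += weight
        total_value += weight * payoff(holding)
    if total_weight == 0.0:
        raise ValueError("no possible opponent holdings")
    return total_value / total_weight


# Hypothetical spot: we hold AsKd on Ah7c2d.
opp_range = {
    ("Ac", "Ad"): 1.0,   # set of aces
    ("As", "Qs"): 1.0,   # blocked: we hold the As
    ("7s", "7h"): 1.0,   # set of sevens
    ("Qc", "Jc"): 0.4,   # air the opponent's strategy only sometimes gets here with
}
strong = {("Ac", "Ad"), ("7s", "7h")}
ev = weighted_ev(("As", "Kd"), ("Ah", "7c", "2d"), opp_range,
                 lambda h: -2.0 if h in strong else 2.0)
print(round(ev, 3))
```

Both card removal (dropping holdings that use our cards or the board) and the opponent's strategy (the weights) change the answer, which is the combination the post describes.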


01-23-2015 , 08:53 PM

#97

statmanhal

Pooh-Bah

Join Date: Jan 2009 Posts: 4,986

Quote:

Originally Posted by nburch

It definitely takes the opponent strategy into account. The numbers it uses are always the expected value it would get for a hand, given how the opponent plays. So, card removal because of my hand and the board, and a weighted average across possible opponent hands given the opponent strategy.

I thought I was beginning to understand GTO but this statement casts some doubt. You often see statements made that GTO does not account for villain’s strategy – you can even tell villain what the GTO strategy is and it would not affect the long term outcome. RPS is the simplest example of this. So, I assumed that for every LHE information set (CPRG’s term for the hand history up to the decision point excluding what villain holds since that is unknown) there is a fixed strategy for fold, call, or raise.

Now, one of the foremost CPRG researchers tells us that the ‘essentially solved’ GTO strategy for LHE does account for opponent strategy.

Does this mean that the strategy includes some type of hand/ range reading? If so, is the strategy still fixed? If so, does that mean that the fixed strategy includes in some way the various possibilities of what villain may have? Also, if fixed, does that mean that the GTO strategy doesn’t do any learning for use in future play?

Help.


01-24-2015 , 12:02 AM

#98

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by statmanhal

Ha. Well, it's certainly possible I've accidentally misused or misunderstood the term range, but I think I can explain the apparent contradiction.

I'm assuming a player's hand range at some point in the game is, for every hand, the probability that the player would hold that hand at that point in the game.

I think the bigger problem was a sloppy use of "it." I meant the exploitability computation used to verify correctness, and the CFR/CFR+ algorithm. They both have to use an opponent hand range. The end result, Cepheus, is a static strategy, and never considers a particular opponent.

The other problem is probably "opponent." Keep in mind that a Nash equilibrium is actually a set of strategies, one for each player: small blind and big blind. The strategies are their own opponents.

Any time you want to correctly compute how well you do against a specific opponent in some situation (betting, board, and hand), you DO need to make use of how likely each possible opponent hand is. In the case of computing exploitability, we're using Cepheus' hand range when choosing an action that maximises our value. In CFR and CFR+, each step forward towards a better strategy makes use of the opponent's current strategy. The big blind strategy is updated by looking at the small blind's hand range, and vice versa.

So... all the way through producing and checking an (approximate) Nash equilibrium, you need to consider the opponent's strategy -- it's just that the "opponent" here is the other part of the equilibrium, not some other person who might be playing it.
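The "strategies are their own opponents" idea can be sketched with regret-matching+ self-play on a tiny zero-sum matrix game: each side's update is computed against the other side's current strategy, and the running averages approach an equilibrium. This is a hypothetical 2x2 game, not CFR over a poker tree, and it uses a plain uniform average rather than CFR+'s weighted one.

```python
A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical payoffs to the row player (zero-sum)


def rm_plus_strategy(regrets):
    """Mix in proportion to positive cumulative regret; uniform if there is none."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [0.5, 0.5]


def self_play(iterations):
    reg_row, reg_col = [0.0, 0.0], [0.0, 0.0]
    sum_row, sum_col = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iterations):
        x = rm_plus_strategy(reg_row)   # row player's current strategy
        y = rm_plus_strategy(reg_col)   # column player's current strategy
        # Each side's action values are computed against the *other* side's current strategy.
        row_vals = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        col_vals = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
        row_ev = sum(x[i] * row_vals[i] for i in range(2))
        col_ev = sum(y[j] * col_vals[j] for j in range(2))
        reg_row = [max(reg_row[i] + row_vals[i] - row_ev, 0.0) for i in range(2)]
        reg_col = [max(reg_col[j] + col_vals[j] - col_ev, 0.0) for j in range(2)]
        sum_row = [sum_row[i] + x[i] for i in range(2)]
        sum_col = [sum_col[j] + y[j] for j in range(2)]
    return [v / iterations for v in sum_row], [v / iterations for v in sum_col]


def exploitability(x, y):
    """Total amount both sides could gain by best-responding to the other's strategy."""
    best_row = max(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))
    best_col = max(-sum(A[i][j] * x[i] for i in range(2)) for j in range(2))
    return best_row + best_col


x, y = self_play(20000)
print(x, y, exploitability(x, y))  # averages near [0.4, 0.6]; exploitability near 0
```

No outside opponent appears anywhere: the only "opponent hand range" each side ever uses is the other half of the strategy pair being computed.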


01-24-2015 , 12:57 PM

#99

statmanhal

Pooh-Bah

Join Date: Jan 2009 Posts: 4,986

Ok, thanks for the response.


01-24-2015 , 02:21 PM

#100

Donkem

journeyman

Join Date: Feb 2014 Posts: 359

Quote:

Originally Posted by nburch

...They both have to use an opponent hand range...The strategies are their own opponents.

Any time you want to correctly compute how well you do against a specific opponent in some situation (betting, board, and hand), you DO need to make use of how likely each possible opponent hand is...each step forward towards a better strategy makes use of the opponent's current strategy...So... all the way through producing and checking an (approximate) Nash equilibrium, you need to consider the opponent's strategy -- it's just that the "opponent" here is the other part of the equilibrium, not some other person who might be playing it.

So basically what u have there is the mathematical nemesis. It's a perfect exploitative bot that reads the opponent's range and adapts. What u r calling GTO is actually a sum of all the strategies that beat all possible strategies in the game.

If not, then how would a nemesis be any different from this GTO?
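The difference between a nemesis (a best response to one known opponent) and an equilibrium strategy can be shown on a tiny game. A minimal sketch on a hypothetical 2x3 zero-sum game, where column 2 is a dominated "mistake" for the opponent; all payoffs and names are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical zero-sum game, payoffs to the row player (us).
# Column 2 is dominated for the opponent: it is worse for him than column 0 in every row.
A = [[F(2), F(-1), F(3)],
     [F(-1), F(1), F(2)]]

EQUILIBRIUM = [F(2, 5), F(3, 5)]  # row mix that guarantees 1/5 whatever the column does


def value(x, y):
    """Row's expected payoff for row mix x against column mix y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(3))


def nemesis(y):
    """Best pure response to one specific, known opponent mix y."""
    vals = [sum(A[i][j] * y[j] for j in range(3)) for i in range(2)]
    best = max(range(2), key=lambda i: vals[i])
    return [F(int(i == best)) for i in range(2)]


def worst_case(x):
    """What x earns against the opponent's best counter (pure columns suffice)."""
    return min(sum(x[i] * A[i][j] for i in range(2)) for j in range(3))


opponent = [F(3, 5), F(1, 5), F(1, 5)]   # a specific opponent who sometimes blunders
nem = nemesis(opponent)
print(value(nem, opponent), value(EQUILIBRIUM, opponent))   # nemesis wins more vs this opponent
print(worst_case(nem), worst_case(EQUILIBRIUM))             # but can be beaten; equilibrium cannot
```

The nemesis earns 8/5 against this particular opponent but loses 1 per game once the opponent adapts, while the fixed equilibrium never does worse than 1/5 and still profits (16/25 here) from the opponent's dominated plays.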


Page 4 of 6

First

1 2 3 4 5 6

Last
