Alberta university Poker 'bot "solves" heads up limit hold 'em - Page 2 - Poker Theory

It's not an argument, it's a fact. The game is not yet solved.

There's no approximation in math. If I say x = 1 then it means that if you claim x = 1.0000000000001 then you are absolutely wrong.

Quote

01-10-2015 , 07:40 PM

#27

tuccotrading

old hand

Join Date: Oct 2006 Posts: 1,373

There are some good posts and discussion on this program in the NV&G forum:

http://forumserver.twoplustwo.com/29...-time-1502189/

... in case anyone missed that thread.

Quote

01-10-2015 , 09:38 PM

#28

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by Karganeth

It's not an argument, it's a fact. The game is not yet solved.

There's no approximation in math. If I say x = 1 then it means that if you claim x = 1.0000000000001 then you are absolutely wrong.

Just wandering over from the other thread...

It is not an exact solution, and the paper clarifies that: the game is "essentially weakly solved."

In practice, people use approximation algorithms and (almost) no one bothers to find an exact Nash equilibrium in imperfect information games: it's still poly-time, but it's too slow for anything but tiny problems, so no one bothers*. If no one generally bothers with exactly solving, just going for "good enough" instead, why bother specifically for poker?

It's hard to publish something and just say "eh, it's good enough" without having some reasonable attempt at an objective measure. "Essentially weakly solved" is a statement that an approximate equilibrium is statistically indistinguishable from an exact equilibrium in a human lifetime of play. We assumed a worst case of a human player that is somehow able to play an exact counter-strategy without mistakes, 200 hands an hour, 12 hours a day, every day without breaks, for 70 years. There's still a better than 1 in 20 chance that the bot would be ahead at the end.
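For anyone who wants to check the arithmetic behind the "human lifetime" figure, here is a rough sketch. The hand counts and the exploitability ceiling (1 big blind per 1000 hands) come from this thread; the per-hand standard deviation of 5 big blinds and the normal approximation are assumptions made for illustration, and the function names are made up.

```python
import math

# Figures quoted in the post above.
HANDS_PER_HOUR = 200
HOURS_PER_DAY = 12
DAYS_PER_YEAR = 365
YEARS = 70

# From the thread: worst-case exploitability is under 1 big blind per 1000 hands.
EXPLOITABILITY_BB_PER_HAND = 1.0 / 1000

# ASSUMPTION (not stated in the post): standard deviation of one hand, in big blinds.
STDEV_BB_PER_HAND = 5.0


def lifetime_hands():
    """Total hands played by the hypothetical never-resting human."""
    return HANDS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR * YEARS


def normal_cdf(z):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_bot_is_ahead(edge, stdev, hands):
    """Probability the human's total is negative, using a normal approximation.

    The human's total winnings are modelled as Normal(edge * hands, stdev * sqrt(hands)).
    """
    z = edge * math.sqrt(hands) / stdev
    return 1.0 - normal_cdf(z)


hands = lifetime_hands()
p_bot_ahead = chance_bot_is_ahead(EXPLOITABILITY_BB_PER_HAND, STDEV_BB_PER_HAND, hands)
print(f"{hands:,} hands, P(bot ahead) = {p_bot_ahead:.3f}")
```

Under those assumptions the probability lands a bit above 5%, which is where the "better than 1 in 20" comes from. A different standard deviation would move the number.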

As for the title, saying Hold'em is solved... "Solved" is already an umbrella term. Even in perfect information games, "solved" can mean three slightly different things. Move to imperfect information games, and not everyone is satisfied with a Nash equilibrium (what about those lines where opponents make mistakes and we don't punish them?!) Some people still just care about the game value. Just saying "solved is only and always finding an exact Nash equilibrium" is well defined, but it IS a bit awkward if it leaves you saying no one really cares about solving games.

Given that approximate Nash equilibria are the common use case, we're also making a bit of a push here to bring that common use case under the umbrella term "solved", rather than wasting space on longer titles that need to be clarified anyway. Any time you see the English word "solved", you should already be asking exactly what was done. Forcing the common use case of finding a good enough approximation of a Nash equilibrium to use a longer phrase isn't worth it.

* It's somewhat interesting to note that you can use approximation algorithms to exactly solve two player constant sum games, if all the payoffs are rational numbers. There will be at least one equilibrium strategy with rational action probabilities, and an upper bound on the maximum precision needed can be computed ahead of time. Find an approximation that's at least that good, and you actually have an exact solution.
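A toy illustration of that footnote, as a sketch only: the 2x2 payoff matrix, the "approximate" strategies and the denominator bound of 10 are all made up here. In a real game the bound would have to be derived from the payoffs ahead of time, as the footnote says; this just shows the snapping step and the exact check afterwards.

```python
from fractions import Fraction

# Hypothetical 2x2 zero-sum game; entries are payoffs to the row player.
PAYOFFS = [[Fraction(2), Fraction(-1)],
           [Fraction(-1), Fraction(1)]]

# ASSUMPTION: hand-picked stand-in for the precision bound computed ahead of time.
MAX_DENOMINATOR = 10


def equilibrium_gap(payoffs, row_strategy, col_strategy):
    """Best-response value for the row player minus the worst case the row
    strategy guarantees. The gap is zero exactly at a Nash equilibrium."""
    rows, cols = len(payoffs), len(payoffs[0])
    best_row = max(sum(payoffs[i][j] * col_strategy[j] for j in range(cols))
                   for i in range(rows))
    worst_for_row = min(sum(row_strategy[i] * payoffs[i][j] for i in range(rows))
                        for j in range(cols))
    return best_row - worst_for_row


def snap(strategy, max_denominator):
    """Round each probability to the nearest fraction with a small denominator.
    (In general you would also need to check the result still sums to 1.)"""
    return [Fraction(str(p)).limit_denominator(max_denominator) for p in strategy]


# Pretend an approximation algorithm returned these slightly-off strategies.
approx_row = [0.4003, 0.5997]
approx_col = [0.3998, 0.6002]

approx_gap = equilibrium_gap(PAYOFFS, approx_row, approx_col)
exact_row = snap(approx_row, MAX_DENOMINATOR)
exact_col = snap(approx_col, MAX_DENOMINATOR)
exact_gap = equilibrium_gap(PAYOFFS, exact_row, exact_col)

print(approx_gap)            # small but positive
print(exact_row, exact_gap)  # rational strategy, gap of exactly 0
```

The approximate pair leaves a small positive gap; after snapping to fractions the gap is exactly zero in rational arithmetic, so the rounded strategies are an exact equilibrium of this toy game.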

Quote

01-10-2015 , 09:45 PM

#29

NewOldGuy

Pooh-Bah

Join Date: Mar 2009 Posts: 5,935

People are confusing "solved" with "never loses". If two perfect gto bots play each other each one loses 50% of the time. If they play many times the results will eventually converge to breakeven.

Quote

01-10-2015 , 11:03 PM

#30

RustyBrooks

Carpal 'Tunnel

Join Date: Feb 2006 Posts: 24,647

No, I don't think people are making that mistake. They're saying "solved" means "unexploitable" not "has a maximum exploitability > 0"

I don't really care but there's no reason to mischaracterize the point. Qualifying the statement, such as saying "weakly" or "nearly" solved is fine. Calling it "solved" is probably not really pedantically true.

Quote

01-10-2015 , 11:23 PM

#31

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,078

For the sake of poker publicity we ought not to quibble.

Quote

01-10-2015 , 11:49 PM

#32

Karganeth

old hand

Join Date: Jun 2008 Posts: 1,305

Quote:

Originally Posted by nburch

Just wandering over from the other thread...

It is not an exact solution, and the paper clarifies that: the game is "essentially weakly solved."

The paper should say almost solved, not essentially solved.

Quote:

In practice, people use approximation algorithms and (almost) no one bothers to find an exact Nash equilibrium in imperfect information games: it's still poly-time, but it's too slow for anything but tiny problems, so no one bothers*. If no one generally bothers with exactly solving, just going for "good enough" instead, why bother specifically for poker?

Well if you want to claim poker is solved then solving the game would be a reason to specifically bother for poker.

Quote:

Originally Posted by RustyBrooks

I don't really care but there's no reason to mischaracterize the point. Qualifying the statement, such as saying "weakly" or "nearly" solved is fine. Calling it "solved" is probably not really pedantically true.

Weakly solved has a very specific meaning in game theory.

Quote

01-11-2015 , 01:10 AM

#33

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by Karganeth

The paper should say almost solved, not essentially solved.

Seems a bit presumptuous to say we should use a different name for a specific concept that had no name. If you don't like the particular distinction we made, don't use it. Come up with your own approximate solution concept, and give it a name.

Quote:

Well if you want to claim poker is solved then solving the game would be a reason to specifically bother for poker.

I was pointing out that "solved" is already an imprecise term for perfect information games, so trying to claim it has exactly the specific meaning "found a Nash equilibrium" for imperfect information games doesn't seem like an automatic truth. At the very least, we disagreed with that.

"Solved" gets used to mean "found a Nash equilibrium". It also gets used to mean "found a good approximation of a Nash equilibrium" (usually with the approximation quality being completely ignored.) We don't think either meaning should automatically be assumed: "solved" should be treated as nothing more than an imprecise umbrella term.

So, HU LHE is "solved" and, more precisely it's essentially weakly solved, the paper lays out exactly what we meant by that, and exactly why it satisfies those criteria. If you're complaining that the title didn't tell you everything you needed to know, we're just going to have to disagree.

Either way - this seems like a fruitless derail. Bottom line, HU LHE strategy found, average worst-case exploitability across both seats < 1 big blind / 1000 hands. Used a modified, improved version of CFR, no sampling.

Quote

01-11-2015 , 01:21 AM

#34

NewOldGuy

Pooh-Bah

Join Date: Mar 2009 Posts: 5,935

Quote:

Originally Posted by NewOldGuy

People are confusing "solved" with "never loses". If two perfect gto bots play each other each one loses 50% of the time. If they play many times the results will eventually converge to breakeven.

Quote:

Originally Posted by RustyBrooks

No, I don't think people are making that mistake.

It's what the original post in the thread says, that the bot should "never lose." So someone is.

Quote

01-11-2015 , 03:32 AM

#35

200zoomgrinder

journeyman

Join Date: Jan 2014 Posts: 366

Mentioned to someone the other day that I wished the NVG thread for this had been started in Poker Theory instead so I didn't have to wade through a bunch of useless posts to find the good ones. Somehow this one ended up significantly worse...

(Congrats UofA guys, you guys are pretty amazing)

Quote

01-11-2015 , 04:39 AM

#36

masque de Z

Carpal 'Tunnel

Join Date: Aug 2009 Posts: 9,961

It would be unwise to not consider what happened an essential nontrivial victory towards the solution (or some 99.99% the real thing ). Congratulations. My question is why do you not go for the exact solution while at it. Is it a hard drive space issue?

Also, people might appreciate the difference better if they were told in what way the near solution that has almost converged could differ from the exact one in some extreme example. Can you expect the real thing and the approximation to play a particular very rare spot radically differently, or is it true that all plays are almost the same on the same runouts, opponent actions and pot sizes? Is the difference a mixed strategy difference (i.e. I raise x% of the time vs call y%) or a pure strategy difference, where a particular hand is always a raise on some river in one and always a call in the real solution?

People would tend to see a difference in my opinion only if the one solution raises and the other calls 100% at that spot. They would not see a real difference if one convergent solution was 55% raise 45% call and the real one was 54% raise and 46% call.

By the way how does one know what the real solution is in some selected specific spot if one wanted to compare?

Last edited by masque de Z; 01-11-2015 at 04:44 AM.

Quote

01-11-2015 , 07:39 AM

#37

petjax

banned

Join Date: May 2011 Posts: 802

The biggest advantage Cepheus has over its human counterparts, however, is that for two months straight it played 24 trillion hands per second - meaning that it has already played more poker hands than have ever been played in the whole of human history.

Well, I don't think that beating an opponent that has played this many hands, and remembers every single one, is an option, is it? lol

24 trillion!!!!!! hands a second!!!!!!!!!!!! for 2 months!!!!!! oef that is practice right? lol

Quote

01-11-2015 , 08:13 AM

#38

Wolfram

Carpal 'Tunnel

Join Date: Jan 2006 Posts: 15,066

Quote:

Originally Posted by petjax

this

+it never messes up odds calculations
+it knows exactly how badly it played the hand if it had a perfect opponent
+it has no ego, no cognitive dissonance, no selective memory messing with how it perceives results

...yet some people think a human can beat it. Delusion is a powerful drug.

Quote

01-11-2015 , 12:01 PM

#39

RustyBrooks

Carpal 'Tunnel

Join Date: Feb 2006 Posts: 24,647

Quote:

Originally Posted by masque de Z

My question is why do you not go for the exact solution while at it. Is it a hard drive space issue?

Their method converges to the optimal solution but that doesn't mean it will actually reach it. Consider that 1/x converges to 0 but only reaches it at infinity. And, the closer you get to your convergent value, the slower it converges.

So if it took them a month to get to -.1 bb/100, it might take them months more to get to -.05, and so forth. At some point you just call it a day and publish (they may still be running it to converge even tighter, or they may not). Consider that to a mathematician, 2 different "essentially solved" strategies might be equally interesting, provided neither of them is actually solved.

So, tl;dr, the issue is time, most likely.
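To make the "converges but doesn't actually arrive" point concrete, here is a minimal sketch. It is not the Alberta code: it runs plain regret matching (the building block of CFR) in self-play on a made-up 2x2 zero-sum game. The payoff matrix and all function names are assumptions for illustration only.

```python
# Hypothetical 2x2 zero-sum game; entries are payoffs to the row player.
PAYOFFS = [[2.0, -1.0],
           [-1.0, 1.0]]


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def exploitability(payoffs, x, y):
    """What best responses gain against the pair (x, y); zero at equilibrium."""
    n, m = len(payoffs), len(payoffs[0])
    best_row = max(sum(payoffs[i][j] * y[j] for j in range(m)) for i in range(n))
    worst_for_row = min(sum(x[i] * payoffs[i][j] for i in range(n)) for j in range(m))
    return best_row - worst_for_row


def self_play(payoffs, iterations):
    """Run regret matching for both players and return their average strategies."""
    n, m = len(payoffs), len(payoffs[0])
    row_regret, col_regret = [0.0] * n, [0.0] * m
    row_sum, col_sum = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected value of each pure action against the opponent's current mix.
        row_values = [sum(payoffs[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_values = [-sum(x[i] * payoffs[i][j] for i in range(n)) for j in range(m)]
        row_ev = sum(x[i] * row_values[i] for i in range(n))
        col_ev = -row_ev  # zero-sum: the column player gets the negation
        for i in range(n):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += x[i]
        for j in range(m):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += y[j]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


for t in (100, 10_000):
    avg_row, avg_col = self_play(PAYOFFS, t)
    print(t, exploitability(PAYOFFS, avg_row, avg_col))
```

The worst-case guarantee for this kind of update shrinks like 1/sqrt(T), so on that bound halving the exploitability costs roughly four times the iterations. That is the sense in which "a few more months" might buy only a small improvement.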

Quote

01-11-2015 , 12:08 PM

#40

NMcNasty

Pooh-Bah

Join Date: Feb 2004 Posts: 3,990

Quote:

Originally Posted by nburch

I was pointing out that "solved" is already an imprecise term for perfect information games

I think people in general do actually consider "solved" to be a precise term which refers to a precise mathematical solution. I feel like your comments may mislead people to believe that a precise, exact solution in which no approximation is performed whatsoever is not possible. We can open up game theory texts or Mathematics of Poker to see what those types of solutions look like. So if computationally arriving at an approximate solution is adequate for your definition of "solved" then you need a term to describe the exact solution.

Exactly solved?
Mathematically solved?

It all just seems redundant.

Quote

01-11-2015 , 12:13 PM

#41

NMcNasty

Pooh-Bah

Join Date: Feb 2004 Posts: 3,990

Quote:

Originally Posted by David Sklansky

For the sake of poker publicity we ought not to quibble.

I mean don't get me wrong I think this is a huge accomplishment and worthy of publication, I just have a problem with the idea of reading the exact same headline a year from now and having both of them be correct.

Quote

01-11-2015 , 01:59 PM

#42

whosnext

Carpal 'Tunnel

Join Date: Mar 2009 Posts: 6,732

Quote:

Originally Posted by RustyBrooks

My uneducated guess is that the computer's strategy has not changed in the last month or so of "learning" from the billion billions of hands it has played. So the researchers calculated the win rate, saw that it is not appreciably different than zero, and declared victory.

They could continue running their learning program to hopefully converge further to a solution (with slim chance of further learning) but it probably takes costly resources (man-hours, computer time, etc.) so they have decided to call it a day.

As others have pointed out, there are presumably multiple solutions and we don't know how different their associated strategies might be. I would hazard a guess that should the entire learning process somehow be replicated (re-starting from scratch) and a different solution emerge, we would not be able to tell the difference.

Quote

01-11-2015 , 04:11 PM

#43

QuadZeros

enthusiast

Join Date: Dec 2012 Posts: 81

Quote:

Originally Posted by masque de Z

So how can 2 human players join forces to defeat the computer in a 3way without ever communicating cards but only playing in a way that the group has the advantage and they split the profits?

The Thinking Poker podcast had an interview with two members of the Alberta team where they touched on this issue for 3-handed play. Apparently, the big annual poker bot competition includes a 3-handed competition.

IIRC, the Alberta guys said that "Player A" can choose to make it basically impossible for even a (close to) GTO opponent directly to his right at a 3-handed table to win. It requires Player A to play a losing strategy, but the benefits will go to the third player and the two players will win on aggregate.

I can't recall if this competition was limit or NLHE. Regardless, this has scary implications for even subconscious collusion between players sharing action at a short-handed table.

Here's the link to the interview. It's episode 79 in case the link doesn't work:
http://www.thinkingpoker.net/2014/05...esearch-group/

Quote

01-11-2015 , 08:39 PM

#44

masque de Z

Carpal 'Tunnel

Join Date: Aug 2009 Posts: 9,961

Quote:

Originally Posted by QuadZeros

Thanks for that link, i will look over it. In my original idea i considered a table of eg 9 players that one was a world class player and the others just top winning players but unable to beat him and the game plays for days until they recognize the problem that they lose to him reliably beyond statistical noise and they decide to join forces each on their own as a rational decision to stop the problem because only these 8 have an incentive to do that clearly. So the system naturally converges to that behavior if AI bots were the players involved (not humans with inferior logic lol) and one of them was the best programmed one. Without telling them anything they gradually converge to collusion. But i still need to see a rigorous description of what EV comes out of it in some examples to be further convinced. I think in some tournament examples with ICM i had worked out some funny cases that this was true in more than 2 players.

I want to really understand better what the real conditions for Nash equilibrium existence in more than 2 players are. It might be possible to argue that the spontaneous collusion of the 2 violates the Nash equilibrium existence theorem conditions (no cooperation). But i am not yet clear on this ie that the 3rd player doesn't have a response strategy or even that Nash solution for 3 players eliminates the collusion because the theorem conditions are still met (the cooperation or not part is not yet clear to me how it's properly defined, it might be defined as a conscious effort to lose to the cooperating player while stealing equity from the 3rd guy and then that player does the same later etc and maybe that violates the Nash conditions because instead of 3 fighting it's 1+2 fighting). Another might be that they can in principle on occasion signal card content by the way they bet after the target player has acted letting the other guy know who has the best hand or something (information that now cannot be used by the target guy that has already acted on that stage).

Additionally it might even be true that Nash equilibrium exists also in that case of 2 vs 1 but the collusion of the 2 has introduced a deficit for the single guy that Nash can now only minimize (ie something like the equivalent of them playing from the btn always heads up kind of edge). So Nash for the 3 doesnt exist in the old sense (3way each for themselves), it exists for the only remaining non cooperative effective 2 opponents that have emerged ie the single guy and the group are now the redefined new game opponents where you are searching for the Nash equilibrium.

Ps: I want to make clear of course that if in real life a proper game started and was asked from me to not violate an honor rule that there is no implied collusion allowed, i would gladly not participate in it for the sake/integrity of the game. But in the absence of any such request for such honor code, i think (if what i suggested proves true) that its a legitimate part of the greater possible game one should always be considering as possibly running and its simply a redefined new math problem within the rules.

Last edited by masque de Z; 01-11-2015 at 08:50 PM.

Quote

01-11-2015 , 09:37 PM

#45

David Sklansky

Administrator

Join Date: Aug 2002 Posts: 17,078

Obviously if the rules allow implicit collusion against you, your best, but still losing, strategy, would change from what it would be if it wasn't allowed.

Quote

01-11-2015 , 11:49 PM

#46

masque de Z

Carpal 'Tunnel

Join Date: Aug 2009 Posts: 9,961

Yes but the interesting thing is that i am essentially claiming that there are 2 Nash equilibria. One if the players play for their own self only and one if they play in groups. The game itself has not changed. Because the game doesnt exclude (at least in principle) semi bad players that do mistakes mostly towards particular people (if they cannot communicate with external signals etc). So its really interesting how the solution changes from one to the other as one starts playing cooperatively a little bit until fully doing so. I find that intriguing if true.

My point being that one would imagine that a solution shouldnt have to take into account how others play (thats the point of Nash) but now it must (unless we are wrong of course). That must be an interesting effect that happens at n>=3 suggesting that possible groups of players may emerge all with their own solution which is interesting, especially if they are now 3 vs 6 lol or funny things like that. The game has become something very interesting without anyone doing anything on purpose as a preagreement. It has experienced a phase transition possibly.

Also imagine if 3 top AI play the same game all 3 and they recognize that nobody can win anything this way so they spontaneously pick a partner and one is suddenly left out. How can you break their system now? The first move is to make an error towards someone and then that person must accept and do the same later to establish the trust. To deny that offer is to deny future profits. It all gets rather intriguing (unless again we are far off from the real result and collusion proves inferior vs the Nash solution, but it feels like there ought to be some advantages in implicit collusion so that keeps the thought interesting. Maybe if we create crystal clear examples it may become more obvious that its possible - unless speaking about it in public is bad for the game and we must halt the discussion and leave it to mathematicians/game theorists at a purist level.)

Last edited by masque de Z; 01-12-2015 at 12:02 AM.

Quote

01-12-2015 , 06:01 AM

#47

punter11235

Carpal 'Tunnel

Join Date: Mar 2005 Posts: 8,210

Quote:

The game has become something very interesting without anyone doing anything on purpose as a preagreement. It has experienced a phase transition possibly.

I remember limit 6max players complaining some time ago that if SB flats a ton vs BTN raise they are hurting both themselves and the stealer while donating equity to BB and that it sucks to be a BTN when SB is flatting a lot even though they are making a mistake.

Quote:

I was pointing out that "solved" is already an imprecise term for perfect information games

Perfect/imperfect information is an artificial distinction really.
Rock/Paper/Scissors is solved. It's an imperfect information game. That means you can't use "imperfect information" as an excuse not to have an exact solution.
"Solved" is quite well understood when it comes to games: it means that there exists an algorithm, run by either a computer or a human in reasonable time, which:

-always gets at least the theoretical value of the game for "weakly solved" and

-always exploits all opponents' mistakes as well for "strongly solved"

(so checkers is weakly solved as the computer never loses but it's not strongly solved because it's not guaranteed it will win vs you if you make a mistake)

It's true that in games with imperfect information it's often difficult to decide if the solution is strong or weak or if that distinction even makes sense but the point about always getting at least the theoretical value of the game stands.
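The Rock/Paper/Scissors case is small enough to check by hand. As a quick sketch (the `exploitability` helper and the `rock_heavy` strategy are made up for illustration), this measures how much a best-responding opponent wins per game against a given strategy, using exact fractions so that "zero" really means zero:

```python
from fractions import Fraction

# Payoffs to the player choosing the row; order is rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def exploitability(strategy):
    """Most an opponent can win per game against `strategy` (the game value is 0)."""
    return max(sum(RPS[reply][move] * prob for move, prob in enumerate(strategy))
               for reply in range(3))


uniform = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(34, 100), Fraction(33, 100), Fraction(33, 100)]

print(exploitability(uniform))     # 0
print(exploitability(rock_heavy))  # 1/100
```

The uniform strategy gets the theoretical value of the game against anything, while the slightly rock-heavy one can be beaten for 1/100 of a unit per game by always playing paper. That is the difference between "unexploitable" and "has a maximum exploitability > 0" in miniature.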

That being said I consider 1mb/hand as good as it gets. It's a huge result and I think claiming "solved" for publicity reasons is justified.

Last edited by punter11235; 01-12-2015 at 06:16 AM.

Quote

01-12-2015 , 08:23 AM

#48

Wolfram

Carpal 'Tunnel

Join Date: Jan 2006 Posts: 15,066

pedants gonna pedant

Quote

01-12-2015 , 01:04 PM

#49

nburch

newbie

Join Date: Jan 2015 Posts: 47

Quote:

Originally Posted by masque de Z

Yeah, things change with 3 or more players. Here's a slightly more concrete example, from Duane Szafron's work in http://webdocs.cs.ualberta.ca/~duane.../2013aamas.pdf. It takes Kuhn's 3 card poker game, extends it to three players, and describes a large space of different Nash equilibria. Instead of a standard deck of cards, there's 1,2,3,4. Each player antes one chip, and each has a two-chip stack (so there's at most one bet of one chip.) Player 1 acts first, then player 2, then player 3.

In the entire space of equilibria, player 2 always loses the same amount. Player 1's value can vary, but is never higher than player 2's value, so player 1 also expects to lose. Player 3's value can also vary. One interesting thing is that player 2's probability of betting with a 1 or a 2 moves money from player 1 to player 3, without affecting player 2's expected value.

Imagine Alex sits down with Marcy and Ned. Marcy and Ned are planning to collude against Alex. Even if the player positions rotate, and even if they always have to play a strategy that is part of an equilibrium (remember that a Nash equilibrium is actually a collection of strategies for all players) they can manage to collude.

Code:

P1     P2     P3
Alex   Marcy  Ned     Marcy plays a P2 strategy that sometimes bets 1,2
       ->             feeding money from P1 (Alex) to P3 (Ned)

P1     P2     P3
Ned    Alex   Marcy   Alex does something, maybe helping Ned or
       ?              maybe helping Marcy: they don't care.

P1     P2     P3
Marcy  Ned    Alex    Ned never bets 1,2 and gives Marcy an easy time
       <-             increasing Marcy's value at Alex' expense.

No surprises here that as a pair, you can hurt outsiders by betting, or not. The surprise for some might be that Marcy and Ned can do this while still "playing GTO" by playing according to a Nash equilibrium (remember that a single equilibrium actually describes all three seats: when you move to three players, you can't actually mix and match strategies from different equilibrium profiles.) Marcy picks an equilibrium profile with an aggressive P2, and Ned picks one with a passive P2.

Marcy and Ned could beat Alex even more by moving from GTO, but while staying with "plausibly deniable GTO play" they can still make it a losing game for Alex. We would hope that having positions rotate around the table makes everyone break even, but it just doesn't work out. You would have to watch Marcy and Ned in other matches to see if they switch up their aggression based on who's playing ahead of the other.