Quote:
Originally Posted by Lifrordi
I disagree that it's artificial.
I can't see how the most naive translation would not be easily exploitable.
In the example you gave, if the SB opens to 4.5$, we need to map it to either open to 4$ or to 5$, which is an error of 0.5$ a hand (or 25BB/100).
The error will grow significantly after next player actions in the hand.
Furthermore - while differences between bets can be small, DeepStack computes a unique strategy for any situation possible.
While it can be argued this is not necessary, I don't think that game size is artificial.
I appreciate the reply, but your perspective confuses me. From the DeepStack paper:
Quote:
The imperfect information game HUNL is comparable in size to go, with the number of decision points exceeding 10^160 (13).
(13) is a reference to Johanson (2013), the paper I pointed out above. Johanson clearly shows that modelling HUNL requires an arbitrary decision on how to model bet sizes. Academic researchers follow a convention, which changed from 2009 to 2010 and took the game size from 10^49 to 10^160 -- a drastic difference given that both game trees describe 200bb HUNL. He even chose the word "inflate". That sounds artificial to me, given that there is no clear-cut scientific argument for modelling it the way the current ACPC convention does.
I already covered your counterargument of raising to 4.5 at 1-2 with 400 stacks in my previous post, making the same point with 5-10 and 2k stacks. 5-10 with 2k stacks has a much, much smaller game size than your convention, but the differences in optimal strategies would be very small and the exploitability of a naive translation technique would be infinitesimal. The only spots where coarsening the bet granularity from 1/100th of a bb to 1/10th of a bb would lose some EV -- by cutting out optimal bet sizes that lie in between -- are preflop around the optimal raise-size regions and postflop spots with high stack-to-pot ratios. Even there, the overall lost EV would likely be lower than the EV lost to the numerical instability that can be found in most modern solutions, as you pointed out.
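To make the granularity point concrete, here is a toy counter -- entirely my own sketch, not something from Johanson's report, ignoring cards and treating a single check or call as ending the street -- for how many betting sequences a single street has as a function of the smallest allowed bet increment:

Code:
from functools import lru_cache

def count_sequences(stack, increment):
    """Count betting sequences for one simplified street: both players start
    with `stack` chips behind, every bet/raise is a multiple of `increment`,
    and a check or call ends the street (so this undercounts a real street)."""
    @lru_cache(maxsize=None)
    def rec(behind_actor, behind_opp):
        owed = behind_actor - behind_opp      # amount the acting player must call
        total = 1                             # check / call ends the street
        if owed > 0:
            total += 1                        # folding also ends it
        amount = owed + increment             # smallest (re)raise allowed in this toy
        while amount <= behind_actor:         # ...up to all-in
            total += rec(behind_opp, behind_actor - amount)
            amount += increment
        return total
    return rec(stack, stack)

# compare e.g. count_sequences(200, 10) with count_sequences(200, 1)

The exact numbers don't matter; the point is that the count is driven almost entirely by the chip granularity you happen to pick, not by anything strategically meaningful.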
I'm not sure how you define error in your example, but the way you put it almost implies that mapping 4.5 onto 5 half the time and onto 4 otherwise would cost the bb 25bb/100 overall. That is definitely incorrect -- the loss would be far lower than that (25bb/100 is probably higher than the equilibrium game value for the btn). Not sure if you meant it that way, but if you did, I can't see any logic supporting it.
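For reference, the naive randomized translation I have in mind looks roughly like this -- a minimal sketch of my own (the pseudo-harmonic mapping of Ganzfried and Sandholm is the usual refinement of the mixing probability):

Code:
import bisect
import random

def translate_bet(observed, abstraction_sizes, rng=random.random):
    """Map an off-tree bet onto one of the two neighbouring sizes in the
    abstraction, mixing by linear interpolation: an open to 4.5 between the
    sizes 4 and 5 goes to 5 half the time and to 4 otherwise."""
    sizes = sorted(abstraction_sizes)
    if observed <= sizes[0]:
        return sizes[0]
    if observed >= sizes[-1]:
        return sizes[-1]
    i = bisect.bisect_left(sizes, observed)
    lo, hi = sizes[i - 1], sizes[i]
    p_hi = (observed - lo) / (hi - lo)   # probability of rounding up
    return hi if rng() < p_hi else lo

Whatever the mixing rule, the EV the bb gives up is roughly bounded by how differently the equilibrium responds to an open of 4 versus 5, which is tiny.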
Using the current ACPC convention and comparing 200bb HUNL to Go without explaining the convention properly seems like an overstatement to me. Take another perspective: hypothetically, if another independent research team that doesn't need to follow academic conventions came up with a similar bot and published the results on their private website, claiming to have a very-close-to-optimal bot for 200bb HUNL, the largest game ever, far larger than Go, since they model it at 500-1000 with 200k stacks for a game size of 10^1367 (number may not be accurate), wouldn't that strike you as odd, or even unscientific? If someone made a similar argument that this game size is an overstatement and the researchers countered with "well, in your abstraction, if the btn opens to 2.0005bb and we need to map it onto 2.001bb or 2bb, I can't see how that would not be easily exploitable", would that be a satisfying answer?
Another perspective: In online poker, bets can usually be made in increments of 1 cent -- stating that a $1-$2 online cash game with Otb, Sauce and
Quote:
Originally Posted by Lifrordi
Dough Polk
would be way way less complex than the same lineup at $100-$200 (disregarding rake and motivational effects ofc) would be ridiculous.
The way this should be handled is that academic papers state that the size of 200bb HUNL is around 10^160 -- followed by a footnote briefly explaining that modelling bet sizes (and thus the game size) is not a clear-cut decision and follows a convention, along with some motivation for why the current convention was chosen in the first place.
Alternatively, or better yet additionally, you could add a short discussion of how the traditional measure of game size (counting decision points or info sets) may not be the best way to compare the complexity of different games, such as poker, Go and chess. One might need another measure to better capture strategic depth, and therefore complexity, since games with theoretically continuous action spaces can be modelled arbitrarily, and games with lots of easily detectable (weakly) dominated strategies in the early branches are actually easier to solve than their game size suggests (such as HUNL, where such strategies can be found at the very first decision point in the tree -- just look at Libratus' play).
Why did the ACPC choose 50-100 with 20k stacks in the first place? While your example might have been part of the motivation to change the 2009 format to a finer preflop granularity, why would one ever model the game as such a big problem when the added value is so small compared to how it blows up the game tree? It seems counterintuitive and inefficient to me. The goal of every model is to capture as much as possible without blowing it up; it seems like the ACPC just did the latter. Given the timing of the Go and HUNL AI vs human matches, and 10^160 being a number very close to Go's ~10^170, I guess it's a really nice sell for poker researchers -- to me it just seems inaccurate and artificial.
Quote:
Originally Posted by Lifrordi
I believe the incentive structure was chosen quite well.
What structure would you suggest that you think would be more appropriate? [...]
I'm not the right person to ask since I don't know a lot about experiments. I have always found it odd that play against humans is such a widely accepted benchmark for optimal strategies in HUNL in the first place. There are a lot of potential problems with using human play over limited sample sizes as a benchmark. Figuring out better ways to estimate the exploitability of new solutions seems like the more natural and scientific way to go.
Regarding the argument that head-to-head performance has "very little scientific value": I understand that a bot can mainly exploit an implementation error, or whatever else there is to exploit, and thus skew the results, but how can that statement be true and not also apply to the AI vs human matches, which are head-to-head performance measures as well? I don't see the theoretical argument for why the same problems can't occur with human play. Given that both bots are designed to be essentially hole-less, with DeepStack's "intuition" and Libratus being able to endgame-solve and learn new bet sizes for the early streets overnight, is head-to-head performance really that likely to be skewed? If I got the argument wrong, please elaborate.
How can a measure like LBR, which is designed to give a lower bound on exploitability, produce negative numbers in the first place? I would really appreciate a short explanation; it's tough to understand from the papers. The LBR results seem to be interpreted as showing that DeepStack is much closer to optimal than any other bot -- why can't the alternative interpretation, that LBR is a flawed measure of exploitability, be true?
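For reference, here is the LBR action rule as I currently read it from the paper -- my own pseudocode and helper names, so please correct me if I'm misreading it:

Code:
def lbr_choose_action(my_hand, board, opp_range, pot, to_call, bet_sizes,
                      win_prob, fold_prob):
    """Greedy local best response: pick the action with the highest expected
    value, assuming we only check/call for the rest of the hand afterwards.
    `win_prob` (showdown equity of my_hand vs opp_range) and `fold_prob`
    (how much of opp_range folds to a given bet) are stand-ins for what the
    real LBR derives from the bot's known strategy."""
    wp = win_prob(my_hand, board, opp_range)
    values = {}
    if to_call > 0:
        values["fold"] = 0.0
    values["call"] = wp * pot - (1.0 - wp) * to_call
    for b in bet_sizes:
        fp = fold_prob(opp_range, board, b)
        values[("bet", b)] = fp * pot + (1.0 - fp) * (
            wp * (pot + b) - (1.0 - wp) * (to_call + b))
    return max(values, key=values.get)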
Thanks for reading, I know it was a long post.