HU NLHE: Bot beats pros(?) for 49 +/- 4 bb/100

03-04-2017 , 01:25 PM
Quote:
Originally Posted by PrimordialAA
Where did you get these HHs? I might be accepting their challenge vs pros and would love to review them beforehand if possible
I found the link at the end of an article ArtyMcFly posted

http://science.sciencemag.org/conten...s_IFP_pros.zip
03-04-2017 , 02:49 PM
lol @ those hands.

This "ai" ain't beating no pros.
03-04-2017 , 03:38 PM
> 1) AI played 20709 SBs and 16649 BBs, difference is quite significant.

In our study we played more than 44,000 hands and your numbers do not match that.
The total number of SB and BB situations was fair in our study.
The reason your numbers do not match is that PokerTracker did not correctly load all the hands.
The original log files from the study are in the standard ACPC format (included in the supplementary material), and to make it easy for anyone to analyze the hands, we also included the hands converted into the 'poker stars' format.
Unfortunately, it seems the tool did not convert all the hands properly.
We are working on fixing the issue and will let you know when we upload the corrected 'poker stars' log files.
Thank you for pointing out the issue.

> 4) given 1,2&3 some hands might've been deleted.
We never deleted ANY hand.
Even in situations where we had trouble with connections to our server and thus the agent did a fallback to check/fold, we include such hands in the study.
We believe removing any hand is unprofessional, unfair and unscientific.
03-04-2017 , 03:58 PM
> They posted HHs and there are A LOT of questionable hands:

> lol @ those hands.

> This "ai" ain't beating no pros.


Let me elaborate on that. There are two types of questionable/crazy hands:

1) Questionable Computer Hands

1a) Some hands can be surprising but good. It's hard to analyse single situations without a larger strategic context.
For example, Libratus and AlphaGo also played quite a few very surprising moves that proved to be good in the end.

1b) DeepStack includes some small noise for numerical stability, and in some spots the strategy can be under-converged.
Due to this noise, no action has exactly zero probability.
Thus it can very well happen that it plays a bad move with very small probability.
Since the probability is small, it does not hurt the win rate much, but the move can be very surprising.

1c) If DeepStack had problems with the internet connection, it falls back to check/fold. In such situations it could have done something very silly (it did so in only very few situations).
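Point 1b can be sketched in a few lines of Python. The ε floor below is purely illustrative (the actual noise level in DeepStack is not stated here); the point is only that once no action has exactly zero probability, a clearly dominated action still gets sampled occasionally.

```python
import random

def sample_action(probs, eps=1e-3):
    # Floor every probability at eps and renormalize, so no action
    # ever has exactly zero probability (illustrative value only).
    floored = [max(p, eps) for p in probs]
    r = random.random() * sum(floored)
    cum = 0.0
    for action, p in enumerate(floored):
        cum += p
        if r < cum:
            return action
    return len(floored) - 1

random.seed(0)
# 'Correct' strategy: always take action 0 (say, fold); actions 1 and 2
# (call, all-in) have true probability zero.
counts = [0, 0, 0]
for _ in range(100000):
    counts[sample_action([1.0, 0.0, 0.0])] += 1
# counts[1] + counts[2] ends up around 200: rare, but not never.
```

Over a few thousand hands, a handful of such low-probability 'crazy' actions is therefore expected even from a strong strategy.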



2) Questionable Human Hands

2a) It is important to point out that we are not just beating the players in aggregate (as Libratus did): thanks to our new variance reduction technique (AIVAT), we are also significantly beating all but one of the players who finished the requested number of hands
(that player also lost to DeepStack, but the result did not fall within the 95% confidence interval).
Thus, while some players might play better or worse than others, we beat everyone.
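The AIVAT details are in the paper; the sketch below only illustrates the generic control-variate idea behind such variance reduction, with made-up numbers (the Gaussian 'luck' term and the 0.5-chip true edge are my assumptions, not values from the study).

```python
import random

random.seed(1)

def play_hand():
    # Toy model: observed winnings = high-variance luck + a small true edge.
    luck = random.gauss(0.0, 50.0)  # zero-mean chance component
    edge = 0.5                      # true per-hand skill edge, in chips
    return luck + edge, luck        # (observed result, known luck value)

hands = [play_hand() for _ in range(20000)]

naive = sum(result for result, _ in hands) / len(hands)
# Control variate: the luck term has known expectation zero, so
# subtracting it keeps the estimate unbiased while removing most
# of the variance.
corrected = sum(result - luck for result, luck in hands) / len(hands)
```

This is the sense in which a ~45,000-hand sample can separate individual players' results from noise, where a naive chip count could not.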


2b) It is not surprising that people with some understanding of the previous abstraction-based bots try to 'poke' the abstraction in order to find the holes in it.
For example, in the Libratus match, players also made some surprising moves to i) find the holes and ii) exploit the holes (Libratus was trying to fix the holes in its abstraction overnight, so that helped).
Since our bot is neither adapting nor changing overnight, I can see that a player could have tried something crazy if he thought he had found some mistake/bug in DeepStack.

Last edited by Lifrordi; 03-04-2017 at 04:12 PM. Reason: typo
03-04-2017 , 03:59 PM
Quote:
Originally Posted by Lifrordi
We never deleted ANY hand.
Even in situations where we had trouble with connections to our server and thus the agent did a fallback to check/fold, we include such hands in the study.
We believe removing any hand is unprofessional, unfair and unscientific.
I guess I was too quick with my conclusions then, thanks for putting these hands up, waiting for the fix.
03-04-2017 , 04:16 PM
Quote:
Originally Posted by morilka
They posted HHs and there are A LOT of questionable hands:

also
1) AI played 20709 SBs and 16649 BBs, difference is quite significant.
2) AI has a casual 61% WSD (65% on BB and 57% on SB)
3) AI has very low VPIP on BB - 61,5%, may be due to a lot of limping from humans, but also may be because its actually bad.
4) given 1,2&3 some hands might've been deleted.
If you look at the raw data in the ACPC directory, you should see all 45,037 hands. We did not delete any hands: it would be unacceptable to not report all hands played. I have just downloaded a fresh copy from the Science page. There are 22,571 hands played as the small blind, and 22,466 hands played as the big blind.
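As a rough sanity check of those counts, a normal approximation to a fair 50/50 split (an assumption; the matches actually used controlled seat assignment) shows the raw ACPC numbers are unremarkable while the PokerTracker import is wildly lopsided:

```python
import math

def split_zscore(count_a, count_b):
    # z-score of an observed A/B split under a fair 50/50 model
    # (normal approximation to the binomial distribution).
    n = count_a + count_b
    return (count_a - n / 2) / math.sqrt(n * 0.25)

# Counts from the raw ACPC logs quoted above.
z_raw = split_zscore(22571, 22466)        # |z| ~ 0.5: well within noise
# Counts PokerTracker reported after the lossy conversion.
z_converted = split_zscore(20709, 16649)  # |z| ~ 21: hands clearly missing
```

Which is consistent with hands being dropped by the converter rather than deleted from the study.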

If the converted files don't work or look very strange... that's the conversion, not the matches. I think the converted files in PokerStars format have only ever been tested in Hold'em Manager 2, because that was the only tool with a free trial version that we had available for testing. All hands seem to show up there. It looks like there are other free/trial versions of other tools out there, and we're looking into what information might be incorrectly converted into the PokerStars format.

EDIT: it looks like I was already beaten to this response.
03-04-2017 , 05:01 PM
Thanks for responding ITT. Do you think you have chosen a good incentive structure in your experiment? It seems like some of the players didn't take playing too seriously -- I understand players might take unusual lines to exploit the bot, but limp/calling 200bb with 54o doesn't accomplish anything in that regard (and that guy was one who played 3k hands as well).

Why are you poker AI researchers always bringing up the 10^160 number and the atoms comparison? I pointed out above how that number is essentially artificial. Can we agree that if we were to fully solve 200bb HUNL with 1-2 blinds and 400 stacks (which has ~10^49 info sets) and use the most naive translation technique to map the solution onto 50-100 with 20k stacks, the exploitability of the new solution would be very, very small? Even if we were to alter the current convention to 5-10 with 2k stacks (and allow bets of any integer dollar amount), the resulting game size would be much lower than 10^160 -- but the solutions would be pretty much identical!
03-04-2017 , 06:12 PM
Quote:
Originally Posted by Lifrordi
1) Questionable Computer Hands

1a) Some hands can be surprising but good. It's hard to analyse single situations without a larger strategic context.
For example, Libratus and AlphaGo also played quite a few very surprising moves that proved to be good in the end.

1b) DeepStack includes some small noise for numerical stability, and in some spots the strategy can be under-converged.
Due to this noise, no action has exactly zero probability.
Thus it can very well happen that it plays a bad move with very small probability.
Since the probability is small, it does not hurt the win rate much, but the move can be very surprising.
The chance that a Q4o 200bb jam over a limp is part of any sort of correct strategy is pretty much 0%, and I'm not even going to talk about the human player calling it off with 54o. Then the K8s hand, where the bot calls a 3bet, calls two streets and value-jams a bluffcatcher on a three-straight board, shows it has extremely poor understanding of relative hand strength.

If you put this bot up against the same kind of opposition Libratus faced, it would get absolutely demolished. I'm sure there are lots of people who would be willing to wager large sums of money on that.

What I'm saying is that it's very disingenuous of your team to parade the "first AI to beat human pros" headline.
03-04-2017 , 06:16 PM
Quote:
Originally Posted by samooth
Thanks for responding ITT. Do you think you have chosen a good incentive structure in your experiment? It seems like some of the players didn't take playing too seriously -- I understand players might take unusual lines to exploit the bot, but limp/calling 200bb with 54o doesn't accomplish anything in that regard (and that guy was one who played 3k hands as well).

I believe the incentive structure was chosen quite well.
What structure would you suggest as more appropriate?

We encouraged players to play their best by offering monetary prizes based on their winnings.
I find it hard to believe that, of all the players we beat, not one thought playing his best over the course of only 3k hands was worth the first-place prize.
Keep in mind that players had a full month to finish only 3,000 hands, and they could play from the comfort of their home.

As for any particular hand - it's hard to tell an intended exploit from a simple misclick.

Last edited by Lifrordi; 03-04-2017 at 06:19 PM. Reason: typo
03-04-2017 , 06:21 PM
Quote:
Originally Posted by morilka
Pre Flop: (pot: $150.00) dmit.pol has 4 5

Pre Flop: (pot: $150.00) DeepStack has Q 4

dmit.pol calls $50.00, DeepStack raises to $20,000.00 and is all-in, dmit.pol calls $19,900.00 and is all-in
Wait... what?

Was that a hand conversion error, or did the human malfunction even more than the robot?
03-04-2017 , 06:27 PM
Quote:
Originally Posted by samooth
Why are you poker AI researchers always bringing up the 10^160 number and the atoms comparison? I pointed out above how that number is essentially artificial. Can we agree that if we were to fully solve 200bb HUNL with 1-2 blinds and 400 stacks (which has ~10^49 info sets) and use the most naive translation technique to map the solution onto 50-100 with 20k stacks, the exploitability of the new solution would be very, very small? Even if we were to alter the current convention to 5-10 with 2k stacks (and allow bets of any integer dollar amount), the resulting game size would be much lower than 10^160 -- but the solutions would be pretty much identical!
I disagree that it's artificial.
I can't see how the most naive translation would not be easily exploitable.
In the example you gave, if the SB opens to $4.50, we need to map it either to an open to $4 or to $5, which is an error of $0.50 (or 25BB/100).
The error will grow significantly with subsequent actions in the hand.

Furthermore - while the differences between bet sizes can be small, DeepStack computes a unique strategy for any possible situation.
While it can be argued this is not necessary, I don't think the game size is artificial.
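The $4.50 example works out as follows; `map_to_nearest` is a hypothetical name for the naive translation under discussion, not code from either bot, and the 25BB/100 figure assumes the full half-chip error is paid on every such hand.

```python
def map_to_nearest(bet, available_sizes):
    # Naive action translation: snap an observed bet to the closest
    # size the precomputed solution knows about.
    return min(available_sizes, key=lambda size: abs(size - bet))

# 1-2 blinds: the solution only knows opens to 4 or 5 chips,
# but the opponent opens to 4.5.
mapped = map_to_nearest(4.5, [4, 5])
error_chips = abs(4.5 - mapped)            # half a chip either way
error_bb_per_100 = error_chips / 2 * 100   # 2-chip BB -> 25BB/100
```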

Last edited by Lifrordi; 03-04-2017 at 06:28 PM. Reason: Typo
03-04-2017 , 06:54 PM
Quote:
Originally Posted by getmeoffcompletely
The chance that a Q4o 200bb jam over a limp is part of any sort of correct strategy is pretty much 0%, and I'm not even going to talk about the human player calling it off with 54o. Then the K8s hand, where the bot calls a 3bet, calls two streets and value-jams a bluffcatcher on a three-straight board, shows it has extremely poor understanding of relative hand strength.
First, note that any action DeepStack can make has a non-zero probability, due to numerical stability.

Given how the algorithm operates, it has a perfect understanding of relative hand strength on every street it sees.
It uses no card abstraction on the street it's making the decision on.

Quote:
Originally Posted by getmeoffcompletely
If you put this bot up against the same kind of opposition Libratus faced, it would get absolutely demolished. I'm sure there's lots of people who would be willing to wager large sums of money on that.
I disagree.

The experiments we did suggest that the bot is substantially closer to the optimal strategy than the abstraction-based techniques.
The accompanying "Local Best Response" (LBR) paper, as well as our DeepStack paper, includes all the details.
Even the Libratus creators agree that LBR would beat Libratus if it were evaluated against it, while it fails to exploit DeepStack!
This all suggests that DeepStack plays closer to the optimal strategy than Libratus does.

Given that, I don't see why the players you talk about lose to Libratus but would "demolish" DeepStack.
Libratus might have taken actions very similar to the ones you question, but since its hand files were never made public, you just did not see them.

Quote:
Originally Posted by getmeoffcompletely
What I'm saying is that it's very disingenuous of your team to parade the "first AI to beat human pros" headline.

Philip "Phil" Courtney Laak (born September 8, 1972) is an Irish-American professional poker player and a poker commentator, now residing in Los Angeles, California.
Laak holds a World Poker Tour (WPT) title, a World Series of Poker (WSOP) bracelet and has appeared on numerous nationally aired television shows.
source: https://en.wikipedia.org/wiki/Phil_Laak

Are you saying it's disingenuous of us to call Phil Laak a professional poker player?

Last edited by Lifrordi; 03-04-2017 at 06:54 PM. Reason: Typo
03-04-2017 , 07:02 PM
Is there another match coming, given that Primordial says he's considering accepting the challenge?
03-04-2017 , 07:05 PM
Quote:
Originally Posted by Lifrordi
Even the Libratus creators agree that LBR would beat Libratus
I do not agree with this, and I've explained this to Mike Bowling previously.
03-04-2017 , 07:11 PM
only one way to settle this. deepstack vs libratus hu4rolls
03-04-2017 , 07:12 PM
Quote:
Originally Posted by BetaPro
Is there another match coming, given that Primordial says he's considering accepting the challenge?
Follow us on Twitter - https://twitter.com/DeepStackAI - to get the latest updates on that.
As we already announced, we are going to play freezeout tournaments that will be streamed on Twitch, and we will be posting the details.

Notice that DeepStack can play freezeouts (that is, it understands any stack size) - while all the previous precomputed abstraction-based agents (such as Libratus) could play only a fixed stack size (200BB).
03-04-2017 , 07:21 PM
Quote:
Originally Posted by NoamBrown
I do not agree with this, and I've explained this to Mike Bowling previously.
I was referring to what Prof. Tuomas Sandholm agreed with at the AAAI conference talk.
I did not realize you had a different opinion; I stand corrected and I apologize. I will edit my previous comment.

EDIT: the system won't let me edit the previous comment now, so I state it explicitly here:
The Libratus creators DO NOT agree that the LocalBR technique would beat Libratus.

Last edited by Lifrordi; 03-04-2017 at 07:30 PM. Reason: time limit
03-04-2017 , 07:25 PM
Quote:
Originally Posted by BetaPro
Is there another match coming, given that Primordial says he's considering accepting the challenge?
Possibly - I'm talking with the DeepStackAI team to confirm some things. It seemed like the Libratus pros were blindsided by the fact that it would be adjusting overnight, adding to its solution state, etc. If we confirm that I'll be playing it in its current state, without additions beyond how it was described in the publication, I'd love to participate in a challenge.

Quote:
Originally Posted by HeroStory
only one way to settle this. deepstack vs libratus hu4rolls
Obv this would be much more entertaining than myself, so +1 for this. I assume we'll see some incarnation of both at the ACPC though? I've never followed it, but maybe they can do some sort of live updating or something to make it cool to watch or follow the results.
03-04-2017 , 08:31 PM
Quote:
Originally Posted by PrimordialAA
Quote:
Originally Posted by HeroStory
only one way to settle this. deepstack vs libratus hu4rolls
Obv this would be much more entertaining than myself, so +1 for this. I assume we'll see some incarnation of both at the ACPC though? I've never followed it, but maybe they can do some sort of live updating or something to make it cool to watch or follow the results.
The ACPC has strict limits on submission size and computing power (2 CPU cores and 250 GB of space). Neither Libratus nor DeepStack met those limits, so they weren't submitted this year. There's been some discussion about changing those limits for future ACPCs, but that's still up in the air.

Both teams have played against previous bots from the ACPC though, and Libratus's performance was far better. So I think everyone knowledgeable about the bots would agree that if both bots were to play each other head to head as they exist right now (or as they existed ~1 month ago anyway), Libratus would likely win. You could argue that a head-to-head comparison isn't fair, because Libratus uses more computing power. You could also argue that head-to-head performance isn't the right metric, and that maybe worst-case performance against an omnipotent opponent is the right way to measure, in which case I think the answer is less clear. But head-to-head with the bots from the studies, I think it's clear Libratus would win.

Either way, I don't think there is much to be gained from a research standpoint in playing the bots against each other. If you're just looking for some really impressive master-level poker, we'll hopefully have the Libratus match videos up on Youtube in the near future. (And maybe a "best of" video.)
03-05-2017 , 01:59 AM
Quote:
Originally Posted by PrimordialAA
Possibly - I'm talking with the DeepStackAI team to confirm some things. It seemed like the Libratus pros were blindsided by the fact that it would be adjusting overnight, adding to its solution state, etc. If we confirm that I'll be playing it in its current state, without additions beyond how it was described in the publication, I'd love to participate in a challenge.



Obv this would be much more entertaining than myself, so +1 for this. I assume we'll see some incarnation of both at the ACPC though? I've never followed it, but maybe they can do some sort of live updating or something to make it cool to watch or follow the results.
Just wanted to say that we were not blindsided by a strategy-improvement component of the A.I. We knew it was improving its strategy overnight; we just didn't know exactly how/what it was doing.
03-05-2017 , 04:02 AM
Quote:
Originally Posted by NoamBrown
Both teams have played against previous bots from the ACPC though, and Libratus's performance was far better. So I think everyone knowledgeable about the bots would agree that if both bots were to play each other head to head as they exist right now (or as they existed ~1 month ago anyway), Libratus would likely win.
I am sorry, but I must strongly disagree with that argument.
Here's my counter-argument.

This year we submitted a 10-line agent that plays very weak poker, yet in this ACPC it beats the Slumbot submission by 36BB/100!
Are you implying that if Libratus does not beat Slumbot by more than 36BB/100, Libratus would lose to that script?

Similarly - in limit poker, we can now precisely measure a bot's closeness to the optimal strategy (its exploitability).
What such measurements showed us there is that head-to-head performance is a very poor measure of exploitability.

And in no-limit, LocalBR is currently the best known technique to estimate exploitability, since we can't compute it exactly - thus we should care about LocalBR.
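The distinction between head-to-head results and exploitability is easy to see in a toy game; rock-paper-scissors stands in for poker here purely for illustration.

```python
ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    # +1 if a beats b, -1 if b beats a, 0 on a tie.
    if a == b:
        return 0.0
    return 1.0 if BEATS[a] == b else -1.0

def ev(strat_a, strat_b):
    # Expected payoff of strategy a against strategy b.
    return sum(strat_a[a] * strat_b[b] * payoff(a, b)
               for a in ACTIONS for b in ACTIONS)

def exploitability(strat):
    # Value of the best response against the strategy.
    return max(sum(strat[b] * payoff(a, b) for b in ACTIONS)
               for a in ACTIONS)

uniform = {a: 1 / 3 for a in ACTIONS}          # the equilibrium
always_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}

head_to_head = ev(always_rock, uniform)        # 0: looks like a draw
expl_rock = exploitability(always_rock)        # 1: maximally exploitable
expl_uniform = exploitability(uniform)         # 0: unexploitable
```

`always_rock` breaks even head-to-head against the equilibrium player yet loses every round to a best responder, which is why beating (or tying) one particular opponent says little about exploitability.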

Quote:
Originally Posted by NoamBrown
You could argue that a head-to-head comparison isn't fair, because Libratus uses more computing power. You could also argue that head-to-head performance isn't the right metric, and that maybe worst-case performance against an omnipotent opponent is the right way to measure, in which case I think the answer is less clear.
We argue that head-to-head comparison is a very poor measure of an agent's exploitability, which is what we care about in equilibrium-approximating techniques.

As you point out, you use substantially larger computing power, and from that standpoint the computational resources indeed seem unbalanced.
Furthermore, Libratus takes 40 seconds to act on the turn while DeepStack takes only about 5.
The good thing about that is that people could use the time to stretch out during your match:
https://clips.twitch.tv/DaintyBraveLorisCurseLit

Quote:
Originally Posted by NoamBrown
But head-to-head with the bots from the studies, I think it's clear Libratus would win.
As I said, there is no evidence to support that (unless you mean to imply that the 10-line script beats Libratus).

Quote:
Originally Posted by NoamBrown
Either way, I don't think there is much to be gained from a research standpoint in playing the bots against each other.
I totally agree.
Head-to-head performance has very little scientific value.



Let me finish by saying that I believe Libratus was a great achievement and I am impressed by the work.
What we seem to disagree on is the importance of the LocalBR numbers, but since I of course don't expect you to run LocalBR against Libratus, I think there is no point in arguing about what the numbers would look like.
03-05-2017 , 04:55 AM
I don't know how familiar you are with the poker world, but let's just say the term "professional" gets thrown around quite a bit, without playing skill being any sort of factor. The skill discrepancy between live and online poker is huge. Someone like Laak would have a difficult time beating 100NL online. The main "skill" in being a live pro is being able to play against very bad opposition.

Sure, you can technically find some live "pro", have your bot beat him and say "you've beaten the pros". But for anyone familiar with this game, that achievement is next to worthless.

The Libratus team went out and got some of the very best HU players in the world, all of whom are actively near the top of their field, and they streamed every single hand played in complete transparency. That's what I expect anyone acting in good faith to do before proclaiming they "beat the pros".
03-05-2017 , 05:30 AM
Quote:
Originally Posted by getmeoffcompletely
The Libratus team went out and got some of the very best HU players in the world, all of whom are actively near the top of their field

From what I understand, the Libratus team is always very explicit about saying that they beat "HU specialists / top HU players", etc., and I am fine with that statement - I think it's fair.
We never say that.
We are always careful to state that we beat professional poker players.
Since we try to be careful about the terms, and we seem to be in agreement with the Libratus team in these statements, I am offended at being called 'disingenuous' for calling Phil Laak a 'professional poker player'.

Quote:
Originally Posted by getmeoffcompletely
I don't know how familiar you are with the poker world, let's just say the term "professional" gets thrown around quite a bit, without playing skill being any sort of factor.
While I am not very familiar with the poker world, I was under the impression that Doug Polk, OtB_RedBaron and Sauce123 are believed to be the best HU specialists in the world, and I did not see any of them play either of the agents.

Quote:
Originally Posted by getmeoffcompletely
But for anyone familiar with this game that achievement is next to worthless.

As we showed in our study, the previous abstraction-based approaches can be beaten by LocalBR - and LocalBR beats all the evaluated agents by more than 100BB/100!
Furthermore, as we have seen at the ACPC, agents can also be beaten by a 10-line script.
I find it hard to believe any professional poker player would lose to such programs (though we can't be sure, of course).

Our achievement is a revolutionary local search algorithm combining deep neural networks - for the first time, we have such a local search algorithm for imperfect information games!
This is important since these techniques have worked well for a long time in perfect information games (Deep Blue, AlphaGo), and the algorithm was just published in Science.
Furthermore, we evaluated the technique using the currently best technique for estimating a program's exploitability - LocalBR - and it shows great improvement.
The final evaluation included professional players, and it is the first time any program has beaten professional poker players.

I don't see how you can think this achievement is worthless.


Quote:
Originally Posted by getmeoffcompletely
...they streamed every single hand played in complete transparency.

Not true at all!
The stream was quite often down, it quite often showed only one table out of two, and the hand histories were never released.
Furthermore, many hands were simply removed from the study (and the humans won those hands).

Last edited by Lifrordi; 03-05-2017 at 05:35 AM. Reason: typo
03-05-2017 , 12:41 PM
Quote:
Originally Posted by Lifrordi
This year we submitted a 10-line agent that plays very weak poker, yet in this ACPC it beats the Slumbot submission by 36BB/100!
Yikes! People who question the strength of DeepStack might want to play a few games against Slumbot. I beat the old version over a meaningless sample of random button-clicking, but the 2017 AI seems much stronger. My understanding is that the only EV winners on the leaderboard over more than 5k hands are other bots. If "marcus" is human, I would like to know how he achieved his 150bb/100 winrate, because Slumbot is destroying almost everyone else.

      