Heads Up Hold'em Solved? - Page 3 - Poker News

PartyGirlUK - Computing a perfect adversary to a bot isn't the same thing as computing a Nash equilibrium, and making a perfect adversary is way easier. We developed a technique in 2011 that takes in a strategy and returns the best counter-strategy for beating that strategy, and the expected value of using that counter-strategy. The counter-strategy isn't a Nash equilibrium, and in fact would probably lose horribly if used against any other player. These counter-strategies tend to be very brittle: they win as much as possible against one specific person, but can lose badly to even terrible other players.
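To make the best-response idea concrete, here is a minimal Python sketch on a toy game (rock-paper-scissors) rather than poker. The payoff matrix, the opponent's mix, and the function names are all assumptions made up for illustration; this is not the technique from the paper, only the same idea at a size that fits in a few lines.

```python
# Toy illustration only: rock-paper-scissors, not poker.
# PAYOFF[i][j] is the row player's payoff when row plays i and column plays j.
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def action_values(opponent_mix):
    """Expected payoff of each pure action against a fixed opponent mix."""
    return [sum(PAYOFF[i][j] * opponent_mix[j] for j in range(3)) for i in range(3)]

def best_response(opponent_mix):
    """Return (index of the best pure action, its expected value)."""
    values = action_values(opponent_mix)
    best = max(range(3), key=lambda i: values[i])
    return best, values[best]

def worst_case(action):
    """Payoff of a pure action against the opponent action that punishes it most."""
    return min(PAYOFF[action])

# Hypothetical opponent that throws rock too often.
rock_heavy = [0.5, 0.3, 0.2]
br, value = best_response(rock_heavy)
print(ACTIONS[br], round(value, 3), worst_case(br))
```

Against this rock-heavy opponent the counter-strategy is to always throw paper, which wins 0.3 per game on average. The same always-paper strategy loses a full unit every game to anyone who throws scissors, which is the brittleness described above.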

Here's a link to the paper (link) describing how we do that computation. It doesn't take much memory, but it takes about 76 CPU-days to measure the exploitability of any one strategy. The "Math is not that important" thread I linked to in my first post goes into a lot more detail, if you're curious.

10-09-2013 , 04:20 PM

#52

PartyGirlUK

Carpal \'Tunnel

Join Date: Nov 2004 Posts: 23,530

By how much can you beat the strategy that beats your best strategy for 2bb/100?

10-09-2013 , 04:21 PM

#53

ohsosick

newbie

Join Date: Oct 2013 Posts: 45

When is the next Man-vs-Machine match planned?
With the recent progress toward finding a Nash equilibrium in NLHU, I think a lot of people would like to see it.

10-09-2013 , 04:31 PM

#54

FullyCompletely

enthusiast

Join Date: Mar 2009 Posts: 64

PartyGirlUK - For that particular one, I'm not sure. We usually just keep the score that the best response would get and not the actual best response strategy, since it would take ~30TB to record it. But all of our earlier experience with other best response counter-strategies suggests that it would be beatable for thousands of mbb/g: multiple big blinds per game.

We have another branch of research that's aimed at making what we call robust counter-strategies: strategies that are almost as exploitive as a best response, but still not very exploitable, like a Nash equilibrium. It turns out that you can often get 80% or more of the best response's value while only opening up a small hole in your own strategy.

We can't compute those for the full unabstracted game the way we can do best responses, though - it'd be as hard as computing a Nash equilibrium.

ohsosick - No plans for another Man-vs-Machine match just yet.

10-09-2013 , 04:37 PM

#55

PartyGirlUK

Carpal \'Tunnel

Join Date: Nov 2004 Posts: 23,530

If you don't record the strategy, does that mean you have no idea where your AI is getting exploited (e.g., bluffs the river too much, limps the button with too defined a range, etc.)?

10-09-2013 , 04:40 PM

#56

jaytorr

centurion

Join Date: May 2003 Posts: 127

Quote:

Originally Posted by PartyGirlUK

By how much can you beat the strategy that beats your best strategy for 2bb/100?

Keep in mind that the "perfect adversary" could easily be a ridiculous strategy that folds all his bluff catchers in certain spots, raises them in others, etc. just because that's the best response against its slightly unbalanced opponent.

10-09-2013 , 04:54 PM

#57

crabbyface

banned

Join Date: Oct 2013 Posts: 454

Quote:

Originally Posted by TouchOfEVil

Sick post, also made me feel better about the "bot threat"; most posts regarding the matter give me tummy hurt

10-09-2013 , 05:03 PM

#58

CoronalDischarge

grinder

Join Date: Dec 2006 Posts: 574

Quote:

Originally Posted by FullyCompletely

PartyGirlUK - Computing a perfect adversary to a bot isn't the same thing as computing a Nash equilibrium, and making a perfect adversary is way easier.

So bringing this back to the cryptic mention of 'software' that started this conversation (and I won't mention the player's name again because no one's accusing anyone of anything here)... given that most mid/high stakes HU action nowadays is between players who have a ton of hands on each other already (or can easily obtain them), isn't the threat of the 'Nemesis bot' quite alarming and real?

What I'm really wondering is, is it feasible to write algorithms that can detect things like 'bluffs too much in x spot', without needing to have solved the game first? I may well be wrong but I can imagine there would be shortcuts for things like that, assuming one had a large sample of hands to work with.

Last edited by CoronalDischarge; 10-09-2013 at 05:09 PM.

10-09-2013 , 05:13 PM

#59

FullyCompletely

enthusiast

Join Date: Mar 2009 Posts: 64

PartyGirlUK - Sort of; the single exploitability number tells us that we're making a mistake, but doesn't tell us where we're making it. Even if we kept the counter-strategy, though, it's tough to connect the opponent's response back to the point earlier in the game where you made the mistake, or how to fix that mistake.

The program teaches itself how to play, so we almost never look at or adjust its behaviour at individual decisions. Instead, we look for broad trends like "it might not be recognizing this board texture properly", then find a way to represent board texture better, and then recompute another strategy from scratch.

10-09-2013 , 05:32 PM

#60

PonyTailGeorge

enthusiast

Join Date: Oct 2009 Posts: 95

Quote:

Originally Posted by FullyCompletely

Thanks for your insight. Your passion almost makes me want to go back to school. I hope you are able to accomplish your goals on this path.

Best regards!

10-09-2013 , 06:04 PM

#61

idonot

journeyman

Join Date: Aug 2010 Posts: 278

Quote:
Team Members: Alexander Lee
Affiliation: Independent
Location:
Technique:
The bot was built using proprietary universal game theory methods applied to poker. We complete Fixed Limit Hold’em game tree search without approximation. Original AI utilizes own database of about 3TB and to comply with completion format our team provided special simplified version of Neo - Neopokerbot_FL2V.
The bolded part means you have the database but not THE equation, no?
Now you need the bot to play a zillion hands and then have those hands analysed by an AI to discern the form of an optimal strategy, or do you already have that partial information about THE equation?

10-09-2013 , 06:10 PM

#62

PartyGirlUK

Carpal \'Tunnel

Join Date: Nov 2004 Posts: 23,530

Would you send me a link to the best explanation of how the program teaches itself to play?

10-09-2013 , 07:02 PM

#63

FullyCompletely

enthusiast

Join Date: Mar 2009 Posts: 64

Sure. The best comprehensive description of the whole problem and how all the pieces work is probably still my Master's thesis from 2007. I wrote it just after the first Man-vs-Machine match against Phil Laak and Ali Eslami, and it describes all of the pieces that went into Polaris. At that point, making a good equilibrium approximation had three steps: 1, take the real game and simplify it down to an "abstract" game that fits in your computer's memory; 2, use the CFR algorithm, which plays this simplified poker game against itself and converges towards a Nash equilibrium for the simplified game; 3, use that strategy to play the real game of poker and hope that it's also close to a real game Nash equilibrium.
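As a rough picture of step 2, here is a minimal Python sketch of regret matching, the per-decision update that CFR applies throughout a game tree, run in self-play on a tiny matrix game. Everything in it is an assumption for illustration: the game is a skewed rock-paper-scissors (rock beating scissors pays double), there are no hidden cards and no abstraction step, and the function names are made up. It is not the Polaris code, only the same self-play idea in miniature.

```python
# Toy illustration only: regret matching (the per-decision update inside CFR)
# in self-play on a skewed rock-paper-scissors where rock beating scissors pays 2.
# PAYOFF[i][j] is player 1's payoff; player 2 receives the negative.
PAYOFF = [
    [0, -1, 2],   # rock
    [1, 0, -1],   # paper
    [-2, 1, 0],   # scissors
]
N = 3

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / N] * N

def self_play(iterations):
    """Run both players with regret matching; return their average strategies."""
    regrets = [[0.0] * N, [0.0] * N]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        x = strategy_from_regrets(regrets[0])
        y = strategy_from_regrets(regrets[1])
        # Expected value of each pure action against the other player's current mix.
        v1 = [sum(PAYOFF[i][j] * y[j] for j in range(N)) for i in range(N)]
        v2 = [sum(-PAYOFF[i][j] * x[i] for i in range(N)) for j in range(N)]
        ev1 = sum(x[i] * v1[i] for i in range(N))
        ev2 = sum(y[j] * v2[j] for j in range(N))
        for a in range(N):
            regrets[0][a] += v1[a] - ev1
            regrets[1][a] += v2[a] - ev2
            strategy_sums[0][a] += x[a]
            strategy_sums[1][a] += y[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

def exploitability(avg1, avg2):
    """Sum of what a best response wins against each player's average strategy."""
    br_vs_2 = max(sum(PAYOFF[i][j] * avg2[j] for j in range(N)) for i in range(N))
    br_vs_1 = max(sum(-PAYOFF[i][j] * avg1[i] for i in range(N)) for j in range(N))
    return br_vs_1 + br_vs_2

avg1, avg2 = self_play(20000)
print([round(p, 3) for p in avg1], round(exploitability(avg1, avg2), 4))
```

The exact equilibrium of this toy game is rock 1/4, paper 1/2, scissors 1/4, and it is the average strategy over all iterations, not the final one, that drifts towards it. The exploitability function is the toy counterpart of measuring how close a strategy is with a best response.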

The progress that we've made since 2007 has attacked that "hope that it's also close" part. We have new variants of CFR now that provably get us close to a real game Nash equilibrium, and we have tools like our best response algorithm that let us measure exactly how close we're getting. The reason I'm suggesting my Master's thesis first even though it's a bit out of date is that it was long enough that I could go into detail and describe the intuition behind how everything works, and it should give a good foundation for reading anything more recent. Our research papers usually have very tight 6 or 8 page limits, so we have to assume that the reader has a lot of technical background knowledge; they aren't easy places to start.

Here's a link to my MSc thesis:
http://webdocs.cs.ualberta.ca/~johan...msc-thesis.pdf

Here's a link to all of my research papers. The "Details" link for each paper has a "Notes" section where I've tried to give an easy-to-read description of what the paper's about, in contrast to the "Abstract" sections which can be pretty technical.
http://webdocs.cs.ualberta.ca/~johan...lications.html

My research group has a list of our publications here:
http://webdocs.cs.ualberta.ca/~games...lications.html
And a Twitter account where we announce any new papers or results:
https://twitter.com/PolarisPoker

Last edited by FullyCompletely; 10-09-2013 at 07:10 PM.

10-09-2013 , 07:27 PM

#64

R*R

I wish I was Einstein

Join Date: Mar 2007 Posts: 12,753

Based on my sample size HAL is solved.

10-09-2013 , 09:01 PM

#65

pilliapina

grinder

Join Date: May 2012 Posts: 490

Is there any way you could play against the best HUNL bot currently available? I would love to try playing against it on a couple of tables, just really curious.

10-09-2013 , 09:04 PM

#66

pilliapina

grinder

Join Date: May 2012 Posts: 490

Quote:

Originally Posted by yanks

let it be known: if you are in a field that a computer will eventually be able to outperform your abilities, your career has an expiration date. poker could, although we have yet to see, be this way.

This applies to pretty much every field imaginable. Eventually artificial intelligence and robotics will advance to such a level that every job currently done by humans will more efficiently be performed by machines.

10-09-2013 , 09:34 PM

#67

druidfluid

adept

Join Date: Mar 2006 Posts: 992

Quote:

Originally Posted by pilliapina

Is there any way you could play against the best HUNL bot currently available? I would love to try playing against it on a couple of tables, just really curious.

10-09-2013 , 09:41 PM

#68

JudgeHoldem

banned

Join Date: May 2012 Posts: 8,197

Quote:

Originally Posted by pilliapina

Hookers?

10-09-2013 , 10:38 PM

#69

Hoopster81

Carpal \'Tunnel

Join Date: Feb 2005 Posts: 7,277

Quote:

Originally Posted by JudgeHoldem

Hookers?

10-10-2013 , 12:12 AM

#70

SmokeyQ123

Carpal \'Tunnel

Join Date: Nov 2007 Posts: 6,583

FullyCompletely, thanks for your posts here - interesting stuff and I plan to give your thesis a read

10-10-2013 , 12:35 AM

#71

Jeff W

Carpal \'Tunnel

Join Date: May 2004 Posts: 9,556

Quote:

Originally Posted by ohsosick

The graph between the best no-limit HU bot (200 deep @ 200NL) and the second-best:

This variant is not solved, but the Nash approximations used by the bots are so close to the real Nash equilibrium that the winner only won 670bb over 102K hands:
0.33BB/100
Std Dev = 131bb/100

Although this doesn't tell us how close the bots are to equilibrium, it's interesting that the bots are so closely matched despite having divergent strategies.
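A quick back-of-the-envelope check shows how little that margin says on its own. The Python sketch below uses only the figures from the quoted post. It assumes hands are independent and that the standard deviation is in big blinds per 100 hands; the variable names are made up for illustration.

```python
import math

# Figures from the quoted post; treating hands as independent is an assumption.
total_won_bb = 670.0      # big blinds won by the leading bot
hands = 102_000           # "102KH"
std_dev_per_100 = 131.0   # bb per 100 hands

blocks = hands / 100                                  # number of 100-hand blocks
win_rate = total_won_bb / blocks                      # bb/100
std_error = std_dev_per_100 / math.sqrt(blocks)       # bb/100
ci_low = win_rate - 1.96 * std_error
ci_high = win_rate + 1.96 * std_error

print(f"win rate {win_rate:.2f} bb/100, 95% interval {ci_low:.2f} to {ci_high:.2f}")
```

Under those assumptions the margin is roughly 0.66 bb/100 with a standard error of about 4.1 bb/100, so the interval comfortably includes zero and the sample cannot separate the two bots. The quoted 0.33BB/100 matches the same result if "BB" there means a big bet of two big blinds, which is a guess about the poster's units.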

10-10-2013 , 04:01 AM

#72

clintbygget

centurion

Join Date: Apr 2007 Posts: 174

If winrates are so close in the graph on the 1st page, how come the swings are so low? Kinda goes against what I've learned when goofing around with ev++ calcs. If I remember correctly, you could easily end up a loser over big samples with small winrates.

10-10-2013 , 09:32 AM

#73

ChazDazzle

adept

Join Date: Aug 2005 Posts: 947

Quote:

Originally Posted by FullyCompletely

We have another branch of research that's aimed at making what we call robust counter-strategies: strategies that are almost as exploitive as a best response, but still not very exploitable, like a Nash equilibrium. It turns out that you can often get 80% or more of the best response's value while only opening up a small hole in your own strategy.

Is opponent modeling used to develop strategies like these?

10-10-2013 , 09:51 AM

#74

...|...

banned

Join Date: Feb 2012 Posts: 3,072

Quote:

Originally Posted by R*R

Based on my sample size HAL is solved.

That bot is terrible. All you have to do is overbet every time he checks on the river and he'll fold.

Bet small or limp for a 85% raise reraise strat...

10-10-2013 , 03:18 PM

#75

FullyCompletely

enthusiast

Join Date: Mar 2009 Posts: 64

We don't have Hyperborean or Polaris online anywhere to spar against, but we're looking into it. We'd like to get a better feel for how good our programs are getting at no-limit.

ChazDazzle - we do a lot of research into opponent modelling, counter-strategies and adaptation, and that's what I was talking about in the bit you quoted. Our equilibrium strategies that I've been posting exploitability numbers for don't use opponent modelling, though - they just try to play perfect defence.

It's easier to see the trend in exploitability as a graph than as just the raw numbers. Here's the exploitability of the best heads-up limit bot we've had in every year over time. I've labelled some of the more important or interesting events.
