Math is ...apparently not that important?

08-15-2011 , 03:58 PM
Hi AmyIsNo1,

Quote:
Originally Posted by AmyIsNo1
Is that 5.8 billion information sets? That's more than 7 times as many as in the 2010 version (800 million, from the Accelerating Best Response paper). That's a lot of progress in a single year (but the exploitability only decreased from 135 to 104 mb/g?). If you can keep this up you will be able to compute the full, unabstracted equilibrium just 5 years from now! How much RAM and how many CPU days did it take to compute this strategy?
Yeah, I've been using 'decision point' to mean 'information set' in this thread. The 2011 program has 5.8 billion information sets. We got access to a new cluster this year, where every node has 256 gigs of RAM, 8x as much RAM as the 32-gig nodes on the cluster that we used last year. That let us solve a game that was about 8x as large by refining the card abstraction we used, without having to write any new code. We're not going to get a new cluster every year, though, so next year's program will probably be the same size. However, we can instead go back to looking at smarter ways to do card abstraction, to make a better program without needing even more resources.

To make this year's program, we used a single cluster node that had 24 cpus and used around 240 out of its 256 gigs of RAM for about two weeks. Like you noticed, you get diminishing returns by throwing larger computers at the problem. Compared to last year's program, which used only about 30 gigs of RAM, 8 cpus and about 2 weeks, this year's program is 30 mb/g less exploitable and only wins about 2 mb/g more in a one-on-one match.

Quote:
Wow, that's an awful lot of combinations (1,286,792, right?). How much of this is remembered for the turn and river rounds? Seems a bit wasteful though to keep everything; for example if the board is KQJ, do we really care if our hand is 76 or 65?
It's a good point, but yeah, that's the trick. You definitely care some amount about the difference between 76 and 65, but the errors you make from abstracting those hands together could be far less than from merging two other hands together elsewhere in the game. In practice, though, representing the preflop and flop perfectly (at least in heads-up limit) is definitely worth it. The first two rounds are so small compared to the river that there's really not much point in trying to figure out ways to save memory there.

But like you mentioned, if we had to remember that information for future rounds, that would hurt the quality of the turn and river abstractions. We use imperfect recall abstractions now ("A Practical use of Imperfect Recall", SARA 2009), where we abstract on each round without considering the sequence of buckets from previous rounds. So, none of it is remembered, and that's what makes even the perfect preflop possible. Just to be clear, the perfect flop abstraction remembers the preflop; it's just the turn and river that use imperfect recall. In terms of information sets, each round in this year's program is this big:

(betting sequences * buckets = total information sets)
Preflop: 8 * 169 = 1,352
Flop: 70 * 1,348,620 = 94,403,400
Turn: 630 * 1,521,978 = 958,846,140
River: 5,670 * 840,000 = 4,762,800,000

So even if we could use abstraction to cut out half of the flop buckets in order to spend them elsewhere, it would make virtually no difference to the size and quality of the turn or river abstractions. Since our abstraction technique isn't perfect anyways, doing the early rounds perfectly seems like a great way to spend those information sets.
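
If you want to double-check the arithmetic, the table above is just a few multiplications. A quick sketch (nothing from our actual code):

Code:
# Sanity check of the per-round information set counts listed above.
rounds = {
    "preflop": (8, 169),
    "flop": (70, 1348620),
    "turn": (630, 1521978),
    "river": (5670, 840000),
}
total = 0
for name, (sequences, buckets) in rounds.items():
    info_sets = sequences * buckets
    total += info_sets
    print("%8s: %5d * %9d = %13d" % (name, sequences, buckets, info_sets))
print("   total: %d" % total)  # 5,816,050,892 -- about 5.8 billion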

Quote:
Have you done any work on PLO? If betting sizes are limited to say half-pot, three quarter pot and pot, it seems to me that this could be easier to handle than NLHE, even though there are more private card combinations.
No, so far all of our real work has been in Texas Hold'em, but we're now working on heads-up and ring games, in both limit and no-limit. We've talked about Omaha a bit, but we haven't found a good scientific reason to look at other games just yet. You're probably right that PLO would be easier than large stack heads-up no-limit, but since both games need action abstraction, we can reuse a lot of our card abstraction work by staying in Texas hold'em. The techniques we use for card abstraction should still work for Omaha, but it'd take some programming time to change the code to handle it.

Quote:
I understand that you cannot publish the full strategy. But I wonder if it would be possible for you to produce a video (or just some hand history text, with or without hole cards exposed) of the program playing against itself for a few hundred hands or so. Would be very interesting to watch (although I don't really play heads up Limit, I'm sure there are other people here who could learn a lot from watching this).
Sure. The Annual Computer Poker Competition is just wrapping up, and the official results and hand histories from the match should get published to the website sometime over the next week or two:
http://www.computerpokercompetition.org/

The hand histories will have thousands of full information games between every pair of competitors, so you'll be able to get a sense of how these strategies are playing.

Somewhat related to this, we started a Twitter account for our research group a while ago: @PolarisPoker. When we have news to share, like the competition results and logs or new research papers, we'll tweet a link to it.

Quote:
Thank you!
No problem! Thanks for the great questions.
08-16-2011 , 04:38 PM
Quote:
Originally Posted by FullyCompletely
Yeah, I've been using 'decision point' to mean 'information set' in this thread. The 2011 program has 5.8 billion information sets. We got access to a new cluster this year, where every node has 256 gigs of RAM, 8x as much RAM as the 32-gig nodes on the cluster that we used last year. That let us solve a game that was about 8x as large by refining the card abstraction we used, without having to write any new code. We're not going to get a new cluster every year, though, so next year's program will probably be the same size. However, we can instead go back to looking at smarter ways to do card abstraction, to make a better program without needing even more resources.

To make this year's program, we used a single cluster node that had 24 cpus and used around 240 out of its 256 gigs of RAM for about two weeks. Like you noticed, you get diminishing returns by throwing larger computers at the problem. Compared to last year's program, which used only about 30 gigs of RAM, 8 cpus and about 2 weeks, this year's program is 30 mb/g less exploitable and only wins about 2 mb/g more in a one-on-one match.
Thanks for this very interesting information! I didn't even know that a single machine could have that much RAM (thought 144 GB was the maximum, and I don't have access to anything even near that). And yeah, I understand now that these kinds of leaps in processing power won't happen every year.

Quote:
We use imperfect recall abstractions now ("A Practical use of Imperfect Recall", SARA 2009), where we abstract on each round without considering the sequence of buckets from previous rounds. So, none of it is remembered, and that's what makes even the perfect preflop possible. Just to be clear, the perfect flop abstraction remembers the preflop; it's just the turn and river that use imperfect recall. In terms of information sets, each round in this year's program is this big:

(betting sequences * buckets = total information sets)
Preflop: 8 * 169 = 1,352
Flop: 70 * 1,348,620 = 94,403,400
Turn: 630 * 1,521,978 = 958,846,140
River: 5,670 * 840,000 = 4,762,800,000
I had seen the imperfect recall paper before but hadn't realized exactly what it meant (and I'm still having trouble understanding exactly how the bucketing is done). Now I see how unimportant the number of flop buckets is in deciding the total size of the abstraction.

It still is a bit puzzling to me how such an enormous abstraction could still be exploitable for a tenth of a big blind per hand (even if this is the very worst case) so I started thinking about what exactly it could be that makes it exploitable. Everything that follows is just speculation on my part, so feel free to ignore it if it doesn't make any sense...

It seems to me that forgetting public information could be much worse than forgetting private information. For information hiding reasons we will want to play lots of different private holdings the exact same way anyway, so it shouldn't hurt so much if some of the hands are indistinguishable. In contrast, having a completely different strategy for each specific board is not unreasonable. I wonder what would happen if we put more emphasis on remembering public information by decreasing the number of flop buckets and then (with respect to the board only) using perfect recall for the turn and river, something like:

Flop: 70 * 8 * 1,755 = 982,800
Turn: 630 * 8 * 8 * 1,755 = 70,761,600
River: 5,670 * 8 * 8 * 8 * 1,755 = 5,094,835,200

My next idea is something I'm borrowing from audio processing (which I have done some work on in the past). We could look at the translation from cards to buckets as converting a high-resolution signal to a similar signal of lower resolution. This is a kind of quantization, which will introduce a quantization error. If the "card signal" is simply rounded to the nearest bucket, this error will be correlated with the signal, i.e. it will in some sense be predictable (which intuitively seems bad).

The correlation of the quantization error to the signal can be prevented by adding a small amount of noise (dither noise) to the signal when doing the conversion. So instead of always assigning to a specific bucket we would randomly choose between adjacent buckets. I don't think it's that far-fetched to think that this new dimension of randomness could push the abstract equilibrium closer to the real one, even though the solution in the abstract game itself becomes less accurate. (Sort of like the overfitting problem working in reverse!). Do you have any thoughts on this, assuming it's something you haven't already tested?
08-18-2011 , 07:11 PM
Hi AmyIsNo1,

Quote:
Originally Posted by AmyIsNo1
It still is a bit puzzling to me how such an enormous abstraction could still be exploitable for a tenth of a big blind per hand (even if this is the very worst case) so I started thinking about what exactly it could be that makes it exploitable.
I think the right joke here is "size isn't everything" :^P

You can increase the size of the abstract game to somewhat make up for not being perfect in how you group cards together. But if you're using abstraction (other than just removing suit isomorphisms), you're going to cause at least some loss. Our new card abstraction technique is pretty good, but it's definitely not perfect. And other than just the information you lose from abstraction, there's another problem, too. One of our papers, "Abstraction Pathologies in Extensive Games", AAMAS 2009, shows that once you use abstraction for both players, the equilibrium in an abstract game might not be (and often isn't) the real-game least exploitable strategy that you can represent in that abstract space. This is related to the overfitting problem we describe in the Accelerated Best Response paper, where after some initial improvement, the strategy gets farther from equilibrium the longer we run our game solver. It's a little funny because as we run the program longer, the strategies continue to improve in one-on-one performance even as they get worse in worst-case performance.

So, being exploitable for 104 mb/g wasn't too surprising to me. In spite of the large size (which is still around 1/100,000 the size of the real game), there are still two problems: the quality (and not just the size) of the abstraction, and the equilibrium mismatch, where the abstract game equilibrium might not be the closest abstract strategy to a real equilibrium strategy.

Quote:
It seems to me that forgetting public information could be much worse than forgetting private information. For information hiding reasons we will want to play lots of different private holdings the exact same way anyway, so it shouldn't hurt so much if some of the hands are indistinguishable. In contrast, having a completely different strategy for each specific board is not unreasonable. I wonder what would happen if we put more emphasis on remembering public information by decreasing the number of flop buckets and then (with respect to the board only) using perfect recall for the turn and river, something like:

Flop: 70 * 8 * 1,755 = 982,800
Turn: 630 * 8 * 8 * 1,755 = 70,761,600
River: 5,670 * 8 * 8 * 8 * 1,755 = 5,094,835,200
What you're describing is similar to how we represent board texture in our buckets. Our board texture abstraction identifies 9 flop textures (2 of a suit? 3 of a suit? Is there a pair? 3 of a kind?). We then have 51 turn textures which recall which of those 9 flop textures were seen, and 280 river textures which depend on the flop and the turn. This lets us keep perfect recall of the board texture across all three rounds, even though we are forgetting how strong our hand was on previous rounds.

When I listed 840,000 river buckets earlier, that's actually 280 textures with (on average) 3000 buckets of private information inside of each (280 * 3000 = 840,000). Some board textures get more private buckets than others, depending on how frequently the board texture occurs and how difficult it is to abstract the hands in that texture.
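
To make the texture idea a little more concrete, here's a rough sketch of a classifier built on the two kinds of features I mentioned, suit pattern and rank multiplicity. To be clear, this is illustrative only; it isn't our actual 9-class scheme, and the function and names are made up:

Code:
from collections import Counter

def flop_texture(cards):
    """cards is a list like ['Kh', 'Qh', '7c'] (rank then suit)."""
    suit_counts = Counter(card[1] for card in cards)
    rank_counts = Counter(card[0] for card in cards)
    suitedness = max(suit_counts.values())  # 1 = rainbow, 2 = two-flush, 3 = monotone
    pairing = max(rank_counts.values())     # 1 = unpaired, 2 = paired, 3 = trips
    return (suitedness, pairing)

print(flop_texture(['Kh', 'Qh', '7c']))  # (2, 1): two of a suit, unpaired
print(flop_texture(['Ks', 'Kd', 'Kc']))  # (1, 3): rainbow, three of a kind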

Quote:
My next idea is something I'm borrowing from audio processing (which I have done some work on in the past). [....] So instead of always assigning to a specific bucket we would randomly choose between adjacent buckets. I don't think it's that far-fetched to think that this new dimension of randomness could push the abstract equilibrium closer to the real one, even though the solution in the abstract game itself becomes less accurate. (Sort of like the overfitting problem working in reverse!). Do you have any thoughts on this, assuming it's something you haven't already tested?
This is something that we've thought of as well, but we haven't gotten a chance to try it yet. I think it's a good idea, though. If the abstraction was already perfect, then we know this couldn't help. But our abstractions definitely aren't perfect, and something like this might help smooth out some of the overfitting problem. There would be a little bit of extra overhead at the start in deciding which bucket a hand translates to, but I don't think it would be too bad in practice.
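
Just to pin down what you're proposing, here's a minimal sketch of a dithered bucket assignment, assuming some already-computed hand strength value in [0, 1]. The function and its inputs are made up for illustration:

Code:
import random

def assign_bucket_dithered(strength, num_buckets, rng):
    """Map strength in [0, 1] to a bucket index, randomizing between the two
    adjacent buckets so the expected index matches the unquantized value."""
    x = strength * (num_buckets - 1)  # real-valued bucket index
    lo = int(x)
    if lo >= num_buckets - 1:
        return num_buckets - 1
    frac = x - lo  # how far we are toward the next bucket up
    return lo + 1 if rng.random() < frac else lo

rng = random.Random(1)
samples = [assign_bucket_dithered(0.62, 10, rng) for _ in range(10)]
print(samples)  # a mix of bucket 5 and bucket 6, averaging 5.58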
08-21-2011 , 08:48 PM
Hello Guys ,

I am a math student from Germany and recently read a bunch of the papers/theses written on poker bots. Your master's thesis especially sums a lot of it up pretty well (although I am not through with it yet), so thanks for that and all the good content here, Mike. I might try to write my thesis about some poker stuff too.

The recent insights into Best Response strategies seem pretty exciting.
Did you guys ever run a Best Response strategy against a Best Response strategy to your best bots? I would be curious to see how they perform overall, in the sense of exploitability, and how they perform against other bots.

Intuitively I would say that finally being able to find best responses to a given strategy should speed up progress a lot, since the best response strategy inherently shows all the flaws of that strategy and will give hints about which direction to look in for improvement/better modelling.

For instance, since THE (unknown) Nash equilibrium of HULHE is its own best counter-strategy, it seems plausible to look deeper into the biggest differences between a strategy and its best response strategies, and maybe try to bring them (or probably just the bot's strategy) closer together.

Another question: if you fix the preflop action, would the total game be considerably smaller, or is it the same argument as for why no preflop/flop hands are merged, that it's just a tiny part of the game?

I am looking forward to the competition results!
08-22-2011 , 08:48 PM
Hi blaaaaaaa,

Thanks! I'm glad you found my thesis useful. Poker is a fun field to do research in as a student. There are still lots of open problems that nobody has a good handle on just yet.

Quote:
Originally Posted by blaaaaaaa
The recent insights in Best Response strategies seems pretty exciting.
Did you guys ever run a Best Response strategy against a Best Response strategy to your best bots? I would be curious to see how they perform overall , in sense of exploitability and how they perform against other bots.
Sort of. When you run the Accelerated Best Response (ABR) algorithm, you can get the best response's strategy and/or just the value for using the best response strategy. Since the best response strategy itself is massive, we just keep the value. That means we don't have the entire strategy around to use it against other opponents. There are some ways around this that we haven't had time to try yet.

But we have done experiments with computing abstract-game best response strategies, which are small enough that we could do these other experiments. Like you suggested, we were able to measure how exploitable the best response strategies themselves were, and use them against other strategies to see how they work against someone they weren't trained to beat.

Some of those results are in my thesis and in the Restricted Nash Response paper. For example, look at Table 4.2 on page 64 (pdf page 76) of my thesis. The columns are different strategies that we had in 2007, and the rows are best responses (well, "good" responses) designed to beat those strategies. The number is the score for the row player. You'll see that the best responses are brittle: they work great against their intended target, and lose badly to other players, even if those other players are terrible.

If you look at Figure 5.2 on page 71 (pdf page 83), you'll see a graph showing a set of counterstrategies to one opponent. The point in the top right corner is a best response. The y-axis is how much it beats its opponent, and the x-axis is how exploitable it is. So, you want to be as far to the upper left as you can, and there's a whole set of tradeoffs between these two goals. The restricted Nash response technique generates the best possible set of tradeoffs; that's the red curve. The best response is the highest point (it does the best against the opponent), but it is also very far to the right, meaning that it's very exploitable by a worst-case opponent. That top-right point is beatable for almost 8000 mb/g by an abstract best response, and an ABR would beat it for even more. Recall that Always-Raise is beatable for 3697.69 mb/g; that means that the best response strategy is far, far worse than Always-Raise against a perfect opponent.

I expect that the strategies we get out of ABR would perform about the same. They will be the best possible strategy to use against their intended opponent, but they are so focussed on maximally exploiting all of the tiny errors in that opponent that a different player, even a bad one, might be able to beat the best response for quite a bit just by accident. And judging from these earlier experiments, a best response to a best response would absolutely crush it.

Quote:
Intuitively I would say that finally finding best responses to a given strategy should speed up progress a lot, since the best response strategy inherently shows all flaws to a given strategy and will give hints in which direction to look for improvement/better modelling.

For instance, since THE (unknown) Nash Equilibrium of hulhe has the same counter-strategy, it seems plausible to look deeper into the biggest diffrences in strategy and its best response strategys and maybe try to bring them (or probably just the one of the bot) closer together.
I agree. There are some great opportunities to use the best response information to figure out how we can do a better job of doing state space abstraction and finding out what cases our programs are failing on.

Quote:
Another question: if you fix preflop action , would the total game be considerable smaller or would it be the same argument why no preflop/flop action is merged, that it would be just a tiny part of the game?
This question comes up a lot. If you fix the preflop, then the game is effectively easier to handle, but there's a big downside as well. That downside is so bad that we usually don't use this approach in practice.

If you fix the preflop, then instead of one big problem, you have lots of small problems and you can try to work on every flop subgame by itself. There are 12,285 smaller subgames to consider, one for each combination of the 7 preflop betting sequences that reach the flop and the 1,755 flops (ignoring suit rotations). Solving 12,285 small problems is going to be easier than solving one big one. You could use a finer-grained card abstraction on each.

But there's the downside. After you improve your strategy for the flop onwards, your preflop strategy won't be perfect anymore. Given your new postflop strategy, you could really make better preflop decisions if you went back and changed it. And once you redo your preflop strategy, all of your ranges and your opponent's ranges change on the postflop, so you need to revisit those decisions again, and so on. You wind up cycling between the problems.

People have tried this kind of approach in the past, and I haven't yet seen it work convincingly. For example, one of our old programs from 2003 (PsOpti4, aka Sparbot in Poker Academy) split the game into preflop and postflop games and tried to merge them into one strategy. It plays too slowly for us to run ABR on it, but we have abstract game best responses that were beating it for a lot back in 2007 (137 mb/g in my thesis, Table 4.2), and an ABR strategy would win far more than that. In 2009, our computer poker competition entry used a technique called "strategy grafting" that was intended to be a reasonable way to stitch together a game from subgames like this, while making sure that everything was still well balanced. It was far larger than our 2008 entry, and beat it in one-on-one matches, but the 2009 "grafted" strategy was beatable for 440.823 mb/g, compared to 266.797 mb/g for the 2008 entry. So even though the abstraction was much larger and it won one-on-one, it was much farther from equilibrium.

We still use this kind of approach in 3-player games, but that's about it. In 3-player games, we have a few "heads-up experts", different strategies that take over in some of the common preflop folding cases. For example, if the first player to act immediately folds, then you really have a two player game. Neither of those players has acted yet, and the only difference between it and a heads-up game is that the distribution of cards in the deck changed a little, as you can expect that a few of the weak cards are probably out of the deck.

But that's about the only corner case where we still try to break up the game. For everything 2-player, and the 3-player game except for a few early fold cases, we solve the whole problem in one computation instead of trying to break it up.

Hope that helps. Thanks for the questions!
08-29-2011 , 09:20 PM
Thank you very much for your insights. I reached the pages with the FBR, nice stuff (ABR even nicer). I'll finish reading it over the next few days and then get back in touch with you.
Is there maybe some small subject, more in the math/game-theory corner, that you would suggest I look into? Since I am not a programmer, I guess it's pretty hopeless that I will ever be able to really implement stuff.

Regards
08-30-2011 , 12:39 PM
Hi blaaaaaaa,

Send me an email at johanson@ualberta.ca .
09-04-2011 , 02:35 AM
Hi Mike,
Any comment on the recent victory of you guys' bot in this year's computer poker competition? Why did it do better than the last 2 years?
Thx
09-04-2011 , 09:57 AM
I downloaded the hand histories (the logs, but I figured they were the same), but when I try to open them my 7-zip says the file is broken.

How did you calculate the theoretical winrates that were possible against your bot? The numbers are very high and from my experience it's almost impossible to win 12BB/100 (240mb/g, 2008 version), even against very bad players. Obviously I'm nowhere near a perfect player, but still the 12BB/100 figure feels very high.

Is there any reason in particular no human can even get close to those numbers?
09-04-2011 , 02:04 PM
Quote:
Originally Posted by Jeltsin
I downloaded the hand histories (the logs, but I figured they were the same), but when I try to open them my 7-zip says the file is broken.
I have successfully downloaded and extracted the files. I used command line bzip2.exe and tar.exe (Windows versions); maybe 7-zip has problems with the large size (3.7 GB uncompressed) or number of files (18,490)?

Is there some tool available to convert from the compact 1-line "STATE:0:f:2cAc|3h7c:5|-5" format to normal hand history files?

09-04-2011 , 04:40 PM
Hi BetaPro,

Quote:
Originally Posted by BetaPro
Hi Mike,
Any comment on the recent victory of you guys' bot in this year's computer poker competition? Why did it do better than the last 2 years?
I talked about this in my replies to KeepItSimple and AmyIsNo1 earlier in the thread. We've made a lot of progress on state space abstraction, and we're also solving much larger (which usually means better) abstract games. This allows us to get much closer to a real Nash equilibrium strategy. In practice, against other opponents, that actually gets us a fairly small (if any) improvement against other programs that are also trying to approximate a Nash equilibrium, even though the program has gotten much stronger against a worst-case opponent.
09-04-2011 , 05:34 PM
Hi Jeltsin,

Quote:
Originally Posted by Jeltsin
I downloaded the hand histories (the logs, but I figured they were the same), but when I try to open them my 7-zip says the file is broken.
I haven't looked at the logs yet, so I'm not sure what could cause that. If you can't get the file open, you could try emailing the competition organizer or make a post on the competition forums to see if anyone else there is having problems.

Quote:
How did you calculate the theoretical winrates that were possible against your bot? The numbers are very high and from my experience it's almost impossible to win 12BB/100 (240mb/g, 2008 version), even against very bad players. Obviously I'm nowhere near a perfect player, but still the 12BB/100 figure feels very high.

Is there any reason in particular no human can even get close to those numbers?
Earlier in the thread, I linked to the research paper that we've written that describes how we did the computation:
http://webdocs.cs.ualberta.ca/~games...t_response.pdf

Here's a very simplified description. The algorithm depends on knowing exactly what probability the strategy has of taking each action at every decision. If you know that, then you can do the math to determine exactly what the opponent's range is at every river showdown. Then, at all of the decisions just before the showdown, you can determine the best action to take for every set of hole cards, and how much you will win on average by taking that action. Once you know the value and action for those decisions, you can make the right choice one step earlier, and so on, until you get back to the start of the game and you know how much you expect to make by playing perfectly before you see your hole cards.

What I've just described is a well known algorithm called expectimax, or sometimes just "best response calculation". The new research that we presented in the paper is a new way of doing expectimax that takes advantage of the structure of imperfect information games like poker, to let you dramatically speed up the computation so that it can be done in one day instead of years.
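
If you haven't seen expectimax before, here's a toy sketch of that backward pass on a hand-built tree (a made-up example, not poker and not our code). At our decision points we take the max over children; where chance or the fully-known opponent acts, we take the probability-weighted average. A perfect-information sketch like this glosses over the information-set bookkeeping that our paper's speedup is actually about:

Code:
def best_response_value(node):
    """Our expected value when playing perfectly against a fixed,
    fully known opponent strategy."""
    if node["kind"] == "terminal":
        return node["value"]
    child_values = [best_response_value(c) for c in node["children"]]
    if node["kind"] == "us":
        return max(child_values)  # we act perfectly: take the best action
    # Chance or a fully-known opponent: average over the known probabilities.
    return sum(p * v for p, v in zip(node["probs"], child_values))

# The opponent checks 70% and bets 30%; we respond perfectly in each branch.
tree = {"kind": "opp", "probs": [0.7, 0.3], "children": [
    {"kind": "us", "children": [{"kind": "terminal", "value": 1.0},
                                {"kind": "terminal", "value": -0.5}]},
    {"kind": "us", "children": [{"kind": "terminal", "value": -1.0},
                                {"kind": "terminal", "value": 2.0}]},
]}
print(best_response_value(tree))  # 0.7 * 1.0 + 0.3 * 2.0 = 1.3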

As to why the numbers are so high and whether or not a human could actually win that much against it... It's certainly possible for a human to win more than 0, but I do think it is very unlikely for a human (or a program) to learn to win the absolute maximum just by playing against it. The algorithm we use to do the computation has a huge advantage over how people play poker and learn to exploit their opponent's mistakes.

In order to learn the best response strategy, the algorithm has to be given the opponent's entire strategy ahead of time, and it assumes that the opponent is not changing over time. It can then look at every possible one of the 10^14 decisions, know exactly what the strategy will do, and choose the single best action to use at each decision in response.

Contrast that with trying to learn how to exploit an opponent while watching them during a game, or by studying a big set of their hand histories. Even if you got to watch them for millions of games, you would still only see a tiny fraction of the 10^14 decisions in their strategy. And worse, to get an accurate idea of what their strategy is, you'd have to see them act at each decision many times in order to accurately judge how often they are calling, raising and folding at that point. If you have a limited number of observations of their strategy, you just won't have the information you need in order to pick the best action every single time.

And in practice, even bad opponents will change their strategy over time, especially if they are losing badly. You're trying to hit a moving target, while the best response algorithm is playing against a single unchanging opponent.

So the best response calculation gives us an upper bound on how badly the strategy could do. Anyone who has to learn about the strategy through observations, instead of just being given the exact strategy, is at an obvious disadvantage. It seems extremely unlikely for someone to learn enough inside of a few tens of thousands of hands to be able to pull off the maximally exploitive strategy. But that doesn't rule out someone being able to win against it, or to find a strategy that can beat it for (say) 75% of that maximum loss.
09-04-2011 , 05:36 PM
Hi AmyIsNo1,

Quote:
Originally Posted by AmyIsNo1
Is there some tool available to convert from the compact 1-line "STATE:0:f:2cAc|3h7c:5|-5" format to normal hand history files?
Not that I'm aware of, no. That's the format that we usually work with.
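
That said, the format is simple enough to pull apart with a few lines of Python. This is just a sketch based on your one example line, reading it as hand number, betting string, cards, and payoffs; the real competition logs may carry extra fields, like player names:

Code:
def parse_state_line(line):
    _, hand_num, betting, cards, payoffs = line.strip().split(":")
    streets = cards.split("/")  # board cards follow '/' for each round
    return {
        "hand": int(hand_num),
        "betting": betting,                   # e.g. 'f' = an immediate fold
        "hole_cards": streets[0].split("|"),  # one entry per player
        "board": streets[1:],                 # empty if the hand ended preflop
        "payoffs": [int(x) for x in payoffs.split("|")],
    }

print(parse_state_line("STATE:0:f:2cAc|3h7c:5|-5"))
# {'hand': 0, 'betting': 'f', 'hole_cards': ['2cAc', '3h7c'],
#  'board': [], 'payoffs': [5, -5]}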
09-06-2011 , 01:45 PM
I have a question about this:

"One of our papers, "Abstraction Pathologies in Extensive Games", AAMAS 2009, shows that once you use abstraction for both players, the equilibrium in an abstract game might not be (and often isn't) the real-game least exploitable strategy that you can represent in that abstract space. This is related to the overfitting problem we describe in the Accelerated Best Response paper, where after some initial improvement, the strategy gets farther from equilibrium the longer we run our game solver. It's a little funny because as we run the program longer, the strategies continue to improve in one-on-one performance even as they get worse in worst-case performance."

Could you explain what you mean by the "real-game least exploitable strategy that you can represent in that abstract space"? Does that mean a sort of projection of the true GTO strategy for the real poker game onto your abstract game (i.e. forcing strategies of holdings belonging to the same buckets to always be the same)?

If so, it's of course conceivable that the GTO strategy of the abstract game is not the same as that projection. But what's surprising is that your solver "accidentally" hits a better approximation to that projection on its way to solving the GTO strategy. I would normally assume that the intermediate strategies of the solver are essentially the GTO strategy randomly perturbed (the less the perturbation the closer the solver is to the GTO), and since the number of possible variables to be perturbed is enormous it would seem to be a superlucky coincidence if that perturbation happened to be precisely in the direction of better representing the projection of the GTO for the true game.
09-06-2011 , 06:29 PM
Hi DoctorBaboon,

You're on the right track. Here's the way that we're interpreting this effect; it should make more sense and seem less surprising.

When we solve an abstract game, we run our solver, CFR, for billions of iterations. Each iteration produces a different strategy that could be used to play the game. Over time, this sequence of strategies will converge towards a Nash equilibrium for the abstract game. In practice, if we're trying to make a graph, we just grab (say) one strategy every hundred million iterations and evaluate it, so that we can track our progress towards that goal.
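
If you want a feel for those iterations, the core update inside CFR is called regret matching, and it fits in a few lines. Here's a sketch of it in self-play on rock-paper-scissors; this is just an illustration, not our hold'em solver. Each iteration has a "current" strategy driven by the positive regrets, and it's the average strategy across iterations that converges to an equilibrium:

Code:
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[mine][theirs], symmetric zero-sum game

def current_strategy(regrets):
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [[0.0] * ACTIONS for _ in range(2)]
strategy_sums = [[0.0] * ACTIONS for _ in range(2)]

for _ in range(200000):
    strats = [current_strategy(regrets[p]) for p in range(2)]
    moves = [random.choices(range(ACTIONS), weights=strats[p])[0] for p in range(2)]
    for p in range(2):
        them = moves[1 - p]
        played = PAYOFF[moves[p]][them]
        for a in range(ACTIONS):
            # Regret: how much better action a would have done than our actual play.
            regrets[p][a] += PAYOFF[a][them] - played
            strategy_sums[p][a] += strats[p][a]

for p in range(2):
    total = sum(strategy_sums[p])
    print([round(x / total, 3) for x in strategy_sums[p]])  # -> ~[0.333, 0.333, 0.333]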

So, we have three ways we could evaluate each of these strategies:
1) How close to a Nash equilibrium in the abstract game is the strategy?
2) How much does the strategy win in one-on-one matches against some fixed opponent?
3) How exploitable is the strategy in the real game? (This tells us how far this strategy is from equilibrium).

As we run CFR, the strategies converge towards an equilibrium in the abstract game. The theory guarantees it and it converges reliably in practice (the NIPS 2007 paper on CFR has examples), so there's no surprises there. This is the only metric that CFR is really guaranteed to improve as you run it. So the longer you run CFR, the closer you get to an abstract game equilibrium, and the better you do on that first metric.

As we keep running CFR, we also find that the strategies keep getting better in one-on-one matches. Every strategy beats the strategies that came before it in that one CFR run (so strategy 10 billion beats strategy 5 billion and the ones before it, subject to a little noise), and they also tend to do better against fixed benchmark opponents (the 10 billion iteration strategy wins more against our 2010 competition entry than the 5 billion iteration strategy does). I don't think we've ever seen a counterexample to this in practice, although I don't think that's guaranteed. So in practice, the longer you run CFR, the better you do in one-on-one matches.

The third evaluation metric is what you're asking about. For any one of the abstract game strategies, we can evaluate how well it does in the real game against an optimal opponent. So the projection goes in the other direction from what you mentioned: we're not bringing the unknown real game equilibrium down into the abstract game. Instead, we're bringing the abstract strategy up into the real game and evaluating it there. And if we evaluate several strategies from one run of CFR, we can make those graphs to see if it's getting better or worse over time.

So far, intuition probably suggests that the longer you run CFR, the better the strategies would get in the real game. That was what we expected until the 2009 abstraction pathologies paper, where we found counterexamples in the tiny game of Leduc hold'em, and it wasn't until this 2011 accelerated best response paper that we could actually verify whether or not this would happen in large games like Texas hold'em. You described it as the solver "accidentally" finding a good real-game strategy on the way to the good abstract-game strategy. Here's a different interpretation of that that makes it seem not accidental.

The abstract game is always going to be strategically different from the real game. We try to make it as close as we can, but there's always going to be some differences. For example, if neither you nor your opponent understands board texture perfectly (as is the case when we run CFR with our current abstractions), you might be able to improve your strategy in the abstract game by relying on that difference. And while CFR is running, all it's trying to do is find an equilibrium in the abstract game: it doesn't even know that the real game exists, or that it's going to eventually get evaluated there. So when we first start CFR, it learns to improve its strategy in ways that are effective in both the abstract game and the real game. That's the early downwards slope, where exploitability decreases quickly. Eventually, as it continues to tweak its strategy to adapt to the intricacies of the abstract game, it makes changes that get it closer to an abstract equilibrium, but maybe aren't effective in the real game. This is a form of overfitting: the strategy gets so good at the abstract game it is being trained in that it becomes worse in the real game. In studying this more after the paper was published, we've found that the curve doesn't keep getting worse indefinitely. Given enough time, it hits some upper level of exploitability and stays around there. But the lowest point on the curve is still near the start.

We know that it's not just luck that brings it to this low point early in the run; this effect is repeatable, and we get very similar curves every time we run this experiment on different abstract games.

But the end result is still kind of unclear. If all you cared about was finding a strategy with the lowest possible exploitability (and that's a nice well defined goal), then you might want to study this effect to try to learn where the lowest point will be and use that strategy. But like I mentioned earlier, in one-on-one matches, the longer you run CFR the better the strategies are. Even though these strategies have gotten worse in terms of real game exploitability, in practice they keep getting better in one-on-one matches against the earlier versions and against fixed opponents. So if you want to make strategies to submit to the annual computer poker competition (like we do), then the answer still appears to be to run CFR for as long as you can.

Another twist on this is the over-aggressive strategies that we mention in the ABR paper. Even though these aggressive strategies are less exploitable in the real game than the normal equilibrium approximations, the aggressive strategies lose by a small margin in one-on-one matches against the normal equilibrium strategies. But against a weak opponent, the over-aggressive strategies win a lot more than the normal equilibrium strategies. So making the decision of which strategy to use is not as simple as just picking the least exploitable one.

Hope that helps. Thanks for the question, and let me know if you still have questions about this.
09-07-2011 , 01:56 AM
FullyCompletely,

Just want to say thanks for taking the time to do this thread. Your answers are always well-written, and do a great job of making complex ideas understandable.
09-07-2011 , 05:44 AM
@FullyCompletely: The 'overfitting' phenomenon is interesting. Does it also occur with perfect recall abstractions?
09-07-2011 , 03:53 PM
Mike,

Is there any theory about the exploitability of a combination of individual strategies? Let's say we have calculated two different near-equilibrium strategies (with different bucketing schemes). They each have an exploitability of 100 mb/hand. Then when playing, before each hand starts, we randomly choose which strategy to use for that hand (and don't tell the opponent which one).

Wouldn't the exploitability of this combined strategy be much lower than 100 mb/hand? What if we had ten or a hundred strategies to choose from?
09-07-2011 , 03:56 PM
gaming_mouse - Thanks! It's been fun from my side, too. The questions that people are asking are really good.

garrondo - Yes, so far we've found a similar overfitting curve for every type of abstraction that does more than just remove suit isomorphisms. Perfect recall and imperfect recall both suffer from this; there's a graph in the Accelerated Best Response paper I linked to earlier that shows an example of overfitting in both of those cases.

The abstraction that just removes suit isomorphisms in Texas hold'em is far too large to solve, but we can use smaller poker games (like 2-round 4-bet hold'em, which just has a preflop and a flop) to test the effects of abstraction there. If we use abstraction in that small game to group together flops based on hand strength, then we get an overfitting effect there as well.
09-08-2011 , 04:00 AM
Quote:
Originally Posted by AmyIsNo1
I have successfully downloaded and extracted the files. I used command line bzip2.exe and tar.exe (Windows versions); maybe 7-zip has problems with the large size (3.7 GB uncompressed) or number of files (18,490)?

Is there some tool available to convert from the compact 1-line "STATE:0:f:2cAc|3h7c:5|-5" format to normal hand history files?
Hi AmyIsNo1,

I'm a former member of the CPRG, and I actually did write a tool that converts some logs that are very similar to the ones the poker competition uses into full tilt style hand histories.

I've made some modifications to the scripts to make them work for these logs and I've sent the scripts on over to Mike to put somewhere for the community to use, so you should be able to get them in a day or two.

Cheers!
Herald
09-08-2011 , 06:18 AM
Quote:
Originally Posted by Herald
Hi AmyIsNo1,

I'm a former member of the CPRG, and I actually did write a tool that converts some logs that are very similar to the ones the poker competition uses into full tilt style hand histories.

I've made some modifications to the scripts to make them work for these logs and I've sent the scripts on over to Mike to put somewhere for the community to use, so you should be able to get them in a day or two.

Cheers!
Herald
Great, thanks!
09-08-2011 , 11:46 PM
that would be cool, thx
09-09-2011 , 06:29 PM
Hi AmyIsNo1,

Quote:
Originally Posted by AmyIsNo1
Is there any theory about the exploitability of a combination of individual strategies? Let's say we have calculated two different near-equilibrium strategies (with different bucketing schemes). They each have an exploitability of 100 mb/hand. Then when playing, before each hand starts we randomly choose which strategy to use for that hand (and doesn't tell the opponent which one).

Wouldn't the exploitability of this combined strategy be much lower than 100 mb/hand? What if we had ten or a hundred strategies to choose from?
That's an interesting idea. Mixing between two agents like that shouldn't make the resulting strategy any more exploitable than the weighted average of the two individual strategies' exploitabilities, and you might be able to improve on that. It wouldn't be guaranteed to do better: if the same counterstrategy worked against both, or if the strategies were dissimilar enough that only one of them would ever go down some line of play, you wouldn't gain anything. But it could help.
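
For what it's worth, the "no more exploitable than the weighted average" part is a standard convexity argument (a sketch, not from any of our papers). Writing u(tau, sigma) for the opponent's expected value, which is linear in our strategy sigma for a fixed response tau, the best-response value epsilon(sigma) = max over tau of u(tau, sigma) is a max of linear functions, so for a coin-flip mixture:

Code:
\epsilon\left(\tfrac{1}{2}\sigma_1 + \tfrac{1}{2}\sigma_2\right)
  = \max_{\tau}\left[\tfrac{1}{2}\,u(\tau,\sigma_1) + \tfrac{1}{2}\,u(\tau,\sigma_2)\right]
  \le \tfrac{1}{2}\max_{\tau} u(\tau,\sigma_1) + \tfrac{1}{2}\max_{\tau} u(\tau,\sigma_2)
  = \tfrac{1}{2}\,\epsilon(\sigma_1) + \tfrac{1}{2}\,\epsilon(\sigma_2).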

We had a similar intuition for the 2008 Man-vs-Machine match, where Polaris switched between five strategies during the match. It wasn't a mixture like you're describing, because only one strategy was used on each hand, and the probability of choosing each strategy changed during the match depending on which strategy was predicted to be the strongest. But part of our reasoning was that even if the players were able to beat each strategy individually, that they wouldn't be able to use the same counterstrategy for one strategy against the ensemble of strategies.

We evaluated each of the five Polaris2008 strategies for the ABR paper, but since Polaris changed over time, we weren't able to say anything about how exploitable the system as a whole was. But it should be possible to try running ABR against an even 5-way mix, with each strategy being used 20% of the time, to see how exploitable that mixture is. I'll put it on the to-do list, because that'd be an interesting number to have. Thanks!
09-09-2011 , 06:32 PM
Quote:
Originally Posted by Herald
I've made some modifications to the scripts to make them work for these logs and I've sent the scripts on over to Mike to put somewhere for the community to use, so you should be able to get them in a day or two.
Hi everyone,

I forwarded Herald's script on to the competition organizer. He's going to have a look, and will probably wind up hosting it on the competition website. When it's available, I'll post a link to it. It'll probably be a few more days, though - sorry for the delay.
09-12-2011 , 11:23 AM
Quote:
Originally Posted by FullyCompletely
...
But the end result is still kind of unclear. If all you cared about was finding a strategy with the lowest possible exploitability (and that's a nice well defined goal), then you might want to study this effect to try to learn where the lowest point will be and use that strategy. But like I mentioned earlier, in one-on-one matches, the longer you run CFR the better the strategies are. Even though these strategies have gotten worse in terms of real game exploitability, in practice they keep getting better in one-on-one matches against the earlier versions and against fixed opponents. So if you want to make strategies to submit to the annual computer poker competition (like we do), then the answer still appears to be to run CFR for as long as you can.

...
I think your explanation makes sense - it's a very interesting effect. Perhaps this effect could be used to design different abstractions that more closely approximate the game (within the same resource limitations). I'm guessing if the size of this effect is smaller that might be an indication that the abstraction is better. (Though of course, another measure is simply the exploitability of the final strategy, which is easier to measure).

A few more related questions: how early in the run does the strategy reach its lowest real-game exploitability (compared to stabilizing near the abstract-game GTO)? The figures that you gave for Hyperborean 2011 (around 5-6 BB/100 loss in the worst case, if I remember correctly) - are those for the version that was made to run way past the peak point, in preparation for the computer competition? If so, then what was the exploitability at the peak point? If not, then what was the exploitability of the competition versions? Also, when you gave the figures for _tbr, how did you measure it, given that, unlike _iro, it adapts its strategy over time?
