Quote:
Originally Posted by Lifrordi
I disagree that it's artificial.
I can't see how the most naive translation would not be easily exploitable.
In the example you gave, if the SB opens to 4.5$, we need to map it to either open to 4$ or to 5$, which is an error of 0.5$ a hand (or 25BB/100).
The error will grow significantly after next player actions in the hand.
Furthermore - while differences between bets can be small, DeepStack computes a unique strategy for any situation possible.
While it can be argued this is not necessary, I don't think that game size is artificial.
I appreciate the reply, but your perspective confuses me. From the DeepStack paper:
Quote:
The imperfect information game HUNL is comparable in size to go, with the number of decision points exceeding 10^160 (13).
(13) is a reference to Johanson (2013), the paper I pointed out above. Johanson clearly shows that modelling HUNL requires an arbitrary decision on how to model bet sizes. Academic researchers follow a convention, which changed from 2009 to 2010 and took the game size from 10^49 to 10^160 -- a drastic difference given that both game trees describe 200bb HUNL. He even chose the word "inflate". That sounds artificial to me, given that there is no clear-cut scientific argument for modelling it the way the current ACPC convention does.
I already covered your counterargument of raising to 4.5 at 1-2 with 400 stacks in my previous post, making the same point with 5-10 and 2k stacks. 5-10 with 2k stacks has a much, much smaller game size than your convention, but the differences in optimal strategies would be very small and the exploitability of a naive translation technique would be infinitesimal. The only spots where coarsening the bet granularity from 1/100th of a bb to 1/10th of a bb would lose some EV -- by cutting out optimal bet sizes that lie in between -- are preflop around the optimal raise-size regions and postflop spots with high stack-to-pot ratios. Even there, the overall lost EV would likely be lower than the EV lost to the numerical instability that can be found in most modern solutions, as you pointed out.
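To make the granularity point concrete, here is a toy counter -- entirely my own sketch, not something from Johanson's report, ignoring cards and treating a single check or call as ending the street -- for how many betting sequences a single street has as a function of the smallest allowed bet increment:

Code:
from functools import lru_cache

def count_sequences(stack, increment):
    """Count betting sequences for one simplified street: both players start
    with `stack` chips behind, every bet/raise is a multiple of `increment`,
    and a check or call ends the street (so this undercounts a real street)."""
    @lru_cache(maxsize=None)
    def rec(behind_actor, behind_opp):
        owed = behind_actor - behind_opp      # amount the acting player must call
        total = 1                             # check / call ends the street
        if owed > 0:
            total += 1                        # folding also ends it
        amount = owed + increment             # smallest (re)raise allowed in this toy
        while amount <= behind_actor:         # ...up to all-in
            total += rec(behind_opp, behind_actor - amount)
            amount += increment
        return total
    return rec(stack, stack)

# compare e.g. count_sequences(200, 10) with count_sequences(200, 1)

The exact numbers don't matter; the point is that the count is driven almost entirely by the chip granularity you happen to pick, not by anything strategically meaningful.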
I'm not sure how you define error in your example, but the way you put it almost implies that mapping 4.5 onto 5 half the time and onto 4 otherwise would cost the bb 25bb/100 overall. That is definitely incorrect -- the loss would be far lower than that (25bb/100 is probably higher than the equilibrium game value for the btn). Not sure if you meant it that way, but if you did, I can't see any logic supporting it.
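For reference, the naive randomized translation I have in mind looks roughly like this -- a minimal sketch of my own (the pseudo-harmonic mapping of Ganzfried and Sandholm is the usual refinement of the mixing probability):

Code:
import bisect
import random

def translate_bet(observed, abstraction_sizes, rng=random.random):
    """Map an off-tree bet onto one of the two neighbouring sizes in the
    abstraction, mixing by linear interpolation: an open to 4.5 between the
    sizes 4 and 5 goes to 5 half the time and to 4 otherwise."""
    sizes = sorted(abstraction_sizes)
    if observed <= sizes[0]:
        return sizes[0]
    if observed >= sizes[-1]:
        return sizes[-1]
    i = bisect.bisect_left(sizes, observed)
    lo, hi = sizes[i - 1], sizes[i]
    p_hi = (observed - lo) / (hi - lo)   # probability of rounding up
    return hi if rng() < p_hi else lo

Whatever the mixing rule, the EV the bb gives up is roughly bounded by how differently the equilibrium responds to an open of 4 versus 5, which is tiny.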
Using the current ACPC convention and comparing 200bb HUNL to Go without explaining the convention properly seems like an overstatement to me. Take another perspective: hypothetically, if another independent research team that doesn't need to follow academic conventions came up with a similar bot and published the results on their private website, claiming to have a very-close-to-optimal bot for 200bb HUNL, the largest game ever, far larger than Go, since they model it at 500-1000 with 200k stacks for a game size of 10^1367 (number may not be accurate), wouldn't that strike you as odd, or even unscientific? If someone made a similar argument that this game size is an overstatement and the researchers countered with "well, in your abstraction, if the btn opens to 2.0005bb and we need to map it onto 2.001bb or 2bb, I can't see how that would not be easily exploitable", would that be a satisfying answer?
Another perspective: In online poker, bets can usually be made in increments of 1 cent -- stating that a $1-$2 online cash game with Otb, Sauce and
Quote:
Originally Posted by Lifrordi
Dough Polk
would be way way less complex than the same lineup at $100-$200 (disregarding rake and motivational effects ofc) would be ridiculous.
The way this should be handled is that academic papers state that the size of 200bb HUNL is around 10^160 -- followed by a footnote briefly explaining that modelling bet sizes (and thus the game size) is not a clear-cut decision and follows a convention, along with some motivation for why the current convention was chosen in the first place.
Alternatively, or better yet additionally, you could add a short discussion of how the traditional measure of game size (counting decision points or info sets) may not be the best way to compare the complexity of different games, such as poker, Go and chess. One might need another measure to better capture strategic depth, and therefore complexity, since games with theoretically continuous action spaces can be modelled arbitrarily, and games with lots of easily detectable (weakly) dominated strategies in the early branches are actually easier to solve than their game size suggests (such as HUNL, where such strategies can be found at the very first decision point in the tree -- just look at Libratus' play).
Why did the ACPC choose 50-100 with 20k stacks in the first place? While your example might have been part of the motivation to change the 2009 format to a finer preflop granularity, why would one ever model the game as such a big problem when the added value is so small compared to how it blows up the game tree? It seems counterintuitive and inefficient to me. The goal of every model is to capture as much as possible without blowing it up; it seems like the ACPC just did the latter. Given the timing of the Go and HUNL AI vs human matches, and 10^160 being a number very close to Go's ~10^170, I guess it's a really nice sell for poker researchers -- to me it just seems inaccurate and artificial.
Quote:
Originally Posted by Lifrordi
I believe the incentive structure was chosen quite well.
What structure would you suggest that you think would be more appropriate? [...]
I'm not the right person to ask since I don't know a lot about experiments. I have always found it odd that play against humans is such a widely accepted benchmark for optimal strategies in HUNL in the first place. There are a lot of potential problems with using human play over limited sample sizes as a benchmark. Figuring out better ways to estimate the exploitability of new solutions seems like the more natural and scientific way to go.
Regarding the argument that head-to-head performance has "very little scientific value": I understand that a bot can mainly exploit an implementation error, or whatever else there is to exploit, and thus skew the results, but how can that statement be true and not also apply to the AI vs human matches, which are head-to-head performance measures as well? I don't see the theoretical argument for why the same problems can't occur with human play. Given that both bots are designed to be essentially hole-less, with DeepStack's "intuition" and Libratus being able to endgame-solve and learn new bet sizes for the early streets overnight, is head-to-head performance really that likely to be skewed? If I got the argument wrong, please elaborate.
How can a measure like LBR, which is designed to give a lower bound on exploitability, produce negative numbers in the first place? I would really appreciate a short explanation; it's tough to understand from the papers. The LBR results seem to be interpreted as showing that DeepStack is much closer to optimal than any other bot -- why can't the alternative interpretation, that LBR is a flawed measure of exploitability, be true?
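For reference, here is the LBR action rule as I currently read it from the paper -- my own pseudocode and helper names, so please correct me if I'm misreading it:

Code:
def lbr_choose_action(my_hand, board, opp_range, pot, to_call, bet_sizes,
                      win_prob, fold_prob):
    """Greedy local best response: pick the action with the highest expected
    value, assuming we only check/call for the rest of the hand afterwards.
    `win_prob` (showdown equity of my_hand vs opp_range) and `fold_prob`
    (how much of opp_range folds to a given bet) are stand-ins for what the
    real LBR derives from the bot's known strategy."""
    wp = win_prob(my_hand, board, opp_range)
    values = {}
    if to_call > 0:
        values["fold"] = 0.0
    values["call"] = wp * pot - (1.0 - wp) * to_call
    for b in bet_sizes:
        fp = fold_prob(opp_range, board, b)
        values[("bet", b)] = fp * pot + (1.0 - fp) * (
            wp * (pot + b) - (1.0 - wp) * (to_call + b))
    return max(values, key=values.get)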
Thanks for reading, I know it was a long post.