WCGRider, Dong Kim, Jason Les and Bjorn Li to play against a new HU bot

05-11-2015 , 01:01 AM
Quote:
Originally Posted by WCGRider
Also to answer another question, I've been asked pretty constantly if I am going to do this again. I am most likely not going to. It was way too hard for me and I took too much of a hit in what my expected value is elsewhere, not that 210/hr is bad, it just isn't really on the map with where I think I could most efficiently spend my time for how hard I had to work.
If they improve the interface and implement multi-tabling, you might be able to get in, say, ~5x as many hands per hour (especially given how much time is wasted on river tanks). Also, instead of having 4 humans participate at 20k hands each, they could conceivably ask 8 humans to participate at 10k hands each.

With a couple simple changes like this, you could conceivably be looking at a ~$1k/hr 1-to-2-day commitment instead of a $210/hr 2-week commitment.
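
A quick back-of-the-envelope check of those numbers, as a rough sketch. It assumes (this is an assumption, not anything CMU has stated) that the pay per hand stays the same, so the hourly rate scales with hands per hour, and that the total time scales with hands per player divided by playing speed. The 14-day figure is just "2 weeks" written out.

```python
# Back-of-the-envelope check; inputs are the rough figures from the post.
current_rate = 210                      # $/hr reported by WCGRider
speedup = 5                             # ~5x hands per hour (hypothetical)
hands_now, hands_new = 20_000, 10_000   # hands per human, before and after
days_now = 14                           # "2-week commitment"

# Assumption: pay per hand is unchanged, so $/hr scales with hands/hr.
new_rate = current_rate * speedup

# Time scales with hands to play divided by playing speed.
new_days = days_now * (hands_new / hands_now) / speedup

print(f"~${new_rate}/hr over ~{new_days:.1f} days")
```

Under those assumptions the estimate lands in the "~$1k/hr, 1-to-2-day" range.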

I hope CMU considers this in order to continue to attract top players, in order to improve the viewer experience, and in order to reduce the effect of human fatigue on their experimental results.
05-11-2015 , 01:52 AM
I have a theoretical game theory question.

My understanding is that if a pair of strategies, (S, T), are in Nash Equilibrium, then T loses no edge against S by purifying any subset of its random decisions.

For example, in rock-paper-scissors, S = T = uniform-random represents a Nash Equilibrium (S, T). And so T can purify its decisions (e.g., switch to 100%-rock), and its EV against S remains unchanged. Of course, this new T' is now exploitable by some other S' (e.g., 100%-paper), but its EV against S is still maximal.
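
The rock-paper-scissors example can be checked directly. This is a minimal sketch with made-up helper names (`payoff`, `ev`); it only verifies the two EVs mentioned above and says nothing about poker.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def ev(strategy, opponent):
    """Expected payoff of one mixed strategy against another (move -> prob)."""
    return sum(p * q * payoff(a, b)
               for a, p in strategy.items()
               for b, q in opponent.items())

uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1)}
always_paper = {"paper": Fraction(1)}

# Purified T' (100% rock) against the equilibrium strategy S (uniform).
ev_purified_vs_equilibrium = ev(always_rock, uniform)
# The same T' against an exploiting S' (100% paper).
ev_purified_vs_exploiter = ev(always_rock, always_paper)

print(ev_purified_vs_equilibrium, ev_purified_vs_exploiter)
```

The purified strategy keeps the equilibrium EV of 0 against uniform play, and loses every time against 100%-paper.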

I'm trying to wrap my head around the implications of this for NLHE GTO. Say my GTO opponent goes all-in, and say GTO prescribes that I should not fold 100% of the time with the hand I am holding. My stated property of Nash Equilibria seems to imply that I can then simply call 100% and achieve maximum EV. Is this right?

If so, it seems that some of the humans' tank-decisions on the river during the match were somewhat misguided. If you assume the bot is exactly GTO, then you simply need to decide if you believe that GTO dictates that you should call with probability >0. Once you make that determination, you should just call. Now, if you relax the assumption that the bot is exactly GTO, then perhaps you have some legitimate tanking to do, to try to ascertain the direction of the bot's bias relative to GTO in this particular spot. But that's a somewhat technical exercise that should be based on knowledge of the bot's abstraction leaks; typical poker hand-reading deductions would seem to be irrelevant apart from the initial determination of whether your call probability should be nonzero.
05-11-2015 , 02:00 AM
^^ yes, isn't that just another way of saying if you are indifferent, it doesn't matter what you do unless and until opponent modifies his strategy to exploit you?

In practice though, some hands will be indifferent, and some hands will perform better or worse than indifference, because of card removal, which RPS doesn't have.
05-11-2015 , 02:05 AM
god damn you're sexy doug
05-11-2015 , 02:32 AM
Quote:
Originally Posted by dodgybob
^^ yes, isn't that just another way of saying if you are indifferent, it doesn't matter what you do unless and until opponent modifies his strategy to exploit you?
I think what I'm saying is slightly different.

I'm suggesting that when facing an all-in bet from a GTO opponent, you can ask yourself whether it's possible that 100% fold is the correct GTO decision for you. You might be able to rule that out by reasoning that if you were 100% folding here, then you would be exploitable. Once you make that determination, then you can just call, knowing that you are maximizing EV by doing so.

But after further reflection, I'm thinking that maybe there really aren't that many spots where both (1) calling is not obviously correct, and (2) 100% fold can be logically ruled out via exploitability considerations. So maybe opportunities to take advantage of this property of GTO are pretty rare.
05-11-2015 , 05:33 AM
Quote:
Originally Posted by otp
But after further reflection, I'm thinking that maybe there really aren't that many spots where both (1) calling is not obviously correct, and (2) 100% fold can be logically ruled out via exploitability considerations. So maybe opportunities to take advantage of this property of GTO are pretty rare.
Yes. As calling on the river ends the game, there will be at most one hand (up to suit isomorphism and non-playing cards) where the decision between calling and folding is randomized. This is easy to show: Say the board reads A349T rainbow, you are confronted with an all-in bet (so raising is off the table) and GTO says you should randomize your calling with AQ. Then it is clear that all hands up to AJ are 100% folds and all hands AK and above are 100% calls; otherwise you could gain by trading calls with hands weaker than AQ for calls with the stronger hands you were folding, while keeping the total calling percentage fixed. (Note that the normal order of hand strength might be influenced by card removal and range considerations, so it might well be that A5 ranks higher in this particular situation than some A6+ hands due to the straight-blocking 5, but this does not invalidate the argument).

ignatius
05-11-2015 , 06:14 AM
Quote:
Originally Posted by TimTamBiscuit
Theoretically you are right, but in NLHE the game tree is too big to solve this way, so instead several fixed sizes are used to simplify the game tree, which is then solved. This is kind of what humans do too, since humans have to simplify bet-sizes because the game is too hard otherwise.
The key difference here IMO is that humans are capable of thinking about the proper betsize in terms of higher and lower, while the currently used algorithms are not. From what I understood, Claudico doesn't do any betsize-optimization at all - the possible sizes as well as the (probabilistic) reverse mappings are chosen by the programmers (maybe situation dependent, maybe not) and not altered during the learning phase. So the bot is reduced to choosing between several (very likely already sub-optimal) sizes, which are not even treated as numbers but as discrete, value-less choices.

This becomes even more deadly when the history of discretized reverse mapped standard bets is used to implicitly define the remaining stack-size, so the implicitly assumed SPR and the actual SPR deviate exponentially, so any multi-street strategies the bot found during its learning phase can get seriously distorted.

ignatius
05-11-2015 , 07:00 AM
Quote:
Originally Posted by Ignatius
so any multi-street strategies the bot found during its learning phase can get seriously distorted.
Humans are pretty clever at observing the patterns of multi-street betting, too. Often, for example, humans may infer on the river that certain hands would most likely have bet the turn and yet didn't, and so are to be discounted when facing a river bet. Bots find this much harder to figure out.

Computers are excellent at brute force calculation, humans not so much. Humans are exceptional at pattern recognition, computers not so much. Humans should not give up the fight vs AI just yet. While chess is much harder for humans than poker, the reverse applies for computers as the hidden information blocks a brute force calculation and forces algorithmic approximations of human pattern matching. That is also why the algorithmic approximations themselves inherently create weaknesses to be exploited: because at this stage AI pattern recognition is too crude.
05-11-2015 , 07:03 AM
Random thoughts:
  1. What a cool idea, and a great match. I loved what little of it I was able to watch.
  2. For years, I've hoped that bot developers would discover unknown (perhaps seemingly heretical) truths about poker. Stuff that would be the poker equivalent of "Wait, you mean the sun doesn't revolve around the earth?" [1] It looks like it's finally happening and I'm pretty damned excited about it.
  3. Massive congrats to NVG for the most interesting, flame-free, highest S/N discussion I've seen here since I can remember.
  4. I was so disappointed in Tuomas's apparently ego-driven defense of the results. As others have pointed out, Claudico played stupendously good HUNL; it just got beaten by a bunch of world-class best-of-the-best.
  5. I really hope this continues - if the developers continue their project, I'm happy to escrow $100 that the AI beats the best available in five years.

Regards, Lee

[1] I know SFA about HUNL, but do experts limp on the button?
05-11-2015 , 07:15 AM
Quote:
Originally Posted by Lee Jones

[1] I know SFA about HUNL, but do experts limp on the button?
Rarely. I do recall when Doug/Sulsky played their challenge that Sulsky used a limp strategy, but I don't think it had great success.
05-11-2015 , 08:52 AM
Quote:
Originally Posted by TimTamBiscuit
Computers are excellent at brute force calculation, humans not so much. Humans are exceptional at pattern recognition, computers not so much. [...] That is also why the algorithmic approximations themselves inherently create weaknesses to be exploited: because at this stage AI pattern recognition is too crude.
Maaaaaaayyyyyyyyybe what you're saying is true in the context of poker but saying that AI pattern recognition is too crude is a very bold statement ...
05-11-2015 , 09:41 AM
Quote:
Originally Posted by Lee Jones
I really hope this continues - if the developers continue their project, I'm happy to escrow $100 that the AI beats the best available in five years.

I will take this bet.


Please escrow to my pokerstars account. If you happen to be right in 5 years I will send you $200 back. If the above terms are not met then it is a push. Please do not welch on this bet.
05-11-2015 , 09:45 AM
Quote:
Originally Posted by Ignatius
The key difference here IMO is that humans are capable of thinking about the proper betsize in terms of higher and lower, while the currently used algorithms are not. From what I understood, Claudico doesn't do any betsize-optimization at all - the possible sizes as well as the (probabilistic) reverse mappings are chosen by the programmers (maybe situation dependent, maybe not) and not altered during the learning phase. So the bot is reduced to choosing between several (very likely already sub-optimal) sizes, which are not even treated as numbers but as discrete, value-less choices.

This becomes even more deadly when the history of discretized reverse mapped standard bets is used to implicitly define the remaining stack-size, so the implicitly assumed SPR and the actual SPR deviate exponentially, so any multi-street strategies the bot found during its learning phase can get seriously distorted.

ignatius
Very well put. I'm guessing that the KdTx hand or the 99 hand suffered from this exponential divergence between implicitly assumed SPR and actual SPR.

It seems that this lends itself to a nifty implementation idea. When faced with a decision to make the last call of the hand (either a river non-all-in, or a pre-river all-in), the bot should break out of its look-up table (which might suffer from an incorrect implicit SPR), and instead explicitly construct its opponent's range (which had been updated upon each opponent action during the hand, but only implicitly). Then it can do a simple pot-odds computation against that range to alter the decision prescribed by the look-up table.
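
To make the "simple pot-odds computation against that range" concrete, here is a minimal sketch. Everything in it is hypothetical: the function names, the range representation (hand -> weight) and the per-hand equities are illustrations only, and nothing here is taken from Claudico's actual code.

```python
def call_ev(pot, to_call, equity):
    """EV of calling in chips, relative to folding (which is 0).

    pot     -- chips in the middle, including the opponent's bet
    to_call -- chips we must put in to call
    equity  -- our probability of winning at showdown (ties ignored)
    """
    return equity * pot - (1 - equity) * to_call

def equity_vs_range(opp_range, equity_vs_hand):
    """Weighted average equity against a range given as hand -> weight."""
    total = sum(opp_range.values())
    return sum(w * equity_vs_hand[h] for h, w in opp_range.items()) / total

def should_call(pot, to_call, opp_range, equity_vs_hand):
    """Override rule: call exactly when calling has positive EV."""
    equity = equity_vs_range(opp_range, equity_vs_hand)
    return call_ev(pot, to_call, equity) > 0

# Made-up river spot: pot is 300 after a 100 bet, we hold a pure bluff-catcher.
equities = {"value": 0.0, "bluff": 1.0}
bluff_heavy = {"value": 0.6, "bluff": 0.4}
value_heavy = {"value": 0.9, "bluff": 0.1}

print(should_call(300, 100, bluff_heavy, equities))
print(should_call(300, 100, value_heavy, equities))
```

Here the call needs 100 / (300 + 100) = 25% equity, so the bluff-catcher calls against the 40%-bluff range and folds against the 10%-bluff range.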

Furthermore, in this process, the bot can correct for loss resulting from the betting abstraction (which Sam describes here). For game tree look-ups, the bot needs to snap an opponent's bet-size into an appropriate bucket and proceed down the game tree accordingly. But when it breaks out for the call decision, it can retroactively "un-snap" those bucketizations for the purpose of opponent range deduction.

To illustrate, suppose the opponent min-bet on the turn, and the bot interpreted that as 80%-check/20%-bet. Suppose the look-up table implicitly states that KhJh checks the turn 100%, and suppose that the randomized bucketing rounded the min-bet to a bet. Rather than place 0 probability mass on KhJh at range-construction-time, the bot can do a p -> 0.8p Bayesian update.
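
A small sketch of that update, under the stated assumptions. The names `unsnapped_likelihood` and `update_range`, the second hand `AsAd` and its 50/50 strategy are all made up for illustration; only the 80%/20% mapping and the 100%-check KhJh come from the example above.

```python
def unsnapped_likelihood(mapping, action_probs):
    """P(observed bet | hand), mixing over the abstract actions it may map to.

    mapping      -- abstract action -> probability the real bet maps to it
    action_probs -- abstract action -> probability the hand takes that action
                    under the look-up-table strategy
    """
    return sum(mapping[a] * action_probs.get(a, 0.0) for a in mapping)

def update_range(prior, mapping, strategy):
    """Bayesian update of hand weights, renormalized to sum to 1."""
    weighted = {h: p * unsnapped_likelihood(mapping, strategy[h])
                for h, p in prior.items()}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}

# The min-bet is interpreted as 80% check / 20% bet.
mapping = {"check": 0.8, "bet": 0.2}
strategy = {
    "KhJh": {"check": 1.0},               # checks the turn 100%
    "AsAd": {"check": 0.5, "bet": 0.5},   # hypothetical second hand
}
prior = {"KhJh": 0.5, "AsAd": 0.5}

khjh_weight = unsnapped_likelihood(mapping, strategy["KhJh"])
posterior = update_range(prior, mapping, strategy)
print(khjh_weight, posterior)
```

With hard snapping to "bet", KhJh would get weight 0; with the soft mapping it keeps 0.8 of its prior mass before renormalization.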
05-11-2015 , 10:20 AM
Quote:
From what I understood, Claudico doesn't do any betsize-optimization at all - the possible sizes as well as the (probabilistic) reverse mappings are chosen by the programmers (maybe situation dependent, maybe not) and not altered during the learning phase.
Not quite that simple. See, for example, my paper on bet sizing ("parameter optimization"). It's pretty technical but if you skip to the end there's an experiment on bet sizing in HUNL. It found the optimal opening size to be about 2.5x (if you only use one), which is why we went with that.

Last edited by NoamBrown; 05-11-2015 at 10:27 AM.
05-11-2015 , 12:30 PM
Quote:
Originally Posted by otp
Very well put. I'm guessing that the KdTx hand or the 99 hand suffered from this exponential divergence between implicitly assumed SPR and actual SPR.

It seems that this lends itself to a nifty implementation idea. When faced with a decision to make the last call of the hand (either a river non-all-in, or a pre-river all-in), the bot should break out of its look-up table (which might suffer from an incorrect implicit SPR), and instead explicitly construct its opponent's range (which had been updated upon each opponent action during the hand, but only implicitly). Then it can do a simple pot-odds computation against that range to alter the decision prescribed by the look-up table.
I would guess that Claudico does exactly that - and probably even more. In fact, I assume that, besides calculating the implicit ranges from the previous action, the bot then directly solves the resulting river game with a finer grained abstraction and more betsizes. - Why else would it tank so long?

However, at that point the damage is already done, as the assumed ranges, which use the tabled pre-river strat, also got distorted. So it may play the river near perfectly, but this does the bot no good: the perfection is for the wrong game, as the computed opponent starting range is not in line with the game's strat it has been computed from.

Quote:
Originally Posted by otp
Furthermore, in this process, the bot can correct for loss resulting from the betting abstraction (which Sam describes here). For game tree look-ups, the bot needs to snap an opponent's bet-size into an appropriate bucket and proceed down the game tree accordingly. But when it breaks out for the call decision, it can retroactively "un-snap" those bucketizations for the purpose of opponent range deduction.
This wouldn't be too hard and can be done for an overhead of 2^n with n being the number of the opponent's odd bets, so I would assume that Claudico might in fact do so as it's reasonably cheap. It would however only mitigate the damage. The range divergence would still be exponential in n, however the base would be somewhat smaller.

It also cannot make good for errors that happened on earlier streets - it can, at best, only avoid losing more. A play which might make sense with a turn-SPR of 4 might be a considerable leak if the actual SPR is 8 - which can already happen if three opponent betsizes up to that point have been only about 20% lower than the internal standard sizes.
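
A toy model of how the assumed and actual SPR drift apart when bets are snapped to standard sizes. The starting pot, stack depth and bet sizes are invented, and every bet is assumed to be called; how large the gap gets in a real hand depends on all of those, so this is not meant to reproduce the 4-vs-8 figure.

```python
def spr_after_bets(pot, stack, bet_fractions):
    """Stack-to-pot ratio after a sequence of bets that all get called.

    pot           -- starting pot
    stack         -- effective stack behind at the start
    bet_fractions -- each bet as a fraction of the pot at that point
    """
    for f in bet_fractions:
        bet = min(f * pot, stack)   # cannot bet more than what is left
        pot += 2 * bet              # the bet plus the call
        stack -= bet
    return stack / pot

# Hypothetical: the abstraction assumes three pot-sized bets,
# while the three bets actually made were 20% smaller.
assumed = spr_after_bets(pot=1.0, stack=100.0, bet_fractions=[1.0, 1.0, 1.0])
actual = spr_after_bets(pot=1.0, stack=100.0, bet_fractions=[0.8, 0.8, 0.8])

print(f"assumed SPR {assumed:.2f}, actual SPR {actual:.2f}")
```

Because each street multiplies the pot, a small per-bet discrepancy compounds, and the actual SPR ends up well above the one the abstraction assumes.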

The bottom line is that the current methodology does not allow the bot to effectively manipulate the pot-size, which is one of the most important skills in a multi-street game like NLHE. A way to improve the situation would be to use betsizing abstractions with buckets for SPR (along with some action details), but real progress in NL would probably require a new paradigm which uses parameterized continuous probability distributions instead of distinct nodes and probability values. This would also be more useful for non-Poker uses and of high scientific value in its own right.

ignatius
05-11-2015 , 01:13 PM
Quote:
Originally Posted by NoamBrown
Not quite that simple. See, for example, my paper on bet sizing ("parameter optimization"). It's pretty technical but if you skip to the end there's an experiment on bet sizing in HUNL. It found the optimal opening size to be about 2.5x (if you only use one), which is why we went with that.
Very interesting read, thank you!

So this new method basically allows you to more efficiently scan through a number of candidate bet-sizes, as you don't have to run the CFR from scratch each time. Nice. Does it also work for CFR+ (i.e. when truncating negative regrets with 0)? Or do you use CFR+ anyway?
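
For readers unfamiliar with "truncating negative regrets with 0", here is a minimal sketch of the regret update at a single decision point, in the vanilla form and in the "plus" form. It is the textbook rule with made-up function names and numbers, not anything from the paper or from Claudico, and it leaves out the other parts of CFR+ (such as weighted averaging and alternating updates).

```python
def update_regrets(cum_regret, instant_regret, plus=False):
    """Add this iteration's regrets; with plus=True, floor the totals at 0."""
    updated = [c + r for c, r in zip(cum_regret, instant_regret)]
    if plus:
        updated = [max(u, 0.0) for u in updated]
    return updated

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]

# Two actions, two iterations with made-up instantaneous regrets.
iterations = [[-5.0, 5.0], [3.0, -3.0]]

vanilla, plus = [0.0, 0.0], [0.0, 0.0]
for inst in iterations:
    vanilla = update_regrets(vanilla, inst)
    plus = update_regrets(plus, inst, plus=True)

print(regret_matching(vanilla), regret_matching(plus))
```

In this toy run the vanilla totals end at [-2, 2], so the first action is still never played, while the truncated totals end at [3, 2] and the first action is picked up immediately.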

Alas, as is the problem with all battle plans: They tend not to survive the first contact with the enemy, so for NLHE this is really only good for sizing the opening bet (as you did). Why, btw., did you restrict yourself to only one betsize? Computation costs? Did it turn out that using different opening betsizes didn't make much of a difference? Or did you fear that - given other unavoidable imperfections due to abstractions - it might give a human player too much of a handle for exploitive play?

ignatius
05-11-2015 , 01:37 PM
I agree that Claudico likely computes explicit ranges at the river and solves a finer grained abstraction. In fact, Sam's endgame paper indicates that it does just that.

But when (approximately) solving the river game at the start of the river and then executing it, betting abstractions can still lead to divergence between implicit and actual (ranges and pot ratios) within the new game tree. My suggestion to recalibrate a lookup-table-dictated call/fold to a fold/call is independent of that.

Furthermore, my suggestion applies to pre-river-all-in and river-non-all-in decisions. In my opinion, some of the hands I mentioned, like the KT hand and the 99 hand, as well as another poster's mentioned 5-high call on the river, seem to indicate that Claudico is not doing what I'm suggesting.

Last edited by otp; 05-11-2015 at 01:49 PM.
05-11-2015 , 02:25 PM
I wonder if the bot's own timing tells had any influence, especially in the first few sessions.

The bot never tanked with a very strong hand, yet still bluffed very often when it tanked.
05-11-2015 , 05:59 PM
Doug,

Do you feel as if you've improved your game in any way or learned anything stratwise about the game that you didn't already know and will sort of implement into your overall game after this whole experience? Or was your strat/poker mind when playing Claudico very Claudico specific?
05-11-2015 , 07:06 PM
Quote:
Originally Posted by otp
But when (approximately) solving the river game at the start of the river and then executing it, betting abstractions can still lead to divergence between implicit and actual (ranges and pot ratios) within the new game tree.
True, but with finer grained betsizes and no further streets, the effect should not be severe. Even then, it would be trivial to apply the same method (calculating ranges and solving the game) for the river-after-the-first-opponent-action-game (which would be smaller still).

Quote:
Originally Posted by otp
My suggestion to recalibrate a lookup-table-dictated call/fold to a fold/call is independent of that.
Sure.

Quote:
Originally Posted by otp
Furthermore, my suggestion applies to pre-river-all-in and river-non-all-in decisions.
In my opinion, some of the hands I mentioned, like the KT hand and the 99 hand, as well as another poster's mentioned 5-high call on the river, seem to indicate that Claudico is not doing what I'm suggesting.
Maybe someone from the Claudico-team can enlighten us on whether the bot actually does use reverse-map-weighted hand-ranges (which is what your suggested method amounts to).

For pre-river all ins, it seems obvious that they didn't put such a pot-odds-based sanity check in, even though it would have been trivial to do so, as the bot folded a draw to the nuts on turn in the 99 vs. A4s hand despite proper odds. Again, we can only speculate on the reasons. Maybe they feared that it would either make the bot too slow (if done regularly) or give away timing tells otherwise. OTOH computing the range alone should not have been too expensive (even with rev-map-weighted ranges) and can be computed on the fly as the hand evolves. Maybe they were overconfident and would have considered such an extension inelegant as they would no longer have been able to claim that the strategy was completely implicitly determined by the learning algorithm.

ignatius
05-11-2015 , 07:45 PM
Quote:
Originally Posted by Sam Ganzfried
There are a lot of variance-reduction techniques we'd like to try, such as the one you describe. We decided to not use anything beyond duplicate/AIEV for the competition, since it would be impossible for anyone else to verify the accuracy of more sophisticated approaches that depend on looking at our own strategies.
So if you have tried this approach (equity-chopping showdowns based on the bot's weighted hand distribution) before, how much does it reduce the standard deviation? It's understandable that you didn't want to over-complicate things for the competition, but is this something you can do after the fact? If you are really concerned about external verification, it's possible to set up a server with Claudico's strategy and make it available to a referee.

Thanks for providing details about the abstraction, it's interesting that you were able to perform better using a smaller asymmetric abstraction.
05-11-2015 , 09:36 PM
Quote:
Originally Posted by Ignatius
The bottom line is that the current methodology does not allow the bot to effectively manipulate the pot-size, which is one of the most important skills in a multi-street game like NLHE.
Computers have solved Limit Holdem HU but limit multi-street betting does not have the geometric growth in pot size that NLHE has. Geometric potsize growth has the effect of greatly magnifying early street errors in betsize and in hand range and hence is suggestive of the need for different algorithms to solve NLHE HU.

In particular, it would seem more important to have as many buckets as computationally practical on pre-flop and flop before reducing the number of buckets on turn and river. Secondly, as you say, NLHE involves multi-street strategies to manipulate potsize that do not occur in Limit. Thirdly, actions on flop and turn in some spots are constrained by the need to not bloat the pot, so as not to face a difficult pot odds decision with a weaker hand if opponent shoves river. Such checking actions on earlier streets cap river ranges in logically deducible ways, so multi-street actions interacting with board runout inform accurate hand reading for humans. These action tells are more reliable in NLHE than in Limit due to the geometric potsize pressure.
05-11-2015 , 10:56 PM
If anyone is curious, here is some second-hand evaluation of the significance. Using the Professor's session-by-session results:

Quote:
Originally Posted by NoamBrown
we can eyeball the amount won each session, and then use these 26 data points to calculate significance. This per-session calculation has the same expected significance as a per-hand calculation. That doesn't mean it will give the same actual significance, as per-session calculation unavoidably has more error. But it's the best we can do with the data we have.


Proceeding on this path, I get these outcomes for the 26 sessions:

95000
50000
-7000
-80000
6000
100000
20000
143000
-122000
65000
100000
90000
185000
-75000
35000
-15000
100000
-95000
-25000
115000
63000
-70000
32000
35000
-15000
8000


and using these figures, we get:

mean = 28384
stdev = 77640
t-stat = 1.864
confidence level (two sided) = 93.8%, i.e. p-value ≈ 0.062

which matches closely with the official "above 90%, but not quite 95%" figure. Hopefully, if anyone is skeptical about the evaluation, this provides a bit more comfort with it.
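
For anyone who wants to re-run the numbers, here is a short sketch using only the Python standard library. The 26 values are the eyeballed session results listed above. The 93.8% figure corresponds to one minus the two-sided p-value under a normal approximation, which is what this sketch uses; a t-distribution with 25 degrees of freedom would give a somewhat lower confidence.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sessions = [
    95000, 50000, -7000, -80000, 6000, 100000, 20000, 143000, -122000,
    65000, 100000, 90000, 185000, -75000, 35000, -15000, 100000, -95000,
    -25000, 115000, 63000, -70000, 32000, 35000, -15000, 8000,
]

m = mean(sessions)
s = stdev(sessions)                       # sample standard deviation (n - 1)
t_stat = m / (s / sqrt(len(sessions)))    # mean divided by its standard error

# Two-sided p-value under a normal approximation.
p_two_sided = 2 * (1 - NormalDist().cdf(t_stat))
confidence = 1 - p_two_sided

print(f"mean={m:.0f} stdev={s:.0f} t={t_stat:.3f} "
      f"p={p_two_sided:.3f} confidence={confidence:.1%}")
```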
05-12-2015 , 12:48 AM
Quote:
Originally Posted by Poseidon65
p-value (two sided) = 93.8%
And using the one-sided hypothesis that humans are better than the AI: one-sided p-value = 0.03, confidence level 96.9%.
05-12-2015 , 11:11 AM
Sorry if it was already asked, but is this bot available on the internet for anyone to play against?

      