Several people pointed out a few particularly problematic hands for Claudico from the competition. In one of them, we apparently called a small river bet with 5 high. I didn't see this hand, but the reason we could do this was described in my prior post,
http://forumserver.twoplustwo.com/sh...postcount=1500. We use a randomized "action translation" algorithm that maps a bet probabilistically to one of the sizes in our abstraction. If the opponent bets a size smaller than the smallest size in our abstraction, we will map it down to a check/call with some probability, and our protocol will force us to check/call for the next action when this occurs.
Here is the example I gave in my prior post for how this approach works. You can find more details and a comparison to other approaches in the paper,
http://www.cs.cmu.edu/~sganzfri/Translation_IJCAI13.pdf.
Basically, it assumes that the opponent will call a bet of size x with probability 1/(1+x), which comes from pot odds. The main mapping we use is
f(x) = [(B-x)(1+A)]/[(B-A)(1+x)],
where A and B are the closest sizes in our abstraction below and above the opponent's bet, and x is the size the opponent actually used (all expressed as fractions of the pot).
For the flop minbet example, he bets 100 into a pot of 500, so x = 0.2. The closest actions we have are A = 0 (for check) and B = 0.25. Plugging these into the formula gives f(x) = 1/6 ≈ 0.167. This is the probability that we map his bet down to 0 and interpret it as a check. So we pick a random number in [0,1]; if it's above 1/6 we interpret the bet as 0.25 pot, and otherwise as a check.
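In case anyone wants to play with the numbers, here is a minimal Python sketch of this randomized mapping (the function names are mine, not from our actual codebase; the formula and the 1/6 figure are exactly the ones above):

```python
import random

def pseudo_harmonic(x, A, B):
    """Probability of mapping an observed bet of x pot down to the smaller
    abstraction size A rather than up to B, where A <= x <= B."""
    return ((B - x) * (1 + A)) / ((B - A) * (1 + x))

def translate_bet(x, A, B, rng=random.random):
    """Randomized action translation: interpret the bet as A with
    probability f(x), otherwise as B."""
    return A if rng() < pseudo_harmonic(x, A, B) else B

# Flop minbet example from above: 100 into a pot of 500, so x = 0.2, with
# closest abstraction sizes A = 0 (check) and B = 0.25 pot.
print(pseudo_harmonic(0.2, 0.0, 0.25))   # ~0.1667, i.e. read as a check 1/6 of the time
print(translate_bet(0.2, 0.0, 0.25))     # 0.0 or 0.25, depending on the random draw
```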
People have also brought up several other problematic hands -- most notably a hand where I think we had A4o and folded preflop after putting in over half our stack against a human's 99. In a different hand we had KT vs. A2 and folded to a shove on the turn after putting in about 3/4 of our stack, despite having a pair and a flush draw (maybe even top pair, I forget the specifics). I looked at the logs in detail for both hands during the competition, and in both cases the problem was due to the translation issue described above. For the A4 hand, we had mapped the opponent's 3-bet or 4-bet down to a smaller size, which caused us to look up a strategy for ourselves that had been computed under the assumption that the pot was much smaller than it actually was (I think we thought it was 7k when it was actually 10.5k). These translation issues can get magnified further as the hand develops if we think we have bet some fraction (say 2/3) of the correct pot size while the strategies we precomputed assumed a different pot size.
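To make the compounding concrete, here is a small illustration. The 7k vs. 10.5k figures are the ones I recall from the A4 hand; the 2/3-pot bet and the rest of the numbers are made up for the example:

```python
# Illustration only: after the opponent's raise gets mapped down, our
# precomputed strategies believe a smaller pot than is actually on the table.
perceived_pot, actual_pot = 7_000, 10_500

# Suppose the next action puts in a bet of 2/3 of the actual pot:
bet = (2 / 3) * actual_pot                  # 7,000 chips

# In the real game that is a 2/3-pot bet, but inside the precomputed tree
# (which believes the pot is 7,000) the same chips look like a full-pot bet,
# so we keep consulting the "wrong" node on later streets.
print(bet / actual_pot)                     # ~0.67
print(bet / perceived_pot)                  # 1.0
print(actual_pot + 2 * bet, perceived_pot + 2 * bet)  # 24,500 vs. 21,000 if called
```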
For the KT vs. A2 hand, the issue was similar. Here, the human 3-bet on the flop for slightly less than the smallest size we had in our abstraction in that situation, and we ended up mapping it down to just a call (I believe the probability of mapping it down was only about 3% in that situation, so we got pretty "unlucky" that it went in the "wrong" direction). This caused us to think we had committed far fewer chips to the pot at that point than we actually had.
I went over the log files for these two specific hands with Doug in person after the competition ended (I think some of the other humans were present too), and he actually agreed that our lines in both the A4 and KT hands were reasonable had the pot size been what our computed strategies perceived it to be at that point. Of course, we both agree that both hands were major mistakes once you include the misperception of the pot size. Even though these were low-probability mistakes caused by the randomization used in the translation mapping, these types of mistakes can really add up over time, particularly against humans who are aware of them and actively trying to exploit them. Doug alluded to this point as well in his interview,
http://www.highstakesdb.com/5793-exc...ider-polk.aspx.
We actually became aware of this problem (partly based on those particular hands) midway through the competition, and decided to switch the translation mapping to a deterministic one for some of the sessions (the "deterministic pseudo-harmonic" mapping described in the paper linked above). This one is likely far more exploitable than the randomized one, but we did not think the humans were focusing much energy on exploiting leaks in how we responded to bet sizes. It turns out I was correct: in subsequent discussions the humans said they weren't putting much effort into this, other than the frequent "minbets" that they thought might be problematic for us by pushing us off our tree somehow.
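Roughly, the deterministic version maps the observed size down whenever it is below the point where the randomized mapping would be exactly 50/50. A quick sketch (simplified; see the paper for the precise definition):

```python
def deterministic_pseudo_harmonic(x, A, B):
    """Simplified sketch: map the observed size x down to A whenever x is at
    or below the point x* where the randomized mapping is 50/50, else to B."""
    x_star = (A + B + 2 * A * B) / (A + B + 2)   # solves f(x*) = 1/2
    return A if x <= x_star else B

# Same flop spot as before: a 0.2-pot bet with abstraction sizes 0 and 0.25.
print(deterministic_pseudo_harmonic(0.2, 0.0, 0.25))   # 0.25 -- always mapped up now
```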
Obviously this is a big leak in our agent that would need to be improved in the future. Based on Doug's interview, it seems he views this as our biggest current leak as well, and it will be very interesting to see what improvements can be found, and whether those can in turn be exploited by good countermeasures.
One benefit of the "endgame solver," which computed the river strategy in real time, is that it solved the "off-tree" problem I just described. I give an example of how it does this in Section 3.4 of the paper,
http://www.cs.cmu.edu/~sganzfri/Endgame_AAMAS15.pdf. When the river is dealt, we replace our perceived pot size with the correct amount before computing our strategies, so any disparity that developed earlier in the hand is corrected. This is one of the key benefits of the endgame solving approach, and I expect real-time computation to play a pivotal role in future research in this area. Other benefits and full details of the algorithm are described in the paper.
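As a very rough sketch of just this correction step (the solver itself is stubbed out, and the names are illustrative rather than our actual code):

```python
def solve_river_endgame(pot, stacks, our_range, opp_range):
    # Stub standing in for the real-time river solver; the solver itself
    # is not the point of this sketch.
    return {"pot_used_by_solver": pot}

def on_river(chips_we_put_in, chips_opp_put_in, stacks, our_range, opp_range):
    # Off-tree correction, simplified: ignore whatever pot size our translated
    # strategies believed in earlier, and rebuild the pot from the chips that
    # were actually committed before solving the river.
    actual_pot = chips_we_put_in + chips_opp_put_in
    return solve_river_endgame(actual_pot, stacks, our_range, opp_range)

# E.g., a situation where we believed the pot was 7,000 but 10,500 had
# actually gone in: the solver works with 10,500.
print(on_river(5_250, 5_250, (14_750, 14_750), None, None))
```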
Some people commented on the running time of the algorithm (it averaged 15-20 seconds per river hand for most of the competition). The algorithm did not necessarily require that much time. We could have modified it to take less time, at the expense of the degree of card bucketing and/or number of bet sizes used. I realize it was somewhat frustrating to the players and the spectators to have to wait so long for each river. If a future competition occurs, perhaps a time limit would be mutually agreed upon in advance.
One limitation of the endgame solving algorithm (one I already knew was problematic, but that became especially apparent early in the competition) is that it does not fully account for card removal when deciding which hands to group together. It looks at the equity of each hand vs. the opponent's perceived range (assuming he had followed our strategies to that point as well), and buckets hands with similar values together. This may result in, for example, grouping the nut low with a flush blocker into the same bucket as the nut low without a flush blocker (since both would have equity 0), so the algorithm cannot distinguish between the two. This is why the agent sometimes made huge overbets without blockers, which some people pointed out was a mistake. That said, I'll clarify that the algorithm does take blockers/card removal into account to an extent -- if two hands have the same ranking, but one has higher equity vs. the opponent's range due to card removal than the other, then the algorithm would account for this and potentially place them into different buckets. The humans seemed divided early on as to whether the algorithm was taking card removal into account, and the answer is that it was, but not as fully as possible.
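Here is a toy sketch of the kind of equity-based bucketing I'm describing (the bucketing scheme and the hand labels are made up, and the real abstraction is more sophisticated, but the failure mode is the same):

```python
from collections import defaultdict

def bucket_by_equity(hand_equities, num_buckets):
    """Group hands by their equity vs. the opponent's perceived range.
    Simple equal-width bucketing: hands with (near-)identical equity
    always end up in the same bucket."""
    buckets = defaultdict(list)
    for hand, eq in hand_equities.items():
        idx = min(int(eq * num_buckets), num_buckets - 1)
        buckets[idx].append(hand)
    return buckets

# Hypothetical river spot:
equities = {
    "nut low, no blocker": 0.00,
    "nut low, flush blocker": 0.00,   # blocker changes nothing here: still 0 equity
    "top pair, no blockers": 0.55,
    "top pair, blocks villain's value": 0.62,   # card removal shifts the equity itself,
}                                               # so this one can land in a different bucket
print(bucket_by_equity(equities, 10))
# The two nut-low hands share bucket 0 (indistinguishable to the solver);
# the two top pairs split into buckets 5 and 6.
```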
Primarily because of this card-removal limitation, we decided to take the large bet sizes for ourselves out of the endgame solver partway through the competition. Interestingly, Dong told me that they had looked into it and we were actually making money with those big sizes during the time we used them. I think everyone agrees that huge overbets are likely part of full optimal strategies, and are likely underutilized even by the best human players. But card removal is particularly important for these sizes, and I think that for an agent to use them successfully an improved algorithm for dealing with blockers/card removal would need to be developed, though I'm still quite curious how well we would've held up if we had continued with the agent as it was, with those sizes in.
A few posters, such as punter11235, claimed that there is better software available on the market for solving endgames given ranges for both players (perhaps software that fully accounts for card removal). I once looked into this and my understanding was that the best tool assumed just one bet size for all situations. While this may work very well for post-mortem analysis of human poker play, it's pretty clear that an agent that assumed just one bet size was available for the opponent would get creamed playing against humans of this caliber. The humans were certainly willing to make very small bets or huge overbets, particularly if they thought our algorithm had a weakness in responding to those. So we opted to use many different bet sizes to protect ourselves from bet-size exploitation (the version we used at the beginning had 8 different sizes for the first river bet, plus fold/check), at the expense of having to use some card abstraction and not fully account for card removal. Some of the humans informed me that there's software available now that uses 2-3 sizes and possibly doesn't use any card abstraction. I still think that using 8 sizes with card abstraction is much better against top humans than just 2-3 without it, though it would be interesting to run comprehensive experiments to test this. So I'm not convinced that there actually exists other software out there that is better for this than the approach we used, though I have not done a thorough investigation, and would be happy to hear from people familiar with the state-of-the-art tools.
I believe punter11235 or someone else also claimed that there exist stronger bots than Claudico. I'm pretty skeptical of this. If the developers of one of these software tools really have a stronger bot, then I don't understand why they wouldn't have submitted it to the computer poker competition. Winning that would certainly boost visibility for their product. I would understand why someone who actually has a bot playing illegally for profit might want to keep a lower profile, and I admit this is a possibility, though I still think it is very unlikely that any bots exist that are better than Claudico.
I think that with the right team working full-time on it, an agent could be created that beats the top humans in one year. But accounting for the limitations imposed within academia, I would guess that a timeline of 3-4 years is more realistic.
Some people have claimed that the fact that the developers modified the agent during the competition constitutes cheating. I completely disagree. Nothing in the rules we agreed upon specified that this was not allowed. The humans were well aware that we were making modifications; they frequently commented on this throughout the competition, and you can also see Doug discuss it in his interview. They caught on to most of the changes surprisingly quickly. I have described some of the major modifications above in this post (these have already been brought up in this thread and in Doug's interviews, and the approaches are described in the publicly-available papers linked above). The humans made significant modifications to their strategies throughout the competition as well. From what I hear, they went over databases and discussed strategy for hours every night, and the players in the same room often consulted each other on key decisions. I personally think it was exciting to include these elements in the competition. But in any case, I think a future event should clearly specify what modifications are allowed on both sides.
Regarding statistical significance, Noam Brown summed up everything extremely well in his post, and I agree with much of what he said,
http://forumserver.twoplustwo.com/sh...postcount=1501. If I had been the one writing the press release, I would have made the title: "Humans beat computers by an amount that is statistically significant at the 90% level but not at the 95% level!" I think it is misleading to flat-out call it a "tie" or even a "statistical tie," but many of the articles gave a pretty fair characterization of the results, in my opinion.
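For anyone who wants to see what that distinction means mechanically, here is a generic sketch of a one-sided z-test on the per-hand win rate. The numbers plugged in at the bottom are hypothetical (the per-hand standard deviation in particular is made up), chosen only so the result lands between the 90% and 95% thresholds; they are not the official figures from the match:

```python
from math import sqrt, erf

def win_rate_significance(total_win, hands, per_hand_std):
    """One-sided z-test on the mean per-hand win: a standard approach,
    not necessarily the exact analysis used for the competition."""
    mean = total_win / hands
    z = mean / (per_hand_std / sqrt(hands))
    p = 1 - 0.5 * (1 + erf(z / sqrt(2)))     # one-sided p-value
    return z, p

# Hypothetical figures only:
z, p = win_rate_significance(total_win=720_000, hands=80_000, per_hand_std=1_700)
print(round(z, 2), round(p, 3))   # ~1.5, ~0.067: below 0.10 but above 0.05
```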
I have recently completed my PhD and am not currently employed by Carnegie Mellon.