Really appreciate your support, guys. I am almost done here, so let's wrap things up.
In my previous post I briefly described a more systematic approach to finding other clones of a known bot. It was a 3-step process: calculate the distribution of each stat, find the outlier stats, and build a spreadsheet with all the frequent players on a poker site. Then it's enough to filter the spreadsheet for those outlier stats, allowing for a margin of error. The list of suspects will be really short. Once we have the list of suspects, it's time to calculate how different they are.
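For anyone who wants to try the filtering step, here is a minimal sketch. It assumes the spreadsheet is just a dictionary of account name to stats in percent, and that "within the margin" means every chosen outlier stat is within a fixed number of percentage points of the known bot's value. The account names, stat names, values and the 2-point margin are all hypothetical.

```python
def find_suspects(players, bot_profile, margin=2.0):
    """Return accounts whose every stat in bot_profile is within `margin`
    percentage points of the known bot's value.

    players:     hypothetical spreadsheet, {account: {stat_name: value}}
    bot_profile: the known bot's outlier stats, {stat_name: value}
    """
    suspects = []
    for name, stats in players.items():
        if all(
            stat in stats and abs(stats[stat] - value) <= margin
            for stat, value in bot_profile.items()
        ):
            suspects.append(name)
    return suspects


# Illustrative data only; not taken from any real site.
players = {
    "acct_a": {"raise_fold_bb_vs_limpers": 99.5, "cr_flop": 12.0},
    "acct_b": {"raise_fold_bb_vs_limpers": 62.0, "cr_flop": 11.5},
    "acct_c": {"raise_fold_bb_vs_limpers": 98.8, "cr_flop": 13.1},
}
known_bot = {"raise_fold_bb_vs_limpers": 99.2, "cr_flop": 12.4}

print(find_suspects(players, known_bot))  # ['acct_a', 'acct_c']
```

Tightening `margin` shrinks the list further; in practice the margin would have to reflect how noisy each stat is at the sample sizes involved.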
Let's go back a few years, to 2010.
http://www.pokernewsdaily.com/bot-ri...erstars-13513/ PTR uncovered an NLHE bot ring with the help of players. The PTR guys wrote down all the major stats for the alleged botters, calculated the Euclidean distance between them, and came up with a very rare outlier stat to further prove the bot theory (the outlier stat was raise/folding from the BB when there were limpers; it was over 99% for all of the now-banned botters). Sounds familiar?
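For reference, the Euclidean distance between two stat vectors is the square root of the sum of squared differences, stat by stat. A minimal sketch follows; the three stats (VPIP, PFR, aggression factor) and all the numbers are made up for illustration and are not PTR's actual data.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length stat vectors."""
    if len(a) != len(b):
        raise ValueError("stat vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical stat vectors: (VPIP, PFR, aggression factor).
bot_1 = [22.0, 18.0, 3.1]
bot_2 = [22.5, 17.6, 3.0]
human = [30.0, 12.0, 2.0]

print(euclidean_distance(bot_1, bot_2))  # small: the two vectors nearly match
print(euclidean_distance(bot_1, human))  # much larger
```

Python 3.8+ also ships `math.dist(p, q)`, which computes the same thing. Note that with raw stats, something like the aggression factor lives on a much smaller scale than VPIP and barely moves the distance, which is one reason to standardize the stats first.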
A much more interesting challenge is finding botters when we don't have any idea what they look like, but that can also be done fairly easily as long as there are at least 2 of them in the player pool.
Once again I would create a giant spreadsheet with stats for everyone who played more than a certain number of hands. Then I would replace each stat with its percentile rank.
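A minimal sketch of the percentile-rank substitution, assuming the spreadsheet is a dictionary of account to stats and that every account has every stat. Ties are handled with the "mean" convention (the same one `scipy.stats.percentileofscore` uses with `kind="mean"`), so identical values get identical ranks. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values):
    """Map each value to its percentile rank (0-100) within the list.

    Ties share the average of the positions they would occupy.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)         # how many are strictly below v
        at_or_below = bisect_right(ordered, v)  # how many are <= v
        ranks.append(100.0 * (below + at_or_below) / (2 * n))
    return ranks


def standardize(table):
    """{account: {stat: value}} -> same shape with each stat as a percentile rank.

    Assumes every account has the same set of stats.
    """
    accounts = list(table)
    stats = list(next(iter(table.values())))
    out = {a: {} for a in accounts}
    for s in stats:
        ranks = percentile_ranks([table[a][s] for a in accounts])
        for a, r in zip(accounts, ranks):
            out[a][s] = r
    return out


print(percentile_ranks([10, 20, 20, 40]))  # [12.5, 50.0, 50.0, 87.5]
```

After this step every stat runs from 0 to 100 regardless of its original scale, so no single stat dominates the distance just because its raw numbers are bigger.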
If a stat is normally distributed, then it's much more likely for a couple of guys to land close together in the blue part of the picture below than in the red part, simply because so many more players sit in the blue part. Outlier stats once again!
I mean, having your C/R flop within 2% of another guy's is one thing if it's a common value that lots of accounts come that close to, and a completely different thing if it's within 2% but at an outlier value shared by only a couple of accounts.
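Here is a quick worked check of why percentile ranks capture this, under the assumption that the stat is roughly normal with a mean of 50% and a standard deviation of 10% (made-up numbers). The gap in percentile rank between two players equals the share of the population sitting between them. Two players 2 points apart near the mean have roughly 8% of the field between them, while two players 2 points apart out in the tail have less than 0.1%, so in percentile terms the tail pair is far "closer".

```python
import math


def normal_cdf(x, mean, sd):
    """Fraction of a normal population at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Hypothetical distribution for one stat (e.g. C/R flop), in percent.
MEAN, SD = 50.0, 10.0


def percentile_gap(a, b):
    """Share of the population lying between two raw stat values."""
    return abs(normal_cdf(a, MEAN, SD) - normal_cdf(b, MEAN, SD))


common = percentile_gap(49.0, 51.0)  # 2 points apart, near the mean
rare = percentile_gap(79.0, 81.0)    # 2 points apart, deep in the tail
print(f"common value: {common:.4f}, outlier value: {rare:.5f}")
```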
I think the method I used in my OP is also good, but the one with percentile ranks would be way, way better. I considered standardizing the stats in a few other ways, but percentile ranks are by far the best option. The question of which stats to use is an interesting one, but it's better not to make such suggestions in public.
So now that we have a standardized spreadsheet for everyone, it's time to compare accounts in pairs (calculate the squared distance for each pair). Afterwards we sort the list from smallest to largest distance, so the most similar pairs end up at the top. I don't know what value would mean that it's 2 botters and what value would mean that it's 2 humans playing similarly by coincidence, but there is one thing I do know: the pairs that make it to the top are definitely worth looking at. If the top of the list looks like account A similar to B, B similar to C, and A similar to C, then it's very likely a cluster of bots.
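A minimal sketch of the pairwise comparison and the A-B-C cluster check, assuming the input is already the percentile-rank table (account to stats). Squared distance is used as described; skipping the square root doesn't change the ordering. The accounts, stats, and the choice of "top 3" are all hypothetical, and the sketch makes no claim about where a real threshold should be.

```python
from itertools import combinations


def squared_distance(a, b):
    """Sum of squared differences over the stats of account a."""
    return sum((a[s] - b[s]) ** 2 for s in a)


def rank_pairs(table):
    """Return (distance, acct1, acct2) for every pair, most similar first."""
    pairs = [
        (squared_distance(table[x], table[y]), x, y)
        for x, y in combinations(sorted(table), 2)
    ]
    pairs.sort()
    return pairs


def triangles(top_pairs):
    """Account triples where all three pairs appear among top_pairs."""
    linked = {frozenset((x, y)) for _, x, y in top_pairs}
    accounts = sorted({acct for _, x, y in top_pairs for acct in (x, y)})
    return [
        trio
        for trio in combinations(accounts, 3)
        if all(frozenset(p) in linked for p in combinations(trio, 2))
    ]


# Hypothetical percentile ranks for five accounts.
ranked = {
    "A": {"s1": 98, "s2": 3, "s3": 50},
    "B": {"s1": 97, "s2": 4, "s3": 52},
    "C": {"s1": 99, "s2": 2, "s3": 49},
    "D": {"s1": 40, "s2": 60, "s3": 10},
    "E": {"s1": 60, "s2": 30, "s3": 80},
}
top = rank_pairs(ranked)[:3]
print(top)             # A-C, A-B and B-C are the three closest pairs
print(triangles(top))  # [('A', 'B', 'C')]
```

With n accounts there are n(n-1)/2 pairs, so for a big player pool this is the part that benefits most from being done inside a database or with vectorized code.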
My conclusion:
I have talked to a guy who is knowledgeable about database coding, and creating software that does what I described above would be really cheap and simple. Once again, I don't want to go into any specifics, but the possibilities are endless, and the number of common stats that converge quickly is also fairly large. There is no such thing as human stats or botter stats; the point is that running the same bot on 2 different accounts will produce a lot of identical stats, and that is why it's so useful to look for clones.
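To put a rough number on "converge quickly": if two accounts run the same strategy, each observed frequency is a binomial proportion around the same true value p, and the gap between them has a standard deviation of about sqrt(2p(1 - p)/n) for n opportunities each. This is a standard approximation added here for illustration, not something from the post, and it assumes independent opportunities and the same n on both accounts. For p = 30%, the gap is about 2 points at 1,000 opportunities and well under 1 point at 10,000.

```python
import math


def noise_floor(p, opportunities):
    """Approximate std. dev. of the gap between two accounts' observed
    frequencies when both play the same strategy with true frequency p.

    Assumes independent opportunities and equal sample sizes (binomial model).
    """
    return math.sqrt(2.0 * p * (1.0 - p) / opportunities)


for n in (1_000, 10_000, 100_000):
    print(n, "opportunities:", round(100 * noise_floor(0.30, n), 2), "points")
```

Quadrupling the sample halves the noise, so clone accounts keep getting closer while two merely similar humans stay apart.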
I know that the sites catch plenty of botters and have really sophisticated systems for detecting them, but it looks like they lack tools for proactive statistical analysis of the field. If they had such tools, the 2010 case and the one here with PLO bots would never have happened. Or maybe the sites do have such tools, but not good enough ones, and they really need to step up their game. I strongly believe that online poker can be extremely safe thanks to its high transparency and the enormous amount of data on everyone.
Last edited by Schwein; 06-22-2015 at 04:17 AM.