TL;DR:
Pictured above: a graph (and a zoomed-in view) plotting yearly volume on the x-axis and the average winrate of the players putting in that volume on the y-axis. The standard deviation of these winrates is around 1bb/100.
INTRODUCTION
In a series of threads starting with this one, I will analyze the current state of PLO rake using statistical tools. All of this analysis is based on hand histories bought by gui166 (the Brazilian player rep at the recent PokerStars meeting, from this thread), who was kind enough to share these HHs with me after I helped him with his analysis. These are essentially all the hands played in PLO100 on PokerStars between March 2012 and March 2012 (around ten million hands in total).
As we all know, the sheer variance of PLO makes it impossible to pin down an individual player's true winrate without gigantic hand samples. Therefore, we need statistical tools to estimate what true winrates look like, how many players are beating the games, and how significant the rake is.
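To give a rough sense of the sample sizes involved, here is a minimal sketch of the standard error of an observed winrate. It assumes hands are independent and that PLO results have a standard deviation of about 140 bb/100; that figure is an illustrative assumption, not something measured from this data set, and `winrate_standard_error` is a hypothetical helper.

```python
import math

# Assumed standard deviation of PLO results, in bb per 100 hands.
# Illustrative ballpark only, NOT a value estimated from these HHs.
ASSUMED_SD_BB_PER_100 = 140.0


def winrate_standard_error(hands: int, sd_bb_per_100: float = ASSUMED_SD_BB_PER_100) -> float:
    """Standard error (bb/100) of an observed winrate over `hands` hands.

    Treats each block of 100 hands as an independent draw, so the error
    shrinks with the square root of the number of 100-hand blocks.
    """
    blocks = hands / 100
    return sd_bb_per_100 / math.sqrt(blocks)


for n in (50_000, 300_000, 3_000_000):
    print(f"{n:>9,} hands: +/- {winrate_standard_error(n):.2f} bb/100 (one standard error)")
```

Under these assumptions, a single 50k-hand sample only pins a winrate down to within roughly 6 bb/100, while a pool of 3M hands narrows this to under 1 bb/100, which is why pooling players with similar volume helps.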
In this first part I analyze the entire player pool, partitioned by yearly volume. This gives winrate estimates from samples large enough to have already converged, since I'm pooling dozens or even hundreds of players with similar yearly volume. On the flip side, this analysis gives us limited information. Suppose we know that the players who played between 40k and 50k hands of PLO100 have average winnings of -0.3bb/100 pre-rakeback and 1.7bb/100 post-rakeback. What does this tell us? This population could include bad players. We don't expect all players, or even all regs, to be winning: as a community, we only demand that a sizable chunk of the best players be winning. So keep in mind that the results here are population-wide averages and do not tell us how many players have decent winrates.
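A toy illustration of why a population average says little about how many players are winning. The two groups below are entirely hypothetical (invented winrates, equal volume per player); both average -0.3 bb/100, yet one has no winners and the other has half its players winning.

```python
from statistics import mean

# Hypothetical true winrates (bb/100) for two groups of four players
# with equal volume; these numbers are invented for illustration only.
uniform_group = [-0.3, -0.3, -0.3, -0.3]
spread_group = [5.0, 5.0, -5.6, -5.6]


def share_winning(winrates: list[float]) -> float:
    """Fraction of players whose winrate is strictly positive."""
    return sum(1 for w in winrates if w > 0) / len(winrates)


for name, group in (("uniform", uniform_group), ("spread", spread_group)):
    print(f"{name}: average {mean(group):+.1f} bb/100, winning share {share_winning(group):.0%}")
```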
In future installments in this series (hopefully less than a week apart), I will present more sophisticated statistical analysis which should give us a better idea of the actual winrate distribution of the player population. Here is a tentative plan:
Part II: cross-validation analysis of the player pool, or some other method based on partitioning the hands into chunks
Part III: using deconvolution to estimate the true winrate distribution
Additional parts might come later. If this analysis proves useful, my overall future plans are to:
1. gain a good understanding of the rake paid by regs at the PLO100 level.
2. attempt to model and understand the PLO ecosystem (mathematically or otherwise) and the effect of rake on it.
3. reach a consensus with the community on how much rake is "fair", both in the sense of fairly rewarding skilled players and of keeping the ecosystem healthy.
4. extend this analysis to different stakes as well as different games (such as NLH or LHE), perhaps by initiating a cooperation with some of the hand-tracking websites, or by raising money from the community to buy hand histories.
I would like to solicit ideas from the 2p2 community about the content of this thread and future plans. I will be happy to do any data crunching that community members are interested in (limited by my programming time, of course), so if you think of interesting ways to analyze the data, please tell me. I have very little idea of how to approach modeling the poker ecosystem (item #2 above), so I would encourage anyone interested in helping to start thinking about it.
I will make my source code available throughout the project (I'm a very sloppy Python programmer, so I can't promise the code is particularly legible).
THE ANALYSIS -- PART I
In this part, I analyze the winrates of groups of players. For example, I look at all players who played between 10k and 20k hands and compute their overall winrate as if they were one player. They played so many hands as a group that the resulting winrate is very close to the "true" winrate. From this data you can therefore estimate what your winrate is likely to be if you are an "average" player who plans to play a particular number of hands in a year.
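Computing a group's winrate "as if they were one player" amounts to summing everyone's winnings and hands before dividing. A minimal sketch follows; the `PlayerSummary` record, its fields, and `pooled_winrate` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerSummary:
    # Hypothetical per-player summary; the field names are illustrative only.
    name: str
    hands: int      # PLO100 hands played during the year
    won_bb: float   # net big blinds won over those hands


def pooled_winrate(players: list[PlayerSummary]) -> float:
    """Winrate in bb/100 of a group treated as one big player.

    High-volume players automatically get more weight, since their hands
    dominate both the total winnings and the total hand count.
    """
    total_hands = sum(p.hands for p in players)
    total_won = sum(p.won_bb for p in players)
    return 100.0 * total_won / total_hands


# Toy group: one winner at +5 bb/100 over 10k hands, one loser at -3 bb/100 over 30k.
group = [PlayerSummary("a", 10_000, 500.0), PlayerSummary("b", 30_000, -900.0)]
print(pooled_winrate(group))  # -1.0
```

Note that the simple average of the two individual winrates would be +1.0 bb/100; pooling weights each player by volume, which is what treating the group as one player means here.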
To obtain these groups, I sorted the whole player pool by annual volume and bunched together players with similar volume until each group had played about three million hands in total.
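The grouping step can be sketched as a greedy pass over players sorted by volume. This is an illustrative sketch, not the linked code: `group_by_volume` is a hypothetical name, players are reduced to `(name, hands)` pairs, a group is closed once it reaches the target, and an undersized remainder is merged into the last group (an assumed rule).

```python
def group_by_volume(players, target_hands=3_000_000):
    """Split players into consecutive volume bands of roughly `target_hands`.

    `players` is an iterable of (name, hands) pairs. Players are sorted by
    annual volume, accumulated until the running total reaches the target,
    and never split across groups. A final group that falls short of the
    target is merged into the previous one (an assumed tie-breaking rule).
    """
    ordered = sorted(players, key=lambda p: p[1])
    groups, current, total = [], [], 0
    for player in ordered:
        current.append(player)
        total += player[1]
        if total >= target_hands:
            groups.append(current)
            current, total = [], 0
    if current:
        if groups:
            groups[-1].extend(current)  # fold leftover players into the last band
        else:
            groups.append(current)
    return groups


toy = [("a", 40), ("b", 70), ("c", 10), ("d", 60), ("e", 30)]
print([[name for name, _ in g] for g in group_by_volume(toy, target_hands=80)])
```

With a toy target of 80 hands, this yields two bands: `c`, `e`, `a` (80 hands) and `d`, `b` (130 hands).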
This analysis has various drawbacks, not least of which is self-selection: the players who played 100k PLO100 hands are the players who didn't go bust in the first 10k hands so they have some skill; on the other hand, they are the players who didn't move up to PLO200 so they are probably, on average, not the best players in this limit. There are more drawbacks, of course, which we can discuss further in the thread.
Regarding rakeback: since I don't know the players' VIP status, I computed rakeback somewhat arbitrarily. I assumed that PLO100 accounted for half of each player's rake for the whole year, that each player paid the same amount of rake every month, and that players spend their FPPs efficiently. I also assumed there are no BronzeStars: I promoted all players automatically to SilverStar. Finally, I assumed that each player who made Supernova was already Supernova at the beginning of the year. There are other drawbacks to my rakeback analysis: for example, my rake statistic is calculated on won pots (I think), while rakeback is computed using the weighted contributed method. This probably won't make much of a difference, but it likely means that real rakeback is a little lower for winning players and a bit higher for losing players than in my calculation.
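To illustrate the won-pot versus weighted-contributed discrepancy mentioned above, here is a simplified sketch that splits one pot's rake two ways. The proportional split is a simplification of a contributed-rake method (PokerStars' exact weighting rules may differ), and both function names are hypothetical.

```python
def rake_to_winner(contributions: dict[str, float], winner: str, rake: float) -> dict[str, float]:
    """Attribute the whole rake to the pot winner (a won-pot style rake stat)."""
    return {player: (rake if player == winner else 0.0) for player in contributions}


def rake_by_contribution(contributions: dict[str, float], rake: float) -> dict[str, float]:
    """Split the rake in proportion to each player's contribution to the pot.

    A simplified stand-in for a contributed-rake method; real VIP rules may
    weight contributions differently.
    """
    pot = sum(contributions.values())
    return {player: rake * amount / pot for player, amount in contributions.items()}


# Hypothetical pot: hero and villain each put in 50bb, 3bb is raked, hero wins.
pot = {"hero": 50.0, "villain": 50.0}
print(rake_to_winner(pot, "hero", 3.0))  # {'hero': 3.0, 'villain': 0.0}
print(rake_by_contribution(pot, 3.0))    # {'hero': 1.5, 'villain': 1.5}
```

In this toy pot the winner is charged the full 3bb under the won-pot view but only 1.5bb under the proportional split; summed over many pots, this is why a won-pot rake statistic tends to credit winning players with slightly more rakeback than they actually receive.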
My (messy) code can be viewed here:
http://www.evernote.com/shard/s224/s...91d7e4a03a77d2
THE RESULTS
The results are as follows:
Annual Volume (hands) | #Players | Hands | Won (bb/100) | Won + Rakeback (bb/100) | Rake-free Winnings (bb/100) |
---|---|---|---|---|---|
1 -- 323 | 38k | 3.0M | -68.4 | -65.5 | -49.9 |
323 -- 779 | 6k | 3.0M | -33.9 | -31.1 | -15.8 |
779 -- 1384 | 3k | 3.0M | -26.4 | -23.8 | -9.3 |
1385 -- 2257 | 1704 | 3.0M | -21.5 | -19.0 | -5.3 |
2258 -- 3596 | 1057 | 3.0M | -19.5 | -17.2 | -4.0 |
3597 -- 5301 | 694 | 3.0M | -15.2 | -13.0 | -0.6 |
5304 -- 7944 | 470 | 3.0M | -11.8 | -9.8 | 1.5 |
7948 -- 12k | 306 | 3.0M | -10.7 | -8.8 | 1.6 |
12k -- 17k | 205 | 3.0M | -6.1 | -4.2 | 5.7 |
18k -- 26k | 140 | 3.0M | -4.5 | -2.7 | 6.2 |
26k -- 38k | 94 | 3.0M | -1.6 | 0.2 | 8.5 |
38k -- 52k | 67 | 3.0M | -0.3 | 1.7 | 10.0 |
53k -- 69k | 50 | 3.0M | 0.0 | 2.0 | 10.0 |
69k -- 91k | 38 | 3.0M | -0.8 | 1.7 | 9.2 |
92k -- 132k | 27 | 3.0M | 0.0 | 3.6 | 9.4 |
133k -- 212k | 18 | 3.1M | 0.2 | 4.2 | 9.6 |
216k -- 672k | 15 | 5.0M | -0.9 | 2.9 | 7.8 |
They are pictured in the graphs in the beginning of the post.
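The table also lets us back out roughly how much rake each volume band pays. A minimal sketch, assuming the rake-free winnings are simply winnings plus the rake paid, so that rake is rake-free winnings minus winnings and rakeback is (winnings + rakeback) minus winnings; the rows are copied from the table above (only a few bands shown), and `rake_breakdown` is a hypothetical helper.

```python
# (band, won, won_plus_rakeback, rake_free), all in bb/100, copied from the table.
ROWS = [
    ("1 -- 323", -68.4, -65.5, -49.9),
    ("12k -- 17k", -6.1, -4.2, 5.7),
    ("38k -- 52k", -0.3, 1.7, 10.0),
    ("133k -- 212k", 0.2, 4.2, 9.6),
    ("216k -- 672k", -0.9, 2.9, 7.8),
]


def rake_breakdown(won: float, won_rb: float, rake_free: float) -> tuple[float, float, float]:
    """Return (rake paid, rakeback received, net rake), all in bb/100.

    Assumes rake-free winnings equal winnings plus the rake that was paid.
    """
    rake = rake_free - won
    rakeback = won_rb - won
    return rake, rakeback, rake - rakeback


for band, won, won_rb, rake_free in ROWS:
    rake, rb, net = rake_breakdown(won, won_rb, rake_free)
    print(f"{band:>13}: rake {rake:4.1f}, rakeback {rb:3.1f} ({rb / rake:.0%}), net {net:4.1f} bb/100")
```

Under this assumption, the 38k to 52k band pays about 10.3 bb/100 in rake and gets back about 2.0 bb/100 as rakeback, while the two highest-volume bands get back a bit over 40% of the rake they pay.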
Other ways of grouping the players give reasonably similar results. If anyone is interested in results from other groupings, please ask.
DISCUSSION
Note that, as expected, low-volume players have abhorrent winrates. Perhaps less expectedly, winrates actually dip for the players with close to the maximum number of hands. From working with the data, I believe this is not an artifact: players who played 300k hands seem to have lower winrates than players who played only 100k hands. This may not be that surprising, since a particularly skilled player would usually move up to PLO200 before playing 100k hands.