Hi All,
Since the stars PLO rake meeting is coming up, I have less time to work on the data than I wanted. So let me describe here my most advanced analysis to date: the one I used to derive my guess above that 25% or so of the players are winning at about 4bb/100. This analysis has multiple flaws, but I feel it's still better than nothing.
Obviously, when I get back from the meeting I'll keep working on the data and trying to develop a methodology to rigorously analyze the distribution of true winrates. To the best of my knowledge, stars don't know the distribution of true winrates of PLO players, either. They have more data, but still not enough to know the true winrate of any particular player, and I don't think they have a statistical method of the sort I'm trying to develop here, so the results of this analysis might be news to stars as well.
Anyway, here is what I did:
The Data
I took all the players who played at least 40k hands in the sample period. I calculated all their winrates and standard deviations. (Call these the "empirical winrates", to distinguish from the "true winrates"; all winrates in this post are pre-rake.)
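For concreteness, here is a minimal sketch of that step in Python. The hand-level input format and column names are my own invention for illustration, not the actual format of the tracking data:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per (player, hand) with 'bb_won' = big blinds
# won in that hand, pre-rake. The real data format will differ.
def summarize_players(hands: pd.DataFrame, min_hands: int = 40_000) -> pd.DataFrame:
    g = hands.groupby("player")["bb_won"]
    out = pd.DataFrame({
        "n_hands": g.size(),
        "winrate": g.mean() * 100,                    # empirical winrate, bb/100
        "stderr": g.std() * 100 / np.sqrt(g.size()),  # std dev of that estimate, bb/100
    })
    return out[out["n_hands"] >= min_hands]
```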
I assume that there is some underlying distribution of true winrates, call it D. I assume that each player's true winrate is independently sampled from D. Then the empirical winrate is equal to the true winrate plus normally-distributed noise (according to the player's standard deviation). Our goal is to deduce the distribution D from the empirical winrates.
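To make the model concrete, here is a small sketch of how empirical winrates would be generated under these assumptions, with D discretized onto a grid of candidate true winrates (the grid/weights representation is just my way of parameterizing D, not anything forced by the data):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_empirical_winrates(grid, weights, stderrs):
    """Sample one empirical winrate per player under the assumed model:
    a true winrate drawn from D (probabilities `weights` on the points `grid`),
    plus Gaussian noise with that player's standard deviation."""
    true_wr = rng.choice(grid, size=len(stderrs), p=weights)
    return true_wr + rng.normal(0.0, stderrs)
```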
Note that the assumption I made here, that all players' true winrates are draws from the same distribution D, is wrong: the population of players who played 100k hands probably has a different true winrate distribution than the population of players who played 40k hands. But I think it's a reasonable assumption anyway, and it should give approximately correct results.
Now, recall that we have the empirical winrates, and we need to deduce the distribution D. This is a deconvolution problem: we get a bunch of noisy samples, and we wish to estimate the distribution that these samples are drawn from.
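Concretely, with D represented as weights on a grid of candidate true winrates as above, the quantity we care about is the likelihood of the observed winrates, which for each player is a mixture of Gaussians. A sketch, under the same assumptions as above:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(weights, grid, winrates, stderrs):
    """Log-probability of the observed empirical winrates given a candidate D.
    weights[j] is the probability that a player's true winrate equals grid[j],
    so each observation is a mixture of Gaussians centred on the grid points."""
    # dens[i, j] = density of player i's winrate if their true winrate were grid[j]
    dens = norm.pdf(winrates[:, None], loc=grid[None, :], scale=stderrs[:, None])
    return np.log(dens @ weights).sum()
```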
There is a software package called "extreme deconvolution" that does this, but I'm not sure its methodology is appropriate for our case (I intend to try using it in the future). Instead, I wrote my own algorithm.
The Algorithm
I wrote a maximum likelihood algorithm. I'll describe it in a follow-up post, but the gist of it is that it performs a local search over possible D's and tries to find the distribution D that maximizes the chance of obtaining the empirical winrates. Running it over gui166's hand histories, the algorithm spit out the following hypothesis:
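To give a flavour of it before the follow-up post: a toy version of the idea is a hill-climbing search over discretized D's, reusing the log_likelihood function above. My actual algorithm differs in the details; this is only meant to illustrate the "local search over possible D's" part:

```python
def fit_D(winrates, stderrs, grid, n_iters=20_000, step=0.01, seed=1):
    """Hill climbing over discretized D's: repeatedly propose moving a small
    amount of probability mass between two grid points, and keep the move
    if it increases the likelihood of the empirical winrates."""
    rng = np.random.default_rng(seed)
    weights = np.full(len(grid), 1.0 / len(grid))   # start from a uniform D
    best = log_likelihood(weights, grid, winrates, stderrs)
    for _ in range(n_iters):
        i, j = rng.integers(len(grid), size=2)
        delta = min(step, weights[i])               # can't move more mass than i has
        cand = weights.copy()
        cand[i] -= delta
        cand[j] += delta
        ll = log_likelihood(cand, grid, winrates, stderrs)
        if ll > best:
            weights, best = cand, ll
    return weights
```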
Results - first method
The red normalized histogram is the empirical winrates. The green curve is the output: the distribution D. It says, roughly, that 23% of regs (with >=40k hands) have a true winrate of around 5bb/100 and the rest have a true winrate of around -2bb/100. The blue histogram is a sanity check of sorts: it shows what happens when the noise is applied back onto D. If D is a good solution, then the blue histogram should be very similar to the red histogram.
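For the curious, that sanity check is just re-applying each player's noise to the fitted D and histogramming the result, roughly like this (reusing the simulate_empirical_winrates sketch from above; the actual graph was produced differently):

```python
import matplotlib.pyplot as plt

def sanity_check(weights, grid, winrates, stderrs, bins=40):
    """Apply each player's noise back onto the fitted D and compare the
    resulting ("blue") histogram with the empirical ("red") one."""
    simulated = simulate_empirical_winrates(grid, weights, stderrs)
    plt.hist(winrates, bins=bins, density=True, color="red", alpha=0.5, label="empirical")
    plt.hist(simulated, bins=bins, density=True, color="blue", alpha=0.5, label="noise applied to D")
    plt.xlabel("winrate (bb/100)")
    plt.legend()
    plt.show()
```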
As we can see, the blue histogram is indeed similar to the red histogram, so the algorithm worked well. However, the solution it gave cannot possibly be the correct distribution of true winrates: there's no way the true winrates have only two peaks and nothing in between. So, what's going on?
Well, what we're seeing here is severe over-fitting, which is a very common problem with maximum likelihood algorithms. The green hypothesis does explain the red empirical data, but many other similarly good solutions exist, and the algorithm picked one that fits the irregularities in the red data really well, which produced a weird-looking hypothesis. I'll discuss this issue of over-fitting in a future post.
Now, can we still get some useful information from the green solution, even though it's obviously not the correct solution? To find out, I created some synthetic data from various winrate distributions and checked what my algorithm spits out for that synthetic data. The results were encouraging (I'll detail them in a future post): on the synthetic trials I ran, mashing the spikes towards the center gave a decent estimate of the true winrate distribution. So a reasonable guess for the true winrate distribution behind gui166's empirical data is that around 22% of players have positive winrates of around 4bb/100 on average, and the rest have negative winrates of around 2bb/100 on average.
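To translate a fitted D into headline numbers like these, I'm essentially reading off the fraction of probability mass above zero and the average winrate on each side, something like:

```python
def summarize_D(weights, grid):
    """Fraction of winners and the average winrate on each side of zero,
    according to a fitted (discretized) distribution D."""
    grid, weights = np.asarray(grid), np.asarray(weights)
    win = grid > 0
    p_win = weights[win].sum()
    avg_win = np.average(grid[win], weights=weights[win]) if p_win > 0 else float("nan")
    avg_lose = np.average(grid[~win], weights=weights[~win]) if p_win < 1 else float("nan")
    return p_win, avg_win, avg_lose
```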
This method is highly non-rigorous, but I think it gives a decent guess (for now).
Second method
How does one solve over-fitting? There is a lot of literature on this subject. One way is to make the search space smaller. For example, here is what my maximum likelihood algorithm spits out when I allow it to use only hypotheses that are first increasing and then decreasing:
As you can see, this graph makes more sense than the first one, but it still doesn't look quite right, and we already know that these methods are prone to over-fitting. Still, it's reassuring that the properties of this solution are similar to the first solution's: 27% of players are winners with an average winrate of 4bb/100, and the rest are losers with a winrate of around -1.5bb/100.
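One simple way to impose that restriction in the toy local search above is to reject any proposed D that isn't first increasing and then decreasing, e.g. only accept a candidate in fit_D if it also passes a check like this:

```python
def is_unimodal(weights, tol=1e-12):
    """True if the weights are non-decreasing up to their peak and
    non-increasing afterwards ("first increasing, then decreasing")."""
    peak = int(np.argmax(weights))
    rising = np.all(np.diff(weights[: peak + 1]) >= -tol)
    falling = np.all(np.diff(weights[peak:]) <= tol)
    return bool(rising and falling)
```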
Summary
We get some sort of guess for the distribution of true winrates. It is still a guess, but trials with synthetic data show that in some aspects it's at least somewhat credible.
In future work I intend to try to solve the over-fitting problem, either by modifying the maximum likelihood algorithm or by using a different algorithm.