There are undoubtedly more sophisticated methods of analyzing this data. I've never thought about adjusting the winrate based on hands played; that's an interesting idea. You'd probably need some kind of correlation coefficient. Unfortunately, I only have a college-dropout, surface-level understanding of statistics, so the best I could do was implement 1-sigma error bars. The data is publicly available here if anyone wants to analyze it.
I disagree that only 2% of players are long-term winners. The rake can be thought of as a force that pulls the entire graph down vertically. A rake of 11.7bb/100 is a pretty big downward shift, but I'd still expect more than 2% of players to have a "true" pre-rake win rate higher than 11.7bb/100 at 5NL, which is what it takes to win after rake. Furthermore, players who win at 5NL tend to move up to higher stakes quickly, whereas a lot of losing players stay at 5NL because it's one of the lowest stakes available. Therefore, losing players should be somewhat overrepresented in this sample.
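To make the "rake shifts the whole graph down" argument concrete, here's a toy model. It assumes pre-rake true winrates are normally distributed across players. The mean of 0 bb/100 (poker is zero-sum before rake) and the standard deviations tried below are made-up inputs, not estimates from the data. A player beats the rake only if their pre-rake winrate exceeds the rake, so the share of winners is the upper tail above 11.7bb/100.

```python
from statistics import NormalDist

RAKE_BB_PER_100 = 11.7  # rake figure quoted for 5NL in the post


def fraction_beating_rake(pre_rake_mean, pre_rake_sd, rake=RAKE_BB_PER_100):
    """Share of players whose post-rake winrate is above zero.

    ASSUMPTION: pre-rake true winrates follow a normal distribution
    (hypothetical model). Rake shifts everyone down by the same amount,
    so post-rake > 0 exactly when pre-rake > rake.
    """
    pre_rake = NormalDist(mu=pre_rake_mean, sigma=pre_rake_sd)
    return 1 - pre_rake.cdf(rake)


if __name__ == "__main__":
    # Hypothetical spreads of true skill, in bb/100
    for sd in (5.0, 10.0, 15.0):
        share = fraction_beating_rake(0.0, sd)
        print(f"sd={sd:>4} bb/100 -> {share:.1%} beat the rake")
```

Whether the winning share lands above or below 2% depends heavily on how spread out true skill is, which is exactly the quantity the sample can't pin down directly.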
One experiment I've thought about doing is tracking players' results across many different stakes on the same site and using that to build a skill distribution for each stake. You could then empirically compare the difficulty of each stake. For the moment, though, it's just an abstract idea.
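A simpler starting point than a full skill distribution would be a paired comparison. You take only the players seen at both stakes and average how much their winrate drops when they move up. This is a swapped-in technique, not the experiment described above. The data layout, the function names, and the choice to weight each player by their smaller sample are all assumptions for illustration. It also inherits the selection bias mentioned earlier, since players who move up skew toward winners.

```python
def winrate(hands, total_bb):
    """bb/100 from a player's hand count and total winnings in big blinds."""
    return 100 * total_bb / hands


def stake_difficulty_gap(records, lower, higher):
    """Hands-weighted average drop in bb/100 when the same players go from
    `lower` to `higher`.

    `records` maps player -> {stake: (hands, total_bb)} (hypothetical layout).
    Only players observed at both stakes contribute.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for stakes in records.values():
        if lower in stakes and higher in stakes:
            h_lo, bb_lo = stakes[lower]
            h_hi, bb_hi = stakes[higher]
            drop = winrate(h_lo, bb_lo) - winrate(h_hi, bb_hi)
            weight = min(h_lo, h_hi)  # reliability limited by the smaller sample
            weighted_sum += weight * drop
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no players observed at both stakes")
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up example records; "10NL" is just an illustrative stake label
    records = {
        "alice": {"5NL": (10000, 500.0), "10NL": (5000, 100.0)},
        "bob": {"5NL": (2000, -100.0), "10NL": (2000, -200.0)},
        "carol": {"5NL": (3000, 30.0)},  # only one stake, ignored
    }
    print(stake_difficulty_gap(records, "5NL", "10NL"))  # about 3.57 bb/100
```

A positive gap suggests the higher stake is tougher; chaining gaps across adjacent stakes would give a rough difficulty ladder.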