Quote:
Originally Posted by Do it Right
Very interesting stuff, but aren't you more or less testing the Elo system itself, as opposed to chess, with this simulation? All your results should fall perfectly well on a bell curve - e.g. about 5% of samples will lie outside 2 standard deviations, etc.
Your conclusions would fall in line with the simulation - particularly that most players would have a substantially higher peak rating than average rating over large sample sizes - yet that's never been true for me, and from my incredibly thorough research of randomly clicking a couple dozen USCF and FIDE graphs, it also doesn't seem to be accurate for most players.
What I'm testing, specifically, is how much statistical variance we should expect *within the bounds of the Elo system itself*. We know that results should fall on a bell curve, but we don't necessarily know what the standard deviation of rating fluctuations actually is.
The point is to first calculate what sort of variance we should expect based purely on the structure of the Elo system. Then, once we're clearer on the expected level of statistical variance, we can compare it to observed rating fluctuations and better determine whether what we're seeing is consistent with pure statistical noise.
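As a rough illustration of how that expected variance can be estimated, here is a minimal Monte Carlo sketch. The specific choices (K = 20, binary win/loss outcomes, every opponent rated exactly at the player's current rating) are illustrative assumptions, not the model described in this thread; the matched-opponent rule is just one simple way to capture the curbing effect of facing stronger players as your rating climbs.

```python
import random

def expected_score(r_player, r_opp):
    # Standard Elo expected score: logistic curve on a 400-point scale
    return 1.0 / (1.0 + 10 ** ((r_opp - r_player) / 400.0))

def simulate_trajectory(true_rating=1500.0, n_games=1000, k=20.0, seed=0):
    """Simulate one player's rating path under pure Elo noise.

    Illustrative assumptions (not the author's final model):
    binary win/loss outcomes, and every opponent is rated at the
    player's *current* rating, so the opposition strengthens as
    the rating climbs.
    """
    rng = random.Random(seed)
    rating = true_rating
    path = [rating]
    for _ in range(n_games):
        opp = rating  # opponent matched to current rating
        # The player's actual win chance is set by TRUE strength,
        # while the rating update uses the CURRENT (noisy) rating.
        p_win = expected_score(true_rating, opp)
        score = 1.0 if rng.random() < p_win else 0.0
        rating += k * (score - expected_score(rating, opp))
        path.append(rating)
    return path

path = simulate_trajectory()
peak_deviation = max(path) - 1500.0  # peak above "true" rating
```

Because the win probability is driven by the fixed true rating while the update uses the drifting current rating, the walk is mean-reverting, and the spread of `path` around 1500 gives an estimate of the pure-noise standard deviation.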
If the rating fluctuations of actual players are significantly GREATER than the expected variance, it would allow us to argue that "hot" and "cold" streaks are a real phenomenon. Whereas if rating fluctuations fall in line with statistical expectations, then we can argue that what we perceive as hot and cold streaks are probably just statistical noise with a false narrative laid on top.
If observed rating fluctuations prove to be LESS than expected, then I'm not sure what conclusions we could draw, other than that my model is probably flawed.
Right now I'm still working on revising the model to my satisfaction. The quick-and-dirty version really was pretty bad and meaningless. At the moment it's looking like the effect of facing stronger opponents as your rating rises is quite effective at curbing "hot streaks", and that most players shouldn't see a peak rating more than about 90 points above their "true" rating.

I still need to look more closely at draw rates, though. That's always the catch with Elo-based probability analysis: Elo gives you an expected score, but the possible outcomes aren't binary (win/draw/loss), and that complicates things. I also need to look at the distribution of opponent ratings relative to your own.
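One common way to handle the draw problem, sketched below, is to assume a fixed draw rate d and split the Elo expected score E so that the expectation is preserved: P(win) = E - d/2, P(draw) = d, P(loss) = 1 - E - d/2. This is a hypothetical simplification for illustration, not necessarily the draw model the revised simulation will use, and it only works while d/2 <= E <= 1 - d/2.

```python
import random

def outcome_probs(expected, draw_rate):
    """Split an Elo expected score into (win, draw, loss) probabilities.

    Assumes a fixed draw rate and preserves the expectation,
    since a draw is worth half a point:
        P(win) + 0.5 * P(draw) == expected
    """
    p_win = expected - draw_rate / 2.0
    p_loss = 1.0 - expected - draw_rate / 2.0
    if p_win < 0 or p_loss < 0:
        raise ValueError("draw_rate too high for this expected score")
    return p_win, draw_rate, p_loss

def sample_score(expected, draw_rate, rng):
    # Sample a single game result (1, 0.5, or 0) whose mean is 'expected'
    p_win, p_draw, _ = outcome_probs(expected, draw_rate)
    u = rng.random()
    if u < p_win:
        return 1.0
    if u < p_win + p_draw:
        return 0.5
    return 0.0
```

Plugging `sample_score` into the trajectory simulation instead of a binary coin flip keeps the same expected score per game but shrinks the per-game variance, since draws pull results toward the middle, which is exactly why the draw rate matters for the size of expected rating swings.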