Quote:
Originally Posted by Sholar
I haven't thought through the details, but does it make sense to think that the tension is really between the color bonus (for white) and the adjustments to draw rates?
Elo seems pretty robust even at the highest levels as far as I have seen, and the predictions seem in the ballpark.
(The next step is for you to start tracking the prediction error and then optimize these parameters to minimize error on cross-validation samples...)
I don't think it has so much to do with the color bonus as with the draw rates themselves. If this algorithm is sound, it should work for more than just this event, and I need a way to handle what happens when Magnus plays a ~2600 player. At Tata Steel earlier this year, for example, I would have wanted to be able to predict Hou Yifan's results as well.
Fundamentally, there's just a major challenge in accurately predicting draw rates. There are clearly two main factors: the specific players involved, and the rating difference between them. Individual players have stylistic differences. Among players of the same rating, some will have much higher draw rates, while others will have much lower draw rates, with more wins and more losses balancing each other out to achieve the same scores. At the same time, as the Elo gap grows, draw rates will shrink. A 2000-rated player who stylistically has an unusually high draw percentage against similarly rated opponents still isn't likely to garner many draws against GMs.
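To make the link between rating gap and draw rate concrete, here is a minimal sketch. It uses the standard Elo expected-score formula and the fact that expected score equals win probability plus half the draw probability. The function names are just illustrative, and this is not the exact code behind my predictions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def max_draw_rate(expected):
    """Largest draw rate compatible with a given expected score.

    Since expected = win + draw / 2 and 1 - expected = loss + draw / 2,
    neither win nor loss can go negative, so draw <= 2 * min(E, 1 - E).
    """
    return 2.0 * min(expected, 1.0 - expected)


def win_draw_loss(rating_a, rating_b, draw_rate):
    """Split A's expected score into (win, draw, loss) probabilities.

    The requested draw rate is capped at the feasible maximum.
    """
    e = expected_score(rating_a, rating_b)
    d = min(draw_rate, max_draw_rate(e))
    win = e - d / 2.0
    loss = 1.0 - e - d / 2.0
    return win, d, loss


if __name__ == "__main__":
    # Equal ratings: a 60% draw rate is fine.
    print(win_draw_loss(2700, 2700, 0.60))
    # 250-point gap: the same 60% draw rate is not even feasible.
    print(max_draw_rate(expected_score(2850, 2600)))
```

With a 250-point gap the favorite's expected score is about 0.81, so the draw rate cannot exceed roughly 0.38 no matter how drawish the two players are against their peers. That is the arithmetic reason a field-average draw rate stops being usable once the gap gets large.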
I had initially hoped that all these players would be close enough in skill that I could ignore that second factor, but that's just not the case. The average draw rates of this field, when going up against each other, are too high to be reasonable for a >100 Elo differential, and that happens in several of Carlsen's games even without applying a color adjustment. So I had to make an effort to model that second factor, and figure out how much I should reduce draw rates as the Elo differential grows. I used the 1.8 multiplier I described before as a simple way to account for it, but I'd like to come up with something better (or else find evidence that what I'm doing is in fact the best option).
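One way to look for "something better" would be to try candidate draw-rate curves and score each with a prediction-error measure, along the lines Sholar suggested. The sketch below is purely hypothetical: the exponential decay and the `scale` parameter are assumptions for illustration, not the 1.8 multiplier, and `scale=400` is an arbitrary placeholder that would have to be fit on real games.

```python
import math


def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def draw_rate(base_draw, rating_gap, scale=400.0):
    """Hypothetical model: the draw rate decays exponentially with the gap.

    base_draw is the pair's draw rate at equal ratings (the stylistic
    factor); scale controls how quickly draws vanish (an assumed parameter).
    """
    return base_draw * math.exp(-abs(rating_gap) / scale)


def predict(rating_a, rating_b, base_draw, scale=400.0):
    """Return (win, draw, loss) probabilities for player A."""
    e = expected_score(rating_a, rating_b)
    d = draw_rate(base_draw, rating_a - rating_b, scale)
    d = min(d, 2.0 * min(e, 1.0 - e))  # keep win and loss non-negative
    return e - d / 2.0, d, 1.0 - e - d / 2.0


def log_loss(predictions, outcomes):
    """Mean negative log-likelihood of the observed results.

    predictions: list of (win, draw, loss) tuples.
    outcomes: list of indices, 0 = win, 1 = draw, 2 = loss.
    """
    eps = 1e-12  # avoid log(0) when a prediction is overconfident
    total = sum(-math.log(max(p[o], eps)) for p, o in zip(predictions, outcomes))
    return total / len(outcomes)
```

Fitting would then mean picking the `scale` (or whichever curve) that gives the lowest `log_loss` on games held out from the fit, and checking whether it actually beats the simple multiplier on the same games.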