Quote:
Originally Posted by Do it Right
Very interesting stuff, but aren't you more or less testing the Elo system itself, as opposed to chess, with this simulation? All your results should fall perfectly well on a bell curve - e.g. about 5% of samples will lie outside 2 standard deviations, etc.
Your conclusions would fall in line with the simulation - particularly that most players would have a substantially higher peak rating than average rating over large sample sizes - yet that's never been true for me, and from my incredibly thorough research of randomly clicking a couple dozen USCF and FIDE graphs, it also doesn't seem to be accurate for most players.
What I'm testing, specifically, is how much statistical variance we should expect *within the bounds of the Elo system itself*. We know that results should fall on a bell curve, but we don't necessarily know what the standard deviation of rating fluctuations actually is.
The point is to first calculate what sort of variance we should expect based purely on the structure of the Elo system. Then, once we're clearer on the expected level of statistical variance, we can compare it to observed rating fluctuations and better determine whether what we're seeing is consistent with pure statistical noise.
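As a rough illustration of how that expected variance can be estimated, here is a minimal Monte Carlo sketch. The specific choices (K = 20, binary win/loss outcomes, every opponent rated exactly at the player's current rating) are illustrative assumptions, not the model described in this thread; the matched-opponent rule is just one simple way to capture the curbing effect of facing stronger players as your rating climbs.

```python
import random

def expected_score(r_player, r_opp):
    # Standard Elo expected score: logistic curve on a 400-point scale
    return 1.0 / (1.0 + 10 ** ((r_opp - r_player) / 400.0))

def simulate_trajectory(true_rating=1500.0, n_games=1000, k=20.0, seed=0):
    """Simulate one player's rating path under pure Elo noise.

    Illustrative assumptions (not the author's final model):
    binary win/loss outcomes, and every opponent is rated at the
    player's *current* rating, so the opposition strengthens as
    the rating climbs.
    """
    rng = random.Random(seed)
    rating = true_rating
    path = [rating]
    for _ in range(n_games):
        opp = rating  # opponent matched to current rating
        # The player's actual win chance is set by TRUE strength,
        # while the rating update uses the CURRENT (noisy) rating.
        p_win = expected_score(true_rating, opp)
        score = 1.0 if rng.random() < p_win else 0.0
        rating += k * (score - expected_score(rating, opp))
        path.append(rating)
    return path

path = simulate_trajectory()
peak_deviation = max(path) - 1500.0  # peak above "true" rating
```

Because the win probability is driven by the fixed true rating while the update uses the drifting current rating, the walk is mean-reverting, and the spread of `path` around 1500 gives an estimate of the pure-noise standard deviation.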
If the rating fluctuations of actual players are significantly GREATER than the expected variance, it would allow us to argue that "hot" and "cold" streaks are a real phenomenon. Whereas if rating fluctuations fall in line with statistical expectations, then we can argue that what we perceive as hot and cold streaks are probably just statistical noise with a false narrative laid on top.
If observed rating fluctuations prove to be LESS than expected, then I'm not sure what conclusions we could draw, other than that my model is probably flawed.
Right now I'm still working on revising the model to my satisfaction. The quick-and-dirty version really was pretty bad and meaningless. At the moment it's looking like the effect of facing stronger opponents as your rating rises is quite effective at curbing "hot streaks", and that most players shouldn't see a peak rating more than about 90 points above their "true" rating.

I still need to look more closely at draw rates, though. That's always the catch with Elo-based probability analysis: Elo gives you an expected score, but the possible outcomes aren't binary (win/draw/loss), and that complicates things. I also need to look at the distribution of opponent ratings relative to your own.
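One common way to handle the draw problem, sketched below, is to assume a fixed draw rate d and split the Elo expected score E so that the expectation is preserved: P(win) = E - d/2, P(draw) = d, P(loss) = 1 - E - d/2. This is a hypothetical simplification for illustration, not necessarily the draw model the revised simulation will use, and it only works while d/2 <= E <= 1 - d/2.

```python
import random

def outcome_probs(expected, draw_rate):
    """Split an Elo expected score into (win, draw, loss) probabilities.

    Assumes a fixed draw rate and preserves the expectation,
    since a draw is worth half a point:
        P(win) + 0.5 * P(draw) == expected
    """
    p_win = expected - draw_rate / 2.0
    p_loss = 1.0 - expected - draw_rate / 2.0
    if p_win < 0 or p_loss < 0:
        raise ValueError("draw_rate too high for this expected score")
    return p_win, draw_rate, p_loss

def sample_score(expected, draw_rate, rng):
    # Sample a single game result (1, 0.5, or 0) whose mean is 'expected'
    p_win, p_draw, _ = outcome_probs(expected, draw_rate)
    u = rng.random()
    if u < p_win:
        return 1.0
    if u < p_win + p_draw:
        return 0.5
    return 0.0
```

Plugging `sample_score` into the trajectory simulation instead of a binary coin flip keeps the same expected score per game but shrinks the per-game variance, since draws pull results toward the middle, which is exactly why the draw rate matters for the size of expected rating swings.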