Variance reduction is a way to use evaluations to "speed up" rollouts. That is, if a regular rollout takes 10,000 games to converge to a particular result, a rollout with variance reduction will converge much faster to the same result, say after 1,000 games.
If the evaluations are flawed, that doesn't change the result the rollout converges to; it just means it will take more games to get there.
Here's an analogy I thought up to convince myself it works. bgblitz can comment if the analogy is flawed in some ways, but I think it gets across the main point.
Quote:
Let's say you want to figure out what the average roll of one die is, but you don't know how to do the calculation. You could do a rollout -- roll a die 100 times and you'd get the following rolls in roughly equal proportion:
1 2 3 4 5 6
Then you would average those 100 rolls and it would be around 3.5.
But let's say you had an evaluation function that knew three things --
6 is a little higher than average
1 is a little lower than average
6 occurs exactly as often as 1
Then you could do the same experiment again, but every time you see a 6, you know it's a little too high, so you subtract 1 from it. And when you see a 1, you know it's a little low, so you add 1 to it. This won't affect the average because you have a 1 in 6 chance of adding 1 and a 1 in 6 chance of subtracting 1. So now you get the following rolls in roughly equal proportion:
2 2 3 4 5 5
They still average out to 3.5, but they will converge faster to 3.5 than in the original experiment, as they're more closely clustered around 3.5. In other words, you'll need fewer rolls to get an accurate result.
Now what happens if your evaluation function is bad? Let's say it dumbly thinks 6 is too low and 1 is too high. Then after adjusting, you get the following rolls in roughly equal proportion:
0 2 3 4 5 7
They still average out to 3.5, but it will take longer to converge. An inaccurate evaluation function won't affect the final result; it will just affect how long it takes to converge on it. Luckily, XG's evaluation function is pretty good.
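To put numbers on the die analogy, here is a small Python sketch that computes the exact mean and variance of the three sets of faces above. It only covers the toy die example, not how XG or any other bot actually implements variance reduction, and the names (mean_and_variance, plain, good, bad) are made up for illustration. The last column relies on the standard fact that the number of samples needed for a given accuracy scales with the variance.

```python
from fractions import Fraction


def mean_and_variance(faces):
    """Exact mean and population variance of equally likely outcomes."""
    n = len(faces)
    mean = Fraction(sum(faces), n)
    variance = sum((Fraction(f) - mean) ** 2 for f in faces) / n
    return mean, variance


# Hypothetical labels for the three cases in the analogy.
cases = {
    "plain": [1, 2, 3, 4, 5, 6],  # raw die rolls
    "good": [2, 2, 3, 4, 5, 5],   # 6 adjusted down by 1, 1 adjusted up by 1
    "bad": [0, 2, 3, 4, 5, 7],    # adjusted in the wrong direction
}

results = {name: mean_and_variance(faces) for name, faces in cases.items()}
plain_variance = results["plain"][1]

for name, (mean, variance) in results.items():
    # Rolls needed for the same accuracy, relative to the plain experiment.
    relative_rolls = variance / plain_variance
    print(f"{name:5s} mean={float(mean):.2f} "
          f"variance={float(variance):.3f} "
          f"relative rolls needed={float(relative_rolls):.2f}")
```

All three cases have a mean of exactly 3.5. The variances are 35/12 for the plain rolls, 19/12 with the good adjustment, and 59/12 with the bad one. So in this toy example the good evaluation gets the same accuracy with a bit more than half the rolls, and the bad one needs roughly 70% more.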