05-27-2011 , 06:37 AM
While doing some hand analysis I came across an issue that is best explained by way of an example:

1) Consider this example of a 'perfect' binomial distribution:

A number of trains (n) travel a given route every day and they all have the same probability of being late (p).

The expected number of trains that are late = pn

The standard deviation from this expected number = SQRT(pn(1-p))

2) Now consider this example, which is not really a binomial distribution but is similar to one:

A number of trains (n) travel different routes every day and they all have different probabilities of being late. Let the mean of these probabilities be m.

The expected number of trains that are late = mn

Can the standard deviation be estimated using SQRT(mn(1-m))?

If so, how accurate is this?

If not, is there a better method?
05-27-2011 , 07:15 AM
Quote:
Originally Posted by Laughing Assassin
Doing some hand analysis and I have come across an issue. The problem I have come across is best explained by way of an example:

1) Consider this example of a 'perfect' binomial distribution:

A number of trains (n) travel a given route every day and they all have the same probability of being late (p).

The expected number of trains that are late = pn

The standard deviation from this expected number = SQRT(pn(1-p))

2) Now consider this example, it is not really a binomial distribution but is similar to one:

A number of trains (n) travel different routes every day and they all have different probabilities of being late. Let the mean of these probabilities be m.

The expected number of trains that are late = mn

Can the standard deviation be estimated using SQRT(mn(1-m)) ????
If the trains are independent, the variance of the number late is the sum of the variances for each train. A train with probability p of being late is a random variable that is 1 with probability p and 0 with probability 1-p, which has a variance of p(1-p). Sum these for each train and take the square root to get the standard deviation. That's also how the formula you gave for the binomial distribution is derived, with all the probabilities and variances the same.
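
Written out, the derivation looks roughly like this. The symbols X_i (an indicator that is 1 if train i is late) and p_i (that train's probability of being late) are notation introduced here for illustration, not something from the original question:

```latex
X = \sum_{i=1}^{n} X_i, \qquad X_i \in \{0, 1\}, \quad P(X_i = 1) = p_i

\operatorname{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = p_i - p_i^2 = p_i(1 - p_i)

\operatorname{Var}(X) = \sum_{i=1}^{n} p_i(1 - p_i)
\quad \text{(independence assumed)}, \qquad
\operatorname{SD}(X) = \sqrt{\sum_{i=1}^{n} p_i(1 - p_i)}

\text{If } p_i = p \text{ for all } i: \quad \operatorname{SD}(X) = \sqrt{np(1 - p)}
```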

You can do some simple examples to see how close it is to sqrt(mn(1-m)). It isn't the same because of the p^2 terms, but it can be close if the probabilities are small or if they are almost the same.
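
Here is a minimal sketch of such a comparison in Python, assuming independent trains. The function names and the two probability lists are made up for illustration only:

```python
import math

def exact_sd(probs):
    """SD of the number late: square root of the summed p(1-p) terms."""
    return math.sqrt(sum(p * (1 - p) for p in probs))

def approx_sd(probs):
    """Estimate sqrt(mn(1-m)), where m is the mean of the probabilities."""
    n = len(probs)
    m = sum(probs) / n
    return math.sqrt(n * m * (1 - m))

# Hypothetical probabilities, chosen only to illustrate the two cases.
similar = [0.10, 0.11, 0.09, 0.10]   # almost the same
spread = [0.01, 0.50, 0.90, 0.20]    # very different

for name, probs in [("similar", similar), ("spread", spread)]:
    print(name, round(exact_sd(probs), 4), round(approx_sd(probs), 4))
```

The gap between the two variances works out to the sum of (p - m)^2 over the trains, so the estimate is never below the exact value and it matches exactly only when every probability equals m.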

In real life they might not be independent if one train being late can make others late. Then you would need the covariances or correlations.
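
For the dependent case, the general variance formula adds a covariance term for each pair of trains. As a sketch, using the same illustrative notation (X_i is 1 if train i is late, p_i is its probability, and rho_ij is an assumed correlation between trains i and j):

```latex
\operatorname{Var}(X) = \sum_{i=1}^{n} p_i(1 - p_i)
  + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j)

\operatorname{Cov}(X_i, X_j) = \rho_{ij} \sqrt{p_i(1 - p_i)\, p_j(1 - p_j)}
```

With positive correlations (one late train making others late) the standard deviation comes out larger than the independent formula suggests.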

Last edited by BruceZ; 05-27-2011 at 07:55 AM.
05-29-2011 , 11:30 AM
05-30-2011 , 08:11 AM
Just used this method to recalculate the standard deviations from the Stars analysis I ran before...

Results showed the standard deviations to be very slightly different than when I used the estimation sqrt(mn(1-m))...

The biggest difference was a standard deviation of 122 hands instead of the estimated 132 hands, and this was with a sample of just under 70,000 hands, so percentage-wise the difference is very small.

In short, results and conclusion were unaffected.
