Simple(?) question about standard deviation - Science, Math and Philosophy Forum

Two Plus Two Forums Other Topics Science, Math, and Philosophy

08-25-2018 , 08:25 PM

Sooga

veteran

Join Date: Sep 2002 Posts: 3,320

Ok, I've taken more than my share of math classes in my day, but I'm a little hazy about SD. Hopefully someone can help me...

Is there an actual 'meaning' to standard deviation? For example, if I had 10 people and they each made an average of $50,000 per year, that number means something to me - it means if I added all their earnings and distributed it among all 10 people equally, each person would make $50,000. I get that. It makes sense.

But if you told me the standard deviation of their earnings was $10,000, what exactly does that mean? How do I put that into context? I get that a group with a standard deviation of $10,000 has earnings more clustered together in general than one with a sd of $20,000, but do those numbers have any meaning on their own? (Assuming a smallish sample that is not a normal distribution)

I've been told SD is basically a sort of average of how far data are from the mean, but it's not even that - that would just be the average of the absolute value differences between the data and mean. So what's the importance of SD? Does it only come into play when you're talking about large samples of normal distributions?
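To make the comparison in the question concrete, here is a minimal Python sketch that computes the mean, the standard deviation, and the average absolute difference from the mean for one small group. The ten earnings figures are invented for illustration (chosen so the mean is $50,000), and the population form of the SD (dividing by n) is an assumption.

```python
import statistics

# Hypothetical yearly earnings for 10 people; the mean is exactly $50,000.
earnings = [35_000, 40_000, 42_000, 45_000, 50_000,
            50_000, 55_000, 58_000, 60_000, 65_000]

mean = statistics.fmean(earnings)

# Population standard deviation: sqrt of the average squared difference.
sd = statistics.pstdev(earnings)

# Average absolute difference from the mean, for comparison.
mad = statistics.fmean([abs(x - mean) for x in earnings])

print(f"mean = {mean:,.0f}")
print(f"SD   = {sd:,.0f}")
print(f"mean absolute difference = {mad:,.0f}")
```

For these made-up numbers the SD comes out somewhat larger than the average absolute difference, which is typical: the two agree only when every point is the same distance from the mean.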

08-25-2018 , 10:12 PM

PairTheBoard

Carpal 'Tunnel

Join Date: Dec 2003 Posts: 10,037

Quote:

Originally Posted by Sooga

I've been told SD is basically a sort of average of how far data are from the mean, but it's not even that - that would just be the average of the absolute value differences between the data and mean. So what's the importance of SD? Does it only come into play when you're talking about large samples of normal distributions?

It's the same idea but the distance ("how far from mean") is the Euclidean or "Pythagorean" distance, i.e. square root of the weighted sum of squared differences from the mean. It's comparable to measuring the distance of the point (2,-1) to (0,0) as sqrt(5) rather than 3. In general, it's the metric that's easier to work with.
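The two ways of measuring the distance from (2,-1) to (0,0) can be checked with a short Python sketch. The variable names are just illustrative.

```python
import math

point = (2, -1)

# Euclidean ("Pythagorean") distance from the origin: sqrt(2^2 + (-1)^2).
euclidean = math.hypot(*point)

# Sum of absolute coordinate differences: |2| + |-1|.
taxicab = sum(abs(c) for c in point)

print(euclidean, taxicab)
```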

PairTheBoard

08-25-2018 , 11:02 PM

Sooga

veteran

Join Date: Sep 2002 Posts: 3,320

Quote:

Originally Posted by PairTheBoard

Thanks for the response!

I understand your example about the pythagorean distance, and I understand that sqrt(5) is more intuitive than 3 when you're talking about 2-dimensional points on a plane, but I don't understand how this is easier to work with when talking about single one-dimensional datapoints.

I guess what I'm asking is when we're using these one-dimensional datapoints, how is SD a metric more useful/easier to work with than the average absolute value difference from the mean?

08-26-2018 , 01:44 AM

PairTheBoard

Carpal 'Tunnel

Join Date: Dec 2003 Posts: 10,037

Quote:

Originally Posted by Sooga

Think of the n data points as an n-dimensional vector measured from the n-dimensional origin (m, m, ..., m), where m is the mean, i.e. (x_1 - m, x_2 - m, ..., x_n - m), where the x_i are the data points. Then the sd is the length of that vector with the weighting (or averaging) factor 1/n applied to the (x_i - m)^2 terms, similar to how you would compute the length of a vector in R^n. Remember, the sd is a measurement on the entire collection of data.
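A small Python sketch of this vector picture, assuming the population form of the SD (the 1/n weighting described above). The data values are hypothetical, picked so the arithmetic is easy to follow.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data points
n = len(data)
m = statistics.fmean(data)

# The deviation vector (x_1 - m, ..., x_n - m).
deviations = [x - m for x in data]

# Its ordinary Euclidean length in R^n.
length = math.sqrt(sum(d * d for d in deviations))

# Applying the 1/n weight inside the square root is the same as
# dividing the length by sqrt(n).
sd_from_length = length / math.sqrt(n)

print(sd_from_length, statistics.pstdev(data))
```

Both printed values should agree, since `statistics.pstdev` computes the same population standard deviation directly.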

PairTheBoard

08-26-2018 , 03:40 AM

masque de Z

Carpal 'Tunnel

Join Date: Aug 2009 Posts: 9,961

It's what makes the central limit theorem so useful (what defines it as it is rather than the average of absolute value differences etc).

It's the connection with the normal distribution in the end that makes it such a reasonable choice among others. (Also, the variance of a sum of uncorrelated random variables is the sum of their variances.)
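The additivity of variance can be checked exactly on a toy case. The sketch below uses two independent fair dice (an example chosen here, not from the thread) and enumerates all 36 equally likely outcomes with exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # one fair six-sided die

def variance(values):
    """Population variance of equally likely outcomes, as an exact Fraction."""
    values = list(values)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

var_one_die = variance(faces)

# All 36 equally likely totals of two independent dice.
var_sum = variance(a + b for a, b in product(faces, faces))

print(var_one_die, var_sum)
```

The variance of the total is exactly twice the variance of one die. The standard deviations themselves do not add this way: the SD of the total is sqrt(2) times the SD of one die, not 2 times.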

Choosing the average m the way we do minimizes the variance function as classically defined (the second moment about m), viewed as a function of m. I think that is not trivial because of normal distribution and likelihood connections.
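As a rough numerical check of the minimizing property, the Python sketch below scans candidate centers on a grid and picks the one with the smallest mean squared deviation. The data and the grid are hypothetical, and a grid search is only an illustration, not a proof.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical data points

def mean_squared_deviation(center):
    """Average squared distance of the data from a candidate center."""
    return sum((x - center) ** 2 for x in data) / len(data)

# Candidate centers 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=mean_squared_deviation)

print(best, statistics.fmean(data))
```

The best candidate on the grid coincides with the arithmetic mean of the data.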

08-26-2018 , 09:32 AM

PairTheBoard

Carpal 'Tunnel

Join Date: Dec 2003 Posts: 10,037

You might say the Euclidean norm for vectors in R^n is obviously the correct one because of the Pythagorean Theorem but not so much for a data point vector about the mean. However, a taxi driver on a grid of streets would disagree, pointing out that for him the distance between 2 points is the sum of the vertical and horizontal moves he has to make. In fact, there are many vector space norms that can be used. See this link on L^p spaces.

https://en.wikipedia.org/wiki/Lp_space#Statistics

The L^1 norm you're talking about is sometimes referred to as the taxi cab norm. The L^2 norm, sqrt(weighted sum of squares), is particularly nice because it arises from an inner product, making it a Hilbert Space for which there is a great deal of theoretical development made possible by the concept of orthogonality provided by the inner product.
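One way to see what orthogonality buys you: for perpendicular vectors the squared L^2 lengths add (the Pythagorean identity), while the L^1 norm has no such property. A minimal Python sketch, where `p_norm` is a helper written for this illustration:

```python
import math

def p_norm(vector, p):
    """L^p norm: (sum of |component|^p) ** (1/p), for p >= 1."""
    return sum(abs(c) ** p for c in vector) ** (1 / p)

u = (1.0, 0.0)
v = (0.0, 1.0)  # orthogonal to u under the usual dot product
w = tuple(a + b for a, b in zip(u, v))  # u + v

# Does ||u + v||^2 equal ||u||^2 + ||v||^2 under each norm?
l2_holds = math.isclose(p_norm(w, 2) ** 2, p_norm(u, 2) ** 2 + p_norm(v, 2) ** 2)
l1_holds = math.isclose(p_norm(w, 1) ** 2, p_norm(u, 1) ** 2 + p_norm(v, 1) ** 2)

print(l2_holds, l1_holds)
```

The identity holds for p = 2 and fails for p = 1 in this example.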

PairTheBoard

08-26-2018 , 10:45 AM

Aaron W.

Carpal 'Tunnel

Join Date: Sep 2002 Posts: 30,132

Quote:

Originally Posted by Sooga

I'm going to approach this question from a slightly different angle.

Instead of thinking about the standard deviation, let's think about the variance. This number is a true average. It's the average of the squared distances from the mean.

But why would we use squared distances?

(1) Absolute values are algebraically annoying. If you remember solving absolute value inequalities, you might remember all of the little "rules" that you have to apply in order to solve them. (If it's of one form, you have an AND condition, but if it's another form it's an OR condition...)

(2) Squaring penalizes large distances more heavily than small ones. In other words, it's a weighted average. An error of 1 only contributes 1 to the average, but an error of 2 contributes 4 to the average.

(3) Using absolute values also means that you lose access to some calculus tools, because you can't take the derivative of the absolute value function everywhere.
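Point (2) can be illustrated with a few lines of Python. The error values are made up; the sketch compares how much of the total comes from the single largest error, with and without squaring.

```python
# Errors of 1 and 2 contribute 1 and 4 once squared.
contributions = [e ** 2 for e in (1, 2)]

errors = [1, 1, 1, 1, 10]  # hypothetical absolute errors, one of them large

# Share of the total due to the largest error, using absolute values.
abs_share = max(errors) / sum(errors)

# Share of the total due to the largest error, using squares.
sq_share = max(errors) ** 2 / sum(e ** 2 for e in errors)

print(contributions, round(abs_share, 3), round(sq_share, 3))
```

With squaring, the one large error accounts for a much bigger share of the total than it does with absolute values.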

If you accept this, then at least it would seem plausible that the variance could be interpreted as an average of how far data is from the mean. Except that it's not a distance. It's a square distance. If you consider the variables to have units (say, units of length), then the variance is actually a square length.

So we will take the square root at the very end. It's just adjusting the units to be correct. If we want to talk about distances, then we really should be having something with units of distance.

And this is why we say the standard deviation is kind of like an average distance from the mean.

08-26-2018 , 11:00 AM

Sooga

veteran

Join Date: Sep 2002 Posts: 3,320

Thanks for all your responses everyone. I think I'm understanding this now, so correct me if I'm wrong.

So standard deviation gives you a weighted average of the distance from the mean, while the average abs. val. difference from the mean does not, and so because of this, SD is a more illustrative indicator about the data?

This is not to mention all of the other benefits of SD like differentiability, its use in normal distributions, etc etc?

08-26-2018 , 10:25 PM

Aaron W.

Carpal 'Tunnel

Join Date: Sep 2002 Posts: 30,132

Quote:

Originally Posted by Sooga

So standard deviation gives you a weighted average of the distance from the mean, while the average abs. val. difference from the mean does not, and so because of this, SD is a more illustrative indicator about the data?

It's not that the standard deviation is "more illustrative" of the spread of the data. It's a "more common" method for the reasons provided.

09-07-2018 , 08:03 PM

NewOldGuy

Pooh-Bah

Join Date: Mar 2009 Posts: 5,935

Using root mean square (RMS) instead of the mean absolute deviation simply makes the algebra a lot easier. The fact that it also emphasizes large errors is not a benefit, just something to recognize as an inherent bias of the method.
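One possible example of the easier algebra (an illustration chosen here, not necessarily what the poster had in mind): expanding the square gives variance = mean of x^2 minus the square of the mean, so the SD can be computed in a single pass from two running sums. No comparable shortcut exists for the mean absolute deviation, which needs the mean before any absolute differences can be taken. The function name `running_sd` and the data are hypothetical.

```python
import math

def running_sd(stream):
    """Population SD in one pass, via variance = mean(x^2) - mean(x)^2."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    variance = total_sq / n - mean * mean
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

result = running_sd(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(result)
```

This shortcut can lose precision when the mean is very large relative to the spread, so it is best read as a demonstration of the algebra rather than a recommended numerical method.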
