Benford's law

05-31-2017 , 12:39 PM
I just learned of Bedford's law, which states that in many data sets, more numbers begin with the digit one than in a truly random sample; 2 is next most common, etc.

I did not understand the explanations why. My intuition is that it has to do with the fact that naturally occurring objects come in quite limited quantities. So if we sample flocks of birds, their populations are more likely to be around 125 rather than 9,481,869. So natural limits will trend numbers down, making them more likely to start with 1 or 2.

If that is true, then immense naturally occurring numbers I would think would stop following Bedford's rule. So the molecular weights of random pebbles on a beach would begin with more random digits.

Am I anywhere close to being on the right track?
05-31-2017 , 02:08 PM
I didn't quite catch your meaning. Could you repeat it please?
05-31-2017 , 03:13 PM
You mean Benford's Law?

https://www.youtube.com/watch?v=XXjlR2OK1kM
05-31-2017 , 04:56 PM
Yes, Benford's.

Howard, if you survey river lengths by any unit (centimeters, leagues), about 30% of the lengths will begin with the digit 1. Numeral 2 is next most common, 9 the least.

Watching the vids, it's a scale effect, but it is hard to articulate much further.

I did get this: Consider the numbers 1 through 19. Eleven of them start with 1 (1 itself and 10-19), which is more than 50%. If we add in 20-29, then the proportion of "one" numbers will fall. But then it starts rising again at 100.

In the complete set 1-99, "one" numbers have no advantage. But natural populations do not stop conveniently at 9 or 99. With a group of randy prairie dogs growing steadily from two to 19, they will more often have a population starting with 1. So if we survey the populations of 500 prairie dog colonies, the distribution will follow Benford's law. And the effect scales.

The real mathematicians were explaining it with logarithms, the above is just my attempt to get a handle on it.
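The growing-colony story above is easy to check with a quick simulation (a Python sketch; the 5% growth rate and the survey window are made-up numbers for illustration, not anything from the thread):

```python
import random
from collections import Counter

random.seed(0)

def leading_digit(x):
    # first significant digit of a positive number, via scientific notation
    return int(f"{x:e}"[0])

# 5000 hypothetical colonies: each starts at 2 animals, grows 5% per
# period, and is surveyed at a random moment in its history.
counts = Counter()
for _ in range(5000):
    t = random.uniform(0, 200)        # random survey time
    population = 2 * (1.05 ** t)      # steady exponential growth
    counts[leading_digit(population)] += 1

freqs = {d: counts[d] / 5000 for d in range(1, 10)}
# digit 1 leads far more often than digit 9
```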

Last edited by Bill Haywood; 05-31-2017 at 05:21 PM.
05-31-2017 , 05:19 PM
I find that there are few things worse than a joke that has to be explained.

You double posted the question.
05-31-2017 , 05:37 PM
Quote:
Originally Posted by Bill Haywood
Yes, Benford's.

Howard, if you survey river lengths by any unit (centimeters, leagues), about 30% of the lengths will begin with the digit 1. Numeral 2 is next most common, 9 the least.

Watching the vids, it's a scale effect, but that's as far as I can articulate.
I think it's got something to do with the log scale. The numbers can't be distributed uniformly all the way out to infinity. They have to get more sparse at some point. But they can be distributed more or less uniformly on the log scale, at least for quite a ways out. The phenomenon is evidently a quirk of approximately uniform distributions on the log scale.

I'm guessing it's because the first numbers at a new order of magnitude take up more space on the log scale than the later numbers for that order of magnitude. The log scale squeezes the larger numbers more tightly together. This would hold whether the "order of magnitude" is in base 10 or some other base.

However, this doesn't explain why the data sets happen to be approximately uniformly distributed on the log scale to begin with. I think it's because the data gets more sparse for higher numbers in an unbiased way.


PairTheBoard
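A minimal numerical check of the "uniform on the log scale" idea (a Python sketch; the six-decade range is an arbitrary choice):

```python
import math
import random

random.seed(1)

n = 10_000
# draw log10(x) uniformly over six full decades, i.e. x between 1 and 10^6
samples = [10 ** random.uniform(0, 6) for _ in range(n)]
first = [int(f"{x:e}"[0]) for x in samples]

freq = {d: first.count(d) / n for d in range(1, 10)}
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# observed frequencies land close to log10(1 + 1/d) for every digit
```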
06-01-2017 , 01:45 AM
Quote:
Originally Posted by PairTheBoard
I think it's got something to do with the log scale. The numbers can't be distributed uniformly all the way out to infinity. They have to get more sparse at some point. But they can be distributed more or less uniformly on the log scale, at least for quite a ways out.
I'm not sure this actually makes sense. On a "log scale" the distribution goes to zero extremely fast, so saying that something interesting happens "for quite a ways out" means that you've got some gigantic range of values. For example, if you mean "quite a ways out" to mean going to about 10 (which isn't far out at all), you've got data that is on the order of 10^10, which is much larger than many applications of Benford's Law (such as fraud detection in financial records).

So whatever you mean to say, I think you mean to say something different.

The underlying principle has to do with scale invariance. Here's the basic idea: Let's say that numbers in some collection follow a rule (such as Benford's Law). Then we should be able to scale the numbers by multiplying everything by some constant k, and the law should still be there. This is because it can't possibly be just a quirk of whatever our unit measure is.

The following short note shows how to derive Benford's Law from this idea. If the first digits of your x values satisfy some scale invariant rule (1 <= x < 10), then it turns out that y = log x will be uniformly distributed (0 <= y < log 10). And once you know the distribution of the y, you can work that back out again to get the distribution of the x values, which is Benford's Law.

https://math.dartmouth.edu/~m20x16/a...ts/Benford.pdf
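The scale-invariance claim is easy to poke at numerically. In this sketch, k = 3.7 is an arbitrary rescaling, and the five-decade log-uniform sample is an assumption standing in for "data that already obeys Benford":

```python
import math
import random

random.seed(2)

def leading_digit(x):
    return int(f"{x:e}"[0])

n = 20_000
# log10(x) uniform over five full decades: a Benford-obeying sample
xs = [10 ** random.uniform(0, 5) for _ in range(n)]

p1 = sum(leading_digit(x) == 1 for x in xs) / n
p1_rescaled = sum(leading_digit(3.7 * x) == 1 for x in xs) / n
# both are near log10(2) ~ 0.301: changing units leaves the law intact
```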
06-01-2017 , 09:08 AM
Let me take another stab at the intuition on this. If the density of the data on the log scale is uniform with constant k over several orders of magnitude then the density of the raw data decreases like k/x over its raw domain. Suppose the density of the data goes like k/x from 10 to 100,000. Then the data from 10-19 is more dense than the data from 20-29, which is more dense than the data from 30-39 and so on up to 90-99. Similarly the data from 100-199 is more dense than the data from 900-999, 1000-1999 more dense than 5000-5999, 10,000-19,999 more dense than 70,000-79,999.

The data must become less dense for larger numbers eventually. Evidently a lot of data becomes less dense in a nice way like k/x. Furthermore, it should be easy to see that if the data becomes less dense like k/x, the leading groups in a new order of magnitude will be denser than the later groups in that order of magnitude, regardless of the base in which the orders of magnitude are written.


PairTheBoard
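The k/x density described above can be sampled directly by inverse transform (a Python sketch; the [10, 100000] range follows the example in the post):

```python
import random

random.seed(3)

n = 50_000
# inverse-CDF sampling from a density proportional to 1/x on [10, 100000]:
# if u is uniform on [0, 1), then x = 10 * 10000**u has that density
xs = [10 * (10_000 ** random.random()) for _ in range(n)]

in_10s = sum(10 <= x < 20 for x in xs) / n
in_20s = sum(20 <= x < 30 for x in xs) / n
in_90s = sum(90 <= x < 100 for x in xs) / n
# the 10-19 band is denser than 20-29, which is denser than 90-99
```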
06-01-2017 , 09:17 AM
Quote:
Originally Posted by Bill Haywood
more numbers begin with the digit one than in a truly random sample, 2 is next most common, etc.
This is correct. The way I think of it is that most numbers come from random distributions.

So, for example, consider a collection of uniform distributions on [1,N] as N varies. It does not take much thought to realise that a number drawn at random from the collection will start with 1 more often than anything else. Naturally occurring probability distributions might not be uniform, but they will typically have enough fuzzy similarities for the effect to still work.

I first came across it when I was a child, and the example the book I was reading gave was to look at the population of cities around the world. A quick experiment with the AA guide was enough to convince me.

Edit: (It also suggested looking at how dog-eared the 1 and 9 pages were in a book of log tables, although that unfortunately just shows my age, or the age of the book I was reading.)

Edit: If you are unhappy with the 1 in a collection of [1,N] uniform distributions, consider a collection of [M,N] uniform distributions as M and N vary uniformly and independently. The same effect will still hold.
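The mixture of uniform [1,N] distributions is simple to simulate (a sketch; the cap of N <= 1000 is an arbitrary choice):

```python
import random

random.seed(4)

n = 50_000
counts = [0] * 10
for _ in range(n):
    N = random.randint(1, 1000)   # a random member of the collection
    x = random.randint(1, N)      # a draw from uniform [1, N]
    counts[int(str(x)[0])] += 1

freq1 = counts[1] / n
freq9 = counts[9] / n
# digit 1 leads far more often than digit 9
```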

Quote:
Originally Posted by Bill Haywood
If that is true, then immense naturally occurring numbers I would think would stop following Bedford's rule. So the molecular weights of random pebbles on a beach would begin with more random digits.
No, I think pebbles on the beach will still follow the rule. Knocking digits off the end does not change anything.

Last edited by Piers; 06-01-2017 at 09:39 AM.
06-01-2017 , 09:39 AM
Quote:
Originally Posted by Piers
This is correct. The way I think of it is that most numbers come from random distributions.

So, for example, consider a collection of uniform distributions on [1,N] as N varies. It does not take much thought to realise that a number drawn at random from the collection will start with 1 more often than anything else. Naturally occurring probability distributions might not be uniform, but they will typically have enough fuzzy similarities for the effect to still work.
So if the data is uniform from say 1-500, 100-199 takes up 1/5 of the numbers and 5,6,7,8, and 9 start only 2% of the numbers apiece. For this kind of "random" data the largest order of magnitude dominates. The leading 1 group is always included in the largest order of magnitude but others may or may not be. Cool.


PairTheBoard
06-02-2017 , 06:01 AM
Log 2
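"Log 2" is presumably shorthand for the closed form: the probability that the first digit is d is log10(1 + 1/d), which gives log10 2 ≈ 30.1% for d = 1. A two-line check:

```python
import math

# Benford's closed form: P(first digit = d) = log10(1 + 1/d)
p = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# p[1] ~ 0.301, p[2] ~ 0.176, ..., p[9] ~ 0.046, and the nine
# probabilities telescope to a sum of exactly 1
```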
06-02-2017 , 06:46 AM


During an order-of-magnitude change from x to 10x, the biggest relative increase is from x to 2x; going from 2x to 3x is less impressive, and so on. So how much time do you spend in the 1-2 range versus the others?

Where are the big boxes? (i.e. the distribution is uniform in the log of the object, but not in the object itself)

Last edited by masque de Z; 06-02-2017 at 06:52 AM.
06-02-2017 , 08:28 AM
I don't get this thread. This is a simple property of counting where you start from zero. It doesn't have profound implications. That the first numbers will come up with the most frequency, and the last numbers, the least, is expected.

Consider this. Numbers below 20 max out at 50% starting with a 1. Numbers below 200 max out with 50% frequency for 1s. And so on. Anything below 700 will have virtually no nines.

In something where 9 is as likely as 250, the frequencies for starting with 1 are a minimum of 10% and a maximum of 50%, whereas frequencies for starting with 9 max out at 10%.

So the frequency of 1 is always going to be >>> the frequency of 9 in the set of sets where a low number is as likely as a high one.
06-02-2017 , 09:39 AM
It looks to me like the key factor in whether data can be expected to obey Benford's law is the relationship of the variance to the mean. If the data is relatively "spread out" so that the variance is large compared to the mean you get Benford's law. Benford's law tends not to apply if the variance is small compared to the mean. For example, the number of heads in 1,000,000 coin flips.


PairTheBoard
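The coin-flip example can be illustrated with a normal approximation (a sketch; random.gauss(500000, 500) is used here as a stand-in for the exact binomial):

```python
import random

random.seed(5)

def leading_digit(x):
    return int(f"{x:e}"[0])

# Heads in 1,000,000 fair flips: mean 500,000, standard deviation 500.
# The spread is tiny relative to the mean, so every sample starts
# with a 4 or a 5 -- nothing like Benford's distribution.
samples = [random.gauss(500_000, 500) for _ in range(200)]
leads = {leading_digit(x) for x in samples}
```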
06-02-2017 , 05:29 PM
Quote:
Originally Posted by ToothSayer
I don't get this thread. This is a simple property of counting where you start from zero. It doesn't have profound implications. That the first numbers will come up with the most frequency, and the last numbers, the least, is expected.

Consider this. Numbers below 20 max out at 50% starting with a 1. Numbers below 200 max out with 50% frequency for 1s. And so on. Anything below 700 will have virtually no nines.

In something where 9 is as likely as 250, the frequencies for starting with 1 are a minimum of 10% and a maximum of 50%, whereas frequencies for starting with 9 max out at 10%.

So the frequency of 1 is always going to be >>> the frequency of 9 in the set of sets where a low number is as likely as a high one.
For the set of 1-99, there are exactly 11 numbers that start with 1, 11 that start with 2, 11 that start with 3, etc.

That is why it is strange.
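The count above is easy to verify (a quick sketch):

```python
from collections import Counter

# first digits of 1..99: each digit d leads exactly 11 numbers
# (d itself, plus d0 through d9)
counts = Counter(int(str(x)[0]) for x in range(1, 100))
```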
06-02-2017 , 05:51 PM
Mostly the sets aren't 1-99 style. If it's 1-98 you already have one fewer 9; if it's 1-100 you already have one more 1, etc.

Last edited by plaaynde; 06-02-2017 at 05:58 PM.
06-02-2017 , 08:05 PM
Quote:
Originally Posted by plaaynde
Mostly the sets aren't 1-99 style. If it's 1-98 you already have one fewer 9; if it's 1-100 you already have one more 1, etc.
That doesn't get you to 50% of numbers starting with a 1...
06-02-2017 , 08:10 PM
If you'd listened to the podcast that Bill listened to, you'd immediately say "oh, well that makes sense. OFC the number one will be overrepresented."
06-02-2017 , 08:25 PM
Quote:
Originally Posted by BrianTheMick2
That doesn't get you to 50% of numbers starting with a 1...
1-200 gives 50%+, then the percentage goes down from there (never going under 10%) until you hit 1000, then it starts building again, and now you really are in for it.

Last edited by plaaynde; 06-02-2017 at 08:36 PM.
06-02-2017 , 09:08 PM
Quote:
Originally Posted by plaaynde
1-200 gives 50%+, then the percentage goes down from there (never going under 10%) until you hit 1000, then it starts building again, and now you really are in for it.
It works regardless of scale on the things it works for...

It never works for the things it doesn't work for...

      