Probability a set of data comes from a given probability distribution

11-20-2020 , 04:29 PM

Matt R.

Pooh-Bah

Join Date: Mar 2005 Posts: 3,920

Had a discussion with a colleague today. Is the following valid?:

We have a set of data that may or may not come from a given probability distribution. We perform a chi^2 test comparing the expected frequencies with the observed frequencies. If the p-value is above a certain threshold (which threshold, and how do you choose it in this case? This is one question I had), then we can say the observed values come from the probability distribution.

It's kind of the converse of how a p-value is typically used. If p < 0.05 (or whatever), we can conclude it is likely the data does NOT come from the distribution. But if p > something, we can conclude it does (? -- this is where I don't necessarily agree).

I don't think this is valid reasoning, but I'm having trouble explaining why, as my background is not in statistics. In particular, I'm curious: if we CAN conclude it comes from the tested distribution, would, say, p > 0.95 mean there is a 95% chance it does come from the distribution? What would be the proper p threshold for this (certainly not p > 0.05, right?)

I've never seen a p-value interpreted this way and it doesn't seem right, but wanted some feedback from someone more formally trained in statistics. Thanks!

11-20-2020 , 04:33 PM

Matt R.

Pooh-Bah

Join Date: Mar 2005 Posts: 3,920

Ugh this was meant to go in SMP. My bad. Can a mod please move?

11-20-2020 , 06:31 PM

Aaron W.

Carpal 'Tunnel

Join Date: Sep 2002 Posts: 30,132

Quote:

Originally Posted by Matt R.

p-values are used to reject a hypothesis, not accept it. When you get p < 0.05, what you're saying is that the observed data would be probabilistically rare if the assumed distribution were the true one. The question being answered is: given a distribution, what are the chances of the data?

You can't turn it around because the structure of the question is backwards. Given the data, what's the probability of the distribution? To answer that question, you need a way of measuring the space of possible distributions, which p-values can't do.
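
To make the "measuring the space of possible distributions" point concrete, here is a minimal sketch of the turned-around question answered with Bayes' rule. Everything specific in it is an assumption for illustration and not from the thread: the two candidate coins (P(heads) of 0.5 and 0.6), the 55 heads in 100 flips, the prior weights, and the helper name `posterior_over_candidates`.

```python
import math

def posterior_over_candidates(heads, flips, candidates, prior):
    """Posterior probability of each candidate P(heads), via Bayes' rule.

    `candidates` and `prior` are hypothetical inputs: the set of
    distributions we are willing to consider and how much weight each
    gets before seeing any data.
    """
    tails = flips - heads
    # Log prior plus log-likelihood for each candidate. The binomial
    # coefficient is the same for every candidate, so it cancels out.
    log_weights = [
        math.log(w) + heads * math.log(p) + tails * math.log(1 - p)
        for p, w in zip(candidates, prior)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    candidates = [0.5, 0.6]
    # Same data, two different priors over the candidate distributions.
    print(posterior_over_candidates(55, 100, candidates, prior=[0.5, 0.5]))
    print(posterior_over_candidates(55, 100, candidates, prior=[0.9, 0.1]))
```

With the same data, the answer to "what's the probability of the distribution?" moves with the prior, which is exactly the ingredient a p-value does not contain.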

For some more intuition on that, it's worth noting that even if you knew with absolute certainty what the underlying distribution is, you would only *expect* to see p > 0.95 about 5% of the time, because when the null hypothesis is true the p-value is (approximately) uniformly distributed between 0 and 1. So using this metric, you would only have a 1 in 20 chance of confirming something you were absolutely certain about. And so by that measure, this would be a terrible tool.
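
A quick simulation can check the "about 5% of the time" claim. This is a sketch under stated assumptions, not the chi^2 setup from the original question: it uses a two-sided z-test on normally distributed data with known standard deviation, because that test statistic is continuous and the p-value is then exactly uniform under the null. The function names, sample size, trial count, and seed are all illustrative choices.

```python
import math
import random

def two_sided_z_pvalue(sample, mu0=0.0, sigma=1.0):
    """p-value of a two-sided z-test of H0: mean == mu0, with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def fraction_above(threshold, trials=20_000, n=25, seed=0):
    """Fraction of simulated experiments whose p-value exceeds `threshold`.

    Every experiment draws its data from the null distribution itself,
    so the null hypothesis is true in every single trial.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_z_pvalue(sample) > threshold:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print("share of runs with p > 0.95:", fraction_above(0.95))
    print("share of runs with p > 0.05:", fraction_above(0.05))
```

Even though the null is true in every run, only about one run in twenty should clear p > 0.95, while about nineteen in twenty clear p > 0.05.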

11-23-2020 , 09:31 PM

masque de Z

Carpal 'Tunnel

Join Date: Aug 2009 Posts: 9,961

If the p-value is small but not below your threshold for rejecting the null hypothesis, then you are risking what is called a type 2 error, i.e. failing to reject the null hypothesis when it is actually false (type 1 is when you reject it although it's actually true, something very costly if it happens in science or society in general). Type 2 is not as bad an error because you can keep researching if motivated by other clues. In that case, your responsibility to the truth is to recognize that the null hypothesis barely survived and that things don't look good for it. You have to go back and design a new test with more data and higher power if you believe the alternative hypothesis is the true one. A very powerful new test can even deliver p = 0.01 if the truth supports it.
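
To illustrate how more data buys more power, here is a rough sketch for one specific case: a two-sided z-test of whether a coin is fair. The coin setup, the true bias of 0.51, and the function name `power_two_sided_coin_test` are assumptions made for the example, and the formula is a normal approximation that uses the null-hypothesis variance throughout.

```python
import math
from statistics import NormalDist

def power_two_sided_coin_test(true_p_heads, flips, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: P(heads) = 0.5.

    Normal approximation: the standard error under H0 (0.5 / sqrt(flips))
    is also used under the alternative, which is reasonable only when
    true_p_heads is close to 0.5.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true bias shifts the z statistic, in standard errors.
    shift = (true_p_heads - 0.5) * math.sqrt(flips) / 0.5
    # Probability that the statistic lands in either rejection tail.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

if __name__ == "__main__":
    for flips in (1_000, 10_000, 100_000):
        print(flips, power_two_sided_coin_test(0.51, flips))
```

With a small true bias, a test on 1,000 flips will usually fail to reject, while the same test on 100,000 flips rejects almost every time. A non-significant result from the small test says little either way.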

So just do more analysis. Of course, the interesting question now is what happens if one keeps testing until they clear the threshold. In that case you need to be ethically responsible and disclose all prior failed attempts to reject the null hypothesis. A new test that incorporates all the tests together probably needs to be developed to give a better overall picture of the debate, assuming the data is not evolving in time.

You could be testing a stock market pattern claim, in which case data is plentiful but sometimes changes over longer timescales. If you fail to reject the null hypothesis, try again, and when you are personally further convinced, design a test that will clear even the 0.000001 level if possible. Why not?

11-29-2020 , 11:14 PM

Matt R.

Pooh-Bah

Join Date: Mar 2005 Posts: 3,920

Quote:

Originally Posted by Aaron W.

Thanks Aaron, this is great.

Reading a bit more into p-values and going by what you said, it seems like my colleague is/was making a "confusion of the inverse" fallacy? He was equating P(distribution|data) with P(data|distribution). A chi^2 test can tell you if there's a statistically significant difference between an observed frequency distribution and expected distribution, but, if there is not, that does not imply the observed data comes from that distribution.

12-05-2020 , 10:57 PM

Aaron W.

Carpal 'Tunnel

Join Date: Sep 2002 Posts: 30,132

Quote:

Originally Posted by Matt R.

Reading a bit more into p-values and going by what you said, it seems like my colleague is/was making a "confusion of the inverse" fallacy? He was equating P(distribution|data) with P(data|distribution).

I guess you can call it that.

Quote:

A chi^2 test can tell you if there's a statistically significant difference between an observed frequency distribution and expected distribution, but, if there is not, that does not imply the observed data comes from that distribution.

This is correct.

Another way to think about it is to imagine flipping a coin 100,000 times. Even if the coin is perfectly fair, you don't expect to get exactly 50,000 heads. But suppose you get 49,999 heads (which is going to give you a p-value very close to 1). Are you so certain that this is because the underlying coin is fair? Maybe it's a 49.999%-50.001% coin. Under this assumption, your p-value would also be close to 1. Or maybe it's a 49.998%-50.002% coin, and you just got one extra head compared to the expectation. And this would also have a p-value close to 1.
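
The coin numbers above can be checked directly. The sketch below computes a chi^2 goodness-of-fit p-value for 49,999 heads in 100,000 flips under each of the three candidate coins, plus a clearly biased 51% coin added here for contrast (that fourth case is not from the post). The function name is illustrative, and the p-value uses the fact that a chi^2 variable with one degree of freedom is a squared standard normal.

```python
import math

def chi2_pvalue_coin(heads, flips, p_heads):
    """Chi-square goodness-of-fit p-value for a coin with assumed P(heads)."""
    tails = flips - heads
    expected_heads = flips * p_heads
    expected_tails = flips * (1 - p_heads)
    stat = ((heads - expected_heads) ** 2 / expected_heads
            + (tails - expected_tails) ** 2 / expected_tails)
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))

if __name__ == "__main__":
    heads, flips = 49_999, 100_000
    for p_heads in (0.5, 0.49999, 0.49998, 0.51):
        print(p_heads, chi2_pvalue_coin(heads, flips, p_heads))
```

All three nearly-fair coins produce a p-value close to 1 for the same data, so a high p-value cannot single out which of them is the true distribution. Only the 51% coin is clearly rejected.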

It turns out to be really, really hard to confirm that data matches some underlying distribution because there's just a certain amount of noise that's inherent in the system. So reading it that way is simply problematic.

12-11-2020 , 12:05 PM

stremba70

old hand

Join Date: Aug 2020 Posts: 1,301

Quote:

Originally Posted by Matt R.

As others have said, p-values are used to reject the null hypothesis. If the p-value doesn’t justify rejection, then that’s all you can say; you can’t say that the p-value indicates acceptance of it. In plain English, the statistical test either provides evidence that there is a significant difference or it does not provide evidence of such a difference; we can NEVER treat a failure to reject as evidence that the difference does not exist.

As for what the p-value threshold for significance should be, that’s a valid question. A value of 0.05 is accepted as a standard in many fields, but researchers too often use that value blindly, as if it were some magic number instead of a somewhat arbitrarily chosen standard. There are many instances where a threshold of 0.05 just is not appropriate. A simple example: suppose you are researching drugs to treat some condition and you have 100 candidates. If you run a clinical trial on each, it would be massively inappropriate to use 0.05, because that value is far too high. With 100 trials you will almost certainly see a few “successful” drugs simply by chance if you use 0.05 as the significance level. A lower value is more appropriate in that situation.
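
The arithmetic behind the 100-drug example can be sketched as follows, assuming (for illustration only) that all 100 drugs are truly ineffective and that the trials are independent. The Bonferroni correction shown is one common way to pick "a lower value"; the post itself does not name a specific method, and the function names are hypothetical.

```python
def familywise_false_positive_rate(num_tests, alpha):
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

def expected_false_positives(num_tests, alpha):
    """Expected number of false positives when every null is true."""
    return num_tests * alpha

def bonferroni_threshold(num_tests, family_alpha):
    """Per-test threshold that keeps the family-wise rate at or below family_alpha."""
    return family_alpha / num_tests

if __name__ == "__main__":
    n_drugs = 100
    print("chance of at least one false 'success' at 0.05:",
          familywise_false_positive_rate(n_drugs, 0.05))
    print("expected false 'successes' at 0.05:",
          expected_false_positives(n_drugs, 0.05))
    per_test = bonferroni_threshold(n_drugs, 0.05)
    print("Bonferroni per-test threshold:", per_test)
    print("family-wise rate at that threshold:",
          familywise_false_positive_rate(n_drugs, per_test))
```

At a per-test threshold of 0.05, a false "success" somewhere among the 100 trials is close to certain. Dividing the threshold by the number of tests brings the overall chance of any false positive back under 5%.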

Ultimately there is no universally correct value. The value used will depend on your tolerance for the two possible types of errors in statistical testing. Using a threshold that’s too high risks rejecting a null hypothesis when it is actually true. Using a threshold that’s too low risks failing to reject a null hypothesis that is actually false. Only you can determine which of these is the more severe error and adjust the threshold appropriately.
