"P-value" and "level of significance" in statistics "P-value" and "level of significance" in statistics

11-06-2008 , 11:43 PM
Can someone offer a definition of these two concepts in statistics?
I tried looking it up on wiki and I still can't grasp the definitions given

I'm under the impression that the "P-Value" has some kind of relationship with
"alpha"??

As for the "level of significance" when it is said that we require a 5% level of significance, does it mean a 95% confidence interval (so an "alpha" of 5%)?

If my understanding of anything is faulty please correct me as well...
"P-value" and "level of significance" in statistics Quote
11-07-2008 , 03:08 AM
Quote:
Originally Posted by Gster
Can someone offer a definition of these two concepts in statistics?
I tried looking it up on wiki and I still can't grasp the definitions given

I'm under the impression that the "P-Value" has some kind of relationship with
"alpha"??

As for the "level of significance" when it is said that we require a 5% level of significance, does it mean a 95% confidence interval (so an "alpha" of 5%)?

If my understanding of anything is faulty please correct me as well...
One problem is that the behavioral sciences have completely butchered the meaning and understanding of these terms. I'm sure that is one reason you are confused...totally understandable.

A p-value is a probability or proportion. Strictly speaking, a p-value is the probability that the sample data would occur if a pre-defined "null hypothesis" were in fact true in the population.

That is, a p-value is the probability of getting the data you got, given the hypothesis (the null, which is often "zero" by default). In a Bayesian sense the p-value is p(D|H).

Many people make the mistake of thinking that the p-value is the probability of the null hypothesis being true. IT IS NOT! People make this mistake and then assume that a p-value of .05 means that they are 95% sure that their alternative hypothesis is true. They could not be more wrong.

A "statistically significant" p-value simply means that the data you gathered (group differences, correlation, etc.) was unlikely to have occured by chance alone (i.e. unlikely to have occured if the null hypothesis were in fact true).

Let me emphasize: A STATISTICALLY SIGNIFICANT P-VALUE DOES NOT MEAN YOUR RESULT IS "REAL" OR EVEN "IMPORTANT".

But a small p-value does mean that your result (observed data) was unlikely to have occurred if in fact the null hypothesis were true at the population level (e.g. no relationship between the variables in the population, no mean differences between the groups in the population).

There are some major problems with reliance on p-values that intelligent behavioral scientists have long identified. The first is that the "null hypothesis" is often assumed to actually be a "nil hypothesis". That is, the null often = no effect whatsoever. In the behavioral sciences, however, this is an absurd notion. Almost everything is related to everything. Almost every pair of group means differs at some decimal place. For the "null/nil" hypothesis to be true, one would have to show that two groups are equal to an infinite number of decimal places. I don't have the time to walk that far.

So it is quite odd that behavioral scientists (like me) would be even remotely interested in knowing the probability of getting their data if in fact the null/nil hypothesis were true, when we know it almost never is.

A second problem with over-reliance on p-values is that people tend to believe that effects with p-values below .05 are "real" and anything above .05 isn't. The .05 cut-off is completely arbitrary. And ultimately decisions about whether an effect is "real" or "important" come down to the researcher and his/her research interests.

In any case, the .05 level is often chosen because, in a signal detection theory sense, it means that there is a 5% chance of making a Type I error. A Type I error means that you reject the null hypothesis when it is in fact true (you say there is an effect when in fact there is none). Now as I have mentioned before, the notion of a Type I error is pretty much absurd in the behavioral sciences. But stay with me anyway. The probability of making a Type I error is called the "alpha level". Likewise, there is a probability called "beta", which is the probability of making a Type II error (saying there is no effect when in fact there is an effect).

As an example, consider this scenario. Person A either has HIV or doesn't. A test for HIV is either positive or negative. If Person A doesn't have HIV and the test is negative, we are correct. If Person A has HIV and the test is positive, we are again correct. However, if person A doesn't have HIV and the test is positive we are incorrect and have made a Type I error (said there was an effect when there wasn't one). If person A has HIV and the test is negative we are incorrect and have made a Type II error (said there was no effect when there was one).

In any case, alpha is typically set at .05, accepting the fact that 5% of the time one might reject the null hypothesis when in fact it is true.
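
To make the alpha idea concrete, here is a rough simulation sketch in Python (scipy/numpy assumed; the group size and number of trials are arbitrary). When the null hypothesis really is true, a test run at alpha = .05 rejects about 5% of the time, and those rejections are exactly the Type I errors described above.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, rejections, trials = 0.05, 0, 10_000
for _ in range(trials):
    a = rng.normal(0, 1, 30)   # two groups drawn from the *same* population (null is true)
    b = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1        # a Type I error: "significant" result with no real effect
print(rejections / trials)     # should come out close to 0.05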

Anyhow, I'm sure this answer is much longer than you expected. I hope you found it valuable. Most importantly however, remember that a p-value is the probability that you would have gotten your data if the null hypothesis were in fact true in the population. It is not the probability that the null hypothesis is true.

Sherman
"P-value" and "level of significance" in statistics Quote
11-07-2008 , 03:22 PM
Thanks for the reply.
So we assume the null hypothesis is true, then we look at our data, and I guess the p-value depends on how far our data deviates from what the null hypothesis predicts? Am I making sense? Could you show the calculation you would need to do to get the p-value in an example?
Thank you
"P-value" and "level of significance" in statistics Quote
11-07-2008 , 09:13 PM
Quote:
Originally Posted by Gster
Thanks for the reply.
So we assume the null hypothesis is true, then we look at our data, and I guess the p-value depends on how far our data deviates from what the null hypothesis predicts? Am I making sense? Could you show the calculation you would need to do to get the p-value in an example?
Thank you
Yes, you've got it. The p-value can be thought of as reflecting how far our data deviate from what we would expect to see if the null hypothesis were in fact true at the population level.

Unfortunately, the calculation of a p-value is rather involved. P-values differ based on sample size, the size of the effect, and the distribution against which the test statistic is compared (t distribution, F distribution, etc.).

The easiest way to get a p-value is to compute an inferential test statistic (like an F or a t) and then use Excel to convert it to a p-value for your sample size. For a two-sample t-test, for example, type the following into an Excel cell:

=TDIST(t_value, degrees_of_freedom, tails)

where t_value is the t statistic obtained from your data, degrees_of_freedom is the combined sample size of the two groups minus 2, and tails is either 1 or 2 depending on whether you want a one-tailed (hypothesis-guided) test or a two-tailed test.
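
If you'd rather do it outside Excel, here is a minimal Python sketch of the same two-sample t-test (scipy assumed; the two groups below are made-up numbers just for illustration):

import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])   # hypothetical data
group_b = np.array([4.5, 4.7, 4.4, 4.9, 4.6, 4.3])

# scipy gives the t statistic and two-tailed p-value directly...
t_stat, p_two_tailed = stats.ttest_ind(group_a, group_b)

# ...or you can convert the t statistic yourself, mirroring the Excel call:
df = len(group_a) + len(group_b) - 2            # combined sample size minus 2
p_manual = 2 * stats.t.sf(abs(t_stat), df)      # upper-tail area, doubled for two tails
print(t_stat, p_two_tailed, p_manual)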

I asked something similar to this last week; there is no canned formula that gives a p-value directly. Excel uses an iterative numerical procedure to find the result.

Sherman
"P-value" and "level of significance" in statistics Quote
11-08-2008 , 12:04 AM
Here is a simple example of a p-value and its computation.

Someone tells you he can predict whether the stock market will go up or down on a given day. Your null hypothesis is that it's not true, that he's just guessing at random. To keep it simple, assume the market goes up 50% of the time.

If he gets it right one day, the p-value is 0.5, because there is a 50% chance he will be correct, given your null hypothesis. If he gets it right two days in a row, the p-value is 25%. The third day in a row makes it 12.5%, the fourth makes it 6.25%, and the fifth makes it 3.125%. This is below 5%, the conventional level for "statistical significance." At this point, you have cast doubt on the null hypothesis and might want to think about whether he really can predict the market.
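
The same arithmetic in a short Python sketch (scipy assumed; this just reproduces the halving above for a streak of correct calls under the null of random guessing):

from scipy.stats import binom

def p_value_streak(correct, total, p_guess=0.5):
    # upper-tail probability: P(at least this many correct) if he is guessing at random
    return binom.sf(correct - 1, total, p_guess)

for days in range(1, 6):
    print(days, p_value_streak(days, days))   # 0.5, 0.25, 0.125, 0.0625, 0.03125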

Of course, you have to take into account other possibilities. Maybe he's sending predictions to 32 people, half up, half down; he'll get five in a row right for one of those people. Or if 100,000 people claim to be able to predict the market, some of them will get it right even more times in a row.

There's no magic about 5%, it's just an arbitrary level that people often use. The more he gets right, the lower the p-value and the more you doubt your null hypothesis.
"P-value" and "level of significance" in statistics Quote
11-08-2008 , 01:13 AM
Quote:
Originally Posted by Sherman
That is, a p-value is the probability of getting the data you got, given the hypothesis (null, which is often "zero" by default). In a bayesian sense the p-value is p(D|H).
No, the P-value is the probability of getting the data you got or a result even more extreme than the one you got, according to some test statistic. It's P(area beyond D|H). If you got x% correct stock market predictions, and if for a random guesser the probability of getting x% or more correct is .02, then the P-value is .02.

Generally it's true that lower P-values mean stronger evidence, but it's not completely straightforward; according to something called Lindley's paradox there are some cases where a P-value below (but not too far below) .05 is actually strong evidence in favor of the null hypothesis.

"P-value" and "level of significance" in statistics Quote
11-08-2008 , 04:25 AM
Quote:
Originally Posted by zeepok
No, the P-value is the probability of getting the data you got or a result even more extreme than the one you got, according to some test statistic. It's P(area beyond D|H). If you got x% correct stock market predictions, and if for a random guesser the probability of getting x% or more correct is .02, then the P-value is .02.

Generally it's true that lower P-values mean stronger evidence, but it's not completely straightforward; according to something called Lindley's paradox there are some cases where a P-value below (but not too far below) .05 is actually strong evidence in favor of the null hypothesis.
Yeah...I was way off...thanks for the clarification.

And yes, if you play around with Bayes enough, it becomes pretty obvious that even small p-values are not great evidence against the null hypothesis.

For instance, with a large sample (i.e. huge power = low Type II error rate) a small p-value is almost guaranteed. Specifically, say you have a sample large enough to detect the effect of interest 99.9% of the time. You also happen to believe the prior probability for the null hypothesis is 50%. If you get a p-value [p(D|H)] of .05, your posterior probability for the null hypothesis will in fact be about .98. That is, if you have high levels of power, Bayes holds your p-values to a higher standard...requiring more "proof" that the data did not come from a null distribution, so to speak.
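
A rough sketch of how a calculation like that can go (this is a standard Lindley-paradox-style setup, not necessarily the exact model behind the numbers above: known sigma = 1, H0: mu = 0, H1: mu drawn from Normal(0, 1), prior P(H0) = 0.5, and a sample mean sitting exactly at the two-sided .05 boundary):

from math import sqrt
from scipy.stats import norm

prior_h0 = 0.5
tau = 1.0                                  # spread of plausible effect sizes under H1 (assumed)

for n in (100, 10_000, 1_000_000):
    xbar = 1.96 / sqrt(n)                  # sample mean just barely significant at .05
    like_h0 = norm.pdf(xbar, loc=0, scale=1 / sqrt(n))            # density under the null
    like_h1 = norm.pdf(xbar, loc=0, scale=sqrt(tau**2 + 1 / n))   # marginal density under H1
    post_h0 = like_h0 * prior_h0 / (like_h0 * prior_h0 + like_h1 * (1 - prior_h0))
    print(n, round(post_h0, 3))            # the posterior for the null grows with sample size

So a result that is "just significant" can, with enough data, actually favor the null; that is Lindley's paradox in action.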

Sherman
"P-value" and "level of significance" in statistics Quote
11-11-2008 , 03:42 PM
I'm only briefly familiar with p-values. Zeepok's clarification was new to me; I thought that the p-value was simply P(D|H), where D is the exact data set that we have. How does the "even more extreme" property of the p-value work? Someone, please explain.

Also (this is based on my previous, possibly mistaken, understanding of the p-value), I've never quite understood how anyone can dismiss a hypothesis based on the fact that the p-value is lower than 0.05. As someone pointed out, for large data sets we're almost guaranteed to obtain small p-values. What seems interesting to me is comparing the p-values of two hypotheses to each other - the one with the larger p-value is the more plausible hypothesis.

I use this test sometimes, and I've been thinking that I was calculating p-values when I did so. Now I see that this may not have been the case, but it's still a handy tool.
"P-value" and "level of significance" in statistics Quote
11-12-2008 , 07:09 PM
Quote:
Originally Posted by Klyka
I thought that the p-value was simply P(D|H), where D is the exact data set that we have.
Nope! I can see how you might think so. A lot of people are confused about statistical concepts like this, and even people who do understand the concepts (like Sherman) sometimes misspeak.
Quote:
How does the "even more extreme" property of the p-value work? Someone, please explain.
To define "extreme" you need some test statistic that you expect to have a clearly different random distribution if the null hypothesis is true, than if the alternative hypothesis is true. For example, "percentage of correct predictions" is something you'd expect to center around 50% according to the null hypothesis that the predictions are no better than random, but you'd expect it to center around some higher value according to the alternative hypothesis that the predictions are better than random. Then you can look at all possible outcomes in which the percentage of correct predictions is at least as high as the one you got, and say that those outcomes are "more extreme"; the probability of all of them together, under the null hypothesis, is your P-value.
Quote:
What seems interesting to me is comparing the p-values of two hypotheses to each other - the one with the larger p-value is the more plausible hypothesis.
When you say "the more plausible hypothesis" you're comparing different P(H|D) (the probability of the hypothesis given the data), not different P(D|H) (the probability of the data given the hypothesis, which annoyingly enough is also called the "likelihood" of the hypothesis). These are related by Bayes' theorem, but they're not the same thing. Sometimes if you have different hypotheses H1 and H2, P(H1|D) is greater than P(H2|D) even though P(D|H1) is less than P(D|H2). For example, the probability that someone bets if he has a pair may be less than the probability that he bets if he has a straight flush; but even after he bets it's still more plausible that he has a pair than a straight flush, just because the straight flush started out as so improbable. (This is why I'm never sure if I should take someone seriously when they say they "read someone for a set".)
"P-value" and "level of significance" in statistics Quote
11-12-2008 , 08:59 PM
Quote:
Originally Posted by zeepok
(This is why I'm never sure if I should take someone seriously when they say they "read someone for a set".)
I hear you! I have tried to no end to convince some people of this. I get this all the time: "I gave him a range of 77+,AQo, but my read was KK." I tried to explain through Bayes that their logic is just wrong. If your range for a guy is that he has either KK or 32o, you don't get to just "guess" that he has KK this time. You will make less money if you do. If your "read" says he has KK, then your range for him is KK...period. Anyhow, this is spot on imo.

Sherman
"P-value" and "level of significance" in statistics Quote
11-13-2008 , 04:30 PM
Embarrassingly, I forgot about the a priori probability in my post. The "p-value", meaning what I previously thought it meant, should be weighted by the a priori probability; then the comparison can be made.

This is a step which, for example, creationists fail to take when comparing creation with evolution. I find the spaghetti monster very useful for illustrating this point.
"P-value" and "level of significance" in statistics Quote
11-16-2008 , 10:15 PM
Quote:
Originally Posted by Sherman
I hear you! I have tried to no end to convince some people of this. I get this all the time: "I gave him a range of 77+,AQo, but my read was KK." I tried to explain through Bayes that their logic is just wrong. If your range for a guy is that he has either KK or 32o, you don't get to just "guess" that he has KK this time. You will make less money if you do. If your "read" says he has KK, then your range for him is KK...period. Anyhow, this is spot on imo.

Sherman
I'm not sure I agree. You may well be right that most people who say things like this are confused, and lose money. But I don't think the two statements are inconsistent.

I think of a range as a long-term frequency statement. This kind of player (or this player in particular if I know her) in this situation will generally have this range of hands to make this bet. Therefore, I'd like to pick a response that wins versus anything in the range, and definitely one that has positive EV versus a random hand in the range. I'm less worried about my EV versus hands not in the range.

A read on the other hand, is a specific guess about this hand. It's generally not as specific as KK, it's more often that you read someone for a bluff, or a flush draw, or a full house. Something about the way they act makes you zero in on one set of hands, which may or may not be in the range.

Unless you're a psychic or a cheater, your reads are not always correct. But if you use them properly, they don't hurt you if they're random, and they help you if they're even a little better than random (they will, however, hurt you if they are worse than random, that is, if the other player is misleading you).
"P-value" and "level of significance" in statistics Quote
11-16-2008 , 11:25 PM
Quote:
Originally Posted by AaronBrown
Unless you're a psychic or a cheater, your reads are not always correct. But if you use them properly, they don't hurt you if they're random
I must disagree. We want to play in such a fashion as to do well against the villain's true range as a whole. This often means compromises between plays that are good against different parts of his range. It doesn't necessarily mean that we'll mix up between plays that are good against each part of his range, as using "random reads" would imply.

But this is OT. =)
"P-value" and "level of significance" in statistics Quote

      