(collecting NHL data) How do I determine statistical significance?

11-16-2011 , 04:44 PM
Need help.

I'm collecting data on NHL scores from 2010 to the present.

I took every score in the regular season from 2010 and put them into categories of more or less than 5.5 goals. I'm trying to find trends in goal scoring in:

West vs West games
West vs East games
East vs East games

How do I know if the amount of data I have is enough to show statistical significance?

Thanks for help.
11-16-2011 , 05:52 PM
You should start from here:

http://en.wikipedia.org/wiki/Statist...est_statistics

What is the actual statement that you want to test for significance?
11-16-2011 , 06:04 PM
Depends on what claim you are trying to evaluate.
11-16-2011 , 06:15 PM


Well, for starters, here is a graph of what I've collected from 2010-11.

This shows the % of games (by day) that had more than 5 goals.

The number in square brackets next to each day of the week is the total number of games played on that day.

An interesting finding, for example, is that on Thursdays there seemed to be a lot more scoring in west vs west games compared to the rest, and also a lot more scoring in WvW games on Friday nights compared to the others.

Can I conclude that there is a significant cause here?
11-16-2011 , 06:19 PM
Quote:
Originally Posted by toble
Can i conclude that there is a significant cause here?
I would start with this test:

http://en.wikipedia.org/wiki/One-way_ANOVA

Quote:
The ANOVA tests the null hypothesis that samples in two or more groups are drawn from the same population.
11-16-2011 , 06:23 PM
Quote:
Originally Posted by toble
Can i conclude that there is a significant cause here?
Basically, the easiest statistical tests ask something like "Given that the day of the game has no impact at all, what are the chances of seeing data like this?" That's probably a good place for you to start.

I'm not sure...but I imagine total goals (rather than % of games with goals over a certain amount) is a better variable to look at.
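One way to put a number on that "what are the chances" question is a permutation test: shuffle the day labels many times and count how often chance alone produces a gap at least as large as the one observed. The sketch below is only an illustration. The function name `permutation_p_value` and the game counts are made up for the example and are not the real 2010-11 numbers.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in 'over' rates.

    group_a, group_b: lists of 0/1 flags (1 = game went over 5.5 goals).
    Returns the share of random relabelings whose rate gap is at least
    as large as the observed gap.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels are interchangeable
        rng.shuffle(pooled)
        rate_a = sum(pooled[:n_a]) / n_a
        rate_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point ties
        if abs(rate_a - rate_b) >= observed - 1e-12:
            hits += 1
    return hits / n_shuffles


# Hypothetical counts, for illustration only
thursday = [1] * 30 + [0] * 20       # 30 of 50 Thursday games went over
other_days = [1] * 90 + [0] * 110    # 90 of 200 other games went over

p = permutation_p_value(thursday, other_days)
print(f"approximate p-value: {p:.3f}")
```

A small p-value says a gap this big would be rare if the day truly had no effect. Keep in mind that checking many day/conference combinations and then testing only the one that looks biggest will make chance differences look more significant than they are.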
11-16-2011 , 07:21 PM
Quote:
Originally Posted by Max Raker
Basically, the easiest statistical tests are asking something like "Given the day of the game has no impact at all...what is the chances of seeing data like this". That's probably a good place for you to start.

I'm not sure...but I imagine total goals (rather than % of games with goals over a certain amount) is a better variable to look at.
Thanks for the reply.
The reason I am not particularly interested in total goals is because I am researching over/under for sports betting and the result of a game with 14 total goals vs a 1 goal game is irrelevant since the result (over/under) is binary.

I appreciate the quick replies fellas
11-16-2011 , 08:38 PM
Turn your problem into a statement. Is there any reason to think a priori that w vs w or e vs e would differ (in other words, can you dichotomize "travel some distance" vs "not traveling very far")? Might make the problem a bit easier.

If you're testing a binary dependent variable, then you're probably looking at a logistic regression, which is a bit trickier (to implement and to interpret) than anova.

If you dichotomize the variables into travel/don't travel (or same conference/different conference) vs. over/under, you can do a simple chi square on the resulting 2x2 table. (You could also do a chi square with e/e, e/w, w/w, but a significant result would only tell you that the three groups differ somewhere, not which one stands out.)
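For the dichotomized version (same conference/different conference vs. over/under), the chi square can be worked out by hand. This is a rough sketch using only the Python standard library. The function name `chi_square_2x2` and the counts are invented for illustration, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table: [[over, under], [over, under]], one row per group.
    Returns (statistic, p_value). No Yates continuity correction.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table of counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and outcome were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, for illustration only
counts = [
    [20, 10],  # same conference:      20 overs, 10 unders
    [10, 20],  # different conference: 10 overs, 20 unders
]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

The usual rule of thumb is that every expected cell count should be at least about 5 for the chi-square approximation to be trustworthy.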
11-16-2011 , 08:53 PM
Quote:
Originally Posted by toble
Thanks for the reply.
The reason I am not particularly interested in total goals is because I am researching over/under for sports betting and the result of a game with 14 total goals vs a 1 goal game is irrelevant since the result (over/under) is binary.

I appreciate the quick replies fellas
You can still use total goals to do this. You can get a 95% confidence interval for the mean of each set of games. If 5 goals is below the lower end of the confidence interval, then you are 95% confident that the mean you are testing is greater than 5 goals. Do that for each WW, EW, EE set. Then, run a one-way ANOVA or a Tukey's all pairs test to test for statistically significant differences between the means of WW, EW and EE.
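Here is a rough sketch of both steps in plain Python. The function names and goal totals are made up for illustration, and the interval uses a normal approximation, which is reasonable only with a decent number of games per group.

```python
import math
import statistics


def mean_confidence_interval(goals, confidence=0.95):
    """Normal-approximation confidence interval for mean goals per game."""
    mean = statistics.fmean(goals)
    # Standard error of the mean, from the sample standard deviation
    sem = statistics.stdev(goals) / math.sqrt(len(goals))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across several groups of games."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.fmean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of games
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )
    # Ratio of between-group to within-group mean squares
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical total goals per game, for illustration only
ww = [6, 7, 5, 8, 6, 7, 4, 6]
ew = [5, 6, 4, 5, 7, 5, 6, 4]
ee = [5, 4, 6, 5, 3, 5, 6, 4]

for name, goals in (("WW", ww), ("EW", ew), ("EE", ee)):
    low, high = mean_confidence_interval(goals)
    print(f"{name}: 95% CI for mean goals = ({low:.2f}, {high:.2f})")

print(f"ANOVA F statistic = {one_way_anova_f([ww, ew, ee]):.2f}")
```

Turning the F statistic into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom. If SciPy is available, `scipy.stats.f_oneway` computes both the statistic and the p-value.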
11-17-2011 , 09:59 PM
Meh, looked back at the graph. Obviously all the confidence intervals will include 5 goals or lower. ANOVA or Tukey's all pairs for mean differences still applies.
11-17-2011 , 11:54 PM
Quote:
Originally Posted by toble


Well for starters, here is a graph of what ive collected from 2010-11.

This shows the % of games (by day) that had more than 5 goals.

The square bracket number next to the days of the week represent total games played on that day.

An interesting finding, for example, is that on Thursdays there seemed to be a lot more scoring in west vs west games compared to the res and also a lot more scoring in WvW games on friday nights compared to the others.

Can i conclude that there is a significant cause here?
Yep. A couple of goalies in the western conference sucked last year.
11-18-2011 , 04:23 PM
I'm trying to think of what could account for the differences...

some thoughts

1) Travel Time

East vs East - very little travel time...unless one of the teams is returning from a west coast game.

West vs West - more travel than E v E because the cities are spaced further apart.

East vs West - depends on who the home team is. If the west is home, then the east is likely on a road trip, and not only did they have to fly west, but each trip between cities is also further than usual. Also the possible 3 hour time difference, meaning a 7pm game in the west is like they are starting at 10pm.

If the east is the home team, the west is likely doing a road trip of the east. The initial travel is the same as vice versa, but then the travel time between cities is shorter than in the reverse case. Also, playing a 7pm game that feels like 4pm is probably not as much of a drop in quality as when it feels like 10pm.

2) Conference positioning - since top 8 teams per conference make the playoffs, if you are playing a team in your own conference, that inherently could factor into the game. Especially during the end of the season, when usually you'd expect the game to be lower scoring as defence starts to reign supreme.

3) Level of skill - as said, goalies in the west sucked last year, which could lead to the difference in numbers.

I think the more data you are able to factor in, the better. This is a good start though and interesting.