Official Advanced Math Help Thread

01-27-2012 , 07:44 PM
After hours of searching I still can't find anything on the internet pertaining to my question. So hopefully the statisticians here can lend a hand.

Simplest example:

I think a basketball player's shooting % is determined by 1-his distance from the hoop(feet), how well he is being guarded on a scale from 0 to 1, and his height(inches).

I believe that shot distribution when the player is 17 feet from the hoop, being guarded at .67 level, and is 75 inches tall goes as such:
-2% of the time he shoots at each level from 20%-29%. So, 2x10=20% of the distribution.
-6% of the time he shoots at each level from 30% to 34%.
-4% at each level from 35% to 39%
-3% at each level from 40% to 49%
-0% at every other level.

Quick question: Is that the correct way to do this or would it be better to say something like 40% of the time he shoots at level 100% and 60% he shoots at level 0%?

Now say I believe that shot distribution when the player is 14 feet from the hoop, being guarded at .54 level, and is 77 inches tall goes as such:
-2% of the time he shoots at each level from 30%-39%.
-6% of the time he shoots at each level from 40% to 44%.
-4% at each level from 45% to 49%
-3% at each level from 50% to 59%
-0% at every other level.

Last one...I believe that shot distribution when the player is 12 feet from the hoop, being guarded at .33 level, and is 78 inches tall goes as such:
-2% of the time he shoots at each level from 35%-44%.
-6% of the time he shoots at each level from 45% to 49%.
-4% at each level from 50% to 54%
-3% at each level from 55% to 64%
-0% at every other level.

I see a player who is 75 inches tall, 17 feet from the hoop, and being guarded at .67 level (the first player described) miss a shot.

Now I can update that particular distribution using Bayesian Inference. But how do I update the other distributions given that same information(the miss)? It isn't reasonable to think I should have to wait to see a 78 inch player shoot from 12 feet being guarded at .33 before I can update that particular distribution.
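
The single-distribution update mentioned here can be done as a discrete Bayes step: multiply each level's prior weight by the chance of a miss at that level (1 minus the level), then renormalize. Below is a minimal Python sketch using the first prior from this post. The names `build_prior`, `update_on_shot` and `mean_level` are made up for illustration, not from any library. It only covers the update for the observed situation; how the miss should carry over to the other two distributions is the open question.

```python
def build_prior(bands):
    """Expand (low_pct, high_pct, weight_per_level) bands into {level: weight}."""
    prior = {}
    for low, high, weight in bands:
        for pct in range(low, high + 1):
            prior[pct / 100] = weight
    return prior


def update_on_shot(prior, made):
    """Discrete Bayes update of a distribution over shooting levels after one shot."""
    # Likelihood of the observation at level p: p for a make, 1 - p for a miss.
    unnormalized = {p: w * (p if made else 1 - p) for p, w in prior.items()}
    total = sum(unnormalized.values())
    return {p: w / total for p, w in unnormalized.items()}


def mean_level(dist):
    """Expected shooting level under a {level: weight} distribution."""
    return sum(p * w for p, w in dist.items())


# Prior for 17 feet, guarded at .67, 75 inches tall (weights taken from the post).
prior = build_prior([(20, 29, 0.02), (30, 34, 0.06), (35, 39, 0.04), (40, 49, 0.03)])
posterior = update_on_shot(prior, made=False)

print(f"prior mean:     {mean_level(prior):.4f}")
print(f"posterior mean: {mean_level(posterior):.4f}")
```

A single miss moves the mean only slightly, because every level in this prior already expects more misses than makes. By contrast, a prior that puts all its weight on 0% and 100% would collapse onto 0% after one miss, since the 100% level gives a miss zero probability.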
01-28-2012 , 01:48 AM
Dear diary,


No one is going to do your homework for you.
01-29-2012 , 06:58 AM
Subscribed
01-29-2012 , 10:02 PM
-get data
-run regressions
-create formulas
-????
-profit
01-29-2012 , 10:25 PM
get data
run regressions
make horrible bets relative to reality but good relative to your regression
-????
-insurmountable debt
01-29-2012 , 10:57 PM
U are forced to bet real $ while creating your formula ?
01-29-2012 , 11:00 PM
prolly easier to just wait for thousands of trials to pass to make sure its ok
01-30-2012 , 03:18 AM
I'm trying to figure out whether the market has become sharper overall across the last 5 WTA tennis seasons, in order to decide whether backtesting relative to closing odds is justified over the whole period or only the last 1-2 years.

I have roughly 2500 games per season.

What statistical test would you use for this?

For every game in a year, I squared the distance from 1 of the Pinnacle vig-free implied probability of the winning line, then divided the sum by the number of games, essentially calculating the variance from theoretical expected values.

E.g. if the Pinnacle vig-free line on the winner is 1.5 decimal, the term is (1-0.666)^2.
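
The per-season metric described above can be sketched in Python as follows. This is only an illustration under assumptions: the post does not say which vig-removal method was used, so `vig_free_probs` uses simple proportional normalization, and both function names are hypothetical.

```python
def vig_free_probs(odds_a, odds_b):
    """Implied probabilities for a two-way market with the margin removed.

    Assumption: proportional normalization of the raw implied probabilities;
    the post does not specify the vig-removal method.
    """
    raw_a, raw_b = 1 / odds_a, 1 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total


def mean_squared_miss(winner_probs):
    """Average of (1 - p_winner)^2 over matches, as described in the post."""
    return sum((1 - p) ** 2 for p in winner_probs) / len(winner_probs)


# The single-match example from the post: vig-free decimal odds of 1.5 on the winner.
print(mean_squared_miss([1 / 1.5]))
```

For the 1.5 example the term is (1 - 1/1.5)^2 = (1/3)^2, or about 0.111. Running `mean_squared_miss` over one season's list of winner probabilities would give one number per year of the kind listed below.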


2011 0.1772005046
2010 0.1784442168
2009 0.1414215339
2008 0.1786097325
2007 0.1677804943

mean: 0.1686912964

Over the mentioned sample size of n~2500 this should be enough to reasonably assume that WTA betting hasn't gotten sharper over the last 5 years, correct?

Last edited by -ev?; 01-30-2012 at 03:39 AM.
01-30-2012 , 11:53 AM
Quote:
Originally Posted by -ev?
I'm trying to figure out whether the market has become sharper overall across the last 5 WTA tennis seasons, in order to decide whether backtesting relative to closing odds is justified over the whole period or only the last 1-2 years.

I have roughly 2500 games per season.

What statistical test would you use for this?

For every game in a year, I squared the distance from 1 of the Pinnacle vig-free implied probability of the winning line, then divided the sum by the number of games, essentially calculating the variance from theoretical expected values.

E.g. if the Pinnacle vig-free line on the winner is 1.5 decimal, the term is (1-0.666)^2.


2011 0.1772005046
2010 0.1784442168
2009 0.1414215339
2008 0.1786097325
2007 0.1677804943

mean: 0.1686912964

Over the mentioned sample size of n~2500 this should be enough to reasonably assume that WTA betting hasn't gotten sharper over the last 5 years, correct?
you should probably first take the square root of that variance-like metric so it's in the same units as the mean. then compare it to the mean. but overall that seems like a sensible approach to measuring it.
01-30-2012 , 12:03 PM
actually, it's probably just better to take the squared difference between the opening no-vig decimal odds and the closing no-vig decimal odds.
01-30-2012 , 01:10 PM
Quote:
Originally Posted by illfuuptonight
After hours of searching I still can't find anything on the internet pertaining to my question. So hopefully the statisticians here can lend a hand.

Simplest example:

I think a basketball player's shooting % is determined by 1-his distance from the hoop(feet), how well he is being guarded on a scale from 0 to 1, and his height(inches).

I believe that shot distribution when the player is 17 feet from the hoop, being guarded at .67 level, and is 75 inches tall goes as such:
-2% of the time he shoots at each level from 20%-29%. So, 2x10=20% of the distribution.
-6% of the time he shoots at each level from 30% to 34%.
-4% at each level from 35% to 39%
-3% at each level from 40% to 49%
-0% at every other level.

Quick question: Is that the correct way to do this or would it be better to say something like 40% of the time he shoots at level 100% and 60% he shoots at level 0%?

Now say I believe that shot distribution when the player is 14 feet from the hoop, being guarded at .54 level, and is 77 inches tall goes as such:
-2% of the time he shoots at each level from 30%-39%.
-6% of the time he shoots at each level from 40% to 44%.
-4% at each level from 45% to 49%
-3% at each level from 50% to 59%
-0% at every other level.

Last one...I believe that shot distribution when the player is 12 feet from the hoop, being guarded at .33 level, and is 78 inches tall goes as such:
-2% of the time he shoots at each level from 35%-44%.
-6% of the time he shoots at each level from 45% to 49%.
-4% at each level from 50% to 54%
-3% at each level from 55% to 64%
-0% at every other level.

I see a player who is 75 inches tall, 17 feet from the hoop, and being guarded at .67 level (the first player described) miss a shot.

Now I can update that particular distribution using Bayesian Inference. But how do I update the other distributions given that same information(the miss)? It isn't reasonable to think I should have to wait to see a 78 inch player shoot from 12 feet being guarded at .33 before I can update that particular distribution.
, he said, hopeful that rsigley and/or TomG could help him out with the OP question.
01-30-2012 , 01:35 PM
dunno, maybe rsigley would want to answer. i'm not really sure. total bases still uses frequentist reasoning, but there's a lot of advanced stat/math textbooks on the subject. if you're willing to learn, the info is there
Official Advanced Math Help Thread Quote
01-30-2012 , 07:35 PM
-ev?,

Consider log scoring and limits.
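
For reference, log scoring here can be read as averaging the negative log of the vig-free probability the closing line gave to the eventual winner; lower means the lines were sharper. A minimal Python sketch, where `mean_log_loss` is a hypothetical helper name and the input is assumed to be a list of winner probabilities already stripped of vig:

```python
import math


def mean_log_loss(winner_probs):
    """Average of -log(p_winner) over matches; lower is better."""
    return -sum(math.log(p) for p in winner_probs) / len(winner_probs)


# A coin-flip line on every match scores log(2), about 0.693.
print(mean_log_loss([0.5, 0.5, 0.5]))
```

Unlike the squared distance, the log score penalizes confident misses without bound: a winner priced near 0% contributes a very large term.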
01-31-2012 , 04:06 PM
Quote:
Originally Posted by -ev?
I'm trying to figure out whether the market has become sharper overall across the last 5 WTA tennis seasons, in order to decide whether backtesting relative to closing odds is justified over the whole period or only the last 1-2 years.

I have roughly 2500 games per season.

What statistical test would you use for this?

For every game in a year, I squared the distance from 1 of the Pinnacle vig-free implied probability of the winning line, then divided the sum by the number of games, essentially calculating the variance from theoretical expected values.

E.g. if the Pinnacle vig-free line on the winner is 1.5 decimal, the term is (1-0.666)^2.


2011 0.1772005046
2010 0.1784442168
2009 0.1414215339
2008 0.1786097325
2007 0.1677804943

mean: 0.1686912964

Over the mentioned sample size of n~2500 this should be enough to reasonably assume that WTA betting hasn't gotten sharper over the last 5 years, correct?
This could be explained by other factors, such as increased distribution of near-pick events, which would increase the variance.
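
To see why the mix of matches matters, assume a perfectly calibrated line where the favorite wins with probability p. The expected squared miss for that match is then p(1-p)^2 + (1-p)p^2 = p(1-p), which peaks at 0.25 for a pick'em. A small Python sketch of that arithmetic (`expected_squared_miss` is an illustrative name, and perfect calibration is an assumption, not a claim about Pinnacle):

```python
def expected_squared_miss(p_favorite):
    """Expected (1 - p_winner)^2 for one match under a perfectly calibrated line."""
    p, q = p_favorite, 1 - p_favorite
    # Favorite wins with probability p (squared miss q^2);
    # underdog wins with probability q (squared miss p^2).
    return p * q ** 2 + q * p ** 2  # simplifies to p * q


for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"favorite at {p:.0%}: expected score {expected_squared_miss(p):.4f}")
```

So a season with more near-pick matches would show a higher average even if the market were equally sharp, which is why the yearly numbers need to be read alongside the distribution of odds.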
01-31-2012 , 05:42 PM
If I wanted to learn how to:

-get data
-run regressions
-create formulas
-????
-profit

Where would I start?

For what it's worth, the reason I'm interested isn't even close to what the OP is doing.
01-31-2012 , 09:12 PM
Negative,

These are the avg odds on the favorite for the years:
avg smaller odds
2011 1.4416368078
2010 1.4468241578
2009 1.3874011706
2008 1.4382330472
2007 1.4656253219

This is the distance to 2 for the favorite squared

avg (2-smaller odds)^2
2011 0.3728082248
2010 0.3654913837
2009 0.4282047519
2008 0.3769435009
2007 0.4215296897

Looks like games are balanced out.


I also mistakenly used the short version of the vig-free calculation earlier. Here are the updated figures, with the square root:

Year        Variance      Std
2011        0.1961877702  0.4429308864
2010        0.1935183146  0.4399071659
2009        0.1948075505  0.4413700834
2008        0.1918060137  0.4379566345
2007        0.1801504604  0.4244413509
arith mean  0.1912552205  0.4373273608

Someone asked over PM for the data source: http://www.tennis-data.co.uk/alldata.php

The data comes without Pinnacle opening odds. Tomorrow I will try the log scoring as proposed and post the results, but so far it looks like the WTA data is valid for backtesting purposes over the last 5 years.
02-02-2012 , 12:32 AM
do you really think regression can explain a complex model like sports?