Math/tech knowledge necessary to build a worthwhile predictive model

06-04-2019 , 02:56 PM
Cliff Notes
  • Can you build a simple predictive model that is worth anything with a basic knowledge of statistics?

TL;DR
Hi all -- I am very new to, but very interested in, the art of betting sports and would like to build a basic predictive model as part of my betting process. However, I'm wondering if it's even possible to build a basic model with any value given my basic math knowledge.

I am an accountant by trade. The highest math courses I've ever taken are business statistics and business calculus in college, and that was 15 years ago at this point (damn I got old quick).

I was, perhaps overly optimistically, hoping to develop a simple betting model using just a basic knowledge of statistics. And I have no delusions about ever developing something as complex as Emmanuel Perry's SALAD model which he uses to predict the outcome of NHL games -- but when I read his blog post describing the model, I was intimidated:

Quote:
Salad is an ensemble of 11 unique sub-models: 4 bagged logistic regression models, 1 XGBoost gradient-boosted trees model, 2 neural networks, 1 bagged naive Bayes model, 2 CatBoost gradient-boosted trees models and 1 random forest using fuzzy logic. Different feature sets were used among these sub-models in an effort to further increase diversity within the ensemble. A bagged logistic regression model was used for a stacking algorithm. My approach in building these ensemble components was loosely to optimize the mean score of the various models while minimizing collinearity. In plain terms, each of the sub-models should perform well on its own and no two sub-models should be overly alike in their output.
Can I build a model that will be useful with just a basic understanding of stats, or do you need an applied statistics M.S. to really make something you can rely on at all? I'm not looking to do this professionally, just as a challenging hobby, but given the jargon above, I'm doubtful that building a model worth anything is possible.
06-04-2019 , 03:09 PM
You can do anything you put your mind to, just need to want it enough.

Look deep into yourself and determine if this is what you really want from your life right now. If it is, get some books and get learning.
06-04-2019 , 03:22 PM
accountants have a good history of success in dfs and many analytic endeavours so i'd guess they'd also be well suited to sports betting.

i feel like ensembling is what people do when they want to throw a ton of stuff against the wall and see what sticks. it does seem to work well for a lot of tasks, particularly kaggle competitions, but i'm not a machine learning guy so i can't comment on its effectiveness with regard to sports betting.

at its core all you're trying to do for game modeling is project how many runs/points a team is going to score in a given matchup. not too hard right? now that we have our goal in mind, let's find things that are predictive of it.

welcome to the arena.
06-04-2019 , 04:55 PM
let me ask you this.

some people want encouragement and to be told they can do it.

other people want to be told no they can't do it because that is what motivates them.

what kind of person are you?
06-04-2019 , 05:08 PM
to answer your question i'd say basic probability knowledge and an understanding of statistics are important if you're building your own stuff. it will help ground your assumptions in stuff like independence, correlations, hypothesis testing, etc. you can learn it through khan academy. you probably won't win because the stuff you build doesn't just have to be better than joe schmoe but better than the current market at the time you bet it which can distill some pretty advanced methods. but you should be doing it because it's a fun hobby and a great excuse to tinker with sports data and analysis, not to make money at it. i say pick the smallest limit, niche sport you can find and have fun with it.
06-07-2019 , 03:14 PM
Thanks, guys, for the thoughtful responses.

Quote:
Originally Posted by TomG
you probably won't win because the stuff you build doesn't just have to be better than joe schmoe but better than the current market at the time you bet it which can distill some pretty advanced methods. but you should be doing it because it's a fun hobby and a great excuse to tinker with sports data and analysis, not to make money at it.
This may be something I need to grapple with further. I agree that delving into this is likely to help me improve some skills that are useful in real life -- knowledge of stats, programming ability, data analysis skills -- but I am also not sure I'm mentally equipped to undertake this as a hobby if I'm not sure I'll be able to generate an ROI doing it.
06-07-2019 , 03:26 PM
Quote:
Originally Posted by TomG
at its core all you're trying to do for game modeling is project how many runs/points a team is going to score in a given matchup. not too hard right? now that we have our goal in mind, let's find things that are predictive of it.
Perhaps this is a dumb question... but beyond projecting an EV for runs/points/goals/whatever in a given match, won't we also need to know the distribution of those goals/points/runs/etc. per game?

If Team A gets shut out 9/10 games but in the games they do score in, they always score 100 goals, they're averaging 10 goals per game but still losing 9/10 games to Team B, who averages 1 goal per game but scores that 1 goal every single game without fail.
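
The arithmetic in this example can be checked directly. The sketch below hard-codes the hypothetical scoring distributions from the paragraph above (Team A scores 100 goals in 1 game out of 10 and nothing otherwise; Team B always scores exactly 1) and assumes the two teams' scores are independent. The function names are made up for illustration.

```python
from fractions import Fraction

# Hypothetical scoring distributions from the example: {goals: probability}
team_a = {0: Fraction(9, 10), 100: Fraction(1, 10)}
team_b = {1: Fraction(1)}


def mean_goals(dist):
    """Expected goals per game for a {goals: probability} distribution."""
    return sum(goals * p for goals, p in dist.items())


def win_probability(dist_x, dist_y):
    """P(X > Y), assuming the two scores are independent."""
    return sum(
        px * py
        for x, px in dist_x.items()
        for y, py in dist_y.items()
        if x > y
    )


print(mean_goals(team_a), mean_goals(team_b))  # 10 1
print(win_probability(team_a, team_b))         # 1/10
```

Team A has ten times the scoring average yet wins only one game in ten, which is why the mean alone is not enough to price a matchup.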

And this is truly a dumb question... but let's say we build a model that attempts to predict the outcome of a game by projecting goals scored by both teams. Team A is projected to score 5.3 goals, Team B is projected to score 4.9 goals -- what is the projected outcome of this game? Is it 5-5, which is what common sense math rounding would seem to indicate?

Or does Team A beat Team B 5-4 because you can't score partial goals and you're only projecting a goal event occurs when your model reaches an integer value?
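
One common way to handle this is to treat each projection as the mean of a probability distribution instead of a predicted scoreline. The sketch below assumes each team's goals follow an independent Poisson distribution; that is an illustrative assumption, not something proposed in this thread, and the function names and the `max_goals` cutoff are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def outcome_probabilities(mean_a, mean_b, max_goals=40):
    """Return (P(A wins), P(tie), P(B wins)) for independent Poisson scores.

    Scores above max_goals are ignored, which is negligible for small means.
    """
    pmf_a = [poisson_pmf(k, mean_a) for k in range(max_goals + 1)]
    pmf_b = [poisson_pmf(k, mean_b) for k in range(max_goals + 1)]
    a_wins = tie = b_wins = 0.0
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            if i > j:
                a_wins += pa * pb
            elif i == j:
                tie += pa * pb
            else:
                b_wins += pa * pb
    return a_wins, tie, b_wins


a, t, b = outcome_probabilities(5.3, 4.9)
print(f"A wins {a:.3f}, tie {t:.3f}, B wins {b:.3f}")
```

Under that assumption the answer is neither 5-5 nor 5-4: the model gives a probability for every result, and Team A is simply the more likely winner.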
06-07-2019 , 04:03 PM
Quote:
Originally Posted by Monster_Zero
This may be something I need to grapple with further. I agree that delving into this is likely to help me improve some skills that are useful in real life -- knowledge of stats, programming ability, data analysis skills -- but I am also not sure I'm mentally equipped to undertake this as a hobby if I'm not sure I'll be able to generate an ROI doing it.
Well, you can do your best to improve your chances of profiting by betting into overnights, using reduced juice, line shopping, etc.
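
To see why reduced juice matters, it helps to compute the break-even win rate at a given price. This is a minimal sketch; the two prices, -110 and -105, are illustrative assumptions and the function name is hypothetical.

```python
def implied_probability(american_odds):
    """Break-even win probability for a bet at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


for odds in (-110, -105):
    print(odds, round(implied_probability(odds), 4))
# -110 0.5238
# -105 0.5122
```

Moving from -110 to -105 lowers the win rate needed to break even by a bit more than one percentage point, which is a large share of the edge a hobbyist model could realistically hope for.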

Quote:
Originally Posted by Monster_Zero
Perhaps this is a dumb question... but beyond projecting an EV for runs/points/goals/whatever in a given match, won't we also need to know the distribution of those goals/points/runs/etc. per game?

If Team A gets shut out 9/10 games but in the games they do score in, they always score 100 goals, they're averaging 10 goals per game but still losing 9/10 games to Team B, who averages 1 goal per game but scores that 1 goal every single game without fail.
It can happen the way you describe, and good for you for thinking that way, but the central limit theorem should make most stuff end up normally distributed. As long as you understand the assumptions behind the CLT and you check your data distributions, I'd say the burden is on the other side to show unique cases where the CLT doesn't hold.

Quote:
Originally Posted by Monster_Zero
And this is truly a dumb question... but let's say we build a model that attempts to predict the outcome of a game by projecting goals scored by both teams. Team A is projected to score 5.3 goals, Team B is projected to score 4.9 goals -- what is the projected outcome of this game? Is it 5-5, which is what common sense math rounding would seem to indicate?

Or does Team A beat Team B 5-4 because you can't score partial goals and you're only projecting a goal event occurs when your model reaches an integer value?
Pythagorean Expectation should provide a reasonable estimate for converting between win% and two teams' expected points scored. You'll either need to do your own research into the right exponent to use for the specific sport or search Google for what others have found. It's a pretty well-explored area of research. It has its drawbacks over a single-game sample size, but it's just an estimate, as is your estimate of projected points scored.
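
A minimal sketch of the Pythagorean formula, win% = PF^x / (PF^x + PA^x), applied to the 5.3 vs 4.9 projection from the question. The exponent of 2 is only a placeholder (it is the value from the original baseball formula); as noted above, the right exponent depends on the sport. The function name is hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.0):
    """Estimated win probability from expected points for and against.

    exponent=2.0 is a placeholder; it should be fitted for the sport.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


print(round(pythagorean_win_pct(5.3, 4.9), 3))  # 0.539
```

A larger exponent pushes the estimate further from 50% for the same scoring gap, so the choice of exponent is doing real work here.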

Or you can avoid this issue altogether by creating a game simulator. There are lots of approaches which is why it's such a fun, creative activity.
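
As a rough idea of what a very small game simulator could look like, the sketch below plays the same matchup many times and counts results. It assumes each team's score is an independent Poisson draw around its projected mean, which is a simplifying assumption chosen for illustration; a real simulator would model the sport in far more detail. All names and parameters are hypothetical.

```python
import math
import random


def simulate_score(mean, rng):
    """Draw one Poisson-distributed score using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_matchup(mean_a, mean_b, n_games=100_000, seed=0):
    """Return (share of games A wins, share of ties) over n_games simulations."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    a_wins = ties = 0
    for _ in range(n_games):
        score_a = simulate_score(mean_a, rng)
        score_b = simulate_score(mean_b, rng)
        if score_a > score_b:
            a_wins += 1
        elif score_a == score_b:
            ties += 1
    return a_wins / n_games, ties / n_games


win_a, tie = simulate_matchup(5.3, 4.9)
print(f"A wins {win_a:.3f}, tie {tie:.3f}, B wins {1 - win_a - tie:.3f}")
```

The appeal of simulating is that the scoring step can be swapped for anything, such as overtime rules or a score-dependent strategy, without needing a closed-form formula.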
06-07-2019 , 04:27 PM
spanky says the feeling of beating the closing line is better than sex. do you put the same effort into that as you do getting laid?
06-07-2019 , 04:54 PM
great questions buddy. what sport are we going to start with?
06-07-2019 , 11:12 PM
I think you have a huge advantage a lot of us didn't have. There's tons of "data science" courses available whether on YouTube or paying $10 for one on Udemy and textbooks about statistical theory easily available. There's also tons of libraries already developed in popular programming languages to easily implement ML techniques for you.

When I started to incorporate machine learning concepts into my model back in 2005 I had to go to UNLV and illegally log into their Wifi so I could download academic articles about the topic instead of paying $37 or some ridiculous amount for each article from the journal. Then I had to go and program all the stuff in Python myself. Now there are hundreds of textbooks that condense all that info into easy-to-read formats. The field has hit a wall too in terms of progress (though some of the new topological data analysis techniques look promising in theory, no one has been able to find a practical use for them).

I was talking to some girl I met a couple days ago and she's doing a masters in statistics while working as a stat consultant for some horse racing syndicate (lol) and all she does is just use the scikit-learn library in Python. Pluses and minuses to it. When talking to her, it was obvious she didn't understand the theory behind the processes she was using, but she just knew "do machine learning!" and these types of libraries make it trivial to do that without understanding.

I think that's apparent in the SALAD model too. All those methods described are different methods for different types of data. It's pretty unlikely that one dataset fits all those models, and combining them to make some mega model is bad from a statistical point of view. But the person probably read about data science methods, imported the data, and hit the machine learning button without understanding if clicking the button even makes sense.

My suggestion is you don't need a PhD level of statistical understanding, but I would first focus on classes around the idea of interpretation of data. The Elements of Statistical Learning is a classic book (that you could probably find for free). Try and understand that and then get a data science bootcamp type course from Udemy or Coursera or a similar place. Without the foundation it will just be trial and error and you won't develop a deep understanding of your model and data.

Last edited by fkjlhfdkjhkj.; 06-07-2019 at 11:18 PM.
06-08-2019 , 12:10 AM
sounds good man i'm in. we'll crowdsource our education together. i am not a machine learning guy so we can learn it together. let's pick out a series of classes on udemy and we can do them together, create a study group, and get to work. who's with me?
06-08-2019 , 10:25 PM
I'm game
06-09-2019 , 01:44 PM
Quote:
Originally Posted by TomG
spanky says the feeling of beating the closing line is better than sex. do you put the same effort into that as you do getting laid?
Spanky is right
06-09-2019 , 01:47 PM
Do people really look at the closing line anymore? Or is it just a way to say, "I lost all my money, but I beat the closing line, so I just got screwed"?

I haven't considered the closing line for years in determining if my bet was good or not.
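
For anyone new to the term, "beating the closing line" is usually measured by comparing the price you bet at with the price available just before the event starts. The sketch below uses one simple convention, the ratio of decimal odds; this is an assumption for illustration, and some bettors compare against a no-vig closing price instead. The odds and function names are hypothetical.

```python
def decimal_from_american(american_odds):
    """Convert American odds to decimal odds (total return per unit staked)."""
    if american_odds < 0:
        return 1 + 100 / -american_odds
    return 1 + american_odds / 100


def closing_line_value(bet_odds, closing_odds):
    """Relative price improvement over the close; positive means the close was beaten."""
    return decimal_from_american(bet_odds) / decimal_from_american(closing_odds) - 1


# Bet placed at -105, line closed at -120 on the same side.
print(round(closing_line_value(-105, -120), 4))  # 0.0649
```

Whether a positive number here still says much about a bet's quality in low-limit markets is exactly what the following posts argue about.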
06-09-2019 , 02:03 PM
Has the closing line stopped being sharp at pinny and/or matchbook? Was it never sharp?
06-09-2019 , 08:35 PM
The closing line in anything below 5k is subject to constant, evident line manipulation on stuff that would surprise you, e.g. La Liga totals.

Tbh the "beat the closing line" mantra started by justin7 was the no. 1 principle, and it's most likely still a good sign in any major American sport, but for medium-tier stuff like small ATP tourneys, over/under totals, etc., it is becoming less and less meaningful as Pinnacle just increases the vig and has McDonald's limits everywhere.
06-09-2019 , 11:22 PM
https://micromasters.mit.edu/ds/

It is not Udemy, but MIT offers a free auditable mini-masters in data science through a similar open courseware site called edX.

I think this is what I am going to work through - I like the structure of the program and the topics seem very relevant to what I hope to accomplish through building models. I particularly like that the program seems to focus on using Python instead of R for the data analysis (Python is more applicable to my day job than R would be) and that it provides an introduction to machine learning. I think this will make for a good intro to the topic before diving into Elements of Statistical Learning, which I understand to be an amazing resource but also probably a bit too heady for me to make sense of at this point.

A study group for this would be great, if anyone is interested like Tom or Hammer.
06-10-2019 , 07:38 AM
Hey guys, looks like a lot of you are interested in maths. The GRE exam is the best for you all. Give it a try. I am sure it will go good.
06-10-2019 , 10:19 AM
Now here's the interesting thing

is v123 a machine learning bot that searches for things to do with math to promote his scam site

or is there a real person behind it

it would be more efficient to be a bot since the probability anyone falls for it is so low it wouldn't be worth a human person's time to do it
06-10-2019 , 11:20 AM
I would love to hear what some of you guys think about the closing line stuff some more. I guess I fell into that justin7 stuff that beating the closing line in straight markets is all that matters. Looking back on my results, it certainly worked for me up until 2016. From 2017 on I've "run bad" in straights for so long and for so much that now I don't even really bother looking at straights, unless something is ridiculous.

It's something I've thought about a lot though. Did I just run good for a while? Or have I really "run bad" for like 2.5 years? I've always treated Pinnacle as Scripture, but that doesn't seem to be the case anymore. Was their move from the US to blame? I thought it was still pretty easy to bet on pinny from the US though.

I always did sort of question it though. I mean if these markets are so efficient, especially for like playoff NBA or NFL games, how come there's so much movement right up until game time? And not always back and forth like two guys going to war... I'd sometimes see big swings in one direction minutes before kickoff without any major news. So the "true line" really changed by like 10% or more from 30 seconds ago? It didn't always make sense to me but all the smart guys just said to beat the closing line and like I said, it seemed to have worked for a while.

Does anyone else have a similar experience?
06-10-2019 , 05:29 PM
There's not much to add to what i said above.

The closing line is still the best way to gauge the perceived EV on any given wager in major markets with relevant limits. I don't think it's possible to manipulate Asia and CRIS on major stuff and get away with it (no sharp action correcting the move) combined with having enough soft outs to make the blow/manipulation work (idk the American betting landscape, but this part can certainly be easy to get around with locals, the new state books, etc.). Then again i might be wrong, lol. this is a funny example: Billy Walters in the 60 Minutes interview is clearly giving wrong instructions to his runner on a major market.

However, it is evident that as Pinnacle (and CRIS) liquidity decreases and new soft outs emerge, manipulating markets is just too tempting.

In stuff with less than 5k per click i would be very wary if i was still line grinding, and i would constantly monitor for signs of manipulation: suspicious back-and-forth moves separated by a short time frame.

Any stuff below 2k per click i wouldn't touch unless there's a ridiculous edge buffer.

In conclusion, the American exodus back in the day, combined with the pussyfication of Pinnacle manifested in tiny limits and higher vig, made the market less interesting for a good originator. Some went away, and some stayed, mitigating the lower limits/turnover with more manipulation/edge size.

Last edited by MisterRodriguez; 06-10-2019 at 05:45 PM.
06-12-2019 , 05:26 PM
Models represent a simplified version of reality. I think you're delusional if you think you can build a more predictive model than the models that the books have, who have hired people with a better mathematical background than you, who have better access to data than you do, and who do this as a full time occupation.

The way to make money betting sports is to understand a particular sport at a deep and nuanced level and to do your homework. You're not going to beat Vegas by building a better mousetrap if you're asking these kinds of questions.

That having been said, since you're not going to listen to me anyways, I would recommend reading "Trading Bases" by Joe Peta. That'll get you off on the right foot regardless of which sport you choose to pursue.
06-12-2019 , 07:23 PM
Quote:
Originally Posted by Malachii
Models represent a simplified version of reality. I think you're delusional if you think you can build a more predictive model than the models that the books have, who have hired people with a better mathematical background than you, who have better access to data than you do, and who do this as a full time occupation.

.
You must have never met a linesmaker
06-13-2019 , 01:48 AM
Quote:
Originally Posted by Sabaneta
You must have never met a linesmaker
Huh? No, I haven't met a linesmaker. How would that possibly invalidate the underlying thesis here? Do you honestly think they're not hiring people with sophisticated quant backgrounds to help set lines (or hiring outside agencies that employ these people)? If they weren't, they'd be hemorrhaging money out of every orifice from their sportsbooks.

I do happen to know quite a few people with very sophisticated backgrounds in math and computer science / machine learning. Believe me when I say that an average Joe f*cking around with Excel or R will have zero edge here.