Math/tech knowledge necessary to build a worthwhile predictive model - Sports Betting

TL;DR
Hi all -- I am very new to, but very interested in, the art of betting sports and would like to build a basic predictive model as part of my betting process. However, I'm wondering if it's even possible to build a basic model with any value given my basic math knowledge.

I am an accountant by trade. The highest math courses I've ever taken are business statistics and business calculus in college, and that was 15 years ago at this point (damn I got old quick).

I was, perhaps overly optimistically, hoping to develop a simple betting model using just a basic knowledge of statistics. And I have no delusions about ever developing something as complex as Emmanuel Perry's SALAD model which he uses to predict the outcome of NHL games -- but when I read his blog post describing the model, I was intimidated:

Quote:

Salad is an ensemble of 11 unique sub-models: 4 bagged logistic regression models, 1 XGBoost gradient-boosted trees model, 2 neural networks, 1 bagged naive Bayes model, 2 CatBoost gradient-boosted trees models and 1 random forest using fuzzy logic. Different feature sets were used among these sub-models in an effort to further increase diversity within the ensemble. A bagged logistic regression model was used for a stacking algorithm. My approach in building these ensemble components was loosely to optimize the mean score of the various models while minimizing collinearity. In plain terms, each of the sub-models should perform well on its own and no two sub-models should be overly alike in their output.

Can I build a model that will be useful with just a basic understanding of stats, or do you need an applied statistics M.S. to really make something you can rely on at all? I'm not looking to do this professionally, just as a challenging hobby, but given the jargon above I'm apprehensive about whether building a model worth anything is even possible.

Quote

06-04-2019 , 03:09 PM

fkjlhfdkjhkj.

Carpal \'Tunnel

Join Date: Sep 2013 Posts: 11,270

You can do anything you put your mind to, just need to want it enough.

Look deep into yourself and determine if this is what you really want from your life right now. If it is, get some books and get learning.

Quote

06-04-2019 , 03:22 PM

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

accountants have a good history of success in dfs and many analytic endeavours so i'd guess they'd also be well suited to sports betting.

i feel like ensembling is what people do when they want to throw a ton of stuff against the wall and see what sticks. it does seem to work well for a lot of tasks, particularly kaggle competitions, but i'm not a machine learning guy so i can't comment on its effectiveness with regard to sports betting.

at its core all you're trying to do for game modeling is project how many runs/points a team is going to score in a given matchup. not too hard right? now that we have our goal in mind, let's find things that are predictive of it.

welcome to the arena.

Quote

06-04-2019 , 04:55 PM

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

let me ask you this.

some people want encouragement and to be told they can do it.

other people want to be told no they can't do it because that is what motivates them.

what kind of person are you?

Quote

06-04-2019 , 05:08 PM

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

to answer your question i'd say basic probability knowledge and an understanding of statistics are important if you're building your own stuff. it will help ground your assumptions in stuff like independence, correlations, hypothesis testing, etc. you can learn it through khan academy. you probably won't win because the stuff you build doesn't just have to be better than joe schmoe but better than the current market at the time you bet it which can distill some pretty advanced methods. but you should be doing it because it's a fun hobby and a great excuse to tinker with sports data and analysis, not to make money at it. i say pick the smallest limit, niche sport you can find and have fun with it.

Quote

06-07-2019 , 03:14 PM

Monster_Zero

newbie

Join Date: Nov 2017 Posts: 27

Thanks, guys, for the thoughtful responses.

Quote:

Originally Posted by TomG

you probably won't win because the stuff you build doesn't just have to be better than joe schmoe but better than the current market at the time you bet it which can distill some pretty advanced methods. but you should be doing it because it's a fun hobby and a great excuse to tinker with sports data and analysis, not to make money at it.

This may be something I need to grapple with further. I agree that delving into this is likely to help me improve some skills that are useful in real life -- knowledge of stats, programming ability, data analysis skills -- but I am also not sure I'm mentally equipped to undertake this as a hobby if I'm not sure I'll be able to generate an ROI doing it.

Quote

06-07-2019 , 03:26 PM

Monster_Zero

newbie

Join Date: Nov 2017 Posts: 27

Quote:

Originally Posted by TomG

at its core all you're trying to do for game modeling is project how many runs/points a team is going to score in a given matchup. not too hard right? now that we have our goal in mind, let's find things that are predictive of it.

Perhaps this is a dumb question....but beyond projecting an EV for runs/points/goals/whatever in a given match, won't we also need to know the distribution of those goals/points/runs/etc. per game?

If Team A gets shut out 9/10 games but in the games they do score in, they always score 100 goals, they're averaging 10 goals per game but still losing 9/10 games to Team B, who averages 1 goal per game but scores that 1 goal every single game without fail.

And this is truly a dumb question....but let's say we build a model that attempts to predict the outcome of a game by projecting goals scored by both teams. Team A is projected to score 5.3 goals, Team B is projected to score 4.9 goals -- what is the projected outcome of this game? Is it 5-5, which is what common sense math rounding would seem to indicate?

Or does Team A beat Team B 5-4 because you can't score partial goals and you're only projecting a goal event occurs when your model reaches an integer value?

Quote

06-07-2019 , 04:03 PM

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

Quote:

Originally Posted by Monster_Zero

Well you can do your best to improve your chances of profiting by betting into overnights, reduced juice, lines shopping, etc.

Quote:

Originally Posted by Monster_Zero

It can happen the way you describe, and good for you for thinking that way, but the central limit theorem should make most stuff end up normally distributed. Just as long as you understand the assumptions behind the CLT and you check your data distributions, then I'd say the burden is on the other side to show unique cases where the CLT doesn't hold.
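A minimal sketch of the "check your data distributions" step, using the hypothetical Team A / Team B numbers from the quoted question. The score lists are made up to match that example and are not real data. It shows a team with the higher average still winning only 1 matchup in 10, which is the kind of case worth looking for before assuming a normal shape.

```python
from statistics import mean, pstdev

# Hypothetical scores from the example above: Team A is shut out in 9 of 10
# games and scores 100 in the other; Team B scores exactly 1 in every game.
team_a = [0] * 9 + [100]
team_b = [1] * 10

mean_a, sd_a = mean(team_a), pstdev(team_a)
mean_b, sd_b = mean(team_b), pstdev(team_b)

# Share of matchups Team A wins when every A score is paired with every B score.
pairs = [(a, b) for a in team_a for b in team_b]
a_win_rate = sum(a > b for a, b in pairs) / len(pairs)

print(f"Team A: mean={mean_a}, sd={sd_a:.1f}")
print(f"Team B: mean={mean_b}, sd={sd_b:.1f}")
print(f"Team A win rate: {a_win_rate:.0%}")
```

The large standard deviation next to a small mean is the warning sign here: the average alone says nothing about how often Team A actually outscores Team B.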

Quote:

Originally Posted by Monster_Zero

And this is truly a dumb question....but let's say we build a model that attempts to predict the outcome of a game by projecting goals scored by both teams. Team A is projected to score 5.3 goals, Team B is projected to score 4.9 goals -- what is the projected outcome of this game? Is it 5-5, which is what common sense math rounding would seem to indicate?

Or does Team A beat Team B 5-4 because you can't score partial goals and you're only projecting a goal event occurs when your model reaches an integer value?

Pythagorean Expectation should provide a reasonable estimate for converting between win% and two teams' expected points scored. You'll either need to do your own research into the right exponent to use for the specific sport or search Google for exponents others have found. It's a pretty well-explored area of research. It has its drawbacks over a single-game sample size, but it's just an estimate, as is your estimate of projected points scored.
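For reference, the usual form of Pythagorean expectation is win% = PF^x / (PF^x + PA^x), where PF is points for, PA is points against, and x is the sport-specific exponent. The sketch below applies it to the 5.3 vs 4.9 projection from the question. The exponent of 2.0 is only a placeholder assumption and would need to be fitted or looked up for the sport in question; the function name is made up for illustration.

```python
def pythagorean_win_prob(points_for: float, points_against: float,
                         exponent: float = 2.0) -> float:
    """Estimate win probability from expected scoring.

    The default exponent of 2.0 is a placeholder assumption, not a
    recommended value for any particular sport.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# Projection from the question: Team A 5.3 goals, Team B 4.9 goals.
p_team_a = pythagorean_win_prob(5.3, 4.9)
p_team_b = pythagorean_win_prob(4.9, 5.3)

print(f"Team A win estimate: {p_team_a:.3f}")
print(f"Team B win estimate: {p_team_b:.3f}")
```

With these inputs the estimate for Team A comes out at roughly 54%, so the projection is read as a probability rather than as a rounded final score like 5-5 or 5-4.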

Or you can avoid this issue altogether by creating a game simulator. There are lots of approaches which is why it's such a fun, creative activity.
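One very simple version of a game simulator, as a sketch only. It assumes each team's goals are independent Poisson draws around the projected means (5.3 and 4.9 from the question). Both the independence and the Poisson shape are assumptions made for illustration, not something established in this thread, and the function names are made up.

```python
import math
import random

def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def simulate_matchup(mean_a: float, mean_b: float,
                     n_games: int = 20_000, seed: int = 1):
    """Return the simulated shares of (A wins, tie after regulation, B wins)."""
    rng = random.Random(seed)
    a_wins = ties = b_wins = 0
    for _ in range(n_games):
        goals_a = poisson_sample(rng, mean_a)
        goals_b = poisson_sample(rng, mean_b)
        if goals_a > goals_b:
            a_wins += 1
        elif goals_a < goals_b:
            b_wins += 1
        else:
            ties += 1
    return a_wins / n_games, ties / n_games, b_wins / n_games

p_a, p_tie, p_b = simulate_matchup(5.3, 4.9)
print(f"A wins: {p_a:.3f}  tie: {p_tie:.3f}  B wins: {p_b:.3f}")
```

A simulator like this sidesteps the rounding question because every simulated game produces whole-number scores, and the output is the share of games each side wins. How ties are resolved (overtime, shootout) would need its own rule for a real sport.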

Quote

06-07-2019 , 04:27 PM

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

spanky says the feeling of beating the closing line is better than sex. do you put the same effort into that as you do getting laid?

Quote

06-07-2019 , 04:54 PM

#10

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

great questions buddy. what sport are we going to start with?

Quote

06-07-2019 , 11:12 PM

#11

fkjlhfdkjhkj.

Carpal \'Tunnel

Join Date: Sep 2013 Posts: 11,270

I think you have a huge advantage a lot of us didn't have. There's tons of "data science" courses available whether on YouTube or paying $10 for one on Udemy and textbooks about statistical theory easily available. There's also tons of libraries already developed in popular programming languages to easily implement ML techniques for you.

When I started to incorporate machine learning concepts into my model back in 2005 I had to go to UNLV and illegally log into their Wifi so I could download academic articles about the topic instead of paying $37 or some ridiculous amount for each article from the journal. Then I had to go and program all the stuff in Python myself. Now there are hundreds of textbooks that condense all that info into easy-to-read formats. The field has hit a wall too in terms of progress (though some of the new topological data analysis techniques look promising in theory, no one has been able to find a practical use for them).

I was talking to some girl I met a couple days ago and she's doing a masters in statistics while working as a stat consultant for some horse racing syndicate (lol) and all she does is just use the scikit-learn library in Python. Pluses and minuses to it. When talking to her, it's obvious she didn't understand the theory behind the processes she was using, but she just knew "do machine learning!" and these types of libraries make it trivial to do that without understanding.

I think that's apparent in the SALAD model too. All those methods described are different methods for different types of data. It's pretty unlikely that one dataset fits all those models, and then combining them to make some mega model is so bad from a statistical point of view. But the person probably read about data science methods, imported the data, and hit the machine learning button without understanding if clicking the button even makes sense.

My suggestion is that you don't need a PhD level of statistical understanding, but I would first focus on classes around the idea of interpretation of data. The Elements of Statistical Learning is a classic book (that you could probably find for free). Try to understand that and then get a data science bootcamp type course from Udemy or Coursera or a similar place. Without the foundation it will just be trial and error and you won't develop a deep understanding of your model and data.

Last edited by fkjlhfdkjhkj.; 06-07-2019 at 11:18 PM.

Quote

06-08-2019 , 12:10 AM

#12

TomG

Pooh-Bah

Join Date: Jun 2004 Posts: 4,294

sounds good man i'm in. we'll crowdsource our education together. i am not a machine learning guy so we can learn it together. let's pick out a series of classes on udemy and we can do them together, create a study group, and get to work. who's with me?

Quote

06-08-2019 , 10:25 PM

#13

Hammerzitzen

journeyman

Join Date: Dec 2011 Posts: 233

I'm game

Quote

06-09-2019 , 01:44 PM

#14

MisterRodriguez

journeyman

Join Date: May 2017 Posts: 301

Quote:

Originally Posted by TomG

spanky says the feeling of beating the closing line is better than sex. do you put the same effort into that as you do getting laid?

Spanky is right

Quote

06-09-2019 , 01:47 PM

#15

fkjlhfdkjhkj.

Carpal \'Tunnel

Join Date: Sep 2013 Posts: 11,270

Do people really look at the closing line anymore? Or is it just a way to say "I lost all my money, but I beat the closing line, so I just got screwed"?

I haven't considered the closing line for years in determining if my bet was good or not.

Quote

06-09-2019 , 02:03 PM

#16

heehaww

Pooh-Bah

Join Date: Aug 2011 Posts: 5,081

Has the closing line stopped being sharp at pinny and/or matchbook? Was it never sharp?

Quote

06-09-2019 , 08:35 PM

#17

MisterRodriguez

journeyman

Join Date: May 2017 Posts: 301

The closing line in anything below 5k is subject to constant, evident line manipulation on stuff that would surprise you, e.g. La Liga totals.

Tbh the beat-the-closing-line mantra started by justin7 was the number-one principle, and it's most likely still a sign in any major American sport, but for medium-tier stuff like small ATP tourneys, over/under totals, etc. it is becoming less and less meaningful as Pinnacle just increases the vig and has McDonald's limits everywhere.

Quote

06-09-2019 , 11:22 PM

#18

Monster_Zero

newbie

Join Date: Nov 2017 Posts: 27

https://micromasters.mit.edu/ds/

It is not Udemy, but MIT offers a free, auditable mini-masters course in data science through a similar open courseware site called edX.

I think this is what I am going to work through - I like the structure of the program and the topics seem very relevant to what I hope to accomplish through building models. I particularly like that the program seems to focus on using Python instead of R for the data analysis (Python is more applicable to my day job than R would be) and that it provides an introduction to machine learning. I think this will make for a good intro to the topic before diving into Elements of Statistical Learning, which I understand to be an amazing resource but also probably a bit too heady for me to make sense of at this point.

A study group for this would be great, if anyone is interested like Tom or Hammer.

Quote

06-10-2019 , 07:38 AM

#19

v123

stranger

Join Date: Jun 2019 Posts: 1

Hey guys, looks like a lot of you are interested in maths. The GRE exam is the best for you all. Give it a try. I am sure it will go good.

Quote

06-10-2019 , 10:19 AM

#20

fkjlhfdkjhkj.

Carpal \'Tunnel

Join Date: Sep 2013 Posts: 11,270

Now here's the interesting thing

is v123 a machine learning bot that searches for things to do with math to promote his scam site

or is there a real person behind it

it would be more efficient to be a bot since the probability anyone falls for it is so low it wouldn't be worth a human person's time to do it

Quote

06-10-2019 , 11:20 AM

#21

banned

Join Date: Mar 2017 Posts: 1,307

I would love to hear what some of you guys think about the closing line stuff some more. I guess I fell into that justin7 stuff that beating the closing line in straight markets is all that matters. Looking back on my results, it certainly worked for me up until 2016. From 2017 on I've "run bad" in straights for so long and for so much that now I don't even really bother looking at straights, unless something is ridiculous.

It's something I've thought about a lot though. Did I just run good for a while? Or have I really "run bad" for like 2.5 years? I've always treated Pinnacle as Scripture, but that doesn't seem to be the case anymore. Was their move from the US to blame? I thought it was still pretty easy to bet on pinny from the US though.

I always did sort of question it though. I mean if these markets are so efficient, especially for like playoff NBA or NFL games, how come there's so much movement right up until game time? And not always back and forth like two guys going to war...I'd sometimes see big swings in one direction minutes before kickoff without any major news. So the "true line" really changed by like 10% or more from 30 seconds ago? It didn't always make sense to me but all the smart guys just said to beat the closing line and like I said, it seemed to have worked for a while.

Does anyone else have a similar experience?

Quote

06-10-2019 , 05:29 PM

#22

MisterRodriguez

journeyman

Join Date: May 2017 Posts: 301

There's not much to add to what I said above.

The closing line is still the best way to gauge the perceived EV on any given wager in major markets with relevant limits. I don't think it's possible to manipulate Asia and CRIS on major stuff and get away with it (no sharp action correcting the move) while also having enough soft outs to make the blow/manipulation work (I don't know the American betting landscape, but this part can certainly be easy to circumvent with locals, the new state books, etc.). Then again I might be wrong, lol. A funny example: Billy Walters in the 60 Minutes interview is clearly giving wrong instructions to his runner on a major market.

However, it's evident that as Pinnacle (and CRIS) liquidity decreases and new soft outs emerge, manipulating markets is just too tempting.

In stuff with less than 5k per click I would be very wary if I was still line grinding, and I would constantly monitor for manipulation signs: suspect back-and-forth moves separated by a short time frame.

Anything below 2k per click I wouldn't touch unless there's a ridiculous edge buffer.

In conclusion, the American exodus back in the day, combined with the pussyfication of Pinnacle manifested in tiny limits and higher vig, made the market less interesting for a good originator. Some went away, and some stayed, mitigating the lower limits/turnover with more manipulation/edge size.
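The idea near the top of this post, gauging a wager's perceived EV against the closing line, can be sketched roughly as follows. This assumes decimal odds, a two-way market, and a simple proportional method for stripping the vig out of the closing prices. That method is just one common choice, and the function names and example odds are made up for illustration.

```python
def no_vig_probs(odds_side: float, odds_other: float) -> tuple[float, float]:
    """Convert two-way decimal odds to probabilities that sum to 1.

    Uses proportional normalization, which is an assumption here; other
    vig-removal methods exist and give slightly different numbers.
    """
    raw_side, raw_other = 1.0 / odds_side, 1.0 / odds_other
    total = raw_side + raw_other  # exceeds 1.0 by the book's margin
    return raw_side / total, raw_other / total

def ev_vs_close(bet_odds: float, close_side: float, close_other: float) -> float:
    """Perceived EV per unit staked, treating the no-vig close as the true price."""
    fair_prob, _ = no_vig_probs(close_side, close_other)
    return bet_odds * fair_prob - 1.0

# Hypothetical example: bet placed at 2.10, market closes at 1.95 on both sides.
edge = ev_vs_close(2.10, 1.95, 1.95)
print(f"Perceived edge vs close: {edge:.1%}")
```

This only measures the bet against the closing price. If the close itself has been pushed around in a low-limit market, as described above, the number is only as trustworthy as that close.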

Last edited by MisterRodriguez; 06-10-2019 at 05:45 PM.

Quote

06-12-2019 , 05:26 PM

#23

Malachii

Pooh-Bah

Join Date: Feb 2005 Posts: 4,072

Models represent a simplified version of reality. I think you're delusional if you think you can build a more predictive model than the models that the books have, who have hired people with a better mathematical background than you, who have better access to data than you do, and who do this as a full time occupation.

The way to make money betting sports is to understand a particular sport at a deep and nuanced level and to do your homework. You're not going to beat Vegas by building a better mouse trap if you're asking these kinds of questions.

That having been said, since you're not going to listen to me anyways, I would recommend reading "Trading Bases" by Joe Peta. That'll get you off on the right foot regardless of which sport you choose to pursue.

Quote

06-12-2019 , 07:23 PM

#24

Sabaneta

adept

Join Date: May 2009 Posts: 1,175

Quote:

Originally Posted by Malachii

You must have never met a linesmaker

Quote

06-13-2019 , 01:48 AM

#25

Malachii

Pooh-Bah

Join Date: Feb 2005 Posts: 4,072

Quote:

Originally Posted by Sabaneta

You must have never met a linesmaker

Huh? No, I haven't met a linesmaker. How would that possibly invalidate the underlying thesis here? Do you honestly think they're not hiring people with sophisticated quant backgrounds to help set lines (or hiring outside agencies that employ these people)? If they weren't, they'd be hemorrhaging money out of every orifice from their sportsbooks.

I do happen to know quite a few people with very sophisticated backgrounds in math and computer science / machine learning. Believe me when I say that an average Joe f*cking around with Excel or R will have zero edge here.

Quote

Page 1 of 7
