Crowdsource Syndicate Sports Betting

03-16-2021 , 02:16 PM
Cool paper. I read something similar called LinNet, which the author used to evaluate NBA lineups against each other by building a similar type of graph.

Snooker seems like it has more player vs player interaction than I thought. Was thinking it would be more like bowling where it's just you vs the balls. But no, there's a lot of back and forth within a frame, unlike billiards. That could make the matchup graph approach you brought up a lot more useful than I thought going into this.
03-16-2021 , 04:38 PM
Quote:
Originally Posted by dawai
News sites are reporting today that, surprisingly, John Higgins is the best snooker player ever according to a new study at University of Limerick.

https://academic.oup.com/comnet/arti...DKibfLGO8.link

This might be a good read also for other 1 vs 1 sports.
The link contains a link to a Github with tons of data in csv.
Source of most data is cuetracker.net
This is apparently big news in the UK. Higgins is solid. He recently had an epic run destroying the field in the Players Championship and defeating Ronnie O'Sullivan 10-3 in the final.

Interesting that O'Sullivan has a 37-31 head-to-head advantage over Higgins, although over 68 matches a margin that small is not statistically significant on its own. The best odds I've seen for Judd Trump in the last few years were -115 against O'Sullivan in the 2019 Northern Ireland Open, so there is not much question Trump has been the dominant player for some time. And ranking Mark Williams over Trump is really a joke. So, I don't know how much stock to put into this ratings model.
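
A quick way to check whether a 37-31 record is distinguishable from a coin flip is an exact binomial test. This is a minimal sketch: the function name is illustrative, and it assumes every match is an independent 50/50 trial under the null, which ignores match length and form.

```python
from math import comb

def two_sided_binomial_p(wins: int, n: int) -> float:
    """Exact two-sided p-value for `wins` out of `n` against a fair coin."""
    # With p = 0.5 the distribution is symmetric, so double the upper tail.
    k = max(wins, n - wins)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

p_value = two_sided_binomial_p(37, 68)
print(f"p-value for a 37-31 record: {p_value:.3f}")
```

The p-value comes out far above the usual 0.05 threshold, so the head-to-head record alone says little about who is better.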
03-16-2021 , 05:43 PM
Quote:
Originally Posted by dawai
News sites are reporting today that, surprisingly, John Higgins is the best snooker player ever according to a new study at University of Limerick.

https://academic.oup.com/comnet/arti...DKibfLGO8.link

This might be a good read also for other 1 vs 1 sports.
The link contains a link to a Github with tons of data in csv.
Source of most data is cuetracker.net
Really interesting. Only issue is that their algorithm doesn't seem to take into account over 50 variables like the Magic8Ball one, so it clearly can be improved upon.
03-16-2021 , 06:58 PM
Quote:
Originally Posted by DicedPineapples
Snooker seems like it has more player vs player interaction than I thought. Was thinking it would be more like bowling where it's just you vs the balls. But no, there's a lot of back and forth within a frame, unlike billiards. That could make the matchup graph approach you brought up a lot more useful than I thought going into this.
I'd compare it to tennis without the extraneous physical effort. Also like chess with the forward-looking strategy. Huge amount of skill and strategy involved.
03-16-2021 , 11:59 PM
Quick and easy way to grab bookmaker lines for snooker

Code:
import datetime

import pandas as pd
import requests
from xml.etree.ElementTree import fromstring


def getBookmakerLines(url='http://lines.bookmaker.eu'):
    """Return a DataFrame of snooker moneylines from the Bookmaker XML feed."""
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    root = fromstring(r.content)
    leagues = root[0]  # the first child of the root holds the league nodes
    header = ['Date', 'Home Player', 'Visiting Player', 'Home ML', 'Visiting ML']
    data = []
    for league in leagues:
        if league.get('Description') != 'SNOOKER MATCHUPS':
            continue
        for game in league.findall('game'):
            line = game.find('line')
            if line is None:  # skip games that have no posted line yet
                continue
            # gmdt starts with the game date as YYYYMMDD
            gameDate = datetime.datetime.strptime(game.get('gmdt')[:8], '%Y%m%d').date()
            data.append([
                gameDate,
                game.get('htm').upper(),
                game.get('vtm').upper(),
                line.get('hoddst'),  # moneylines come back as strings
                line.get('voddst'),
            ])
    return pd.DataFrame(data, columns=header)
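
Once the lines are in a DataFrame, the usual next step is turning each pair of moneylines into probabilities. The sketch below is an illustration, not part of the original post: the function names are made up, and it removes the vig proportionally, which is only one of several possible methods. The feed returns moneylines as strings, so they would need `float(...)` first.

```python
def implied_prob(american_odds: float) -> float:
    """Raw implied probability of an American moneyline (vig still included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def no_vig_probs(home_ml: float, visiting_ml: float) -> tuple[float, float]:
    """Scale both implied probabilities so they sum to 1."""
    home, visiting = implied_prob(home_ml), implied_prob(visiting_ml)
    total = home + visiting  # greater than 1 by the bookmaker's margin
    return home / total, visiting / total

# Hypothetical market: home player -150, visiting player +130.
home_p, visiting_p = no_vig_probs(-150, 130)
print(f"home {home_p:.3f}, visiting {visiting_p:.3f}")
```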
03-17-2021 , 01:37 AM
Frames file is up on datasnooker. The match column will correspond to the id column in matches.csv.
Crowdsource Syndicate Sports Betting Quote
03-17-2021 , 03:44 AM
Quote:
Originally Posted by DicedPineapples
Frames file is up on datasnooker. The match column will correspond to the id column in matches.csv.
Thanks, I appreciate it. I'm now having my doubts that I'm going down the right path on this though. That paper seemed to be more in the right direction.
03-17-2021 , 08:49 AM
I finished scraping the snooker results and odds archives from oddsportal. I still need to fix a few things but after doing some preliminary cleaning, it looks like we have a sample size of over 29,000 matches going back 10 years.

I used the oddsportal closing lines (I created an account but couldn't find openers) to calculate accuracy and error metrics:
-The closing moneyline has a 66.29% classification accuracy, and a Brier Score of 0.2188.
-Elo model (with K=32) has a 65.16% classification accuracy, and a Brier Score of 0.2191.

That's close enough that I wouldn't be surprised if Elo can beat the opening lines (though I obviously can't verify that without some openers data). I haven't had a chance to read the papers posted yet. Did the Magic8Ball author mention any prediction accuracy metrics for their model? I should have more time on Friday to look closer at all that info, and will post a Google Drive link to the oddsportal CSV so everyone can have that data too.
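
For anyone following along, a bare-bones version of an Elo setup like the one described could look like this. Only K=32 comes from the post. The 1500 starting rating, the 400-point scale, and all function names are conventional choices assumed here, not necessarily what was actually run.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Probability that the player rated r_a beats the player rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both players' new ratings after one match."""
    delta = k * ((1.0 if a_won else 0.0) - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta

def run_elo(matches, k: float = 32.0, start: float = 1500.0):
    """matches: (winner, loser) pairs in chronological order.

    Returns the final ratings and the pre-match probability that had been
    assigned to each eventual winner, which is what the accuracy and
    Brier score metrics are computed from.
    """
    ratings, winner_probs = {}, []
    for winner, loser in matches:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        winner_probs.append(elo_expected(r_w, r_l))
        ratings[winner], ratings[loser] = elo_update(r_w, r_l, True, k)
    return ratings, winner_probs

# Toy history with made-up results, just to show the call.
ratings, winner_probs = run_elo([("A", "B"), ("A", "C"), ("B", "C")])
```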

Last edited by rabbitcoin; 03-17-2021 at 08:57 AM.
03-17-2021 , 09:53 PM
Quote:
Originally Posted by DefNotRsigley
ben simmons up to 2 3 pointers

can he make 2 more in the remaining 33 games? its gonna be close
Swishes the clutch 3 in OT. Up to 38% on the season.

Needs one more for the cash. EZ game, boys. Especially at +odds.
03-18-2021 , 04:51 PM
The snooker craze lasted a day, but I'm still plugging away. What do we think of the PageRank approach, but instead of just increasing the edge weight by 1 when a player loses to another player, we increase it by the number of frames they lost by? Or maybe by the average points per frame they lost by?

I'll give them both a shot, although I see some obvious problems with the second one.
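
The first variant (weighting each loser-to-winner edge by the frame margin) can be sketched with a plain power iteration. Everything here is an assumption for illustration: the function name, the 0.85 damping factor, the fixed iteration count, and the choice to spread the rank of undefeated players evenly.

```python
from collections import defaultdict

def weighted_pagerank(matches, damping=0.85, iterations=100):
    """matches: (winner, loser, winner_frames, loser_frames) tuples.

    Each match adds an edge loser -> winner weighted by the frame margin,
    so rank flows from players toward the opponents who beat them.
    """
    weights = defaultdict(float)  # (loser, winner) -> total frame margin
    players = set()
    for winner, loser, w_frames, l_frames in matches:
        weights[(loser, winner)] += w_frames - l_frames
        players.update((winner, loser))

    out_total = defaultdict(float)  # total outgoing weight per player
    for (loser, _), w in weights.items():
        out_total[loser] += w

    n = len(players)
    rank = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Undefeated players have no outgoing edges; share their rank evenly.
        dangling = sum(rank[p] for p in players if out_total[p] == 0)
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in players}
        for (loser, winner), w in weights.items():
            new_rank[winner] += damping * rank[loser] * w / out_total[loser]
        rank = new_rank
    return rank

# Made-up scores: A beats B 4-0, B beats C 4-3, A beats C 4-1.
ranks = weighted_pagerank([("A", "B", 4, 0), ("B", "C", 4, 3), ("A", "C", 4, 1)])
```

Swapping the margin for average points per frame only changes the line that accumulates `weights`.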
03-18-2021 , 11:59 PM
I’m deep in the lab too. An idea I had while pondering how M8B could possibly have 50 variables was to log or sqrt transform the scores to prevent blowout frames from dominating the average. Have not tried it yet.
03-19-2021 , 06:21 PM
Had some time this afternoon to finish cleaning up the snooker data I scraped and run some backtests. For anyone interested, here is the link to a Google Sheet of the oddsportal data:
https://docs.google.com/spreadsheets...it?usp=sharing

I made a 90/10 train-test split of that data and ran it through Elo and Bradley-Terry models. The out-of-sample wagering backtest results from the 10% test split were as follows.

Elo: wagers=1904, units risked=3586.61, p/l=+116.82 units, roi=3.26%
Bradley-Terry: wagers=1934, units risked=2147.46, p/l=+54.74 units, roi=2.55%

That's against closing lines, so betting into openers should produce better returns. Lines are already up for tomorrow's snooker matchups, but if I have time this evening I'll calculate and post the Elo projected fair lines so we can start to get an idea of what the true out-of-sample performance is. Also curious to see how Elo's performance compares to PageRank.
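
For readers who have not used Bradley-Terry before, here is a minimal sketch of fitting it from win/loss records with the standard iterative (minorize-maximize) update. The function names and iteration count are assumptions, and this is not the code behind the backtest numbers above.

```python
from collections import defaultdict

def fit_bradley_terry(matches, iterations=200):
    """matches: (winner, loser) pairs. Returns strengths that sum to 1."""
    wins = defaultdict(int)
    pair_games = defaultdict(int)  # frozenset({a, b}) -> matches played
    players = set()
    for winner, loser in matches:
        wins[winner] += 1
        pair_games[frozenset((winner, loser))] += 1
        players.update((winner, loser))

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = 0.0
            for pair, games in pair_games.items():
                if p in pair:
                    (opponent,) = pair - {p}
                    denom += games / (strength[p] + strength[opponent])
            updated[p] = wins[p] / denom
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}
    return strength

def bt_win_prob(strength, a, b):
    """Model probability that a beats b."""
    return strength[a] / (strength[a] + strength[b])

# Made-up records: A leads B 3-1, B leads C 2-1, A leads C 2-1.
toy = ([("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 2 + [("C", "B")]
       + [("A", "C")] * 2 + [("C", "A")])
strength = fit_bradley_terry(toy)
```

A player with zero wins ends up with strength 0 under this update, so a real fit would usually add some regularization.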
03-19-2021 , 07:10 PM
Sounds like we just use an Elo model and print? PageRank sucks but I also have no idea what I'm doing - I have a bunch of PageRank scores for players based off their last ~7 years of games but no clue what to do with them.
03-19-2021 , 11:47 PM
Here are the Elo-projected fair lines for tomorrow, along with any opportunities I found on BM. I didn't actually bet any of these as I still consider this project in the test phase, but let's paper trade these bets and see how it goes. I have to work all weekend so probably won't be able to follow up until next week, but hopefully this will keep the wheels rolling at least:

Code:
player 1    player 2    fair ml  bet name  bet odds
Bingham S.  Walden R.   -+103    walden    124
Trump J.    Dale D.     -+1218   trump     -404
Lines O.    Day R.      +-280    day       -225
Selby M.    Zhou Y.     -+156    n/a       n/a
Bingham S.  Dale D.     -+198    n/a       n/a
Trump J.    Day R.      -+741    trump     -297
Lines O.    Zhou Y.     +-293    zhou      -268
Selby M.    Walden R.   -+139    walden    163
Bingham S.  Day R.      -+121    n/a       n/a
Trump J.    Zhou Y.     -+708    trump     -225
Lines O.    Selby M.    +-456    selby     -437
Dale D.     Walden R.   +-192    dale      131
Bingham S.  Zhou Y.     -+115    n/a       n/a
Trump J.    Selby M.    -+454    trump     -151
Lines O.    Walden R.   +-328    walden    -225
Day R.      Dale D.     -+164    dale      132
Bingham S.  Selby M.    +-135    n/a       n/a
Trump J.    Lines O.    -+2073   trump     -581
Zhou Y.     Dale D.     -+172    n/a       n/a
Day R.      Walden R.   +-117    walden    -107
Bingham S.  Lines O.    -+337    bingham   -302
Trump J.    Walden R.   -+633    trump     -273
Selby M.    Dale D.     -+268    dale      237
Zhou Y.     Day R.      -+105    day       111
Bingham S.  Trump J.    +-614    trump     -200
Lines O.    Dale D.     +-170    dale      -156
Selby M.    Day R.      -+163    n/a       n/a
Zhou Y.     Walden R.   +-112    walden    112
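
The "fair ml" column appears to give both sides of a no-vig line at once: "-+103" reads as player 1 at -103 and player 2 at +103. Under that reading, a sketch of going from two Elo ratings to a fair moneyline pair looks like this. The function names and the 400-point Elo scale are assumptions.

```python
def prob_to_american(p: float) -> int:
    """Fair (no-vig) American moneyline for a win probability 0 < p < 1."""
    if p >= 0.5:
        return -round(100 * p / (1 - p))
    return round(100 * (1 - p) / p)

def elo_fair_lines(rating_1: float, rating_2: float) -> tuple[int, int]:
    """Fair moneylines for player 1 and player 2 from their Elo ratings."""
    p1 = 1.0 / (1.0 + 10 ** ((rating_2 - rating_1) / 400.0))
    return prob_to_american(p1), prob_to_american(1 - p1)

# Hypothetical ratings 100 points apart.
line_1, line_2 = elo_fair_lines(1600, 1500)
print(line_1, line_2)
```

A bet shows up in the table whenever the book's price on a side is better than the fair line for that side.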

Last edited by rabbitcoin; 03-20-2021 at 12:13 AM.
03-20-2021 , 12:12 AM
Quote:
Originally Posted by JSkelts
I have a bunch of PageRank scores for players based off their last ~7 years of games
If you have those per player, per matchup, along with which player won the matchup, we can try running them through a gradient boosting classifier, which seems to be the sexy machine learning algo that wins all the kaggle.com data science competitions. I haven't been able to make that algo perform well on any sports data I've worked with, but it's always worth a try. If nothing else, it's good to know which approaches don't work.
03-20-2021 , 02:15 AM
Here's some code to convert frame probabilities to match probabilities directly without simulation if anyone finds that useful.

https://pastebin.com/raw/gM1CFHig
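
For reference, the closed-form conversion can be sketched as below. This is not the linked pastebin code, just one standard way to do it, and it assumes every frame is independent with the same win probability. The function name is made up.

```python
from math import comb

def match_win_prob(p_frame: float, best_of: int) -> float:
    """Probability of winning a best-of-N match given a per-frame win probability."""
    if best_of % 2 == 0:
        raise ValueError("best_of must be odd")
    target = best_of // 2 + 1  # frames needed to win the match
    q = 1.0 - p_frame
    # Sum over the number of frames j the opponent wins before we reach
    # `target`; the final frame is always ours, hence comb(target - 1 + j, j).
    return sum(
        comb(target - 1 + j, j) * p_frame ** target * q ** j
        for j in range(target)
    )

# A 55% frame favorite in a best-of-19 match.
p_match = match_win_prob(0.55, 19)
print(f"{p_match:.3f}")
```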
03-20-2021 , 07:35 PM
Quote:
Originally Posted by rabbitcoin
Code:
player 1    player 2    fair ml  bet name  bet odds
Bingham S.  Walden R.   -+103    walden    124
Trump J.    Dale D.     -+1218   trump     -404
Lines O.    Day R.      +-280    day       -225
Selby M.    Zhou Y.     -+156    n/a       n/a
Bingham S.  Dale D.     -+198    n/a       n/a
Trump J.    Day R.      -+741    trump     -297
Lines O.    Zhou Y.     +-293    zhou      -268
Selby M.    Walden R.   -+139    walden    163
Bingham S.  Day R.      -+121    n/a       n/a
Trump J.    Zhou Y.     -+708    trump     -225
Lines O.    Selby M.    +-456    selby     -437
Dale D.     Walden R.   +-192    dale      131
Bingham S.  Zhou Y.     -+115    n/a       n/a
Trump J.    Selby M.    -+454    trump     -151
Lines O.    Walden R.   +-328    walden    -225
Day R.      Dale D.     -+164    dale      132
Bingham S.  Selby M.    +-135    n/a       n/a
Trump J.    Lines O.    -+2073   trump     -581
Zhou Y.     Dale D.     -+172    n/a       n/a
Day R.      Walden R.   +-117    walden    -107
Bingham S.  Lines O.    -+337    bingham   -302
Trump J.    Walden R.   -+633    trump     -273
Selby M.    Dale D.     -+268    dale      237
Zhou Y.     Day R.      -+105    day       111
Bingham S.  Trump J.    +-614    trump     -200
Lines O.    Dale D.     +-170    dale      -156
Selby M.    Day R.      -+163    n/a       n/a
Zhou Y.     Walden R.   +-112    walden    112

Snooker Elo paper trading results:
walden -1.0
trump -4.04
day +1.0
trump +1.0
zhou -2.68
walden +1.63
trump +1.0
selby +1.0
dale +1.31
trump +1.0
walden +1.0
dale -1.0
trump +1.0
walden -1.07
bingham +1.0
trump +1.0
dale -1.0
day -1.0
trump +1.0
dale +1.0
walden -1.0

risked 45.51 units
p/l +1.15 units
roi 2.53%
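
The totals can be reproduced from the bet odds and the win/loss results listed above. The staking rule is inferred from the numbers rather than stated: favorites risk |odds|/100 to win 1 unit, underdogs risk 1 unit. The function name is illustrative.

```python
def stake_and_profit(odds: int, won: bool) -> tuple[float, float]:
    """Return (units risked, profit or loss) for one bet at American odds."""
    if odds < 0:
        stake = -odds / 100  # favorite: risk enough to win 1 unit
        return stake, (1.0 if won else -stake)
    return 1.0, (odds / 100 if won else -1.0)  # underdog: risk 1 unit

# (bet odds, won?) for the 21 bets, in the order they appear in the results list.
bets = [
    (124, False), (-404, False), (-225, True), (-297, True), (-268, False),
    (163, True), (-225, True), (-437, True), (131, True), (-151, True),
    (-225, True), (132, False), (-581, True), (-107, False), (-302, True),
    (-273, True), (237, False), (111, False), (-200, True), (-156, True),
    (112, False),
]

settled = [stake_and_profit(odds, won) for odds, won in bets]
risked = sum(stake for stake, _ in settled)
pnl = sum(profit for _, profit in settled)
roi = 100 * pnl / risked
print(f"risked {risked:.2f} units, p/l {pnl:+.2f} units, roi {roi:.2f}%")
# risked 45.51 units, p/l +1.15 units, roi 2.53%
```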
03-20-2021 , 10:32 PM
Ok, here's where I'm at with snooker. First, I've gone through and created Pythagorean winrates based on each frame's points scored/points against. Since blowouts are common, I've taken the natural log of the points by frame and converted that into a frame expected winrate (exponent 3.8). Running that through log5 gets very close to Pinnacle lines, which makes me think I'm on the right track.
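
One possible reading of that pipeline, as a sketch only: the post does not say how the logged points are aggregated, so summing the per-frame logs is an assumption, as is using log1p (log of 1 + points) so that zero-point frames do not blow up. The exponent 3.8 is from the post. Function names are made up.

```python
import math

def pythagorean_frame_winrate(points_for, points_against, exponent=3.8):
    """Expected frame winrate from per-frame point totals for one player."""
    # Logging each frame's points keeps blowout frames from dominating.
    pf = sum(math.log1p(x) for x in points_for)
    pa = sum(math.log1p(x) for x in points_against)
    return pf ** exponent / (pf ** exponent + pa ** exponent)

def log5(p_a: float, p_b: float) -> float:
    """Probability that A beats B, given each one's winrate vs an average opponent."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

# Made-up frame scores for two players.
xw_a = pythagorean_frame_winrate([72, 65, 12, 90], [30, 41, 68, 5])
xw_b = pythagorean_frame_winrate([40, 55, 61, 20], [58, 49, 33, 70])
p_a_wins_frame = log5(xw_a, xw_b)
```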

My issue is that some guys have very few games played or only play in smaller tournaments but have decent frame xWinrates. My solution to that is to incorporate my PageRank rankings as another indicator of player strength, but I'm not entirely sure how to do that. Something like x * log5(PageRank ratio) + (1-x) * log5(Frame xWinrate ratio)? I'd really appreciate any suggestions as I think I'm super close.
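
A sketch of that blend, with the weight x picked by grid search on log loss over past matches. Two assumptions to flag: the "PageRank ratio" is read as each player's share of the pair's combined rating (which is what log5 reduces to if the ratings are treated as odds-like strengths), and the history format and function names are invented for the example.

```python
import math

def log5(p_a: float, p_b: float) -> float:
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def blended_prob(pr_a, pr_b, xw_a, xw_b, x):
    """x weights the PageRank signal, (1 - x) the frame xWinrate signal."""
    p_pagerank = pr_a / (pr_a + pr_b)  # assumption: share of combined PageRank
    p_frames = log5(xw_a, xw_b)
    return x * p_pagerank + (1 - x) * p_frames

def best_weight(history, steps=20):
    """history: (pr_a, pr_b, xw_a, xw_b, a_won) tuples from past matches."""
    def log_loss(x):
        eps = 1e-12
        total = 0.0
        for pr_a, pr_b, xw_a, xw_b, a_won in history:
            p = min(max(blended_prob(pr_a, pr_b, xw_a, xw_b, x), eps), 1 - eps)
            total -= math.log(p if a_won else 1 - p)
        return total / len(history)
    return min((i / steps for i in range(steps + 1)), key=log_loss)

# Toy history where PageRank always points at the winner and frames say nothing.
toy_history = [(3, 1, 0.5, 0.5, True)] * 3 + [(1, 3, 0.5, 0.5, False)] * 3
x_star = best_weight(toy_history)
```

The weight should be chosen on matches that were not used to build the ratings, otherwise it will be biased toward whichever signal overfits more.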

I also tried uploading my PageRank ratings & pythag winrates to pastebin and it keeps thinking I'm publishing personal data and rejecting it. So lol on that.
03-20-2021 , 11:37 PM
I have yet to read the log5 paper, but as someone who has worked on the other side of the counter, here's what I thought while reading your questions:

Pinnacle was recently hiring for machine learning engineers. Most other sports trader/odds compiler jobs today just require some knowledge of Excel, statistics, and some ticket writing experience. The point is I would expect that Pinnacle takes a strong data science approach and pays close attention to the classification accuracy and probabilistic prediction error for all moneyline markets it makes.

So if you think you're close, check the prediction accuracy and probabilistic error (log loss or Brier score) of your model. If it isn't at least on par with the same metrics from Pinnacle's lines, adjust something to see if your metrics get better or worse, and you can probably answer your own questions through trial and error. As a punter and not a bookmaker, you don't have to originate a line on every matchup to make money anyway.
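
A minimal sketch of those three metrics, so a model and the market can be scored on the same matches. The function name and input layout are assumptions. For scale, always predicting 50% gives a Brier score of 0.25.

```python
import math

def evaluate(probs, outcomes):
    """probs: predicted P(player 1 wins); outcomes: 1 if player 1 won, else 0."""
    n = len(probs)
    eps = 1e-12  # clip so log(0) never happens
    accuracy = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes)) / n
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    log_loss = -sum(
        math.log(min(max(p if y == 1 else 1 - p, eps), 1 - eps))
        for p, y in zip(probs, outcomes)
    ) / n
    return {"accuracy": accuracy, "brier": brier, "log_loss": log_loss}

# Made-up predictions for the same four matches.
outcomes = [1, 0, 1, 1]
model_metrics = evaluate([0.70, 0.40, 0.55, 0.80], outcomes)
market_metrics = evaluate([0.65, 0.45, 0.60, 0.75], outcomes)
```

The market probabilities should be the no-vig ones, otherwise the comparison penalizes the book for its margin.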

Last edited by rabbitcoin; 03-20-2021 at 11:43 PM.
03-21-2021 , 03:14 AM
I appreciate the feedback and found this really helpful explainer on GLMs: http://rstudio-pubs-static.s3.amazon...8d5449809.html. I didn't know you worked for a book, but I would definitely be interested in knowing how things go on that side. I'm pretty new to sports betting in general.

It took a few days, but I have a model and it would be fun to see how well it works. I'm not going to put any money down yet, but let's just see how it does vs Pinny lines (model's fair lines in brackets):

Mark Williams v Kyren Wilson +106 1u [-115]
Stuart Bingham v Jack Lisowski -115 1.15u [-146]
Mark Williams v Jack Lisowski -117 1.17u [-132]
Judd Trump v Kyren Wilson -183 1.83u [-214]

I guess I really like Mark Williams and hate Wilson/Lisowski?
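
The gap between an offered line and the model's fair line can be put in expected-value terms. This sketch assumes each bet is on the first-named player and takes the bracketed fair line at face value as the true probability. Function names are illustrative.

```python
def american_to_prob(odds: int) -> float:
    """Win probability implied by a no-vig American line."""
    return -odds / (-odds + 100) if odds < 0 else 100 / (odds + 100)

def payout_per_unit(odds: int) -> float:
    """Profit per 1 unit risked if the bet wins."""
    return odds / 100 if odds > 0 else 100 / -odds

def expected_value(fair_line: int, offered_line: int) -> float:
    """Expected profit per unit risked, if the fair line is correct."""
    p = american_to_prob(fair_line)
    return p * payout_per_unit(offered_line) - (1 - p)

# (offered line, model fair line) for the four plays listed above.
plays = [(106, -115), (-115, -146), (-117, -132), (-183, -214)]
edges = [expected_value(fair, offered) for offered, fair in plays]
for (offered, fair), ev in zip(plays, edges):
    print(f"offered {offered:+d}, fair {fair:+d}: EV {ev:+.3f} per unit risked")
```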
03-21-2021 , 07:33 AM
I don't know everything there is to know, but I am a data scientist who's happy to help as I can. I have two little kids and a full-time job so don't have much time, but I'd be willing to answer questions or get hands-on as time allows.

Did you do any sort of feature engineering with the predictors?
03-21-2021 , 10:38 AM
Spanky has some good interviews with Vegas and offshore bookmakers, traders, etc. on his Be Better Bettors podcast. Check out the one with Pat Porada, who has traded sports in Europe, the US, and Asia.

Cannabusto, based on what you read in the forums here, do you think doing any of the kaggle.com sports data related competitions (NFL Big Data Bowl?) is good practice for learning how to build a profitable betting model, or are those competitions too niche?
03-21-2021 , 11:22 AM
I haven't done much with sports betting or Kaggle, but I will say Kaggle has a reputation for not building useful skills because the projects are often niche and seem to come down to eking out another .0001 percent of prediction accuracy, which usually isn't a practical concern in the real world.

I think grabbing your own data and making your own way doing something that interests you is a much better way to go.
03-21-2021 , 01:52 PM
Quote:
Originally Posted by JSkelts
Mark Williams v Kyren Wilson +106 1u [-115]
Stuart Bingham v Jack Lisowski -115 1.15u [-146]
Mark Williams v Jack Lisowski -117 1.17u [-132]
Judd Trump v Kyren Wilson -183 1.83u [-214]
Well, I've started a clean 4-0. If you don't like that then you don't like WST Pro Series Snooker! At least I think I did, I didn't watch the games.

Mark Williams v Kyren Wilson +106 1u W +1.06u
Stuart Bingham v Jack Lisowski -115 1.15u W +1u
Mark Williams v Jack Lisowski -117 1.17u W +1u
Judd Trump v Kyren Wilson -183 1.83u W +1u

+4.06u on 5.15u risked

Also @cannabusto - yep, I talked a little bit about what I did in a few posts earlier, but essentially I created a Pythagorean expected winrate for a frame per player that was adjusted for blowouts, as well as a frame-specific (better than what that paper did imo) PageRank rating per player. Who knows if it makes any sense or if I'm just good at coinflipping.
03-22-2021 , 02:08 AM
Ronnie O'Sullivan v John Higgins -135