Fantasy Football Optimization in R - Computer Technical Help

Hey everyone, getting my feet wet with R and am running into a wall.

Currently I have some R code that looks through my data set, which includes (Player, Position, Projection, Salary, Team, Opponent). This is for FanDuel, so I have it selecting 1 QB, 2 RBs, 3 WRs, 1 TE, 1 K, and 1 Defense while staying under the $60K salary cap. I can get it to spit out the optimal lineup, and can also loop it to get, say, the best 10 lineups.

The problem I'm running into is that I want the lineup it spits out to have a few more constraints.

For example I want just 1 of the WRs/TE to be on the same team as the QB.

Anyone got any advice? I have been searching all over for this answer and can't find anything.

Here is the code I have so far

Code:

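library(Rglpk)

name <- myData$Name
pos <- myData$Pos
pts <- myData$Projection
cost <- myData$Salary
team <- myData$Team
num.players <- length(name)

# Objective: maximize projected points
f <- pts

# One binary decision variable per player (1 = in the lineup)
var.types <- rep("B", num.players)

maxPts <- 1000   # initial score ceiling, assumed to exceed any lineup's projection
lineup_no <- 10  # number of lineups to generate

# Pre-allocate one list slot per lineup
Lineups <- vector("list", lineup_no)
for(i in seq_len(lineup_no))
{

# Constraint rows: one per position, then salary, then the score ceiling
A <- rbind(as.numeric(pos=="QB")
           , as.numeric(pos=="RB")
           , as.numeric(pos=="WR")
           , as.numeric(pos=="TE")
           , as.numeric(pos=="K")
           , as.numeric(pos=="D")
           ,cost
           ,f)

dir <- c("=="
         ,"=="
         ,"=="
         ,"=="
         ,"=="
         ,"=="
         ,"<="
         ,"<=")

b <- c(1
       , 2
       , 3
       , 1
       , 1
       , 1
       , 60000
       , maxPts)

sol <- Rglpk_solve_LP(obj = f
                      , mat = A
                      , dir = dir
                      , rhs = b
                      , types = var.types
                      , max=TRUE)

# Compare against 0.5 rather than testing for exactly 1, in case of solver rounding
score <- sum(pts[sol$solution > 0.5])
Lineup <- myData[sol$solution > 0.5,]
Lineup<-Lineup[order(Lineup$Pos),]
print(Lineup)
print(score)
lineupNumber <- i
print(lineupNumber)
Lineups[[i]] <-Lineup
# The next lineup must score strictly below this one
maxPts <- score - .01
}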

Thanks!

10-26-2016 , 07:16 AM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

This seems like the perfect use of algorithms that solve the knapsack problem (https://en.m.wikipedia.org/wiki/Knapsack_problem). As far as getting a quarterback and receivers on the same team, you'll probably have to solve those cases individually: first optimize for the quarterback and use that to limit your receiver pool, then optimize for receivers and find the quarterback. Choose whichever nets you more points on average. There may be better algorithms; that's just the first that comes to mind.
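
As an alternative to solving the cases one by one, the stacking rule can be written as extra rows in the same constraint matrix the original script already builds. The sketch below assumes "just 1" means exactly one WR/TE from the QB's team, and that the lineup holds 1 QB, 3 WRs and 1 TE as in the first post. stack_rows and satisfies are hypothetical helper names, and the seven-player pool is made up purely for illustration.

```r
# Hypothetical helper (not part of the original script): extra constraint rows
# that tie the WR/TE picks to the chosen QB's team.
# For each team t, let q_t = QBs picked from t (0 or 1, since the lineup has one QB)
# and r_t = WR/TE picked from t. Then:
#   r_t - q_t     >= 0   at least one WR/TE teammate when t's QB is picked
#   r_t + 3 * q_t <= 4   at most one in that case; always true when q_t = 0,
#                        because a lineup holds only 3 WR + 1 TE = 4 receivers
stack_rows <- function(pos, team) {
  teams   <- unique(team)
  is_qb   <- pos == "QB"
  is_recv <- pos %in% c("WR", "TE")
  n       <- length(pos)
  recv <- t(vapply(teams, function(tm) as.numeric(is_recv & team == tm), numeric(n)))
  qb   <- t(vapply(teams, function(tm) as.numeric(is_qb & team == tm), numeric(n)))
  list(mat = rbind(recv - qb, recv + 3 * qb),
       dir = rep(c(">=", "<="), each = length(teams)),
       rhs = rep(c(0, 4), each = length(teams)))
}

# Small made-up player pool to show the rows at work
pos  <- c("QB", "QB", "WR", "WR", "WR", "TE", "TE")
team <- c("A",  "B",  "A",  "A",  "B",  "A",  "B")
extra <- stack_rows(pos, team)

# TRUE when the 0/1 lineup vector x satisfies every row in con
satisfies <- function(x, con) {
  lhs <- drop(con$mat %*% x)
  all(ifelse(con$dir == ">=", lhs >= con$rhs, lhs <= con$rhs))
}

one_stack <- c(1, 0, 1, 0, 1, 0, 0)  # QB A with WR A and WR B
two_stack <- c(1, 0, 1, 1, 0, 0, 0)  # QB A with two WRs from A
no_stack  <- c(1, 0, 0, 0, 1, 0, 0)  # QB A with only a WR from B
print(c(satisfies(one_stack, extra), satisfies(two_stack, extra), satisfies(no_stack, extra)))
```

To use this with the posted loop, the rows would be appended before calling the solver: A <- rbind(A, extra$mat), dir <- c(dir, extra$dir), b <- c(b, extra$rhs). Rglpk accepts ">=" as a direction, so no other change should be needed.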

11-01-2016 , 06:20 AM

clowntable

Carpal 'Tunnel

Join Date: Jun 2006 Posts: 45,557

As an aside I'd probably use Python for this problem domain. At least I have found that I tend to use R for statistical quick checks and Python whenever I need more plumbing (I'm assuming you want this to hook up to the site automatically eventually, generate the underlying CSV or whatever you use automatically etc.).

If you're not too invested in R, check out Anaconda (Python bundle that includes all the scientific computing stuff).

11-01-2016 , 09:21 AM

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

I'm not super invested in R and have done some stuff in Python before; however, I have never been able to find any good (for me) resources on using Python for fantasy sports optimization.

Coming from a heavy Excel background, the code above makes a lot of sense to me since it works similarly to Excel's Solver.

Trying to use this for fantasy football and fantasy golf and have another question if anyone has a good solution.

Let's say for fantasy golf I want to create 50 teams. My problem now is limiting exposure. The 50 best lineups may all include 1 particular player and I don't want that. So what I was thinking is I could get my code to create a data frame of 1000 teams where each row contains the 6 golfers on the team and the team's projected score. Then I could use that data frame and get the code to select the 50 best rows such that no player appears on more than 50% of the selected teams. Anyone got any ideas on how to achieve that?

11-01-2016 , 11:26 AM

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

I wonder if you could adjust a team's score negatively based on the number of times its constituent players have been picked.

Edit: That way the logic that chooses the optimum team only has to be concerned with choosing the total scores for that team.
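
A minimal sketch of that penalty idea, assuming the pool of candidate teams already exists as a list of player-name vectors with a matching vector of projected scores. pick_with_penalty and the penalty size are hypothetical, and the three-team pool is made up. It is a greedy heuristic, so it is not guaranteed to find the best possible set of teams.

```r
# Hypothetical helper: greedily pick n_pick teams, lowering each team's score by
# `penalty` points for every earlier pick that already used one of its players.
pick_with_penalty <- function(teams, scores, n_pick, penalty) {
  stopifnot(n_pick <= length(teams))
  players <- unique(unlist(teams))
  counts  <- setNames(numeric(length(players)), players)  # times each player was picked
  chosen  <- integer(0)
  for (k in seq_len(n_pick)) {
    used     <- vapply(teams, function(tm) sum(counts[tm]), numeric(1))
    adjusted <- scores - penalty * used
    adjusted[chosen] <- -Inf           # never pick the same team twice
    best   <- which.max(adjusted)
    chosen <- c(chosen, best)
    counts[teams[[best]]] <- counts[teams[[best]]] + 1
  }
  chosen
}

# Made-up pool: teams 1 and 2 share player A
teams  <- list(c("A", "B"), c("A", "C"), c("D", "E"))
scores <- c(100, 99, 95)
print(pick_with_penalty(teams, scores, n_pick = 2, penalty = 0))  # plain top 2
print(pick_with_penalty(teams, scores, n_pick = 2, penalty = 5))  # overlap is penalized
```

With no penalty the two highest-scoring teams are taken even though both contain player A. With a 5-point penalty the second pick moves to the team with no overlap. How large the penalty should be is a tuning choice the thread does not settle.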

11-01-2016 , 06:11 PM

Mihkel05

banned

Join Date: Jul 2015 Posts: 4,835

There is basically no difference between R and Python for this. (I prefer R for its better data visualization options and Python for ease of use with other non-optimization stuff. Personally I use Python. Immaterial either way.) Both are going to be very slow compared to high-performance options. (Whatever Khronos technology, basically.)

You'll need to find the general optimization name and look for resources based on that. Most of the stuff written about fantasy sports is written by dullards who just don't care that they could be computing stuff 1000x faster.

As an aside, what you're doing is suitable for H2H (kinda) and terrible for tournaments.

11-01-2016 , 06:19 PM

Mihkel05

banned

Join Date: Jul 2015 Posts: 4,835

Also, couldn't you just run some sort of iterative count for each player to check the list of teams? Seems like after the list is sorted, just running a loop with a count/check to output the appropriate teams would be simple?
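
A rough sketch of that count-and-check loop, assuming the pool is a list of player-name vectors plus a score vector, and that a single cap applies to every player. select_with_caps is a hypothetical name and the data is made up. Because it walks the sorted list once and never revisits a choice, it can miss a better combination.

```r
# Hypothetical helper: walk the pool from best to worst score and keep a team
# only if adding it leaves every one of its players at or below max_count.
select_with_caps <- function(teams, scores, n_pick, max_count) {
  players <- unique(unlist(teams))
  counts  <- setNames(integer(length(players)), players)
  chosen  <- integer(0)
  for (j in order(scores, decreasing = TRUE)) {
    if (length(chosen) == n_pick) break
    tm <- teams[[j]]
    if (all(counts[tm] + 1L <= max_count)) {
      chosen     <- c(chosen, j)
      counts[tm] <- counts[tm] + 1L
    }
  }
  chosen
}

# Made-up pool: player A is on the three best teams
teams  <- list(c("A", "B"), c("A", "C"), c("A", "D"), c("E", "F"))
scores <- c(100, 99, 98, 90)
print(select_with_caps(teams, scores, n_pick = 3, max_count = 2))
```

Here the third-best team is skipped because player A has already reached the cap of 2, so the loop falls through to the fourth team.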

I could be misunderstanding your question.

11-01-2016 , 10:13 PM

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

Quote:

Originally Posted by Mihkel05

Can you expand on this?

I have a list of teams I want to choose from. Just not sure how to best pick the teams given my player exposure limits.

I could somewhat limit exposure in Excel but it wasn't very good I don't think. For example let's say I have player A and player B who are both great picks and I want them each on a maximum of 10 out of 20 line ups.

Well I don't really want to pick just the 10 best player A line ups bc player B might be on all of them as well. Now those players are too highly correlated for my liking plus the other 10 teams I have to make won't have either of those decent players on them.

I'm thinking what I need is a way to select X amount of teams from a list based on player exposure such that the point projection for the last team selected is as high as possible.

Like in the above example if I just take my 10 best line ups with players A and B and make 10 more line ups without them, the 20th line up I make is probably going to be worse than the 20th line up when my ownership is a bit more spread out.

Hope that makes some sense

Last edited by NxtWrldChamp; 11-01-2016 at 10:25 PM.

11-02-2016 , 06:25 AM

Mihkel05

banned

Join Date: Jul 2015 Posts: 4,835

Yes, you would need to keep a separate count and check for each player.

Keep in mind this solution is a pretty gross oversimplification of the problem that you should be trying to solve.

01-12-2017 , 12:07 PM

#10

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

Anybody have ideas on how to improve solver times?

The way I have it now is the solver fetches my best team, logs the projected score, and then to find the next best team one of the solver constraints is set to "max score" <= "previous projected score - .01".

As the process goes on this slows it down considerably as I believe what is happening is that for lineup 500, the solver is finding the first 499 that don't fit the constraint before publishing the 500th best lineup.

Any ideas?

01-12-2017 , 04:14 PM

#11

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by NxtWrldChamp

Typically memoization would solve this problem for you (i.e. cache your team results from previous trials in a table).

01-12-2017 , 04:16 PM

#12

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by just_grindin

Typically memoization would solve this problem for you (i.e. cache your team results from previous trials in a table).

Trying to edit my previous post was weird on my phone. By table I just meant some sort of lookup data structure that costs memory but not time to look up.

01-12-2017 , 04:37 PM

#13

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

Feel like the issue has to do with the constraint in the solver. The previous team result is already saved, and easily accessible.

The solver uses the previous result -.01 to get a constraint that the next team's projection cannot be more than that.

I believe the solver continually goes in order, i.e. to get team #3 it first solves for team 1 and realizes that doesn't match the score criteria, then it solves for team 2 and also realizes that doesn't fit the criteria, and finally it gets to 3 and all of the criteria are good to go. This is fine when trying to make a small number of teams but appears to become very cumbersome once you reach hundreds and thousands of teams.

01-12-2017 , 11:27 PM

#14

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Quote:

Originally Posted by NxtWrldChamp

Is this for the golf teams? I guess I'm confused on what exactly you're doing. You want to find the top 10 teams and you want their score to be 1/100th lower than the previous best team?

01-12-2017 , 11:42 PM

#15

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

Yes. It uses similar logic to the football code above. I can't figure out a good way to limit exposure % as I build teams so the work around I have is to build a pool of say 1000 viable teams. So the solver runs 1 time and spits out the best projected team, say 400 projected points. I then loop the solver but add in the constraint that the next team can't have a projection higher than 399.99. It follows this process for however many teams I want to build.

Then say I want to pick 50 teams, from my pool of 1000, but Tiger Woods can only be on a maximum of 25, and player X can only be on a max of 20, and player Y can only be on a max of 20 but has to be on at least 10. Etc etc. I'll do the same for NFL.
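
One way that selection step could be set up is as a second, smaller integer program over the pool: one binary variable per pooled team, a row forcing the number of chosen teams, and a pair of rows per capped player for the maximum and minimum counts. This is only a sketch under those assumptions. exposure_model and feasible are hypothetical names, the four-team pool is made up, and the objective used here (maximize the summed projection of the chosen teams) is an assumption, not necessarily what the poster configured. The solve step runs only if Rglpk is installed.

```r
# Hypothetical helper: build the pieces of a "pick n_pick teams from the pool"
# model. min_count and max_count are named vectors keyed by player name.
exposure_model <- function(teams, scores, n_pick, min_count, max_count) {
  players <- names(max_count)
  # incidence[p, j] is 1 when player p appears on pooled team j
  incidence <- matrix(
    vapply(teams, function(tm) as.numeric(players %in% tm), numeric(length(players))),
    nrow = length(players), dimnames = list(players, NULL))
  list(obj   = scores,
       mat   = rbind(rep(1, length(teams)), incidence, incidence),
       dir   = c("==", rep("<=", length(players)), rep(">=", length(players))),
       rhs   = c(n_pick, unname(max_count), unname(min_count[players])),
       types = rep("B", length(teams)))
}

# TRUE when the 0/1 selection vector y satisfies every row of model m
feasible <- function(y, m) {
  lhs <- drop(m$mat %*% y)
  all(ifelse(m$dir == "==", lhs == m$rhs,
      ifelse(m$dir == "<=", lhs <= m$rhs, lhs >= m$rhs)))
}

# Made-up pool of 4 teams; pick 2, Tiger on at most 1, Y on at least 1
teams  <- list(c("Tiger", "X"), c("Tiger", "Y"), c("X", "Y"), c("Y", "Z"))
scores <- c(400, 399, 390, 380)
m <- exposure_model(teams, scores, n_pick = 2,
                    min_count = c(Tiger = 0, X = 0, Y = 1),
                    max_count = c(Tiger = 1, X = 2, Y = 2))

if (requireNamespace("Rglpk", quietly = TRUE)) {
  sol <- Rglpk::Rglpk_solve_LP(obj = m$obj, mat = m$mat, dir = m$dir,
                               rhs = m$rhs, types = m$types, max = TRUE)
  print(which(sol$solution > 0.5))  # indices of the selected pool teams
}
```

Players that are not named in max_count are simply left unconstrained. Unlike a sorted walk through the pool, this formulation can honor minimum counts as well as maximums.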

I have all that configured to work how I want it to, just the initial building of the team pool can be extremely slow. In an ideal world I guess rather than looping the solver it would just give me the best team and then keep going to give me the next best team right.

Right now I feel if I build 5 teams the solver goes

1
1x
2
1x
2x
3
1x
2x
3x
4
1x
2x
3x
4x
5

Was hoping there was a way it could just go

1
2
3
4
5

You can see how quickly it slows down, as it has to check all the previous teams against the new "projected points must be less than X" constraint.

On phone so hopefully that makes sense.

01-13-2017 , 10:17 AM

#16

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

Is your initial output sorted by score? I guess I'm not understanding the necessity of comparing every return value to every other return value? If it's sorted by score it should be easy to retrieve what you're looking for. Sorting algorithms are much faster than what you're doing.

01-13-2017 , 11:52 AM

#17

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

Maybe I am not explaining this well but not really sure how to explain it differently than I have before.

All I am trying to do is take a group of 140 golfers, who each have a Salary and Projection, and build X amount of teams as quickly as possible.

It is currently set up so that my code first creates a data.table for the number of teams I want to create, let's say 100.

The solver then runs and retrieves the best possible team, that team, their combined salary, and their combined score are placed in the data.table.

The score is logged so that the solver can run again, yet this time create a team whose projection is just slightly less than the team it just built. This process continues until I end up with 100 teams in my data.table, going in descending order based on projected points with the best team at the top.

The solver slows down considerably as you build more teams and I feel the constraint of the team's projection having to be lower than the previous team's projection is the culprit. It appears the solver finds every team above the constraint first, realizes the projection is too high and then starts over.

Examples:

For example if I just go build 1 team trying to max out projected points it does it in the blink of an eye. Say this team has a projection of 400 pts.

If I again just go to build 1 team, the 2nd best team, the solver is again trying to max out points, but to avoid a duplicate answer I put a constraint that the projection has to be less than 399.99 pts (i.e. the previous score - .01). This returns very quickly as well; however, I believe the solver finds that best team first, determines that 400 pts is over the constraint of 399.99 and then returns the next best team, say a team projected to score 399.5 pts.

Here's where the big problem is. If I again just go to build 1 team, but make the projected pts constraint such that the team cannot score more than 350 pts the solver slows to a halt. There may be 3000 combinations of teams between the best team(400pts) and my constraint of 350 pts. It appears the solver is going to find all 3000 of those teams first, determine that each of their pt projections are too high before returning a team that only scores 350 pts.

Was wondering if there was a way to avoid all of that lost time.
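
One commonly used alternative to the score ceiling is to forbid each lineup that has already been found, by adding one row per found lineup saying the next lineup may share at most (roster size - 1) players with it. Whether this is faster for a given pool is not something the thread establishes, so treat the sketch below as something to benchmark rather than a guaranteed fix. add_exclusion_cut is a hypothetical helper, the five-golfer pool and the cap of 100 are made up, and the solve loop runs only if Rglpk is installed. Note one behavioral difference: lineups that tie on projected score are all returned, instead of being skipped by the "- .01" step.

```r
# Hypothetical helper: append a "no-good cut" that forbids repeating a lineup.
# picked is a 0/1 vector; the new row says a later lineup may share at most
# (roster size - 1) players with it.
add_exclusion_cut <- function(model, picked) {
  model$mat <- rbind(model$mat, picked)
  model$dir <- c(model$dir, "<=")
  model$rhs <- c(model$rhs, sum(picked) - 1)
  model
}

# Made-up toy pool: pick 2 of 5 golfers with total salary <= 100
pts   <- c(50.5, 48.2, 47.9, 40.1, 39.0)
cost  <- c(60, 55, 45, 40, 35)
model <- list(mat = rbind(rep(1, 5), cost), dir = c("==", "<="), rhs = c(2, 100))

lineups <- list()
if (requireNamespace("Rglpk", quietly = TRUE)) {
  for (i in 1:3) {
    sol <- Rglpk::Rglpk_solve_LP(obj = pts, mat = model$mat, dir = model$dir,
                                 rhs = model$rhs, types = rep("B", 5), max = TRUE)
    if (sol$status != 0) break                 # no feasible lineup left
    picked <- as.numeric(sol$solution > 0.5)
    lineups[[i]] <- which(picked == 1)
    model <- add_exclusion_cut(model, picked)  # rule out this exact lineup next time
  }
}
print(lineups)
```

The objective stays "maximize points" on every pass, so no score threshold has to be carried from one iteration to the next.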

01-13-2017 , 04:17 PM

#18

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

That makes more sense. The item picking wasn't as clear to me before or how your constraint was being used. I thought you just generated all teams and had a score available to search. Let me think about this some more. Thanks for corresponding.

01-13-2017 , 09:25 PM

#19

whosnext

Carpal 'Tunnel

Join Date: Mar 2009 Posts: 6,732

I don't know if this would work in your situation, but something I have utilized in somewhat similar situations is to add a huge penalty for any solution that violates a requirement (e.g., more than one TE/WR on same team as QB).

The same idea might work in your looping over solver solutions. After each loop, add a huge penalty for the team that was just found.

Something weird appears to be occurring in your present algorithm since it should not be taking so long to find your top 100 teams if it can find your top team so fast.

01-18-2017 , 11:16 AM

#20

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

One interesting thing of note that I have discovered through trial and error.

The speed of the solver varies significantly depending on the decimal place the projections are taken out to.

For example if all of the projections are taken out 1 decimal place (ie 90.5) and my next team solver constraint is just "previous team score - .1" the solver is extremely fast.

If the projections are taken out 2 decimal places (ie 90.54) and my next team solver constraint is "previous team score -.01" the solver slows down quite a bit. Seems a bit weird that such a small change could have such a large impact.
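
One possible contributor, offered as a guess rather than a diagnosis: decimal projections are stored as binary floating-point numbers, so sums and "previous score - .01" thresholds are not exact, and two-decimal projections also produce many more distinct team totals for the threshold to step through. A small sketch of the floating-point side, with made-up projections, showing how scaling to whole numbers makes the "strictly lower" step exact:

```r
# Decimal fractions are not stored exactly, so equality tests can surprise
print(0.1 + 0.2 == 0.3)                    # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))   # TRUE, compares within a tolerance

# Made-up projections, scaled to integer hundredths of a point
pts     <- c(90.54, 88.10, 75.35)
pts_int <- as.integer(round(pts * 100))

# With integer scores, "strictly lower than the previous team" is exactly "- 1"
prev_score_int <- sum(pts_int[c(1, 2)])
next_ceiling   <- prev_score_int - 1L
print(c(prev_score_int, next_ceiling))
```

If the scaled values were used as the objective and in the score-ceiling row, the reported totals would need dividing by 100 afterwards. Whether this changes solve time would have to be measured.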

01-18-2017 , 05:44 PM

#21

Mihkel05

banned

Join Date: Jul 2015 Posts: 4,835

http://stackoverflow.com/questions/2...nt-output-in-r

That may provide some insight, but your algo probably is poorly formed.

01-18-2017 , 06:04 PM

#22

NxtWrldChamp

Carpal 'Tunnel

Join Date: Feb 2008 Posts: 9,082

There is no doubt that my algo is poorly formed. I'm totally self taught aka throw stuff against the wall until it sticks.

Last edited by NxtWrldChamp; 01-18-2017 at 06:10 PM.

02-24-2017 , 04:27 PM

#23

Lawnmower Man

Pooh-Bah

Join Date: Feb 2015 Posts: 4,457

Are you still working on this? I think I already have R scripts that do what you want, or could get there with some minor tweaking.

02-24-2017 , 04:35 PM

#24

Lawnmower Man

Pooh-Bah

Join Date: Feb 2015 Posts: 4,457

Also, are you using Rglpk or something else? Can't imagine 100 golf lineups would take too long, but yeah, that loop gets slower and slower. I always assumed it was because of bunching in the distribution as opposed to re-solving, but maybe your theory is correct.