Data Science As A Career?

04-09-2017 , 03:57 PM
Tackled the ridge regression section in my text book this morning and cleared up quite a few things lol.

Quote:
Originally Posted by Sholar
Yes. One way to think of it is in terms of the "bias-variance" tradeoff (which you probably haven't encountered yet...); another is as a Bayesian prior that the predictors are useless (and so your prior is that they take a zero coefficient).

We've dug into the bias-variance tradeoff a bit. My understanding is that variance refers to how much your model would change given a different training set. A very flexible method (KNN w/ K = 1) will have high variance, while something like linear regression will be low variance. Bias refers to error introduced from model assumptions (e.g. a linear relationship between response and predictors). Here, flexible methods result in low bias while more constrained methods have higher bias.

This doesn't seem like a good intuition...it's more that they reduce error "more" than they contribute to a penalty...

...and this seems even more strange. How are they getting "inflated"?

Yeah... so I was not understanding ridge regression very well. For whatever reason I wasn't really thinking in terms of setting lambda and then calculating the set of coefficients that minimize the objective function. I was tunnel visioned on the penalty term and thinking, "I understand I want to penalize large coefficients to reduce over-fitting, but why do I want to let my small, insignificant coefficients stick around more easily?" Somehow the thought didn't occur to me: "they're already small and insignificant, you don't need to penalize coefficients that aren't contributing to your model in the first place." I was thinking about the MSE/penalty trade-off, but for those coefficients there's no real trade-off lol.
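To make the "no real trade-off" point concrete, here is a minimal sketch, assuming the simplest possible case: one predictor, no intercept. In that case the ridge objective sum((y - b*x)^2) + lambda * b^2 has the closed form b = sum(x*y) / (sum(x^2) + lambda). The data and the names ridge_coefficient and penalty are made up for illustration; nothing here comes from the textbook.

```python
def ridge_coefficient(xs, ys, lam):
    """Closed-form ridge estimate for one predictor and no intercept.

    Minimizes sum((y - b*x)^2) + lam * b^2, which gives
    b = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def penalty(b, lam):
    """The ridge penalty term for a single coefficient."""
    return lam * b * b


# Hypothetical data: one predictor with a strong effect, one with almost none.
xs = [1.0, 2.0, 3.0, 4.0]
ys_strong = [2.1, 3.9, 6.2, 7.8]   # slope close to 2
ys_weak = [0.1, -0.1, 0.2, 0.0]    # slope close to 0

for lam in (0.0, 1.0, 10.0):
    b_strong = ridge_coefficient(xs, ys_strong, lam)
    b_weak = ridge_coefficient(xs, ys_weak, lam)
    print(f"lambda={lam:5.1f}  strong={b_strong:.4f}  weak={b_weak:.4f}")

# Penalty each unshrunk (lambda = 0) coefficient would pay at lambda = 10.
ols_strong = ridge_coefficient(xs, ys_strong, 0.0)
ols_weak = ridge_coefficient(xs, ys_weak, 0.0)
print("penalty strong:", penalty(ols_strong, 10.0))
print("penalty weak:  ", penalty(ols_weak, 10.0))
```

Both coefficients shrink by the same factor here, but the penalty grows with the square of the coefficient, so the near-zero one pays almost nothing and barely moves in absolute terms.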
Thanks for the comments, Sholar. I think it's really helpful for me to engage in conversation about these things. It's really easy to sit in class and nod along thinking you understand something, but when you're forced to explain it yourself you realize that's not the case.
04-10-2017 , 10:40 AM
I think you're understanding it better now. Just to be clear, variance is error caused by the model being too sensitive to tiny changes in training data.

Can you explain what the issue is with overfitting? What happens when a model is overfit? This is key to understanding why to use methods like ridge and LASSO instead of OLS regression.
04-13-2017 , 11:33 PM
Quote:
Originally Posted by cannabusto
I think you're understanding it better now. Just to be clear, variance is error caused by the model being too sensitive to tiny changes in training data.

Can you explain what the issue is with overfitting? What happens when a model is overfit? This is key to understanding why to use methods like ridge and LASSO instead of OLS regression.

The issue with overfitting is that your model starts to capture the variance in your training set. For example, let's say the true relationship between your variables is y = x + norm(0,1). We know that a linear model is appropriate, but fitting a 5th order polynomial to your training set will probably produce a "better" fit. But when you then go to your test set, your 5th order polynomial might behave worse than a standard linear model if either the training or test set had a significant outlier, for example.
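The example above can be simulated in a few lines. This is only a sketch under stated assumptions: x is drawn uniformly from [-2, 2], the sample sizes (15 train, 200 test) and the seed are arbitrary, and fit_polynomial is a hand-rolled least-squares fit via the normal equations so the snippet needs nothing beyond the standard library.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] for the basis 1, x, ..., x^degree.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum((x ** i) * y for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - tail) / a[i][i]
    return coeffs


def mse(coeffs, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def sample(n, rng):
    """Draw n points from y = x + norm(0, 1)."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = sample(15, rng)
test_x, test_y = sample(200, rng)

linear = fit_polynomial(train_x, train_y, 1)
quintic = fit_polynomial(train_x, train_y, 5)

print("train MSE  linear:", mse(linear, train_x, train_y),
      " quintic:", mse(quintic, train_x, train_y))
print("test MSE   linear:", mse(linear, test_x, test_y),
      " quintic:", mse(quintic, test_x, test_y))
```

The 5th order fit can never do worse than the line on the training set, since the line is a special case of it. On the test set the ordering is not guaranteed, but the quintic will typically lose because it has spent its extra flexibility on the training noise.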
04-14-2017 , 09:29 AM
That's a very good explanation. Although the test set outlier would affect the prediction accuracy of either model. It's all about avoiding modeling the noise in the training set.

04-14-2017 , 03:59 PM
Quote:
Originally Posted by cannabusto
That's a very good explanation. Although the test set outlier would affect the prediction accuracy of either model. It's all about avoiding modeling the noise in the training set.

Good point lol
06-14-2017 , 01:46 PM
Finished up my machine learning class last week. Thought it was a great class and really enjoyed it. I am, however, kind of conflicted on how I feel about certain aspects of ML.

In the last two to three weeks we started talking about neural nets, random forests, bagging, boosting, ensembles. Caveat: We didn't cover them in depth, so I'm sure I'm missing a lot of the nuance, but sometimes it felt a little like "throw **** at a wall and see what sticks." Very early on in the class we talked about inference vs prediction, and I guess I'm more drawn to inference problems? When I think about what I want to be able to do as a data scientist, it's to have someone bring me a problem and to be able to tell them what's happening in their data, not just build a neural net that's uninterpretable.

Am I way off base here?
06-14-2017 , 02:03 PM
If the stakeholder cares about finding out what drives the outcome, then you would never fit a RF.

If the stakeholder doesn't care about that but instead cares about predicting that outcome with great accuracy, then maybe you would.
06-14-2017 , 02:13 PM
Right, it's all about your customer's needs. I guess my question is really about the data science job market and what the majority of companies are looking for.
06-14-2017 , 03:12 PM
I have no idea about that. I would guess inference often matters though.
06-14-2017 , 07:59 PM
In my industry, clients don't like black boxes. One of our big selling points is that our models (and to a degree, their development) are pretty transparent. FWIW, we tend to go down the random forest / logistic path most of the time.
03-18-2019 , 04:36 PM
Quote:
Originally Posted by Priptonite
Just accepted admission to University of Washington's Data Science Masters program, starting in September. Really excited to get started. Can provide updates/feedback if anyone is interested.
Graduated! I know I didn't post much along the way but happy to do some post-mortem question answering. Got a job as a junior data scientist at an insurance company, starting in May. Wouldn't have been my first choice of industry but they seem to have a great learning culture and I think it'll be a good spot as someone newer to the field. The projects I learned about while interviewing were also more interesting than I anticipated.

More interestingly, I interviewed with some baseball teams, which was a lot of fun but ultimately didn't quite pan out.
03-18-2019 , 10:52 PM
Nice, congrats!
03-19-2019 , 08:25 AM
What were the interviews like? What did they expect you to know?

What did the program not cover that you wish it did? Do you think they covered any stupid stuff? For example, stepwise regression is something that's pretty stupid.

Which class was most interesting and why?

If you happen to work with Flo, pm me.
03-19-2019 , 05:59 PM
What were the interviews like? What did they expect you to know?

Not sure if you’re referencing baseball or insurance company, so I’ll answer both.

Insurance company was 3 stages:
- Automated phone screen (record responses to questions). Mostly why do you want to work here, tell us about yourself, what salary are you interested in, etc.
- 1hr phone call with 2 data scientists. Some logic/algorithm questions and a brief case study. Logic/algorithm questions were pretty straightforward. Some discussion on algorithm optimization and Big O. For the case study I was given a situation and asked: how would you approach this problem, what would your target variable be, what additional data would you want if available, what modeling framework might you use, etc.
- In person loop. 3 sections: behavioral, algorithm whiteboarding, more in-depth case study. Nothing too surprising here. I thought part of my whiteboarding was pretty weak and is definitely something I’ll need to practice for future job interviews. I think I just rushed into the problem too quickly, missed little things here and there that should’ve been obvious. It was only my second time whiteboarding in an interview setting. The case study was more in depth. In addition to similar questions from the phone screen, they included some plots of available data. They’d ask based on what I see in the plots, would I use this variable and how?

For baseball, one thing was consistent across all teams – a take home assignment. Some were a single, more open-ended question for you to explore. Others might be 5-10 questions targeting different skills. Interviews varied widely by team. The most common question was “if you had access to whatever data you wanted, what would you explore?” But otherwise there were few common threads. One team grilled me on technical details of some baseball-related modeling I had done. Other teams asked me more about why I chose the projects I did, but didn’t ask much about the technical details. The biggest surprise to me was lack of interest in current baseball knowledge or team-specific situations. It wasn’t non-existent, but I would be studying a team’s prospects only to not have them ask me anything about the organization. And in meeting and chatting with current analysts, there are definitely huge baseball nerds but there are also some who didn’t really know or care about baseball prior to the job.

What did the program not cover that you wish it did? Do you think they covered any stupid stuff? For example, stepwise regression is something that's pretty stupid.

My primary “complaint” was that we only had one quarter of machine learning. In that class we spent a ton of time on the fundamentals (positive), but not as much on real-world implementations. So it would’ve been nice to have a secondary ML class or – and this is my second issue – electives that let us dive deeper into a certain area of interest (NLP, time series data, neural networks, etc). I don’t know that we covered anything stupid… but there were certainly classes that I felt could’ve been taught differently/better. Honestly I wish the program had been more rigorous across the board. We had very few tests and some classes had remarkably easy homeworks. I think more repetition of certain concepts or tests that force you to go back and study would’ve been good.

Which class was most interesting and why?

The easy answer here would be machine learning, but I’ll pick a curveball. We had a “Human-centered Data Science” class that focused on ethical issues in data science. A lot of people found it useless because it wasn’t enhancing their technical skills, but I found it to be fascinating. It covered a lot of topics that I never would’ve thought of if I hadn’t taken the class.
03-20-2019 , 10:28 AM
Thanks, great answers. Mind going more in depth on the algorithm whiteboarding? What did that consist of?

One ML class does seem weird. I figured there'd be a regression course, a classification course, and at least one more that covered some or all of: clustering, NLP, time series, deep learning, etc.
03-20-2019 , 12:37 PM
There were 3 parts to it, but I didn’t reach the 3rd. First was something along the lines of: you’re given a string input, find the sum of all substrings. So provided “123” you would return 1+2+3+12+23. Not a difficult problem at all, but my lack of whiteboarding experience really slowed me down. The difference between actively coding something where you’re able to test/change iteratively and writing whiteboard code under pressure was surprisingly big. I’d set up my loops, start going through a test case, then realize I never created a variable to store the total sum… go back, erase stuff, re-write stuff, etc. My solution was fine, but between small hiccups and not being able to test quickly, I spent a lot of time.
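For reference, a minimal sketch of the substring-sum problem as described above. The function name and the include_whole flag are my own. The "123" example in the post leaves out the full string 123, so the default here does the same; whether the interviewers meant to include it is an assumption either way.

```python
def sum_of_substrings(digits, include_whole=False):
    """Sum every contiguous substring of a digit string, read as an integer.

    By default the full string itself is skipped, matching the example
    "123" -> 1 + 2 + 3 + 12 + 23. Pass include_whole=True to add it too.
    """
    total = 0  # the accumulator that is easy to forget on a whiteboard
    n = len(digits)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if not include_whole and start == 0 and end == n:
                continue  # skip the full string
            total += int(digits[start:end])
    return total


print(sum_of_substrings("123"))                      # 41
print(sum_of_substrings("123", include_whole=True))  # 164
```

This is the straightforward nested-loop version, quadratic in the number of substrings, which seems like what a first whiteboard pass would look like.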

The second part was essentially SQL queries, though they said you could use any language. They gave me table schemas and asked how I would find various things (top x products sold in December, all customers who bought y and z). I’m really comfortable with SQL and these were fairly simple queries, so this section was a breeze.
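A sketch of what those two queries might look like, run through Python's built-in sqlite3 module so it is self-contained. The schema (customers, products, orders), the column names and the sample rows are all hypothetical; the actual interview schemas aren't given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, invented for illustration.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product_id  INTEGER REFERENCES products(id),
        quantity    INTEGER,
        order_date  TEXT  -- ISO format, e.g. 2018-12-02
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget"), (3, "gizmo")])
conn.executemany(
    "INSERT INTO orders (customer_id, product_id, quantity, order_date) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 3, "2018-12-02"), (1, 2, 1, "2018-12-15"),
     (2, 1, 2, "2018-12-20"), (2, 3, 5, "2018-11-30"),
     (3, 2, 2, "2018-12-24"), (3, 3, 1, "2018-12-05")])


def top_products(conn, x):
    """Top x products by units sold in December (any year)."""
    return conn.execute("""
        SELECT p.name, SUM(o.quantity) AS units
        FROM orders o
        JOIN products p ON p.id = o.product_id
        WHERE strftime('%m', o.order_date) = '12'
        GROUP BY p.id, p.name
        ORDER BY units DESC, p.name
        LIMIT ?
    """, (x,)).fetchall()


def bought_both(conn, y, z):
    """Names of customers who bought both product y and product z."""
    rows = conn.execute("""
        SELECT c.name
        FROM customers c
        JOIN orders o   ON o.customer_id = c.id
        JOIN products p ON p.id = o.product_id
        WHERE p.name IN (?, ?)
        GROUP BY c.id, c.name
        HAVING COUNT(DISTINCT p.name) = 2
        ORDER BY c.name
    """, (y, z)).fetchall()
    return [name for (name,) in rows]


print(top_products(conn, 2))
print(bought_both(conn, "widget", "gadget"))
```

The GROUP BY with HAVING COUNT(DISTINCT ...) = 2 is one common way to express "bought y and z"; a self-join or INTERSECT would work as well.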

With ~10min to go they wanted to give me time to ask questions, so I didn’t do the third problem. No idea what it was.
03-20-2019 , 12:44 PM
Cool, so basically it was testing one's data munging skills.

I still don't reallllllly get it from the employer perspective--I don't memorize how to do much of that stuff. I google how to do stuff I've done before so so often. And then I find the solution really quickly and off I go. Or I look at past code. I very rarely just code off the top of my head. I'd suck at whiteboarding too.
03-20-2019 , 01:00 PM
Yeah, and it must not have been toooo important to them given that I think I did ~mediocre lol.
03-20-2019 , 06:06 PM
Ya I am same way canna. Like I remember where similar code or implementations are and then use them as a guide.
03-20-2019 , 08:35 PM
I'm glad I'm not a fraud. Imposter syndrome is so real in tech
07-18-2019 , 06:51 AM
^^ this just slipped through, data science yay o/