Quote:
Originally Posted by Gospy
If you can explain how your data is structured and how you append it that would be helpful. What does the raw data look like when you pull it? And how many exports do you have?
(Are your rows unique by player name or no?)
If it's only a few thousand rows and a couple files and you want to do analysis, it's probably easiest to just stick with Excel.
But if you want to append a lot of different exports then it might be easier to write a single load script for SQL and just run it every time you have a new file. And then write queries to structure the data how you want for your analysis.
Here's the website I've normally been using:
So what I've done is used one of those for my first year. I cut it from that site and paste it into Excel. I take out just about everything but the Name, Team, GP, G, A, P. I take the name row and use Text to Columns so I can create separate First Name and Last Name columns (doing a little manual work on the half dozen multi-name entries like James van Riemsdyk). When I add years, I do the same thing with the current year in a separate spreadsheet, sort each by last name/first name, then put the two side-by-side. The new list will have some new names, and will be missing some from the old. So I go through each list and insert blank rows so that in the end, the names common to both lists (which is by far the majority) will be on the same rows in each spreadsheet. Then I cut the data from the newer sheet and paste it in columns to the right of the data in the existing sheet.
As for data manipulation, I calculate points per game, multi-year averages for GP, weighted averages for points per game, and then a more complicated forumla to project points for the upcoming season. This is the part that makes me hesitant to go to a database.
Right now, my spreadsheet has <1000 rows (I drop off some of the lowest producing names), and probably well under 100 columns. Each year, it grows by <100 rows and around 10 columns, so it would take many years to go beyond two or three thousand rows and a couple hundred columns.
Maybe what I really need is just a better way to import and combine the data.
Two other future projects in the spreadsheet vs. database category:
Movie collection - I'd like to put my movies in a spreadsheet (geek alert!), which would just be a simple listing of title, year, director, genre, key actors, etc. No data manipulation required, and this would be a pretty simple list. Since I'm more comfortable with a spreadsheet, that seems the simple solution. But if something else I'm doing needs me to get a database program, maybe that would be better for this.
Movie database - this is something I'm considering doing down the road for a website. It would be a database of award-winning/nominated movies, and I guess the unique/key fields would be the actual award. For example - 1976 best picture Academy Award. I'd want the year and award/nomination to be separate fields, but both would be required together to make the entry unique, if that makes sense. IE there will be multiple 1976 award winners, multiple best picture winners, but only one 1976 best picture winner. However, there would be multiple 1976 best picture nominees, each of which should probably have its own unique record. Not sure what is the best way to organize this. This database wouldn't require any calculations aside from a record count (eg, how many awards/nominations did movie x or actor y get), but I would want to be able to sort/count by a number of different fields. Database or spreadsheet? I suppose the answer might depend on how I use it with the website. If I am keeping the data off site and just publishing the results of queries on the website (eg static listings of different queries like all the best picture winners, who has won the most best actor awards, etc.), will a spreadsheet suffice? I assume that if I want the data to be part of the site and able to be manipulated by site visitors, it would need to be a database?
Sorry to lump all of this in one post; I expect many were thinking tl; dr less than halfway through. But I'm asking all the questions together so I can decide whether I need database software at all - if I need it for the last project, then maybe I'd go ahead and get it and put my personal movie list in a database as well, for example.
Any input or advice appreciated.