Just finished reading part 1, great read. I liked the nested np.where, and it was a great intro to using pandas for me.
I did find a minor issue in Nascar-Tournaments-clean.csv. For tournaments with multiple name aliases, you mentioned you wanted to delete the aliases, but it looks like both of these are still in the clean csv:
[ id = 1511 ] Daytona - Three on One Matchups - Group F
[ id = 1517 ] Daytona 500 - Three on One Matchups - Group F
(and the same goes for groups A-E).
To try and find more of these, I computed the edit distance between every pair of tournament names that have the same startDate.
Here's what I got:
https://gist.github.com/dicedpineapp...07fe5ff8d2b7f7
I'm planning to go through it tomorrow and try to find more.
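
In case anyone wants to try the same check, here is a rough sketch of the idea rather than the exact script behind the gist. It uses plain Python with a hand-written Levenshtein distance, and it assumes each row has `id`, `name`, and `startDate` fields (`name` is my guess at the column name). The dates, the third row, the `close_pairs` helper, and the `max_distance` cutoff are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def close_pairs(rows, max_distance=5):
    """Return sorted (distance, id_a, id_b) for same-startDate names within max_distance."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["startDate"]].append(row)
    pairs = []
    for group in by_date.values():
        # Only compare names that share a start date.
        for a, b in combinations(group, 2):
            d = edit_distance(a["name"], b["name"])
            if d <= max_distance:
                pairs.append((d, a["id"], b["id"]))
    return sorted(pairs)


# Ids 1511/1517 and their names are from the post; dates and id 9999 are placeholders.
rows = [
    {"id": 1511, "name": "Daytona - Three on One Matchups - Group F", "startDate": "2000-01-01"},
    {"id": 1517, "name": "Daytona 500 - Three on One Matchups - Group F", "startDate": "2000-01-01"},
    {"id": 9999, "name": "Some Other Race - Group F", "startDate": "2000-01-02"},
]
print(close_pairs(rows))
```

Grouping by startDate first keeps the number of comparisons down, since only tournaments on the same date are paired up. The cutoff would need tuning against the real data, since a small value can miss longer aliases and a large one will flag legitimately different groups.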