The problem is multi-fold. The least of it is that no one at my job, despite building their own computers and being pretty darn good at Excel, is technical beyond that point.
I have to maintain and track a crap-ton of product listings: over 200,000 individual listings and about 25,000 photos. The tools of this trade? You guessed it: Excel and FTP. Yeah, it is as miserable as it sounds. FileZilla is a roaring piece of trash.
The crux of it all is that some company everyone here has heard of likes to use flat files. Whoever created those things... /rant. The plus side is that they aren't strictly required for my day-to-day use, but I still have to stick with the template files. I have 4 different files that have to be properly filled out -- unless you like reading indecipherable error messages -- and they each have over 200 columns. The counts, as best I remember, are 212, 208, 214, and 208 columns.
And finally, as I'm sure jj can attest, once an Excel file gets over 100,000 cells, things start going sideways, which means freeze-ups, data corruption, hours of lost work, phantom values, etc., etc.
This whole issue is compounded by the fact that I only need to fill out 60 to 70 columns on each template file, so I have a TON of blank space. As an aside, I get a decent amount of entertainment watching other people try to use these files. First instinct, of course, is to grab a mouse and use that little bar on the bottom right. I'm really good at navigating these things right now, but that's because I sort of remembered the columns I often need and use ctrl-F. My boss was watching me and he says: "Why don't you use a mouse? It's so much faster."
So, in order to keep my sanity, I started working on the database. V1 was a steaming pile of trash: it was simply two tables, both of the same category -- one at 214 columns, the other at 210 -- and it was designed to be select * and hop along. Unfortunately, I was trying to update the 210-column table from the 214-column table. That didn't go well at all. I let the update run for 90 minutes and it just didn't happen. The results reflected the 30 minutes of effort I put into doing the database, so no loss.
Since there are two categories I work with (let's pretend they're cups and shirts), and I have 4 tables, they can be paired into fraternal twins: they even share the same SKUs. Unfortunately, these pairs are only about 80% the same, and the columns are in entirely different orders. This is sort of a good thing but sort of a bad thing, as v1 made obvious.
The first division is deconstructing the two classes of tables into pairs; then, knowing that they mostly share the same information, I can deconstruct those into smaller tables. The smaller tables are:
Dynamic -- Values that are fairly distinct to each SKU, but don't change that often. ex. product description.
Evergreen -- Values that are common and will never change. ex. where do I work again?
Null -- It's just a place-holder for all of the columns that are normally NULL, so it is really just a huge blank table with a column of skus.
With this collection of tables, I can now write a view that extracts the data that is needed and pushes out the two template files I need, in the correct order.
I can break this down one more step. Despite there being so many listings, there aren't that many model numbers, so there is no need to keep 200,000 rows of sku & pictures. If I can break the skus down into model numbers, I can have a model_number -> pictures relation, and then have a sku -> model_number lookup table (LUT). The pictures table will only have about 1,000 rows. The LUT will be similar, so I have:
Code:
dynamic (sku, product_name, a, b, c, d...)
Code:
evergreen (sku, info1, info2, info3...)
Code:
null (sku, nv1, nv2, nv3...)
Code:
pictures_via_sku (sku, pic1, pic2, pic3...)
Code:
pictures_via_model_number (model_number, pic1, pic2, pic3...)
-- unfortunately, I don't have all the model numbers prepared yet, but this will be a HUGE win once it is completed. The pictures, names, and paths change constantly --
Code:
sku_model_lookup_table (sku, model_number)
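As a sketch of why the LUT is such a win: ~1,000 picture rows can serve every SKU once the lookup is in place. The table and column names below are trimmed-down stand-ins for the schemas above, and I'm assuming SQLite just for illustration, not whatever engine you actually run:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Minimal stand-ins for the two tables described above:
cur.execute("CREATE TABLE pictures_via_model_number "
            "(model_number TEXT PRIMARY KEY, pic1 TEXT)")
cur.execute("CREATE TABLE sku_model_lookup_table "
            "(sku TEXT PRIMARY KEY, model_number TEXT)")

# One picture row can back any number of SKUs:
cur.execute("INSERT INTO pictures_via_model_number VALUES ('M-100', 'cup_front.jpg')")
cur.executemany("INSERT INTO sku_model_lookup_table VALUES (?, ?)",
                [("SKU-1", "M-100"), ("SKU-2", "M-100"), ("SKU-3", "M-100")])

# Resolve pictures per SKU through the LUT:
rows = cur.execute("""
    SELECT l.sku, p.pic1
    FROM sku_model_lookup_table AS l
    JOIN pictures_via_model_number AS p USING (model_number)
    ORDER BY l.sku
""").fetchall()
print(rows)  # every SKU picks up the single shared picture row
```

When the pictures, names, and paths change, only the ~1,000-row model table gets touched; the 200,000 SKU rows never move.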
The PKs are all bound to the SKU. There is actually a serial counter and a sub-class column on the dynamic table, which precede the SKU in the order, so the dynamic table looks like this. The counter serves no purpose but to sort the products, since order does matter. The ptype simply serves as a simple way to sub-select. I could just as well do a LUT on product types too... but YAGNI:
Code:
dynamic (pnumber, ptype, sku, product_name, a, b, c, d...)
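Since pnumber exists only for sort order and ptype only for sub-selecting, the typical pull against the dynamic table is just a filter plus an ORDER BY. A sketch with made-up rows (again assuming SQLite for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE dynamic "
            "(pnumber INTEGER PRIMARY KEY, ptype TEXT, sku TEXT UNIQUE, product_name TEXT)")
cur.executemany("INSERT INTO dynamic VALUES (?, ?, ?, ?)", [
    (2, "cup",   "SKU-B", "Blue Cup"),
    (1, "cup",   "SKU-A", "Red Cup"),
    (3, "shirt", "SKU-C", "Red Shirt"),
])

# ptype sub-selects the category; pnumber keeps the listing order stable.
cups = cur.execute(
    "SELECT sku, product_name FROM dynamic WHERE ptype = 'cup' ORDER BY pnumber"
).fetchall()
print(cups)  # [('SKU-A', 'Red Cup'), ('SKU-B', 'Blue Cup')]
```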
finally:
Code:
view1 on
select
sku, a, b, c, d, e, f....
from all_them_tables
Code:
view2 on
select
sku, d, a, c, f, g, r, info1....
from all_them_tables
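Concretely, the two views are the same joins with the column list spelled out in each template's order. A sketch with placeholder columns (the real lists obviously run 200+ wide, and I'm using SQLite here purely for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dynamic   (sku TEXT PRIMARY KEY, a TEXT, b TEXT);
CREATE TABLE evergreen (sku TEXT PRIMARY KEY, info1 TEXT);
INSERT INTO dynamic   VALUES ('SKU-1', 'alpha', 'beta');
INSERT INTO evergreen VALUES ('SKU-1', 'acme co');

-- Same joins, different column order per template:
CREATE VIEW view1 AS
  SELECT d.sku, d.a, d.b, e.info1
  FROM dynamic AS d JOIN evergreen AS e USING (sku);

CREATE VIEW view2 AS
  SELECT d.sku, d.b, e.info1, d.a
  FROM dynamic AS d JOIN evergreen AS e USING (sku);
""")

v1 = cur.execute("SELECT * FROM view1").fetchall()
v2 = cur.execute("SELECT * FROM view2").fetchall()
print(v1)  # [('SKU-1', 'alpha', 'beta', 'acme co')]
print(v2)  # [('SKU-1', 'beta', 'acme co', 'alpha')]
```

The views are where the "entirely different column order" problem dies: the tables stay normalized, and each template's ordering lives in exactly one SELECT list.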
and that is the intended structure... The LUT will be quite valuable for many other areas I have to work with, but hopefully this gives a decent overview of the situation. Oh, right, did I forget to mention that I have to do the above 2x?
Interestingly, the view generates the template file I need in about 17 seconds; the huge monster relation took up to 3 times as long. The problem is that a full export still takes about 5 minutes, which I guess is a habit I can easily live without.
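For the export itself, dumping a view straight to a delimited file is just a loop over the cursor. A sketch assuming SQLite plus Python's csv module (swap in whatever your engine's COPY/export equivalent is; the StringIO is a stand-in for a real output file):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dynamic (sku TEXT PRIMARY KEY, product_name TEXT);
INSERT INTO dynamic VALUES ('SKU-1', 'Red Cup'), ('SKU-2', 'Blue Cup');
CREATE VIEW view1 AS SELECT sku, product_name FROM dynamic ORDER BY sku;
""")

# Stream the view out row by row -- no 200-column spreadsheet required.
buf = io.StringIO()  # stand-in for open('template1.txt', 'w', newline='')
writer = csv.writer(buf, delimiter="\t")
rows = cur.execute("SELECT * FROM view1")
writer.writerow(d[0] for d in rows.description)  # header from cursor metadata
writer.writerows(rows)
print(buf.getvalue())
```

Since the cursor streams, memory stays flat no matter how many listings go out; at that point the 5 minutes is mostly disk and driver overhead rather than anything you can refactor away.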