I have kind of an interesting anecdote/thought about SQL that I just wrestled with most of the afternoon.
tl;dr - Your first impulse with SQL is probably wrong (but you should probably try it anyway)
I had 2 tables, something like
Code:
create table outer (
outer_id integer primary key,
thing varchar(100)
unique index(thing)
)
create table inner (
inner_id integer primary key,
outer_id integer references outer(outer_id)
thing2 varchar(100)
unique index(outer_id, thing2)
)
So the basic idea is, you can only have 1 row for each "thing" in the outer table. Each row in the inner table references a row in the outter table, and you can have just one row per outer_id+thing2. It doesn't really matter what they are but think of it like may outter is a list of animals and inner is a list of body parts or something like that.
I have a bunch of pairs like
('thisthing1', 'thisthing2)
and I need to put them all into these 2 tables as is appropriate.
My first thought, which I assumed would suck (but which I did anyway because it might be good enough and it was easy) was, to step through each thing I wanted to insert and say
1. do I already have an entry in outer? If so, id = that entry, else, insert entry and get id from that
2. insert into inner using this id, ignoring/throwing away any duplicates
This sucked. So I tried changing it so that I replaced 2 with
2. do we already have this entry in inner? If so, skip, else insert
This really sucked, even worse.
So finally I decided to do something like this
1. do a bulk select from my outter tablethat gets me all the entries from the database where thing1 is from my big list.
2. do a bulk insert of any of these that aren't in my list
3. do #1 again (because I need to get the IDs of all the inserted rows)
4. do a bulk select from my inner table similar to #1
5. do a bulk insert of anything that's missing
This is from memory but that's the basic idea. The code is longer, it's somewhat harder to read and understand.
It's 100x faster, too.