Open Side Menu Go to the Top

11-01-2015 , 09:00 PM
Quote:
Originally Posted by blackize5
Yeah I checked out wiki and another article and couldn't get a clear answer.

Guess it's "contested"
Yeah, it's controversial, but suppose this:

you are making a list of famous singers.

Dave Matthews
Michael Skype
Madonna
Cher

If you follow the rule entirely, you end up with two tables, one called two_name_singers and another called single_name_singers.

Next, you are handed:
John Cougar Mellancamp

I guess you now need a table called singers_with_middle_names

Next, you want to make a table called bands_singers, which expresses the bands each singer is in (they can be in more than one band).

You can't FK from one table to three tables. Not only that, you are bound to end up with singer_id duplication, so good luck joining across tables.

You *could* make a table of singers that has a column of singer_id, singer_name, and name_type (constrained to first, middle, last), and that keeps no nulls in the column and follows "the rule," but now you are stuck mapping each id to the correct combination of first, middle, and last, yuck.

I'm pretty sure you'd rather just have a table of singer_id, first_name, middle_name, last_name and would be fine to deal with nulls, yeah?

I consider using NULLs proper normal form as long as they aren't used in a PK, but I'm no expert on the gritty details of this stuff.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
11-01-2015 , 09:25 PM
Quote:
Originally Posted by daveT
Yeah, it's controversial, but suppose this:

you are making a list of famous singers.

Dave Matthews
Michael Skype
Madonna
Cher

If you follow the rule entirely, you end up with two tables, one called two_name_singers and another called single_name_singers.

Next, you are handed:
John Cougar Mellancamp

I guess you now need a table called singers_with_middle_names

Next, you want to make a table called bands_singers, which expresses the bands each singer is in (they can be in more than one band).

You can't FK from one table to three tables. Not only that, you are bound to end up with singer_id duplication, so good luck joining across tables.

You *could* make a table of singers that has a column of singer_id, singer_name, and name_type (constrained to first, middle, last), and that keeps no nulls in the column and follows "the rule."

I'm pretty sure you'd rather just have a table of singer_id, first_name, middle_name, last_name and would be fine to deal with nulls, yeah?
And what about Latin singers, who could have like 5 or 6 names?

I think a better design would be:

Code:
Singers
-------
id
fullname_id

FullNames
-------
id

NameParts
---------
id
name
position

Name_NameParts
--------------
id
namepart_id
fullname_id
The names can then be reconstructed (fairly easily) using GROUP_CONCAT in MySql or the atrociously named XML_PATH in SqlServer.

You could also just stick "fullname_id" in NameParts if you don't care about the fully normalized version.

Is there still something comically complex about what we have to do to model names well in a DB? Yes. But that's because DB's force you to think at the level of implementation detail. But these details can be hidden easily enough behind a repository, so that you deal with it one time, and thereafter only have to deal with application code that looks like "user.full_name"....
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:26 PM
Inserting (1, 2, 3) and (1, 4, 3) on Name_NameParts, assuming 2 and 4 are the same names with different positions would create an anomaly, right?

Another way could be:

Code:
names
---------
name

singers
----------
id, name, position
pk(id, name, position)
fk(name) -> names(name)
Still the same problem, unfortunately. Can't really discount people using a name like "Joe Joe Joe" or whatever.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:32 PM
I've always felt like name should be one field and left at that. That's what we did and I think it works great except for a very small number of use cases (Salesforce for example needs to distinguish between the two).

And even then it should probably just be:

1. Full name
2. Addressable name (as in the name you use for addressing the person directly).
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:35 PM
The only thing you have to worry about is a name made up of parts with duplicated positions. The non-fully normalized version I suggested makes that easy to solve with indexes. I assume you could find a way to do it the fully normalized version too, though then the constraints span multiple tables.

In any case, you can enforce such rules at the application level too. Most interesting applications have rules whose complexity either cannot be expressed in most db software, or can only be expressed with enormous complexity. So I don't think it matters anyway, even though in this particular case you can solve it in the db pretty easily.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:37 PM
Quote:
Originally Posted by jjshabado
I've always felt like name should be one field and left at that. That's what we did and I think it works great except for a very small number of use cases (Salesforce for example needs to distinguish between the two).

And even then it should probably just be:

1. Full name
2. Addressable name (as in the name you use for addressing the person directly).
For most applications this is a good solution. But it all depends on your needs, so if we're discussing theory I just assume we're talking about a case where the need is there.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:41 PM
Fair enough.

The pragmatist part and programmer part (aka the irritating arguer part) in me wants to point out that there is never a need to store name parts like you're discussing.

Store the name and parse it on the other end of you want individual name parts.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:56 PM
You have to be sure the input is correct, which, as you said, would be solved at the coding level. You'd then be able to split on some symbol that isn't in any name in the entire world to break the name apart when you query from the db. Kind of trading one headache for another, IMO.

edit to add: maybe not trading headaches, really, unless you really want to be querying name parts, but I imagine most of us don't work in genealogy.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 10:59 PM
Quote:
Originally Posted by jjshabado
Fair enough.

The pragmatist part and programmer part (aka the irritating arguer part) in me wants to point out that there is never a need to store name parts like you're discussing.

Store the name and parse it on the other end of you want individual name parts.
If you need a first name you need to have some confidence that you can always either get a first name or nothing. Something that will keep you from saying "Hey The!" in your marketing emails.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-01-2015 , 11:13 PM
Quote:
Originally Posted by jjshabado
Fair enough.

The pragmatist part and programmer part (aka the irritating arguer part) in me wants to point out that there is never a need to store name parts like you're discussing.

Store the name and parse it on the other end of you want individual name parts.
I could add, "the unimaginative part" :P

How about names like "Mary Ann Smith" or "Joost Van Der Sloot"?

How about a big data application whose purpose is to find trends in billions of names, and needs fast sorting and counting on first, last, and middle, such that DB technology would succeeds where string parsing would become a performance bottleneck?

How about some other thing neither of us can even think of right now?

And if you're still not convinced it's possible, then just substitute "storing multiple names" for "storing multiple something else" with a parts-whole relationship, and a belongs-to relationship to some other thing.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 05:51 AM
I'm all for a single field for a name. There are too many edge cases, especially if you work for international clients (spanish people e.g. have two last names and commonly a middle name).

Quote:
Originally Posted by kerowo
If you need a first name you need to have some confidence that you can always either get a first name or nothing. Something that will keep you from saying "Hey The!" in your marketing emails.
Who puts "The" in their name field? The Great Gonzo?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 07:01 AM
More importantly, why is it of such significance to make no blunders while addressing someone in a marketing email?

I don't give a rip what people call me using a personal (as in "for me eyes only"), digital form of communication when the only interesting piece of information is the product/service they are trying to peddle.

Last edited by kazana; 11-02-2015 at 07:02 AM. Reason: just being facetious, obv
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:13 AM
Quote:
Originally Posted by gaming_mouse
And on the eighth day, he rested, and gazed upon his extra bit, and knew that it was good.
SQL dayofweek is 1-based, tho. 0th bit straight doesn't exist, homie.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:24 AM
Quote:
Originally Posted by jjshabado
I've always felt like name should be one field and left at that. That's what we did and I think it works great except for a very small number of use cases.


Seriously, I can't believe you posted that. Real talk, my respect for you has dropped a bit.

Just as one example, if you want to list people as "Surname, FirstName" which you do, like all the time, you can't do it because people have names like "Robert Van Winkle Junior" where you can't tell where the surname is.

I'm guessing you work in some field where people's names are tangential. I worked for a couple years with a company that did software for schools, where names are everything. Doing it like you suggest is the beginning of a descent into madness.

Never violate 1NF. I regret doing it for the days of week thing. Me doing that actually caused a production bug in that system, predictably enough.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:27 AM
Quote:
Originally Posted by gaming_mouse
For most applications this is a good solution.
It never is. Best case it obviates the need for like 5 seconds worth of string concatenation code. Worst case it makes it impossible to do what you are trying to do with names.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:28 AM
Quote:
Originally Posted by Wolfram
I'm all for a single field for a name. There are too many edge cases, especially if you work for international clients (spanish people e.g. have two last names and commonly a middle name).


Who puts "The" in their name field? The Great Gonzo?
The Andersons etc, see it fairly often in utility user data.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:30 AM
Quote:
Originally Posted by kazana
More importantly, why is it of such significance to make no blunders while addressing someone in a marketing email?

I don't give a rip what people call me using a personal (as in "for me eyes only"), digital form of communication when the only interesting piece of information is the product/service they are trying to peddle.
We have a customer who expects us to re-run 300,000 paper reports to filter out a handful of minor errors... Fortunately it usually happens before we send the reports to the printer.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:39 AM
Quote:
Originally Posted by kerowo
If you need a first name you need to have some confidence that you can always either get a first name or nothing. Something that will keep you from saying "Hey The!" in your marketing emails.

My point is more that the information you need is never (?) the 'first name'. That's a Western approximation for 'casual addressable name'. So store that. Store the name values that relate to your use case.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:43 AM
Quote:
Originally Posted by ChrisV




Seriously, I can't believe you posted that. Real talk, my respect for you has dropped a bit.



Just as one example, if you want to list people as "Surname, FirstName" which you do, like all the time, you can't do it because people have names like "Robert Van Winkle Junior" where you can't tell where the surname is.



I'm guessing you work in some field where people's names are tangential. I worked for a couple years with a company that did software for schools, where names are everything. Doing it like you suggest is the beginning of a descent into madness.



Never violate 1NF. I regret doing it for the days of week thing. Me doing that actually caused a production bug in that system, predictably enough.

I worked on educational software for years. If sorting by surname is your use case, then store that. I clearly said there are cases where it's important. But there's no need to separate out each part of the name because there's no use case for that.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:46 AM
Quote:
Originally Posted by jjshabado
My point is more that the information you need is never (?) the 'first name'. That's a Western approximation for 'casual addressable name'. So store that. Store the name values that relate to your use case.
I think storing data based on use cases is invariably a really, really bad idea. If you have "addressable name", store that, but store it in addition to full name.

The ideal with data storage is to store data in as much detail as it is possible to gather, in normalised form. In my experience cutting corners on this saves trivial amounts of time and can lead to huge headaches later on.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:46 AM
Quote:
Originally Posted by gaming_mouse
I could add, "the unimaginative part" :P



How about names like "Mary Ann Smith" or "Joost Van Der Sloot"?



How about a big data application whose purpose is to find trends in billions of names, and needs fast sorting and counting on first, last, and middle, such that DB technology would succeeds where string parsing would become a performance bottleneck?



How about some other thing neither of us can even think of right now?



And if you're still not convinced it's possible, then just substitute "storing multiple names" for "storing multiple something else" with a parts-whole relationship, and a belongs-to relationship to some other thing.

What about those names? My point isn't that people don't have weird names. It's that your use case is almost never the parts of a name. Parts of a name are highly correlated with our use cases (like first name being the casual name you use when addressing the person) but not perfectly correlated.

You shouldn't run a big data application like that against a db. So not a problem!
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:47 AM
Quote:
Originally Posted by ChrisV
I think storing data based on use cases is invariably a really, really bad idea. If you have "addressable name", store that, but store it in addition to full name.



The ideal with data storage is to store data in as much detail as it is possible to gather, in normalised form. In my experience cutting corners on this saves trivial amounts of time and can lead to huge headaches later on.

Yes. I said to store full name as well. That's the one field that should always be there because it's the actual raw data.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 08:49 AM
Quote:
Originally Posted by jjshabado
I worked on educational software for years. If sorting by surname is your use case, then store that. I clearly said there are cases where it's important. But there's no need to separate out each part of the name because there's no use case for that.
Designing your system based on use cases is short-sighted. If you can easily gather surname and given names independently, store them that way regardless of whether there's any reason to do so. Requirements change and you might need access to surname in the future. If you CAN store it separately, there's virtually no downside to doing so and huge potential future upside.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 09:21 AM
But there are significant downsides when it comes to internationalization.

I was just giving my personal hot take on the issue - but looking it up, it seems like I'm not crazy: http://www.w3.org/International/ques...personal-names

Quote:
In some cases you want to identify parts of a name so that you can sort a list of names alphabetically, contact them, etc. Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
Obviously a lot of this has to do with how data is coming into your system. But in most cases the person designing the database has that control (directly or indirectly).

If you're doing something like importing a list of kids from a school, then maybe you need to do something different. But I'd still say there's a compelling argument for keeping the core use cases of your database identified (typically sorting and addressing) and doing the mapping of incoming data to use case once at import. (Edit: Particularly if you can do that mapping without destroying any data)
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-02-2015 , 09:24 AM
Quote:
Originally Posted by jjshabado

You shouldn't run a big data application like that against a db. So not a problem!
To avoid future getting yelled at - I'm assuming we're limiting the scope of this discussion to a transactional database where we care about normalization.

Obviously you could run a big data application off of a reporting database - but then this whole conversation is pointless because we don't want normalization. And the correct answer here is probably to denormalize the name field into each of the fields you might want to query (full, first, last, parts, etc.).
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

      
m