Open Side Menu Go to the Top

11-15-2013 , 06:33 PM
_dave_, nice post
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
11-15-2013 , 06:40 PM
Quote:
Originally Posted by Nchabazam
There are definitely some fields I could condense, but I think it suffers from making it harder to understand for newer devs.
i think your instinct is right: unless you have compelling reason not to, it's better to err on the side of less cognitive load.

having said that, you should make sure the fields of the form all correspond to a single conceptual entity, as they often won't and forms don't always map 1:1 to db tables. if you were collecting my name, email, address, and past employers, for example, storing them all in a single table would be a bad idea.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 07:30 PM
I think the issue that a lot of people face when dealing with databases is that it is so dependent on your app (i.e. how you access and use the data).

Database design is crucial to the performance of you application and is almost impossible to change later on. You need to think carefully about what data you index, and how you split your tables.

Regarding the number of columns, when you run a query, the database loads the affected rows into memory. So, if you expect to be running queries that return a ton of rows, it makes sense to keep the number of columns in that table small (this is likely the issue that _dave_ is having with PT). However, if your queries are going to be returning a couple of rows max at a time, having 25 columns will not cause a big memory footprint and might actually speed up the query due to the overhead of creating joins.

Some great points are coming out of this discussion, but please think very very carefully about how you will access/use your data before making a decision.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 08:14 PM
I just want to throw in that with decent technologies and tools it's not too bad to change an existing schema.

If down the line you think your data needs to be modeled a different way then you can just batch export/import records to the new format. I can't imagine this not working unless you had millions upon millions of records and then it would still work but you would probably want to run this task on a dedicated machine.

Edit:
To attach some #s. On an amazon ec2 micro instance it takes about 15 seconds to delete + repopulate a new elastic search index with a few thousand blog-type documents. It takes quite a bit longer using postgres through rails but it has nothing to do with either of those being slow, it's just creating a ton of queries because I'm doing it individually and going through all of activerecord's hoopla.

With bulk operations and no callbacks you could likely get this done in seconds too with direct queries and since you're migrating old data to new data you know your data is accurate so that would be perfectly reasonable.

Last edited by Shoe Lace; 11-15-2013 at 08:27 PM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 08:24 PM
<cough>Mongo*</cough>



* Or equivalent...
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:13 PM
Quote:
Originally Posted by Shoe Lace
I just want to throw in that with decent technologies and tools it's not too bad to change an existing schema.

If down the line you think your data needs to be modeled a different way then you can just batch export/import records to the new format. I can't imagine this not working unless you had millions upon millions of records and then it would still work but you would probably want to run this task on a dedicated machine.

Edit:
To attach some #s. On an amazon ec2 micro instance it takes about 15 seconds to delete + repopulate a new elastic search index with a few thousand blog-type documents. It takes quite a bit longer using postgres through rails but it has nothing to do with either of those being slow, it's just creating a ton of queries because I'm doing it individually and going through all of activerecord's hoopla.

With bulk operations and no callbacks you could likely get this done in seconds too with direct queries and since you're migrating old data to new data you know your data is accurate so that would be perfectly reasonable.
I don't think they mean the database itself so much as the program logic built on top of it.

It doesn't hurt to learn a bit of Boyce-Codd in regards to modeling a database. It isn't always logical and obvious. I think that using the algorithm for BCNF and other forms helps mitigate much of what _dave_ is describing.

I've built 3 significant databases now and I honestly still feel like an utter idiot. I think that, especially in this day and age of every single website being db-backed, it is shocking how little weight is put into getting a legit DBA.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:32 PM
i'm tired and it's friday so i'ma skip over your claim that your data model is so whatever that it needs tables with 25+ columns. it seems impossible to me but perhaps you are the exception that proves the rule.

the thing i'm really after is this:

Quote:
Originally Posted by splashpot
my company is a pretty large successful e-commerce company and it's working for us.
i have worked at a few "pretty large successful e-commerce companies" -- some you would have heard of -- and just because you can torture some absurd data model that some dumbass wrote in 2003 the day before they needed a demo so they could get seed money instead of dying DOES NOT MEAN that it's a good or even non-terrible idea to do so.

designs are good or bad because of how fit they are for a given purpose, how easy they are to reason about and modify, how well they perform, or dozens of other things. they are not good or bad because someone at $LARGE_FAMOUS_COMPANY did it that way one time.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:35 PM
and to be clear this has nothing to do with data models, really. people do all kinds of dumb **** all the time, even at your googles and apples and microsofts. flickr ran their whole stack on ruby on rails and a single mysql instance until an absurdly late point in their growth. "it worked for them", so that model is totally cool right?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:44 PM
Quote:
Originally Posted by tyler_cracker
and yet you didn't write a single comment anywhere else. also, that comment displays the exact opposite of "an understanding".
I hadn't written a single comment elsewhere because I copy and pasted the basic scanner stuff from a previous program and I usually do comments once the program is complete (would obv do it as I go along with more complex stuff).

What's with the attitude? Are you socially ******ed?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:50 PM
It seems like it might be entertaining to explain my favorite bad table design story, although it doesn't involve large numbers of columns at all. (Or it may not be entertaining, but rather just Tl;DR. Sorry in advance :P)

It involves a table that stores time intervals that are used for things like business hours. The user interface allows you to create intervals where you specify

- the day of the week
- the start time (in hours and minutes as separate inputs, using 24 hour time i.e 14:30)
- the end time (ibid)

Pretty simple right?

Well, the db table has columns that look like this

columntypedetails
dayintvalues 0-7 i.e sunday-saturday
startdatetimethe value is always 2000-01-01 hh:mm:00
enddatetimewhere the hh:mm come from the hours/minutes inputs

So it turns the hours minutes into datetimes using Jan 1, 2000

The problem with this is that the db layer of the application automatically skews datetimes when they are inserted/selected from the db based on the difference between the timezone of the user and the timezone of the db server. The code that actually uses the intervals to perform calculations reads them from the db, where the times are skewed back to local, and extracts the hours, minutes. And keep in mind there is a separate day of the week column.

So the reason this ends up being horrifyingly obnoxious is that there are contexts in which different users are editing these intervals from different time zones, or when the timezone of the account changes after they've created the intervals. Since the data in the db is relative to the timezone it was entered from, that effectively changes the values without actually touching them. or to be more accurate, the problem is that the UI makes it seem like you're entering values that are relative to whatever the timezone configuration is set to (you enter 8pm, you don't enter 8pm GMT) but in reality what gets entered is an absolute time based on the current timezone. It also causes fun things when the timezone changes cause an interval that was entirely within a single day in timezone A, now suddenly starts one day and ends the next, in timezone B.

This problem has existed in this particular application for like ~10 years and rather than fix the db (it should just store int day, int hours, int minutes) what happened instead is I wrote a ton of code in like 2009 to "normalize" intervals that get out of whack due to being edited from "the wrong timezone" and basically told the users to not do things that should intuitively work based on the UI. It's hilariously terrible.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:55 PM
Quote:
Originally Posted by well named
- the start time (in hours and minutes as separate inputs, using 24 hour time i.e 14:30)
i already hate it.

Quote:
The problem with this
a bold claim!

that's a pretty impressive system -- write it up with a little more drama and it's a slamdunk for tdwtf.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 10:57 PM
I take your point :P

the particularly amusing problem...
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-15-2013 , 11:07 PM
Dont let that be a habit though (commenting at the end). I totally developed that as a habit and it really sucks to have to go back through hundreds-thousands of lines of code and add applicable comments. Learn from my mistakes!!!
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 02:24 AM
well_named, i've experienced the opposite problem with some inherited code, where timestamps are stored in the database in local time and it's an absolute disaster and makes everything ten times harder (even worse, it's not 100% consistent and some tables store things in UTC). i think in general always storing datetime in the db as UTC is good practice and makes thing *much* easier. the displaying of date in local time is really a "view" concern.

your case is obviously a different beast.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 02:27 AM
Quote:
Originally Posted by tyler_cracker
flickr ran their whole stack on ruby on rails and a single mysql instance until an absurdly late point in their growth. "it worked for them", so that model is totally cool right?
Flickr never ran on Ruby on Rails. Is it possible you're thinking of Twitter?

(Having said that, your point is definitely valid and I'm sure there are many other things in Flickr's history that would illustrate it nicely...)
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 02:53 AM
yeah i guess i'm thinking of some other photo sharing site's architecture . i recall a very specific set of slides but it turns out that googling for "flickr stack" and "flickr architecture" and "flickr slides" turns up a bunch of things which are not what i want. i found one of panchenko's talk from qcon but it's not what i'm thinking of.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 04:35 AM
Quote:
Originally Posted by tyler_cracker
i'm tired and it's friday so i'ma skip over your claim that your data model is so whatever that it needs tables with 25+ columns. it seems impossible to me but perhaps you are the exception that proves the rule.

the thing i'm really after is this:



i have worked at a few "pretty large successful e-commerce companies" -- some you would have heard of -- and just because you can torture some absurd data model that some dumbass wrote in 2003 the day before they needed a demo so they could get seed money instead of dying DOES NOT MEAN that it's a good or even non-terrible idea to do so.

designs are good or bad because of how fit they are for a given purpose, how easy they are to reason about and modify, how well they perform, or dozens of other things. they are not good or bad because someone at $LARGE_FAMOUS_COMPANY did it that way one time.
Lol chill out dude. I already said I'm not qualified to judge whether or not what my company is doing is good, bad, or terrible.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 10:15 AM
Quote:
Originally Posted by daveT
I don't think they mean the database itself so much as the program logic built on top of it.
Yeah maybe. I find myself adding or removing small amounts of columns on existing tables pretty often and those changes are really not bad at all at the app level.

I never had to transform a 175 column monster table into 25 highly normalized tables. I imagine the work required would be easier than building the 25 highly normalized tables from the start though.

It's usually much easier to change code once you know some version works and you have the tests to back it up. If at ground 0 and had the choice, I'd aim for the better setup of course but I wouldn't spend months trying to find the "perfect" schama. At some point I'd try to settle on something reasonable and go from there.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 11:37 AM
Quote:
Originally Posted by Shoe Lace
I never had to transform a 175 column monster table into 25 highly normalized tables. I imagine the work required would be easier than building the 25 highly normalized tables from the start though.
My guess is that if they have a 175 column monster table they don't have a nice service abstraction and they have at least half a dozen (and maybe an order of magnitude more) separate clients all accessing the tables.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 12:40 PM
Quote:
Originally Posted by splashpot
I already said I'm not qualified to judge whether or not what my company is doing is good, bad, or terrible.
sure, but in that very same sentence you went for APPEAL TO PRETTY LARGE SUCCESSFUL COMPANY.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-16-2013 , 06:35 PM
Quote:
Originally Posted by Shoe Lace
Yeah maybe. I find myself adding or removing small amounts of columns on existing tables pretty often and those changes are really not bad at all at the app level.

I never had to transform a 175 column monster table into 25 highly normalized tables. I imagine the work required would be easier than building the 25 highly normalized tables from the start though.

It's usually much easier to change code once you know some version works and you have the tests to back it up. If at ground 0 and had the choice, I'd aim for the better setup of course but I wouldn't spend months trying to find the "perfect" schama. At some point I'd try to settle on something reasonable and go from there.
With a decent schema, It isn't too hard to change things up a bit.

FWIW, the normal form algorithms start with a mega relation and then works out how to deconstruct the data into a "proper" schema.

http://infolab.stanford.edu/~ullman/...reldesign.html

https://www.youtube.com/watch?v=eMnWvJk0guI

https://www.coursera.org/course/db

Quite interesting stuff, actually. It is very close to what you learned about in SICP irt to working with "wishful" thinking, but explores how to actually create what you are wishing for, and shouldn't take months... :[
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-17-2013 , 12:18 AM
Quote:
Originally Posted by _dave_
25 columns probably not a big deal, but it's never a bad idea to think about your structure if you have the time.

One example that's always grinded my gears is PokerTracker 3/4, which if you've ever used you'll know can get horrendously slow at most operations after the hand count gets moderate, when using mechanical disks. Imo a great deal of this is due to the size of the tables, which can end up needing to read multiple GB of data from disk, which is slow.

Look at the structure of one of the primary tables, "tourney_hand_player_statistics" - a 237 column monstrosity. Observing it in PGAdmin you can see it is broadly split up in to 4 areas - preflop, flop,turn and river. A quick query run on my database (mostly HU! it's far worse in other games where a player may have a much lower VPIP) suggests only 42% of hands ever see a flop, and then <30% for turn and river. So here is a fine example - the vast majority of the many gigabytes read for common queries is just "couldn't happen" - irrelevant, completely unneccesary storage. There is no reason to store the effective stack size on the river for a player who folded preflop, for example.

So, ask yourself - are there columns in my 25 that rarely contain data? if so, maybe think about breaking that out in to another table.

edit: of course splitting this example in to seperate tables makes it no longer so easy to query "average saw river", joins (or trickery) will be needed - which is a tradeoff that may or may not be worth making, as that'll impact performance too.
Just to give another data point, I've experienced the opposite problem with HM1. My custom report was taking ~2 mins to run, and I realized that if I removed just a few columns from the report, avoiding joins to certain tables, it would be done in ~ 5 secs.

The street-specific information is stored on individual tables FLOP, TURN, RIVER, which makes sense, but the primary table is split into two, a KEY_COLUMNS table and a MISC table with the same number of rows (a lot of rows - one row for each player-hand combo). The tables have about 40 and 25 columns respectively. My report just needed a single column from the MISC table, but it would add 90 secs to the query, because it was joining the two biggest tables in the db.

Note that joining the main table with say, the RIVER table didn't take as long since the RIVER table is much smaller, and that design also saves a lot of space.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-18-2013 , 02:19 PM
has anyone used http://deployd.com/?

it looks cool, but i'm wondering how fast and stable it is

EDIT: upon further investigation, the project appears dead

Last edited by gaming_mouse; 11-18-2013 at 02:44 PM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-18-2013 , 08:38 PM
If you're a mod here you can select all the deleted threads on the first page and merge them into one to get rid of them...
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
11-18-2013 , 09:10 PM
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

      
m