** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

07-26-2012 , 12:21 PM
Seems to make sense; this is my lack of experience with non-relational databases coming through, I guess!

edit: So would a non-relational database look something like:

Seat1Username | Seat2Username .... Seat10Username | ButtonUsername | UTGUsername .... COUsername | FlopPotSize | TurnPotSize | RiverPotSize | WinnerUsername

etc etc? I can see that this would be easy to generate reports on, but you'd have to have a lot of columns (is this the right sort of design?)

I can see the benefit of that, but wouldn't very basic relationships make this design better? Instead of a username in each field, have a userID. That would possibly make querying faster (I'm assuming WHERE userID = int is quicker than WHERE username = string) and would make the DB size smaller, etc.

Perhaps a heavily denormalised relational database only with basic relationships would be the best design?
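For illustration, here's a minimal sketch of that idea in sqlite3: a heavily denormalised hands table that still keeps one basic relationship (integer userIDs via a users table). Table and column names are hypothetical.

Code:
# Hypothetical sketch: a heavily denormalised hands table that keeps
# one basic relationship (integer userIDs instead of username strings).
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE users (
    userID   INTEGER PRIMARY KEY,
    username TEXT UNIQUE
)''')
conn.execute('''CREATE TABLE hands (
    handID       INTEGER PRIMARY KEY,
    buttonUserID INTEGER REFERENCES users(userID),
    winnerUserID INTEGER REFERENCES users(userID),
    flopPotSize  REAL,
    turnPotSize  REAL,
    riverPotSize REAL
)''')
conn.execute('CREATE INDEX idx_winner ON hands(winnerUserID)')

# A report then joins once on an indexed integer column rather than
# comparing username strings on every row.
report = conn.execute('''
    SELECT u.username, COUNT(*), SUM(h.riverPotSize)
    FROM hands h JOIN users u ON u.userID = h.winnerUserID
    GROUP BY u.username
''').fetchall()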

Last edited by Gullanian; 07-26-2012 at 12:28 PM.
07-26-2012 , 12:28 PM
Just wanted to chime in on the JSON and noSQL discussion.


-I'm a statistician currently using CouchDB, a document database, for a research project. It is definitely what you want to use for statistical analysis software

-I was an online professional poker player for 6 years

-The reason to use CouchDB is MapReduce. A map is just a function that visits all your documents and emits key/value pairs, which CouchDB indexes into a view

-I have not yet tried Mr_Wooster's JSON parser but will do so. I saved many of my hand histories from my years of play (1 million Pot Limit Omaha hands, and 100,000 hands each from 7 other games)

-With Couchdb you can just write Map and Reduce queries in Javascript

-Couchdb takes up 10X the storage space of MySQL, and queries are slow the first time they run, but fast on every run after that

Here's an example from my current database

Code:
function(doc) {
    // Map function: for each company document that has an acquisition
    // price, emit one row per office location.
    if (doc.permalink && doc.acquisition && doc.acquisition.price_amount && doc.offices) {
        doc.offices.forEach(function(office) {
            if (office.longitude && office.latitude) {
                emit(doc.permalink,
                     doc.acquisition.price_amount + '\t' +
                     doc.acquisition.acquired_year + '\t' +
                     office.longitude + '\t' + office.latitude);
            }
        });
    }
}
This creates a key/value view. Then in Python you can:

Code:
import couchdb

server = couchdb.Server('http://user:password@localhost:5984')
db = server['database_test']

# Returns the keys and values from the CouchDB view defined above.
company_location = db.view('company/name_location_acquired')
f = open('C:/name_location_acquired.txt', 'w')

for r in company_location:
    try:
        f.write('%s\t%s\n' % (r.key, r.value))
    except:
        # Skip any row that can't be encoded/written.
        continue

f.close()
-What noSQL can do is offer a flexible schema that can answer questions an RDBMS just couldn't

-The worrying thing is that most new technologies, in order to succeed in the market, have to be introduced as cheaper, target a group that is currently unserved, or arrive as a hybrid of the old and the new

-NoSQL in many ways is NOT CHEAPER (except maybe in development time)

-It will be more expensive in hard drive space and in learning time for users, but there may be functionality that can give insights a SQL database currently doesn't

-The solution to this may be a hybrid application that uses SQL for standard features and noSQL for new analysis features.
07-26-2012 , 12:39 PM
Quote:
Originally Posted by Gullanian
Correct me if I'm wrong, but I see no reason to use a non-relational database for storing HH's if the purpose is to generate reports/stats on those HH's.

To generate a simple report as to your profit in the dealer position would take a lot of time to generate on a non-relational database but a well designed relational database with indexes set up correctly would be easier to write queries for AND should be faster to actually calculate the data.

Why would you use a non-relational database for this task? It just looks like it would make everything harder and slower. The only benefit I can see is faster insertions? But I mean that's not really going to be much of an issue, in real time you might get up to 100 hands per hour @ 30 tables = 0.8 hands inserted per second. And if you're inserting bulk data from past sessions people are usually fine waiting for an hour or so for that.
Probably also easier to maintain since SQL can be a bitch at times.

Aaaand while we're on topic... if we're talking about a HUD, would it make any sense to use an in-memory DB and then write to an SQL DB after the session? I.e. session starts, get all player names, load that data into the in-memory DB, update the in-memory DB while playing, then write the session hands to the SQL DB... something like the sketch below.
Basically, remove SQL work while playing hands, except at startup/end of session.
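A minimal sketch of that idea with sqlite3 (hypothetical table names; assumes the on-disk file doesn't exist yet):

Code:
# In-memory DB during play, flushed to the on-disk SQL DB afterwards.
# Hypothetical table names; assumes sessions.db does not exist yet.
import sqlite3

session = sqlite3.connect(':memory:')
session.execute('CREATE TABLE hands (player TEXT, won INTEGER)')

# ... cheap in-memory inserts/updates while the session runs ...
session.execute("INSERT INTO hands VALUES ('villain1', 0)")

# End of session: dump everything to disk in one pass.
disk = sqlite3.connect('sessions.db')
disk.executescript('\n'.join(session.iterdump()))
disk.close()

The obvious trade-off: if the process dies mid-session, the in-memory hands are gone, so you'd probably want periodic flushes too.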
07-26-2012 , 12:41 PM
Quote:
Originally Posted by jjshabado
I agree with your general point, but I don't think it's quite as strong as you say. There are times where you'll want to do small joins, and that's totally reasonable and acceptable.

For example, it could be totally reasonable to have a HH collection and a player collection. If you have a query where you want to calculate some statistics about a small group of players (like those at your table) you would be fine. You could look up the player data, query the HH for the players you want doing whatever super complex stuff you want, then 'join' the data in memory where you match the players to the appropriate results.
Ok... but in your example you are running 2 queries on a NoSQL database: 1) get the player data, 2) query for the HHs which match the player data. In SQL you could do this in 1 query.

In this case there is probably not much between the two, but if you had another collection called stats, and another called sessions, etc., suddenly you need 5-6 separate noSQL queries and the performance impact starts to add up.
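For reference, a rough sketch of the "query twice, join in memory" pattern jjshabado describes, using hypothetical pymongo collections and field names:

Code:
# "Query twice, join in memory": hypothetical pymongo collections and
# field names, purely for illustration.
from pymongo import MongoClient

db = MongoClient()['poker']

# Query 1: the players at your table.
at_table = ['hero', 'villain1', 'villain2']
players = {p['name']: p for p in db.players.find({'name': {'$in': at_table}})}

# Query 2: their hand histories, aggregated however you like.
winnings = {}
for hand in db.hands.find({'player': {'$in': list(players)}}):
    winnings[hand['player']] = winnings.get(hand['player'], 0) + hand.get('won', 0)

# The "join" happens in memory, matching players to their results.
for name in players:
    print('%s: %s' % (name, winnings.get(name, 0)))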
07-26-2012 , 12:43 PM
@LA_Price: OK, this is where my experience of NoSQL starts to get over-stretched. Map-Reduce is a great tool, but don't you have to iterate it over every single item in the collection? I.e., could you do a map-reduce only over hands which had a player in list x?
07-26-2012 , 01:16 PM
Quote:
Originally Posted by MrWooster
Ok... but in your example you are running 2 queries on a NoSQL database. 1) get the player data 2) query for the HH which match the player data. In SQL you could do this in 1 query.

In this case, there is probably not much between the two, but if you had another collection called stats, and another called sessions etc... suddenly, you need 5-6 separate noSQL queries and performance starts have an impact.
Sure, I agree with all of this.
07-26-2012 , 01:19 PM
@MrWooster: MapReduce the first time does go over every document, but yes, you set a condition in your code like if (player == 'bigraiser') so it will only include documents that meet that condition. CouchDB then saves this query as a "design document".

Here's what I think are the functional possibilities for a MapReduce database. You could start to insert classifications on players (like LAG, TAG, etc.) or classify board textures (revising the original or design documents with Python code). This means professional players with a couchdb database could query how often a LAG player raised them on the turn in the past month at 200 No-Limit. Something they would want to know when deciding whether to fold or call, and not something a SQL database can do.

Going over every record makes the first query extremely slow relative to SQL (I often execute code then go make coffee). But after the first query, CouchDB uses incremental MapReduce: it only goes back over new or updated documents, so all future queries are fast.
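For illustration, a filtered map function of that kind, issued as a temporary view through couchdb-python (database and field names are hypothetical):

Code:
# Hypothetical filtered map function, run as a temporary view with
# couchdb-python. Database and field names are made up.
import couchdb

db = couchdb.Server('http://user:password@localhost:5984')['hands']

map_fun = '''
function(doc) {
    // Only include hands involving this one player.
    if (doc.player == 'bigraiser') {
        emit(doc._id, doc.pot_size);
    }
}
'''

for row in db.query(map_fun):
    print('%s\t%s' % (row.key, row.value))

In practice you'd save the function as a permanent design document rather than a temporary view, so the incremental behaviour described above kicks in.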
07-26-2012 , 02:41 PM
Don't mean to hijack thread, but quick q:

How hard is it to write a program that takes info from a certain website and pastes it into excel?

I've been manually typing out all available players daily for my fantasy games and I want to automate the process. I just know Python basics fwiw but obviously am gonna have to learn more to get this done.
07-26-2012 , 03:04 PM
@LA_Price Thanks for the clarification. I am slightly confused why you couldn't run that query on an SQL database tho?

The incremental MapReduce is definitely a big feature, as I guess that after the first query, subsequent ones would far outperform equivalent SQL queries, since they are only looking at a tiny subset of the documents in the collection. How long does CouchDB cache the initial MapReduce for? Is it indefinite, or will it re-run over the entire collection after a certain amount of time?
07-26-2012 , 03:51 PM
@MrWooster: actually, thinking about it, you might be able to, but it would require a schema redesign if you didn't think about putting a player or flop classification column in from the start (this is where noSQL starts to shine).

With a document database you can just add a classification like {"player_type":"LAG"} to a few of the documents, or to none of them. Then just write the map functions.

As far as I know CouchDB doesn't cache queries, but rather saves them as design documents and just checks for changes when you request the view. I just re-ran a month-old query on a 95,000-document database and it took 1 second (it took about 15 minutes initially).

10 annoying things about Couchdb

Quote:
All reads go to disk

CouchDB doesn't cache anything. It does delay commits if you want it to, so that it doesn't hit the disk on every document update, but it sure as heck doesn't cache anything in memory. This is both curse and blessing. It keeps the memory footprint of CouchDB incredibly small, no doubt. Considering they're targeting mobile devices it makes a lot of sense, plus, accessing flash-based storage is a lot cheaper than spinning disks.

But, on the other hand, when I have the memory available, why not use it? I know caching is a hard problem to solve. CouchDB is also made for high concurrency, no doubt, but my disks aren't necessarily. Sure, I could buy faster disks, but if you really think about it, memory is the new disk, plus, tell Amazon to offer faster network storage for EC2, please do, maybe that'd already help. CouchDB somewhat relies on the file system cache doing its magic to speed up things, but I really don't want to rely on magic. You could put an HTTP-level reverse proxy like Varnish in front of CouchDB though, that'd be a feasible option, but that adds another layer to your infrastructure.

In all seriousness, I'd love to see some caching introduced in CouchDB. I won't say it's an easy feature to implement, because it sure isn't, but it doesn't need to be something fancy, I just would like to see CouchDB use some of my memory for data that's read more often than it's written. But until then, Varnish to the rescue!
07-26-2012 , 03:54 PM
Quote:
Originally Posted by MrWooster
@LA_Price Thanks for the clarification. I am slightly confused why you couldn't run that query on an SQL database tho?

The incremental MapReduce is definitely a big feature, as I guess that after the first query, subsequent ones would far outperform equivalent SQL queries, since they are only looking at a tiny subset of the documents in the collection. How long does CouchDB cache the initial MapReduce for? Is it indefinite, or will it re-run over the entire collection after a certain amount of time?
You can get the map reduce jobs to work over a cluster of couch instances very easily too.

It also has:

- Really awesome replication
- An API that works over HTTP
- The ability to serve static assets without a web server

I still don't like it for everyday use though. Personally I don't like the HTTP-based API; it's cool, but I'd rather have native drivers to work off of. Having to write views to do any type of querying kind of sucks too, but in a high-end production environment it wouldn't be as bad.

I would definitely use couch if I wanted to write an application that could work offline and then flawlessly sync up when the connection is back.
07-26-2012 , 03:54 PM
Ye, certainly SQL database design is much more important initially, as it's difficult to change later on. I am quite intrigued by the MapReduce functions; I have read a bit about them regarding MongoDB but didn't realise how powerful they were.
07-26-2012 , 04:27 PM
Interesting conversation, guys. I have no idea what to think, but I have considered using MongoDB, even though I don't really need web scale. All kidding aside, it looks like some interesting stuff and appears to have excellent support and packages for Python...

But I have to wonder: why did FPDB decide to support 3 different DBMSs?
07-26-2012 , 04:53 PM
I have a site with a few hundred .gif files with a white background. I'd like to change them to transparent .png files. Is there a way to do this other than grinding through them one at a time?
07-26-2012 , 05:14 PM
Quote:
Originally Posted by e i pi
I have a site with a few hundred .gif files with a white background. I'd like to change them to transparent .png files. Is there a way to do this other than grinding through them one at a time?
probably imagemagick
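For a scripted alternative, here's a rough sketch with Python and PIL/Pillow; it assumes the backgrounds really are pure white (255, 255, 255):

Code:
# Batch-convert white-background GIFs to transparent PNGs.
# Assumes the background is pure white (255, 255, 255).
import glob
from PIL import Image

for path in glob.glob('*.gif'):
    img = Image.open(path).convert('RGBA')
    pixels = [
        (r, g, b, 0) if (r, g, b) == (255, 255, 255) else (r, g, b, a)
        for (r, g, b, a) in img.getdata()
    ]
    img.putdata(pixels)
    img.save(path[:-4] + '.png', 'PNG')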
07-26-2012 , 05:38 PM
All this discussion of noSQL databases is great, fascinating imo.

I still don't get it though, lol. It seems like I'm currently imagining them as a huge table, more akin to an Excel sheet, that can just add columns dynamically if something is imported with a previously unknown column. Which, IIRC, isn't outside the realm of possibility in Postgres, but would be slower to query than a traditional normalised schema.

And the query above ("how often did a LAG raise me on the turn at 200NL in the last month") doesn't seem much different at all. In noSQL you add a new attribute to some player records to define TAG/LAG. You can do the same in Postgres in multiple ways:

- If it's a rarity, inline the definition of LAG into the query, i.e. SELECT raise_on_turn FROM normal_schema WHERE player.vpip > 60 AND player.pfr > 40, or use that to create a temp table of LAGs for joining.
- Or do it in a noSQL style: add the column with ALTER TABLE player ADD COLUMN classification text DEFAULT NULL; UPDATE player SET classification = 'LAG' WHERE player.name IN ('Samoleus', 'BldSWtTrs'); then query using that in the WHERE, and index it if it's going to get hit a lot.
- Or make a new table (player_id UNIQUE, classification) that can be joined 1-1 onto the player table and used for filtering, if you'd rather not alter the main player table.

Or probably many other ways too. See the sketch below.
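A self-contained sketch of that "bolt a classification column on later" approach (sqlite3 here rather than Postgres, with made-up table and column names):

Code:
# The "add a classification column after the fact" approach, shown in
# sqlite3 with made-up table and column names.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE player (name TEXT, vpip REAL, pfr REAL)')
conn.executemany('INSERT INTO player VALUES (?, ?, ?)',
                 [('Samoleus', 65, 45), ('nit123', 12, 8)])

# Bolt the classification on after the fact, then index it.
conn.execute('ALTER TABLE player ADD COLUMN classification TEXT DEFAULT NULL')
conn.execute("UPDATE player SET classification = 'LAG' "
             'WHERE vpip > 60 AND pfr > 40')
conn.execute('CREATE INDEX idx_class ON player(classification)')

lags = conn.execute("SELECT name FROM player "
                    "WHERE classification = 'LAG'").fetchall()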
07-26-2012 , 05:44 PM
Quote:
Originally Posted by Mariogs37
Don't mean to hijack thread, but quick q:

How hard is it to write a program that takes info from a certain website and pastes it into excel?

I've been manually typing out all available players daily for my fantasy games and I want to automate the process. I just know Python basics fwiw but obviously am gonna have to learn more to get this done.
varies from very easy to a bit tricky, depending on the source HTML / whether it's loaded later with AJAX / whether authentication is needed to get at it.

the proper Python method is probably "Beautiful Soup", look that up.

I tend to be lazy: I get the webpage I want in Firefox, view selection source (is there a way to "view selection source" in Chrome?), save that HTML to a .txt, and write a dirty regex to pull out what I'm after, then write to CSV for easy import to Excel.
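For what it's worth, a minimal Beautiful Soup sketch of the "proper" route (the URL and the td class are placeholders for whatever the real site uses):

Code:
# Minimal scraping sketch: requests + Beautiful Soup. The URL and the
# 'player' class are placeholders for whatever the real site uses.
import requests
from bs4 import BeautifulSoup

html = requests.get('http://example.com/players').text
soup = BeautifulSoup(html)

f = open('players.csv', 'w')
for cell in soup.find_all('td', class_='player'):
    f.write(cell.get_text().strip() + '\n')
f.close()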
07-26-2012 , 05:45 PM
I don't quite get it either, I think. I'm envisioning a giant table with lots of columns, each describing something very specific, with lots of repeating data. If you had enough columns describing everything, and all your queries basically did was add stuff up, I think it would be pretty good and fast. Is that what is meant, really?

Also, is the choice really binary, non-relational or relational? I could really see some very basic relationships being beneficial; querying on a userID is a lot faster than on a username, and smaller on disk.
07-26-2012 , 05:48 PM
You can't lump all of noSQL together either, because there are a few different types.

There are key/value stores, document databases, column-family DBs and graph databases. All of those are non-relational, but the ways they work are very different.

A column store is sort of like a giant table with rows and columns, but each row can have its own set of columns. In your brain you could think of it as an Excel sheet, sort of.
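A toy illustration of that row-by-row flexibility (made-up keys and columns):

Code:
# Toy column-family illustration: each row key has its own set of
# columns, and rows need not share a schema. Keys/columns are made up.
rows = {
    'hand:001': {'hero': 'Samoleus', 'pot': 120, 'flop': 'Ah7d2c'},
    'hand:002': {'hero': 'BldSWtTrs', 'pot': 45},   # no flop column at all
    'hand:003': {'hero': 'nit123', 'showdown': True},
}

for key, columns in rows.items():
    print('%s -> %s' % (key, sorted(columns)))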
07-26-2012 , 06:00 PM
All this DB talk reminds me I need to switch to MariaDB for most of the MySQL projects I use.

As for the differences between relational and non-relational, a good DBA friend of mine said: "Non-relational DB projects are generally trying to solve problems that do not exist."

I tend to agree. Though obviously I use k:v and document databases in my work, MySQL/MariaDB (plus memcached if you really need it) works for 99% of cases; people just want to use MongoDB/Cassandra/Redis/whatever to say that they use it.
07-26-2012 , 06:02 PM
I wish I had never heard of Cassandra...
07-26-2012 , 06:03 PM
A document storage system is different from a giant table with lots of columns. A standard example in Mongo is a blog post. To store that in Mongo you could have:

Code:
{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2011-09-19T02:10:11.3Z"),
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post.  It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
      comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
      comment : "You must be joking.  etc etc ..." }
  ]
}
Quote:
It's interesting to contrast this with how we might design the same schema in a relational database. We would likely have a users collection and a posts collection. But in addition one would typically have a tags collection, a voters collection, and a comments collection. Grabbing all the information on a single post would then be a little involved.

* From: http://www.mongodb.org/display/DOCS/...sign-Blogposts
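For comparison, fetching the whole post back in one query with pymongo might look like this (client setup and the blog.posts collection name are assumed for illustration):

Code:
# One fetch brings back the whole post, comments and all. Client setup
# and the 'blog.posts' collection name are assumed for illustration.
from pymongo import MongoClient

posts = MongoClient()['blog']['posts']

post = posts.find_one({'title': 'No Free Lunch'})
print('%s by %s, %d votes' % (post['title'], post['author'], post['votes']))
for c in post['comments']:
    print('%s: %s' % (c['who'], c['comment']))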
07-26-2012 , 06:05 PM
Quote:
Originally Posted by kyleb
I tend to agree. Though obviously I use k:v and document databases in my work, MySQL/MariaDB (plus memcached if you really need it) works for 99% of cases - people just want to use MongoDB/Cassandra/Redis/whatever to say that they use it.


I mean, CS is full of things that work 99% of the time, but I don't know many things that are the best option even close to 99% of the time.

Edit: Poor quoting on my part. The eye roll was for "Non-relational DB projects are generally trying to solve problems that do not exist." from a DBA.
07-26-2012 , 06:09 PM
Quote:
Originally Posted by jjshabado
A Document storage system is different than a giant table with lots of columns. A standard example in Mongo is a Blog Post. To store that in Mongo you could have:

Code:
{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2011-09-19T02:10:11.3Z"),
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post.  It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
      comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
      comment : "You must be joking.  etc etc ..." }
  ]
}



* From: http://www.mongodb.org/display/DOCS/...sign-Blogposts

So if you wanted to query for all tags, would you have to look through every single blog post to find out what tags are defined? Or do you store a collection of your tags elsewhere?

edit: or are those tag names keys to retrieve information about a tag (like a foreign key)?
07-26-2012 , 06:17 PM
db.posts.distinct("tags")

Edit: So you'd query the documents and not create a separate collection. If you're expecting a large result set, I think you need to use a map-reduce job.

      