Open Side Menu Go to the Top

07-24-2012 , 09:00 PM
Quote:
Originally Posted by clowntable
Talking about databases (a topic I really should improve on given I have only used MySQL,Postgres and ZODB). If you would build some sort of poker software that has to handle boatloads of hand histories (smth like HEM or maybe the pokerserver itself) today, what kind of DB would you use for the HHs? HEM uses postgres but it feels like some other type of DB could be better for this. But then you'd probably have to interface with postgres et. anywys for statsdata and so forth.
Interesting topic for sure, I've thought about it a lot and I can't imagine much better than postgres. Hard to know what you mean, since the database servers in HEM or PT don't do anything with the HHs, other than store a copy for backup purposes. When you say "for stats data" - that's almost the entirety of the product!

So I don't know, maybe you mean something else, and maybe a completely different approach would call for something else entirely.

But if you're wondering if something could be done better than currently using an alternative database platform, it would be interesting to hear where you think postgres is failing these trackers. I believe HEM2 uses some form of hybrid strategy, but I haven't studied it yet. PT database I have studied quite extensively, not so much with 4 but it seems mostly unchanged.

Long way down the road I hope to find time to learn python, so I can conduct such experiments on FPDB and perhaps make it my dream tracker. but obv need to learn python first!
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
07-24-2012 , 09:43 PM
Quote:
Originally Posted by _dave_
Interesting topic for sure, I've thought about it a lot and I can't imagine much better than postgres. Hard to know what you mean, since the database servers in HEM or PT don't do anything with the HHs, other than store a copy for backup purposes. When you say "for stats data" - that's almost the entirety of the product!

So I don't know, maybe you mean something else, and maybe a completely different approach would call for something else entirely.

But if you're wondering if something could be done better than currently using an alternative database platform, it would be interesting to hear where you think postgres is failing these trackers. I believe HEM2 uses some form of hybrid strategy, but I haven't studied it yet. PT database I have studied quite extensively, not so much with 4 but it seems mostly unchanged.

Long way down the road I hope to find time to learn python, so I can conduct such experiments on FPDB and perhaps make it my dream tracker. but obv need to learn python first!
Schema design is much more important than the database being used.

An RDBMS is the only conceptual way I know how to deal with the number of individual records required for poker. You need to be able to produce individual hands and aggregate data

Join us _dave_ - you won't regret learning python
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-24-2012 , 10:01 PM
I'm not convinced that a relational db is the best choice. I kind of think the decision to use postgres for PT or HEM was because viable alternatives didn't exist.

For starters, if he were just storing and querying hand histories then a document based DB is definitely the best fit. That will allow you to store everything in a non-rigid format.

With postgres you would need to do all sorts of multiple table grossness to try and normalize a poker hand because every hand is so different even if you limit it to 1 game. Your queries would also be monstrous and probably not very performant once your DB has grown to a huge size.

With a document based db you would just parse the HH and format the hand with json. Some hands will have a turn/river, others will not. Some hands will have 5 people to a pot, others will not. Awesome and super simple.

Then querying hands would be really easy too because you have everything broken up.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-24-2012 , 10:10 PM
Quote:
Originally Posted by Shoe Lace
With a document based db you would just parse the HH and format the hand with json. Some hands will have a turn/river, others will not. Some hands will have 5 people to a pot, others will not. Awesome and super simple.
how many times have i had A9? by position? facing a raise?

seems like those kinds of questions are going to be tough and/or expensive to pull out of json blobs.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-24-2012 , 10:31 PM
Quote:
Originally Posted by tyler_cracker
how many times have i had A9? by position? facing a raise?

seems like those kinds of questions are going to be tough and/or expensive to pull out of json blobs.
You can set indexes on certain keys with some document dbs. It wouldn't be that bad at all. I should also add that writing a query (or map/reduce job) to do that would be awesomely easy to write.

It would take literally 10 seconds to write a mongodb query to pull out what you want. I'm rusty with SQL and it would depend on how you have your tables broken up but would it be easy to do the same in postgres?

Last edited by Shoe Lace; 07-24-2012 at 10:40 PM.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-24-2012 , 10:54 PM
Quote:
Originally Posted by Shoe Lace
I'm not convinced that a relational db is the best choice. I kind of think the decision to use postgres for PT or HEM was because viable alternatives didn't exist.

For starters, if he were just storing and querying hand histories then a document based DB is definitely the best fit. That will allow you to store everything in a non-rigid format.

With postgres you would need to do all sorts of multiple table grossness to try and normalize a poker hand because every hand is so different even if you limit it to 1 game. Your queries would also be monstrous and probably not very performant once your DB has grown to a huge size.

With a document based db you would just parse the HH and format the hand with json. Some hands will have a turn/river, others will not. Some hands will have 5 people to a pot, others will not. Awesome and super simple.

Then querying hands would be really easy too because you have everything broken up.
Hand history data is already highly structured, and most queries don't care about individual hands only the aggregated data.

Most common query: Total profit. Requires the total of all monies committed v all monies won for all hands. Even with a couple of million hands this isn't a particularly costly query on and rdbms. I cant comment on how expensive it would be nosql.
Next most common queries: Total profit for a subset of hands by opponent, time, session, gametype, limit and position in any combination. Depending on options you would still need to hit all hands.

You could do all of this from a document based database - I have no doubt about this. You can also do this with an rdbms, and you could argue that is what rdbms is best at.

A document based approach is better for some types of hh data, but your example above is not one of them imho.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-24-2012 , 10:57 PM
Quote:
Originally Posted by Shoe Lace
You can set indexes on certain keys with some document dbs. It wouldn't be that bad at all. I should also add that writing a query (or map/reduce job) to do that would be awesomely easy to write.

It would take literally 10 seconds to write a mongodb query to pull out what you want. I'm rusty with SQL and it would depend on how you have your tables broken up but would it be easy to do the same in postgres?
I've heard that mongodb has web scale too, you'll probably need some of that.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 03:10 AM
PostgreSQL is an amazingly powerful RDMS. There's not many queries that can't be done with it. I'm running a database with 15 tables that range from a few dozen tuples to 500,000 tuples. The database isn't well-normalized -- thanks to the database I extracted from any my dis-interest in doing so -- and there hasn't been many queries I haven't been able to create.

I've had to check things like sales by territory, product conversion, average size of orders, sample-to-sales conversions, product and customer profit analysis, and all sort of things that you'd think would be difficult at first glance, but really end up being fairly simple to do. All of this stuff is basically the same as one would face when doing poker queries. It's similar because there are many tuples with null sales, variable price-results, and tons of odd or missing information.

I'm not going to claim that noSQL is worse than PostreSQL, but I will claim that using noSQL because you think queries on PostgreSQL is hard or too slow doesn't seem like a good excuse: the difficulty would arise because you don't understand how queries work in regular SQL, and there isn't a good reason to assume that noSQL will make things better. Saying that there is a concern because you don't have a game with 9 players at a full table is like saying you can't do full-scale sales queries because one territory didn't make any sales on Monday.

I also question, since noSQL doesn't use SQL as the query language, why you'd want to deal with noSQL for smaller projects outside of academic interest. I mean, noSQL of course makes perfect sense for Twitter or Facebook, but to do data comparisons, you are forced to create your own queries anyways, and most of those queries will revolve around join, natural join, and querying sub-tables and temporary tables, which are the very foundations of SQL. I'd rather have all of this built-in, and I'm pretty sure HEM and PT would rather have this built-in as well.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 07:07 AM
I think almost all the performance is going to come down to having a well designed database. I would work with whichever dbms you are most comfortable with, and then once it's up and running experiment with supporting other dmbs which shouldn't be too difficult assuming you're not doing anything fancy.

The question is if you want to go relational or non relational I suppose, I don't have any experience with non relational databases but I would assume that relational databases would be preferable as they would allow you to build complex reports easier?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 07:48 AM
Designing a database to store HH's would be quite a fun exercise! If we had a set of data, and an objective to fetch 3 different reports. I'm in if anyone else wants to!
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 08:17 AM
With the nosql approach for this there would be no real "database design". All you would need to work on is the hand parser. Let the hand parser worry about how the hand is going to be structured.

Then just write it as is directly to the database and setup any indexes.

It seems silly to try and normalize a poker hand. I don't know what the HEM or PT schemas look like but I'm guessing there's like 10 tables and most queries are a short novel of joins/etc..

It's funny how you guys with no experience using any k/v or document db say it's worse but you haven't even used one. How can you make a conclusion that something is inferior or not correct when you don't know anything about it?

Quote:
I mean, noSQL of course makes perfect sense for Twitter or Facebook, but to do data comparisons, you are forced to create your own queries anyways, and most of those queries will revolve around join, natural join, and querying sub-tables and temporary tables, which are the very foundations of SQL.
Well, with the popular document based nosql solutions no... you won't be doing joins or querying temporary tables.

All you do is filter keys and the rest of the query tool set will depend on which data store you chose. Depending on how you decide to organize your data it may require 2 queries instead of 1 (trading flexibility for performance in this case), it would come down to the needs of the application.

I know of a production level app that has over 75 million documents and a popular area of the site requires 2 queries to render a specific part of a page but between proper indexing and caching the performance hit of having to do 2 queries instead of 1 is irrelevant.

Also complexity is the devil. When you go beyond developing throw away projects you will see this. Having to maintain super long and complex SQL queries sucks balls for lack of a better term.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 11:15 AM
Quote:
Originally Posted by Gullanian
Designing a database to store HH's would be quite a fun exercise! If we had a set of data, and an objective to fetch 3 different reports. I'm in if anyone else wants to!
Isnt this pretty much what the FPDB does?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 11:17 AM
Btw, I have written a JSON parser for the FPDB hand parser. You pass it a HH file, and it returns a JSON object of the hand histories. It still needs a bit of work (about 90% there), but happy to share it.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 01:02 PM
Whats the advantage of JSON over XML?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 01:20 PM
Quote:
Originally Posted by myNameIsInga
Whats the advantage of JSON over XML?
JSON has the advantage of being a 1-1 mirror of an actual data structure, whereas XML has no 1-1 equivalent in programming languages.

This makes it very easy to manipulate JSON data with a small amount of parsing overhead. If you are using XML, you need to extract the data out and construct it into a data structure understood by your programming language, before you can actually do anything useful with that data.

XML does have some advantages. It provides a very high level of granularity, and can also be validated against a DTD.

Typically, JSON is used for simple data structures and XML is used for more enterprise type solutions where a large amount of very critical data is being transmitted, requiring strict validation.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 01:34 PM
Thanks, Wooster!
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 03:53 PM
sng_jason you seem to know your stuff. I'm really lacking behind on hardware trends and so forth but is it ever going to be feasible to have HDDs at the speed of memory or close enough so we get around the whole issue of loading it in the first place?

All I know is we are basically maxed out on the clock cycle front thus more cores is what we get now. No such possibility for being maxed out on how fast memory can get and then HDDs approaching that speed?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 03:56 PM
Quote:
Originally Posted by Ryanb9
That is a very expensive book, $91 for a used copy on amazon but it looks so good >.<
Yeah it's pricey..2nd edition is essentially the same as 3rd though so if you can get that cheaper jump on it. Old school library :P

First is quite different so don't get that one
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 04:05 PM
Quote:
Originally Posted by Shoe Lace
I'm not convinced that a relational db is the best choice. I kind of think the decision to use postgres for PT or HEM was because viable alternatives didn't exist.

For starters, if he were just storing and querying hand histories then a document based DB is definitely the best fit. That will allow you to store everything in a non-rigid format.

With postgres you would need to do all sorts of multiple table grossness to try and normalize a poker hand because every hand is so different even if you limit it to 1 game. Your queries would also be monstrous and probably not very performant once your DB has grown to a huge size.

With a document based db you would just parse the HH and format the hand with json. Some hands will have a turn/river, others will not. Some hands will have 5 people to a pot, others will not. Awesome and super simple.

Then querying hands would be really easy too because you have everything broken up.
That was kind of my thought process but like I said I'm utterly out of touch with what is viable in DB land these days. I guess I get that pretty much all tracker stuff might require postgres and the likes though.

But what about say developing a poker server. I think in that case you mostly want to store the HHs, send them on request, check for cheating and maybe use it for acount balance checks or something. Seems like a document store type DB would make a bit of sense in that case
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 04:19 PM
Can anyone help me with this SQL query? I've been working on it for > 10 hours and I don't have anything that will run in a reasonable amount of time. I'm at my wits end.

Query In English: My company sends out daily emails. I want to select all users who have not clicked on one of their last 15 emails.

Tables:
Users, Emails

Relevant Columns:
users.id
emails.user_id
emails.clicks (The amount of times a user has clicked on THIS email)
emails.number (The nth time an email was sent to a user. This is zero-based and unique to each user. This increases by 1 for each email sent to a user. So if a user's highest email number is 55, I would want to see if he clicked on an email from email numbers 41-55 inclusive AKA the last 15 emails).

Here's my current query:
Code:
SELECT u.id, e.number 
FROM users u INNER JOIN emails e ON u.id = e.user_id 
GROUP BY u.id 
HAVING e.number > (MAX(e.number) - 15) AND SUM(e.clicks) = 0;
This works, but gets exponentially slower as it runs on more rows. Any ideas on how I could make this faster/better?
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 04:58 PM
First of all, it appears that the users table is useless here. So, drop that. Second, you're mixing aggregate functions and non-aggregates of the same thing (e.number > (MAX(e.number) - 15)). That can make SQL behave in strange and slow ways in my experience.

Here's the hacked together query that I would try first for this. Certainly not the best possible, but it just makes two passes through the data, so it certainly won't grow exponentially:

Code:
SELECT emails.user_id FROM
(
  SELECT emails.user_id, SUM(clicks) as click_count FROM
  emails, 
  (SELECT emails.user_id, MAX(emails.number) as mail_num FROM emails GROUP BY emails.user_id) as maxes
  WHERE  maxes.user_id = emails.user_id AND maxes.mail_num <= emails.number + 15
) as foo
WHERE click_count = 0
Basically, you find the current e-mail num for each user. You count the clicks on e-mail nums after that user.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 05:00 PM
Quote:
Originally Posted by clowntable
But what about say developing a poker server. I think in that case you mostly want to store the HHs, send them on request, check for cheating and maybe use it for acount balance checks or something. Seems like a document store type DB would make a bit of sense in that case
I kinda think it would be the opposite - you'd generate HH data from a much much more detailed DB backend, since you're gonna want to know all manner of data not contained in HH files such as timings, click coordinates, IP addys and such. Take a peek in pokerstars.log.0 for the client side data on a single hand. Probably want more on the server side!

Sounds like a document store would be very useful for the storing/ caching HH files though.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 05:04 PM
Quote:
Originally Posted by _dave_
I kinda think it would be the opposite - you'd generate HH data from a much much more detailed DB backend, since you're gonna want to know all manner of data not contained in HH files such as timings, click coordinates, IP addys and such. Take a peek in pokerstars.log.0 for the client side data on a single hand. Probably want more on the server side!

Sounds like a document store would be very useful for the storing/ caching HH files though.
This is definitely the way to do it, but it's gotten a few different poker sites in trouble when they screw up the software and accidentally send administrative hand histories to players....
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 05:07 PM
Quote:
Originally Posted by NoahSD
First of all, it appears that the users table is useless here. So, drop that. Second, you're mixing aggregate functions and non-aggregates of the same thing (e.number > (MAX(e.number) - 15)). That can make SQL behave in strange and slow ways in my experience.

Here's the hacked together query that I would try first for this. Certainly not the best possible, but it just makes two passes through the data, so it certainly won't grow exponentially:

Code:
SELECT emails.user_id FROM
(
  SELECT emails.user_id, SUM(clicks) as click_count FROM
  emails, 
  (SELECT emails.user_id, MAX(emails.number) as mail_num FROM emails GROUP BY emails.user_id) as maxes
  WHERE  maxes.user_id = emails.user_id AND maxes.mail_num <= emails.number + 15
) as foo
WHERE click_count = 0
Basically, you find the current e-mail num for each user. You count the clicks on e-mail nums after that user.
I actually do need the users table. I'm using a WHERE clause on the users table as well but I left that part out. Going to try to use a sub-select like you suggest.
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
07-25-2012 , 05:18 PM
depending on your engine/indexes
Code:
SELECT emails.number ORDER BY emails.number DESC LIMIT 1
may be significantly faster than a MAX().

tbh this is probably not an issue any more. I still tend to type it out of habit lol.

Actually trying to find a link for this, looks like it was optimised away in postgres 8.1, and is mostly down to the way postgres handles function calls so may not apply to any other engines, I have no idea. maybe interesting read though: http://grokbase.com/t/postgresql/pgs...d-desc-limit-1
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD ** Quote
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **
$25m Guaranteed WPM on CoinPoker
Join the action now
Daily Rewards • Splash Pots • CoinRaces
** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

      
m