04-04-2009, 02:35 PM
|
#1
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Hand History Database for Research (Beta)
Note: This is a pre-announcement, so if you decide to go for this at this point, you would partly be a "beta tester" for the whole process, software, and data involved.
In a nutshell
We are happy to announce that we are giving away, free for research purposes, nearly one billion real-money poker hand histories played on some major poker sites this year and last, plus supporting software to read them.
In detail
We want to provide these hands strictly for research. Therefore, to be eligible to obtain them, you have to be a published author of at least one poker AI conference paper, and it should be possible for us to verify that. Please PM me here, or e-mail me at findbg@gmail.com, for further details.
The billion-hand database contains Limit, Pot Limit and No Limit Texas Hold'em and Omaha cash game hand histories, at limits from NL2 to NL100000. The inflated (uncompressed) size of these hands is in the range of one terabyte. Therefore the hands are offered in a proprietary parsed format, together with a Java library for reading them (the possibility to export them as plain text will come later). The source code of the Java library is offered for free as well (under GPL v3).
This format, together with our own tools, makes it possible to run analyses over hundreds of millions of hands on a mainstream PC, where otherwise one might need to spend up to hundreds of thousands of bucks on a setup able to handle the same task with a conventional RDBMS.
We support a no-opponent-profiling policy. Therefore we are taking measures to prevent the use of this database for opponent profiling. We are obfuscating the names of the poker sites from which these hands were obtained, as well as table names, player IDs and hand IDs. We are also randomly modifying the time each hand was played (the time is shifted by only some seconds, so that extracting time-dependent player patterns is still possible).
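To make the obfuscation idea concrete, here is a minimal sketch in Python. It is not the actual procedure used for the database; the class name, the 30-second maximum shift, and the "first seen gets the next number" scheme are all assumptions made for illustration only.

```python
import random


class Obfuscator:
    """Hypothetical helper: replaces names with numbers and jitters timestamps."""

    def __init__(self, max_shift_seconds=30, seed=None):
        # max_shift_seconds is an assumed value; the post only says "some seconds".
        self._ids = {}
        self._rng = random.Random(seed)
        self._max_shift = max_shift_seconds

    def player_id(self, name):
        # The same name always maps to the same number, starting from 1,
        # so per-player statistics remain computable.
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1
        return self._ids[name]

    def shift_time(self, epoch_seconds):
        # A small random shift keeps time-of-day patterns intact while
        # making exact matching against other hand sources harder.
        return epoch_seconds + self._rng.randint(-self._max_shift, self._max_shift)


ob = Obfuscator(seed=1)
print(ob.player_id("alice"), ob.player_id("bob"), ob.player_id("alice"))
print(ob.shift_time(1_239_000_000))
```

Numbering players in order of first appearance could itself leak information, so a real scheme would probably assign the numbers in shuffled order.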
Note: The end user license agreements (EULAs) of some of the poker sites from which these hands were obtained prohibit datamining. We have asked these poker sites for permission to redistribute the hands in the described manner. We have not received responses from all of the sites at this time, but we have not been rejected either. Although we never formally and technically agreed to the EULAs of these poker sites, we still believe we did what is possible to comply with the intention and spirit of these EULAs, and we will cooperate further with the poker sites if they require us to do so, to eliminate any doubts that the hands will be used exclusively for research purposes and not to augment real money play.
If you choose to apply, please fill out the form below and send it back by PM or e-mail.
Application process
Please provide the following information:
- Name:
- University:
- Position:
- E-mail /university email/:
- Paper published /need to respond to verification send to Author's e-mail of that paper/:
- Home page:
- Purpose of request:
Alternative ways to verify your academic credibility would work as well, but they will not be less strict than the above.
Please also indicate that you agree with the following terms:
- All hand histories are provided to you for personal use. You must not redistribute them to third parties under any circumstances. You may use them personally without restrictions or obligations (we would be happy if you cite pokerai.org/pf3 as the source of these hands in your academic work).
- All software for parsing the hand history database is provided under the GPL v3 license. Any redistribution is bound by this license.
I agree to these conditions/Type yes, and your first name as signature/:
*****
Regards,
Indiana && PeppaPig
|
|
|
04-04-2009, 05:09 PM
|
#2
|
|
Pooh-Bah
Join Date: Oct 2003
Location: SEC
Posts: 5,060
|
Re: Hand History Database for Research (Beta)
Indiana, are you actually affiliated with a research institution? Here I thought you were just a botter.
|
|
|
04-05-2009, 04:31 AM
|
#3
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
I am, but this has no relation to the above initiative, or to poker in general.
|
|
|
04-08-2009, 04:04 PM
|
#4
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
Here is an example of what kind of things you can easily do (it took us just 10 minutes to implement this example).
This and other examples are just part of the software distribution.
|
|
|
04-09-2009, 01:28 PM
|
#5
|
|
enthusiast
Join Date: Feb 2009
Location: the hinterlands
Posts: 61
|
Re: Hand History Database for Research (Beta)
Interesting.
1. Why must one be published? What if you're a retired computer programmer who's interested in poker and likes playing with large amounts of data?
2. Do the hand histories include player chat? (Observer chat is not as interesting.)
3. Is there any API other than Java? (perhaps C or, dare I say, Fortran?)
4. Is your canonical HH format available? Do you have a transform into XML? (Not the proprietary compressed format, but what it expands into.)
5. Does the database of player names include country?
6. Do you also have HHs from tournaments? If so, will that be forthcoming?
|
|
|
04-09-2009, 03:57 PM
|
#6
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
Quote:
Originally Posted by sapientia
Interesting.
1. Why must one be published? What if you're a retired computer programmer who's interested in poker and likes playing with large amounts of data?
2. Do the hand histories include player chat? (Observer chat is not as interesting.)
3. Is there any API other than Java? (perhaps C or, dare I say, Fortran?)
4. Is your canonical HH format available? Do you have a transform into XML? (Not the proprietary compressed format, but what it expands into.)
5. Does the database of player names include country?
6. Do you also have HHs from tournaments? If so, will that be forthcoming?
|
1- We want to make sure that these hands are used only for research, and not to augment real money poker play. Being a published author is sufficient evidence of this. We might come up with alternative ways to verify it, but any alternative has to make it reasonably certain that this is the case. Being a retired programmer isn't good enough for that, apologies.
2- No. Sometime in the future we may include it separately, but for now we don't see a good reason to do so.
3- It is only Java for now. I will invite anyone who gets the software to work on a C# port.
4- It will expand to a format used by popular sites, or to a proprietary one that looks like the ones used by popular sites. Anything else (XML, etc.) is an option as well.
5- No. It does not even include player names. Player names are obfuscated to numbers from 1 to 1000000+.
6- No. Might be forthcoming, but for the next few weeks/months it is cash games only.
|
|
|
04-09-2009, 07:22 PM
|
#7
|
|
enthusiast
Join Date: Feb 2009
Location: the hinterlands
Posts: 61
|
Re: Hand History Database for Research (Beta)
Quote:
Originally Posted by indianaV8
1- We want to make sure that these hands are used only for research, and not to augment real money poker play. Being a published author is sufficient evidence of this. We might come up with alternative ways to verify it, but any alternative has to make it reasonably certain that this is the case. Being a retired programmer isn't good enough for that, apologies.
|
Well, first of all, I don't play cash games. Besides, given all the obfuscation, I'm not sure how the HHs can be used to augment real money play. Perhaps those doing research using this data, currently published authors or not, could provide results for publication on pokerai or some other poker research site -- maybe the Alberta group would be interested in hosting results.
Quote:
Originally Posted by indianaV8
2- No. Sometime in the future we may include that [chat] separately, but for now we don't see a good reason to do so.
|
But this is an ideal research topic. How does chat, presence and quantity, correlate with chatter's results? How about with chattee's results? How much does quantity of chat correlate with aggression? Does a prolific chatter spur chat from the rest of the table? And if so, does that cause a player who's not prone to chat to lose? Etc. One thing I've learned from 20+ years of research -- never throw data away or make it difficult to get at, always have it at hand.
Quote:
Originally Posted by indianaV8
3- It is only Java for now. I will invite anyone who gets the software to work on a C# port.
|
ok.
Quote:
Originally Posted by indianaV8
4- It will expand to a format used by popular sites, or to a proprietary one that looks like the ones used by popular sites. Anything else (XML, etc.) is an option as well.
|
ok.
Quote:
Originally Posted by indianaV8
5- No. It does not even include player names. Player names are obfuscated to numbers from 1 to 1000000+.
|
Right. But is there a table with player_id (a number from 1..1000000+) and known player info: country, first date played, last date played, number of sessions, number of hands, each of the previous by blind level, number of sites played on, etc.?
Quote:
Originally Posted by indianaV8
6- No. Might be forthcoming, but for the next few weeks/months it is cash games only.
|
ok. I hope you have tourney results, so as to map player_id to standings and winnings.
|
|
|
04-09-2009, 08:30 PM
|
#8
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
1- The problem here is not to classify the types of research (or researchers) but to come up with reliable ways to verify how these hands are used. I will think about how to open this up to more people in the future (e.g. provide people with the software and a small sample database; if they want to develop examples that run over the full database, they submit them and we send the results back to them).
2- I agree, this is a good point. There are, however, further issues with distributing the chat (privacy, and the encoding of hands becomes much bigger; currently we encode one hand in 75 bytes on average). Maybe I can think about providing summary information about the chat per player.
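As a rough back-of-the-envelope check of what 75 bytes per hand means, the figures from this thread can be combined as below. This is only a sketch: "nearly one billion" is rounded up to exactly one billion, "in the range of one terabyte" is taken as exactly 10^12 bytes, and the inflated size is assumed to mean the plain-text hand histories.

```python
HANDS = 1_000_000_000      # "nearly one billion" hands, rounded up (assumption)
BYTES_PER_HAND = 75        # average encoded size quoted above
INFLATED_BYTES = 10**12    # "in the range of one terabyte", taken as 10^12 bytes

encoded_bytes = HANDS * BYTES_PER_HAND
encoded_gb = encoded_bytes / 10**9               # size of the encoded database in GB
text_bytes_per_hand = INFLATED_BYTES / HANDS     # implied average size of one inflated hand
ratio = INFLATED_BYTES / encoded_bytes           # how much smaller the encoding is

print(f"encoded database: about {encoded_gb:.0f} GB")
print(f"inflated hand:    about {text_bytes_per_hand:.0f} bytes")
print(f"size reduction:   about {ratio:.1f}x")
```

Under these assumptions the whole database fits on an ordinary disk, which is consistent with the claim that analysis is feasible on a mainstream PC. Any chat text would be added on top of the 75 bytes per hand.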
|
|
|
04-21-2009, 09:38 AM
|
#9
|
|
newbie
Join Date: Apr 2009
Posts: 29
|
Re: Hand History Database for Research (Beta)
Quote:
Originally Posted by indianaV8
Here is an example of what kind of things you can easily do (it took us just 10 minutes to implement this example).
This and other examples are just part of the software distribution.

|
Indiana,
Great cause, and a fine solution to present data without (IMO) breaching the EULAs.
Could you please post the same chart for profitable players.
To define profitability you can simply choose [monthly p&l]>zero or [yearly p&l]>zero, the yearly one is much more reliable of course...
I have been reading 2+2 but never posted before, perhaps it is time I start to...
Thank you
SJ
|
|
|
04-21-2009, 02:52 PM
|
#10
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
Quote:
Originally Posted by sloppyJohn
Indiana,
Great cause, and a fine solution to present data without (IMO) breaching the EULAs.
Could you please post the same chart for profitable players.
To define profitability you can simply choose [monthly p&l]>zero or [yearly p&l]>zero, the yearly one is much more reliable of course...
I have been reading 2+2 but never posted before, perhaps it is time I start to...
Thank you
SJ
|
Hi,
Yes, I can. In fact I had this in mind, but I have been delayed for various reasons. I still plan to do this.
Defining "profitable" players is not that easy. Most people play very few hands, so for them the result is close to 50/50. If you take just the long-term players (e.g. those who played over 100k hands), you can argue that players will play that many hands only if they are winners, and you again get biased statistics.
So what can you do? I'm sure this has already been discussed on 2+2, but I don't have the time to dig it out. If someone points me to it, or summarizes the best way to calculate the % of winning players, I can do that.
Otherwise, what I came up with is the following:
1) A graph of players clustered by how many hands they played (this has all the issues discussed above).
2) The number of hands played by winning players as a % of all hands played. I'm not sure this is an improvement or solves the above issues, as it does not take into account that among players with few hands there are many small winners and approximately the same number of "slightly bigger" losers.
3) The % of players that won 95% of all money won. For example, if we knew that 10% of the players won 90% of the money, that would be something, although I don't believe this would be the figure. You still have the "long tail" of winning players that played few hands.
Finally, some combination of the above, e.g. the above approaches restricted to players that have over 2k, 5k, or 10k hands, to ensure at least some statistical significance.
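Approaches 2) and 3), together with the minimum-hands filter, could be sketched as follows. This is illustrative Python, not code from the software distribution; the function names, the dictionary layout with "hands" and "profit" keys, and the sample figures are all made up for the example.

```python
def winner_hand_share(players, min_hands=0):
    """Approach 2: hands played by winning players as a fraction of all hands."""
    pool = [p for p in players if p["hands"] >= min_hands]
    total = sum(p["hands"] for p in pool)
    if total == 0:
        return 0.0
    return sum(p["hands"] for p in pool if p["profit"] > 0) / total


def players_holding_share(players, share=0.95, min_hands=0):
    """Approach 3: fraction of players who together won `share` of all money won."""
    pool = [p for p in players if p["hands"] >= min_hands]
    winnings = sorted((p["profit"] for p in pool if p["profit"] > 0), reverse=True)
    total_won = sum(winnings)
    if not pool or total_won == 0:
        return 0.0
    running, count = 0.0, 0
    # Take the biggest winners first until they cover the requested share.
    for w in winnings:
        running += w
        count += 1
        if running >= share * total_won:
            break
    return count / len(pool)


# Made-up sample data, only to show the calls.
sample = [
    {"hands": 50_000, "profit": 900.0},
    {"hands": 20_000, "profit": 100.0},
    {"hands": 500, "profit": 5.0},
    {"hands": 500, "profit": -300.0},
    {"hands": 29_000, "profit": -705.0},
]
print(winner_hand_share(sample))
print(players_holding_share(sample, share=0.95))
print(players_holding_share(sample, share=0.95, min_hands=2_000))
```

Raising `min_hands` trims the long tail of players with few hands, at the cost of the survivorship effect described above.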
|
|
|
04-22-2009, 01:54 AM
|
#11
|
|
Carpal \'Tunnel
Join Date: Mar 2006
Posts: 18,584
|
Re: Hand History Database for Research (Beta)
What are you researching exactly?
|
|
|
04-22-2009, 03:51 AM
|
#12
|
|
newbie
Join Date: Apr 2009
Posts: 29
|
Defining a profitable player
How about the following definition:
If someone has been successful in maintaining a profit over 50k hands, they are considered profitable (this includes breakeven players, who apparently end up earning rakeback). This is a soft definition and will include errors, for example someone with $1 profit after 50k hands, which he got after earning $1,500 in one hand. But I think that for the purpose of understanding the behaviour of profitable players it suffices.
A more precise solution could include relative BB profit. For this we need to calculate the avg. BB for each player: (∑ #hands × BB, over each BB level ever played) / total #hands. Then we can account for common rakeback (~27%) per avg. BB and define a losing/breakeven/profitable player by BB/10,000 hands, in categories of avg. BB size.
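The avg. BB calculation and a win rate in BB/10,000 hands might look like the sketch below. The function names, the (hands, big blind) pair layout, and in particular the 50 BB/10,000 breakeven band are assumptions for illustration; the rakeback adjustment is left out because the details are not specified here.

```python
def average_big_blind(sessions):
    """sessions: list of (hands_played, big_blind) pairs for one player."""
    total_hands = sum(h for h, _ in sessions)
    if total_hands == 0:
        raise ValueError("player has no hands")
    # Hand-weighted average: (sum of #hands x BB) / total #hands.
    return sum(h * bb for h, bb in sessions) / total_hands


def bb_per_10k(profit, sessions):
    """Total profit expressed in average big blinds per 10,000 hands."""
    total_hands = sum(h for h, _ in sessions)
    return profit / average_big_blind(sessions) / total_hands * 10_000


def classify(profit, sessions, band=50.0):
    # `band` is a hypothetical breakeven margin in BB/10,000 hands.
    rate = bb_per_10k(profit, sessions)
    if rate > band:
        return "profitable"
    if rate < -band:
        return "losing"
    return "breakeven"


# Made-up player: 30k hands at a $0.50 big blind, 20k hands at a $1.00 big blind.
sessions = [(30_000, 0.50), (20_000, 1.00)]
print(average_big_blind(sessions))
print(classify(350.0, sessions), classify(35.0, sessions), classify(-700.0, sessions))
```

Dividing total profit by the average BB is an approximation. Converting the profit at each stake separately would be more exact, but needs per-stake results.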
I really feel that the first option is good enough for this goal.
What do you think?
John
P.S.
If you accept the soft definition of profitable players, this will also give an estimate of the % of profitable players in the player population.
|
|
|
04-22-2009, 02:29 PM
|
#13
|
|
old hand
Join Date: Dec 2006
Location: Stuttgart
Posts: 1,385
|
Re: Hand History Database for Research (Beta)
Just to understand: do you want the number of profitable players that played over 50k hands divided by all players with over 50k hands? Or rather the profitable players with over 50k hands as a share of all players (no matter how many hands they played)?
Keep in mind that the number of players that played fewer than 50k hands is much larger than the number that played more (for the samples that I have checked so far).
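The difference between the two denominators can be shown with a small sketch. The function name, the (hands, profit) pair layout and the sample numbers are made up for illustration.

```python
def profitable_ratios(players, min_hands=50_000):
    """players: list of (hands, profit) pairs. Returns (ratio_a, ratio_b)."""
    qualified = [(h, p) for h, p in players if h > min_hands]
    winners = sum(1 for _, p in qualified if p > 0)
    # ratio_a: profitable 50k+ players among all 50k+ players.
    ratio_a = winners / len(qualified) if qualified else 0.0
    # ratio_b: profitable 50k+ players among all players, whatever their volume.
    ratio_b = winners / len(players) if players else 0.0
    return ratio_a, ratio_b


# Made-up sample: two high-volume players and three low-volume ones.
sample = [(60_000, 100.0), (80_000, -50.0), (1_000, 20.0), (500, -10.0), (200, -5.0)]
print(profitable_ratios(sample))
```

Because low-volume players dominate the population, the second ratio will normally come out far smaller than the first.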
@Lego05 - Why do you ask?
|
|
|
04-22-2009, 03:03 PM
|
#14
|
|
grinder
Join Date: Jun 2008
Posts: 685
|
Re: Hand History Database for Research (Beta)
this is extremely interesting. however, on the sites you're tracking, what percentage of hands are you tracking? what sites are you tracking? since when have you been tracking?
thanks
|
|
|
04-23-2009, 04:59 AM
|
#15
|
|
newbie
Join Date: Apr 2009
Posts: 29
|
Re: Hand History Database for Research (Beta)
Indiana,
these are two different searches:
1) # hands profitable players play on avg. every month
2) something that always intrigues the industry: % of profitable players out of the poker player population
I think the beginning of both of these is in defining a profitable player, statistically speaking.
John
|
|
|
All times are GMT -5.
|