Just wanted to chime in on the JSON and NoSQL discussion.
-I'm a statistician currently using CouchDB, a document database, for a research project. In my experience it is definitely what you want to use for statistical analysis software
-I was an online professional poker player for 6 years
-The reason to use CouchDB is MapReduce: the map step is just a function that visits all your documents and emits key/value pairs, and the optional reduce step aggregates the emitted values
-I have not yet tried to use Mr_Wooster's JSON parser but will do so. I saved many of my hand histories from my years of play
(1 million Pot Limit Omaha hands, 100,000 hands each from 7 other games)
-With CouchDB you can just write Map and Reduce queries in JavaScript
-CouchDB takes up 10X the storage space of MySQL, and a view query is slow the first time because the index has to be built, but it is fast on later runs
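To make the map/reduce idea concrete, here is a rough sketch in plain Python of what a view does conceptually. It does not talk to CouchDB at all; the `documents` list, its field names, and the `run_view` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical poker-hand documents; note they do not all share the same fields.
documents = [
    {"game": "PLO", "stakes": "1/2", "won": 40.0},
    {"game": "PLO", "stakes": "2/4", "won": -15.0},
    {"game": "Razz", "won": 5.0},
    {"game": "PLO", "stakes": "1/2"},  # no result recorded, skipped by the map step
]

def map_hand(doc):
    """Map step: visit one document and yield zero or more (key, value) pairs."""
    if "game" in doc and "won" in doc:
        yield doc["game"], doc["won"]

def reduce_sum(values):
    """Reduce step: collapse all values emitted under one key into one number."""
    return sum(values)

def run_view(docs, map_fn, reduce_fn):
    """Apply map_fn to every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

totals = run_view(documents, map_hand, reduce_sum)
print(totals)
```

The point of the sketch is only that the map function decides per document what to emit, so documents with missing fields are simply skipped rather than breaking a fixed schema.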
Here's an example from my current database
Code:
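function(doc) {
  // Only emit for documents that have a permalink and an acquisition price
  if (doc.permalink && doc.acquisition && doc.acquisition.price_amount) {
    // offices may be missing on some documents, so fall back to an empty list
    (doc.offices || []).forEach(function(office) {
      if (office.longitude && office.latitude) {
        // key: permalink; value: tab-separated price, year, longitude, latitude
        emit(doc.permalink,
             doc.acquisition.price_amount + '\t' + doc.acquisition.acquired_year + '\t' +
             office.longitude + '\t' + office.latitude);
      }
    });
  }
}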
This creates a key/value view. Then in Python you can read the view and write it to a file:
Code:
import couchdb

server = couchdb.Server('http://user:password@localhost:5984')
db = server['database_test']
# returns the keys and values from the CouchDB view
company_location = db.view('company/name_location_acquired')
with open('C:/name_location_acquired.txt', 'w') as f:
    for r in company_location:
        try:
            f.write('%s\t%s\n' % (r.key, r.value))
        except UnicodeEncodeError:
            # skip rows that cannot be written in the file's encoding
            continue
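Once the file is written, each line has five tab-separated columns: the key, then the four values the map function joined together. A minimal sketch of reading that layout back for analysis could look like this. The sample rows are invented, and it assumes every column is present and numeric, which real data may not guarantee.

```python
import csv
import io

# Made-up sample in the same layout the export loop writes:
# permalink, price_amount, acquired_year, longitude, latitude
sample = (
    "acme\t1000000\t2008\t-122.4\t37.8\n"
    "globex\t250000\t2009\t-0.1\t51.5\n"
)

def read_rows(handle):
    """Yield one dict per line, converting the numeric columns."""
    for permalink, price, year, lon, lat in csv.reader(handle, delimiter="\t"):
        yield {
            "permalink": permalink,
            "price_amount": float(price),
            "acquired_year": int(year),
            "longitude": float(lon),
            "latitude": float(lat),
        }

# io.StringIO stands in for the exported file here
rows = list(read_rows(io.StringIO(sample)))
mean_price = sum(r["price_amount"] for r in rows) / len(rows)
print(len(rows), mean_price)
```

With a real export you would pass an open file handle to `read_rows` instead of the `io.StringIO` stand-in.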
-What NoSQL can do is offer a flexible schema that can answer questions an RDBMS just couldn't
-The worrying thing is that most new technologies, in order to be successful in the market, have to be introduced as cheaper, as targeting a group that is currently unserved, or as a hybrid of the old and new
-NoSQL in many ways is NOT CHEAPER (except maybe in development time)
-It will be more expensive in hard drive space, in learning time for users, and in other ways, but it may offer functionality that gives insights a SQL database currently doesn't
-The solution to this may be a hybrid application that uses SQL for standard features and NoSQL for new analysis features.
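As a very rough sketch of what such a hybrid might look like, the example below keeps fixed columns in a SQL table and stores a free-form JSON document next to each row. It uses SQLite from the Python standard library as a stand-in for both sides, which is an assumption made only to keep the example self-contained; the table names and hand data are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Standard features: a fixed relational table queried with plain SQL.
conn.execute("CREATE TABLE hands (id INTEGER PRIMARY KEY, game TEXT, won REAL)")
# New analysis features: a free-form JSON document stored next to each row.
conn.execute("CREATE TABLE hand_docs (hand_id INTEGER, body TEXT)")

# Hypothetical hands; the trailing dict has a different shape for each one.
hands = [
    (1, "PLO", 40.0, {"players": 6, "board": ["Ah", "Kd", "7c"]}),
    (2, "PLO", -15.0, {"players": 2}),
    (3, "Razz", 5.0, {"players": 8, "notes": "short stack"}),
]
for hand_id, game, won, doc in hands:
    conn.execute("INSERT INTO hands VALUES (?, ?, ?)", (hand_id, game, won))
    conn.execute("INSERT INTO hand_docs VALUES (?, ?)", (hand_id, json.dumps(doc)))

# SQL side: a standard aggregate over the fixed columns.
sql_totals = dict(conn.execute("SELECT game, SUM(won) FROM hands GROUP BY game"))

# Document side: a map-style pass over a field that only some documents have.
with_board = [
    hand_id
    for hand_id, body in conn.execute("SELECT hand_id, body FROM hand_docs")
    if "board" in json.loads(body)
]
print(sql_totals, with_board)
```

In a real application the document side would more likely be a separate store such as CouchDB, with the SQL database keeping only the columns the standard features need.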