** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

05-15-2011 , 07:26 PM
Only one of your comparisons actually has a concrete reason. I can understand the bad blood with IE, but you should try to at least explain what you hate about Chrome or Safari.

You say FF4 is slow, buggy and ugly. It hasn't crashed once on me and I've been using it since it was released. When I occasionally use Chrome, I don't notice any difference in speed. As for it being ugly, you can make it look very close to what it looked like before, but I do think the corners are now slightly rounder.
05-15-2011 , 08:22 PM
I switched to Chrome for a while, but it doesn't like my iMac. Runs fine on the Air though. Weird.
05-15-2011 , 08:28 PM
I use Firefox all day as my testing browser, and the only time it has crashed was actually caused by Firebug being unable to handle the lolhuge DOM changes that were being done in JavaScript; it crashed during the HTML tab update.
05-16-2011 , 09:05 AM
Just in case some of you don't read Dilbert, today's is topical: http://www.dilbert.com/strips/comic/2011-05-16/
05-16-2011 , 11:24 AM
I could learn python and scrape(?) the title and first picture of all wikipedia articles into a database, correct?
05-16-2011 , 11:26 AM
Yes you can, but scraping is always quite messy. Also, be very careful you do not breach Wikipedia's content T&C: I think you are allowed to redistribute content, but only with proper attribution, which a lot of content-scraping sites don't provide.

If you are scraping it for the purpose of redistribution, ask yourself why. If it's a means of gaining more traffic to your site, it's a slightly blackhat way of doing it that is very annoying for visitors, and be aware that Google is currently making big efforts to penalise these sorts of sites (look at the recent Panda update), because they are complete junk with absolutely no value to anyone.

However if it's for some other academic use such as data analysis go ahead I say.
05-16-2011 , 12:29 PM
Thanks, Gullanian.

What do you mean by messy?

I am hoping to learn a language this summer and python seems a good place to start. My challenge to myself was to scrape wiki article titles and the first pic into a database since each article has a standard /wiki/ address (ex: http://en.wikipedia.org/wiki/Poker) though not all have pictures.

Would using python and mysql be the easiest for a beginner?
05-16-2011 , 12:32 PM
By messy, all I really mean is hard to maintain. Depending on the complexity of a scraper, and what measures sites take to make scraping more difficult, you can quite quickly end up with hard-to-maintain code.

I don't have experience with Python, nor much with MySQL I'm afraid, but from what I've read they should be ok.
05-16-2011 , 01:35 PM
Thanks again. Obv I have no idea what I'm doing.

One of my endgames is going to be scraping a site such as 5dimes (sportsbetting) and sorting all bets available by time into a database (action junkie x bankroll management = entertainment at a do nothing summer job) if that makes any difference.
05-16-2011 , 01:46 PM
Be careful with what you scrape. I imagine something like a betting site will probably not be too happy about people scraping odds, as they probably sell that information. I'm not too sure on the legality of it, so you might want to check it out.

Anyway, if you do do it, it does sound fun. Perhaps find a free API as an alternative; there may be a free time-delayed one for sports odds.
05-16-2011 , 02:13 PM
Betfair seem to encourage bots (they're good for liquidity) so they probably allow scraping or maybe even expose an API.

For the sportsbetting project, Python + MySQL sounds great. For the Wikipedia one it sounds like you don't really need a database, since you are only collecting two pieces of information for each record; it would be simplest to write it in pure Python.
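For the pure-Python route, a minimal sketch using only the standard library's html.parser. Grabbing the page title and the first image is the idea from the thread, but treating the `<title>` text and the first `<img src>` as "the article title and lead picture" is an assumption on my part; Wikipedia's real markup has more structure than this:

```python
from html.parser import HTMLParser

class ArticleParser(HTMLParser):
    """Pull the <title> text and the first <img src> out of a page."""
    def __init__(self):
        super().__init__()
        self.title = None
        self.first_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img" and self.first_image is None:
            # attrs is a list of (name, value) pairs
            self.first_image = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()

def parse_article(html):
    """Return (title, first_image_url_or_None) for one page's HTML."""
    p = ArticleParser()
    p.feed(html)
    return p.title, p.first_image
```

You'd feed it the page source fetched from a URL like http://en.wikipedia.org/wiki/Poker (e.g. via `urllib.request.urlopen`), looping over whatever list of article names you want.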
05-16-2011 , 02:14 PM
Betfair has a free time-limited API you can look into. Not hard to navigate around.
05-16-2011 , 06:10 PM
Thanks again, everybody. It sounds like I should skip over the wikipedia thing. (Or try it first?)

As an American with 5dimes as my main book (forget lineshopping for a second, Kyle) if I wanted to be able to see every bet they offered sorted by time what would be the best way to do it? I don't need to see line changes, just to sort by what time the game starts.

For example: say it's 9 am and I want to know what bets are available in the next hour.
05-16-2011 , 06:15 PM
Using a language's HTML DOM parser. 5d does not offer an XML/JSON feed of their data, so you will be scraping their screen to do this. Seeing some of their lines requires authentication, which can be annoying.
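Once the DOM parser has pulled out (start time, description) pairs, the "what's available in the next hour" question from earlier is just filtering and sorting. A sketch; the timestamp format and the tuple shape here are my own assumptions, not anything 5dimes actually exposes:

```python
from datetime import datetime, timedelta

def bets_in_window(events, now, hours=1):
    """Return events starting within the next `hours`, sorted by start time.

    `events` is a list of (start_time_string, description) tuples as you
    might collect them while walking the parsed DOM. The "%Y-%m-%d %I:%M %p"
    format is a placeholder for whatever the site actually prints.
    """
    cutoff = now + timedelta(hours=hours)
    upcoming = []
    for start, desc in events:
        t = datetime.strptime(start, "%Y-%m-%d %I:%M %p")
        if now <= t <= cutoff:
            upcoming.append((t, desc))
    return sorted(upcoming)  # tuples sort by datetime first
```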
05-16-2011 , 06:27 PM
but your avatar! postgresql > mysql lol, particularly when dealing with timestamps.

I've done a few of these, nothing so comprehensive as a multi-sportsbook line scraper / arb finder or whatever i'm imagining.

Python is probably fine, and a project like this is going to be good for learning a good chunk of a language. I've always used PHP, since its cURL functions (for scraping HTML) and DOM parser or regex are great.
05-16-2011 , 06:36 PM
My latest SO ticket about PHP's mail() function in combination with redirected subdomains can be found here, if anyone wants to take a crack at it:

http://stackoverflow.com/questions/6...king-strangely
05-16-2011 , 08:22 PM
The obvious question here is whether or not your sendmail (or whatever it's using) is set up and working correctly. IM(limited)E, mail() fails when the underlying configuration is wrong.
05-16-2011 , 08:30 PM
Apparently had something to do with a server config on my host provider's end.
05-17-2011 , 01:25 AM
I kind of feel like if you're scraping a large amount of information off of a site you should probably be using a document store instead of a relational db.
05-17-2011 , 04:12 AM
Quote:
Originally Posted by kyleb
My latest SO ticket about PHP's mail() function in combination with redirected subdomains can be found here, if anyone wants to take a crack at it:

http://stackoverflow.com/questions/6...king-strangely.
Quote:
This question was voluntarily removed by its author
Something wrong with your question?
05-17-2011 , 04:25 AM
He's just deleted it; according to the comments he found the solution, so he probably feels the question doesn't serve much purpose any more.

I am an uber fish when it comes to mailservers and networks; they give me massive headaches! Nothing more frustrating than a mailserver not sending emails, or networks not working for no reason. Makes my head asplode!
05-17-2011 , 07:08 AM
I tried to set up my own mailserver once. Eventually got it working, and it ran for a couple of months until it died. Plus it was so badly configured.

Now I just use google apps, and point mail.mydomain.com to google.

This means I have me@mydomain.com email, hosted by Google. Works amazingly well and it's completely free.

http://www.google.com/apps/intl/en-GB/group/index.html
05-17-2011 , 08:11 AM
Quote:
Originally Posted by Shoe Lace
I kind of feel like if you're scraping a large amount of information off of a site you should probably be using a document store instead of a relational db.
It depends how you want the data.

If you're looking for specific data points from the page (i.e., just title & image), then that belongs in a database.
05-17-2011 , 08:14 AM
For something as described, I'm happy putting it all into a DB until it starts getting really, really big. Using the filesystem to store this sort of data would probably just make it more complicated than it needs to be.
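As a concrete version of the "just put it in a DB" suggestion: Python ships with sqlite3, so a beginner can skip setting up MySQL entirely. A sketch; the table name and schema here are my own, not anything from the thread:

```python
import sqlite3

def store_articles(rows, conn=None):
    """Insert (title, image_url) pairs; image_url may be None for
    articles without pictures. Returns the connection for querying."""
    conn = conn or sqlite3.connect(":memory:")  # use a file path for a real run
    conn.execute("""CREATE TABLE IF NOT EXISTS articles (
                        title     TEXT PRIMARY KEY,
                        image_url TEXT)""")
    # parameterised inserts, so titles with quotes are handled safely
    conn.executemany("INSERT OR REPLACE INTO articles VALUES (?, ?)", rows)
    conn.commit()
    return conn

conn = store_articles([("Poker", "/img/poker.jpg"), ("Logic", None)])
```

Swapping in MySQL later is mostly a matter of changing the connection line, since both speak the same DB-API style of parameterised query.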
05-17-2011 , 10:07 AM
I'm using FF4, good enough for everything and I think it's better than FF3. It's also gotten a bit cleaner with the minitabs and revamped menu.

Quote:
If you're looking for specific data points from the page (ie, just title & image), then that belongs in a database.
Images in a database are kind of meh though. I'd probably just have an images directory on the FS and use a Python script to rename the images to wikiarticlename.imagetype, e.g. poker.jpg, some_long_stuff.png, etc.
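A sketch of that renaming idea; the sanitisation rule (lowercase, runs of non-alphanumerics collapsed to underscores) is my own choice, not anything from the thread:

```python
import os
import re

def image_filename(article_title, image_url):
    """Build a filesystem-safe name like 'some_long_stuff.png' from an
    article title plus the extension of the scraped image URL."""
    ext = os.path.splitext(image_url)[1] or ".jpg"  # fall back if the URL has no extension
    safe = re.sub(r"[^a-z0-9]+", "_", article_title.lower()).strip("_")
    return safe + ext
```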

Last edited by clowntable; 05-17-2011 at 10:17 AM.

      