** UnhandledExceptionEventHandler :: OFFICIAL LC / CHATTER THREAD **

05-15-2011 , 07:26 PM
Only one of your comparisons actually has a concrete reason. I can understand the bad blood with IE, but you should try to at least explain what you hate about Chrome or Safari.

You say FF4 is slow, buggy and ugly. It hasn't crashed once on me and I've been using it since it was released. When I occasionally use Chrome, I don't notice any difference in speed. As for it being ugly, you can make it look very close to what it looked like before, but I do think the corners are now slightly rounder.
05-15-2011 , 08:22 PM
I switched to Chrome for a while, but it doesn't like my iMac. Runs fine on the Air though. Weird.
05-15-2011 , 08:28 PM
I use Firefox all day as my testing browser, and the only time it has crashed was actually caused by Firebug being unable to handle the lolhuge DOM changes that were being done in JavaScript; it crashed during the HTML tab update.
05-16-2011 , 09:05 AM
Just in case some of you don't read Dilbert, today's is topical: http://www.dilbert.com/strips/comic/2011-05-16/
05-16-2011 , 11:24 AM
I could learn python and scrape(?) the title and first picture of all wikipedia articles into a database, correct?
05-16-2011 , 11:26 AM
Yes you can, but scraping is always quite messy. Also, be very careful you do not breach Wikipedia's content T&C: I think you are allowed to redistribute content, but only with proper attribution, which a lot of content-scraping sites don't provide.

If you are scraping it for the purpose of redistribution, ask yourself why. If it's a means of gaining more traffic to your site, it's a slightly blackhat way of doing it that is very annoying for visitors, and be aware that Google is currently making big efforts to penalise these sorts of sites (look at the recent Panda update), because they are complete junk with absolutely no value to anyone.

However if it's for some other academic use such as data analysis go ahead I say.
05-16-2011 , 12:29 PM
Thanks, Gullanian.

What do you mean by messy?

I am hoping to learn a language this summer and python seems a good place to start. My challenge to myself was to scrape wiki article titles and the first pic into a database since each article has a standard /wiki/ address (ex: http://en.wikipedia.org/wiki/Poker) though not all have pictures.

Would using python and mysql be the easiest for a beginner?
05-16-2011 , 12:32 PM
By messy, all I really mean is hard to maintain. Depending on the complexity of a scraper, and what measures sites take to make scraping more difficult, you can quite quickly end up with hard-to-maintain code.

I don't have experience with Python, nor much with MySQL I'm afraid, but from what I've read they should be ok.
05-16-2011 , 01:35 PM
Thanks again. Obv I have no idea what I'm doing.

One of my endgames is going to be scraping a site such as 5dimes (sportsbetting) and sorting all bets available by time into a database (action junkie x bankroll management = entertainment at a do nothing summer job) if that makes any difference.
05-16-2011 , 01:46 PM
Be careful with what you scrape. I imagine something like a betting site will probably not be too happy about people scraping odds, as they probably sell that information. I'm not too sure on the legality of it, so you might want to check it out.

Anyway, if you do do it, it does sound fun. Perhaps find a free API as an alternative; there may be a free time-delayed one for sports odds.
05-16-2011 , 02:13 PM
Betfair seem to encourage bots (they're good for liquidity) so they probably allow scraping or maybe even expose an API.

For the sportsbetting project, Python + MySQL sounds great. For the Wikipedia one it sounds like you don't really need a database, since you are only collecting two pieces of information for each record; it would be simplest to write it in pure Python.
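For the pure-Python route, a minimal sketch using only the standard library's html.parser. Grabbing the page title and the first image is the idea from the thread, but treating the `<title>` text and the first `<img src>` as "the article title and lead picture" is an assumption on my part; Wikipedia's real markup has more structure than this:

```python
from html.parser import HTMLParser

class ArticleParser(HTMLParser):
    """Pull the <title> text and the first <img src> out of a page."""
    def __init__(self):
        super().__init__()
        self.title = None
        self.first_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img" and self.first_image is None:
            # attrs is a list of (name, value) pairs
            self.first_image = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()

def parse_article(html):
    """Return (title, first_image_url_or_None) for one page's HTML."""
    p = ArticleParser()
    p.feed(html)
    return p.title, p.first_image
```

You'd feed it the page source fetched from a URL like http://en.wikipedia.org/wiki/Poker (e.g. via `urllib.request.urlopen`), looping over whatever list of article names you want.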
05-16-2011 , 02:14 PM
Betfair has a free time-limited API you can look into. Not hard to navigate around.
05-16-2011 , 06:10 PM
Thanks again, everybody. It sounds like I should skip over the wikipedia thing. (Or try it first?)

As an American with 5dimes as my main book (forget lineshopping for a second, Kyle) if I wanted to be able to see every bet they offered sorted by time what would be the best way to do it? I don't need to see line changes, just to sort by what time the game starts.

For example: say it's 9 am and I want to know what bets are available in the next hour.
05-16-2011 , 06:15 PM
Using a language's HTML DOM parser. 5d does not offer an XML/JSON feed of their data, so you will be scraping their screen to do this. Seeing some of their lines requires authentication, which can be annoying.
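Once the DOM parser has pulled out (start time, description) pairs, the "what's available in the next hour" question from earlier is just filtering and sorting. A sketch; the timestamp format and the tuple shape here are my own assumptions, not anything 5dimes actually exposes:

```python
from datetime import datetime, timedelta

def bets_in_window(events, now, hours=1):
    """Return events starting within the next `hours`, sorted by start time.

    `events` is a list of (start_time_string, description) tuples as you
    might collect them while walking the parsed DOM. The "%Y-%m-%d %I:%M %p"
    format is a placeholder for whatever the site actually prints.
    """
    cutoff = now + timedelta(hours=hours)
    upcoming = []
    for start, desc in events:
        t = datetime.strptime(start, "%Y-%m-%d %I:%M %p")
        if now <= t <= cutoff:
            upcoming.append((t, desc))
    return sorted(upcoming)  # tuples sort by datetime first
```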
05-16-2011 , 06:27 PM
but your avatar! postgresql > mysql lol, particularly when dealing with timestamps.

I've done a few of these, nothing so comprehensive as a multi-sportsbook line scraper / arb finder or whatever i'm imagining.

Python is probably fine, and a project like this is going to be good for learning a good chunk of a language. I've always used PHP, since its cURL functions (for scraping HTML) and DOM parser or regex are great.
05-16-2011 , 06:36 PM
My latest SO ticket about PHP's mail() function in combination with redirected subdomains can be found here, if anyone wants to take a crack at it:

http://stackoverflow.com/questions/6...king-strangely
05-16-2011 , 08:22 PM
The obvious question here is whether or not your sendmail (or whatever it's using) is set up and working correctly. IM(limited)E, mail() fails when the underlying configuration is wrong.
05-16-2011 , 08:30 PM
Apparently had something to do with a server config on my host provider's end.
05-17-2011 , 01:25 AM
I kind of feel like if you're scraping a large amount of information off of a site you should probably be using a document store instead of a relational db.
05-17-2011 , 04:12 AM
Quote:
Originally Posted by kyleb
My latest SO ticket about PHP's mail() function in combination with redirected subdomains can be found here, if anyone wants to take a crack at it:

http://stackoverflow.com/questions/6...king-strangely.
Quote:
This question was voluntarily removed by its author
Something wrong with your question?
05-17-2011 , 04:25 AM
He's just deleted it; according to the comments he found the solution, so he probably feels the question doesn't serve much purpose any more.

I am an uber fish when it comes to mailservers and networks; they give me massive headaches! Nothing more frustrating than a mailserver not sending emails, or networks not working for no reason. Makes my head asplode!
05-17-2011 , 07:08 AM
I tried to set up my own mailserver once. Eventually got it working, and it ran for a couple of months until it died. Plus it was so badly configured.

Now I just use google apps, and point mail.mydomain.com to google.

This means I have me@mydomain.com email, hosted by Google. Works amazingly well and it's completely free.

http://www.google.com/apps/intl/en-GB/group/index.html
05-17-2011 , 08:11 AM
Quote:
Originally Posted by Shoe Lace
I kind of feel like if you're scraping a large amount of information off of a site you should probably be using a document store instead of a relational db.
It depends how you want the data.

If you're looking for specific data points from the page (i.e., just title & image), then that belongs in a database.
05-17-2011 , 08:14 AM
For something as described, I'm happy putting it all into a DB until it starts getting really, really big. Using the filesystem to store this sort of data would probably just make it more complicated than it needs to be.
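As a concrete version of the "just put it in a DB" suggestion: Python ships with sqlite3, so a beginner can skip setting up MySQL entirely. A sketch; the table name and schema here are my own, not anything from the thread:

```python
import sqlite3

def store_articles(rows, conn=None):
    """Insert (title, image_url) pairs; image_url may be None for
    articles without pictures. Returns the connection for querying."""
    conn = conn or sqlite3.connect(":memory:")  # use a file path for a real run
    conn.execute("""CREATE TABLE IF NOT EXISTS articles (
                        title     TEXT PRIMARY KEY,
                        image_url TEXT)""")
    # parameterised inserts, so titles with quotes are handled safely
    conn.executemany("INSERT OR REPLACE INTO articles VALUES (?, ?)", rows)
    conn.commit()
    return conn

conn = store_articles([("Poker", "/img/poker.jpg"), ("Logic", None)])
```

Swapping in MySQL later is mostly a matter of changing the connection line, since both speak the same DB-API style of parameterised query.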
05-17-2011 , 10:07 AM
I'm using FF4, good enough for everything and I think it's better than FF3. It's also gotten a bit cleaner with the minitabs and revamped menu.

Quote:
If you're looking for specific data points from the page (ie, just title & image), then that belongs in a database.
Images in a database are kind of meh though. I'd probably just have an images directory on the FS and use a Python script to rename the images to wikiarticlename.imagetype, e.g. poker.jpg, some_long_stuff.png, etc.
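A sketch of that renaming idea; the sanitisation rule (lowercase, runs of non-alphanumerics collapsed to underscores) is my own choice, not anything from the thread:

```python
import os
import re

def image_filename(article_title, image_url):
    """Build a filesystem-safe name like 'some_long_stuff.png' from an
    article title plus the extension of the scraped image URL."""
    ext = os.path.splitext(image_url)[1] or ".jpg"  # fall back if the URL has no extension
    safe = re.sub(r"[^a-z0-9]+", "_", article_title.lower()).strip("_")
    return safe + ext
```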

Last edited by clowntable; 05-17-2011 at 10:17 AM.

      