Concurrency: Good Links and Sources Please

04-24-2013 , 02:43 PM
I'm writing some concurrent programs at the moment, but feel fairly amateur at the task. So, I'm looking for links, books, and code about concurrency or what I consider some related ideas: Asynchronous Programming, Non-Blocking Code, Cluster Management, Parallel Computing.

Right now the script I'm writing is in python, but I'm interested in the ideas, and know enough of other languages (Java, C, C++, R, Javascript, Ruby) that I could port any good ideas from other languages.
04-24-2013 , 03:44 PM
This is a great video series: C++11 Concurrency
It teaches some C++ tricks, but he covers the core concepts of concurrent programming very well.
04-24-2013 , 04:01 PM
from what i've read, multi-threading is hard to do properly

i was recently introduced to ZeroMQ, which seems to be designed as a messaging protocol, but claims to work great for concurrency

http://www.zeromq.org/
04-24-2013 , 07:55 PM
Great topic. I'll confess I haven't really looked at the C++11 concurrency model yet. Concurrency is a broad topic. What do you want to do exactly?
04-24-2013 , 10:48 PM
I'm in exploration mode at the moment. I don't really know exactly what I'm looking for but, like that judge said about porn, "I know it when I see it".

I'm using gevent (a python library) to coordinate some of the tasks at the moment, but I confess to not really having a good grasp on many of the concepts. I'm more looking at things that I think might work and trying to change a few variables. I guess I'm looking for good structural choices to make and likely errors that I'll need to handle, but I'm also ready to be surprised by the Rumsfeld stuff "unknown unknowns".

The job of the script I'm writing is to farm out computationally heavy clustering tasks to nodes on EC2 instances. Each node listens to a Redis database that is updated with events like user A->views->listing B. The goal is to create a flexible and fast recommendation engine based on actions (not words or reviews).
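
As a rough illustration of the event shape described above, here is a minimal sketch of folding events into a per-user action history. The string format `user->action->listing`, the function names, and the plain list standing in for Redis are all assumptions made for the example, not the actual setup.

```python
from collections import defaultdict

def parse_event(raw):
    """Split an assumed 'user->action->listing' string into its three parts."""
    user, action, listing = raw.split("->")
    return user, action, listing

def apply_events(raw_events):
    """Fold events into history[user][action] = set of listings."""
    history = defaultdict(lambda: defaultdict(set))
    for raw in raw_events:
        user, action, listing = parse_event(raw)
        history[user][action].add(listing)
    return history

# A plain list stands in for whatever the nodes would read from Redis.
events = ["A->views->B", "A->views->C", "D->views->B"]
history = apply_events(events)
print(sorted(history["A"]["views"]))  # ['B', 'C']
```

Keeping a set per (user, action) pair means a repeated view of the same listing does not inflate the history, which may or may not be what a real recommender wants.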

I'm knowingly not using Hadoop for this portion, though I plan to use it later, because the aim at the moment is to have it update user recommendations in under 1 minute (which so far is working).

Thanks for the links so far.

Last edited by LA_Price; 04-24-2013 at 10:53 PM.
04-25-2013 , 06:40 AM
Thanks. So if I understand correctly, looking at this at a high level: it is highly desirable to do the computations concurrently, you have a time requirement you'd like to meet, and you have synchronization requirements for the data and among the tasks. Am I on the right track?
04-25-2013 , 06:56 AM
I'd really have to recommend Hadoop and/or a higher level language like pig. You're going to lose the 1 minute requirement but you could still get near-real time results.

Writing all of this logic seems really expensive. And even when you have the basics down you might need to handle things like a bad data row, a bad task, a bad machine, etc screwing stuff up. The error handling in these cases is ridiculous.

If you need the 1 minute response time I agree you probably don't want Hadoop. But if you can relax that a bit PM me and I can tell you a bit more about one option that should be a lot easier.
04-25-2013 , 09:10 AM
Quote:
Originally Posted by jjshabado
I'd really have to recommend Hadoop and/or a higher level language like pig. You're going to lose the 1 minute requirement but you could still get near-real time results.

Writing all of this logic seems really expensive. And even when you have the basics down you might need to handle things like a bad data row, a bad task, a bad machine, etc screwing stuff up. The error handling in these cases is ridiculous.

If you need the 1 minute response time I agree you probably don't want Hadoop. But if you can relax that a bit PM me and I can tell you a bit more about one option that should be a lot easier.
I'm guessing the 1 minute requirement is "soft". My take is that coming close to it most of the time would be OK.
04-25-2013 , 09:46 AM
Me too. The problem with Hadoop is that it has a lot of overhead so you'll never get a job to run in a minute. But you can likely have a very robust solution that can run in ~10 minutes with little to no dev/ops work.
04-25-2013 , 10:31 AM
Adios, yes, you are on the right track.

There's some redundancy in the design of this program, in that if any write fails that's ok--recommendations will be updated at some later time.

The problem is that the calculations for a given user have to be synchronous, but once the clustering is done, any given user can be calculated asynchronously (though this is limited by processing cores).

So suppose that there are 10,000 users with some action history. 1000 get updated. k-means clustering of the 10,000 is very fast (less than 5 seconds typically).

The clustering logic (statistical reasoning really) that creates recommendations off of the clustering is slow because it has to look up each user's information one after the other and calculate the "users who viewed this also viewed this". The lookup could be asynchronous, but the processing is again limited by the number of cores.
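
To make the "users who viewed this also viewed this" step concrete, here is a minimal sketch of one common way to compute it: counting how often two listings appear in the same user's view history. This is an assumed stand-in, not the actual recommendation logic; the names `co_view_counts` and `also_viewed` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_view_counts(views_by_user):
    """Count, for each listing, how many users also viewed each other listing."""
    counts = {}
    for listings in views_by_user.values():
        # Each unordered pair of distinct listings viewed by one user counts once.
        for a, b in combinations(sorted(set(listings)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_viewed(counts, listing, top_n=3):
    """Return up to top_n listings most often co-viewed with `listing`."""
    return [other for other, _ in counts.get(listing, Counter()).most_common(top_n)]

# Hypothetical view histories for four users.
views_by_user = {
    "u1": ["B", "C"],
    "u2": ["B", "C", "D"],
    "u3": ["B", "D"],
    "u4": ["B", "C"],
}
counts = co_view_counts(views_by_user)
print(also_viewed(counts, "B"))  # ['C', 'D']
```

The per-user loop is the CPU-bound part: each user's pairs can be counted independently, so the work divides naturally across cores or machines and the partial counters can be summed afterwards.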

My solution to this is to send two machines the 10,000 user action history and an assignment of users to calculate (say 500 from the 1000 updated). With redis this is quite fast. Now they can each be updated in <1 minute, where before they had to wait till the previous user was done processing.
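
The assignment step above can be sketched as follows, assuming the updated users are just a list of IDs and each worker machine receives one slice. The function name `split_assignments` and the round-robin strategy are illustrative choices, not the actual implementation.

```python
def split_assignments(updated_users, n_workers):
    """Deal users round-robin into n_workers lists of near-equal size."""
    return [updated_users[i::n_workers] for i in range(n_workers)]

# 1000 updated users shared between two machines, as in the example above.
updated = [f"user{i}" for i in range(1000)]
assignments = split_assignments(updated, 2)
print([len(a) for a in assignments])  # [500, 500]
```

Every machine still gets the full 10,000-user action history; only the list of users it is responsible for recalculating differs.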

@jjshabado I'll likely use Hadoop for a later task that "reconciles" each of these clusterings into a reduced set of groups. Basically each clustering, if I were to try and combine them, doesn't make sense outside of its own user set and calculations. Toby Segaran wrote about this in Beautiful Data. I have a script that does this "reconciliation", and it is perfect for Hadoop's time window.
04-26-2013 , 12:36 PM
That Go presentation with the gophers is pretty solid.
05-08-2013 , 05:05 PM
LA_Price - Not sure if this is useful to you or not but figured I'd give you the link: Recommender Systems for Free
05-10-2013 , 09:18 AM
Quote:
Originally Posted by jjshabado
LA_Price - Not sure if this is useful to you or not but figured I'd give you the link: Recommender Systems for Free

Thanks jj, yes I'm familiar with Hilary Mason, having watched her excellent intro to machine learning on O'Reilly probably a year and a half ago. I've also read Drew Conway's Machine Learning for Hackers. They're smart coders.

That being said, we're being pulled, based on the data and what's proving to be more effective in recommendation tests, into an event-driven system. This means that for the most part Hadoop doesn't really fit well, and the recommender is actually a small part of delivering recommendations (a flexible and time-aware data model, concurrent program execution, and fast response to incoming events are equally important).

Imagine the difference as two different jobs.

In one job you are trying to design a poker bot to decide what to do at the poker table, and are tracking recent history to decide actions (is someone on tilt?). This system should be designed to be event driven.

In another scenario you are tracking monthly sales for a restaurant to help decide food order size. There are some things (like sporting events) that cause fluctuations, but there's a schedule ahead of time, so you know about those things. The sales fluctuate monthly in predictable long term patterns. This system is perfect for Hadoop.