Hadoop / MapReduce alternatives for parallel computing?

11 points by tom_pinckney 16 years ago · 11 comments


Deepak Singh of Amazon Web Services maintains a great list of (cloud-focused) parallel computing frameworks and platforms: http://deepaksingh.net/Resources/Computing_in_the_Cloud

He's on twitter, too: http://twitter.com/mndoci

tom_pinckneyOP 16 years ago

For completeness, there are also great packages like OpenMPI and OpenMP.
At least for my particular applications, though, each of them has at least one of these problems: 1) a steep learning curve for programmers, 2) language support issues, or 3) they're designed for batch processing.
Personally, I find shared memory interfaces the easiest to program when there are complicated data access patterns. But that might just be personal preference.
- amock 16 years ago
  
  Shared memory interfaces are easy to use, but they don't run on the same kind of platform as Hadoop and MapReduce, because you can't efficiently split them up across machines. With a distributed system like Hadoop you can build a cluster of cheap machines and spread the computation across them. With a shared memory architecture you have to scale up with multi-million dollar machines like SGI's Altix line. So if you want to use a cloud of cheap computers, you need something like MPI or Hadoop.
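The reason MapReduce splits cleanly across cheap machines is that each map task only reads its own slice of the input, and the only cross-worker step is grouping intermediate pairs by key. A minimal single-process sketch of the map / shuffle / reduce phases (the usual word-count toy example) shows that shape. The function names are hypothetical and this is not Hadoop's API; in a real cluster each `map_phase` call would run on a different machine and the framework would do the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(chunk: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word; touches only this chunk, so it can run anywhere.
    for word in chunk.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key (the framework's job between map and reduce).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Each key is reduced independently, so reducers can also be spread out.
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    intermediate = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "the dog sat", "The end"]))
# {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

A shared-memory program, by contrast, usually lets any thread read any part of the data at any time, which is exactly what is expensive once the data lives on other machines.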
  - tom_pinckneyOP 16 years ago
    
    memcached is a poor man's distributed shared memory system for clusters. We've been layering code on top of it to fix its deficiencies, with things like client-side caching and persistence in case memcached drops objects.
    But I was curious whether other people had similar problems and how they were solving them.
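One way to do the client-side caching layer described above is a small in-process LRU in front of the shared cache, so repeated reads of the same object skip the network round trip. This is a minimal sketch under stated assumptions: `FakeRemoteCache` is a hypothetical dict-backed stand-in for a memcached client (real clients also expose `get`/`set`, but their signatures vary), and the local copy can go stale if another machine overwrites the key, so it only suits data that is immutable or tolerates staleness.

```python
from collections import OrderedDict
from typing import Any, Optional


class FakeRemoteCache:
    """Hypothetical stand-in for a memcached client; counts network fetches."""

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}
        self.fetches = 0

    def get(self, key: str) -> Optional[Any]:
        self.fetches += 1  # each call models one network round trip
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class LocalCachingClient:
    """Keeps recently used objects in process memory in front of the remote cache."""

    def __init__(self, remote: FakeRemoteCache, capacity: int = 1024) -> None:
        self.remote = remote
        self.capacity = capacity
        self.local = OrderedDict()  # key -> value, least recently used first

    def get(self, key: str) -> Optional[Any]:
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key]
        value = self.remote.get(key)
        if value is not None:
            self._remember(key, value)
        return value

    def set(self, key: str, value: Any) -> None:
        self.remote.set(key, value)  # write through to the shared cache
        self._remember(key, value)

    def _remember(self, key: str, value: Any) -> None:
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used entry
```

The LRU bound keeps memory use predictable; anything evicted locally is simply fetched from the shared cache again on the next read.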
    
    jerf 16 years ago
    
    If you're trying to "fix" memcached's dropping of objects after a time, you shouldn't be using memcached. You should be using something actually designed to persist things. Turning a cache into a persistent store, or vice versa, is a dangerous game, and of the two cache -> persistent store is the worse.
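The usual way to follow this advice is the cache-aside pattern: the persistent store is always the authority, the cache only speeds up reads, and losing a cache entry is harmless because a miss falls back to the store. A minimal sketch, assuming plain dicts as hypothetical stand-ins for the database and for memcached:

```python
from typing import Any, Optional


class CacheAside:
    """Reads try the cache first; the durable store is always the source of truth."""

    def __init__(self, store: dict, cache: dict) -> None:
        self.store = store  # stand-in for a database or other persistent store
        self.cache = cache  # stand-in for memcached; may lose entries at any time

    def read(self, key: str) -> Optional[Any]:
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)  # miss or eviction: fall back to the store
        if value is not None:
            self.cache[key] = value  # repopulate for the next reader
        return value

    def write(self, key: str, value: Any) -> None:
        self.store[key] = value  # persist first
        self.cache.pop(key, None)  # then invalidate rather than trust the cached copy
```

Because nothing is ever written only to the cache, an eviction costs one extra read from the store instead of lost data.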
    
    antirez 16 years ago
    
    Sounds like your life would be simpler with Redis if you are using memcached to keep state for a computation. Its atomic list operations and its persistence are two strong points in this context.
    
    tom_pinckneyOP 16 years ago
    
    Yeah, Redis is pretty interesting.
    I think we'd have to add some sort of client-side caching on top of it so that we're not fetching the same objects over and over. We tend to saturate our network if we don't do that.
    The other thing is that we'd probably have to add some sort of object migration, so that when Redis servers come up or go down we could rebalance where things are stored.
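A common technique for deciding where objects live when servers join or leave is consistent hashing (named here as a swapped-in technique; nothing in the thread says it is what was used). Each server is placed at many points on a hash ring, and a key belongs to the first server point clockwise from the key's hash. Adding a server only takes over some arcs, so only the keys on those arcs need to migrate. A minimal sketch, assuming MD5 as the ring hash and hypothetical server names like `redis-a`; it only computes placement, and actually copying the moved keys is left to the caller.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=(), replicas: int = 100) -> None:
        self.replicas = replicas  # virtual points per server; smooths the distribution
        self._keys: list[int] = []  # sorted hash positions on the ring
        self._owners: dict[int, str] = {}  # hash position -> server name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self._owners[h] = node
            bisect.insort(self._keys, h)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            del self._owners[h]
            self._keys.remove(h)

    def node_for(self, key: str) -> str:
        if not self._keys:
            raise LookupError("ring is empty")
        # First server point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]


ring = ConsistentHashRing(["redis-a", "redis-b", "redis-c"])
keys = [f"obj:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("redis-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys would migrate to redis-d")
```

With naive `hash(key) % n` placement, going from three servers to four would move most keys; here every moved key lands on the new server and the rest stay put.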
    
    hyuen 16 years ago
    
    I have tried to use Hadoop at lizten.in, but it seems like overkill, especially for small deployments like the basic Slicehost option. I tried memcached, and sure, it is a good fix for some page rendering issues, but for elements that you want to cache forever (or for a relatively long time), I prefer to create a blob table in MySQL and store the entries as key/value pairs. I am sure there are more sophisticated persistent datastores, but I guess the idea is the same. Maybe with some high-performance storage, such as SSDs, and something like Berkeley DB, it would be feasible to keep a relatively big cache.
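The blob-table approach can be sketched in a few lines. To keep the example self-contained it uses Python's built-in `sqlite3` in place of MySQL (a swapped-in database, not what the comment describes); in MySQL the same idea would be a table with a primary-key column and a BLOB column, written with `INSERT ... ON DUPLICATE KEY UPDATE`. The table and function names are hypothetical.

```python
import sqlite3
from typing import Optional


def open_kv(path: str = ":memory:") -> sqlite3.Connection:
    # One table: the key is the primary key, the value is an opaque blob.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)")
    return conn


def kv_put(conn: sqlite3.Connection, key: str, value: bytes) -> None:
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


def kv_get(conn: sqlite3.Connection, key: str) -> Optional[bytes]:
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


conn = open_kv()
kv_put(conn, "page:/about", b"<html>rendered once</html>")
print(kv_get(conn, "page:/about"))
```

Unlike memcached, entries stay until they are explicitly overwritten or deleted, which is what you want for things cached "forever".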

wmf 16 years ago

What do you think about VoltDB? Have you applied for the beta?

tom_pinckneyOP 16 years ago

Anything Stonebraker does is interesting, but I don't know enough about VoltDB to have anything to say. Definitely curious where it goes.
