Dryad - Distributed computing infrastructure from Microsoft Research
research.microsoft.com
I wonder if the benefits of this product outweigh the cost of all the MS Windows 2008 licenses you need to run Dryad.
This would be cool if it actually offered something tangible that the existing distributed computing infrastructures lacked. It's like they've basically invented Hadoop.NET, so now it's something new and great. Congrats Microsoft, you are once again late to the party.
Disclaimer: I work at Microsoft (but this is all my opinion).
Dryad has been discussed publicly since around 2006 (iirc). Hadoop's earliest public release I can find is from 2007 (but I'm sure it was around before that).
I just want to put this into perspective. Microsoft isn't suddenly implementing MapReduce 6 years after Google released the paper. They realized as soon as the paper was released that it was a good idea (although Microsoft already had some decent distributed systems in place).
Thanks for your input, but I don't buy this argument. Public discussion and actual working software releases are two completely different things. In addition, considering cash on hand, I expect Microsoft to innovate and bring new ideas to market, not simply re-invent the status quo within their own platforms.
Sorry, should have qualified 'discussing'. By the time Microsoft started talking about it, they had a functioning system. The turnaround was pretty damn quick.
As far as innovation goes, read up on Dryad. They do a few pretty interesting things. It goes above and beyond what MapReduce does. The computation is expressed as a giant dynamic directed graph (the graph can change during computation). Each node is a program that feeds into other programs, but fault tolerance and all the other messy bits of distributed programming are abstracted away from the programmer. Think of it as a more generic MapReduce that allows a broader set of computations to be performed easily. Put another way, MapReduce provides a subset of the computations possible with Dryad. I suspect that given enough cleverness you could get MapReduce to do everything Dryad does, but it'd be pretty hacky.
Also, check out the LINQ support. It is probably one of the coolest things they've done with it.
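To make the graph idea above a bit more concrete, here is a rough Python sketch. It is not Dryad's API (or the LINQ layer); the `Graph` class, `word_count_graph`, and the vertex names are all made up for illustration. It only shows the shape of the model: vertices are programs, edges carry one vertex's output to the next, and a map/reduce pipeline is just one particular graph. It runs in a single process and leaves out everything that makes the real thing hard (distribution, fault tolerance, changing the graph at run time).

```python
from collections import defaultdict, deque


class Graph:
    """Toy dataflow graph: vertices are plain functions, edges carry their outputs."""

    def __init__(self):
        self.funcs = {}   # vertex name -> function
        self.inputs = {}  # vertex name -> ordered list of upstream vertex names

    def add(self, name, func, inputs=()):
        self.funcs[name] = func
        self.inputs[name] = list(inputs)
        return self

    def run(self):
        # Kahn's algorithm: a vertex runs once all of its inputs are available.
        pending = {v: len(ups) for v, ups in self.inputs.items()}
        downstream = defaultdict(list)
        for v, ups in self.inputs.items():
            for u in ups:
                downstream[u].append(v)
        ready = deque(v for v, n in pending.items() if n == 0)
        results = {}
        while ready:
            v = ready.popleft()
            args = [results[u] for u in self.inputs[v]]
            results[v] = self.funcs[v](*args)
            for d in downstream[v]:
                pending[d] -= 1
                if pending[d] == 0:
                    ready.append(d)
        if len(results) != len(self.funcs):
            raise ValueError("graph has a cycle or refers to a missing vertex")
        return results


def word_count_graph(part_a, part_b):
    """Hypothetical example: word count over two input partitions."""

    def count(lines):
        counts = defaultdict(int)
        for line in lines:
            for word in line.split():
                counts[word] += 1
        return dict(counts)

    def merge(a, b):
        total = defaultdict(int)
        for partial in (a, b):
            for word, n in partial.items():
                total[word] += n
        return dict(total)

    g = Graph()
    g.add("read_a", lambda: part_a)
    g.add("read_b", lambda: part_b)
    g.add("map_a", count, ["read_a"])              # "map" stage, one vertex per partition
    g.add("map_b", count, ["read_b"])
    g.add("reduce", merge, ["map_a", "map_b"])     # "reduce" stage
    # Extra downstream stage; in plain MapReduce this would typically be a second job.
    g.add("top", lambda totals: max(totals, key=totals.get), ["reduce"])
    return g


if __name__ == "__main__":
    out = word_count_graph(["a b a"], ["b a"]).run()
    print(out["reduce"], out["top"])
```

The map and reduce stages are just two layers of this graph, and further stages attach as more vertices, which is the sense in which MapReduce is a special case.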
I still assert that there's a difference between internal prototyping and public release. Microsoft is JUST NOW releasing Dryad. It doesn't matter if they were talking about it for 50 years, what matters is the implementation and availability.
I understand that it's more than just Hadoop.NET. Perhaps I painted with too broad a brush, but fundamentally, these are tweaks to what already exists. I am not saying that it is purposeless; what I am saying is that I expect more from Microsoft. To justify the cost of purchasing Microsoft products, it must deliver an order-of-magnitude increase in value, which I'm not seeing.
In essence: neat, but marginal.
Ah, see, there's the confusion. Dryad has been used internally since day one; I didn't even know you could get it outside of Microsoft. I don't think it was initially intended to be a commercial project, but more of a "We do billions of processing tasks a day, here's a way for us to do it faster and cheaper."
Anyway, yeah, I agree... it's definitely not as earth-shattering as MapReduce was. It has some neat integration with other Microsoft technologies, but I never really thought of it as a commercial product.
I don't know why people are downvoting you. You had valid points and concerns. I voted you back up to help out a bit (you shouldn't lose karma over a perfectly legitimate discussion on the pros and cons of a technology).
When it comes to large-scale cluster management, data storage and execution technologies, Microsoft chose the same path as Google and a bunch of other companies: they develop and use these technologies in house, but do not distribute them to the public.
You can see some of these technologies "leaking" to the public through Windows Azure or this Dryad release from MS Research, but these are side cases.
Michael Isard (one of the Dryad guys on the linked page) published a paper on Autopilot (Microsoft internal cluster management software) in 2007, but you can't buy Autopilot even now.
How exactly are they reinventing the status quo when it predates the status quo?
"I expect Microsoft to innovate and bring new ideas to market"
And risk their dominant position by creating a new market they can't control as well as the last one?
Don't hold your breath.
The data model here is actually more powerful than just MapReduce.