Ask HN: What would you compute on 2000 badly behaved worker nodes?
I work for CrowdProcess. We built a distributed computing platform that runs on Web Workers (badly behaved nodes, i.e. volatile connections with some latency) as a way to monetize websites without advertising.
We already have over 2000 nodes and have been getting pretty good speedups on Monte Carlo-based algorithms. We also have the simplest API ever for a distributed computing platform.
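Monte Carlo workloads suit volatile workers because every sample is independent and only a small aggregate has to travel back. Below is a minimal sketch, not CrowdProcess's actual API: a π estimate split into seeded chunks. The names `run_chunk` and `estimate_pi` are hypothetical, and a chunk lost to a dropped connection could simply be reissued with the same seed.

```python
import random


def run_chunk(seed: int, samples: int) -> int:
    """One independent work unit: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # per-chunk seed, so a reissued chunk gives the same answer
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks: int, samples_per_chunk: int) -> float:
    # On a distributed platform each run_chunk call would go to a different worker.
    # Here they run sequentially; only the integer hit counts need to be combined.
    total_hits = sum(run_chunk(seed, samples_per_chunk) for seed in range(chunks))
    return 4.0 * total_hits / (chunks * samples_per_chunk)


if __name__ == "__main__":
    print(estimate_pi(chunks=50, samples_per_chunk=10_000))
```

Adding chunks tightens the estimate roughly with the square root of the total sample count, which is why more nodes translate fairly directly into better results.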
What else would you build using it?

At a previous day job we used n-queens to benchmark distributed computing, as it is disgustingly parallelizable and produces nice visual results.

Thanks for your answer!
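Here is a short sketch of why n-queens splits so easily. Fixing the column of the queen in the first row yields n independent subproblems, and their counts simply add up. This is a standard bitmask backtracking counter written for illustration. `count_queens_task` is a hypothetical name for what would be one work unit sent to a worker.

```python
def count_completions(n: int, cols: int, diag_left: int, diag_right: int) -> int:
    """Count ways to finish the board, given bitmasks of attacked columns/diagonals for the next row."""
    full = (1 << n) - 1
    if cols == full:  # every column holds a queen: one complete solution
        return 1
    total = 0
    free = full & ~(cols | diag_left | diag_right)
    while free:
        bit = free & -free  # lowest free square in this row
        free ^= bit
        total += count_completions(
            n,
            cols | bit,
            ((diag_left | bit) << 1) & full,  # left diagonals move one column per row
            (diag_right | bit) >> 1,          # right diagonals move the other way
        )
    return total


def count_queens_task(n: int, first_col: int) -> int:
    """One independent work unit: all solutions whose first-row queen is in `first_col`."""
    full = (1 << n) - 1
    bit = 1 << first_col
    return count_completions(n, bit, (bit << 1) & full, bit >> 1)


def count_queens(n: int) -> int:
    # Each task could run on a different worker; the coordinator only sums integers.
    return sum(count_queens_task(n, c) for c in range(n))


if __name__ == "__main__":
    print(count_queens(8))  # 92
```

For larger boards the same idea extends to fixing the first two or three rows, which produces many more, smaller tasks and suits a pool of unreliable workers better.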
Do you know of any real-life application for that problem?

If you're freezing to death inside of a computer cluster, n-queens will save your life.

Laughing out loud here. It took me a few seconds to get this subtle one.

Sigh. Doesn't this seem slightly immoral to you guys? This is a fairly common idea, and it usually gets shot down. I am surprised you made it this far into the process. Unless there is some user opt-in model? For example, do you know how much more expensive this is (e.g. in Wh/TFLOP) than traditional datacenter grid computing? Or that you are essentially charging users without their knowledge? I'm sure the legal system will love that one.

There is an opt-in: http://cdn.crowdprocess.com/opt-in.html (only one website requested it, so it's not in English yet). It's as immoral as advertising, maybe even less. With advertising you arrive at a web page and see tons of things you did not want to see and that did not bring you to that page; they sometimes pull your focus away and annoy you. It's the same with CrowdProcess, except instead of annoying you, we annoy one CPU core. We believe that, although it is more expensive than traditional datacenter grid computing, it may still be the cheaper option because it only has to outperform ads. We don't compute on all the CPU cores, of course, only on one. We actually ask websites to tell users they're a part of this, but we cannot control what they do because they can simply use display:none.

> We actually ask websites to tell users they're a part of this, but we cannot control what they do because they can simply use display:none.

You could certainly just check that they're using it properly. Do a screen scrape, or even have someone hit the page every month or two, and ban anyone abusing it by not notifying users.

That may be harder to do efficiently than building the entire platform, and we're an extremely small team. I'm sure some day we'll do it, but we can't prioritize that now.

Really?
How many websites are signed up, and how many do you expect to sign up? Can you not spare 5 minutes per site per month to make sure there's a notification and/or opt-in? Or come up with an automated way to check it. Or use Mechanical Turk and pay somebody $0.50 per site to check for you. If you can't prioritize something as important as running an ethical (and law-abiding; take a very close look at the ramifications of unauthorized computer usage, which I think it could be argued you're doing with this platform) business, then you really shouldn't be in business.

It isn't so much a matter of being able to verify as of being able to enforce. Currently the platform is supplied by quite a few websites (over 100), and the best way to get them to communicate this adequately is through proper incentives. The incentive must be: if you do not comply, your content won't be monetized (as would happen with ads).

> It's as immoral as advertising, maybe even less.

You've actually managed to convince yourself of that, haven't you? It isn't, because:

1: the user is paying for the electricity being wasted by you. A tab left open could be significantly detrimental.

2: it will cause real problems for mobile users, who will be wondering why their battery is flat.

We -really- don't want to seem sketchy and immoral. We plan to stop computing after a certain amount of time (still to be figured out; so far we don't have enough tasks running for it to be significant), and we completely block mobile phones and tablets as long as the Battery Status API isn't present on all devices (http://www.w3.org/TR/battery-status/; only Firefox implements it currently).

Could you give me a list of all your CDN domains so I can blacklist you?

Sure! You'll want to block ss.crowdprocess.com (an opt-out is under way). One website requested an opt-in, but we haven't translated it yet: http://cdn.crowdprocess.com/opt-in.html

What you're saying is that you have a botnet. Look at botnet owners.
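The safeguards described in the thread, a per-visitor time limit and blocking phones and tablets, amount to a small gate that is checked before a task starts. Below is a minimal sketch, written in Python for readability although on the platform it would run in the browser. `ClientInfo`, `may_compute` and `MAX_COMPUTE_SECONDS` are hypothetical, and the budget value is a placeholder because the thread says the real limit is undecided. Letting a mobile device compute only when the battery API is available and reports charging is an assumption about how the block might eventually be lifted. The thread itself only says mobiles are currently blocked.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder budget (hypothetical): the actual time limit is "still to figure out".
MAX_COMPUTE_SECONDS = 600.0


@dataclass
class ClientInfo:
    is_mobile: bool              # phone or tablet, e.g. inferred from the user agent
    battery_api_available: bool  # whether the Battery Status API is exposed
    charging: Optional[bool]     # None when the battery state is unknown
    seconds_computed: float      # time this visitor has already spent computing


def may_compute(client: ClientInfo) -> bool:
    """Decide whether this visitor's worker should accept another task."""
    if client.seconds_computed >= MAX_COMPUTE_SECONDS:
        return False  # stop after the time budget so an open tab doesn't run forever
    if client.is_mobile:
        # Assumption: mobiles stay blocked unless the battery is known to be charging.
        return client.battery_api_available and client.charging is True
    return True


if __name__ == "__main__":
    print(may_compute(ClientInfo(is_mobile=True, battery_api_available=False,
                                 charging=None, seconds_computed=0.0)))  # False
```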
That's actually pretty good advice, except it may be hard to find legal and moral things to run. We wanted to find the cure for cancer, not produce rainbow tables or run DDoS attacks.

It's also hard to find legal and moral justification for constructing a botnet in the first place.

Well, we noticed that Flash ads sometimes take up even more than 100% of a CPU (meaning they can spawn threads and use multiple cores), and video ads perhaps even more, since they may be GPU-accelerated, as CSS3 animations are. We figured that if people are spending this many CPU cycles on advertising, then why not clean up all the advertising and use those CPU cycles for protein folding and finding a cure for cancer, making website owners, visitors and a group of researchers happy?

If you can use GPUs, then I think scrypt-based coin mining will be the most profitable thing to do. If not, then you need to find problems that are relatively fast on CPUs compared to GPUs, parallelizable, and low-bandwidth. It will be a small intersection, but there will likely be something.

We don't use GPUs yet. We're definitely searching for that small intersection you've mentioned. Do you have any clues on how to find it?

CPLEX and Gurobi cost a lot but are used by big companies to solve mixed integer linear programming (MIP) problems. You can exploit parallelism in the MIP part. Operations uses MIPs a lot, usually via an Excel spreadsheet :s However, people expect a good interface to these problems, and that's not trivial.

Hmm. It's not clear from your documentation: is it possible to use XMLHttpRequest from the web workers and get the response? Having thousands of systems act as distributed web crawlers would be really, really cool.

It's not possible, sorry. It's possible if you ask us to allow access to a certain address, but general access to the outside world is not allowed. It would be pretty cool to have distributed web crawlers, but it would also be extremely dangerous if someone decided to use CrowdProcess to do a DDoS.
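The DDoS worry above is what a per-domain rate limit addresses. Below is a minimal sketch, assuming a hypothetical central gateway that sees every outbound request a worker wants to make. It keeps one token bucket per target hostname, so no single site can receive more than `rate` requests per second from the whole network, no matter how many workers ask. `PerDomainThrottle` and its parameters are illustrative, not an existing CrowdProcess feature.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PerDomainThrottle:
    """Token bucket per target domain: `rate` requests/second on average, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens: Dict[str, float] = {}
        self.updated: Dict[str, float] = {}

    def allow(self, url: str) -> bool:
        """Return True if a request to this URL's host may go out now, consuming one token."""
        domain = urlsplit(url).hostname or ""
        now = self.clock()
        tokens = self.tokens.get(domain, self.burst)  # new domains start with a full bucket
        last = self.updated.get(domain, now)
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill since last check
        self.updated[domain] = now
        if tokens >= 1.0:
            self.tokens[domain] = tokens - 1.0
            return True
        self.tokens[domain] = tokens
        return False


if __name__ == "__main__":
    throttle = PerDomainThrottle(rate=1.0, burst=2.0)
    print([throttle.allow("http://example.com/page") for _ in range(3)])  # [True, True, False]
```

Because the limit is keyed on the target rather than the worker, adding more nodes increases crawl breadth across many sites without increasing the load on any one of them.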
Seems like that could be handled by your API, if you throttle requests by request domain. But no worries, just curious.

Coin miners.

Is there any coin that uses scrypt?

That's a pretty good one, especially since mining Litecoins is still profitable on EC2 (at least the last time I checked). I was planning to write something like that over the weekend but couldn't; I'll have to do it pretty soon...

I checked the FAQ but didn't see an answer: how do you prevent malicious actors from returning bogus data?

We thought of sending the worker puzzles that would really have to be computed by the VM and would take a while to fake, or that would have changed by the time a human could decipher the puzzle and return the expected result. So far, though, we're only ignoring bad actors and sometimes comparing results from different actors until a quorum is found among the results.
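The quorum approach in the last answer can be sketched as a small acceptance rule. Under this sketch's assumptions, the same task is handed to several workers, results arrive as `(worker_id, result)` pairs in completion order, and a value is accepted once enough distinct, non-banned workers agree on it. `quorum_result` is a hypothetical helper, not the platform's actual code.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Hashable, Iterable, Optional, Set, Tuple


def quorum_result(
    submissions: Iterable[Tuple[str, Hashable]],
    quorum: int,
    banned: AbstractSet[str] = frozenset(),
) -> Optional[Hashable]:
    """Accept a value once `quorum` distinct, non-banned workers have returned it.

    Returns None if no value reaches the quorum; the task would then be reissued.
    """
    voters: Dict[Hashable, Set[str]] = defaultdict(set)
    for worker_id, value in submissions:
        if worker_id in banned:
            continue  # ignore results from workers already flagged as bad actors
        voters[value].add(worker_id)  # a set, so a worker resubmitting counts only once
        if len(voters[value]) >= quorum:
            return value
    return None


if __name__ == "__main__":
    results = [("w1", 42), ("w2", 41), ("w1", 42), ("w3", 42)]
    print(quorum_result(results, quorum=2))  # 42
```

The trade-off is cost: a quorum of k multiplies the compute spent per task by at least k. That is why it makes sense to apply it only sometimes, or only to workers without a track record, as the answer describes.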