Ask HN: What would you compute on 2000 badly behaved worker nodes?

17 points by pedrombafonso 12 years ago · 28 comments · 1 min read

I work for CrowdProcess, and we built a distributed computing platform that runs on Web Workers (badly behaved, i.e. volatile connections with some latency) so that websites can monetize their traffic without advertising.

We already have over 2,000 nodes and have been getting pretty good speedups on Monte Carlo-based algorithms. We also think we have the simplest API of any distributed computing platform.
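Monte Carlo suits volatile workers because every batch of samples is independent: a batch that never comes back only shrinks the sample, it doesn't corrupt the result. Here is a minimal Python sketch assuming a plain map/reduce model. `run_task`, `combine` and `TaskResult` are hypothetical names, not CrowdProcess's API, and on the real platform the task would be JavaScript running in a Web Worker.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskResult:
    hits: int     # samples that landed inside the quarter circle
    samples: int  # total samples drawn by this task


def run_task(seed: int, samples: int) -> TaskResult:
    """One independent unit of work (hypothetical; a worker node would run this)."""
    rng = random.Random(seed)  # per-task seed keeps tasks reproducible and independent
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return TaskResult(hits, samples)


def combine(results) -> float:
    """Reduce step: pool the counts from whichever tasks came back."""
    hits = sum(r.hits for r in results)
    samples = sum(r.samples for r in results)
    return 4.0 * hits / samples  # area ratio of quarter circle to unit square is pi/4


if __name__ == "__main__":
    # Simulate 2,000 nodes doing small tasks; lost tasks would just be missing from the list.
    results = [run_task(seed, 500) for seed in range(2_000)]
    print(f"pi is roughly {combine(results):.4f}")
```

Because `combine` only sums counts, results can arrive in any order and stragglers can be dropped at a deadline.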

What else would you build with it?

patio11 12 years ago

At a previous day job we used n-queens to benchmark distributed computing, as it is disgustingly parallelizable and produces nice visual results.
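The reason n-queens parallelizes so well: fixing the queen's column in the first row splits the search into independent subtrees that share no state. A minimal Python sketch of that split using the usual bitmask backtracking; `nqueens_task` and `count_completions` are hypothetical names, and each call to `nqueens_task` is what one worker would receive.

```python
def count_completions(n: int, cols: int, left: int, right: int) -> int:
    """Count ways to finish the board, given the columns and the two diagonal
    directions already attacked in the next row (all as bitmasks)."""
    full = (1 << n) - 1
    if cols == full:  # a queen in every column means every row is filled
        return 1
    count = 0
    free = full & ~(cols | left | right)
    while free:
        bit = free & -free  # lowest free square in this row
        free ^= bit
        count += count_completions(
            n, cols | bit, ((left | bit) << 1) & full, (right | bit) >> 1
        )
    return count


def nqueens_task(n: int, first_col: int) -> int:
    """One independent task: solutions whose row-0 queen is in column first_col."""
    full = (1 << n) - 1
    bit = 1 << first_col
    return count_completions(n, bit, (bit << 1) & full, bit >> 1)


if __name__ == "__main__":
    # A coordinator would hand out columns 0..n-1 and sum whatever comes back.
    print(sum(nqueens_task(8, c) for c in range(8)))
```

With thousands of nodes you would want far more than n tasks, e.g. by fixing the first two or three rows instead of one; mirror symmetry (column c and column n-1-c give the same count) can also halve the work.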

pedrombafonsoOP 12 years ago

Thanks for your answer! Do you know of any real-life applications for that problem?
- patio11 12 years ago
  
  If you're freezing to death inside of a computer cluster n-queens will save your life.
  - lucastx 12 years ago
    
    Laughing out loud here.
    Took me a few seconds to get this subtle one.

chris_va 12 years ago

Sigh. Doesn't this seem even slightly immoral to you guys?

This is a fairly common idea, and it usually gets shot down. I am surprised you guys made it this far into the process. Unless maybe there is some user opt-in model?

For example, do you know how much more expensive this is (e.g. in Wh/Tflop) than traditional datacenter grid computing? Or that you are essentially charging users without their knowledge? I'm sure the legal system will love that one.

joaojeronimo 12 years ago

There is an opt-in: http://cdn.crowdprocess.com/opt-in.html (only one website has requested it, so it isn't available in English yet).
It's as immoral as advertising, maybe even less so. With advertising, you land on a web page and see tons of things you didn't want to see and that aren't what brought you to that page; they sometimes pull your focus away and annoy you. It's the same with CrowdProcess, except that instead of annoying you, we annoy one CPU core. We believe that although it is more expensive than traditional datacenter grid computing, it may still be the cheaper option, because it only has to outperform ads. We don't compute on all the CPU cores, of course, only on one.
We actually ask websites to tell their visitors that they're part of this, but we can't control what they do, because they can simply hide the notice with display:none.
- patmcc 12 years ago
  
  >>We actually ask websites to tell they're a part of this, but we cannot control what they do because they can simply display:none.
  You could certainly just check that they're using it properly. Do a screen scrape, or even have someone visit the page every month or two, and ban anyone who abuses it by not notifying users.
  - joaojeronimo 12 years ago
    
    That may be harder to do efficiently than building the entire platform, and we're an extremely small team. I'm sure we'll do it some day, but we can't prioritize it right now.
    
    patmcc 12 years ago
    
    Really? How many websites are signed up, or do you expect to sign up? Can you not spare 5 minutes per site per month to make sure there's a notification and/or opt-in? Or come up with an automated way to check it. Or use Mechanical Turk and pay somebody $0.50 per site to check for you.
    If you can't prioritize something as important as running an ethical (and law-abiding; take a very close look at the ramifications of unauthorized computer use, which I think it could be argued you're engaging in with this platform) business, then you really shouldn't be in business.
    
    pgfonseca 12 years ago
    
    It isn't so much a matter of being able to verify as of being able to enforce. Quite a few websites (over 100) currently supply nodes to the platform, and the best way to get them to communicate this adequately is through proper incentives.
    The incentive has to be: if you don't comply, your content won't be monetized (just as would happen with ads).
- ris 12 years ago
  
  > It's as immoral as advertising, maybe even less.
  You've actually managed to convince yourself that, haven't you?
  It isn't, because:
  1: The user is paying for the electricity you're wasting. A tab left open could be significantly detrimental.
  2: It will cause real problems for mobile users, who will be wondering why their battery is flat.
  - joaojeronimo 12 years ago
    
    We -really- don't want to seem sketchy and immoral. We plan to stop computing after a certain amount of time (the limit is still to be decided; so far we don't have enough tasks running for it to matter), and we completely block mobile phones and tablets until the Battery Status API is available on all devices (http://www.w3.org/TR/battery-status/; only Firefox implements it currently).

ris 12 years ago

Could you give me a list of all your CDN domains so I can blacklist you?

joaojeronimo 12 years ago

Sure! You'll want to block ss.crowdprocess.com
An opt-out is under way. One website requested an opt-in, but we haven't translated it yet: http://cdn.crowdprocess.com/opt-in.html

pokoleo 12 years ago

What you're saying is that you have a botnet.

Look at botnet owners.

joaojeronimo 12 years ago

That's actually pretty good advice, except that it may be hard to find legal and moral things to run. We wanted to help find a cure for cancer, not produce rainbow tables or launch DDoS attacks.
- glimcat 12 years ago
  
  It's also hard to find legal and moral justification for constructing a botnet in the first place.
  - joaojeronimo 12 years ago
    
    Well, we noticed that Flash ads sometimes take up more than 100% of a CPU (meaning they can spawn threads and use multiple cores), and video ads perhaps even more, since they may be GPU-accelerated, as CSS3 animations are. We figured that if people are spending this many CPU cycles on advertising, why not clean up all the advertising and use those cycles for protein folding and finding a cure for cancer, making website owners, visitors, and a group of researchers happy?

bdcs 12 years ago

If you can use GPUs, then I think scrypt-based coin mining will be the most profitable thing to do. If not, then you need to find problems that are relatively fast on CPUs compared to GPUs, parallelizable, and low-bandwidth. It will be a small intersection, but there will likely be something.

pedrombafonsoOP 12 years ago

We're not on GPUs yet. We're definitely searching for that small intersection you mentioned.
Do you have any idea how to find it?

tlarkworthy 12 years ago

CPLEX and Gurobi cost a lot but are used by big companies to solve mixed-integer linear programming problems. You can exploit parallelism in the MIP part.

Operations use MIPs a lot, usually via an Excel spreadsheet :s However, people expect a good interface to these problems, and that's not trivial.

thaumaturgy 12 years ago

Hmm. It's not clear from your documentation: is it possible to use XMLHttpRequest from the web workers and get the response?

Because having thousands of systems act as distributed web crawlers would be really, really cool.

joaojeronimo 12 years ago

It's not possible, sorry. It is possible if you ask us to allow access to a specific address, but general access to the outside world isn't allowed. It would be pretty cool to have distributed web crawlers, but it would also be extremely dangerous if someone decided to use CrowdProcess for a DDoS.
- thaumaturgy 12 years ago
  
  Seems like that could be handled by your API if you throttle requests per target domain. But no worries, just curious.
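  The per-domain throttling suggested above could look roughly like this: a token bucket per target host that a hypothetical request gateway checks before forwarding a worker's fetch. `DomainThrottle` and its `rate_per_sec`/`burst` parameters are illustrative assumptions, not part of CrowdProcess. The sketch groups by exact hostname only, so a real gateway would also need to group subdomains and cap total outbound traffic.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical gateway-side token bucket, one bucket per target hostname."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec  # tokens refilled per second, per domain
        self.burst = burst        # maximum tokens a domain can accumulate
        self.clock = clock        # injectable clock so the logic can be tested
        self.buckets = {}         # hostname -> (tokens, time of last update)

    def allow(self, url: str) -> bool:
        """Return True if a request to this URL's host may be forwarded now."""
        host = (urlsplit(url).hostname or "").lower()
        now = self.clock()
        tokens, last = self.buckets.get(host, (float(self.burst), now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1.0:
            self.buckets[host] = (tokens - 1.0, now)  # spend one token
            return True
        self.buckets[host] = (tokens, now)
        return False
```

  Thousands of workers hitting one site would then be capped at `rate_per_sec` in aggregate, since the limit lives at the gateway rather than in each worker.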

Everlag 12 years ago

Coin miners, for any coin that uses scrypt?

joaojeronimo 12 years ago

That's a pretty good one, especially since mining Litecoin is still profitable on EC2 (at least it was the last time I checked). I was planning to write something like that over the weekend but couldn't; I'll have to do it pretty soon...

apw 12 years ago

I checked the FAQ, but didn't see an answer -- how do you prevent malicious actors from returning bogus data?

joaojeronimo 12 years ago

We thought of sending the workers puzzles that would really have to be computed by the VM and would take long enough that they couldn't be faked, or that would have changed by the time a human could decipher them and return the expected result. So far, though, we're only ignoring bad actors and sometimes comparing results from different workers until a quorum is reached.
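The quorum check mentioned above can be sketched as follows, assuming each task is sent to several workers and answers can be compared for exact equality (floating-point results from different JavaScript engines may need rounding first). `QuorumCollector` and its methods are hypothetical names, not CrowdProcess's implementation.

```python
from collections import Counter, defaultdict


class QuorumCollector:
    """Accept a task's result once `quorum` distinct workers return the same answer."""

    def __init__(self, quorum: int):
        self.quorum = quorum
        self.votes = defaultdict(dict)  # task_id -> {worker_id: result}
        self.accepted = {}              # task_id -> agreed result

    def submit(self, task_id, worker_id, result):
        """Record one worker's answer; return the agreed result, or None if undecided."""
        if task_id in self.accepted:
            return self.accepted[task_id]
        self.votes[task_id][worker_id] = result  # one vote per worker; a resubmit overwrites
        value, count = Counter(self.votes[task_id].values()).most_common(1)[0]
        if count >= self.quorum:
            self.accepted[task_id] = value
            return value
        return None

    def dissenters(self, task_id):
        """Workers whose answer disagreed with the accepted one (candidates to ignore)."""
        if task_id not in self.accepted:
            return set()
        agreed = self.accepted[task_id]
        return {w for w, r in self.votes[task_id].items() if r != agreed}
```

Redundancy multiplies the compute cost by roughly the quorum size, so it may make sense to replicate only a random fraction of tasks and use `dissenters` to decide which workers to ignore.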
