Using AWS lambda for cheap S3 content processing

docs.scanii.com

89 points by cleverfoo 10 years ago · 19 comments

jayroh 10 years ago

This is an excellent write-up and a great use-case for what S3+Lambda can do. As AWS offerings go, I believe Lambda has a tremendous upside and will be growing significantly over the next few years.

I've been throwing a lot of my own personal resources into building some things on top of both S3 and Lambda and have found a few tools that help quite a bit. For one, the lambduh project from Russ Matney has been a great resource for abstracting out some of the more common S3->Lambda workflows: https://github.com/lambduh/lambduh. On a different note, there is T.J. Holowaychuk's Apex project: https://medium.com/@tjholowaychuk/introducing-apex-800824ffa...

cleverfoo (OP) 10 years ago

Hi there, author here, happy to answer any questions.

  • StreamBright 10 years ago

    Excellent write-up, and kudos for using IAM and roles for this. We are working on implementing the very same system, so we might just re-use your code. Thanks for sharing!

untog 10 years ago

Off topic, but I'm hoping there are some Lambda-heads in the room. I want to write a system that basically rebroadcasts a message sent over SNS, to different HTTP endpoints. (I don't have control over these endpoints so can't use SNS itself as I can't confirm subscriptions).

How many HTTP requests can Lambda do concurrently? Is my best approach to fire all these requests inside one worker, or should/could I have it spin up subsequent Lambdas whose only function is to run the HTTP request and then close? I'm imagining that would be a lot more expensive.

  • wsh91 10 years ago

    There's a per-invocation cost that, at ridiculous volumes, becomes non-trivial ($0.20 per million). We get very high throughput for I/O-intensive operations (my back-of-the-envelope math says about 3 writes per millisecond per Lambda, and that's to a Cassandra cluster). You can have up to 100 simultaneous invocations, and you can ask for more (we did). Without knowing more about your situation, I would suggest that you use a library that lets you fire off a bunch of async requests and block on them all. Play around with RAM/CPU (one knob for both): a higher setting may result in quicker processing at a lower cost (!). If you're highly cost sensitive, consider batching your SNS messages; remember that it supports 64K payloads. (We use SNS to do batch loading, actually. It's a cheap, managed alternative to Kinesis.)

    Should you choose the fanout route, Tim Wagner from AWS told me that it's pretty fast: https://twitter.com/timallenwagner/status/658025794900365312
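The "fire off a bunch of async requests and block on them all" suggestion could look roughly like the sketch below, using only the Python standard library. The names `post_message` and `rebroadcast`, the 5-second timeout, and the pool size of 20 are assumptions for illustration, not anything from the thread. Each request gets its own timeout, so one slow endpoint does not hold up the others.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_message(url, body, timeout=5.0):
    """POST body (bytes) to url and return the HTTP status code."""
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def rebroadcast(urls, body, send=post_message, max_workers=20):
    """Send body to every URL concurrently and block until all are done.

    Returns a dict mapping each URL to its status code, or to the
    exception raised for that endpoint. `send` is injectable so the
    fan-out can be exercised without network access.
    """
    if not urls:
        return {}
    results = {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(urls))) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {url: pool.submit(send, url, body) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # A failing endpoint is recorded, not allowed to stop the rest.
                results[url] = exc
    return results
```

Inside a Lambda handler this would be called once per SNS message with the list of target endpoints; whether a thread pool or one invocation per endpoint is cheaper depends on the volumes discussed above.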

  • yowmamasita 10 years ago

    My guess: all you can do in the 300 sec execution limit

    • cleverfoo (OP) 10 years ago

      The tricky part there is that it wouldn't work if you just sat there in a tight loop dispatching HTTP requests: any one of them timing out would likely trigger the deadline and keep all subsequent HTTP requests from happening.

      So, alternatively, you could do something with DynamoDB event sources, where you have some sort of pub/sub table that your lambda functions listen on (basically a list of all the http requests that have to happen) - thus keeping a minimal 1 lambda dispatch per http request. The catch is you would need another system to manage that table (technically that system can be lambda itself).

      Two important caveats: 1) I haven't used the DynamoDB/Lambda integration myself, so be skeptical of my suggestion, and 2) what I can say from our usage of the S3/Lambda integration is that concurrency is not a problem, with thousands of Lambda dispatches/second being surprisingly quick to spin up.
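One way to soften the tight-loop problem described above is to cap each request's timeout by the time left before the deadline and to skip whatever no longer fits. The sketch below is a guess at that idea, not the author's code: `dispatch_within_deadline`, `per_request_ms`, and `reserve_ms` are hypothetical names and values. In a Python Lambda handler, `context.get_remaining_time_in_millis` could be passed as `remaining_ms`.

```python
def dispatch_within_deadline(urls, send, remaining_ms,
                             per_request_ms=2000, reserve_ms=500):
    """Call send(url, timeout_seconds) sequentially while time remains.

    remaining_ms is a zero-argument callable returning the milliseconds
    left before the hard deadline. reserve_ms is held back so the
    function can return cleanly. Returns (results, skipped), where
    results is a list of (url, succeeded) pairs.
    """
    results, skipped = [], []
    for url in urls:
        budget_ms = remaining_ms() - reserve_ms
        if budget_ms <= 0:
            # Not enough time left: report the URL instead of losing it.
            skipped.append(url)
            continue
        # Never wait on one endpoint longer than the remaining budget.
        timeout_s = min(per_request_ms, budget_ms) / 1000.0
        try:
            send(url, timeout_s)
            results.append((url, True))
        except Exception:
            results.append((url, False))
    return results, skipped
```

The skipped URLs could then be handed to a follow-up invocation or to a table like the one suggested above, rather than silently dropped.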

piyushco 10 years ago

Excellent post. I wanted to generate thumbnail images for photos uploaded to an S3 bucket using AWS Lambda, and was able to implement it successfully.

But I found one issue that many here might not be aware of: the S3 bucket and the Lambda function must be in the same AWS region.

Unfortunately, my S3 bucket is in southeast-ap, and AWS Lambda is not available in that region, so I couldn't go live today. I will have to copy the bucket to another region to use it.

Hope this helps. Thanks.

  • gravypod 10 years ago

    Well, that is strange, but it does make sense. They would be spending a lot of money on bandwidth if that was not the case.

estefan 10 years ago

One lambda application I really want is a pingdom-style service. Use lambda to ping a web site and send an email if it's offline. Any takers to build this :-)?
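For what it's worth, the core of such a check is small. The sketch below assumes a scheduled function that probes one URL and hands a message to a `notify` callable. `is_up`, `check_site`, and `notify` are hypothetical names, and actually sending the email (through SES or SNS, say) is left to whatever `notify` wraps.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if url answers with a non-error HTTP status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections,
        # and socket timeouts.
        return False


def check_site(url, notify, probe=is_up):
    """Probe url once and call notify(message) if it looks offline."""
    up = probe(url)
    if not up:
        notify("%s appears to be offline" % url)
    return up
```

A single failed probe is a noisy signal, so a real service would probably want a retry or two before alerting.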

aantix 10 years ago

Still waiting on Ruby support...

  • semiquaver 10 years ago

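    It requires a little boilerplate, but JRuby works well and is reasonably performant if you precompile the Ruby:

      require 'java'

      # Java types referenced in the handler signature below.
      java_import 'com.amazonaws.services.lambda.runtime.Context'
      java_import 'java.util.Map'

      class Main
        # Tell jrubyc which Java method signature to generate for the handler.
        java_signature 'static String handler(Map<String,String> args, Context context)'
        def self.handler(args, context)
          # puts returns nil, so this handler returns null to the caller.
          puts "hello world"
        end
      end

      # Generate the Java class source with:
      # jrubyc --java main.rb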
