Using AWS Lambda for cheap S3 content processing
docs.scanii.com
This is an excellent write-up and a great use case for what S3+Lambda can do. As AWS offerings go, I believe Lambda has tremendous upside and will grow significantly over the next few years.
I've been throwing a lot of my own personal resources into building some things on top of both S3 and Lambda and have found a few tools that help quite a bit. For one, the lambduh project from Russ Matney has been a great resource for abstracting out some of the more common S3->Lambda workflows: https://github.com/lambduh/lambduh. On a different note, there is T.J. Holowaychuk's Apex project: https://medium.com/@tjholowaychuk/introducing-apex-800824ffa...
Hi there, author here, happy to answer any questions.
Excellent write-up, and kudos for using IAM and roles for this. We are working on implementing the very same system, so we might just reuse your code. Thanks for sharing!
Thank you!
Off topic, but I'm hoping there are some Lambda-heads in the room. I want to write a system that basically rebroadcasts a message sent over SNS, to different HTTP endpoints. (I don't have control over these endpoints so can't use SNS itself as I can't confirm subscriptions).
How many HTTP requests can Lambda do concurrently? Is my best approach to fire all these requests inside one worker, or should/could I have it spin up subsequent Lambdas whose only function is to run the HTTP request and then exit? I'm imagining that would be a lot more expensive.
There's a per-invocation cost that, at ridiculous volumes, becomes non-trivial ($0.20 per million). We get very high throughput (my back of the envelope math says about 3 writes per millisecond per Lambda, that's to a Cassandra cluster) for I/O intensive operations. You can have up to 100 simultaneous invocations, and you can ask for more (we did). Without knowing more about your situation, I would suggest that you use a library that lets you fire off a bunch of async requests and block on them all. Play around with RAM/CPU (one knob for both)--a higher setting may result in quicker processing at a lower cost (!). If you're highly cost sensitive, consider batching your SNS messages--remember that it supports 64K payloads. (We use SNS to do batchloading, actually--it's a cheap, managed alternative to Kinesis.)
Should you choose the fanout route, Tim Wagner from AWS told me that it's pretty fast: https://twitter.com/timallenwagner/status/658025794900365312
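For the single-worker option suggested above (fire off a batch of async requests and block on them all), here is a minimal Ruby sketch using plain threads and the standard library's Net::HTTP. The names `post_message` and `broadcast`, the 5-second default timeout, and the JSON content type are all assumptions made for illustration, not anything from the article or from AWS.

```ruby
require 'net/http'
require 'uri'

# Send one POST and return a small result hash. Errors are captured rather than
# raised, so one bad endpoint cannot abort the whole batch.
# (Hypothetical helper; the timeout and content type are assumptions.)
def post_message(url, body, timeout: 5)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.post(uri.request_uri, body, 'Content-Type' => 'application/json')
  { url: url, ok: response.is_a?(Net::HTTPSuccess), status: response.code.to_i }
rescue StandardError => e
  { url: url, ok: false, error: e.class.name }
end

# Start every request on its own thread, then block until all have finished.
# Results come back in the same order as `urls`.
# `sender` is injectable so the fan-out logic can be tried without a network.
def broadcast(urls, body, sender: method(:post_message))
  threads = urls.map { |url| Thread.new { sender.call(url, body) } }
  threads.map(&:value)
end
```

Calling `broadcast(endpoints, payload)` inside the handler keeps everything in one invocation. For comparison, at the $0.20 per million invocations quoted above, one Lambda per request for a million messages going to ten endpoints each would be ten million invocations, or about $2 in per-invocation charges before any duration charges.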
My guess: all you can do in the 300 sec execution limit
The tricky part there is that it wouldn't work if you just sat in a tight loop dispatching HTTP requests: any one of them timing out would likely trigger the deadline and keep all subsequent HTTP requests from happening.
So, alternatively, you could do something with DynamoDB event sources, where you have some sort of pub/sub table that your lambda functions listen on (basically a list of all the http requests that have to happen) - thus keeping a minimal 1 lambda dispatch per http request. The catch is you would need another system to manage that table (technically that system can be lambda itself).
Two important things: 1) I haven't used the DynamoDB/Lambda integration myself, so be skeptical of my suggestion, and 2) what I can say from our usage of the S3/Lambda integration is that concurrency is not a problem, with thousands of Lambda dispatches per second being surprisingly quick to spin up.
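On the timeout concern raised above for the single-loop approach, one way to soften it is to give each request its own timeout, capped by whatever is left of the overall budget, and to record what could not be sent. The sketch below is only an illustration under assumptions: `dispatch_with_deadline`, `send_one`, and the injectable `clock` are hypothetical names, and in a real function the budget would come from the runtime (the Java `Context` object, for example, exposes `getRemainingTimeInMillis()`).

```ruby
# Dispatch requests one at a time while respecting an overall time budget.
# `send_one` is a hypothetical callable taking (url, timeout_seconds).
# `clock` returns the current time in seconds and is injectable for testing.
def dispatch_with_deadline(urls, budget_seconds:, per_request_timeout:, send_one:,
                           clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  deadline = clock.call + budget_seconds
  sent = []
  skipped = []
  urls.each do |url|
    remaining = deadline - clock.call
    if remaining <= 0
      skipped << url # out of budget: record it instead of silently dropping it
      next
    end
    # Never let a single request wait longer than the time that is left.
    send_one.call(url, [per_request_timeout, remaining].min)
    sent << url
  end
  { sent: sent, skipped: skipped }
end
```

The `skipped` list is what would need to be retried or handed off to another invocation, so a slow endpoint delays the rest of the batch but no longer makes the remaining requests vanish without a trace.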
Excellent post. I wanted to generate thumbnails for photos uploaded to an S3 bucket using AWS Lambda, and was able to implement it successfully.
However, I found one issue that many here might not be aware of: the S3 bucket and the Lambda function need to be in the same AWS region.
Unfortunately, my S3 bucket is in ap-southeast, and AWS Lambda is not available in that region, so I couldn't go live today. I will have to copy the bucket to another region to use it.
Hope this helps. Thanks.
Well, that is strange, but it does make sense. They would be spending a lot of money on bandwidth if that were not the case.
One lambda application I really want is a pingdom-style service. Use lambda to ping a web site and send an email if it's offline. Any takers to build this :-)?
Actually, Amazon already did -- one of the sample Lambda functions you can use is just that. It runs on a scheduled timer, and if I remember correctly, will alert using AWS Simple Notification Service, which can be configured to send alerts to your devices or emails.
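For anyone who wants to see the shape of such a check, here is a rough Ruby sketch. It is not the AWS sample mentioned above: `fetch_status`, `check_site`, the 5-second timeout, and the choice to treat 2xx/3xx as "up" are all assumptions, and `notify` is a placeholder for whatever alerting is used (SNS, email, and so on).

```ruby
require 'net/http'
require 'uri'

# Default fetcher: returns the HTTP status code as an Integer and raises on
# network errors or timeouts. (Hypothetical helper for this sketch.)
def fetch_status(url, timeout)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri).code.to_i
  end
end

# Check one site. Calls `notify` with a message only when the site looks down,
# and returns true when it looks up. `fetch` is injectable for testing.
def check_site(url, notify:, timeout: 5, fetch: method(:fetch_status))
  problem =
    begin
      status = fetch.call(url, timeout)
      status.between?(200, 399) ? nil : "returned HTTP #{status}"
    rescue StandardError => e
      "unreachable (#{e.class})"
    end
  notify.call("#{url} #{problem}") if problem
  problem.nil?
end
```

Run on a schedule, something like `check_site(url, notify: ->(msg) { ... })` is all the handler would need to do; the interesting part is what goes inside `notify`.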
Unfortunately, unless it's changed recently, SNS can only send SMSes to US numbers :/
I'm building some nice example projects for claudiajs now, and this sounds like a fun thing to try. Check out https://GitHub.com/claudiajs in a few days; it will probably be there.
Still waiting on Ruby support...
It requires a little boilerplate, but JRuby works well and is reasonably performant if you precompile the Ruby:
    require 'java'
    java_import 'com.amazonaws.services.lambda.runtime.Context'
    java_import 'java.util.Map'

    class Main
      java_signature 'static String handler(Map<String,String> args, Context context)'
      def self.handler(args, context)
        puts "hello world"
      end
    end

    # jrubyc --java main.rb