In The Works – Amazon Aurora Serverless

aws.amazon.com

87 points by polmolea 8 years ago · 33 comments

superasn 8 years ago

Last week I created a small framework called lambdaphp[1].

My aim was to host a WordPress or Laravel site on AWS Lambda without paying any monthly hosting charges. I got everything running (sessions, filesystem, requests, etc.) except that, of course, I still had to use RDS, and I think this takes care of that too. So now I can expect to run a full site that is billed only for the resources it consumes.

Of course, my project was just for my own amusement, but I think this is how it's going to be done soon, or at least where AWS is heading. Seems pretty nifty!

[1] https://github.com/san-kumar/lambdaphp

  • meritt 8 years ago

    AWS will likely add PHP support, yes, but they absolutely will not do what this project does: run a NodeJS HTTP server that launches a PHP binary and runs local scripts.

    • superasn 8 years ago

      That's just one way to make it happen until we get proper support. I wrote to AWS support and they said they would consider PHP support for AWS Lambda, as many people have requested it (though they did not give me an ETA).

      The funny thing is that the response time is quite good despite running it through a NodeJS server that launches a PHP binary (340ms, faster than 98% of sites according to Pingdom[1]).

      [1] https://tools.pingdom.com/#!/ex9izm/https://www.lambdaphp.ho...

      • meritt 8 years ago

        I understand your options are limited until AWS adds first-class PHP support, but there are still plenty of superior ways you could arrange this. You're running the PHP CLI rather than using a persistent process daemon and speaking to it over one of the other SAPIs (e.g. FastCGI). You're also completely defeating the inherent async properties of NodeJS by launching a synchronous PHP process, and you're missing the benefit of "pre-warming" servers: only the NodeJS side gets to pre-warm, while a PHP process must launch from cold for every single request.

        340ms is not a good response time at all. You need to elevate your expectations.
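
        For illustration, a minimal sketch of a non-blocking per-request launch from Node (the helper name and script path are assumptions, not taken from lambdaphp); it avoids blocking the event loop, though it still pays the cold PHP start that the FastCGI approach discussed below avoids:

          import { spawn } from "child_process";

          // Hypothetical helper: run a PHP script without blocking the Node
          // event loop, resolving with whatever the script writes to stdout.
          function runPhp(script: string): Promise<string> {
            return new Promise((resolve, reject) => {
              const child = spawn("php", [script]);
              let output = "";
              child.stdout.on("data", (chunk) => (output += chunk));
              child.on("error", reject);
              child.on("close", (code) =>
                code === 0 ? resolve(output) : reject(new Error(`php exited with ${code}`))
              );
            });
          }

          // Usage in a handler: const body = await runPhp("index.php");
          // The PHP process still starts cold on every request, which is the
          // deeper problem -- see the FastCGI discussion below.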

        • superasn 8 years ago

          That's quite insightful. I can see you really know what you're talking about, and any examples or snippets would be helpful. You're also welcome to collaborate with me on the project for fun if you want (it was a great challenge for me). I'll probably look into it again next weekend to implement your suggestions.

          How low do you reckon we can get the response time with these optimizations? Right now the 340ms covers nodejs + php-cli + my wrapper script (which inits S3, etc.) + the actual content script (another PHP file), plus a request for the CSS framework from a CDN.

        • lozenge 8 years ago

          You can't launch a persistent process on AWS Lambda.

          • meritt 8 years ago

            You most definitely can. Lambda freezes the state of the container and thaws it for the next request (assuming the next request comes relatively soon; otherwise it deletes the frozen instance entirely -- this is the entire premise behind keeping Lambda functions "warm"), including any background processes that might be running.

            It's not persistent in the sense of staying alive between requests, but for the use case here it's exactly what is needed. He would save significant response time by having php-fpm already running, the code parsed, and opcodes cached, and by issuing a new FastCGI request to php-fpm instead of relaunching the PHP CLI every single time.
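
            A rough sketch of that approach, assuming php-cgi is bundled with the function and a FastCGI client library handles the per-request call (both are assumptions; the client call is elided here):

              import { spawn, ChildProcess } from "child_process";

              // Started once per container, outside the handler, so the
              // frozen/thawed container keeps the same warm PHP process
              // (and its opcode cache) around between requests.
              let php: ChildProcess | undefined;

              function ensurePhp(): ChildProcess {
                if (!php || php.exitCode !== null) {
                  // php-cgi -b runs PHP as a FastCGI server on a local port.
                  php = spawn("php-cgi", ["-b", "127.0.0.1:9000"]);
                }
                return php;
              }

              export const handler = async (event: unknown) => {
                ensurePhp();
                // Per request: send a FastCGI request for the target script
                // to 127.0.0.1:9000 (via a FastCGI client library, not shown)
                // instead of relaunching the PHP CLI from cold.
                return { statusCode: 200, body: "..." };
              };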

sologoub 8 years ago

Really wish "serverless" also meant that it could work with AWS Lambda efficiently. As is, each function invocation would try to open a connection, making the overall overhead extremely high and stressing the DB.

  • k__ 8 years ago

    Can't you open the connection outside of the function?

    So as long as the function is hot, it won't reconnect.

    • mjb 8 years ago

      That's right. Make connections to databases (and most other things) the first time your Lambda handler runs, and stash them in a static/global variable for re-use on future runs. That allows you to amortize the cost of forming the connection over many executions of your function, which improves latency, reduces cost, and reduces load on the backend.
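
      A minimal sketch of that pattern for a Node handler (the mysql2 client and the environment variable names are assumptions, not from the post):

        import { createConnection, Connection } from "mysql2/promise";

        // Module scope survives across warm invocations of the same container,
        // so the connection is only created on a cold start.
        let conn: Connection | undefined;

        export const handler = async (event: unknown) => {
          if (!conn) {
            conn = await createConnection({
              host: process.env.DB_HOST,
              user: process.env.DB_USER,
              password: process.env.DB_PASSWORD,
              database: process.env.DB_NAME,
            });
          }
          // Reused connection: no per-invocation handshake or auth round trips.
          const [rows] = await conn.query("SELECT 1 AS ok");
          return { statusCode: 200, body: JSON.stringify(rows) };
        };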

  • inopinatus 8 years ago

    Are you sure? That is, do you have specific knowledge of the implementation? Because:

    > The endpoint is a simple proxy that routes your queries to a rapidly scaled fleet of database resources.

    That doesn’t seem to preclude a multiplexing proxy a la PgBouncer.

  • nleach 8 years ago

    Is that true? The default limit on concurrent function executions is 1000. The existing Aurora (MySQL) should be able to handle cycling through those connections without issue.

    • sologoub 8 years ago

      It’s not that the DB servers can’t handle it, it’s that establishing a connection is slower than re-using an existing one.

      You also forgo certain optimizations within the DB designed to make fetching things for the given connection/scope faster, such as temp tables.

brootstrap 8 years ago

3 months later... AWS announces serverless, databaseless database system.

dhd415 8 years ago

Given the existing architecture of Aurora, the independent scaling of CPU and storage seems pretty straightforward. What is much harder to scale up and (especially) down is a warmed-up buffer pool which is critical for consistent query performance. I wonder if that is what they mean in the article when they say that scaling happens on a "pool of 'warm' instances". If so, I'd be very interested in more details on how that works.

bpicolo 8 years ago

I wonder how the cost differs, given they have to keep hot standbys. Perhaps a lot of behind-the-scenes prediction to make it cost effective? It takes a while to create a standby from scratch for large DBs.

Either way, sweet tech. Seems like a fun thing to build.

  • zwily 8 years ago

    Aurora doesn’t work that way - it has one storage layer shared by all the nodes. Adding a new replica is very fast no matter your data set size.

mrep 8 years ago

Is this for the master or the read replicas?

If it is for the master, that would be amazing, and I would wonder whether you can scale up past the single-instance size max (currently r4.16xl).

drej 8 years ago

Cool... but... how?

  • strong_silent_t 8 years ago

    So, if I'm understanding right, it uses database server instances just as the current Aurora does. You just aren't responsible for managing them; the service makes those decisions within the spectrum of Aurora's capabilities and can change them in very small time increments.

    EDIT: Pretty impressive, though. Looking at James Hamilton's criteria for "Automatic Management and Provisioning" here: https://www.usenix.org/legacy/event/lisa07/tech/full_papers/... , I think this addresses everything database-related, as long as you stay under Aurora's maximum capacity.

  • odammit 8 years ago

    Aurora already replicates data this way: each "10GB chunk of your database volume is replicated six ways, across three Availability Zones." From the diagram, it looks like they are taking advantage of that fact and spinning down the instances that serve off those "disks".

  • jeffbarr 8 years ago

    Read the post!

    • drej 8 years ago

      I have, I'm just excited and baffled at the same time :-)
