Improved VPC Networking for AWS Lambda
(aws.amazon.com)

This is huge for Lambda. It allows devs to create “serverless” apps [1], with relational databases, without 10+ second cold-start times. In the article, they measure it as 988ms.
I have tried building an API using API Gateway <-> Lambda, but had to choose between using DynamoDB to store data (no-SQL, so challenging to query) or suffering unacceptably long response times whenever a request happens to cause a cold-start. Theoretically, this problem is now going away!
*It allows devs to create those apps _within a VPC_.
You could always have fast startup with Lambda + database outside the VPC.
Which is how most breach announcements start
"A database server was found with an open port exposed to the internet and no or poor authentication, all records were exposed."
This also should mean that Lambdas can get stable public IPs through a VPC for firewalls as well.
*edit for must to most.
But a VPC is not an especially efficient additional "defense-in-depth" layer against this kind of "fucked up both firewall and password" configuration mistake. The obvious layers are passwords, network-level firewalling, and host-level firewalling, of course; after that you can add monitoring / port scanning for all your "must be firewalled" services, and you can mandate better-than-passwords authentication methods [1]. Etc. The latter is better because it is more general and doesn't add costly complexity to your networking topology (by way of NAT and/or ambiguous RFC 1918 addressing).
[1] For example https://www.postgresql.org/docs/current/auth-cert.html or https://aws.amazon.com/premiumsupport/knowledge-center/users...
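For the cert-based option in [1], a minimal sketch of what client-certificate auth looks like from the application side. The hostname, database name, and certificate paths are placeholders, and psycopg2 is just one standard libpq-based driver this would work with:

```python
# Sketch: connecting to Postgres over TLS with client-certificate auth
# instead of a password. All names and paths below are placeholders.

def cert_conn_params(host: str, dbname: str, user: str) -> dict:
    """Build libpq-style connection parameters enforcing mutual TLS.

    sslmode=verify-full checks the server certificate *and* that its
    name matches `host`; the client cert/key pair replaces password
    auth (requires a `cert` line in pg_hba.conf on the server side).
    """
    return {
        "host": host,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",              # verify server cert + hostname
        "sslrootcert": "/etc/ssl/rds-ca.pem",  # CA bundle (placeholder path)
        "sslcert": "/etc/ssl/client.crt",      # client certificate
        "sslkey": "/etc/ssl/client.key",       # client private key
    }

# Usage (assuming psycopg2 is installed and the certs exist):
#   import psycopg2
#   conn = psycopg2.connect(**cert_conn_params("db.example.com", "app", "svc"))
```

The same parameter names are understood by any libpq client, so nothing proprietary is involved.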
You mention defense in depth, but then immediately decide that an extra layer of defense is unnecessary.
Are you proposing that by acknowledging defense-in-depth, consistency dictates that one should pile up as many layers per attack vector as possible? Maybe, if you have infinite resources and don't need to make compromises on where you spend effort and resources in your risk management plan. But that's rarely the case in the real world.
You raise a fair point: this was possible, although it seems safe to say it would be a compromise on security.
I think it’s best not to expose the DB to outside connections in general, although it is still possible [1] when using RDS instances.
I think this is different for things like DynamoDB because, instead of a standard SQL-like db “connection”, they use AWS role-based auth for each request.
Of course, one could always configure some type of proxy service between the lambda and the DB... but that seems antithetical to going “serverless” in the first place.
[1] https://stackoverflow.com/questions/45227397/publicly-access...
Edit: I thought it was not possible to expose an RDS instance outside of a VPC, but I was wrong (you can place it in a public subnet, linked in [1]).
Also, wasn't Aurora Serverless created because of that problem?
I think Aurora Serverless has even worse [1] cold-start times (for the DB itself), and it was intended as more of a price-optimization than a performance boost.
[1] https://forums.aws.amazon.com/thread.jspa?threadID=288043
Aurora Serverless also handles connections. The problem of having a burst of 1000 concurrent invocations accessing your database still exists even with VPC access.
That limit can be raised, apparently. I've seen mention of limits up to 30K concurrent invocations.
is that a good tutorial? looks really good on the surface!
If you put an event bus in the middle (Kinesis), your api-lambda functions don't need direct access to your RDS. Subscribe lambda functions to your Kinesis stream, and let them handle the link to your RDS. This way you won't notice the cold starts.
This comment doesn't seem to make sense, could you elaborate a bit? How would you replace the database working as a persistence layer to an API application by polling an event stream?
It doesn't replace the DB. It just uses Kinesis to be the messaging provider from the lambda to the DB and back. Not sure that's a great idea TBH, but who knows?!
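A rough sketch of the Kinesis-in-the-middle idea: the API-facing Lambda only enqueues write requests onto the stream, and a separate consumer Lambda (the one holding the VPC/RDS attachment) drains them. The stream name and payload shape below are illustrative assumptions, not anything from the thread:

```python
import json

def encode_record(payload: dict) -> bytes:
    """Serialize a write request for the stream; the consumer decodes it."""
    return json.dumps(payload).encode("utf-8")

def enqueue_write(kinesis_client, stream: str, payload: dict) -> None:
    """API Lambda side: push the DB write onto the stream and return fast."""
    kinesis_client.put_record(
        StreamName=stream,
        Data=encode_record(payload),
        PartitionKey=str(payload.get("id", "default")),
    )

def consumer_handler(event, context):
    """Consumer Lambda side (VPC-attached): decode each record, write to RDS."""
    import base64
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ... INSERT/UPDATE against RDS here ...
```

This only helps for writes that can be asynchronous; synchronous reads still need direct DB access, which is the obvious limitation of the pattern.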
And then when you need to read the database?
AWS announced this enhancement at 2018 re:Invent. It was slated for "sometime in 2019". I was excited, and I'm impressed that they released the feature well ahead of the end of the year (and before the next conference, which would obviously raise a few questions)
They did something similar with drift detection and CloudFormation. They announced it at re:Invent 2017 and released it one week before re:Invent 2018.
This has been a /major/ sore point for Lambda use, amazing they fixed it, and always great to see they've documented the intense engineering requirements involved to make it happen.
AWS is a beautiful mix of business and technology; it's very rare to see such a large engineering-driven organization balance that with customer friendliness. I'm an unashamed fanboy.
Major is a bit harsh.
As far as I know this was only an issue for legacy architectures.
No. Using an RDBMS instead of DynamoDB is not a “legacy” architecture. You also shouldn’t expose your database publicly.
RDBMS is not legacy, but perimeter security certainly is.
I’m one of the harshest critics of “lift and shifters” - old school net ops people who get certifications by watching an ACloudGuru video, duplicate their on-prem infrastructure and processes to the cloud, don’t go all in on its advantages, and end up costing their clients more - but nowhere is it considered “legacy” to not use perimeter security.
Honest question: what, in your opinion, is the state-of-the-art approach? Something like BeyondCorp?
I think zero-trust goes in a good direction.
https://www.securityroundtable.org/zero-trust-approach-can-m...
There is an entire ecosystem of tooling that will shit itself and wake up half the company if you assign a public IP address in the wrong VPC
Stuff like this is a pain in the ass; it was a major problem.
This solves one part of the cold start problem. Starting the container and loading the image on to it is still going to cause some latency.
Solves might be strong, but it removes a big portion of the cold start latency that was difficult to optimize for and out of the control of developers. Creating minimal images isn't difficult for a number of environments (e.g. webpacking your node.js lambdas) and barring necessarily large images (think pandas on Lambda) this puts a lot of control for the cold start p99 back in the hands of customers.
Overall, definitely a big win!
I found it a bit strange that they sold Lambda as THE new way to do API development.
You can connect API-Gateway with other services via Velocity templates, which don't have cold starts.
AppSync also doesn't suffer from cold starts.
Both are also serverless services.
Lambda is good if the other solutions are missing something, so you can drop it in quickly, but I wouldn't use it as the go-to service for that...
API-Gateway can return HTML?
Sure.
You can write Velocity templates for integration responses.
Normally they are JSON because that's what all the AWS services return and API-Gateway just passes them along.
But you could write something like this:
#set($pets = $input.path('$'))
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Pets</title>
  </head>
  <body>
    <table>
      <tr>
        <th>ID</th>
        <th>Type</th>
        <th>Price</th>
      </tr>
      #foreach($pet in $pets)
      <tr>
        <td>$pet.id</td>
        <td>$pet.type</td>
        <td>$pet.price</td>
      </tr>
      #end
    </table>
  </body>
</html>
Which can be mitigated by invoking your own Lambda functions once every minute or 5 minutes. Usually does not blow the budget.
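A minimal sketch of that warming trick: a scheduled CloudWatch Events / EventBridge rule invokes the function with a marker payload, and the handler short-circuits on it. The {"warmer": true} marker is an arbitrary convention I'm assuming here, not an AWS feature:

```python
import json

def is_warmer_event(event) -> bool:
    """True if this invocation came from the scheduled keep-warm rule."""
    return isinstance(event, dict) and event.get("warmer") is True

def handler(event, context=None):
    if is_warmer_event(event):
        # Do nothing: the invocation only keeps this execution
        # environment alive so the next real request skips a cold start.
        return {"statusCode": 200, "body": "warmed"}
    # ... normal request handling (DB connections, business logic) ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

The scheduled rule itself would target this function with the constant input {"warmer": true}.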
Warming functions in the previous VPC architecture was always a questionable practice. You had no guarantee that your environments would be warm across all subnets or which subnets would handle incoming requests. Beyond that, what happens to requests which you receive when the function is being warmed? You still incur cold starts.
There has never been a guarantee of environment reuse. Any architecture which isn't capable of incurring cold starts is not a good fit for serverless.
Which is a horrible idea....
How many lambdas do you keep warm? 5, 10, 20? Every new connection is a new lambda instance. You're still just delaying the inevitable.
Just use Fargate if you want to stay serverless and don't want the cold start times -- well at least before today.
Sorry, but it does not matter how many, since everything is automated and you create the warm-up scheduler when you create the function. As others pointed out in this thread, there are other challenges with this approach.
>> Just use Fargate
We were trying to and we decided that is not our cup of tea. Lambdas are.
Yes, it does matter. In your scheduler, how do you ensure your ping (the way you start an instance) is actually creating another instance to keep warm rather than reusing an existing one?
If you want to always keep 20 instances warm, you have to keep the first ping active until the 20th one is done.
In other words, if you want to keep 20 instances warm and you send 20 requests in 5 seconds, with each request taking only 0.25 seconds, you will only have 5 warm lambdas. The 6th real concurrent connection will still hit a cold start. Also, while you are pinging an instance to keep it warm, that instance can't serve a real user.
Also, API Gateway has an algorithm to decide whether to launch a new lambda or queue a request, hoping that an already-warm lambda will free up.
Wow! That's great. Cold starts are no longer a show stopper! Rust powered APIs running on AWS .. It sounds really exciting
This is great news, but I'm bummed they didn't bundle the NAT gateway with this service. In a typical function that calls out to get data from a service and reads/writes from a DB in a VPC, that requires the somewhat painful configuration of a NAT gateway and dedicated subnets, as well as a $36/month bill for the NAT gateway service.
There are some workarounds using multiple lambdas, but they have their own gotchas.
Still, hooray, this is good news. The Data API is great for Serverless Aurora, but I can't use that with BI tools.
You can run your own gateway instance(s) for a lot cheaper than the nat gateway service. There are definitely some tradeoffs, but if $36/mo is an issue, they can be worthwhile: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Ins...
This is not meant to be a criticism of AWS, I’m an AWS true believer, but the main purpose of going to AWS is to make the “undifferentiated heavy lifting” someone else’s problem not to save money.
Going to AWS to save money on resources is about like going to the Apple Store to buy a cheap laptop.
I’m not against AWS or $36/mo. It just is kinda a drag when the promise of serverless is pay per user and scaling to zero. You could get a nice EC2 t3.medium and do a lot more RPS for the cost of that NAT and Lambda invocations.
If you don’t care about cold starts, there is always Aurora Serverless with the Data API. I don’t believe it requires either a NAT or for the lambda to be attached to your VPC.
This is a great improvement for Lambda users, much reduced cold start times!
Iconoclast view ahead (change my mind please):
AWS does tons of stuff around VPCs....I feel like they really want me to use them (or their customers really want to use them), but I just don't see why.
I just run RDS on the internet. I don't have to muck with the complexity or cost of NATs or peering or Lambda slow start or any other weird networking issues.
I know it's "public", but that seems irrelevant in the era of cloud services. This isn't any different than, say, how Firebase or a million other services run. Should I be concerned that my Firebase apps are insecure because someone isn't overlaying a 10.* network on them?
EDIT: I should clarify that I understand the legitimacy of security groups, especially for technologies that weren't meant to operate outside a firewall. But that's mostly a different subject; AWS had security groups years before VPCs and subnets and NATs.
So making the actual listening port for a database server "public" is generally a bad idea, as that is another attack surface of code that honestly is hardly ever made public. But if by "public" you mean you are using security groups (which are super trivial to use and easy to understand) to define which other AWS devices can access the port, then yeah: I have never seen any reason why this entire feature should exist. Having to think about IP address ranges as if they somehow matter is one of the things I was escaping when I moved to the cloud in the first place, and somehow they wanted to reintroduce it? Why?!? It doesn't even work well (!!), and it introduces tons of latency into everything it touches (not just Lambda) :/.
VPCs are very helpful when you have a large number of developers working in an AWS environment. It'd be oh so easy for a developer to accidentally change a bit of terraform and expose your database to the internet without VPCs.
My theory is that a bunch of entrenched network engineers just really like subnets and IPv4 and NAT and don't realize how mostly unnecessary it is in an era of cloud infrastructure and IPv6.
My grandchildren are still going to be NAT'ing.
I think it is easy to have dev, qa and prod VPCs. Without VPC these separate infrastructure groups might be harder to split out. I usually reference security groups instead of subnets in security groups, avoiding referencing IP ranges (v4 or v6) entirely.
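A sketch of the "reference security groups, not IP ranges" approach mentioned above: the DB security group admits traffic from members of the app security group, so no CIDR bookkeeping is needed. The security group IDs and port are placeholders:

```python
# Build an ingress permission that allows `port` from another security
# group (by membership) rather than from an IP range.

def sg_to_sg_rule(source_sg_id: str, port: int) -> dict:
    """EC2-style IpPermission referencing a source security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_sg_id}],
    }

# Usage (assuming boto3 and appropriate credentials; IDs are fake):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-db000000",                       # DB security group
#       IpPermissions=[sg_to_sg_rule("sg-app00000", 5432)],
#   )
```

Any instance that is later added to the app security group gets DB access automatically, which is the point: membership, not addressing.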
You can also use multiple AWS accounts to separate those environments, which also eases user management (usually you have different people with access to each environment, with some overlap).
This also means that developers can have close to admin privileges, since the worst they can do, is to disrupt work of another developer, without affecting either QA or production.
Accounts are the correct level to separate these at. Keeps credentials easier to manage for devs, techs, etc, and limits blast radius if unauthorized accesses take place.
>> This also means that developers can have close to admin privileges
>> limits blast radius if unauthorized accesses take place.
I am not sure if admin privileges are the right way of limiting blast radius. Reasonable roles with least privileges are.
"In information security, computer science, and other fields, the principle of least privilege (PoLP), also known as the principle of minimal privilege or the principle of least authority, requires that in a particular abstraction layer of a computing environment, every module (such as a process, a user, or a program, depending on the subject) must be able to access only the information and resources that are necessary for its legitimate purpose."
Yeah, and you want those roles and accounts scoped appropriately. Someone with the 'Admin' role in a pre-prod account wouldn't necessarily get that same role in a production one. Someone with admin in a standard prod account might not get that same privilege in an account that you manage for a customer, or one with extra compliance requirements, etc.
Which is really an old-school way of doing it. Having multiple VPCs doesn't get around account limits, or resources with the same name (SNS topics, queues, stacks, etc.).
Just use separate accounts in an Organization.
You can give your developers almost complete unrestricted access to your dev account.
I work with both styles of AWS installations. Having an organization is a constant pain for people who access multiple accounts, even with something like Okta. One browser can access one account, and if you switch you have to go through the switching process or use multiple browsers. Quite often people would like to access cross-account resources, which is a whole different level of discomfort. This is why I still think, old school or not, that having a single account and multiple VPCs is a better option.
If you’re using AWS as a more expensive colo and mostly just using VMs, it’s fine. But once you start actually using AWS for anything else, it’s a pain. You have to worry about the different service limits, and they are all shared.
What’s the process for giving developers access to experiment? When you have different accounts, the development department can have basically unlimited access to the account and moving to production is basically getting a CloudFormation template approved and run by the Devops team.
I am not using AWS as a more expensive colo, actually as a much much cheaper colo instead. I have optimized several workloads on AWS to save several hundred thousand / year for companies. In my experience, giving unlimited access to anybody is a fairly bad idea, since if that credential gets stolen you are going to quickly figure out how much a bitcoin mine costs on AWS. I find it counterproductive to have a separate DevOps team and prefer the DevOps resources embedded into teams. I also do not like CloudFormation; it is a horribly verbose and complicated tool, and there are many alternatives that are better for the use cases of the customers I work with.
I agree that you have to worry about service limits which is exactly the right thing to have, instead of living a wasteful life pretending that we have infinite resources.
The issue is that you have “shared” service limits between Dev, QA, and production. With separate accounts you can have service limits per environment. If you are worried about limiting dev resources, it’s a lot easier to do it per account using an organization.
How do you propose automating resource creation if not using either CloudFormation or an equivalent tool.
For instance, Parameter Store and Lambda have account-level service limits. For Lambda it’s 2000 concurrent instances. Do you really want all of your environments sharing that limit?
Parameter Store has a strict unchangeable limit per account. Do you want your dev environment affecting production?
This is true, AWS is pretty anti-internet in all their architecture recommendations. IMO security is better done by firewalling and protocol level authentication (belt + suspenders) because it keeps your configuration clean and understandable, and complexity is the enemy of security.
This attitude serves two things in AWS's interest: 1) it keeps lock-in by encouraging customers to build AWS-internal networks, and 2) it doesn't scare away the lift-and-shift customers who want to transplant their 1990s-style "intranet" (or mental model, at least) onto AWS.
That also explains why they aren't very keen on IPv6, because it would encourage internetworking.
Just don't tell anyone that you can access the AWS console from the internet :)
It’s never been considered best practice to expose services needlessly to the Internet. I’m as far from an old school net ops guy as you can get, and I jump at any feasible new AWS technology as quickly as anyone, but it would be the height of stupidity for me to expose my Aurora cluster to the Internet. Good luck explaining that to your external auditors.
Of course. I'm just saying that firewalling and end-to-end security are better ways of doing that than routing and ambiguous (rfc1918) addressing. Never trust the network, lest you end up making yours soft and chewy on the inside.
How do you propose you firewall your database access and only allow certain IP addresses when you need access from lambda when the lambda is always run from a random location on AWS’s network?
A lambda is never run “from within your VPC”, it’s attached via an ENI (or at least it was).
Yeah, this kind of thing is part of what I meant when I criticised AWS encouraging VPC use instead of end-to-end security.
But off the top of my head, you could always use the firewall API from the lambda to open network access between it and the RDS when the lambda starts. (In addition to using certs or IAM security on your TLS connection to the RDS db)
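A hedged sketch of that off-the-top-of-my-head idea: at cold start, the function discovers its current public IP and punches a temporary /32 hole in the DB security group. checkip.amazonaws.com is a real AWS endpoint; the security group ID is a placeholder, and a real version would also need a cleanup/expiry story for stale rules:

```python
import urllib.request

def to_cidr(ip: str) -> str:
    """Turn a bare IP string (possibly with trailing newline) into a /32."""
    return f"{ip.strip()}/32"

def my_public_cidr() -> str:
    """Discover this environment's public IP via AWS's check-ip service."""
    ip = urllib.request.urlopen(
        "https://checkip.amazonaws.com", timeout=5
    ).read().decode("utf-8")
    return to_cidr(ip)

# Usage (assuming boto3 and an ec2:AuthorizeSecurityGroupIngress policy
# on the function's role; the group ID is fake):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-db000000",
#       IpProtocol="tcp", FromPort=5432, ToPort=5432,
#       CidrIp=my_public_cidr(),
#   )
```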
And then you are depending on a proprietary connection and authentication protocol instead of being able to use the standard MySQL/Postgres drivers.
Also, how do you handle the commercial hosted databases like Sql Server and Oracle?
Certs is a standard feature. IAM doesn't require nonstandard client or protocol features either (see https://aws.amazon.com/premiumsupport/knowledge-center/users...).
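To illustrate that IAM auth needs no nonstandard client: boto3 presigns a short-lived token that is simply used as the password over an ordinary MySQL/Postgres connection. The hostname and user below are placeholders, and the DB user must have been granted IAM auth (e.g. rds_iam in Postgres):

```python
# Sketch: IAM database authentication with a standard driver. The
# token is valid for 15 minutes and replaces the password; nothing
# about the wire protocol changes.

def iam_db_password(rds_client, host: str, user: str, port: int = 5432) -> str:
    """Return a presigned token usable as the DB password."""
    return rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user
    )

# Usage (assuming boto3 and psycopg2; names are fake):
#   import boto3, psycopg2
#   token = iam_db_password(boto3.client("rds"),
#                           "mydb.abc123.us-east-1.rds.amazonaws.com", "svc")
#   conn = psycopg2.connect(host="mydb.abc123.us-east-1.rds.amazonaws.com",
#                           user="svc", password=token, sslmode="require")
```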
I think client certs work fine with SQL Server and Oracle too, with standard clients.
But I'm sure you can come up with imagined scenarios where you end up cornered to use VPCs. I get it, these situations may exist. I'm just saying they suck, not that you'll never have to resort to them.
VPCs are very useful when running things like ElastiCache, though (Memcached and Redis), because AFAIK those don't have an authentication ecosystem, so making them public would be a terrible idea.
Memcached has had reliable authentication (SASL) for some time. Redis has authentication, but it's meant as a secondary protection.
But that's a good point.
I suppose all the services I use already have security models (usually more complex, multi-user ones, so agent X can read but not modify, etc.).
HOWEVER...this could be solved with security groups, but it seems that's not the model AWS has emphasized. Security groups are orthogonal to NAT and private networks; AWS had security groups before it had VPCs.
Just use security groups, which fully solved this problem without all of the overhead and complexity of VPC.
Defense in depth. Not having public routes to your database adds another layer of protection. You should have multiple, and they should be redundant.
Firebase was made specifically for the cloud, RDS is the cloud atop postgres, I don't know how secure RDS is (against the myriads of attacks) but it wouldn't be bad idea to use the built-in aws firewall to at least restrict access to trusted IPs ;)
Also, VPCs are really useful if you have many systems and services(yours or theirs) inside AWS.
> RDS is the cloud atop postgres
Or MySQL. Or SQL Server.
Exposing a database to the public internet is a terrible idea. Yes, it's behind an auth layer, but is a username and password really enough protection for literally all of your company's data? Heck, most people here have probably set up 2FA for their social media profiles, and for good reason.
> Exposing a database to the public internet is a terrible idea.
Isn't that a core idea of Firebase? Or Dynamo?
Not sure about Firebase, but DynamoDB can be behind your VPC. From what I know about Firebase, it's meant to be a backend for mobile apps, so I guess it makes sense for it to be public.
> AWS does tons of stuff around VPCs....I feel like they really want me to use them (or their customers really want to use them)
VPC is a very convenient fit for enterprise customers extending on-premises networks into the cloud, I think that's the market it's mainly focussed on.
> I know it's "public", but that seems irrelevant in the era of cloud services.
It's not irrelevant, but neither is it necessarily critical all the time; there doesn't need to be a one-size- (or even one-shape-) fits-all universal approach to network security, and AWS encompasses a lot of different customer setups, including enterprises for which it is a virtual extension of the on-premises internal network.
Say you have a bunch of EC2 instances with public IP addresses that run an application making calls to a 3rd-party service. Say that 3rd-party service allows access only from certain IP ranges: would you rather give them a single IP or hundreds of IPs to whitelist? What you say may be acceptable for a small infra, but not in a large setup.
You need to realize that the point of AWS is lock in. Once your service becomes a ball of various AWS pieces, it becomes almost impossible to leave once you start scaling.
So there is always a priority towards things that cause more lock in like VPC.
I don’t think AWS want you to use VPC at all. The Golden Path for serverless on AWS has always been “networkless”. If your use case fits into their stateless HTTP stack (API Gateway + Lambda + Dynamo + SQS...) then you’re gonna have a really easy time. The reason VPC is required is because not every use case is going to fit into that stack, and the fact that VPC functionality seems to be always just a little bit not good enough (in comparison) doesn’t make me think they’re pushing people towards it.
They definitely do if you're trying to use things based on EC2 instances. The newest types of instances have been VPC-only for years now.
But if you’re using EC2, then you’ve already wandered far off the serverless path.