AWS S3: Sometimes you should press the $100k button

cyclic.sh

408 points by korostelevm 4 years ago · 239 comments

lenkite 4 years ago

sigh. My team is facing all these issues. Drowning in data. Crazy S3 bill spikes. And not just S3 - Azure, GCP, Alibaba, etc since we are a multi-cloud product.

Earlier, we couldn't even figure out lifecycle policies to expire objects since naturally every PM had a different opinion on the data lifecycle. So we fell back on old-fashioned cleanup jobs, scheduled and triggered when a byzantine set of conditions was met. Sometimes they were never met - cue bill spike.

Thankfully, all the new data privacy & protection regulations are a life-saver. Now, we can blindly delete all associated data when a customer off-boards, a trial expires, or the data is no longer used for its original purpose. Just tell the intransigent PMs that we are strictly following govt regulations.

  • CydeWeys 4 years ago

    The data protection regulations really are so freeing, huh. It's amazing to be able to delete all this stuff without worrying about having to keep it forever.

    • jeff_vader 4 years ago

      In the case of my previous employer it led to an incredibly complicated encryption system. It took a couple of years to implement it in maybe 10% of the system. Deleting any old data was rejected.

      • hinkley 4 years ago

        I wonder sometimes if it would help if we collectively watched more anti-hoarding shows, in order to see how the consultants convince their customers they can get rid of stuff.

        • mro_name 4 years ago

          Humans spent their first 300k years as nomads – storing was just impossible and decrufting happened by itself when moving along.

          So maybe that's why we're not good at it yet.

          • hinkley 4 years ago

            Being a renter definitely kept me lighter for a long time.

            When you have to box things up over and over, you find that the physical and mental energy of keeping it all doesn’t add up. I wonder if migrating from cloud to cloud would simulate this experience.

            • Bayart 4 years ago

              Being a renter just taught me to batch my $STUFF I/O to minimize read-writes to disk and maximize available low-latency space. ie. fill my bags to the brim with shit I didn't plan using whenever I'd go to my parents'.

            • travisgriggs 4 years ago

              Two space garbage collector in action right there. Maybe all things software need a "move it or lose it" impetus. Features in apps, old data, you name it. If you've gotta keep transferring/translating it, it would definitely pare things down.

              • mro_name 4 years ago

                Maybe reuse is inferior to re-implementation. Moderately re-inventing wheels may be beneficial. Where might the threshold be?

          • fomine3 4 years ago

            Also hoarding digital data is far easier than real. I wish I could have grep on real space.

      • stingraycharles 4 years ago

        How is encryption compliant? I’ve implemented GDPR data infrastructures twice now, and as far as I’m aware, the only way to be compliant with encryption is when you throw the decryption key away.

        • aeyes 4 years ago

          Sometimes it might be a single field in a 1MB nested structure that you have to remove. So that field gets encrypted when the whole structure is stored, and when it has to be deleted you just throw away the key instead of rewriting the entire 1MB to remove a few kB.
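
          A minimal sketch of that idea, assuming a per-record key with Python's cryptography library (the store names are made up, not anyone's production scheme):

             from cryptography.fernet import Fernet

             key_store, blob_store = {}, {}   # stand-ins for a small key DB and the big object store
             record_id = "cust-42"

             # store the large structure once, with only the personal field encrypted
             key = Fernet.generate_key()      # one key per customer / record
             token = Fernet(key).encrypt(b"jane@example.com")
             key_store[record_id] = key
             blob_store[record_id] = {"plan": "pro", "email": token, "events": []}

             # later, "delete" the field by shredding the key; the big blob is never rewritten
             del key_store[record_id]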

          • dylan604 4 years ago

            If you're comparing gov't regulations to delete data to saving a few KB, then I think you're looking at this wrong.

            • viraptor 4 years ago

              It's a few kB per record. In practice, when schemes like that are applied, it means "in total we can remove this key and not rewrite 10M rows across 3 data stores, which itself would cost $$$ and make the database and incremental backups cry".

              • ByteJockey 4 years ago

                Bingo.

                We did a similar thing, except replacing the values with a UUID and storing the pair in a lookup table somewhere. Delete that row and none of the rest of the data can be tied back to a human being.

                Bonus, most people didn't need that data, and it was no longer given out to everyone who grabbed the entire dataset.
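
                A toy illustration of that pattern (table and column names invented, sqlite standing in for the real stores):

                   import sqlite3, uuid

                   db = sqlite3.connect(":memory:")
                   db.execute("CREATE TABLE pii_lookup (token TEXT PRIMARY KEY, email TEXT)")
                   db.execute("CREATE TABLE events (token TEXT, action TEXT)")

                   # write path: the events table only ever sees the token
                   token = str(uuid.uuid4())
                   db.execute("INSERT INTO pii_lookup VALUES (?, ?)", (token, "jane@example.com"))
                   db.execute("INSERT INTO events VALUES (?, ?)", (token, "login"))

                   # erasure request: drop the lookup row; events keep an unlinkable UUID
                   db.execute("DELETE FROM pii_lookup WHERE token = ?", (token,))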

        • spelunker 4 years ago

          As mentioned, encrypt something and throw away the key, often called "crypto shredding".

          • stingraycharles 4 years ago

            Ahh I see, and that way you can quickly “remove” a whole lot of data by just removing the key, which makes for cheap operations, and/or more flexible workflow (you can periodically compact the database and remove entries for which you have no key).

            Is my understanding correct?

            • dalyons 4 years ago

              yes, but it's also that a lot of the data these days ends up in pseudo-append-only stores (like s3/glacier, or many data warehouse products) where deletes/updates to old data are extremely expensive. Or just having to scan petabytes of cold-stored data looking for a particular user's records. Throwing away the key is instant and "free".

              • chrisjc 4 years ago

                Interesting... this raises soooo many questions.

                How are "crypto-shredding" actions propagated to the access patterns/layer?

                I assume that there is an encrypted partition/cluster/shard key (in addition to similarly encrypted rows/fields) that is invalidated during the shredding causing any predicate matching on these ids to evaluate to false.

                ---

                Now that I've typed this out, I realize that by electing to encrypt individual fields, any and all predicate matching will evaluate to false, and this has nothing to do with partitioning, sharding, or clustering...

                I guess it would also be pretty awesome since you could invalidate entire sets of data by "shredding" grouping ids that are being used as partition/cluster/shard keys.

                Now I realize that this implies that you shouldn't encrypt each and every field of related data the same way (grouping ids), otherwise you're potentially going to end up with unique keys/ids for common attributes across sets of data... potentially rendering clustering/sharding/partitioning useless (cardinality too great).

                While "defragging" or "rebalancing" this increasingly "sparse", old data would be expensive, surely there has to come a point where the storage costs start to exceed that of interaction costs for specific subsets of your prefixes. For instance, partitions that consist entirely of data that has had all of its respective encryption keys shredded.

                ---

                Illuminating comment that has set my mind into overdrive... Fascinating stuff!

          • jhgb 4 years ago

            That doesn't sound like something jeff_vader was talking about, since "deleting any old data was rejected" and this is definitely a way of deleting stuff.

    • theshrike79 4 years ago

      Yep, having everything disappear at 2 months max is a life-saver.

      That "absolutely essential thing" isn't essential any more when there is a possible GDPR/CCPA violation with a significant fine just around the corner.

      • koolba 4 years ago

        Just make sure you actually test your backups. Two months of unusable backups are just as useful as no backups.

        • marcosdumay 4 years ago

          Well, you should have done this before GDPR too, but reminding people to test backups is never too late and never too often.

    • whimsicalism 4 years ago

      now this is a spin i haven't heard before.

      • jabroni_salad 4 years ago

        As a sysadmin I really wish you had. SO MANY problems have come to my desk because some dude 3 years ago did not consider retention or rotation and now I have to figure out what to do with a 4TB .txt that is apparently important.

        • briffle 4 years ago

          "You never know when you might need this info to debug" The developer says as their cronjob creates a 250MB csv file, and a few MB of debug logs per day, for the past few years. "Disk is cheap" they say.

          As a sysadmin, I hate that too.

          • whimsicalism 4 years ago

            sometimes the data is just big...

            • colechristensen 4 years ago

              Often a considerable portion of those logs are useless, trace level misclassified as info, kept for years for no reason.

              You should keep a minimal set of logs necessary for audit, logs for errors which are actually errors, and logs for things which happen unexpectedly.

              What people do keep are logs for everything which happens, almost all of which is never a surprise.

              One needs to go through logs periodically and purge the logging code for every kind of message which doesn’t spark joy, I mean seem like it would ever be useful to know.

              • whimsicalism 4 years ago

                sure, in a world where machine learning doesn't exist i would agree with you. for low level logs of things like "memory low, spawning a new container" i would also agree with you. not for user actions though (which is the topic closest to what's under discussion given what sort of data these regulations cover)

        • dylan604 4 years ago

          Find out how important it is with a `mv 4TB.txt 4TB.old` type of thing. See how many people come screaming.

        • chrisjc 4 years ago

          Have you come up with a process, or an idea for a process to ensure this doesn't happen?

          For instance, when they create a provisioning request, are you able to set an extremely low threshold? When they say that won't do, the cost increases and they're able to see/understand and start to care about the actual lifecycles of what they're creating?

          Surely there is a way to project and monitor the cost of their resources over time, and deliver them an invoice on a regular basis? In other words something like a cost attribution model? That way when the bills start to increase dramatically over time, pinpointing the heavy hitters becomes trivial, and when they come knocking on your door to "do something about it" you can just say "go talk to Bob".

          I don't mean to sound like I'm trivializing the problem (honestly I can relate as I've gone through it myself), but I'd love to hear how anyone else has dealt with this issue effectively.

          • jabroni_salad 4 years ago

            It comes down to monitoring, alerting, and followup. In other words, "good ops", which is lacking almost everywhere. Unfortunately that is always a moving target, with added complexity being that we're an external service provider and have limited authority in the client environment. Also, the sorts of companies that outsource their ops will also be willing to change providers multiple times, so it's often like trying to live in a library that has seen many generations of librarians each with their own ideas for how things ought to be organized.

      • hvs 4 years ago

        You haven't heard it because it's not spin, it's from an engineer's point of view. That's not the view you hear in the news when it comes to these things.

        • whimsicalism 4 years ago

          HN seems like an odd place to assume that people only hear about things from the news and aren't engineers themselves.

          i am a dev that has to deal with these regulations in my day to day. it is a pain, it is not freeing in any sense, and it makes my models worse.

          granted, i think there are good reasons for it, but it does not make my life easier for sure.

        • alisonkisk 4 years ago

          Eh, Retention and Deletion are both pain for devs. Not having to care is the happy state.

  • StratusBen 4 years ago

    Disclosure: I'm Co-Founder and CEO of a cloud cost company named https://www.vantage.sh/ - I also used to be on the product management team at AWS and DigitalOcean.

    I'm not intentionally trying to shill but this is exactly why people choose to use Vantage. We give them a set of features for automating and understanding what they can do to manage and save on costs. We're also adding multi-cloud support (GCP is in early access, Azure is coming) to be a single pane of glass into cloud costs.

    If anyone needs help on this stuff, I really love it. We have a generous free tier and free trial. We also have a Slack community of ~400 people nerding out on cloud costs.

    • vdm 4 years ago

      https://www.google.com/search?q=site%3Ahttps%3A%2F%2Fdocs.va...

      I gave vantage.sh 5 minutes and did not see anything for S3 that is not already available from the built-in Cost Explorer, Storage Lens, Cost and Usage Reports, and taking 1 hour to study the docs https://docs.aws.amazon.com/AmazonS3/latest/userguide/Bucket...

      Most "cloud optimisation" products want to tell you which EC2 instance type to use, but can't actually give actionable advice for S3. Happy to be corrected on this.

      • simonw 4 years ago

        Saving people from learning how to use Cost Explorer, Storage Lens, Cost and Usage Reports - and then taking 1 hour to study documentation - sounds to me like a legitimate market opportunity.

        • alar44 4 years ago

          Not really. Sometimes you actually have to understand things. If you're so concerned about your billing, someone on your team should probably invest a freaking hour to understand it. If that can't happen, you are just setting yourself up for failure.

          • Jgrubb 4 years ago

            I've been learning the ins and outs of the 3 major providers' cloud billing setups for the last year, and I'm just getting started. This is not a 1-hour job, but you're right that someone on your team needs to understand it.

            • llbeansandrice 4 years ago

              At my last job we had a team spend an entire quarter just to help visualize and properly track all of our AWS expenditures. It's a huge job.

              • beberlei 4 years ago

                No wonder there is talk about a software developer shortage when it seems a good number of them work on this kind of nonsense. Talk about BS jobs.

              • sokoloff 4 years ago

                If you are willing and able to spend an entire team for a quarter, I think you'd be better off for quite a while to just license Cloudability (now maybe Apptio). Its face value isn't "cheap", but for most companies, I think it's going to be cheaper than a team for a quarter plus the on-going maintenance of that tech.

                (We're a long-time Cloudability customer; no other connection/conflict of interest here.)

        • banku_brougham 4 years ago

          it's a lot more than an hour, in my experience

      • StratusBen 4 years ago

        We are in process of updating the documentation because you're right that it needs more work. For the record, if you're doing everything on your own via Cost Explorer, Storage Lens and processing CUR you may be set. From what we hear, most folks do not want to deal with processing CUR (or even know what it is) and struggle with Cost Explorer.

        Vantage automates everything you just mentioned to allow you to make quicker decisions. Here's a screenshot of what we do on a S3 Bucket basis: https://s3.amazonaws.com/assets.vantage.sh/www/s3_example.pn...

        We'll profile storage classes and the number of objects, tell you the exact cost of turning on things like intelligent tiering and how much that will cost with specific potential savings. This is all done out of the box, automatically - and we profile List/Describe APIs to always discover newly created S3 Buckets.

        From speaking with hundreds of customers, I can also assure you that at a certain scale, billing does not take an hour...there are entire teams built around this at larger companies.

    • samlambert 4 years ago

      Vantage is a seriously awesome product. We love it at PlanetScale. Obviously being a cloud product things can get pricy and so Vantage is essential.

    • cookiesboxcar 4 years ago

      I love vantage. Thank you for making it.

    • imwillofficial 4 years ago

      I work on a team that computes bills; shoot me a Slack invite and perhaps I can offer insight.

  • candiddevmike 4 years ago

    Are you multi-cloud because your customers need you to be multi-cloud?

    • lenkite 4 years ago

      Yes, geographically diverse customers who prefer different cloud platforms.

  • raxxorrax 4 years ago

    I host stuff on AWS, but I am pretty sure that hosting on my own server, or a server an IT service provider maintains, is much cheaper.

    • tekknik 4 years ago

      Did you include maintenance, patching and machine upgrades? Cause likely it’s not.

liveoneggs 4 years ago

I have caused billing spikes like this before those little warnings were invented and it was always a dark day. They are really a life saver.

Lifecycle rules are also welcome. Writing them yourself was always a pain and tended to be expensive, with list operations eating up the API calls bill.

----

Once I supported an app that dumped small objects into S3 and begged the dev team to store the small objects in Oracle as BLOBs, to be concatenated into normal-sized S3 objects after a reasonable timeout in which no new small objects would reasonably be created. They refused (of course) and the bills for managing a bucket with millions and millions of tiny objects were just what you'd expect.

I then went for a compromise solution asking if we could stitch the small objects together after a period of time so they would be eligible for things like infrequent access or glacier but, alas, "dev time is expensive you know" so N figure s3 bills continue as far as I know.
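
For anyone in the same spot, a rough sketch of that compromise with plain boto3 (bucket and prefix names are hypothetical; a real job would need pagination, error handling and an age cutoff):

   import boto3

   s3 = boto3.client("s3")
   bucket, prefix = "my-events-bucket", "app/events/2021-06-01/"  # hypothetical names

   # gather one prefix worth of tiny objects (first 1000 only; real code would paginate)
   parts = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", [])

   if parts:
       # concatenate them into a single larger object...
       body = b"".join(s3.get_object(Bucket=bucket, Key=o["Key"])["Body"].read()
                       for o in parts)
       s3.put_object(Bucket=bucket, Key=prefix.rstrip("/") + ".concat", Body=body)

       # ...then delete the originals so IA/Glacier transitions actually pay off
       s3.delete_objects(Bucket=bucket,
                         Delete={"Objects": [{"Key": o["Key"]} for o in parts]})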

  • darkwater 4 years ago

    > I then went for a compromise solution asking if we could stitch the small objects together after a period of time so they would be eligible for things like infrequent access or glacier but, alas, "dev time is expensive you know" so N figure s3 bills continue as far as I know.

    This hits home so hard that it hurts. In my case it's not S3 but compute bills, but the core concept is the same.

    • WrtCdEvrydy 4 years ago

      Because the bill isn't a "dev problem". Once you move those bills to "devops", it becomes an infrastructure problem.

      • zrail 4 years ago

        A big chunk of responsibility for teams doing cloud devops is cost attribution. Cloud costs are incurred by services and those services are owned by teams. Those teams should be billed for their costs and encouraged (via spiffs or the perf process if necessary) to manage them. Devops' job is to build the tooling that allows that to happen.
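
        As a hedged example of what that tooling can look like, Cost Explorer's API can group spend by a cost-allocation tag (the `team` tag here is an assumption):

           import boto3

           ce = boto3.client("ce")  # Cost Explorer
           resp = ce.get_cost_and_usage(
               TimePeriod={"Start": "2021-06-01", "End": "2021-07-01"},
               Granularity="MONTHLY",
               Metrics=["UnblendedCost"],
               GroupBy=[{"Type": "TAG", "Key": "team"}],  # tag must be activated for cost allocation
           )
           for group in resp["ResultsByTime"][0]["Groups"]:
               print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])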

    • ac2022 4 years ago

      It is also because devops is shoved down devs' throats while claiming that it is easier and better. So now many developers don't want to spend time rewriting their code for something that is supposed to reduce their workload, not increase it.

      • darkwater 4 years ago

        Yeah, I totally agree, and I do this from an Ops perspective. 6-7 years ago I was really fearing my job would disappear because "everything will be automated in the cloud and owned by developers who write business code". Turns out it just transformed a bit but there is still plenty of not-strictly-related-to-business code to be written and maintained that developers mainly don't care about.

  • vdm 4 years ago

    The warning should say "you have N million objects technically eligible for an archive storage class and hitting the button to transition them will cost $M".

    Also S3 should no-op transitions for objects smaller than the break-even size for each storage class, even if you ask it to.
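
    Lifecycle filters have since gained object-size conditions, so you can approximate the second part yourself; a sketch with boto3 (bucket, prefix and the 128 KB threshold are assumptions):

       import boto3

       boto3.client("s3").put_bucket_lifecycle_configuration(
           Bucket="my-logs-bucket",  # placeholder
           LifecycleConfiguration={"Rules": [{
               "ID": "ia-only-for-big-objects",
               "Status": "Enabled",
               # skip objects below an assumed IA break-even size of 128 KB
               "Filter": {"And": {"Prefix": "app/events/",
                                  "ObjectSizeGreaterThan": 131072}},
               "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
           }]},
       )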

  • sharken 4 years ago

    I suppose it's not just dev time on the line, but also the risk of doing the change that is thought to be too high.

    If I ever get to be a manager I'd go for an idea such as yours. Though I suspect too many managers are too far removed from the technical aspect of things and don't listen nearly enough.

asim 4 years ago

The AWS horror stories never cease to amaze me. It's like we're banging our heads against the wall expecting a different outcome each time. What's more frustrating, the AWS zealots are quite happy to tell you how you're doing it wrong. It's the user's fault for misusing the service. The reality is, AWS was built for a specific purpose and demographic of user. Its complexity and scale now make it unusable for newer devs. I'd argue we need a completely new experience for the next generation.

  • dasil003 4 years ago

    I'm not sure any sizable group is banging their head against a wall. Yes, AWS is complex. Yes, AWS has cost foot guns. These are natural outcomes of removing friction from scaling.

    Sure we could start with something simpler, but as you may have noticed, even the more basic hosting providers like DigitalOcean and Linode have been adding S3-compatible object storage because of its proven utility.

    In terms of making something meaningfully simpler, I think Heroku was the high water mark. But even though it was a great developer experience, the price/performance barriers were a lot more intractable than dealing with AWS.

    • marcosdumay 4 years ago

      > These are natural outcomes of removing friction from scaling.

      Yes, and making scaling frictionless brings a very tiny bit of value for everybody, but a huge amount of value for the cloud operator. Any bit of friction would completely remove that problem.

      Also, focusing on scaling before efficiency benefits nobody but the cloud provider.

      • geoduck14 4 years ago

        >Yes, and making scaling frictionless brings a very tiny bit of value for everybody

        I disagree. Using AWS in a frictionless way has made the difference between not deploying applications and deploying them. In one example, I used S3 and EC2 to deploy an app used by several thousand users at work - the deployment was completely scripted and tested before the old app was taken down. It eliminated errors in deploying, increased deployment frequency (which enabled faster security patches), reduced downtime from 6 hours to zero, and enabled new features for our users (due to scripted testing). Everyone won - and I got a promotion :)

      • tekknik 4 years ago

        AWS was originally built to run Amazon workloads. When building software at Amazon, scale absolutely is one of the first things you think about.

    • WaxProlix 4 years ago

      Heroku did so much right. I recently was toying with some bot frameworks (think Discord or IRC, nothing spammy or review-gaming) and getting everything set up on a free tier dyno with free managed sql backing it up, and a github test/build integration, all took an hour or so. Really exceeded my expectations.

      Not sure how it scales for production loads but my experience was so positive I'll probably go back for future projects.

      • greiskul 4 years ago

        Yeah, heroku is absolutely the best in just getting something running. Truth is most projects don't ever have to scale, either because they are hobby projects, or cause they just fail. Heroku is the simplest platform that I know to just quickly test something. If you do find a good market fit and then need to scale, then sure, use some time to get out of it. But for proof of concepts, rapid iteration, etc. Heroku is awesome.

        • ericpauley 4 years ago

          I’ll argue that Fly.io is beginning to meet that need in a lot of ways, especially with managed Postgres now.

  • jollybean 4 years ago

    In this case it is absolutely the user 'doing it wrong'.

    AWS allows you to store gigantic amounts of data, thus lowering the bar dramatically for the kinds of things that we will keep.

    This invariably creates a different kind of problem when those thresholds are met.

    In this case, you have 'so much data you don't know what to do with it'.

    Akin to having 'really cheap warehouse storage space' that just gets filled up.

    "It's now complexity and scale makes it unusable for newer devs. I'"

    No - the 'complexity' bit is a bit of a problem, but not the scale.

    The 'complexity bit' can be overcome if you stick to some very basic things like running EC2 instances and very basic security configs. Beyond that, yes it's hard. But the 'equivalent' of having your own infra would be simply to have a bunch of EC2 instances on AWS and 'that's it' - and that's essentially achievable without much fuss. That's always an option for small companies, i.e. 'just run some instances' and don't touch anything else.

  • rmbyrro 4 years ago

    What do you see missing or not well explained in AWS documentation that newer devs wouldn't understand?

    I started using S3 early in my career and didn't see this problem. I always thought about data retention during the design phase.

    My opinion is that lazy, careless, or time-pressured developers will not, and then will get bitten. But that would happen with any tool. Maybe a different problem, but they'll always get bitten hard ...

    • chrisjc 4 years ago

      Forget newer devs for a moment... I've had years of experience with S3 and sounds like the author of the article has too. Despite my years of experience in programming/DBs/etc, I'm definitely not an amazing developer.

      But I learned a whole lot of new things from this article that I didn't understand from reading the AWS documentation, let alone realize I had to even concern myself with some of these issues. Spotty warnings about transition request charges?

      Anyway, kudos to you for always thinking about (and I hope actualizing) retention policies during the design phase. However, while I certainly think devs bear some of this responsibility, I'm sure they're usually met with all of the usual excuses and can-kicking from PM/PO/etc that lead to these kinds of nightmares in the first place... Then again, it will probably be another developer's or system admin's nightmare when it becomes an issue.

      Even as an experienced engineer, I still struggle setting the retention policy at the beginning of a new design... I'd love to hear any advice you have about how to manage this incredibly important aspect.

      • rmbyrro 4 years ago

        In my previous experiences, it really boils down to the unit economics.

        If a given process generates $1 in revenue over a year, and it takes pennies for AWS services, that's a good sign your design is not going to break the company's pockets down the road.

        In some cases, it's not easy to narrow the unit economics so much, which adds uncertainty to your premises, and there might be market fluctuations that change the unit economics in the future. I try to anticipate which areas are most likely to change and think of a trade-off in terms of short term speed and flexibility to change later, if needed. Almost always they're a trade off.

  • _jal 4 years ago

    > we need a completely new experience for the next generation

    I mean, at some point, if you're (say) using some insane amount of storage, you're going to pay for that.

    I would agree that getting alerting right for billing-relevant events at whatever you're currently operating at should be a lot easier than it is. And I agree that there is a lot of room to baby-proof some of the less obvious mistakes that people frequently make, to better expose the consequences of some changes, etc.

    But the flip side is that infra has always been expensive, and vendors have always been more than happy to sell you far more than you need along with the next new shiny whatever.

    To the extent that these are becoming implicit decisions made by developers rather than periodic budgeted refresh events built by infra architects, developers need to take responsibility for understanding the implications of what they're doing.

  • jiggawatts 4 years ago

    My theory is that single-platform clouds actually make more sense than trying to be everything for everyone. While the latter can scale to $billions, the former might actually have higher margins because it delivers more value.

    An example might be something like a Kubernetes-only cloud driven entirely by Git-ops. Not TFVC, or CVS, or Docker Swarm, or some hybrid of a proprietary cloud and K8s. Literally just a Git repo that materialises Helm charts onto fully managed K8s clusters. That's it.

    If you try to do anything similar in, say, Azure, you'll discover that:

    Their DevOps pipelines are managed by a completely separate product group and don't natively integrate into the platform.

    You now have K8s labels and Azure tags.

    You now have K8s logging and Azure logging.

    You now have K8s namespaces and Azure resource groups.

    You now have K8s IAM and Azure IAM.

    You now have K8s storage and Azure disks.

    Just that kind of duplication of concepts alone can take this one system's complexity to a level where it's impossible for a pure software development team to use without having a dedicated DevOps person!

    Azure App Service or AWS Elastic Beanstalk are similarly overly complex, having to bend over backwards to support scenarios like "private network integration". Yeah, that's what developers want to do, carve up subnets and faff around with routing rules! /s

    For example, if you deploy a pre-compiled web app to App Service, it'll... compile it again. For compatibility with a framework you aren't using! You need a poorly documented environment variable flag to work around this. There are a dozen more issues like this, and they keep stacking up.

    Developers just want a platform they can push code to and have it run with high availability and disaster recovery provided as-if turning on a tap.

  • ignoramous 4 years ago

    > It's the users fault for misusing the service.

    I believe AWS' usage-based billing makes for long-tail surprises because its users are designing systems exactly as one would expect them to. For example, S3 was never meant for a bazillion small objects, which Kinesis Firehose makes it easy to deliver to it. In such cases, dismal retrieval performance aside [0], the costs to list/delete dominate abnormally.

    We spin up an AWS Batch job every day to coalesce all S3 files ingested that day into large zlib'd parquets (kind of a reverse VACUUM as in postgres / MERGE as in elasticsearch). This setup is painful. I guess the lesson here is, one needs to architect for both billing and scale, right from the get-go.

    [0] https://news.ycombinator.com/item?id=19475726
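
    The daily coalescing job is roughly this shape; a sketch with pandas/pyarrow (the paths, the NDJSON assumption and gzip standing in for zlib are illustrative, not the actual Batch job):

       import boto3
       import pandas as pd  # assumes s3fs + pyarrow are installed

       s3 = boto3.client("s3")
       bucket, day_prefix = "firehose-landing", "user-123/2021/06/01/"  # made-up names

       keys = [o["Key"] for o in
               s3.list_objects_v2(Bucket=bucket, Prefix=day_prefix).get("Contents", [])]

       # read the day's many small NDJSON objects and rewrite them as one compressed parquet
       frames = [pd.read_json(f"s3://{bucket}/{k}", lines=True) for k in keys]
       if frames:
           pd.concat(frames).to_parquet(f"s3://{bucket}/merged/{day_prefix}day.parquet",
                                        compression="gzip")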

    • chrisjc 4 years ago

      Perhaps I don't fully understand the nuances of what you're trying to do, but...

      > S3 is never meant for a bazillion small objects which Kinesis Firehose makes it easy to deliver to it

      Are you saying Firehose increases the likelihood of creating the "small file problem"?

      If so, isn't this exactly what Firehose tries to prevent? Sure, you can set all the thresholds low and unnecessarily generate lots of small files, but you can tune those thresholds to maximize record/file size and attain a reasonable latency. If there's a daily batch job to make this data useful, then who cares about latency?

      Also, why would you run a daily batch job to coalesce all these files into parquet files instead of letting Firehose just do that for you. It can also do a certain amount of partitioning if it's required.

      • ignoramous 4 years ago

        > Are you saying Firehose increases the likelihood of creating the "small file problem"?

        Firehose makes it easy to do so (when the thresholds are too low, as you point out). That is, it'd happily chug along and do what you ask of it to. Sometimes, these problems only manifest in the long run (kind of like a frog in boiling water).

        > Also, why would you run a daily batch job to coalesce all these files into parquet files instead of letting Firehose just do that for you.

        Firehose recommends that the output be at least 64MB to 128MB for parquet files... we don't have anywhere near that much data to yeet out of Firehose, especially because data is partitioned per user (and a single user doesn't generate anywhere near that much data, so we're left with the current setup). And so: it was either let Firehose batch the data up into larger parquets (and run the partitioning job offline), or employ its partitioning magic online (and run the merge job offline, on demand). We chose the latter for cost efficiency given our workloads.

  • deanCommie 4 years ago

    HackerNews loves to criticize the cloud. It always reminds me of this infamous Dropbox comment: https://news.ycombinator.com/item?id=9224

    The cloud abstracts SO MUCH complexity from the user. The fact that people are then gleefully taking these "simple" services and overloading them with way too much data, and way too much complexity on top is not a failure of the underlying primitives, but a success.

    Without these cloud primitives, the people footgunning themselves with massive bills would just not have a working solution AT ALL.

    • jiggawatts 4 years ago

      > The people footgunning themselves with massive bills would just not have a working solution AT ALL.

      Sometimes guard rails are a good thing, and the AWS philosophy has very firmly been against guard rails, especially related to spending. The issue has come up here again and again that AWS refuses to add cost limits, even though they are capable of it. Azure copied this limitation. I don't mean that they didn't implement cost limits. They did! The Visual Studio subscriber accounts have cost limits. I mean that they refused to allow anyone to use this feature in PayG accounts.

      Let me give you a practical example: If I host a blog on some piece of tin with a wire coming out of it, my $/month is not just predictable, but constant. There's a cap on the outbound bandwidth, and a cap on compute spending. If my blog goes viral, it'll slow down to molasses, but my bank account will remain unmolested. If a DDoS hits it, it'll go down... and then come back up when the script kiddie get bored and move on.

      Hosting something like this on even the most efficient cloud-native architecture possible, such as a static site on an S3 bucket or Azure Storage Account is wildly dangerous. There is literally nothing I can do to stop the haemorrhaging if the site goes popular.

      Oh... set up some triggers or something... you're about to say, right? The billing portal has a multi-day delay on it! You can bleed $10K per hour and not have a clue that this is going on.

      And even if you know... then what? There's no "off" button! Seriously, try looking for "stop" buttons on anything that's not a VM in the public cloud. S3 buckets and Storage Accounts certainly don't have anything like that. At best, you could implement a firewall rule or something, but each and every service has a unique and special way of implementing a "stop bleeding!" button.

      I don't have time for this, and I can't wear the risk.

      This is why the cloud -- as it is right now -- is just too dangerous for most people. The abstractions it provides aren't just leaky; the holes have razor-sharp edges that have cut the hands of many people who think it works just like on-prem, only simpler.

      • saimiam 4 years ago

        > There is literally nothing I can do to stop the haemorrhaging if the site goes popular.

        There’s WAF with rate-based limiting to prevent script kiddies from randomly hitting your URLs for files to download and running up your egress prices. WAF costs $5/month plus a flat fee per extra rule.

        For DDOS protection there’s Shield which is built into cloudfront and should be enough for most people but if you need more control they have Shield Advanced.

        The “stop button” for S3 is an application-layer responsibility imho, though S3 should make cleanup easier.

        • jiggawatts 4 years ago

          Awesome. So I should spend more money to protect myself from flaws in Amazon's billing model with a service that I don't need for static file serving.

          This kind of "blame the user" thinking is why I avoid the cloud for my own use, and can't recommend it for most customers unless they have a specific reason.

          • xboxnolifes 4 years ago

            Think of it the other way. Instead of it being "it costs more to have the safety features", it's "it costs less if you don't need the safety features".

            If you want to spend the absolute bare minimum price, you get the bare minimum service.

          • saimiam 4 years ago

            > blame the user

            Not sure how this is blame the user. If you are setting up a bare metal server for a client and they don't ask you for (say) DDOS protection, will you still set up a DDOS protection protocol for them? I would think not since most people would try to match what a client asks for and maybe throw in some freebies.

            If, after that, they get hit by a DDOS, the onus is on them to have told you to plan ahead for it, and knowing this is not "blaming the user".

            This is exactly what AWS is also offering - a basic setup and extra bells and whistles to protect yourself from possible issues based on your threat model.

            Maybe I'm missing something in your response.

            • jiggawatts 4 years ago

              There are two kinds of outcomes from a DDoS:

              1. an outage, which in reality is just an inconvenience, not the end of the world, unlike what most IT people seem to think.

              2. a bill that can bankrupt you, which may as well be the end of the world for many people or small businesses. It can be literally "game over".

              A bare metal box doesn't need protection from the 2nd risk. Its costs are fixed, irrespective of the amount of traffic attempting to hit it. A 100 Mbps link can't put out more than 100 Mbps, so even if you're charged by the terabyte of egress, there's a cost ceiling integrated into the hardware itself.

              The cloud generally has no such limits, or much, much higher ones than is typically desirable.

              Okay, here's another random example your WAF will not protect you from: cloud-hosted DNS.

              The bare metal scenario is a box sitting on the end of a 1 Gbps Ethernet link. If attacked by some crazy UDP DNS flood attack, it could probably saturate that pipe and send out... 1 Gbps. On a fixed-cost-link plan this costs $0.00 additional money. You might have an outage, or merely a brown-out, but you won't see a cent added to your next bill.

              On Azure's DNS Zones service, there's no "1 Gbps" pipe to rate limit them. They have infrastructure deployed globally, typically with 100 Gbps links. In practice, the DNS server probably only gets about 10 Gbps per region, but there's many regions. At 100 bytes per packet, you could be looking at a billion requests per second billed to your account, at an eyewatering $200/s or $720K/hour. Ouch!

              Now, Azure will probably forgive that bill because it's clearly an attack.

              But what if it isn't clearly an attack? Application Insights by design puts the Instrumentation Key into client-side JavaScript. It charges $3/GB on ingress! It's trivial to charge someone thousands or tens of thousands of dollars before they notice, and then they'd have a hard time convincing support that the traffic wasn't legitimate.

              I can send a terabyte out for cents, each of which would cost some poor fool $3,000.

              Good luck plugging every such hole, monitoring every alert (there's literally tens of thousands of metrics to alert on), and keeping up with every spike in billing that's a day late reporting on costs that can ramp up to thousands of dollars per minute.

              • saimiam 4 years ago

                An outage is a pretty big reputational risk imho - if I'm a startup offering a SaaS and being on Hackernews' front page hugs my site to death, I'm probably losing some potential conversions and have to answer questions about the solidity and longevity of my infra. It's an inconvenience in the moment but erodes trust over time. Unless you're Twitter and your fail whales become memes, being hugged to death is a bad thing.

                I agree that the downside of scaling is the risk of running up huge bills. But the safety net to prevent the run-up is literally a monthly flat fee - $5+ $1/WAF rule. Also, you don't have to monitor every alert - just the common ones. If I had to build a comparable alerting system on bare metal, I'd go crazy.

                To me, the flexibility of the cloud is worth the trade off.

                > WAF will not protect you from...UDP..

                Don't think WAF is the tool to protect against UDP layer attacks. Shield (which is available standard) already handles this.

                To be clear, I have not had to deal with DDOS attacks. What we once had to deal with was someone repeatedly downloading a 1MB gif from our website, which led to big egress fees. WAF's rate-based rules put an end to that nonsense.

              • tekknik 4 years ago

                So let’s say they implement cost limits, how does this work? When you reach the limits does it delete all your resources? Idle them then force AWS to pay the bill by keeping those resources out of rotation for others? Someone has to pay for those resources while they’re used and most don’t want their database tables dropped because they got popular and hit their spend limit.

      • _alex_ 4 years ago

        What does an automatic cost limit look like when you have metered storage services? Start killing customer data?

    • prewett 4 years ago

      A surprise massive bill can be worse than no solution at all, in my opinion. And if the easy path leads to massive-bill lock-in, that's also not very helpful. It's not like people didn't know how to run servers and remote storage before AWS showed up. Before AWS showed up, at least your data-center costs were pretty predictable: you managed the servers yourself, so whatever the salaries summed to, that was it. It's not like poor programming ate up your year's IT budget by May.

      • tekknik 4 years ago

        But you did waste money through capacity planning, as you either have the exact quantity of servers needed or you have some sitting there idle. Then there's swapping out older/failing hardware. In fact, your DC calculations are rather difficult to get accurate.

  • 015a 4 years ago

    I agree 110%.

    Actually, I disagree with one statement: "AWS was built for a specific purpose and demographic of user". AWS wasn't built for anyone. It was built for everyone, and is thus even reasonably productive for no one. AWS's entire product development methodology is "customer asks for this, build it"; there's no high level design, very few opinions, five different services can be deployed to do the same thing, it's absolute madness and getting worse every year. Azure's methodology is "copy whatever AWS is doing" (source: engineers inside Azure), so they inherit the same issues, which makes sense for Microsoft because they've always been an organization gone mad.

    If there's one guiding light for Big Cloud, it's this: they're built to be sold to people who buy cloud resources. I don't even feel this is entirely accurate, given that this demographic of purchaser should at least, if nothing else, be considerate of the cost, and there's zero chance of Big Cloud winning that comparison without deceit, but if there were a demographic, that's who it'd be.

    > I'd argue, we need a completely new experience for the next generation.

    Fortunately, the world is not all Big Cloud. The work Cloudflare is doing between Workers & Pages represents a really cool and productive application environment. Netlify is another. Products like Supabase do a cool job of vendoring open source tech with traditional SaaS ease-of-use, with fair billing. DigitalOcean is also becoming big in the "easy cloud" space, between Apps, their hosted databases, etc. Heroku still exists (though I feel they've done a very poor job of innovating recently, especially in the cost department).

    The challenge really isn't in the lack of next-gen PaaS-like platforms; it's in countering the hypnosis puked out by Big Cloud Sales in that they're the only "secure" "reliable" "whatever" option. This hypnosis has infected tons of otherwise very smart leaders. You ask these people "let's say we are at four nines now; how much are you willing to pay, per month, to reach five nines? And remember Jim, four nines is one hour of downtime a year." No one can answer that. No one.

    End point being: anyone who thinks Big Cloud will reign supreme forever hasn't studied history. Enterprise contracts make it impossible for them to clean the cobwebs from their closets. They will eventually become the next Oracle or IBM, and the cycle repeats. It's not an argument to always run your own infra or whatever; but it is an argument to lean on and support open source.

    • jiggawatts 4 years ago

      > Azure's methodology is "copy whatever AWS is doing" (source: engineers inside Azure), so they inherit the same issues, which makes sense for Microsoft because they've always been an organization gone mad.

      I guess this, but it's funny to see it confirmed.

      I got suspicious when I realised Azure has many of the same bugs and limitations as AWS despite being supposedly completely different / independent.

  • inopinatus 4 years ago

    That’s just it, though: it isn’t an AWS horror story. It’s the sorcerer’s apprentice.

rizkeyz 4 years ago

I did the back-of-the-envelope math once. You get a Petabyte of storage today for $60K/year if you buy the hardware (retail disks, server, energy). It actually fits into the corner of a room. What do you get for $60K in AWS S3? Maybe a PB for 3 months (w/o egress).
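
Roughly, at list-price S3 Standard of about $0.022/GB-month (an approximation that ignores requests and egress):

   pb_in_gb = 1_000_000               # 1 PB, decimal
   s3_month = pb_in_gb * 0.022        # ~$22,000/month
   print(60_000 / s3_month)           # ~2.7 -> $60K buys roughly 3 months of 1 PB
   print(5 * 12 * s3_month / 60_000)  # ~22x over 5 years vs. the one-off hardware buy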

If you replace all your hardware every year, the cloud is 4x more expensive. If you manage to use your ghetto-cloud for 5 years, you are 20x cheaper than Amazon.

To store one TB per person on this planet in 2022 would take a mere $500M. That's small change for a slightly bigger company these days.

I guess by 2030 we should be able to record everything a human says, sees, hears and speaks in an entire life for every human on this planet.

And by 2040 we should be able to have machines learning all about human life, expression and intelligence, slowly making sense of all of this.

  • arein3 4 years ago

    >I guess by 2030 we should be able to record everything a human says, sees, hears and speaks in an entire life for every human on this planet.

    That's a very good point.

    Are you employed?

    Would you like to join Meta?

  • gmiller123456 4 years ago

    I don't get what's going on with online storage. You can walk into Best Buy and get a few-TB hard drive for well under $100. Yet every cloud service wants to charge you several times that per year for just 1TB. I understand drives fail, there are operating costs, and some workloads need extremely low latency. But there seems to be a huge disparity between what a hard drive costs and what it costs to make it available on the Internet.

    • tekknik 4 years ago

      There’s a difference between a consumer drive and a server drive. Plop that $100 drive in and you may be back in a week or so replacing it.

      • gmiller123456 4 years ago

        Why would you think a drive automatically loses lifespan just because the computer it's in is referred to as a server? Many of my desktop hard drives see more activity than some of my website HDs.

jwalton 4 years ago

Your website renders as a big empty blue page in Firefox unless I disable tracking protection (and in my case, since I have noscript, I have to enable javascript for "website-files.com", a domain that sounds totally legit).

  • Sophira 4 years ago

    The problem is that the DIV that contains the main text has the attribute 'style="opacity:0"'. Presumably, this is something that the JavaScript turns off.

    A lot of sites like to do things like this for some reason. I haven't figured out why. I like to use Stylus to mitigate these if I can, rather than enabling JavaScript.

    • acdha 4 years ago

      This is a common anti-pattern — I believe they're trying to ensure that the web fonts have loaded before the text displays but it's really annoying for mobile users since it can add up to 2.5 seconds (their timeout) to the time before you can start reading unless you're using reader mode at which point it renders almost instantly.

    • MattRix 4 years ago

      The page animates in. I have no idea why it does, but it does, which explains why the opacity starts at 0%.

    • ectopod 4 years ago

      A lot of these sites (including this one) do work in reader view.

    • test1235 4 years ago

      mitigation for a flash of unstyled content (FOUC) maybe?

  • mst 4 years ago

    I have tracking protection and ublock origin both enabled and it rendered fine (FF on Win10).

    (presented as a data point for any poor soul trying to replicate your problem)

  • tazjin 4 years ago

    Chrome with uBlock Origin on default here, and it renders a big blue empty page for me, too. That's despite dragging in an ungodly amount of assets first.

    Here's an archive link that works without any tracking, ads, Javascript etc.: https://archive.is/F5KZd

  • moffkalast 4 years ago

    Noscript breaking websites? Who woulda thunk.

    How do you manage to navigate the web with that on by default? It breaks just about everything since nothing is a static site these days.

cj 4 years ago

Off topic: for people with a "million billion" objects, does the S3 console just completely freeze up for you? I have some large buckets that I'm unable to even interact with via the GUI. I've always wondered if my account is in some weird state or if performance is that bad for everyone. (This is a bucket with maybe 500 million objects, under a hundred terabytes)

  • albert_e 4 years ago

    I suggest you raise a support ticket.

    AFAIK there is server-side paging implemented in the List* API operations that the Console UI should be using so that the number of objects in a bucket should not significantly impact the webpage performance.

    But who knows what design flaws lurk beneath the console.

    Curious to know what you find.

    Does it happen only on opening heavy buckets, or the entire S3 console? Different browser / incognito / different machine ... don't make a difference?

  • base698 4 years ago

    Yes, and sometimes even listing can take days.

    I worked somewhere where a person decided dumping the Twitter Firehose into S3 was a good idea, keyed one tweet per file.

    Ended up figuring out a way to get them in batches and condense them. It cost about $800 per hour to fix, coupled with the lifecycle changes they mentioned.

    • orf 4 years ago

      > Yes, and sometimes even listing can take days.

      You have a versioned bucket with a lot of delete markers in it. Make sure you've got a lifecycle policy to clean them up.
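
      A sketch of such a policy with boto3 (the bucket name and 30-day window are placeholders):

         import boto3

         boto3.client("s3").put_bucket_lifecycle_configuration(
             Bucket="my-versioned-bucket",  # placeholder
             LifecycleConfiguration={"Rules": [{
                 "ID": "tidy-old-versions",
                 "Status": "Enabled",
                 "Filter": {"Prefix": ""},  # whole bucket
                 "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                 "Expiration": {"ExpiredObjectDeleteMarker": True},
             }]},
         )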

    • properdine 4 years ago

      Doing an S3 object inventory can be a lifesaver here!

  • CobrastanJorji 4 years ago

    I'm curious. If you have a bucket with perhaps half a billion objects, what is the use case that leads you to wanting to navigate through it with a GUI? Are you perhaps trying to go through folders with dates looking for a particular day or something?

  • hakube 4 years ago

    I have millions (about 16m PDF and text files) of objects and it's completely freezing

  • jq-r 4 years ago

    In my previous company we had around 15K instances in an EC2 region and the EC2 GUI was unusable if it was set to the "new GUI experience", so we always had to use the classic one. The new one would try to get all the details for them, so once loaded it was fast. But to get there it would take many minutes or the request would just expire. Don't know if they've fixed that.

  • twistedpair 4 years ago

    Honestly this is when most folks move to using their own dashboards, metrics, and tooling. The AWS GUIs were designed for small to moderate use cases.

    You don't peer into a bucket with a billion objects and ask for a complete listing, or accounting of bytes. There are tools and APIs for that.

    That's what I do with my thousands of buckets and billions of files (dashboards).

    • scapecast 4 years ago

      It's also the reason why some AWS product teams have started acquiring IDE- or CLI-type of start-ups. They don't want to be boxed in by the constraints of the AWS Console - which is run by a central team. For example, the Redshift team bought DataRow.

      Disclosure, co-founder here, we're building one of those CLIs. We started as an internal project at D2iQ (my co-founder Lukas commented further up), with tooling to collect an inventory of AWS resources and be able to search it easily.

      • wizwit999 4 years ago

        Product teams don't do acquisitions. And that's not why that acquisition happened.

  • kristjansson 4 years ago

    Just checked, out of curiosity. A bucket at $WORK with ~4B objects / ~100TB is completely usable through the console. Keys are hierarchical and relatively deep, so no one page in the GUI is trying to show more than a few hundred keys. If your keys are flatter, I could see how the console would be unhappy.

  • grumple 4 years ago

    Sort of related, I faced such an issue when I had a gui table that was triggering a count on a large object set via sql so it could display the little "1 to 50 of 1000000". This is presumably why services like google say "of many". Wonder if they have a similar issue.

  • liveoneggs 4 years ago

    the newer s3 console works a little better. It gives pagination with "< 1 2 3 ... >"

lloesche 4 years ago

I had a similar issue at my last job. Whenever a user created a PR on our open source project, artifacts of 1GB size consisting of hundreds of small files would be created and uploaded to a bucket. There was just no process that would ever delete anything. This went on for 7 years and resulted in a multi-petabyte bucket.

I wrote some tooling to help me with the cleanup. It's available on Github: https://github.com/someengineering/resoto/tree/main/plugins/... consisting of two scripts, s3.py and delete.py.

It's not exactly meant for end-users, but if you know your way around Python/S3 it might help. I built it for a one-off purge of old data. s3.py takes a `--aws-s3-collect` arg to create the index. It lists one or more buckets and can store the result in a sqlite file. In my case the directory listing of the bucket took almost a week to complete and resulted in an 80GB sqlite file.

I also added a very simple CLI interface (calling it a virtual filesystem would be a stretch) that allows you to load the sqlite file and browse the bucket content, summarise "directory" sizes, order by last modification date, etc. It's what starts when calling s3.py without the collect arg.

Then there is delete.py which I used to delete objects from the bucket, including all versions (our horrible bucket was versioned which made it extra painful). On a versioned bucket it has to run twice, once to delete the file and once to delete the then created version, if I remember correctly - it's been a year since I built this.

Maybe it's useful for someone.
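
If all you need is the purge and not the sqlite index, the core loop with plain boto3 is roughly this (bucket and prefix are placeholders; it removes versions and delete markers in one pass):

   import boto3

   s3 = boto3.client("s3")
   paginator = s3.get_paginator("list_object_versions")

   # delete every version AND every delete marker under a prefix
   for page in paginator.paginate(Bucket="old-artifacts", Prefix="builds/"):
       doomed = [{"Key": v["Key"], "VersionId": v["VersionId"]}
                 for v in page.get("Versions", []) + page.get("DeleteMarkers", [])]
       if doomed:
           s3.delete_objects(Bucket="old-artifacts", Delete={"Objects": doomed})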

  • k__ 4 years ago

    What about the lifecycle stuff?

    I thought, S3 can move stuff to cheaper storage automatically after some time.

    • lloesche 4 years ago

      Like I wrote, for us it was a one-off job to find and remove 6+ year old build artifacts that would never be needed again. I just looked for the cheapest way of getting rid of them. I couldn't do it by prefix alone (prod files are mixed into the same structure as the build artifacts), which is why delete.py supports patterns (the `--aws-s3-pattern` arg takes a regex).

      If AWS' own tools work for you, they're surely a better solution than my scripts. Esp. if you need something on an ongoing basis.

ebingdom 4 years ago

I'm confused about prefixes and sharding:

> The files are stored on a physical drive somewhere and indexed someplace else by the entire string app/events/ - called the prefix. The / character is really just a rendered delimiter. You can actually specify whatever you want to be the delimiter for list/scan apis.

> Anyway, under the hood, these prefixes are used to shard and partition data in S3 buckets across whatever wires and metal boxes in physical data centers. This is important because prefix design impacts performance in large scale high volume read and write applications.

If the delimiter is not set at bucket creation time, but rather can be specified whenever you do a list query, how can the prefix be used to influence where objects are physically stored? Doesn't the prefix depend on what delimiter you use? How can the sharding logic know what the prefix is if it doesn't know the delimiter in advance?

For example, if I have a path like `app/events/login-123123.json`, how does S3 know the prefix is `app/events/` without knowing that I'm going to use `/` as the delimiter?

  • xyzzy_plugh 4 years ago

    The prefix isn't delimited, it's an arbitrary length based on access patterns.

    A fictitious example which is close to reality:

    In parallel, you write a million objects each to:

       tomato/red/...
       tomato/green/...
       tomatoes/colors/...
    
    The shortest prefixes that evenly divide writes are thus

       tomato/r
       tomato/g
       tomatoes
    
    If you had an existing access pattern of evenly writing to

       tomatoes/colors/...
       bananas/...
    
    The shortest prefixes would be

       t
       b
    
    So suddenly writing 3 million objects that begin with a t would cause an uneven load or hotspot on the backing shards. The system realizes your new access pattern and determines new prefixes and moves data around to accommodate what it thinks your needs are.

    --

    The delimiter is just a wildcard option. The system is just a key value store, essentially. Specifying a delimiter tells the system to transform delimiters at the end of a list query like

       my/path/
    
    into a pattern match like

       my/path/[^/]+/?
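
    To make that concrete, here is a rough boto3 sketch (bucket and key names are made up) showing that the delimiter only shapes the list response; it has nothing to do with how objects are stored:

       import boto3

       s3 = boto3.client("s3")

       # With a delimiter, keys "below" a common prefix are rolled up
       # into CommonPrefixes instead of being listed one by one.
       resp = s3.list_objects_v2(
           Bucket="my-example-bucket",
           Prefix="my/path/",
           Delimiter="/",
       )

       for obj in resp.get("Contents", []):
           print("object:", obj["Key"])        # e.g. my/path/file.json
       for cp in resp.get("CommonPrefixes", []):
           print("rolled up:", cp["Prefix"])   # e.g. my/path/sub/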
    • stepchowfun 4 years ago

      Thank you! This is the first explanation that I think fully explains what I was confused about. So essentially the prefix is just the first N bytes of the object's name, where N is a per-bucket number that S3 automatically decides and adjusts for you. And it has nothing to do with delimiters.

      I find the S3 documentation and API to be really confusing about this. For example, when listing objects, you get to specify a "prefix". But this seems to be not directly related to the automatically-determined prefix length based on your access patterns. And [1] says things like "There are no limits to the number of prefixes in a bucket.", which makes no sense to me given that the prefix length is something that S3 decides under the hood for you. Like, how do you even know how many prefixes your bucket has?

      [1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...

      • xyzzy_plugh 4 years ago

        The sharding key is an implementation detail, so you're not supposed to care about it too much.

        • kristjansson 4 years ago

          That's true now. Used to be the case that they'd recommend random or high-entropy parts of the keys go at the beginning to avoid overloading a shard as you described above.

          From [0]:

          > This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications. This improvement is now available in all AWS Regions. For more information, visit the Amazon S3 Developer Guide.

          [0]: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3...

      • inopinatus 4 years ago

        It is related, in the sense both “prefixes” are a substring match anchored at the start of the object name. They’re just not the same mechanism.

    • chrisjc 4 years ago

      > So suddenly writing 3 million objects that begin with a t would cause an uneven load or hotspot on the backing shards.

      makes sense

      > The system realizes your new access pattern and determines new prefixes and moves data around to accommodate what it thinks your needs are.

      What does "determines new prefixes" mean? Obviously AWS isn't going to come up with new prefixes and change object names.

      So does AWS maintain prefix-surrogates (prefix sub-string(0,?) references) and those are what actually gets shuffled around to handle the new unbalanced workload? Sort of like resharding?

      Moreover, since it's really prefix-surrogates being used, the recommendation of randomizing prefixes can be replaced with randomizing prefix-surrogates and delegated to AWS, removing the prior responsibility from the customer. Hence the 2018 announcement https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3...

  • inopinatus 4 years ago

    There’s no delimiter. There is only the appearance of a delimiter, to appease folks who think S3 is a filesystem, and fool them into thinking they’re looking at folders.

    The object name is the entire label, and every character is equally significant for storage. When listing objects, a prefix filters the list. That’s all. However, S3 also uses substrings to partition the bucket for scale. Since they’re anchored at the start, they’re also called prefixes.

    In my view, it’s best to think of S3’s object indexing as a radix tree.

    This article, as if you couldn’t guess from the content, is written from a position of scant knowledge of S3, so it’s not surprising that it misrepresents the details.

    • ebingdom 4 years ago

      So if I have a bunch of objects whose names are hashes like 2df6ad6ca44d06566cffde51155e82ad0947c736 that I expect to access randomly, is there any performance benefit to introducing artificial delimiters like 2d/f6/ad6ca44d06566cffde51155e82ad0947c736? I've seen this used in some places.

      • dale_glass 4 years ago

        To AWS S3, '/' isn't a delimiter, it's a character that's part of the filename.

        So for instance "/foo/bar.txt" and "/foo//bar.txt" are different files in S3, even though they'd be the same file in a filesystem.

        This gets pretty fun if you want to mirror an S3 structure on-disk, because the above suddenly causes a collision.

      • elcomet 4 years ago

        No difference other than readability. And Amazon may partition your data on another prefix anyway, like "2d/f6/ad6c"

      • jstarfish 4 years ago

        I don't know what impact that partitioning pattern has on s3, but it has some obvious benefits if your app needs to revert to write to a normal filesystem instead (like for testing).

    • charcircuit 4 years ago

      >There’s no delimiter.

      What's the delimiter parameter for then?

      https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObje...

      • nightpool 4 years ago

        To provide a consistent API response as part of the ListObjects call. It has nothing to do with the storage on disk.

      • inopinatus 4 years ago

        To help you fool yourself. It affects how object list results are presented in the api response.

        • throwhauser 4 years ago

          "To help you fool yourself" seems like a euphemism for "to fool you". It's gotta be tough to go from "scant knowledge of S3" to genuine knowledge if the documentation is doing this to you.

          If the docs are misrepresenting the details, who can blame the author of the post?

          • inopinatus 4 years ago

            The documentation is very clear on the purpose of the delimiter parameter.

            The OP does not read the docs, makes bad assumptions repeatedly throughout, and then reaps the consequences.

        • ec109685 4 years ago

          They can’t present a directory abstraction for list operations without a delimiter. E.g. CommonPrefixes.

  • twistedpair 4 years ago

    This is where GCP's GCS (Google Cloud Storage) shines.

    You don't need to mess with prefixing all your files. They auto level the cluster for you [1].

    [1] https://cloud.google.com/storage/docs/request-rate#redistrib...

  • korostelevmOP 4 years ago

    AWS does the optimizations over time based on access patterns for the data. Should have made that clearer in the article.

    The problem becomes unusual burst load - usually from infrequent analytics jobs. The indexing can't respond fast enough.

    • ebingdom 4 years ago

      Thanks for the clarification. But now I'm confused about the limits:

      > 3,500 PUT/COPY/POST/DELETE requests per second per prefix

      > 5,500 GET/HEAD requests per second per prefix

      Most of those APIs don't even take a delimiter. So for these limits, does the prefix get inferred based on whatever delimiter you've used for previous list requests? What if you've used multiple delimiters in the past?

      Basically what I'm trying to determine is whether these limits actually mean something concrete (that I can use for capacity planning etc.), or whether their behavior depends on heuristics that S3 uses under the hood.

      I'm fine with S3 optimizing things under the hood based on my access patterns, but not if it means I can't reason about these limits as an outsider.

      • Macha 4 years ago

        S3 does a lot of under the hood optimisation. e.g. Create a brand new bucket, leave it cold for a while, and start throwing 100 PUT requests a second at it. This is way less than the advertised 3500, but they'll have scaled the allocated resources down so much you'll get some TooManyRequests errors.
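
        In practice those transient throttles can usually be absorbed client-side. A minimal sketch, assuming boto3 and arbitrary retry numbers, of turning on adaptive retries so the SDK backs off instead of surfacing the error:

           import boto3
           from botocore.config import Config

           # "adaptive" retry mode backs off automatically when S3 starts
           # throttling, up to the configured number of attempts.
           s3 = boto3.client(
               "s3",
               config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
           )

           s3.put_object(Bucket="my-example-bucket", Key="some/key", Body=b"...")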

      • acdha 4 years ago

        Those are what I would assume for performance when the system is stable. The concerns come from bursty behaviour — for example, if you put something new into production you might have a period of time while S3 is adjusting behind the scenes where you'll get transient errors from some operations before it stabilizes (these have almost always been resolved by retry in my experience). This is reportedly something your AWS TAM can help with if you know in advance that you're going to need to handle a ton of traffic and have an idea of what the prefix distribution will be like — apparently the S3 support team can optimize the partitioning for you in preparation.

      • ec109685 4 years ago

        Delimiter isn’t used for writes, only list operations.

        S3 simply looks at the common string prefixes in your object names and uses that to internally shard objects, so you can achieve a multiple of those request limits.

           aaa122348
           aaa484585
           bbb484858
           bbb474827

        Would have the same performance as:

           aaa/122348
           aaa/484585
           bbb/484858
           bbb/474827

zmmmmm 4 years ago

The rationale for using cloud is so often that it saves you from complexity. It really undermines the whole proposition when you find out that the complexity it shields you from is only skin deep, and in fact you still need a "PhD in AWS" anyway.

But as a bonus, now you face huge risks and liabilities from single button pushes and none of those skills you learned are transferrable outside of AWS so you'll have to learn them again for gcloud, again for azure, again for Oracle ....

pontifier 4 years ago

DON'T PRESS THAT BUTTON.

The egress and early deletion fees on those "cheaper options" killed a company that I had to step in and save.

  • pphysch 4 years ago

    On a related note, suppose the Fed raises rates to mitigate inflation and indirectly kills thousands of zombie companies, including many SaaS renting the cloud. What happens to their data? Does the cloud unilaterally evict/delete it, or does it get handled like an asset -- auctioned off, etc?

    • cmckn 4 years ago

      I’m not aware of a cloud provider that is contractually allowed to do such a thing (except maybe alibaba by way of the CCP). Dying companies get purchased and have their assets pilfered every day, the same thing would happen with cloud assets.

      • bpicolo 4 years ago

        If the dead company stops paying the bills, Amazon can definitely delete that.

        • cmckn 4 years ago

          Of course, I meant the idea of auctioning it off or otherwise accessing the customer’s data.

          When I worked in Azure, I accidentally created some internal resources in a personal account. I didn’t have the ability to delete them after I left; the only way to do so was to cancel the credit card and let the grace period expire.

    • Uehreka 4 years ago

      > does it get handled like an asset -- auctioned off, etc?

      Who would buy that? I guess if this happened enough then people would start "data salvager" companies that specialize in going through data they have no schema for looking for a way to sell something of it to someone else. I have to imagine the margins in a business like that would be abysmal, and all the while you'd be in a pretty dark place ethically going through data that users never wanted you to have in the first place.

      Of course, all these questions are moot because if this happened the GDPR would nuke the cloud provider from orbit.

  • Aeolun 4 years ago

    If they were already paying 100k per month for their storage, I doubt the additional 100k would severely impact their business.

    Proven by the fact that they happily went on to pay the bill for the next 6 months.

Tehchops 4 years ago

We’ve got data in S3 buckets not nearly at that scale and managing them, god forbid trying a mass delete, is absolute tedium.

  • amelius 4 years ago

    Mass delete also takes an eternity on my Linux desktop machine.

    The filesystem is hierarchical, but the delete operation still needs to visit all the leaves.

    • sokoloff 4 years ago

      Is S3 actually hierarchical? I always took the mental model that the S3 object namespace within a bucket was flat and the treatment of ‘/‘ as different was only a convenient fiction presented in the tooling, which is consistent with the claim in this article.

      • cle 4 years ago

        This is mostly correct, with the additional feature that S3 can efficiently list objects by "key prefix" which helps preserve the illusion.

        • sokoloff 4 years ago

          Followup question: Is there something special about the PRE notations in the example output below? I can list objects by any textual prefix, but I can't tell if the PRE (what we think of as folders) is more efficient than just the substring prefix.

          Full bucket list, then two text prefixes, then an (empty) folder list

            sokoloff@ Downloads % aws s3 ls s3://foo-asdf            
                                       PRE bar-folder/
                                       PRE baz-folder/
            2022-02-17 09:25:38          0 bar-file-1.txt
            2022-02-17 09:25:42          0 bar-file-2.txt
            2022-02-17 09:25:57          0 baz-file-1.txt
            2022-02-17 09:25:49          0 baz-file-2.txt
            sokoloff@ Downloads % aws s3 ls s3://foo-asdf/ba
                                       PRE bar-folder/
                                       PRE baz-folder/
            2022-02-17 09:25:38          0 bar-file-1.txt
            2022-02-17 09:25:42          0 bar-file-2.txt
            2022-02-17 09:25:57          0 baz-file-1.txt
            2022-02-17 09:25:49          0 baz-file-2.txt
            sokoloff@ Downloads % aws s3 ls s3://foo-asdf/bar
                                       PRE bar-folder/
            2022-02-17 09:25:38          0 bar-file-1.txt
            2022-02-17 09:25:42          0 bar-file-2.txt
            sokoloff@ Downloads % aws s3 ls s3://foo-asdf/bar-folder
                                       PRE bar-folder/
          • jrochkind1 4 years ago

            I don't understand the answer to that question either. Other AWS docs say you can choose whatever you want for a delimiter; there's nothing special about `/`. So how does that apply to what they say about performance and "prefixes"?

            Here is some AWS documentation on it:

            https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...

            > For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.

            Related to your question, even if we just stick to `/` because it seems safer, does that mean that "foo/bar/baz/1/" and "foo/bar/baz/2/" are two prefixes for the point of these request speed limits? Or does the "prefix" stop at the first "/" and files with these keypaths are both in the same "prefix" "foo/"?

            Note there was (according to docs) a change a couple years ago that I think some people haven't caught on to:

            > For example, previously Amazon S3 performance guidelines recommended randomizing prefix naming with hashed characters to optimize performance for frequent data retrievals. You no longer have to randomize prefix naming for performance, and can use sequential date-based naming for your prefixes.

          • jsmith45 4 years ago

            Umm... that output seems confusing.

            The ListObjects api will omit all objects that share a prefix that ends in the delimiter, and instead put said prefix into the CommonPrefix element, which would be reflected as PRE lines. (So with a delimiter of '/', it basically hides objects in "subfolders", but lists any subfolders that match your partial text in the CommonPrefix element).

            By default `aws s3 ls` will not show any objects within a CommonPrefix but simply shows a PRE line for them. The cli does not let you specify a delimiter, it always uses '/'. To actually list all objects you need to use `--recursive`.

            The output there would suggest that bucket really did have object names that began with `bar-folder/`, and that last line did not list them out because you did not include the trailing slash. Without the trailing slash it was just listing objects and CommonPrefixes that match the string you specified after the last delimiter in your url. Since only that one common prefix matched, only it was printed.

    • res0nat0r 4 years ago

      Use delete-objects instead and it will be much faster, as you can supply up to 1,000 keys to remove in a single API call.

      https://awscli.amazonaws.com/v2/documentation/api/latest/ref...
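
      A rough boto3 sketch of the same idea (bucket and prefix are hypothetical): page through a listing and delete in chunks of up to 1,000 keys per request.

         import boto3

         s3 = boto3.client("s3")
         bucket = "my-example-bucket"  # hypothetical

         paginator = s3.get_paginator("list_objects_v2")
         for page in paginator.paginate(Bucket=bucket, Prefix="artifacts/"):
             keys = [{"Key": o["Key"]} for o in page.get("Contents", [])]
             if not keys:
                 continue
             # DeleteObjects accepts at most 1,000 keys per request; the
             # default list page size is 1,000, so each page fits in one call.
             s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})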

    • the8472 4 years ago

      Most recursive deletion routines are not optimized for speed. This could be done much faster with multiple threads or batching the calls via io_uring.

      Another option is LVM or btrfs subvolumes, which can be discarded without recursive traversal.

    • Too 4 years ago

      There are some tricks on Linux. For example, using mv into a trash dir instead of rm. I’ve also seen successful use of rsync that does real deletion many times faster than rm -rf; not sure why, but I’m guessing some parallelism is involved.

      Google for this problem. There are surprisingly many creative ideas, many of which are a lot better than the built-in rm command.

    • deepsun 4 years ago

      I believe it's mostly a problem of latency between your machine and S3, since each Delete call is issued separately in its own HTTP connection.

      1. Try parallelization of your calls. Deleting 20 objects in parallel should take the same time as deleting 1.

      2. Try to run deletion from an AWS machine in the same region as the S3 bucket (yes buckets are regional, only their names are global). Within-datacenter latency should be lower than between your machine and datacenter.

    • amelius 4 years ago

      (This is a good example where Garbage Collection wins over schemes which track references explicitly, like reference counting. A garbage collector can just throw away the reference, while other schemes need to visit every leaf, resulting in hours of deletion time in some cases.)

  • anderiv 4 years ago

    Set a lifecycle rule to delete your objects. Come back a day later and AWS will have taken care of this for you.

  • lnwlebjel 4 years ago

    Very true: it took me about a month of emptying, deleting and life cycling about a dozen buckets of about 20 TB (~20 million objects) to get to zero.

pattycake23 4 years ago

Here's an article about Shopify running into the S3 prefix rate limit too many times, and tackling it: https://shopify.engineering/future-proofing-our-cloud-storag...

  • sciurus 4 years ago

    Their solution was to introduce entropy into the beginning of the object names, which used to be AWS's recommendation for how to ensure objects are placed in different partitions. AWS claims this is no longer necessary, although how their new design actually handles partitioning is opaque.

    "This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications."

    https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3...

    • pattycake23 4 years ago

      Seems like it's a much higher rate limit, but it exists none the less, and Shopify's scale has also grown significantly since 2018 (when that article was written) - so it was probably a valid way for them to go.

      • sciurus 4 years ago

        I think two things happened that are covered in that blog post

        1) The performance per partition increased

        2) The way AWS created partitions changed

        When I was at Mozilla, one thing I worked on was Firefox's crash reporting system. Its S3 storage backend wrote raw crash data with the key in the format `{prefix}/v2/{name_of_thing}/{entropy}/{date}/{id}`. If I remember correctly, we considered this a limitation since the entropy was so far down in the key. However, when we talked to AWS Support they told us there was no longer a need to have the entropy early on; effectively S3 would "figure it out" and partition as needed.

        EDIT: https://news.ycombinator.com/item?id=30373375 is a good related comment.

wackget 4 years ago

As a web developer who has never used anything except locally-hosted databases, can someone explain what kind of system actually produces billions or trillions of files which each need to be individually stored in a low-latency environment?

And couldn't that data be stored in an actual database?

  • rgallagher27 4 years ago

    Things like mobile/website analytics events: User A clicked this menu item, User B viewed this image, etc. All streamed into S3 in chunks of smallish files.

    It's cheaper to store them in S3 than in a DB and use tools like Athena or Redshift Spectrum to query them.

    • wackget 4 years ago

      Wow. What makes it cheaper than using a DB? Is it just because the DB will create some additional metadata about each stored row or something?

      • bpicolo 4 years ago

        S3 is often essentially a database in these scenarios. You store columnar data format files in S3, and various analytical systems can query with S3 as a massive backing storage.

  • gmiller123456 4 years ago

        And couldn't that data be stored in an actual database? 
    
    This is the "it's turtles all the way down" concept. A database is just going to store data in the file system, plus some extra overhead. Putting data in a database saves you nothing unless you actually need the extra functionality a database provides.

    That overhead doesn't mean much if you have 10 users and 1gb of data. But it adds up in very large systems.

  • abhishekjha 4 years ago

    An image service.

    • wackget 4 years ago

      Yeah that use-case I get. Binary files which would be difficult/impractical to index in a database.

      However it feels like something at that scale will only ever realistically be dealt with by enterprise-level software, and I'd hazard a guess that most developers - even those reading HN - are not working on enterprise-level systems.

      So I'm wondering what "regular devs" are using cloud buckets for at such a scale over regular DBs.

  • gnulinux 4 years ago

    My company gets sensor data from millions of devices and records it. Happens all day, all around the world. It adds up. If you don't delete that data, it becomes petabytes. Thank god GDPR et al. exist so we have a good excuse to "need to delete this data, boss".

wodenokoto 4 years ago

I've never been in this situation, but I do wish you could query files with more advanced filters on these blob storage services.

- But why SageMaker?

- Why do some orgs choose to put almost everything in 1 bucket?

  • tyingq 4 years ago

    >Why do some orgs choose to put almost everything in 1 buckets?

    The article seems to be making the case it's because the delimiter makes it seem like there's a real hierarchy. So the ramifications of /bucket/1 /bucket/2 versus /bucket1/ /bucket2/ aren't well known until it's too late.

    • charcircuit 4 years ago

      >So the ramifications of /bucket/1 /bucket/2 versus /bucket1/ /bucket2/ aren't well known until it's too late.

      What's the difference?

      • musingsole 4 years ago

        In the choice between a single bucket with hierarchical paths versus multiple buckets, there's a long list of nuances between either strategy.

        For the purposes of this article, you can probably have more intuitive, sensible lifecycle policies across multiple buckets than you can trying to set policies on specific paths within a single bucket. Something like "ShortLifeBucket" and "LongLifeBucket" would allow you to have items with similar prefixes (something like a "{bucket}/anApplication/file1.csv" in each bucket) that then have different lifecycle policies.

        • 8note 4 years ago

          There's a lack of searchable blogs and recommendations for how many buckets you need, and how much stuff belongs in one.

          Got any recommended literature?

  • korostelevmOP 4 years ago

    For many at orgs like this, SageMaker is probably the shortest path to an insane amount of compute with a python terminal.

    Why single bucket? Once someone refers to a bucket as "the" bucket - it is how it will forever be.

  • akdor1154 4 years ago

    > But why SageMaker?

    You could ask the same thing of most times it gets used for ML stuff as well.

    > Why do some orgs choose to put almost everything in 1 buckets?

    Anecdote: ours does because we paid (Multinational Consulting Co)™ a couple of million to design our infra for us, and that's what the result was.

  • liveoneggs 4 years ago

    1 athena?

    2 some jobs make a lot of data

charcircuit 4 years ago

Can someone explain what happened in the end? From my understanding nothing happened (they deprioritized the story for fixing it) and they are still blowing through the cloud budget.

  • snowwrestler 4 years ago

    They didn’t resolve the issue.

    There’s an important moment in the story, where they realize the fix will incur a one-time fee of $100,000. No one in engineering can sign off on that amount, and no one wants to try to explain it to non-technical execs.

    They don’t explain why. But it’s probably because they expect a negative response like “how could you let this happen?!” or “I’m not going to pay that, find another way to fix it.”

    In a lot of organizations it’s easier to live with a steadily growing recurring cost than a one-time fee… even if the total of the steady growth ends up much larger than the one-time fee!

    It’s not necessarily pathological. Future costs will be paid from future revenue; whereas a big fee has to be paid from cash on-hand now.

    But sometimes the calculation is not even attempted because of internal culture. When the decision is “keep your head down” instead of “what’s the best financial strategy,” that could hint at even bigger potential issues down the road.

    • hogrider 4 years ago

      Sounds more like non-technical leadership asleep at the wheel. I mean, if they could afford to just lose money like this, why bother with all that work to fix it?

  • seekayel 4 years ago

    How I read the article, nothing happened. I think it is a cautionary tale of why you should probably bite the bullet and press the button instead of doing the "easier" thing which ends up being harder and more expensive in the end.

vdm 4 years ago

DeleteObjects takes 1000 keys per call.

Lifecycle rules can filter by min/max object size. (since Nov 2021)

  • electroly 4 years ago

    Thank you for mentioning that lifecycle rule change. I must have missed the announcement; that is exactly the functionality I needed.

  • vdm 4 years ago

    Athena supports regexp_like(). By loading an S3 Inventory into Athena, this can match what a wildcard would. Then a Batch Operations job can tag the result.

    Not easy, but it is possible and effective.
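
    A rough sketch of the query step, assuming boto3 and that the S3 Inventory has already been loaded into Athena (database, table, pattern, and output location are all hypothetical):

       import boto3

       athena = boto3.client("athena")

       # Find keys matching a regex in the inventory table; the result
       # set can then feed an S3 Batch Operations tagging job.
       athena.start_query_execution(
           QueryString="""
               SELECT key
               FROM s3_inventory.my_bucket_inventory
               WHERE regexp_like(key, '^artifacts/.+[.]tar[.]gz$')
           """,
           QueryExecutionContext={"Database": "s3_inventory"},
           ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
       )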

Mave83 4 years ago

Just avoid the cloud. You can get Ceph storage with the performance of Amazon S3 at the price point of Amazon S3 Glacier, deployed in any datacenter worldwide if you want. There are companies that help you do this.

Feel free to ask if you need help.

  • charcircuit 4 years ago

    You have to properly administer those servers, or else you'll lose all your files and everything will be inaccessible.

  • red0point 4 years ago

    I want to know what the absolute cheapest way of doing this is, without having a lot of CapEx. I thought of renting dedicated storage servers (e.g. Hetzner) and slapping Ceph on them.

    Do you have another, better, idea?

valar_m 4 years ago

Though it doesn't address the problem in TFA, I recommend setting up billing alerts in AWS. They wouldn't have solved this issue, but the team would at least have known about it sooner.

0x002A 4 years ago

Each time a developer does something on a cloud platform, the platform might start to profit for two reasons: vendor lock-in and long-term accrued costs, regardless of the unit cost.

Anything limitless/easiest has a higher hidden cost attached.

StratusBen 4 years ago

On this topic, it's always surprising to me how few people even seem to know about different storage classes on S3...or even intelligent tiering (which I know carries a cost to it, but allows AWS to manage some of this on your behalf which can be helpful for certain use-cases and teams).

We did an analysis of S3 storage levels a while back by profiling 25,000 random S3 buckets for a comparison of Amazon S3 and R2*, and nearly 70% of storage in S3 was StandardStorage, which just seems crazy high to me.

* https://www.vantage.sh/blog/the-opportunity-for-cloudflare-r...

  • blurker 4 years ago

    I think that it's not just people not knowing about the lifecycle feature, but also that when they start putting data into a bucket they don't know what the lifecycle should be yet. Honestly I think overdoing lifecycle policies is a potentially bigger foot gun than not setting them. If you misuse glacier storage that will really cost you big $$$ quickly! And who wants to be the dev who deleted a bunch of data they shouldn't have?

    Lifecycle policies are simple in concept, but it's actually not simple to decide what they should be in many cases.

kondro 4 years ago

The minimum size of objects in cheaper storage types is 128KiB.

Given the article quotes $100k to run an inventory (and $100k/month in standard storage) it's likely most of your objects are smaller than 128KiB and so probably wouldn't benefit from cheaper storage options (although it's possible this is right on the cusp of the 128KiB limit and could go either way).

Honestly, if you have a $1.2m/year storage bill in S3 this would be the time to contact your account manager and try to work out what could be done to improve this. You probably shouldn't be paying list anyway if just the S3 component of your bill is $1.2m/year.

dekhn 4 years ago

I had to chuckle at this article because it reminded me of some of the things I've had to do to clean up data.

One time I had to write a special mapreduce that did a multiple-step map to convert my (deeply nested) directory tree into roughly equally sized partitions (a serial directory listing would have taken too long, and the tree was really unbalanced to partition in one step), then did a second mapreduce to map-delete all the files and reduce the errors down to a report file for later cleanup. This meant we could delete a few hundred terabytes across millions of files in 24 hours, which was a victory.

cyanic 4 years ago

We solved the problem of deleting old files early in our development process, as we wanted to avoid situations such as this one.

While developing GitFront, we were using S3 to store individual files from git repositories as single objects. Each of our users was able to have multiple repositories with thousands of files, and they needed to be able to delete them.

To solve the issue, we implemented a system for storing multiple files inside a single object and a proxy which allows accessing individual files transparently. Deleting a whole repository is now just a single request to S3.
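
A rough sketch of that pack-and-range-read idea (all names are hypothetical and the real implementation surely differs): concatenate files into one object while recording byte offsets, then serve an individual file with a ranged GET.

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-example-bucket", "repos/example.pack"  # hypothetical

    # Pack: concatenate file contents and remember each file's byte range.
    files = {"README.md": b"hello\n", "src/main.py": b"print('hi')\n"}
    blob, index, offset = b"", {}, 0
    for name, data in files.items():
        index[name] = (offset, offset + len(data) - 1)
        blob += data
        offset += len(data)
    s3.put_object(Bucket=bucket, Key=key, Body=blob)

    # Read one file back with an HTTP range request on the packed object.
    start, end = index["src/main.py"]
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    print(resp["Body"].read())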

jopsen 4 years ago

One of the biggest pains is that cloud services rarely mention what they don't do.

I think it's really sad, because when I don't see docs clearly stating the limits, I assume the worst and avoid the service.

gfd 4 years ago

Does anyone have recommendations on how to compress the data (gzip or parquet)?

zitterbewegung 4 years ago

I was at a presentation where HERE technologies told us that they went from being on the top ten (or top five) S3 users (by data stored) to getting off of that list. This was seen as a big deal obviously.

solatic 4 years ago

TL;DR: Object stores are not databases. Don't treat them like one.

  • throwaway984393 4 years ago

    Try telling that to developers; they love using S3 as both a database and a filesystem. It's gotten to the point where we need a training for new devs to tell them what not to do in the cloud.

    • mst 4 years ago

      Honestly a Frequently Delivered Answers training for new developers is probably one of the best things you can include in onboarding.

      Every environment has its footguns, after all.

    • hinkley 4 years ago

      Communicating through the filesystem is one of the Classic Blunders.

      It doesn't come up as often anymore since we generally have so many options at our fingertips, but when push comes to shove you will still discover this idea rattling around in people's skulls.

      • ijlx 4 years ago

        Classic Blunders:

        1. Never get involved in a land war in Asia

        2. Never go in against a Sicilian when death is on the line

        3. Never communicate through the filesystem

    • solatic 4 years ago

      You can either train them with a calm tutorial or you can train them with angry billing alerts and shared-pain ex-post-facto muckraking.

      I, for one, prefer the calm way.

    • Quarrelsome 4 years ago

      do you know if such sources exist publicly? I would be most interested in perusing recommended material on the subject.

  • wooptoo 4 years ago

    They're also _not_ classic hierarchical filesystems, but k-v stores with extras.

hughrr 4 years ago

For every $100k bill there’s a hundred of us with 14TB that costs SFA to roll with.

harshaw 4 years ago

AWS Budgets is a tool for cost containment (among other external services).

gnutrino 4 years ago

Lol this post hits close to home.

gtirloni 4 years ago

A "TLDR" that is not.
