Exploring performance differences between Amazon Aurora and vanilla MySQL

plaid.com

137 points by bjacokes 5 years ago · 20 comments

encoderer 5 years ago

This is a wonderful article. I recently discovered an Aurora gem that saved me from some flaky mysqldump runs: you can save the results directly to S3 with a “select into outfile s3” query. According to the docs, this is an optimized operation when run from a read replica (rough example below).

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
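
Roughly the shape of it, in case it saves someone a docs dive. The bucket, prefix, and table here are made up, and the cluster also needs an IAM role that's allowed to write to the bucket (see the linked docs):

  -- Export query results straight from an Aurora MySQL reader to S3.
  -- 's3://my-example-bucket/exports/events' and `events` are hypothetical.
  SELECT *
  FROM events
  WHERE created_at >= '2020-01-01'
  INTO OUTFILE S3 's3://my-example-bucket/exports/events'
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';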

roopawl 5 years ago

Every once in a while there is a well-written blog post about database internals. Uber's Postgres-MySQL switch saga produced a few of them. This one is pretty good too.

  • jeandenis 5 years ago

    We worked closely with AWS on this (problem and blog) and they were great and quite transparent. Glad it's interesting/useful to you.

slownews45 5 years ago

The simplest fix is probably read committed, especially if, like many ETL jobs, you're just going to grab everything in one read for further processing. Another option: use read committed and omit the last 15 minutes of data on long-running jobs, to avoid churn at the end of the tables / logs (rough sketch below).

I see folks doing serializable reads for historic ETL jobs with one read in the transaction - why? Is there some history / tool issue I'm not familiar with?
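
A rough sketch of that pattern (the table and column names are made up):

  -- READ COMMITTED takes a fresh snapshot per statement rather than per
  -- transaction, which is plenty when the ETL does a single big read.
  SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

  -- Skip the most recent 15 minutes to avoid churn at the tail of the table.
  SELECT *
  FROM events
  WHERE created_at < NOW() - INTERVAL 15 MINUTE;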

  • bjacokesOP 5 years ago

    For Aurora MySQL, the default for read-only replicas is repeatable read. As we mentioned towards the end of the post, read committed support appears to have been introduced to Aurora MySQL just last year. But you're right – now that it's supported, switching to read committed is by far the easiest fix.

    No idea why people would be using serializable reads for ETL jobs though! :O
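
    For the Aurora replica case above, my understanding (worth checking against the current AWS docs for your engine version) is that you opt in per session with a parameter along these lines:

      -- Sketch only: enable READ COMMITTED on an Aurora MySQL reader.
      -- The variable name is from the AWS docs as I recall them; verify it.
      SET SESSION aurora_read_replica_read_committed = ON;
      SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;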

    • slownews45 5 years ago

      My own guess was that some ETL jobs were really data-integrity jobs, in which case folks may have gotten used to higher isolation levels being necessary across many reads to avoid false positives on their cross-checks.

whs 5 years ago

We had a similar problem where a running ETL job caused a production outage due to binlog pressure.

One thing that surprised us was that our TAM said that on a single-AZ, write-heavy workload, normal MySQL would have higher performance, since Aurora synchronously writes to storage servers in other AZs. For an immediate read-after-write workload, that would mean it takes longer to acquire locks.

  • frakkingcylons 5 years ago

    > One thing that surprised us was that our TAM said that on a single-AZ, write-heavy workload, normal MySQL would have higher performance, since Aurora synchronously writes to storage servers in other AZs

    What is surprising about a multi-AZ database having higher latency than one that runs in only one AZ?

    • bjacokesOP 5 years ago

      From what I can tell, they provisioned their DB instance(s) in a single AZ, but weren't aware that Aurora automatically provisions its own storage and always uses multiple AZs. We touch on the separation of compute and storage in the post.

      I think the surprise is that it's not possible to have a truly "single AZ" Aurora database, even though you might have thought you provisioned your DB instances that way.

      • frakkingcylons 5 years ago

        I see. I haven’t used Aurora, but I have experience running write-heavy workloads on RDS. EBS failures would regularly (like monthly) cause our write latency to spike 3-5x. If Aurora’s storage layer architecture is more resilient to those types of problems, that seems like a huge win.

      • goeiedaggoeie 5 years ago

        Hopefully it's not a surprise if you are using Aurora. The papers on the topic are very clear about how they scale the storage.

  • bjacokesOP 5 years ago

    This seems plausible given our understanding of the database internals. In general we found our AWS contacts to be knowledgeable and forthcoming about complex tradeoffs between Aurora and vanilla MySQL, even if some of that information is hard or impossible to find in the docs.

georgewfraser 5 years ago

I wonder why Aurora shares undo logs between replicas? It’s perfectly possible for the read replicas to each re-create their own copy of the undo logs, and retain those undo logs for different durations based on the different long-running queries on each replica.

user3939382 5 years ago

It seems like the benefits of cloud infrastructure have normalized vendor lock-in.

I’ve never used Aurora because I don’t want to code anything to the idiosyncrasies of AWS (or any other cloud provider).

  • sofixa 5 years ago

    Aurora has compatibility layers, and you interact with it as you would with normal MySQL, MariaDB, or PgSQL. Of course there are some underlying differences, but the code and most of the tooling stay the same.

exabrial 5 years ago

One huge difference we discovered is in locking. Do not expect Aurora to do you any kind of favors if you actually use these features.

shepardrtc 5 years ago

Really great article! I have a question: in it you say to keep an eye on RollbackSegmentHistoryListLength, and I want to do that, but I don't know at what number it becomes something to worry about. There doesn't seem to be any guidance on AWS's site. I'm seeing ranges of 1,000 to 5,000 and sometimes 100,000.

  • bjacokesOP 5 years ago

    Great question, although I'm not sure there's a concrete answer to it other than "it depends". You can think of that metric as representing the number of logs that haven't been garbage collected, so as it goes up, performance will get worse.

    If you're seeing spikes in RollbackSegmentHistoryListLength that coincide with dips in DB performance, you've probably identified the culprit. In the scenario described in our post, that metric would have grown monotonically for the duration of the long-lived ETL query – probably a more overt problem than what you're describing with short spikes to 100,000.
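
    If you'd rather spot-check it from the database than from CloudWatch, the underlying InnoDB counter should be visible with something like the query below (assuming the standard information_schema table and that the counter is enabled on your instance):

      -- InnoDB's history list length: undo log records not yet purged.
      SELECT NAME, `COUNT`
      FROM information_schema.INNODB_METRICS
      WHERE NAME = 'trx_rseg_history_len';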

    • shepardrtc 5 years ago

      A number of our 100k spikes spanned about a day, and a cluster of them seemed to coincide with serious performance issues we encountered. We "solved" the problem by increasing the instance size, but I'm starting to see spikes that get larger and larger, so I suspect we'll run into this issue again. But now I have something to report on and watch out for. Thank you!

wooly_bully 5 years ago

H3 tags on this could really use a bump in size and contrast from the regular text.
