Managing your Amazon Redshift performance: How Plaid uses Periscope Data

blog.plaid.com

80 points by whockey 7 years ago · 9 comments

scapecast 7 years ago

Lars here, the guy who gets the honorable mention at the end of the post "for brainstorming Redshift performance" with Austin (the author of the post) :-)

If you care to dig a little deeper into the things we discussed, we've written them up in a longer blog post:

https://www.intermix.io/blog/top-14-performance-tuning-techn...

dapearce 7 years ago

Great post. Check out dbt (https://www.getdbt.com/) for materializing your views. It has lots of great features and a great community.

nickolas_t 7 years ago

I just wish Plaid would change the color for TD Canada Trust in Canada to the color that resembles the logo.

TD Logo: https://i.gyazo.com/aa0d5f97954bd497b2f8b3a515752b34.png

Plaid's iframe for TD: https://i.gyazo.com/41fd14755c25d83642359a054f9525d6.png

Users have complained about the mismatch. I've tried contacting Plaid, but it hasn't gone anywhere.

evtan 7 years ago

Query structures have a huge impact on performance. This problem can be managed by scheduling SQL-based ETL for your data warehouse. We abstracted this into a simple feature on Holistics; you can take a look at how the guys at Rezdy use it: https://medium.com/rezdy-engineering/an-introduction-to-data... Data Transforms SQL scheduler: https://www.holistics.io/features/data-transforms/

georgewfraser 7 years ago

I would be interested to know what their monthly Redshift bill is. The work they’ve done is really impressive, I’m just wondering if the cost savings justify all the time they’ve invested. Sometimes the right answer in these situations is just to throw more CPUs at the problem.

  • teej 7 years ago

    The problems they solved here are vanilla optimization for Redshift. Adding sort/dist keys on tables and pre-aggregating immutable data is stuff you’re going to have to do at some point; throwing more CPU at it can only help so much.

  • groestl 7 years ago

    I'd like to know the actual footprint of their data. They mention some of their tables have "infinite rows", yet from their screenshots, the largest query is on "link_web_production.exit_link", scanning 3.9 million rows.

  • maslam 7 years ago

    Or Snowflake

    • dapearce 7 years ago

      Yep, we had similar Redshift issues and ended up switching to Snowflake.
