Managing your Amazon Redshift performance: How Plaid uses Periscope Data
blog.plaid.com
Lars here, the guy who gets the honorable mention at the end of the post "for brainstorming Redshift performance" with Austin (the author of the post) :-)
If you care to dig a little deeper into the things we discussed, we've written them up in a longer blog post:
https://www.intermix.io/blog/top-14-performance-tuning-techn...
Great post. Check out dbt (https://www.getdbt.com/) for materializing your views; it has lots of great features and a great community.
I just wish Plaid would change the color for TD Canada Trust in Canada to one that matches the TD logo.
TD Logo: https://i.gyazo.com/aa0d5f97954bd497b2f8b3a515752b34.png
Plaid's iframe for TD: https://i.gyazo.com/41fd14755c25d83642359a054f9525d6.png
Users have complained about the mismatch. I've tried contacting Plaid, but it hasn't gone anywhere.
Query structure has a huge impact on performance; this problem can be managed by scheduling SQL-based ETL for your data warehouse. We abstracted this into a simple feature in Holistics. You can see how the team at Rezdy uses it here: https://medium.com/rezdy-engineering/an-introduction-to-data...
Data Transforms SQL scheduler: https://www.holistics.io/features/data-transforms/
I would be interested to know what their monthly Redshift bill is. The work they’ve done is really impressive, I’m just wondering if the cost savings justify all the time they’ve invested. Sometimes the right answer in these situations is just to throw more CPUs at the problem.
The problems they solved here are vanilla Redshift optimizations. Adding sort/dist keys to tables and pre-aggregating immutable data is stuff you're going to have to do at some point; throwing more CPU at it can only help so much.
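To make the "pre-aggregating immutable data" point concrete, here is a minimal Python sketch, not Plaid's actual pipeline. It assumes raw rows are already partitioned by day (roughly what a date sort key buys you in Redshift) and that any day before `today` is closed and never changes. Each closed day is rolled up exactly once; later queries read the stored rollup and only scan raw rows for the current day. The names `refresh_rollups`, `totals_for`, and the `bank_*` labels are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: rows partitioned by day, each row is (institution, count).
RawEvents = dict[date, list[tuple[str, int]]]
# Precomputed per-day totals: day -> {institution: total}.
Rollups = dict[date, dict[str, int]]


def aggregate(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Sum counts per institution for one day's raw rows."""
    totals: dict[str, int] = defaultdict(int)
    for institution, count in rows:
        totals[institution] += count
    return dict(totals)


def refresh_rollups(raw: RawEvents, rollups: Rollups, today: date) -> Rollups:
    """Roll up each closed day exactly once (assumption: days before `today` are immutable)."""
    for day, rows in raw.items():
        if day < today and day not in rollups:
            rollups[day] = aggregate(rows)
    return rollups


def totals_for(raw: RawEvents, rollups: Rollups, today: date) -> Rollups:
    """Serve closed days from the rollups; only today's still-changing rows are scanned."""
    result = {day: dict(totals) for day, totals in rollups.items()}
    result[today] = aggregate(raw.get(today, []))
    return result


if __name__ == "__main__":
    raw: RawEvents = {
        date(2018, 1, 1): [("bank_a", 2), ("bank_b", 1), ("bank_a", 3)],
        date(2018, 1, 2): [("bank_b", 4)],
        date(2018, 1, 3): [("bank_a", 1)],
    }
    rollups = refresh_rollups(raw, {}, today=date(2018, 1, 3))
    print(totals_for(raw, rollups, today=date(2018, 1, 3)))
```

In a warehouse, the same idea is usually a scheduled job that appends new closed days to a summary table. The design choice is that immutability is what makes the cached totals safe to reuse without ever rescanning old rows.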
I'd like to know the actual footprint of their data. They mention that some of their tables have "infinite rows", yet in their screenshots the largest query is on "link_web_production.exit_link", scanning 3.9 million rows.
Or Snowflake.
Yep, we had similar Redshift issues and ended up switching to Snowflake.