Why is Snowflake so expensive?

Stas Sajin

I was recently reviewing Snowflake’s quarterly financial statements and came to the realization that their product is so well loved that they could stop adding new customers and still post double-digit revenue growth. This chart from Clouded Judgement really highlights how sticky the product is: a customer that joined a year ago and spent $1 is paying well over $1.70 a year later. Snowflake doesn’t yet have an expansive catalog of products to upsell or cross-sell, so the majority of this new revenue is driven by increased usage, low churn, and because … Snowflake’s pricing model doesn’t scale.

Clouded Judgement: this figure shows that Snowflake has very high net revenue retention, indicating that once you start using the platform, you keep expanding your usage.

Imagine the outcome of this conversation between a performance engineer working at Snowflake and RevOps management.

  • Engineer: “I figured a way to improve query performance by 20%.”
  • RevOps: “Great, we have to carefully weigh the decision to ship because it impacts our bottom line.”

Snowflake’s incentives to push on performance optimizations are diametrically opposed to its revenue targets. This doesn’t mean that Snowflake does not prioritize performance improvements. Since the first version of this article, people have mentioned that Snowflake recently made a performance improvement that led to a drop in revenue. Moreover, to Snowflake’s credit, I think the pay-as-you-go billing is itself a cost saver, since I don’t end up paying for idle time. Nonetheless, I expect that as Snowflake captures a majority of the market, in a typical Innovator’s Dilemma, it will prioritize other things, producing an ever larger menu of compute options that most organizations will not be able to use well because of unfavorable cost/performance ratios.

This is the natural course of a SaaS business model, and many other companies have their own non-scalable pricing structures. For example, what person at Google thought it was a good idea to let BigQuery perform a full table scan when running this query?

-- most databases read only 10 rows of data when running this query, while BigQuery will scan the full table
select * from table limit 10

The choice to scan all the data might seem dumb, but it’s not when you realize that they charge per GB of data scanned, and it’s in their interest to leave optimization gremlins in.
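
To see how the pricing model interacts with query shape, here is a minimal sketch (the project, dataset, and column names are hypothetical). BigQuery’s on-demand pricing bills for the bytes in the columns you reference, so narrowing the projection cuts the bill even though the LIMIT stays the same:

-- scans and bills for every column in the table, despite the limit
select * from my_project.my_dataset.events limit 10;

-- bills only for the bytes of the single referenced column
select user_id from my_project.my_dataset.events limit 10;

You can verify the difference without spending anything by checking the bytes-processed estimate BigQuery reports in a dry run before executing the query.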

What could Snowflake do better?

As an investor, I expect Snowflake to show amazing profitability and record-breaking revenue numbers. As an engineer, if Snowflake continues on its current path of ignoring performance, I expect it to lose share to the open-source community or some other competitor, eventually walking down the path of Oracle and Teradata. Here are a few things I think they can do to stay relevant in five years.

Disclose Hardware Specs

Snowflake charges you based on your consumption. You’re not buying any specific hardware; instead, you pay as you go using virtual warehouse credits that have no hardware definitions. For folks with technical or engineering backgrounds, this is a red flag. Whether your query runs on a machine with an SSD or a hard drive, low or high RAM, a slow or fast CPU, high or low network bandwidth makes a measurable impact on performance. Snowflake is very secretive about its hardware, and when I interacted with its sales team during a migration from Redshift, I could never get any SLAs on query performance nor information on the hardware specs. This is distinctly different from Redshift, Firebolt, and Databricks, which are very transparent and provide more flexibility in customizing your performance through hardware.

Lack of transparency can also lead to bad incentives, where Snowflake could quietly move workloads to less capable machines due to internal pressure to improve its margins.

Not adopting benchmarks

Several months ago, Databricks published a study highlighting that they outperform Snowflake on the TPC-DS benchmark, to which Snowflake posted a rebuttal. Snowflake’s statement on benchmarks was very clear:

In the same way that we had clarity about many things we wanted to do, we also had conviction about what we didn’t want to do. One such thing was engaging in benchmarking wars and making competitive performance claims divorced from real-world experiences. This practice is simply inconsistent with our core value of putting customers first.

This is a very shortsighted statement that contradicts both why data warehouses became so popular and what customers care about. Data warehouses became popular precisely because they delivered on performance promises: you no longer have to wait days or weeks for reporting to be done. A 100x improvement in performance didn’t just speed up existing workflows; it allowed practitioners to invent new ways of interacting with the warehouse. Benchmarks are not ideal: they can be gamed, they can be divorced from reality, and they can suffer from reproducibility and generalizability problems. They are nonetheless starting points for driving discussions around performance. If I pick a technology that will cost my employer north of $1M/year, I will always insist on seeing some numbers. I hope Snowflake reverses its stance and starts publishing benchmarks as part of its release notes. An industry that will collectively be worth north of $1T desperately needs closure on benchmarking.

Optimizer gremlins

[Disclaimer, 2022–09–11] I’ve received some replies that my example is inaccurate. I’m willing to admit my mistake, and I encourage folks to experiment on their own. Nonetheless, my claim was based on Snowflake’s documentation, personal experience with this problem a year ago, and at least one other person who hit this issue with external tables. I recently tried reproducing the problem with subquery predicates and saw no issues, so the example below is inaccurate. The issue with import CTEs is nonetheless still present.

If you work long enough with Snowflake you’ll find several gremlins that lead to vast performance degradation. I’ll give you an example that I experienced.

Micro-partition pruning is disabled when the predicate comes from a sub-query, even if the sub-query returns a constant. A query like the example below will not benefit from fast pruning, even if you partition on created_hour, yet queries like this are all over the place if you build incremental models or use import CTEs.

-- this query will perform a full table scan even if you partition
-- on created_hour
select
  count(*)
from table1
where
  created_hour > (select max(hour) from table2)

If you use dbt, you should be especially mindful of this issue, because this query pattern is very common and represents north of 50% of query costs. I almost never see these types of gremlins in open-source databases (if they exist, they get fixed), yet they are ever present in enterprise data warehouses. This only makes me wonder whether there is a performance optimization team that has a voice within Snowflake.
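
One common workaround, sketched below under the assumption that table1 is clustered on created_hour (the variable name is mine), is to resolve the scalar sub-query into a session variable first, so the outer query filters on a constant the pruner can use:

-- resolve the scalar value up front; Snowflake session variables can be
-- set from a query expression
set max_hour = (select max(hour) from table2);

-- the predicate is now a constant, so micro-partition pruning can kick in
select
  count(*)
from table1
where
  created_hour > $max_hour;

In dbt, a similar effect can be had by running the lookup as a separate statement (for example via run_query in a macro) and templating the resulting literal into the incremental predicate.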

Improve the workload manager to increase throughput

The query workload manager in Snowflake is inefficient. Imagine you have one warehouse and two users. One user submits a compute-intensive query while another submits a small query that scans very little data. Snowflake’s workload management appears to allocate all its resources in a FIFO (first in, first out) pattern. I really hope this is not the case, but after staring at query queues dozens of times, I think it is primarily FIFO. If the large query runs first, the second user is left hanging in a queue, massively decreasing throughput. A better option would be to pause the big query and reallocate its resources to the smaller one, or to split resource consumption so that 80% of capacity goes to the big query and 20% to the small one. Snowflake recommends two other solutions instead:

  • Scale your warehouse out or up, so the second user does not suffer huge queue delays. This forces more spending (a sketch of this configuration follows the list).
  • Or use the newly announced query accelerator, which lets you scale on a per-query basis. This again forces more spending if you don’t have good data governance in place.
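
For reference, here is roughly what the first recommendation looks like in practice (the warehouse name is hypothetical, and multi-cluster settings require the Enterprise edition or higher):

-- a multi-cluster warehouse that adds clusters only when queries queue up
create warehouse if not exists analytics_wh
  warehouse_size = 'MEDIUM'
  min_cluster_count = 1
  max_cluster_count = 3
  scaling_policy = 'ECONOMY'  -- prefer some queueing over spinning up clusters
  auto_suspend = 60
  auto_resume = true;

-- cap how long a statement may sit in the queue before it is cancelled
alter warehouse analytics_wh set statement_queued_timeout_in_seconds = 300;

It works, but notice that every knob here resolves contention by buying more compute or dropping work, not by scheduling it more intelligently.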

From my days of administering Redshift, I can say that Redshift’s workload manager was a lot smarter at preventing massively inefficient queries and users from bringing the system down and forcing you to scale out. In an era where ML is so advanced and we have so much metadata on queries and tables, it’s surprising that Snowflake has not built something that manages query workloads intelligently. If I can predict that a query will take a while to run based on the number of columns required, the joins present, and the size of the tables involved, why can’t a workload manager do that with ML? I imagine this idea has floated around internally within Snowflake, but because it is not a revenue driver, it was not on anyone’s roadmap.

Not providing observability to monitor and reduce costs

Snowflake provides very limited tooling for narrowing down your biggest cost drivers or for setting per-role or per-user query timeouts and budget constraints. For example, I would like to do the following:

  • Have a list of users/systems that are responsible for most of my costs. Am I dealing with users that have little knowledge of SQL, perform cross joins, and leave their queries to timeout? Do I have ETL jobs that folks forgot existed, yet they capture a large share of my costs? This level of cost attribution is not possible to do today unless you start performing query tagging and build tools on top of Snowflake. You only know the total cost of a warehouse, which requires you to split workloads into multiple warehouses to get a more granular attribution.
  • Have the ability to specify per-user or per-role timeouts and budget constraints. Today this is still limited to a per-warehouse basis, limiting your ability to control costs. For example, I would like to set a budget of $10k for a team and let them know that that’s all they get and it’s up to them to manage the cost efficiently.
  • Have tight integration with PagerDuty/Slack/email, alerting users that they are going over budget or helping them understand that their query patterns are not sustainable.

I think Snowflake will likely add better observability, or it will be created through custom tooling; nonetheless, today those features don’t exist, and many companies find their Snowflake bills unexpectedly large. See a longer list of features in this Twitter thread.
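
As a sketch of what that custom tooling tends to look like today: the query below uses the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, the user, warehouse, and monitor names are hypothetical, elapsed time is only a rough proxy for cost (Snowflake bills per warehouse, not per query), and resource monitors typically require the ACCOUNTADMIN role.

-- rank users by execution time over the last 30 days as a rough proxy for
-- who drives warehouse costs
select
  user_name,
  warehouse_name,
  count(*) as query_count,
  sum(total_elapsed_time) / 1000 / 3600 as elapsed_hours
from snowflake.account_usage.query_history
where start_time > dateadd('day', -30, current_timestamp())
group by 1, 2
order by elapsed_hours desc
limit 20;

-- the closest per-user guardrail available today: a statement timeout on the
-- user, plus a credit quota on the warehouse through a resource monitor
alter user analyst_jane set statement_timeout_in_seconds = 600;

create resource monitor team_monthly_budget
  with credit_quota = 100
  triggers on 90 percent do notify
           on 100 percent do suspend;

alter warehouse analytics_wh set resource_monitor = team_monthly_budget;

Even this is coarser than what the list above asks for: the resource monitor suspends the whole warehouse rather than the team or user that blew the budget.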

What could companies that use Snowflake do better?

Snowflake, like any SaaS or IaaS, is often expensive because it’s a tool that gets misused. Although I think Snowflake deserves a share of the blame for how its performance and pricing scale, some of the blame lies with the consumer.

Not negotiating

We have 5–6 very good open-source data warehouse alternatives, and we have Redshift, Databricks, Firebolt, BigQuery, and likely a few other enterprise offerings. Yet it is surprising how little training most companies have in negotiating and re-negotiating vendor contracts or in pushing for heavily discounted pricing. You should not pay the same rate as your usage increases.

Lack of education

On the topic of misuse, if you have the right monitoring in place on your infrastructure, you’ll generally find an interesting pattern where 5% of your internal users drive 95% of your costs. Let me give you some examples:

  • Almost every person I know learned SQL on the job and made their fair share of mistakes, including writing queries with cross joins, not filtering on the right predicates, selecting all columns, or pulling massive amounts of data into R or Python because they prefer those languages. Only through a trial-by-fire process do users learn how to work more efficiently. Today I would rather pay $25, ask users to spend a few days on the online SQL classes from DataCamp, and have them learn the craft efficiently from the start.
  • When my previous company used Looker, I noticed that some folks would keep adding dimensions and measures, on the order of 100+, without reflecting on the number of joins happening in the background. Then they would run a query that would inevitably hit our internal timeouts. This led us to change our Looker policies so that creators of dashboards had to belong to user groups that knew SQL. A BI tool is not a replacement for education unless you’re a passive consumer of dashboards.

Generally speaking, finding that 5% of users and simply letting them know that what they are doing is not sustainable can make a meaningful impact on costs. For example, you could set up a Slack bot that periodically alerts users who consume a lot of resources, or simply provide teams with Slack reports on cost attribution. Or, if you have a large organization, you could dedicate a team of 2–3 people to optimizing overall infrastructure costs using a Cloud FinOps framework.

Should you still use Snowflake?

If you’re a small company with a budget of less than $500k/year for warehouse costs, I think you don’t have much choice but to use an enterprise tool. The number of features a tool like Snowflake offers, its usability, and the opportunity cost of not using data to drive better product development push you into a corner where you have to go with a buy decision. If your yearly costs are above the $500k mark, it’s useful to consider the benefits of an off-ramp.

I expect the ecosystem to change over the next five years, with both big and small companies going open-source. We have DuckDB, ClickHouse, Pinot, Trino, Dremio, Druid, Iceberg, Doris, StarRocks, Delta Lake, and likely a few other open-source alternatives already in use at companies large and small. The challenge is that some of these tools have not yet reached a point of high popularity or ease of developer experience. As things consolidate and optionality narrows, and as more data engineers become familiar with a core set of tools, it will be a lot cheaper to go with open-source. In the meantime, for Snowflake to stay relevant, it needs to ensure that its performance engineering team is completely independent of the teams responsible for revenue, so it can truly reinvent itself.

Disclaimer: The opinions expressed within this article are solely the author’s and do not reflect the opinions and beliefs of Medium, the author’s employer, or any other affiliates.