PlanetScale increases plans to include billions of reads
planetscale.com
Worth noting "rows read" is not "rows returned". If you do 1,000 full table scans on a table with a million rows, you've got a billion reads.
https://docs.planetscale.com/concepts/billing#understanding-...
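A quick sketch of the difference (table and data are hypothetical): a lookup on an unindexed column can return one row while billing you for a scan of the whole table.

    -- Hypothetical table with ~1M rows and no index on `email`.
    CREATE TABLE users (
      id    BIGINT PRIMARY KEY AUTO_INCREMENT,
      email VARCHAR(255)
    );

    -- Returns at most a few rows, but with no index on `email` the
    -- planner scans every row: ~1,000,000 rows read, ~1 row returned.
    EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
    -- type: ALL, key: NULL, rows: ~1000000  -> full table scan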
I still don't see how this pricing model is usable in a database with a query planner. The user is not in control of the query plan; the database is. This is a recipe for disaster, and I'd never feel comfortable with it. Even with a good understanding of how the database works, there is pretty much no way to ensure that the database doesn't do some surprising full table scans.
Many young founders won't have the experience or the patience to really understand this, and in practice most won't have the scale to really feel the pain anyway. The ones that do will be worth so much they can negotiate custom contracts. Such is the market for pickaxes created by easy VC money.
My big problem with SQL is that I want to do the query planning. Sure, the database can often do a better job -- but I want to be able to guarantee upper bounds on things, damn it! I've seen so many weird edge-case performance problems that boiled down to poor query plans. Things will be fast most of the time and then, suddenly, the database will look at its stats and decide to do a full table scan, and then everything goes straight to hell and stays there for too many milliseconds.
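MySQL does offer one blunt upper bound, though it caps time rather than rows read; a sketch with a hypothetical table:

    -- Abort the statement once it exceeds 500 ms (MySQL 5.7+;
    -- applies to SELECT statements only).
    SELECT /*+ MAX_EXECUTION_TIME(500) */ *
    FROM orders
    WHERE customer_id = 42;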
If I ever cross paths with a database, I'll spit on its shadow.
MySQL has index hints that let you force query plans: https://mariadb.com/kb/en/index-hints-how-to-force-query-pla...
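For example (table and index names are made up):

    -- Force a specific index instead of trusting table statistics.
    SELECT * FROM orders FORCE INDEX (idx_customer)
    WHERE customer_id = 42;

    -- Or exclude an index that keeps misleading the optimizer.
    SELECT * FROM orders IGNORE INDEX (idx_created_at)
    WHERE customer_id = 42 AND created_at > '2022-01-01';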
True. Even with tuned indices, table scans are pretty frequent, and sometimes you'd want a slower but cheaper query. It'd be quite stressful to have to worry about which query plans get executed, because they can vary quite a bit across very similar queries and can be nudged toward more expensive plans by development teams.
If your application allows somewhat flexible queries, this might also create potential Denial of Money attacks, which I personally find scarier than plain old Denial of Service. I mean, that's always a risk with this type of pricing, but the particulars of a relational database with a query planner make this a lot more dangerous.
This is why Google Cloud Firestore/Datastore charges for rows returned, but the caveat is that you can't do stuff like aggregations, as the cost just doesn't work out.
This comment has to be higher up. This pricing is like a sword hanging over your neck that drops when you screw up or the SQL planner screws up.
DynamoDB is the same, it charges you for the number of reads/writes you do, so if you're doing full table scans on massive databases, you're going to have a bad time.
With the caveat that ddb extensively documents how you will get billed, down to the request size.
I'm all for the aws hate when it's deserved, but if you get screwed by it on billing, you didn't read.
There is a difference, though: you don't have a query language that makes expensive scans easy to write, and the technology itself pushes you toward a different, more correct choice.
First of all, I agree there are gotchas. We have roadmap items that will eliminate this problem in the medium term.
I do however disagree that this is any worse than idle RDS hosts sitting around when you have no traffic, costing you huge sums for a service that is basically `apt-get install mysql-server` on top of EC2.
> RDS hosts sitting around when you have no traffic, costing you huge sums for a service that is basically `apt-get install mysql-server` on top of EC2.
RDS gives you automatic backups, automatic failover with DNS, easy upgrades, IAM authentication, an API for manipulating your database instances, and more.
As CEO, it damages your company's credibility when you say it's just `apt-get install mysql-server`. Please do better.
Strongly agree. There is a huge amount of value in what RDS provides (over DIY EC2).
Fellow HN user quinnypig described it succinctly:
https://twitter.com/QuinnyPig/status/1173377290815721473
> RDS is a huge win not because of anything intrinsic to what the platform actually is, but because we collectively suck at setting up and managing replication, backups, etc.
One of the things about a $29/mo RDS instance, though, is that if you're doing full table scans over a million rows, it's going to grind to a crawl and immediately alert you (not explicitly, but via the performance hit) that you're doing something wrong with indexing. Effectively, it's a hard budget cap, and that's super useful for budget-conscious organizations and individuals.
Does PlanetScale have functionality to provide budget alerts? Does it have the ability to throttle calls that would otherwise require a budget increase to run without further optimization, which a CPU-capped RDS instance does de facto?
In other words, can I tell PlanetScale to cap my usage and throttle query speed if I exceed row scan limits, rather than blocking me entirely or charging my credit card more? If it doesn't yet have those capabilities, then I think it's fair to say it can easily be worse than an idle RDS host sitting around.
I have built systems for deploying both MySQL and Postgres setups with backups and replication and failover, and while it's simple enough to do, describing RDS as "apt-get install mysql-server" is a gross oversimplification.
I'd roll my own again for many reasons, not least because AWS is ridiculously overpriced, but if you're already using EC2 and so already paying the costs of being on AWS, I'd recommend people use RDS over rolling their own any day, unless they're very familiar with what proper redundancy and backup strategies entail.
A high but predictable cost is, risk-wise, completely different from an unpredictable cost.
Hopefully you are not running a company nor anything that involves risk taking.
he is the CEO lol
That is wasteful too, but deterministically wasteful. You know exactly how wasteful you are being and how much it will cost: no surprises.
After that comment, I hope you have a CTO who keeps you away from the engineering decisions.
This limitation prohibits the use of joins until a session parameter like 'set max_logical_reads to xxx' is available.
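Something like the following, in purely hypothetical syntax; no such variable exists in MySQL or PlanetScale today:

    -- Hypothetical guardrail: abort any statement in this session once
    -- it has read more than 1M rows, instead of running up the bill.
    SET SESSION max_logical_reads = 1000000;

    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- would error out past 1,000,000 rows read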
They're not alone in this approach. BigQuery and DynamoDB also meter usage based on the amount of data processed during a query.
BigQuery allows you to specify maximum billed bytes for a query to avoid situations like this. You can also purchase slots for fixed cost unlimited queries.
Authzed[0] also does something somewhat similar with SpiceDB[1], but charges based on query complexity (the levels of nesting in the graph traversal) rather than the actual number of rows affected. A flat query is easy to compute, thus cheap.
One difference with DynamoDB is that there's no query planner, so you can have a pretty good sense of how many items you'll hit and how big that read is.
Athena as well
We wrote a blog post comparing RDS vs PlanetScale from a pricing perspective back in January...kudos to PlanetScale on responding to feedback on that so quickly. HN discussion here: https://news.ycombinator.com/item?id=29910812
We'll get the blog post updated, but conceptually the "tipping point" of PlanetScale vs RDS just shifted a lot in PlanetScale's favor, at least from a read perspective - analysis is here: https://www.vantage.sh/blog/2022-01-12-rds-vs-planetscale-pr...
Well, this is some nice news! From what I've seen of my data usage, I was going to fall well within the free tier /before/ this change, and now I've got even more headroom. I've been switching over to PlanetScale from Aurora Serverless, and I'm really enjoying the experience so far; the savings are great too (1 instance costs about $40/mo; I have my dev/qa environments turn off after 5 minutes of no activity so they are almost free, but prod has to stay on 24/7; now I'll pay $30/mo and that can take care of all 3 environments).
This is a nice change. It looks like it used to be $15/mo per 100 million rows read and $15/mo per 10 million rows written. Now it's $1 per billion reads and $1.50 per 1 million writes. So the write pricing hasn't changed, but the read pricing has gone from $150 per billion to $1 per billion (a reduction of 99.3%). $1 would have gotten me 6.7M reads before the change and now gets a billion reads (150x more reads for your dollar). That's a huge pricing change!
I'm guessing the "difficult to understand and hard for you to predict" is around how the read pricing is based on how many rows MySQL needs to read in order to serve your query. That's going to depend on how the query optimizer executes your query (and what indexes you've made and such).
It does make me wonder if I'm allowed to create as many indexes as I want without incurring additional "writes". Their billing page makes it seem like indexes don't count: https://docs.planetscale.com/concepts/billing. In fact, their storage language makes it seem like indexes aren't charged for storage: "Data saved in the form of tables, columns, rows, and their corresponding relationships." I'm guessing that's just an oversight and the thinking is that "tables" covers indexes, but wouldn't "tables" also cover "columns, rows, and their corresponding relationships?" Given the expensive storage at $2.50/GB, how big things are seems to matter. Cockroach charges $1/GB (60% less), Google charges $0.34/GB for Cloud SQL HA storage, and $0.33/GB for Cloud Spanner storage (and 2-3x that if you're going multi-region). $2.50 is a big premium.
It still seems like there's an incentive to over-index rather than doing small filtering in-memory, though the incentive is now 99% smaller. Likewise, there seems to be no charge for doing something that requires sorting - or maybe they consider the sort to be a re-read of all the results? Looking over MySQL's explain analyze results, it looks like there shouldn't be a cost for reading through the index.
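For instance, whether a sort even shows up in the plan depends on the index (sketch with a hypothetical table):

    CREATE TABLE events (
      id         BIGINT PRIMARY KEY AUTO_INCREMENT,
      user_id    BIGINT,
      created_at DATETIME,
      KEY idx_user_created (user_id, created_at)
    );

    -- Rows come back already ordered via the composite index: no filesort.
    EXPLAIN SELECT user_id, created_at FROM events
    WHERE user_id = 42 ORDER BY created_at;

    -- Ordering by an uncovered expression adds "Using filesort" to Extra;
    -- whether that sorting pass counts as re-reading rows is the open question.
    EXPLAIN SELECT user_id, created_at FROM events
    WHERE user_id = 42 ORDER BY RAND();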
Sorry for the random thoughts. PlanetScale is a great project, offering serverless MySQL on top of Vitess. I wish Vitess existed for PostgreSQL (we use Postgres' GIN indexes with JSON, so it's not easy to move to MySQL).
Thank you for your thoughts. We are continuing to iterate. Pricing in the serverless world is a new craft and there is lots to improve on. We were very excited to see other companies follow suit with serverless pricing models after our launch.
A lot went into making this iteration on our pricing. We spoke to customers, and we reviewed all of our billing to make sure that this would save money for nearly all of our customers (the bill has increased for 2 customers because they have very unusual workloads; we are providing them discounts to mitigate).
One thing to mention is that our storage includes replication of the data. We never want customers to worry about how many copies to keep to achieve HA so we do that for them and that is represented in this price.
We are continuing to optimize with the customer in mind and I am sure there will be further iterations. Stay tuned!
GP had direct questions which you did not answer.
If that's true, it will be similar to Firestore when using limits and offsets. I really hate asymmetric pricing like this that forces developers to think in non-standard ways to reduce expenses. There is enough complexity as is in software development without more gotchas in billing.
You're almost always being charged for the resources you use; the gotcha is that they aren't free.
This is similar to the problem we face, where you're charging based on usage of something people don't usually count. For us, that's GraphQL requests.
While certainly big companies have monitoring for this kind of thing set up, we learned that a majority of engineering teams have absolutely no clue how many GraphQL requests they get per month. Like, not even a ballpark. Hundreds of thousands? Millions? Billions? No clue, could be any of those.
Our free plan was originally 5M free requests per month, which is relatively generous — but people didn't know and thus we had almost no users on the free plan. We recently changed it to just be "free for solo developers working on side projects / indie hacking / etc. no matter how many requests".[0]
So far, the change's been well received! Curious to see Planetscale dealing with the same general kind of issue, just on a different layer.
[0]: https://graphcdn.io/blog/unlimited-free-requests-for-develop...
It’s because it makes zero sense for me to care about how many requests I send when I have my own server.
The limit is basically 31 x 24 x 3600 x [req per second server can handle]
Even at a relatively low rate of 10rps that is 26M requests per month.
Now, letting a demo environment run at 2 rps for a whole month is pretty generous, but until I took the time to calculate it, it sounded like a pretty bad value proposition to me (even though I probably make less than 1000 requests per day).
Database reads/writes are really hard (read: impossible) to predict unless you are already in production. Leading to thoughts like: "1 Billion reads!! I'll never use that much..." Once you cross the line, the overages kick in.
That being said, this does appear to be absurdly cheap compared to competitors. Amazon Aurora appears to be sitting at around $200 a month for 1 billion reads, excluding writes/transfer/storage/etc.
CockroachDB Serverless includes 250M "request units" (request units include reads and writes, and individual requests can cost multiple units depending on size). They charge an extra $1 per month per 10M "request units", so at least $75 to get to 1B reads.
Am I missing something? What's the catch?
You're mixing up I/O-based pricing with rows-based pricing. Aurora is priced based on I/O operations, with 16KB pages: https://aws.amazon.com/rds/aurora/faqs/
If you're doing a range scan query and your rows are reasonably-sized, you can conceivably get tens of rows to maybe even a few hundred rows per single I/O operation. Or even 0 I/O operations if the pages are already in memory in the buffer pool cache.
PlanetScale prices based on rows. And scanning a few FAQs, I don't see anything about cached rows being free, but maybe I missed it.
I'm not really sure why a pricing change is HN-worthy, but I guess here I am biting:
> We’ve also heard your feedback about how our pricing is difficult to understand and hard for you to predict.
> Starting March 1st we’ll be offering our customers up to 200x more reads across all pricing plans.
Just giving more reads doesn't seem like it's actually simplifying pricing or making it more predictable?
It was a noteworthy change for me, as I was considering PS for a new project but was a bit wary of the read limit and unwittingly hitting it. Now, with the bigger numbers, I feel that I have less to worry about and could just dive in.
And since, as you mentioned, it's a popular complaint from the post, I figured others might be interested in this change as well.
One of the most common complaints was the read pricing [1], and now they have upped the bundled limit and set the cost thereafter at $1 per billion reads.
Yeah, but they didn't actually address the problem they stated customers had in their own blog post.
It feels like they’re just kicking the can down the road by increasing their numbers.
Yea, but it's great marketing. It fits a few use cases of mine, and I think they reeled me in.
The pricing page/docs leave so many questions unanswered:
- What's the cost of egress?
- What is a read/write exactly? Is it a DB "page" read/write? I know there's a section on this, but it doesn't explain details.
- If it's a page read/write, what is the size of the page? 16 KB?
- If it's a real row read/write, what is the maximum size? Can I write a 100 MB row for the same price?
- What about indexes, or merging the WAL log? Will I be charged for these operations (they can result in a million+ writes)?
- What about small consecutive writes that fit in a single 16 KB page: do I get charged a single write "unit"? RDS actually combines these into a single op (see IOPS with RDS).
- What about cached reads, do I get charged for those?
- What about computationally expensive queries that do not actually read/write that much?
Please answer these questions. Provide useful real-world pricing examples. This is standard stuff, and especially important if "transparent" pricing is a key feature.
You come off as quite demanding and expectant for apparently not having read their pricing page or billing docs, which specify very clearly that they're talking about row reads/writes.
They even go into examples with using `EXPLAIN`, `EXPLAIN ANALYZE`, `innodb_rows_read`, etc to see row counts.
Disagree, GP's expectations are perfectly reasonable
Their pricing page starts out by saying "Transparent pricing you can understand"! I shouldn't have to read a several-thousand-word FAQ and then still come away wondering if cached pages count as rows read.
Cached page reads are free on Aurora; this makes a huge difference in pricing if that isn't the case on PlanetScale.
I agree with you that there are some simple questions left unanswered, but "is it row reads?" is not one of them.
The linked blog post literally just says "reads". The word "rows" does not appear in the blog post! You have to click through, and yes, it's clear once you do, but I'd say it's a valid complaint about the blog post.
The comment by truetraveller is complaining about the pricing and docs, not the blog post:
> The pricing page/docs leaves so many questions unanswered:
The pricing page and docs make "rows" very clear. I was never referring to the blog post, nor was truetraveller.
They do not make it clear.
From the pricing page:
> How are rows read and rows written calculated?
> Every time a query retrieves a row from the database, it is counted as a row read. Every time a row is written to the database, it is counted as a row written.
From the billing docs:
> our paid Scaler plan comes with 25 GB storage, 100 billion row reads, and 50 million row writes
> Every row read from a table during execution adds to the rows read count, regardless of how many rows are returned.
> You can test a query for approximate rows read using the EXPLAIN statement. Running EXPLAIN with your query will return information about the query execution plan.
> To see the exact rows read, you will need to run the query. You can use the EXPLAIN ANALYZE statement to do this. It will return the estimated information about how it will run the query, run the query, and then return the actual impact from running the query.
> Another useful way to check rows read is using innodb_rows_read. This server status variable will show you the number of rows read across all tables. You can run it before and after queries to calculate how many rows were read.
These bits are extremely specific, down to the storage engine level. I don't know what more you could be looking for as to what "rows read" means than `innodb_rows_read`.
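E.g. a quick before/after check (the counter is server-wide, so do this on a quiet instance; table name is made up):

    SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';  -- note the baseline

    SELECT COUNT(*) FROM orders WHERE status = 'shipped';

    SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';  -- delta = rows read by the query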
> I don't know what more you could be looking for as to what "rows read" means than `innodb_rows_read`
On the contrary, this is completely unclear, because the MySQL manual doesn't give a clear explanation at all! It only says "The number of rows read from InnoDB tables." https://dev.mysql.com/doc/refman/8.0/en/server-status-variab...
Let's say my db has a primary and 3 replicas. I insert one row. Does planetscale count this as 1 row written? 4 rows written?
Let's say my rows are small and there are 200 per DB page. I do a single-row select by primary key. This reads the page (you physically can't read just one "row" at the I/O level). Is this 1 row read or 200?
My read query triggers an internal temp table. Is the number of rows read based on only on the real tables, or also add in "reads" from the temp table? What if the temp table spills over to disk for a filesort?
I alter a table with 100 million rows. Is this 100 million rows read + 100 million rows written? Or 0? Something else? While this is happening, do my ongoing writes to the table count double for the shadow table?
Does PlanetScale even know the answers to these questions, or are they just going by InnoDB status variables? Do they actually have any InnoDB developers on staff, or are they externalizing 100% of their actual "database" development to Oracle? (Despite their blog post using wording like "we continue to build the best database on the planet"??)
I'm not demanding. If "transparent" pricing is a key feature, then please at least provide detailed billing info/examples. From the billing page you linked, I still have many questions: If I have a row that's 100 MB, is that counted as one read? What about index creation, is that counted? Or merging the WAL log with the base, is that free? And what is a read "unit"? Is it different from a "row read"?
Your comment used to read:
> C'mon guys, this is super basic.
which came off as demanding, until you edited it after my response to instead read "This is standard stuff".
It's bad form to make substantial edits an hour later, after you've been replied to, especially if you then refute the reply based in part on that edit.
I agree with you that more examples would be helpful, and you have some good questions which are left unanswered, but "is it rows?" was answered very clearly by the pricing page and billing docs.
I edited it because I didn't want to be mean, and I thought that might have been mean. My intention was not to be less "demanding". In fact, I added two more questions to my list with my edit. So I became even more demanding!
>"is it rows?" was answered very clearly
In the world of DBs, "rows" is extremely vague. We need more info, which is the point of my post.
If there's no egress charge, PlanetScale would make for a pretty neat blob cache store (key -> blob queries) to front S3 for hot objects... 100B reads per month with S3 would cost $40K+ for just GET requests (discounting egress): https://archive.is/4HYH6
Maybe the FAQs will go into more detail.
They don't!
If PlanetScale just offered to reimburse any costs above the 99th percentile of normal operations it would be a great success. If their business model is dependent on people getting screwed by costs above the 99th percentile then they shouldn't be in business.
I was going to say I agree, but I don't see how that would work. How do you define normal operation?
Look, 200x more of something is 200x more, and 200x more for the same price is a huge win for a service's users.
In the DB space, though, pricing per row or IOP or this or that is tough. We're heavy users of BigQuery, and the pricing per bytes consumed is tough too, as you can't always rely on the estimator. But if you go the fully pre-paid route with something like Redshift, you have high upfront fixed costs for an always-on DB (that changed a bit with Redshift Serverless, currently in preview: https://aws.amazon.com/redshift/redshift-serverless/). But I mean, it's the same with BQ in that sense: don't run a query, don't get charged, except for stored data.
The point I am trying to make is that pricing of a DB is hard. If I had to choose I think I rather like the straight forward per second billing of serverless.
The answer really is do both kinds of pricing, like Cloudflare does with Workers.
Workers Bundled is priced for up to 50 ms of CPU time and unlimited egress. Workers Unbound is priced per-ms of 8xCPU or 1xIO time, whichever is higher.
I mean, pricing is hard, but at least read through these if nothing else: https://html.duckduckgo.com/html/search?q=inurl%3Ahbr.org%20...
Think of it, billions of `0000-00-00 00:00:00`s!
Can anyone comment on PlanetScale vs Supabase? I'm not their target, because I'm just a random individual that wants a free database for personal projects. I could try both but would be nice to hear about someone's experience.
From my understanding, Supabase is more of an "all-in-one backend", whereas PlanetScale is "just" a DB (MySQL vs the Postgres it looks like you get on Supabase). PlanetScale also has some pretty awesome tools for moving DB schema changes between environments (zero downtime), and I don't know what Supabase provides for something like this. If all you want is a DB, then give PS a try; I'm loving it so far. If you want a full backend "no code/low code" Firebase-like thing, then Supabase might be for you. Personally, I don't like going "all in" on something as important as "my whole backend"; I'm much happier just getting a DB from PS and then using AWS Lambda as my "backend".
{supabase ceo}
This is true, we do offer some extra functionality - but also I want to point out that the database we provide is "just" Postgres. You can use it as a standalone database without any of the other functionality. We give "postgres level access", so you can use it just like RDS or any other DBaaS
for example, we commonly see Prisma developers using our databases (because we also provide pgbouncer, a connection pooler).
Just curious, how are indexes priced?
I assume you'll pay $2.50/GB for storage, but if I update a row that touches 5 indexes, is this 1 write or 6?
Vitess is a popular choice: a battle-hardened, current-gen solution.
I'm keen for Spanner/CockroachDB!