Prediction market: We're unhappy with Firestore. What will we switch to?

manifold.markets

43 points by akrolsmir 4 years ago · 46 comments

jpgvm 4 years ago

Unless you need realtime, PostgreSQL is always the most obvious solution.

If you do need realtime you can build it on PostgreSQL yourself, depending on your requirements, either using LISTEN/NOTIFY or logical replication. There are tradeoffs to both, but tbh if you are asking this question you probably don't want to go down that path.
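
For a sense of what the LISTEN/NOTIFY path looks like, here is a minimal sketch using node-postgres (the channel, trigger, and table names are invented for illustration, not a production setup):

    import { Client } from "pg";

    // A dedicated connection for listening. NOTIFY payloads are small (~8KB cap),
    // so send ids over the channel and re-fetch rows rather than shipping documents.
    const listener = new Client({ connectionString: process.env.DATABASE_URL });

    async function main() {
      await listener.connect();

      // Hypothetical trigger: emit the new comment's id on a "new_comment" channel.
      await listener.query(`
        CREATE OR REPLACE FUNCTION notify_new_comment() RETURNS trigger AS $$
        BEGIN
          PERFORM pg_notify('new_comment', NEW.id::text);
          RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;

        DROP TRIGGER IF EXISTS comments_notify ON comments;
        CREATE TRIGGER comments_notify
          AFTER INSERT ON comments
          FOR EACH ROW EXECUTE FUNCTION notify_new_comment();
      `);

      await listener.query("LISTEN new_comment");
      listener.on("notification", (msg) => {
        console.log("new comment id:", msg.payload); // fan out to websockets, clients, etc.
      });
    }

    main().catch(console.error);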

For non-realtime use it's very easy to handle nested JSON in PostgreSQL, but I would still avoid it like the plague unless it's user-supplied data without any real schema.

You might feel like schema-less lets you "move faster" but it's a load of horseshit that really starts to stink much sooner than you might think.

Schemas, and by extension database integrity, make it easier to move faster: migrations let you ensure there are no edge conditions in stored data when you upgrade your code to an extended or otherwise modified data model.
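
As a tiny sketch of that point (table and column names invented): adding a field through a migration means every existing row is valid the moment the statement runs, so new code never has to special-case missing data.

    import { Client } from "pg";

    const db = new Client({ connectionString: process.env.DATABASE_URL });

    async function migrate() {
      await db.connect();
      // The NOT NULL default applies to every existing row, so there is no
      // separate backfill pass and no "field might be missing" branch in app code.
      await db.query(
        "ALTER TABLE users ADD COLUMN is_verified boolean NOT NULL DEFAULT false"
      );
      await db.end();
    }

    migrate().catch(console.error);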

The other main benefit of PostgreSQL is the sheer body of resources available; with the exception of the other major RDBMSs (MySQL/MSSQL), it completely dwarfs what is available for other data stores. You will rarely, if ever, encounter a problem someone hasn't already solved.

com2kid 4 years ago

> - Related: Firebase cloud functions have extremely slow cold-start times (>5 seconds is common), and deploying new versions can take minutes.

This has somehow gotten a lot worse. When I first started with Firebase Cloud Functions (which, to be clear, are amazing and simple to get up and running with compared to anything competitors offer), deploys took only a few seconds. Sadly it has gotten worse and worse as time goes on.

Still though, the paradigm around cloud functions is so simple compared to the nightmare that is AWS Lambdas. When trying to explain why cloud functions are better, AWS users just stare at me like I am talking in a foreign language about some sort of mythical land of make believe.

  • akrolsmirOP 4 years ago

    We've started on Firebase Cloud Functions, but lately I've been interested in moving our serverless stack to Vercel's functions instead - the developer experience is much better (no idea about cold-starts though). We'd lose the neat integration with Firebase Auth, though...

akrolsmirOP 4 years ago

More context: The main thing I'm unhappy with is the extra developer burden imposed by needing to denormalize information. E.g.: I have a user document in Firestore, with userId, name, and avatarUrl. If I want to be able to fetch a list of comments and have the name & avatarUrl of the creators, in Firestore I have to write those alongside userId. Then, if I later add isVerified to the user document, I need to either backfill my entire db and denormalize again, or handle the missing case client-side.

Then the other pain point is the "joins" use case; right now we do the equivalent of fetching all comments & users, then doing an in-memory join. Ideally, we could craft a single request that just says "get the 10 latest comments on this market, plus the associated avatars" without data duplication and without doing a bunch of up-front thinking about exactly how to structure indexes.
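
Roughly, a sketch of the current shape (field names simplified and hypothetical, not our actual code):

    // Denormalized comment document: creator fields are copied alongside userId,
    // so adding e.g. isVerified to users later means backfilling every comment.
    interface CommentDoc {
      id: string;
      marketId: string;
      text: string;
      userId: string;
      userName: string;      // duplicated from the user document
      userAvatarUrl: string; // duplicated from the user document
    }

    interface UserDoc {
      id: string;
      name: string;
      avatarUrl: string;
    }

    // The "join" today: fetch both collections, then stitch them together in memory.
    function joinCommentsWithUsers(comments: CommentDoc[], users: UserDoc[]) {
      const usersById = new Map(users.map((u) => [u.id, u] as const));
      return comments.map((c) => ({ ...c, user: usersById.get(c.userId) }));
    }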

My hesitation with relational DBs comes from the mismatch between the client data model (loosely, JSON objects of pointers) and how it's represented in the DB (in a row); plus the requirement of specifying e.g. your indices up front, and the annoyance of doing migrations. I'm hopeful someone's found a graph-type solution that works really well for them!

  • anaccountexists 4 years ago

    I mean, if you don’t want to denormalize your data, you’re going to need to think about indexes in some capacity regardless (this is true for non-relational DBs like Dynamo and Mongo too).

    "get the 10 latest comments on this market, plus the associated avatars" couldn’t be better suited to a relational DB. That’s a textbook use case that Postgres would be amazingly well suited to.

    Also: remember with Firestore that you're paying for redundancy and availability that's entirely Google-managed. Most DB offerings you work with on your own are significantly more hands-on as far as recovery / backups / replication go.

    Engineering time is usually more expensive than server costs when you’re a startup, so think about how much time it’d take to do it yourself before you decide to optimize your server costs over R&D costs.

    • akrolsmirOP 4 years ago

      Yeah - totally agreed re: eng time > server costs; the db costs are the least significant part of the equation.

      Fundamentally, I'm trying to optimize for something like "developer happiness as we build out lots of new features quickly". My dream workflow would look something like: take the TypeScript types we've defined on the client and shove them somewhere in the cloud; then later query to pull out exactly the data we need to render any particular view of our site (a la GraphQL).

      AND I'd really like to not have to spend a lot of up-front time knowing exactly which indices to set up, or to figure out complicated migrations later. And I'd like to not think about hosting/managing replications, etc. Maybe that's too many asks, and I'm being too greedy! I'm just hoping that someone's solved this pain point already, and I just haven't heard about it.

      • jacobmischka 4 years ago

        We do something kind of similar, only in reverse, with Prisma and PostgreSQL (models are defined in Prisma and TypeScript types are automatically generated for use in the client). It's been a pleasant developer experience, though we do not have any realtime data needs yet (we just do basic polling for the few parts that need reactivity based on database updates). I wasn't aware of the Supabase realtime PG project which was discussed here, so thanks for bringing that to our attention!
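
        As a rough sketch of that workflow (model and field names invented; Prisma generates the TypeScript types from the schema, so the query below is fully type-checked):

            import { PrismaClient } from "@prisma/client";

            const prisma = new PrismaClient();

            // The `comment` model and its `author` relation are defined in schema.prisma;
            // the generated client gives this query precise return types for free.
            async function latestComments(marketId: string) {
              return prisma.comment.findMany({
                where: { marketId },
                orderBy: { createdAt: "desc" },
                take: 10,
                include: { author: { select: { name: true, avatarUrl: true } } },
              });
            }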

        • burggraf 4 years ago

          Supabase developer here. If you're interested in realtime, be sure to mark your calendar for our upcoming launch week (we do this a few times a year). It's the week of March 28, and you'll definitely be very interested in some of the stuff we're launching.

      • marviel 4 years ago

        I don't understand the solution well myself, but you may be interested in looking into FaunaDB. IIUC, it works by defining a GraphQL API and the queries you want to use on it, and it will create the correct data structures behind the scenes to allow efficient queries of the form you've provided.

        It is really new, though, from all I can tell.

  • simulate-me 4 years ago

    Why not do something like:

    1) Fetch the list of comments

    2) Add a listener on the public user info for each comment poster

    3) Render the comments immediately. When the user info is available, re-render with the avatar information.

    The nice thing about this is that the avatar information will immediately update in real-time as soon as someone updates their avatar. Yes, with a KV-store you need to do more reads because you can't join data (though a join implicitly does reads too, btw), but it doesn't seem like that big of a deal to me. Immediately reflecting changes to the public user state seems nicer than the convenience of a join.
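
    A rough sketch of steps 2-3 with the modular web SDK (collection and field names assumed):

        import { initializeApp } from "firebase/app";
        import { doc, getFirestore, onSnapshot } from "firebase/firestore";

        const app = initializeApp({ /* your Firebase config */ });
        const db = getFirestore(app);

        // Render the comment immediately, then re-render whenever the public user
        // profile (including the avatar) arrives or changes.
        function watchCommentAuthor(userId: string, rerender: (user: unknown) => void) {
          return onSnapshot(doc(db, "users", userId), (snap) => {
            if (snap.exists()) rerender(snap.data());
          });
        }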

  • stickfigure 4 years ago

    I'm a longtime user of Google App Engine / Cloud Datastore / Firestore and wrote a Java ORM for that environment that has achieved some popularity. I really like the datastore for some applications, but there's a pretty good chance that it's a bad fit for you. While I'm only casually familiar with your problem domain, it seems at first glance like the kind of thing that would scale reasonably well in a traditional RDBMS. You would get a lot of value from joins and aggregations, and you don't really have zillions of elements changing all at once.

    I could probably give you some advice on how to use the datastore better (most of which would be along the lines of "don't denormalize, store foreign keys and use batch key fetches instead") but it might just be the wrong tool for the job. If you want to talk about it, contact info is on my profile.

    • akrolsmirOP 4 years ago

      Yeah - we do store foreign keys, but Firestore only supports fetching a batch of 10 keys at a time afaict. It might just be the wrong tool, like you said.
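
      (For reference, the usual workaround for that limit is to chunk the ids into groups of 10 and run the "in" queries in parallel; a sketch with the modular web SDK, collection name assumed:)

          import { collection, documentId, getDocs, getFirestore, query, where } from "firebase/firestore";

          const db = getFirestore();

          // Firestore's "in" filter caps the value list at 10, so split the ids
          // into chunks of 10 and fetch the chunks concurrently.
          async function fetchUsersByIds(ids: string[]) {
            const chunks: string[][] = [];
            for (let i = 0; i < ids.length; i += 10) chunks.push(ids.slice(i, i + 10));

            const snapshots = await Promise.all(
              chunks.map((chunk) =>
                getDocs(query(collection(db, "users"), where(documentId(), "in", chunk)))
              )
            );
            return snapshots.flatMap((snap) => snap.docs.map((d) => ({ id: d.id, ...d.data() })));
          }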

      We do very much want real-time updating, but there are okay integrations for that with RDBMSs now (e.g. Supabase). Primarily, I'm curious about some of the newer/more modern DBs, and whether anyone has had good or bad experiences with them!

      • burggraf 4 years ago

        Supabase developer here. I came from Firestore to Supabase due to running into a lot of limitations you're seeing. Just my biased opinion, but looking at "newer/more modern DBs" is not necessarily the route you want to take. That's why I looked at Firestore and ended up at Supabase. PostgreSQL is not "newer/more modern" but it's time-tested, battle-tested, and I know thousands of companies have used it in production for decades. I prefer to go with something I know works, will work at scale, and has tons of community and commercial support. FWIW

  • krikou 4 years ago

    > developer burden imposed by needing to denormalize information.

    > Then the other pain point is the "joins" use case;

    We usually do that client side, with the aid of a web-component holding a ref to the (realtime db, not firestore) database path, and rendering its value. The payload is small as you only fetch data you use.

    That works pretty well, even with long lists or grids; quotas/price on the realtime db are pretty generous.

  • AlchemistCamp 4 years ago

    What initially motivated you to use Firestore?

bo1024 4 years ago

1) Any thoughts on the conflict of interest? Any insider can easily take advantage of a market like this.

2) Is there any element of a decision market here? Are you asking users to just predict, or help you decide? The incentives change a lot in the latter case, since one can get cyclic dependency -- self-fulfilling prophecies.

  • jahooma 4 years ago

    Great questions!

    1) There are three insiders on this decision: the three cofounders of this site (including me and the OP). We wouldn't insider trade because we genuinely want to know the answer to this question. The market is also tied to what we actually do (and we're not going to lie!).

    2) This is both for you to predict what we will do, and to convince us. If you propose a DB and make a case for it, you can gain in expectation because maybe there really is like a 7% chance we'd pick it. So buying up shares from 0-7% is a win for you.

    Alternatively, if you think that one DB choice is much better than the others, and we would be somewhat likely to figure that out, then you might gain by buying shares in that answer.

    Basically, I think the incentives are good! We're using a more rigorous mechanism — prediction markets! — to do Q&A better than Stack Overflow.

    • bo1024 4 years ago

      1) Cool. I'd add it's really important that everyone knows and believes that you won't insider trade, otherwise they might get scared off of participating. So the perception is important too.

      2) Right, so, the way this can go wrong in theory (not saying it will happen in practice) is if a large group of users, or a small group with a lot of points, decide to all get together and focus on one alternative regardless of how good it is. They predict that alternative really strongly and vote against / predict against all the others.

      Then, again in theory, you (the site operators) look at the votes and say 'wow, this database must be much better than all the others, we'd better use it.' Then the colluders all get their predictions proven true, so it's a self-fulfilling prophecy.

      Here's one research study that looked at this kind of issue, although they end up not finding a lot of manipulation: https://www.researchgate.net/publication/315529106_Manipulat...

    • teruakohatu 4 years ago

      But you are not asking what you should switch to, you are asking users to predict what you will be using on a certain date.

      Prediction markets are supposed to have insiders buying in. Nothing wrong with that, but they are not supposed to be self fulfilling prophecies. Right? And certainly the prediction market itself should not be the insiders.

      • jahooma 4 years ago

        >>> [Prediction markets] are not supposed to be self fulfilling prophecies. Right? And certainly the prediction market itself should not be the insiders.

        We've thrown out all the rules. Why can't the prediction market be a self-fulfilling prophecy? Why can't the market outcome be decided by insiders?

        What matters is what the incentives are. I say the incentives here are for you to point us to the most useful database and argue for it. That helps us.

        Our incentive as insiders is to be truthful about our choice, because we want to incentivize you to help us choose a database.

      • nl 4 years ago

        There is a lot of evidence that prediction markets work best when they are inside markets.

        Yes there are problems with that!

latchkey 4 years ago

Firestore has very specific use cases that don't match a lot of the things Postgres is really meant to handle. Namely, as you are learning the hard way, it isn't a relational database. That doesn't make it bad, it just isn't what you thought it was. $1400 a month is nuts; it sounds like something isn't optimized well.

GCP Cloud SQL (Postgres) has been fantastic for me. Easy to connect to, easy to scale and not that expensive.

I have GCP Cloud Functions (golang) in front of that and they spin up, connect to the database and serve the whole request in under 1s. Hot requests are 80ms for submitting some simple data.

If you do it right, you can minimize persistent connections to the SQL backend (offload as much as possible to PubSub messages which a backend function can handle). This will keep your bills down too because you won't need as large of an instance to serve requests.
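
A rough TypeScript sketch of that split (the original is golang; the firebase-functions SDK is used here for brevity, and the topic, table, and function names are invented): the HTTP function only publishes a PubSub message, and a background function owns the single connection to Cloud SQL.

    import * as functions from "firebase-functions";
    import { PubSub } from "@google-cloud/pubsub";
    import { Pool } from "pg";

    const pubsub = new PubSub();
    const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 1 });

    // HTTP function: no database connection at all, just enqueue the work.
    export const submit = functions.https.onRequest(async (req, res) => {
      await pubsub.topic("submissions").publishMessage({ json: req.body });
      res.status(202).send("queued");
    });

    // Background function: the only place that talks to Cloud SQL.
    export const handleSubmission = functions.pubsub
      .topic("submissions")
      .onPublish(async (message) => {
        await pool.query("INSERT INTO submissions (payload) VALUES ($1)", [message.json]);
      });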

antifa 4 years ago

I wish GCP had two things. First, a dumber key-value-store/mongodb-clone thing. Firebase has too much going on, with its confusing await getCollection().getDocument().getSnapshot().data() thing, the half-documented transition to the composition API, and I didn't need realtime.

Also a serverless postgres offering. Not even the fancy kind. There are some great free tiers outside of GCP. I have some projects where my usage, if given true pay-as-you-go pricing, would fairly be between $0.10/month and $25/year. But GCP starts at $9/month for a postgres instance, and a lot of competitors just leap straight to $25/month when leaving the free tier.

  • breakingcups 4 years ago

    Re: the serverless Postgres offering, if you allow me to cheat a little bit I might suggest CockroachDB Serverless. It's not Postgres, but it is Postgres wire-compatible. For some use-cases that might be enough.

  • tlarkworthy 4 years ago

    Google Cloud Storage (and S3) are the dumb KV stores with great read latencies.

  • latchkey 4 years ago

    Dumb question: how is serverless postgres different from cloud sql postgres?

    • lf-non 4 years ago

      Serverless solutions are typically priced based on usage - in case of db you may be charged based on actual iops, storage used etc. as opposed to pre-provisioned reserved capacity and infrastructure.

      Depending on usage patterns the former can be cheaper for apps that don't need reserved capacity and need sporadic/occasional resource access or have unpredictable spikes with otherwise low usage.

    • antifa 4 years ago

      The minimum price of existence involves reserving a server and 10GB of disk space (about $6/month). This cost is incurred even if you leave it empty and do nothing with it the entire month. Serverless offerings typically can scale cost closer to zero if real usage reflects that.

      • latchkey 4 years ago

        I'd love to know of a real world example of a serverless postgres.

        I'm not sure how much closer to zero you can get at $6/month.

        • antifa 4 years ago

          CockroachDB and several other companies offer a very competitive free 5GB Postgres (or almost-Postgres) DB, and some companies like Fauna offer 5GB of suitable NoSQL, but most of them leap straight to $25/month once you exceed the free tier. There's almost nobody out there offering to charge me less than a dollar per month if I only need less than a dollar's worth of storage, read/write, bandwidth, open connections, etc.

satyrnein 4 years ago

The site sounds neat, and using the site to answer this question about itself is amusingly meta. I'm not sure if M$ are 1:1 with dollars or even real money, so I wanted a FAQ or something, but there's only a sign-up form (on mobile), which I'm certainly not going to fill out for something that might just be a Hollywood Stock Exchange type of game. This is all probably intentional, but letting you know just in case!

jasfi 4 years ago

You could use PostgreSQL with your own back-end. Related to this, I'm working on an SDK for Flutter that works via REST and is quite easy to use: https://nexusdev.tools/

tqkxzugoaupvwqr 4 years ago

Tip: You can avoid saving the avatar url if you use a predictable url like /avatars/:userId instead of something like /avatars/:randomToken

gerardnico 4 years ago

https://fly.io/docs/reference/postgres/

xrd 4 years ago

Do you mean Firebase or Firestore?

If you mean Firestore, what about minio? It's incredibly scriptable and awesome.

If you mean Firebase, use rxdb and connect over graphql (hasura is fantastic) to your postgres database. It can be a little work to understand how the models all map into the database, but once you get it, it's magical.

Both are easy to self host. I run my entire stack of all components on dokku, so I get easy logging, backup, and can migrate to a new host in a few standard commands.
