Ask HN: How do you managing staging database content?
You have a staging environment, or maybe a bunch of them, that you push code to. The code is short-lived but you have a database as well. What do you put in this staging database? Some options I can think of:
1. Staging DB spins up empty. You create a user during use, probably data is never cleaned, reproducing prod issues is kind of a pain.
2. Staging DB is populated with some dummy data from scripts, possibly as part of deployment. Nice but you have to maintain the scripts.
3. Staging gets a copy of prod. Great for reproducing issues from prod, and possibly viable at small scale, but has some security issues - you'd probably need to censor some columns.
Perhaps there are other options, or ways to alleviate the pain here? It depends on where the rest of the team is. Have they gone through the experience of having a persistent staging environment that slowly drifts from production (1)? If they haven't, they can't possibly understand why that is a bad idea. I'll just go with the flow until they realize. Maybe I'll hint at the possible issues sometimes. Have they figured out that copying prod is a bad idea (3)? If they haven't, same thing. They can't understand why that sucks and why that's not true reproducibility. Finally, (2). Fixtures! It's also a journey. There are so many things that can go wrong. Knowing those things depends on having gone through those journeys with a persistent staging and production copies. There is no relief from the pain. No magic bullet. No product or solution that will ever solve this. You have to go through those stages. If you're lucky, someone will guide you through them (in practice). The journey can be sped up, but I haven't seen a shortcut that works (like forcing the team to adopt a practice without them internalizing it). If you go with (2), and you add a DB column in a PR, I suppose you include a script to update staging with the new content at the same time? That is database schema. The post asks for opinions on database contents. They are related though, so it's a valid question. > I suppose you include a script to update staging Essentially, yes. But if I write a script to do it on staging, why don't I write one for prod as well? Maybe I could even go beyond that. Make a simple tool to handle those scripts for me (that's migrations). It requires some version control and deployment discipline, which is something you probably wouldn't have if you're adding columns directly on the prod db. I learned very early that no developer should have access to prod db. Not even read access. No writing, definitely. Schema changes by a single developer outside version control should be impossible. 4) is same as three but you take a subset of the data with correct relations, anonymise it carefully and use that, it has a lot of complexity at scale but is about as faithful as I’ve seen, it’s also one of those tools that forever requires maintenance as the core db mutates. For local devs, seeding the database with plausible correct data works pretty well. Also 2, as part of migrations: There 3 types of migration files/scripts: structure, basic-data, dummy-data. structure - new table, add column goes here
basic-data - e.g. default config values go here
dummy-data - gets used on local and stating Run migrations with a flag to include the dummy-data migrations. +1 for this approach. The big benefit here is not just your staging database, but everybody's locally running databases also get prepopulated data which is really useful for development and testing.