Good Hygiene Starts on Day One, Not When You're Ready to Ship

The temporary choice that nobody revisits is the most dangerous pattern in software.

I stumbled onto a Microsoft Azure SQL tutorial the other day. Standard stuff: LangChain integration with SQL vector stores. Load a dataset, chunk it, generate embeddings, run similarity search. The kind of post that lives on devblogs.microsoft.com and nobody thinks twice about.

Then I looked at what they used as demo data.

All seven Harry Potter books as plain text files, hosted on Kaggle, a public dataset platform where anyone can upload and share data, no vetting required.

Not excerpts. Not public domain text. The full copyrighted text of seven novels by an author who is, whatever else you think of her, famously litigious about her intellectual property. Uploaded to Kaggle by a random user, linked from an official Microsoft corporate blog, loaded into Azure Blob Storage, and used to build a Q&A system and a Harry Potter fan fiction generator.

The post was published November 19, 2024, by Pooja Kamath. It was pulled on February 19, 2026. The Wayback Machine’s last snapshot, from January 5, shows it still live. Fourteen months of pirated novels sitting on a corporate blog belonging to one of the largest companies in the world, indexed by search engines, before someone noticed or cared.

And Microsoft is a defendant in the New York Times v. Microsoft lawsuit over copyrighted training data. Its own blog team handed opposing counsel a gift.

Someone needed a dataset for a tutorial. They grabbed the most convenient one, copyrighted novels from Kaggle, because it was right there and it worked. The tutorial shipped. Nobody went back to swap in a proper dataset before publishing. This is devblogs.microsoft.com. Microsoft’s official developer blog, branded with the Microsoft logo, published under the Microsoft domain. Not someone’s personal Medium post.

Fourteen months later, it was still there.

The temporary choice became the permanent one. Nobody decided it should be permanent. Nobody decided it shouldn’t be, either.

I keep seeing this pattern. Not in blog tutorials. In infrastructure.

A team spins up a Postgres instance for a new service. They create a shared dev_admin password, put it in a .env file, and get to work. The password is changeme or the project name or something equally guessable, because it’s just dev. They’ll set up proper access controls later.

Later never comes.

Six months in, the service is in production. The password hasn’t changed. It’s in a config file, maybe a wiki, maybe someone’s Slack message history. Eight people know it. Three of them have left the company. The database sees one account connecting, dev_admin, regardless of who’s actually running the query. The security team, if they look, sees nothing useful.

This isn’t hypothetical. This is the default at most companies I talk to. Not because they’re negligent, but because the insecure path was frictionless and the secure path required effort they planned to invest “later.”

The Harry Potter dataset and the shared database password follow the same logic. Grab what works for the prototype, ship it, never go back.

The problem with fixing things later is that later has its own problems. Once a shared credential is in production, replacing it means coordinating across every service and person that uses it. The longer you wait, the harder the migration gets. And the harder the migration, the less likely anyone actually does it.

Credential rotation every 90 days doesn’t fix this. You rotate the password, and the new password gets shared the same way the old one did. The rotation creates the appearance of hygiene without changing the underlying problem: nobody knows who’s actually connecting.

This is how auditors find you. They don’t ask if you have a password manager. They ask: can you show me who accessed this specific table on this specific date? If your answer requires cross-referencing five systems and still ends with “probably someone on the backend team,” you have a finding.
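
To make the auditor's question concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the AccessEvent record, the who_accessed helper, the table name, and the people. It assumes you already have some log of which database account touched which table and when, then asks the same question of a shared-account log and a per-person log.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class AccessEvent:
    """One hypothetical audit-log row: which account touched which table, and when."""
    db_user: str
    table: str
    at: datetime


def who_accessed(events, table, day):
    """Return the sorted, de-duplicated accounts that touched `table` on `day`."""
    return sorted(
        {e.db_user for e in events if e.table == table and e.at.date() == day}
    )


# Illustrative data: two different people ran queries, both through the shared account.
shared_log = [
    AccessEvent("dev_admin", "payments", datetime(2025, 3, 4, 9, 15)),
    AccessEvent("dev_admin", "payments", datetime(2025, 3, 4, 14, 2)),
    AccessEvent("dev_admin", "orders", datetime(2025, 3, 4, 16, 40)),
]

# The same activity, recorded under per-person accounts instead.
named_log = [
    AccessEvent("alice", "payments", datetime(2025, 3, 4, 9, 15)),
    AccessEvent("bob", "payments", datetime(2025, 3, 4, 14, 2)),
    AccessEvent("alice", "orders", datetime(2025, 3, 4, 16, 40)),
]

audit_day = date(2025, 3, 4)
shared_answer = who_accessed(shared_log, "payments", audit_day)
named_answer = who_accessed(named_log, "payments", audit_day)

print(shared_answer)  # ['dev_admin']
print(named_answer)   # ['alice', 'bob']
```

The query is identical in both cases. Only the second log can answer it, because the identity was captured when the connection was made, not reconstructed afterward.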

The Microsoft tutorial could have used Project Gutenberg texts from day one. Thousands of freely available books, no legal risk, same technical demonstration. It would have taken the same amount of effort. The choice to use Harry Potter wasn’t a deliberate decision. It was the absence of one.

Database access works the same way. If your tooling makes the secure path the easy path from the start, there’s nothing to retrofit. Tie every database connection to a real person’s identity from the beginning, not a shared account, not a service credential that six humans also use, and you have an audit trail that actually means something. You don’t have to bolt it on later.
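
One way to picture "the secure path is the easy path" is a connection helper that simply has no shared-account option. The sketch below is hypothetical, not how rmBug or any particular tool works: connection_params and SHARED_ACCOUNT_NAMES are invented names, the host and database values are placeholders, and issuing the actual credential (for example, a short-lived token from your identity provider) is deliberately left out. The dictionary keys follow the usual Postgres connection keywords (host, dbname, user).

```python
# Assumption: account names your team treats as shared. Adjust to taste.
SHARED_ACCOUNT_NAMES = frozenset({"dev_admin", "admin", "root", "postgres"})


def connection_params(person: str, host: str, database: str) -> dict:
    """Build connection parameters for one named person.

    Refuses empty names and known shared accounts, so the only way to
    connect through this helper is as yourself.
    """
    user = person.strip().lower()
    if not user:
        raise ValueError("a named person is required")
    if user in SHARED_ACCOUNT_NAMES:
        raise ValueError(f"{user!r} is a shared account; connect as yourself")
    return {"host": host, "dbname": database, "user": user}


# Usage: a named person gets parameters, a shared account gets an error.
params = connection_params("Alice", "db.internal.example", "orders")
print(params)

try:
    connection_params("dev_admin", "db.internal.example", "orders")
except ValueError as err:
    print(f"rejected: {err}")
```

The design choice is the point, more than the code: if the helper every engineer reaches for on day one only accepts a real identity, there is no shared credential to migrate away from later.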

The alternative is what most companies do: start with shared credentials, promise to fix it later, and discover three years on that “later” now requires a multi-quarter migration project that nobody wants to fund.

This is a big part of why I’m building rmBug. Every database connection tied to a real person from the start. Not because identity-based access is a new idea, but because it’s how every other system in your stack already works. SSO for applications, IAM for cloud resources, certificates for service-to-service communication. Databases are the holdout. The fix isn’t heroic cleanup after the fact. It’s better defaults.

If Microsoft’s developer blog can’t catch a pirated dataset in a tutorial that’s been live for fourteen months, what’s sitting unnoticed in your database infrastructure?

Not because your team is careless. Because the defaults made the wrong thing easy, and the right thing required a decision nobody made on day one.

The temporary password. The shared credential. The “we’ll set up proper access controls when we’re closer to production.” These compound quietly until someone, usually an auditor, asks the question you can’t answer.

Start on day one. That’s it. That’s the whole thing.