Accidentally destroyed production database on first day of a job
For us, one of our junior devs managed to wipe out all environments, all datacenters, for one of our Elastic instances. She got handed a task to modify an index. The Dev Lead and Sr. Devs were 'too busy', so she Stack Overflowed how to do it and it was rubber-stamped. What she did worked - but dropping and recreating the indexes wiped out all the existing documents. The issue was that the collection was large enough that it was not apparent bad things were happening to the documents on the system. Once she thought her test worked, she scripted it up, and everything went away as the automation ripped through the systems. Three days later we had everything restored. The good news is the older system that the new release was replacing was still operational, so we fell back to it in the meantime.
A few learning experiences. Elastic was brand new to our mix, so there was not a lot of domain knowledge there. We discovered how dangerous a handful of curl commands could be with the 'stock' permissions the developers had, and fixed that. It also became a nice conversation about code reviews, signoff, and deadlines - and about actual DR readiness and how long a restore really takes. We got a lot of value out of that mistake.
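For anyone who hasn't hit this footgun: deleting an Elasticsearch index deletes its documents along with its mapping. Here is a minimal sketch of two guardrails, assuming the elasticsearch-py 8.x client; the index and alias names ("orders-v1", "orders-v2", "orders") are invented for illustration:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://localhost:9200")

    # Refuse wildcard deletes cluster-wide (DELETE /_all, DELETE /logs-*).
    es.cluster.put_settings(persistent={"action.destructive_requires_name": True})

    # Change a mapping without dropping data: build a new index, copy into it,
    # then atomically repoint the alias that applications actually query.
    es.indices.create(index="orders-v2",
                      mappings={"properties": {"sku": {"type": "keyword"}}})
    es.reindex(source={"index": "orders-v1"}, dest={"index": "orders-v2"},
               wait_for_completion=True)
    es.indices.update_aliases(actions=[
        {"remove": {"index": "orders-v1", "alias": "orders"}},
        {"add": {"index": "orders-v2", "alias": "orders"}},
    ])

With the alias swap, readers never see an empty index, and the old one can sit around until you're sure the migration worked.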
She is still my favorite developer. I'd steal her for my personal team any day - had she not already been stolen from our group by another team a couple of years later. (shakes fist) As one would expect, she apologized and learned the lessons. She became one of the youngest people we ever gave root on a 12B-document prod instance, because I knew she would do it carefully and correctly.
Nice story.
Your dev lead/senior engineers should never have been "too busy" when that meant letting someone do this unsupervised on their first day.
Never too busy to fix a fuck-up that could have been prevented.
She had been around for about a year when that happened. The 'first day' story linked from Reddit was someone else; it was, however, her first time modifying an index on ES. I consider the scrum master and PO complicit as well: resources were tight and they still pushed for the work to be completed. This one bit us hard. The team was broken up and moved on. She was the only one I personally asked to have as a direct report.
The blame should be on whoever decided a junior engineer should be given permission to delete the production database on day one. One time at my first job my boss accidentally deleted a staging database, and even he said he never should have had permission to do that.
Lots of red flags there:
- Dev installation guides with credentials to prod.
- Tests that delete everything, not just what they create.
- The obvious one: giving access that can delete production on the first day.
- Were they lacking backups?
Seems from the post that there were backups but they had never actually tested restoring from them.
If you have backups, but have not recently tested restoration, then you do not have backups.
Test your restoration process, people!
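A restore drill doesn't have to be elaborate. A rough sketch of the idea, assuming PostgreSQL client tools on the path and a scratch database that is safe to drop; the dump path and the "orders" table are made up:

    import subprocess

    def restore_drill(dump_path: str = "/backups/latest.dump") -> None:
        # Rebuild a scratch database from the newest dump -- never prod.
        subprocess.run(["dropdb", "--if-exists", "restore_test"], check=True)
        subprocess.run(["createdb", "restore_test"], check=True)
        subprocess.run(["pg_restore", "--dbname", "restore_test", dump_path],
                       check=True)
        # A restore that "succeeds" but loads nothing is still a failed backup.
        out = subprocess.run(
            ["psql", "-d", "restore_test", "-tAc",
             "SELECT count(*) FROM orders"],
            check=True, capture_output=True, text=True)
        assert int(out.stdout.strip()) > 0, "restored database is empty"

    if __name__ == "__main__":
        restore_drill()

Run something like this on a schedule and page someone when it fails; then "we have backups" is a tested claim rather than a hope.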
I fondly remember accidentally deleting around half of a warehouse's bulk inventory once during a previous job; I was trying to wipe out test data from a test database, and stupid Azure Data Studio was showing the IP address of the test DB server in the status bar even though the active tab was using a prod connection.
Luckily we were pulling full snapshots of the relevant tables into CSV exports (for ingestion into Redshift for reporting), so it was just a matter of grabbing the most recent of those, reinserting the data, and having some of the warehouse workers do cycle counts to spot check. It was still a nerve-wracking and awkward conversation with my boss, though, lol
Previous post on HN:
I didn't notice until now that this is 3 years old. I wonder what ended up happening; I hope he/she is okay.
Maybe this shouldn't be posted here, but every new employee at my company is required to watch this, and because my company is small, typically everyone else watches along to see their reaction: https://www.youtube.com/watch?v=1aEqd4bl6Bs (subtitles required).
I am the technical lead for a collection of risk systems and trade stores at one of the largest banks in the world.
We have simple rules to prevent significant data loss:
1. Deleting data is not permitted and mutable objects are highly discouraged. No deletes/modifications == greatly reduced possibility of data loss from app code errors.
I lied a little bit.
There are functions that can remove data, but they are hardwired to refuse to work unless it is beyond question that they can't remove live prod data.
For example, the function will refuse to run if the collection is not prefixed with "tmp" or "test" (see the sketch after this list).
For production objects we have a system where we basically do copy-on-write: we create new versions of the documents, and a vacuuming system archives the old versions. The archive is preserved for a minimum period of time (two weeks) to give us a chance to react in case somebody makes a blunder and screws up some rule, removing too much.
2. Remove write access to the database from every single person. No single employee should have write access to the database, and no single employee should bear the responsibility of working with an account that could let them destroy it. No employee has access to PROD credentials, which are generated automatically and are present neither in configuration nor on the application server.
Instead, if you need to introduce changes to the database, there is an app where you write your change as a kind of job that can modify the database. The job is not allowed to take any parameters (it only has a name), so that it is possible to audit exactly what it is going to do. This code then goes through the regular development process, including code reviews, automated tests, etc. Once it is deployed to PROD, you can go to the API and execute the job by name within the PROD context.
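To make both guards concrete, here is a rough sketch of how they could look; this is my reading of the description above, not the actual code, and every name in it is invented:

    from typing import Callable

    # Guard 1: a delete helper hardwired to refuse non-scratch collections.
    def drop_collection(db, name: str) -> None:
        if not (name.startswith("tmp") or name.startswith("test")):
            raise PermissionError(
                f"refusing to drop non-scratch collection {name!r}")
        db.drop_collection(name)

    # Guard 2: parameterless, named jobs -- what runs in prod is exactly
    # the code that was reviewed, with nothing varying at invocation time.
    JOBS: dict[str, Callable[[], None]] = {}

    def job(fn: Callable[[], None]) -> Callable[[], None]:
        JOBS[fn.__name__] = fn  # registered by name only, no arguments
        return fn

    @job
    def backfill_trade_currency() -> None:
        ...  # reviewed, tested migration code goes here

    def run_job(name: str) -> None:
        JOBS[name]()  # invoked by name from the API; nothing else can vary

The point of banning parameters is auditability: a reviewer can know exactly what the job will do in PROD before it is ever invoked.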
Data loss prevention is a significant issue that no CTO should "dump" on their employees.
Also, not accepting responsibility for everything that happens in their department is a sign of a lack of leadership.
This post is from three years ago. I wonder what the poor guy is doing now.
His last comment from three years ago says he found a new job shortly after. But who knows? Reddit has a chronic problem with people using it as a creative writing portal, regardless of the subreddit (r/AITA, r/RelationshipAdvice, r/MaliciousCompliance, ...).
/r/legaladvice
The list never ends.
I swear folks post sock puppets just so they can yell at them...
It really never ends. Add /r/TIFU and /r/ChoosingBeggars to the list.
I am glad he left that toxic place.
This is probably made up, but if it's real, the CTO should have fired himself. Some of the shittiest practices I've ever heard of. Why would you use a real persistent database for your test scripts, whether it's a production instance or not? Spin up a lightweight mock server on the fly and autogenerate fake credentials. Heck, one of the test cases should be that you can't just delete the whole thing. And why on earth would you store credentials in an onboarding guide?
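On the "spin up a throwaway database" point: even a small pytest fixture gets you the invariant that tests only delete what they themselves create. A sketch, with all names invented:

    import sqlite3
    import pytest

    @pytest.fixture
    def scratch_db():
        # An in-memory database that exists only for this one test.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        yield conn
        conn.close()

    def test_wipe_only_touches_scratch(scratch_db):
        # Destructive on purpose -- and harmless, because nothing shared exists.
        scratch_db.execute("DELETE FROM users")
        count = scratch_db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        assert count == 0

The same idea scales up with containers, but the invariant is what matters: if the test suite never holds prod credentials, no onboarding doc can leak them.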
Anyway, for a real story: back in 2002, before they totally tanked, Sears tried to open a home decor business called The Great Indoors. I was one of their first retail employees, hired to work the stock room at a new opening. First day on the job, a week before the place was going to open, someone spends about 30 seconds showing me how to operate some forklift-like crate carrier called a "wave" or something like that, and I'm supposed to move some stuff to another floor via the freight elevator. I accidentally accelerate when trying to slow down and promptly destroy the elevator.
The store manager was irate, promptly spent 10 minutes screaming at me in front of everyone, and then sent me home permanently. You know what? 21-year-old me internalized that shit and believed I was actually at fault, but in retrospect, that place was bullshit, and both Sears and The Great Indoors deserved their fate of eventually going out of business. The ensuing two decades have been up and down, but I'm in a great place now. I hope, if this really happened, that this junior dev landed all right, too. Life is way too short to stick with a toxic workplace, and when you don't have a family to feed and still have the freedom to just walk away, you absolutely should.
USA doesn’t require forklift licences?
It does, but I'm guessing this vehicle somehow qualified under regulatory minutiae as not actually being a forklift.
The experience of reading this on Reddit today versus when it was originally posted is astonishing. The comments are what make this post valuable, and the view of them now is abysmal.
Try using old.reddit.com:
https://old.reddit.com/r/cscareerquestions/comments/6ez8ag/a...
(And there are browser extensions that will turn all reddit links into old.reddit.com links).
(2017)
LOL Insurance fraud ;) That guy was set up
Any chance this story was made up?
If not, I was imagining this could be an elaborate scam. Although the clues seem to point to it having been an in-person job, I can imagine someone in today's climate "hiring" people remotely and pulling a scam like this, the goal being to extract a cash payment from the victim to avoid legal trouble.
This is the Internet and also Reddit. I think you know the answer.