Ask HN: How do you prevent data leaks?
this is something that i've been pondering for a while.
at pretty much every job i've had, we've had Linux servers that we ssh into.
these servers have customer data on them, plus builds, code, and other business-critical shit.
if i were a malicious individual, i could easily rm -rf important files, scp customer data out to my home computer, use dropbox to sync critical business data into the cloud, or casually peruse customer data and take advantage of it. and there would be no evidence that i did anything at all.
as far as i can tell, this can't be caught by inspecting traffic on the networking side. SSH is encrypted, so any network-level scan would just see a bunch of bits flowing between the server and me.
is there some solution to prevent this kind of data leak from Linux servers? is it normal in companies for anyone to be able to just log on to servers and do whatever they want, without any kind of auditing, recording, or tracking whatsoever?
i'm curious, with data leaks like the ashley madison one... did they ever find the perpetrator? does anyone even care about securing data?
what does your company do to prevent data leaks?
There isn't a totally bulletproof or foolproof way, but it boils down to aggressively locking down computing functions so that they closely match the job functions each employee is expected to perform. This requires a thorough understanding of those job functions. For example, why do engineers need to ssh to production machines? If the answer is "to tail logs", then a facility needs to be created that allows the tailing of logs and nothing else. This can be done by locking down authorized_keys, using restricted shells, or introducing centralized logging (Logstash, Kibana, ...). Access to outside SSH is a big no-no. Access to outside file sharing (Dropbox et al.) is a liability unless explicitly required for performing job functions. I've worked with a brilliant security mind (no irony here) who wanted to go as far as providing employees with remote desktop environments only, running in a fully controlled environment. This removes attack vectors such as USB drives, computer theft, and so on. The proposal never flew, but the idea has merit and is thought-provoking in its own right.
what do you think about recording every SSH session? you bring up remote desktop environments... it's actually a good point... similar to what VMs on AWS are. if the access point (SSH) is locked down and recorded, doesn't that pretty much remove any possibility of employees leaking stuff? knowing that they are being recorded is a pretty big deterrent to leaking data, right?
I've personally come to assume that everything I do on an electronic device is being recorded all the time, so the mere presence of surveillance isn't in itself a deterrent. Most reasonable people will IMHO realize that no one's going to read through the interminable logs of SSH sessions, 99.999% of which will likely turn out to be utterly mundane and boring. Apart from that, storing and securing these logs will in itself become a liability. Imagine all the sensitive information that might get caught in those logs, only to be leaked itself in a titanic stroke of irony! Relying on surveillance is folly; simply lock down access and remove privileges that aren't necessary. This is something you do once and never have to think about again, unless some event warrants a review. Thankfully such reviews can be triggered by normal business activity: new project, new employee, new team, new vendor product, etc.
yeah. but what if one of those privileged accounts gets compromised, or an admin goes rogue... i think there's still a use for surveillance. just my opinion. then again, i'm paranoid. i want the fort knox of data.
Automate everything... you should never need to SSH into a server.
that can't always be true.
for instance, some of my friends are doing data science. they are constantly SSH'd in, running jobs on the data and using different tools to understand it. i think SSH will always exist going forward, and we won't be able to automate everything. any other thoughts on how to prevent data leaks?
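On the "lock down authorized_keys so people can tail logs and nothing else" suggestion upthread, here is a rough sketch of what that could look like, not a hardened implementation. OpenSSH lets you attach a forced command to a key in authorized_keys (`command="..."`, optionally with `restrict`), and it passes whatever the client asked to run in the `SSH_ORIGINAL_COMMAND` environment variable. Everything else here is an assumption made up for illustration: the script path, the log paths in `ALLOWED_LOGS`, the function name `build_tail_argv`, and the 200-line default.

```python
#!/usr/bin/env python3
"""Hypothetical forced-command wrapper: permit `tail <whitelisted log>` only.

Assumed authorized_keys entry (path and key are placeholders):
    command="/usr/local/bin/tail-only.py",restrict ssh-ed25519 AAAA... user@host
"""
import os
import shlex
import sys

# Assumption: placeholder paths; replace with the logs people actually need.
ALLOWED_LOGS = {"/var/log/app/app.log", "/var/log/nginx/access.log"}


def build_tail_argv(original_command, allowed=ALLOWED_LOGS):
    """Return the argv to exec for a permitted request, or None to deny."""
    try:
        parts = shlex.split(original_command or "")
    except ValueError:
        # Malformed quoting: refuse rather than guess what was meant.
        return None
    if len(parts) != 2 or parts[0] != "tail":
        return None
    # Collapse "../" segments before the whitelist check.
    # Note: this does not resolve symlinks.
    path = os.path.normpath(parts[1])
    if path not in allowed:
        return None
    # Fixed flags: the client never gets to pass its own options to tail.
    return ["tail", "-n", "200", "-F", path]


def main():
    # sshd sets SSH_ORIGINAL_COMMAND to what the client requested; it is
    # unset for a plain interactive login, which is denied here too.
    argv = build_tail_argv(os.environ.get("SSH_ORIGINAL_COMMAND"))
    if argv is None:
        sys.stderr.write("denied: only `tail <whitelisted log>` is allowed\n")
        return 1
    os.execvp(argv[0], argv)  # replaces this process with tail; no shell involved


# To use this as the forced command, end the file with: sys.exit(main())
```

With something like this in place, `ssh loguser@host tail /var/log/app/app.log` would stream the log, while `ssh loguser@host bash` or an scp attempt would be refused. The request is parsed and checked against a whitelist, and the result is exec'd directly without a shell, so there is no string for someone to inject into. It obviously doesn't help the data science case, where people genuinely need a shell.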