GitHub and Medium take down database of ICE employee LinkedIn accounts

79 points by nathanielks 8 years ago · 60 comments

Q "But don't we sell the same kind of data to our paying customers?"

X "Yes, but when we do it, it's not doxxing"

Q "Why not?"

X "Because we would never do it for any nefarious reason"

Q "What about that time we scraped the data of those groups that were linked to that reporter, so that one company could do oppo research and prepare a PR counterattack, targeting specific journalists"

X "That wasn't nefarious. We made a lot of money doing that"

friedman23 8 years ago

>X "Yes, but when we do it, it's not doxxing"
Was this just you making up someone's response? Because neither of those companies listed in the title sell people's info.
Your comment is just a terrible strawman.
tinus_hn 8 years ago

‘Doxxing’ is such a useless term anyway. In this case all the database is is the search results for ‘ICE’ on LinkedIn. That’s data these people filled in themselves. But now that attention has been called to it it’s ‘doxxing’.
The phonebook is full of ‘doxxing’. Who cares.
- erric 8 years ago
  
  Not to mention that most government employees' records of employment are public info (in the US anyway). Since this is law enforcement it may be a little different; however, this info may be FOIA-able or even just sitting on the ICE website.
qbaqbaqba 8 years ago

Are you referring to the Facebook-Cambridge Analytica controversy?

allenz 8 years ago

Sam Lavigne is a performance artist. The context for this work is that ICE plans to increase its surveillance of immigrants, including surveillance of social media accounts. Sam argues that if this level of transparency is too invasive even for our public figures, then why do we demand this for prospective citizens?

After a public backlash, ICE recently suspended its Extreme Vetting Initiative, which would scan social media history and automatically flag people for deportation based on the exact criteria from the original Muslim ban. The Brennan Center discusses why this is bad: https://www.brennancenter.org/analysis/ice-extreme-vetting-i...

ICE will still require five years of social media history: https://www.cnn.com/2018/03/29/politics/immigrants-social-me...

simula67 8 years ago

> “I think that’s a totally valid question to bring up,” Lavigne said earlier today about whether the database could be used for targeted harassment, “but I think that the information is already out there, and if people want to embark on individual campaigns of harassment, then they’re going to be doing that no matter what.”

That does not mean you have to make it easier

erric 8 years ago

>That does not mean you have to make it easier
He’s calling attention to the appalling policy choice made by his government. In this case I think he would have to make it easier.
qop 8 years ago

There can sometimes be a fine line between "enabling" and "meddling"
I mean, surely you can't expect a site like GitHub with thousands of projects happening every day to be able to keep track of each one and make sure that something bad isn't being done with the project.
- txcwpalpha 8 years ago
  
  What? Yes you surely can expect that. That's what moderation is. That's what every site on the internet that allows user-submitted content does. That's what Twitter does, YouTube does, Facebook does, Reddit does, and HN does. That's what GitHub does. GitHub is ultimately responsible for everything that is hosted on their site, and so of course they track each one and monitor them for violations of their policies. If they couldn't do that, they wouldn't be in business.
  - MichaelMoser123 8 years ago
    
    Then how come GitLab is still hosting this content? Do they have a different policy? Or is it that they just didn't notice?
    
    txcwpalpha 8 years ago
    
    GitLab isn't still hosting the content. They too have taken it down.
  - ben_w 8 years ago
    
    Your examples are explicitly social sites (networking or forums) where moderation is done by the general public seeing and responding to content.
    GitHub has a slightly different dynamic, being work focused, and there being no reason for most people to delve into random repositories and flag inappropriate ones.
    Heck, even the underlying tech assumes problems fixed with patches rather than repository deletion, so if anything it’s halfway between Wikipedia and Geocities.
    
    txcwpalpha 8 years ago
    
    >Your examples are explicitly social sites
    I guess you don't realize this, but GitHub is a social site too.
    >where moderation is done by the general public seeing and responding to content.
    And this is the same at GitHub, too. Have you never noticed the "Report Abuse" buttons for PRs, users, comments, etc?
    > GitHub has a slightly different dynamic, being work focused, and there being no reason for most people to delve into random repositories and flag inappropriate ones.
    Except you're looking at a reason right here. GitHub is also used for sharing your work with others, as was the case here. There are instances where such work is against community policies/guidelines, and in those instances, GitHub takes them down. I'm not sure where you're seeing the disconnect here. It's not any different at all than making a post on Reddit and a moderator removing it or someone reporting said post and then it being removed.
    
    ben_w 8 years ago
    
    > I guess you don't realize this, but GitHub is a social site too.
    Which is why I say “explicitly”. Github has social as what feels like a bolt-on afterthought because everyone else was doing it and it’s a buzzword.
    > And this is the same at GitHub, too. Have you never noticed the "Report Abuse" buttons for PRs, users, comments, etc?
    Nope. Never needed to look for one. However, rather more importantly, I have just made a deliberate look for a “report repository” link…
    …and found nothing.
    > Except you're looking at a reason right here.
    Key word being “random”. Twitter has trends, Reddit has its front page, public Facebook posts can and do go viral. GitHub has such a list, but you need to go looking for it — you don’t have random stuff thrown in your face whenever you use it like the other platforms, so there is _much_ less opportunity to train a learning algorithm to automatically filter anything. I’m not sure you could even train such a model now, with perfect data, because that would involve understanding the purpose of a repo rather than sentiment analysis of natural language.
    The appropriate people only found out about this repo because the person who made it did so with the intention of being noticed.
    And I’m not saying there shouldn’t be or even that there isn’t the capacity to take things down. I’m saying comparing repos to tweets is like comparing apples to grenades — they both “keep the doctor away”, but for the most part, treat them differently.
    
    txcwpalpha 8 years ago
    
    >Which is why I say “explicitly”. Github has social as what feels like a bolt-on afterthought because everyone else was doing it and it’s a buzzword.
    If that's the logic we're going by, then Facebook isn't "explicitly" social either, as it originally started out as a photo site and just had comments "bolted on" as an afterthought. And yet it still has moderation. So again, I don't even know what your point is.
    >Nope. Never needed to look for one. However, rather more importantly, I have just made a deliberate look for a “report repository” link…
    So because you personally have never reported anything, that means the community doesn't report things? I don't think you know how things work...
    >…and found nothing.
    You must have not looked very hard. Repositories are linked to users, and thus to report a repo, you report a user. And in case you have trouble finding it, the link to report a user is one of the first things you see when you open their profile, just underneath the profile picture and name.
    >Key word being “random”. Twitter has trends, Reddit has its front page, public Facebook posts can and do go viral. GitHub has such a list, but you need to go looking for it — you don’t have random stuff thrown in your face whenever you use it like the other platforms
    What are you even talking about? GitHub's discover repo feature is one of the very first things you see on the GitHub front page. There is literally a giant banner dedicated to discovering new projects right there in front of you when you first open up GitHub. If you do a Google search for "GitHub", the "Explore" link is the first link that is shown to you under GitHub.com. You don't have to go looking for it at all.
    >so there is _much_ less opportunity to train a learning algorithm to automatically filter anything.
    Who said anything about training any kind of algorithm to do anything? We're talking about reporting and moderating content. Nobody even mentioned a learning algorithm.
    But hell, if we're going to bring it up: GitHub does have this. Again, on the front page of GitHub, if you click on the link in the giant banner that suggests you explore more repositories, one of the places it takes you is a list of suggested repositories that it suggests to you based on other repos that you have starred.
    >The appropriate people only found out about this repo because the person who made it did so with the intention of being noticed.
    The appropriate people found out about this repo because the person who made it shared the repo, because GitHub is a social site meant for sharing code, just like thousands of other people use it all the time. Just go view Show HN and see how many link to GitHub: https://news.ycombinator.com/show
    I really have no idea what your point even is, other than for some reason you seem to be trying to draw some distinction between GitHub's social features and other social sites' features. There is no need for such a distinction, because at the end of the day the same type of moderation is still happening.
bassman9000 8 years ago

Precisely
Hey, people are still going to kill each other, so I'm going to leave this gun in here

zaarn 8 years ago

> The database included information like job title, profile picture, and general location of work.

Ah, yes. The White Knight of Severely Violating People's Privacy because You're Right and They Aren't. Truly the most moral person in the world.

In the current political situation in the US, as far as I can judge it, this will enable harassers to seriously harm or even endanger the lives of these people for the crime of having the wrong employer (they might not even be involved in any of the bad crap you see on TV, but who cares, wrong employer!)

In other countries or, for example, the EU, such behavior would be a crime, end of story. And you'd be responsible for the damage that comes from doxing people.

Saying "the information is already out there" is no excuse. It's like a swatter saying "it was just a prank".

mad_tortoise 8 years ago

If you sign up to be in the police, or ICE, or any such governmental arm used to jail the poor and protect the rich, and are passively accepting these policies you are supporting them. As such by supporting these despicable policies those enforcing them are scum in my books, and if they are doing these things I see nothing wrong with publishing their information. Maybe now they will be as afraid as the people they persecute.
- zaarn 8 years ago
  
  That still gives you no right to expose them to such harm. I simply do not care what you believe, it's wrong.
- exegete 8 years ago
  
  What if you're a law enforcement agent and do your job ethically (refuse unethical orders, etc.)? Or what if you're pressured by your supervisor to do things that are unethical and know that the consequence of not doing those things is having your and your family's lives ruined? Should your private info be exposed so you can be harassed?
  - eucitizen 8 years ago
    
    i’ll leave this here
    https://www.washingtonpost.com/archive/opinions/1979/10/21/t...
eucitizen 8 years ago

a) the eu is not a country
b) it certainly is not illegal in germany, france
ah, my other comment got flagged immediately. what a shame, @dang
- zaarn 8 years ago
  
  Your other comment is simply just wrong. I doubt every single ICE employee is part of the most horrible acts they carry out.
  And still, this gives you or anyone no right to violate the privacy and endanger their rights. That just puts you on the same level as them, willingly destroying the lives of others because it brings you pleasure, because you believe it to be justice.
  Regarding a), I recommend examining the syntax "Other countries or, for example, the EU", which specifically does not refer to the EU as a country due to the occurrence of the word "or".
  Such behaviour is also certainly a crime in Germany (§238 StGB and related Articles); if your actions cause someone's death or injury, or the threat thereof, you can be prosecuted for it (though you need to actively sue either a specific person or an unknown party).
  - eucitizen 8 years ago
    
    Paragraph 238 StGB absolutely does not hold here (I know firsthand). Also, semantically you formed your phrase in such a way as though Europe as a whole had common legislation, which is not the brightest thing to say.
    But what do I expect from a community where there’s at least one thread daily speaking favourably of breitbart.com? Lastly, ICE employees should think hard about the direction their upper ranks are going. They absolutely have a choice to not become 21st century Gestapo. It’s not too late.
    
    zaarn 8 years ago
    
    Even if 238 does not hold, of which I'm doubtful, you still indirectly exposed another human to potentially deadly harm. I will not condone such immoral and unethical actions.
    I simply do not care if you believe they are the next Gestapo. If you want to be better than the Gestapo, don't expose other people, who are potentially innocent, to harm. End of story.
    Mob rule always leads to bad outcomes and this is encouraging mob rule.

staticelf 8 years ago

So if I understand this correctly, this person wrote a program to gather information about individuals who may or may not be responsible for things related to the government agency, and published that list with the intent that people who are angry about that should do what exactly? Contact them? Threaten them? Stalk them? I see no other possible outcome of this action.

Either you:

1. Don't care

2. Use this information in a bad way, like harassing or stalking the individuals that are doing their job (I'm assuming).

I don't understand people who do this kind of thing and also expect support and sympathy. You have none from me at least. It is great that Medium and GitHub remove such databases to protect individuals who are probably innocent.

tomaha 8 years ago

Sadly I think this is the new way of doing things. It gets more and more acceptable to crucify people using social media. But at the same time I don't agree with your conclusion that this makes it right for Medium and Github to remove this. They should keep out of the judging/censoring business. Steam is sadly the only good example of how it should be done at the moment.
Edit: Because really otherwise they should remove the information from LinkedIn (or make it non-searchable) which also makes it very easy to get and they don't do that.
- staticelf 8 years ago
  
  Yes I agree with you that sadly this is the way people do stuff today. An allegation is more important than truth it seems sometimes.
  That said, we must protect people who are having their information leaked with nefarious intent. In Sweden, for example, you can take this information and find out their addresses and social security numbers, since all of this is basically public information.
  Github and Medium simply don't want their platforms to be used to harass and stalk people, which is a real problem that anyone with experience can attest to.
mad_tortoise 8 years ago

I support the people who do these things, as I think they should live in the same amount of fear as those that they unfairly persecute. They are all complicit in the acts of violence and oppression against innocent civilians. The capitalist system, especially in the USA is designed to oppress the poor and protect the rich. It does this through the use of agencies like ICE. Pretend it's not true, but the facts over the past century overwhelmingly speak for themselves.

azertyxxx 8 years ago

For the sake of completeness and putting moral questions aside, what is currently the best way to publish similarly questionable information in a hard-to-censor way?

TheDong 8 years ago

BitTorrent is fairly hard to censor.
Mega.co has an okay track record at this point, though it's centralized obviously.
Tor hidden service offering a link to the files + bittorrent magnet link may be the best option.
VMG 8 years ago

Tor hidden service
(No not Blockchain)
simplecomplex 8 years ago

BitTorrent with DHT, upload to lots of places.
confounded 8 years ago

pastebin type services are often popular
eucitizen 8 years ago

IPFS
https://ipfs.io/ipfs/QmfBDkAQhgVd1nbREXGocVJr48jLp42zX8gEJKg...

TravelTechGuy 8 years ago

So the author copied data from one Microsoft site (LinkedIn) to another Microsoft site (GitHub).

Both sites use PII to target people and companies: how many annoying emails have you received from LinkedIn this week? And I mean the creepy ones, suggesting contacts based on minute details from your profile, or encouraging you to import all your contacts so they can be spammed?

But I guess if a user does it, it’s doxxing. I wonder if this is political, or maybe Medium and GitHub are just trying to avoid a potential fight with a federal agency.

Ntrails 8 years ago

> I mean the creepy ones, suggesting contacts based on minute details from your profile, or encouraging you to import all your contacts so they can be spammed?
None at all. The only emails I get from linkedin are friend requests and message notifications. That's how I set the thing up.
I'm not sure that's super creepy?

Animats 8 years ago

And now the Verge article has been scrubbed of the link to the Gitlab copy of the "ice-linkedin" repository.

The real casualty here is going to be Linkedin. They don't publicize much how easily their data can be acquired in bulk.

SauciestGNU 8 years ago

You mean the repo that can be found here[0] for the curious?
[0] https://gitlab.com/marge_innovera/ice-linkedin/
luckydata 8 years ago

You mean legally?
- Animats 8 years ago
  
  No, that this will probably cost Linkedin their users who work for government or government contractors.
  - tomnipotent 8 years ago
    
    A user violated the TOS and wrote a screen scraper that pulled down public profile data. Yeah, real "easy". You cannot protect from this sort of behavior outside of completely disabling this functionality altogether. Anything a human has access to, so does software.
    
    johnnyfaehell 8 years ago
    
    > A user violated the TOS and wrote a screen scraper that pulled down public profile data. Yeah, real "easy".
    Yep, that's pretty easy to do. But I'm pretty sure this was even easier since I don't think they wrote a screen-scraper. They just accessed a JSON endpoint.
    > You cannot protect from this sort of behavior outside of completely disabling this functionality altogether. Anything a human has access to, so does software.
    You say that like it changes anything. People don't care if it can be protected against easily, people just care if it can happen at all.

King-Aaron 8 years ago

So, I have to ask as it's not noted anywhere in the article... What does the acronym ICE stand for in this context?

seanhunter 8 years ago

Immigration and Customs Enforcement
kevingadd 8 years ago

United States Immigration & Customs Enforcement, the agency currently responsible for separating children from their parents at the border. (Specifically, that seems to be the motive for publishing employee information - there is a long list of historical grievances people have with ICE, but that's the main topic of discussion these days)

cozzyd 8 years ago

I don't get it... the author has a web site of his own. If he wants to distribute information, why not put it there instead of relying on third parties? (not offering any opinion on whether or not the data should be distributed, just questioning the means).

zdragnar 8 years ago

Throwing it in a git repository ensures that every person who clones it can readily republish it. Hosting said repository on Github, when anticipating an utterly massive spike in traffic, is an easy way to not have to pay for said spike in traffic (either from provisioning or data transfer).
I didn't look at what format the "database" is in, or if the size would make it (im)practical to simply zip it up and email it around, but if the format isn't readily consumable by non-technical people, there wouldn't be any reason to not utilize a tool like git anyway.
- JetSpiegel 8 years ago
  
  Why not host the repo on his site? At least a read only copy.
  `git clone --bare` is enough.
  - zdragnar 8 years ago
    
    Git itself was a bit of a red herring, even though it was somewhat relevant to that specific point. I know nothing about what hosting platform the author is currently using, so to make a quick assumption:
    - hosting on AWS is not free for super high traffic (assuming the free tier can't keep up)
    - serving files from S3 is not free (though it's cheap enough at low read levels, it adds up)
    At a typical level of traffic, the author's current host may be sufficiently inexpensive. Assuming the author was expecting many, many times the usual traffic (even if everyone is kind enough to bare clone), it would be a pointless expense.
    Of course, third party hosting can take the content down... and this is where git became relevant. Assuming the author was more interested in distributing the content than the prestige of being the distributor, even though Github etc. took down the repo, every person who has since cloned is now capable of re-publishing to any new upstream repository of their choosing, on any server.
    Assuming, again, that all of this was the goal, it probably made sense to utilize the free, fast, scalable third party hosting as long as possible rather than risk self-hosting slowing down or collapsing under traffic, or creating a massive spike in cost.
    That's a whole boat load of assumptions, any of which could be wrong. In the realm of possible motivations, though, I think it's a fairly logical conclusion.
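For readers unfamiliar with the mechanics discussed in this subthread, here is a minimal sketch of a read-only bare copy and of republishing from an ordinary clone. It is an illustration under stated assumptions, not what the author actually did: every path and file name below is hypothetical, and local directories stand in for a web root and for a new upstream server.

```shell
#!/bin/sh
set -eu

# Scratch area; all names below are hypothetical stand-ins.
work=$(mktemp -d)
mkdir -p "$work/site"

# A placeholder source repository with a single commit.
git init -q "$work/src"
echo "example data" > "$work/src/data.txt"
git -C "$work/src" add data.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial commit"

# 1. Bare copy: full history, no working tree. This directory is what
#    would be placed under a web server's document root.
git clone -q --bare "$work/src" "$work/site/repo.git"

# 2. Write the index files (info/refs, objects/info/packs) that git's
#    "dumb" HTTP transport needs when served by a plain static file server.
git -C "$work/site/repo.git" update-server-info

# 3. Any ordinary clone carries the full history, so its holder can push
#    it to a fresh upstream of their choosing.
git clone -q "$work/site/repo.git" "$work/reader"
git init -q --bare "$work/elsewhere.git"
git -C "$work/reader" push -q "$work/elsewhere.git" HEAD
```

Step 2 has to be rerun after every update to the bare copy (a `post-update` hook is the usual way), and serving static files this way is read-only, which matches the "at least a read only copy" suggestion. Step 3 is the property zdragnar describes: a takedown of one host does not remove the copies already cloned.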
deftturtle 8 years ago

Exactly what I’m wondering. It’s not even a real fight when you know it’ll be deleted.

dnautics 8 years ago

Is there a broad right to privacy if you work a job that is funded by the taxpayer? I know the states of Maryland and California disclose the salaries of all professors, postdocs, and grad students, and I believe generally public servants as a class.
