Reidentification ban is not a solution (blog.lukaszolejnik.com)

The details are very important here. Would the proposed ban really affect researchers proving that anonymization schemes don't work, or would it just apply to attempts to reidentify real people in real user data?
It seems reasonable that a company be prohibited from actively trying to ascertain the identity of users who have tried to remain anonymous. The ease of doing it is rather irrelevant. I'm kind of tired of this tech culture meme that something should be allowed because it is easy; how easy something is to do has no bearing on how legal it should be. As an extreme example, killing a man is rather easy.
EDIT:
Here is the bit from the source document that the blog author is responding to:
>Create a new offence of intentionally or recklessly re-identifying individuals from anonymised or pseudonymised data. Offenders who knowingly handle or process such data will also be guilty of an offence. The maximum penalty would be an unlimited fine.
"intentionally or recklessly re-identifying individuals" seems to limit this to real user data, not researchers evaluating anonymization schemes. As with any law, it is important to see what the eventual proposed legislation looks like, but I don't think there's anything to worry about here for legitimate security research.
> Would the proposed ban really affect researchers proving that anonymization schemes don't work, or would it just apply to attempts to reidentify real people in real user data?
There's not a clear line between the two. If a company publishes a list of "anonymized" email addresses, should I be arrested for putting one of the strings into Google to see if it's just an MD5 hash?
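To make that concrete: here's a minimal sketch, assuming the published "anonymized" value is an unsalted MD5 of the email address (the addresses below are made up). Anyone holding a list of candidate emails can reverse it in a few lines:

    import hashlib

    # The published "anonymized" datum: really just md5(email)
    published_token = hashlib.md5(b"alice@example.com").hexdigest()

    # Candidate addresses the attacker already has
    candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]

    for email in candidates:
        if hashlib.md5(email.encode()).hexdigest() == published_token:
            print(f"Re-identified: {published_token} -> {email}")
            break

No Google required; the hash function itself does the re-identifying.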
> The ease of doing it is rather irrelevant. I'm kind of tired of this tech culture meme that something should be allowed because it is easy.
The full argument is of the form "X is easy to do and hard to detect, so it would require police state tactics to have any hope of enforcing a law against it". The war on drugs is the classic example for this. Murder isn't; killing someone may be relatively easy, but it's usually obvious when it happens and it's hard to avoid leaving evidence of your involvement.
>The full argument is of the form "X is easy to do and hard to detect, so it would require police state tactics to have any hope of enforcing a law against it".
Plenty of crimes go unsolved in most cases. Littering, for example.
When you do catch an internet marketing company deanonymizing data, though, you can throw the book at them. Strong penalties can still deter others, even if any given offender is unlikely to get caught.
As far as I understand the GDPR, email hashes wouldn't be "anonymous" data at all; they'd be considered pseudonymised (and therefore still PII).
I mean, the problem is that this makes well-intentioned sites like haveibeenpwned.com illegal in the UK (with criminal sanctions), as they attempt to re-identify data that comes from a breach.
But on the other hand, I don't see why processing PII that comes from a data breach with the intent of de-anonymising it should be legal.
Maybe protections should be in place for security researchers, but how do you distinguish between them and malicious actors?
> I'm kind of tired of this tech culture meme, that something should be allowed because it is easy.
It's not just that it's easy; it's that it can be done merely by thinking in a particular way about information that's public or that was freely given to the person doing it. I'm not sure the fact that the thinking is done mainly with the aid of an algorithm changes the fundamental concept.
It probably shouldn't be illegal to think about things or to process data you've obtained legitimately whether it's easy or hard.
And what of the examples in the article about reidentifying Netflix users from public data, or reidentifying people from Australian census data? These two incidents could no longer legally be publicly written about. We'd be left discussing only theoretical applications (i.e. "this is why MD5 is weak" vs "this is how you can deanonymize this real-world complete example"), which simply never has the same impact.
Do we have to start posting these on Pastebin instead of Medium now? Can 3rd parties report them during a security audit?
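For what it's worth, both incidents were linkage attacks: join an "anonymized" release against a public dataset on shared quasi-identifiers. A toy sketch of the idea, with entirely invented records:

    # "Anonymized" release: names stripped, quasi-identifiers kept
    anonymized = [
        {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
        {"zip": "90210", "birth_year": 1970, "sex": "M", "diagnosis": "diabetes"},
    ]

    # Public auxiliary data, e.g. a voter roll
    public = [
        {"name": "A. Smith", "zip": "02139", "birth_year": 1985, "sex": "F"},
        {"name": "B. Jones", "zip": "90210", "birth_year": 1970, "sex": "M"},
    ]

    QUASI_IDS = ("zip", "birth_year", "sex")

    def key(record):
        return tuple(record[q] for q in QUASI_IDS)

    names_by_key = {key(p): p["name"] for p in public}

    for row in anonymized:
        name = names_by_key.get(key(row))
        if name:
            # The "anonymous" record, re-identified by the join
            print(f"{name} -> {row['diagnosis']}")

Doing this join on real datasets and publishing the result is exactly what the Netflix and census researchers did; under a broad reading of the proposal, the write-up itself becomes the offence.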
Even if this has all the good intentions of preventing scummy marketers from scraping data, the execution, if history is any indicator, will likely result in a law that can be used to throw people in jail for reversing an MD5 hash.
Anything that attempts to ascribe intention to code is going to run into a lot of corner cases; see the long history of "copying" programs vs copyright law.
"Knowingly" is similarly vague: are you knowingly running every line of code executing on your machine right now? How would you be sure?
> "Knowingly" is similarly vague: are you knowingly running every line of code executing on your machine right now? How would you be sure?
That's exactly the point. If you perform the act unknowingly, you're innocent of the offence.
As far as I can tell the Statement of Intent[1] references this only in the following paragraph:
[We will:] "Create a new offence of intentionally or recklessly re-identifying individuals from anonymised or pseudonymised data. Offenders who knowingly handle or process such data will also be guilty of an offence. The maximum penalty would be an unlimited fine."
Following that there is also:
"Create a new offence of altering records with intent to prevent disclosure following a subject access request. The offence would use section 77 of the Freedom of Information Act 2000 as a template. The scope of the offence would apply not only to public authorities, but to all data controllers and processors. The maximum penalty would be an unlimited fine in England and Wales or a Level 5 fine in Scotland and Northern Ireland."
"Widen the existing offence of unlawfully obtaining data to capture people who retain data against the wishes of the controller (even if the they initially obtained it lawfully)."
"Protection for journalists and whistleblowers - The important role of journalists and whistleblowers in holding organisations to account and underpinning our free press will be protected by exemptions."
Which seems more like creating clear legal charges for activity that is already illegal.
[1] https://www.gov.uk/government/uploads/system/uploads/attachm...
Sounds good to me. Following the author's logic, why make anything an offence? It doesn't stop people from doing it anyway.
It seems like this is intended to stop dodgy marketing companies re-identifying data, not hackers. And there doesn't need to be some technical way to know if they've done it. Any company can do illegal stuff and get away with it. They don't, because if they are caught (and all that takes is one employee coming forward - and making it an offence to knowingly handle that data makes that more likely) they are in a lot of trouble (in this case, an unlimited fine).
Why can't researchers work with fake data sets? If my data has been anonymised, I don't care who the person is; I don't want them re-identifying it. Maybe I'm not seeing the necessity for this, and if it exists, I'm sure there will be an exception for researchers when the final Act comes around. Seems like panic over nothing for now.
I was reading about the UK's upcoming GDPR implementation on the BBC earlier, and I assumed the ban on reidentification would apply to service providers and businesses etc., and not to researchers or private individuals with legitimate intentions.
Is this not the case?
The author is clearly mistaken. There are several things that are possible in the physical world yet illegal, e.g. forging signatures, breaking doors open, breaking into a parked car, sending spam emails, etc. Making reidentification illegal is a great step, since it lets the legal machinery do its job.
The reality of data privacy is that it's impossible to guarantee anonymity while keeping data useful.
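A toy illustration of that trade-off, with invented ages and a deliberately crude generalization scheme: to give every record k-anonymity (each person indistinguishable from at least k-1 others), you have to coarsen exactly the attributes that make the data useful.

    from collections import Counter

    ages = [23, 24, 25, 31, 33, 38, 41, 47, 52, 58]  # made-up quasi-identifier

    def generalize(age, width):
        """Replace an exact age with a range `width` years wide."""
        lo = (age // width) * width
        return f"{lo}-{lo + width - 1}"

    def smallest_group(values, width):
        return min(Counter(generalize(v, width) for v in values).values())

    k = 3
    width = 1
    while smallest_group(ages, width) < k:
        width += 1  # keep coarsening until every range covers >= k people

    print(f"{width}-year ranges needed for {k}-anonymity:")
    print(sorted({generalize(a, width) for a in ages}))

On this toy data you end up with 15-year age ranges before every group reaches size 3; the more anonymity you demand, the less the released data can say.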
A reidentification ban enshrines coherent guidelines in law. It's a good step forward.
I am saying this as a researcher who has signed several agreements with US government agencies which had a reidentification ban clause, with violations punishable as a felony offense.
Watch this be used selectively to prevent the public from finding out how capital flows and which politicians it buys.
Publish the name of the owner of the company who built the bridge that collapsed due to cost-cutting? Now now, he didn't want that public, that's reidentification! He even hid behind several shell companies, so you can't claim you didn't know he wanted to stay anonymous.