Melbourne professor quits after government pressure about reporting data breach

theguardian.com

244 points by nbgl 6 years ago · 39 comments

DarthGhandi 6 years ago

This is horrible but not surprising, the government was told beforehand it was a bad idea and within a few months ended up with egg on their faces. Instead of remedying the situation they shoot the messenger.

Dr Teague was also part of the team that found flaws in the Swiss e-voting system used in Australian state elections. Nothing was done about it and she was written off, the attack was deemed impractical as it required a corrupt official.

She's a national treasure and a regular source of embarrassment for the technologically illiterate bureaucrats responsible for such poor decisions.

  • Aeolun 6 years ago

    > she was written off, the attack was deemed impractical as it required a corrupt official

    I think that hit a bit too close to home for most of the government.

oska 6 years ago

Vanessa Teague:

> I can't believe @healthgovau is still saying "The dataset does not contain the personal information of patients." We have shown many of the patients' records can be easily and confidently identified from a few points of medical or childbirth info.

https://twitter.com/VTeagueAus/status/1236402085974798336

  • ShroudedNight 6 years ago

    > "The dataset does not contain the personal information of patients."

    As far as I can tell, 'personal information' is potentially the only thing this data set contains. Further, the information is so personal that the Australian government hoped that it would be infeasible to cross-reference it with other data and use it to identify the persons involved.

  • DEADBEEFC0FFEE 6 years ago

    She might be referring to something like the SLK581 statistical linking method.

    I did some work with it a few years ago, and you can easily generate the key.
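For readers who have not met it, here is a minimal sketch of how an SLK-581 style key is put together. It assumes the commonly described layout: 2nd, 3rd and 5th letters of the family name, 2nd and 3rd letters of the given name, date of birth as DDMMYYYY, and a one-digit sex code, with `2` standing in for missing letters. The function name and the name-cleaning rules are illustrative assumptions, not a reference implementation.

```python
from datetime import date

def slk581(family_name, given_name, dob, sex_code):
    """Build an SLK-581 style key (illustrative; cleaning rules are assumed)."""
    def pick(name, positions):
        # Keep letters only, so hyphens, spaces and apostrophes are ignored.
        letters = [c for c in name.upper() if c.isalpha()]
        # Positions are 1-based; '2' pads names that are too short.
        return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)

    return (
        pick(family_name, (2, 3, 5))
        + pick(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + str(sex_code)
    )

key = slk581("Citizen", "Jane", date(1980, 2, 14), 2)
short = slk581("Ng", "Li", date(2001, 12, 1), 1)
```

The key is built entirely from name, date of birth and sex, so anyone who already knows those details about a person can regenerate it. That makes it a linkage key, not a secret.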

Thorrez 6 years ago

> The breach so shocked the government, the then attorney general, George Brandis, quickly announced plans to criminalise the act of re-identifying previously de-identified data, although ultimately the legislation never passed before the 2019 election.

If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?

  • rs23296008n1 6 years ago

    If you're going to start using logic and reason with this issue then that government will simply outlaw those as well. This government has already set a precedent of having overridden the basic limits of mathematics before. See also: anything to do with encryption.

  • shakna 6 years ago

    > If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?

    The letter sent to the university [0] claims that re-identifying information is actually illegal, according to the department's understanding. (Never mind that they also admit that particular law is completely irrelevant to the work of the researcher.)

    [0] https://www.righttoknow.org.au/request/correspondence_on_re_...

    • tastroder 6 years ago

      Fascinating read, thank you. Another kicker in there seems to be at the bottom of page 2, start of page 3. They basically assert that now that the data has been taken down no harm can be done anymore and, to top it off, suggest that presenting the findings at a GovData conference is unjustified because there subsequently is "no public interest" anymore.

  • dathinab 6 years ago

    You don't have to go that far.

    Anyone wanting to abuse the information (i.e. a criminal) would not really care to commit a crime by reidentifying so this law would only prevent people who want to help from doing so.

    In the end there are always intelligent criminals (or foreign countries acting against your country for their own interest). So you will always be able to buy deanonymized data on the black market. What's more, the people doing it for that reason can use leaked/stolen datasets for deanonymization instead of just public data, making it potentially much easier.

    • throwawayjava 6 years ago

      Consumer data brokers are a legal (non-criminal) business and they would definitely reidentify and then sell information if it weren't illegal.

  • Aeolun 6 years ago

    This sounds so much like a ‘stick my head in the sand’ policy that it’s unreal. Who the hell would stop doing it because it’s illegal...

DoofusOfDeath 6 years ago

When organizations claim to have "anonymized" a data set, what exactly does that mean?

I.e., do they mean that nobody they talked to could think of a way to recover the identity of even one individual in the set with 100% certainty? Or is there some information-theoretical or legal standard of anonymization they're claiming to have met?

  • MaulingMonkey 6 years ago

    > When organizations claim to have "anonymized" a data set, what exactly does that mean?

    For "organizations" in general? It means approximately nothing, or if you're feeling particularly generous, it means "we probably remembered to drop the column containing your social security number before publishing this data... this time". You're asking exact specifics of a vague and broad category.

    There are some legal standards, information theory, and non-legal organization standards that might be met in some cases - involving adding noise or removing data / making it sparse. https://en.wikipedia.org/wiki/Data_re-identification goes into all the ways that it can go wrong despite the best of intentions. My basic take on this all is: data "always" gets more identifying, not less. Two datasets that were successfully anonymized individually can still be correlated to de-anonymize some or all of the data when combined. Even organizations applying information theory with the best of intentions and proper diligence will eventually make a mistake.
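The "two datasets combined" point can be sketched as a simple linkage on shared quasi-identifiers. Everything below is made up for illustration: the tables, the field names and the `link` helper are assumptions, and real attacks usually use fuzzier matching than an exact join.

```python
# Two toy tables, each harmless-looking on its own. All values are invented.
health = [  # released without names
    {"birth_year": 1975, "postcode": "3052", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "3052", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1990, "postcode": "3000", "sex": "F", "diagnosis": "fracture"},
]
public = [  # e.g. a register that has names but no health data
    {"name": "A. Example", "birth_year": 1975, "postcode": "3052", "sex": "F"},
    {"name": "B. Sample", "birth_year": 1990, "postcode": "3000", "sex": "F"},
    {"name": "C. Person", "birth_year": 1990, "postcode": "3000", "sex": "F"},
]
QUASI = ("birth_year", "postcode", "sex")

def link(released, auxiliary, keys=QUASI):
    """Pair a name with a released row when the quasi-identifiers
    match exactly one record on each side."""
    def index(rows):
        out = {}
        for r in rows:
            out.setdefault(tuple(r[k] for k in keys), []).append(r)
        return out

    rel, aux = index(released), index(auxiliary)
    return [
        (aux[k][0]["name"], rel[k][0])
        for k in rel
        if k in aux and len(rel[k]) == 1 and len(aux[k]) == 1
    ]

matches = link(health, public)
```

Only the first health record is re-identified here. The third is shielded because two people in the public table share its quasi-identifiers, which is the intuition behind k-anonymity.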

  • emmelaich 6 years ago

    In this particular example, they produced a random number which used the real id[0] as a seed, then mixed the result with the original id. It was not enough, and, as Teague et al note:

    > Indeed, encryption was not necessary – a randomly chosen unique number for each person would have worked.

    Scroll down from here: https://www.oaic.gov.au/privacy/privacy-decisions/investigat...

    [0] The data had ids for providers (e.g. doctors) as well as patients.
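A rough sketch of why deriving a pseudonym from the id itself is fragile. This is not the department's actual algorithm; the PRNG, the XOR "mixing" and the id range are assumptions chosen to illustrate the general weakness. If the pseudonym is a deterministic function of the id and the id space is small, an attacker can simply try every id.

```python
import random
import secrets

def seeded_pseudonym(real_id):
    """Hypothetical weak scheme: PRNG seeded with the real id, then mixed with the id."""
    rng = random.Random(real_id)          # deterministic for a given seed
    return rng.getrandbits(32) ^ real_id  # the XOR mixing step is an assumption

def recover(pseudonym, id_space):
    """Brute force: keep every candidate id that reproduces the pseudonym."""
    return [c for c in id_space if seeded_pseudonym(c) == pseudonym]

def random_pseudonyms(real_ids):
    """Alternative from the quote: an unrelated random unique number per id,
    with the mapping kept private (or discarded)."""
    table, used = {}, set()
    for rid in real_ids:
        p = secrets.randbits(64)
        while p in used:  # retry on the (very unlikely) collision
            p = secrets.randbits(64)
        used.add(p)
        table[rid] = p
    return table

leaked = seeded_pseudonym(48213)
candidates = recover(leaked, range(100_000))
table = random_pseudonyms(range(1000))
```

With the random table there is nothing to brute-force: the pseudonym carries no information about the id unless the table itself leaks.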

  • throwawayjava 6 years ago

    There are some mathematical definitions [1], but the fundamental problem is that with enough cross-referencing between databases it's hard to say anything for sure [2]. You never know what data other people might publish in the future.

    I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.

    [1] https://en.wikipedia.org/wiki/K-anonymity

    [2] https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
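As a concrete reading of the k-anonymity definition in [1]: a table is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k rows. The helper and toy rows below are illustrative assumptions.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same values for all quasi-identifiers (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values()) if groups else 0

rows = [
    {"age_band": "30-39", "postcode": "30**", "condition": "asthma"},
    {"age_band": "30-39", "postcode": "30**", "condition": "flu"},
    {"age_band": "40-49", "postcode": "30**", "condition": "diabetes"},
    {"age_band": "40-49", "postcode": "30**", "condition": "asthma"},
    {"age_band": "40-49", "postcode": "31**", "condition": "flu"},
]

k_age = k_anonymity(rows, ["age_band"])
k_both = k_anonymity(rows, ["age_band", "postcode"])
```

Adding a second quasi-identifier drops k from 2 to 1 in this toy table. A dataset published later can have the same effect by turning another column into a quasi-identifier, which is why it is hard to say anything for sure.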

  • DEADBEEFC0FFEE 6 years ago

    In healthcare there is usually an ethics panel that will look at the data and look for ways to reduce the risk of re-identification.

    The common example is the one-legged child with cancer from a remote town. You can remove all the PII columns and it's still pretty easy to find that person.

    • rzzzt 6 years ago

      One way around that is to drop all cases below a certain occurrence threshold, i.e. if there aren't at least 1000 people in the same town with the same condition, they aren't getting into the dataset.

      (The downside is that rare diseases might fall through the cracks.)
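The thresholding idea might look something like this. The function name, field names and the tiny threshold are assumptions for illustration; a real release would use a far larger cutoff, such as the 1000 mentioned above.

```python
from collections import Counter

def suppress_small_groups(rows, keys, threshold):
    """Drop every row whose combination of `keys` occurs fewer than `threshold` times."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] >= threshold]

rows = (
    [{"town": "Bigtown", "condition": "asthma"}] * 5
    + [{"town": "Bigtown", "condition": "rare disease"}] * 1
    + [{"town": "Smallville", "condition": "asthma"}] * 2
)

kept = suppress_small_groups(rows, ("town", "condition"), threshold=3)
```

Only the five Bigtown asthma rows survive. The single rare-disease record is dropped, which is exactly the downside noted above.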

alfiedotwtf 6 years ago

How long before she gets raided and her copy of the dataset and research gets taken away

... all the while as the government forgets that it’s all available on the internet

raxxorrax 6 years ago

This is the worst kind of personal data leak. Government cannot keep any data safe. The only way is to not collect the information. The reaction of the government is predictable and poor.

Now it has hit Australia, but it could have been any other country since data collection seems to be en vogue. Probably gives the impression of control, the usual.

basicplus2 6 years ago

There need to be Australian Standards developed that everyone must comply with to anonymise personal data.

eloop 6 years ago

If the university followed through on that last paragraph why did she resign?

kop316 6 years ago

To anyone coming to the comments, the title is misleading. The health department is pressuring her "to stop her speaking out about the Medicare and PBS history of over 2.5 million Australians being re-identifiable online due to a government bungle."

aschatten 6 years ago

The title is just horrible.

  • dang 6 years ago

    The best way to complain about a title is to suggest a better one. Better means: more accurate and neutral, preferably using representative language from the article. When someone suggests a better title, we're happy to change it.

    Edit: I've taken a crack at fixing it now.

  • nbgl (OP) 6 years ago

    Yeah, I agree. I had copied it verbatim from the article.

forkexec 6 years ago

Pardon my ignorance, but it seems like there should be standard ways of irrevocably anonymizing data and reversible means given a private key.

Off the top of my head, only the latter is necessary: encrypt with a random key and throw the key away, and you get the equivalent of the former (or run the plaintext through SHA-3 20 times in feedback instead). Say 100 rounds of AES-256 in feedback. Fixed integer-only fields could be XORed with a private key of the length of the field (OTP).

Any other ideas, please add a comment.
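One caveat on the unkeyed "SHA-3 20 times" variant, sketched below under the assumption that the hashed field is a birth date in ISO format: when a field has few possible values, an attacker can hash every candidate and build a lookup table, and extra rounds only add a constant factor. The helper names and the date range are illustrative.

```python
import hashlib
from datetime import date, timedelta

def iterated_sha3(value, rounds=20):
    """Feed the digest back into SHA3-256 `rounds` times (no key, no salt)."""
    digest = value.encode()
    for _ in range(rounds):
        digest = hashlib.sha3_256(digest).digest()
    return digest.hex()

def build_lookup(start, end):
    """Precompute hash -> birth date for every day in [start, end]."""
    table = {}
    day = start
    while day <= end:
        table[iterated_sha3(day.isoformat())] = day
        day += timedelta(days=1)
    return table

published = iterated_sha3("1984-07-23")  # what a released row would contain
lookup = build_lookup(date(1900, 1, 1), date(2020, 12, 31))
recovered = lookup[published]
```

This does not apply to a scheme with a genuinely random secret key that is then discarded. It only shows that hashing alone does not hide low-entropy values.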

  • akiselev 6 years ago

    Yes, turning data into a bunch of (ideally) random bits using encryption is an effective way of anonymizing.

  • yoloClin 6 years ago

    Just because the primary key is gone doesn't mean other data can't be cross referenced. Birth dates with external sources, addresses with public registers, etc.
