Interactive Text Prediction Explainer

pudding.cool

55 points by codenberg 7 years ago · 11 comments

elicash 7 years ago

The Mueller report got me thinking.

How hard would it be to put together a machine learning tool that guessed at the redacted material based on:

(a) the context of the surrounding words, and (b) in cases where just a couple of words in a sentence are redacted, the pixel width of the redaction, which constrains the combinations of letters that would perfectly "fit" that space?

And what would be the legality?

  • keyle 7 years ago

    I haven't read it, but I suspect they blacked out full sentences. Your sample data would have to include thousands of other reports by the same author for the suggestion to even remotely make sense.

    Even then, that would be beyond what's currently possible.

    Some forensic analysis of the blacked-out sections is potentially more viable.

    Interesting thought though.

  • nmstoker 7 years ago

    I'm no lawyer, but it seems hard to imagine this could be illegal: the output would be supposition rather than fact, and only someone in possession of the unredacted report would know for sure whether it was right.

    • delish 7 years ago

      While we're speculating:

      If the algorithm predicted, "Then the CIA extraordinary-rendition'd $particular_person_of_interest_to_people_with_top_secret_clearance to a black site"

      you'd hope to get a judge who's technical enough to understand that the algorithm didn't "know;" it just "predicted."

      Point being, I don't personally have much faith that the justice system evaluates tech the way we would.

  • gattilorenz 7 years ago

    I don't know about the legality, but with a good/recent language model it could be quite feasible. The problem is getting the good language model.

    • elicash 7 years ago

      I assume you'd have to also give it a list of names that are possibly associated with the specific item.

  • polm23 7 years ago

    You can't do this because language models are not magic.

    Let's say you write a sentence like "[your name] picked up the book," censor your name, and let a language model fill it in. It might give you "John picked up the book" or "Mary picked up the book", which are grammatically correct, but it has no reliable way of guessing your name because it has no information about the real-world situation. Language models work by predicting the most likely filler for a slot, so anything they can predict was never surprising in the first place.
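A toy sketch of this point, with invented frequency counts standing in for a real model and corpus: a slot-filler ranks candidates by how likely they are in general, so it surfaces frequent names and assigns a rare "true" name essentially zero probability.

```python
# Hypothetical counts standing in for corpus statistics; not a real model.
from collections import Counter

# Invented frequencies for fillers of "<NAME> picked up the book."
name_counts = Counter({"John": 5000, "Mary": 4200, "Robert": 3100, "Zyxwv": 1})

def fill_blank(counts, k=2):
    """Return the k most likely fillers with normalized probabilities."""
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common(k)]

# Frequent names dominate; the rare true name gets ~0 probability,
# which is exactly why a redacted secret can't be recovered this way.
print(fill_blank(name_counts))
```

A real masked language model scores fillers the same way in spirit, just with context-sensitive probabilities instead of raw counts; the failure mode is identical.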

    Emily Bender wrote about this on Twitter.

    https://twitter.com/emilymbender/status/1119081131234611201

    If you want to use pixel data to fill in text that's a different approach that could work if they did a poor job with black bars, though it seems unlikely they'd do that.

    • elicash 7 years ago

      Would they need to have done a poor job with the black bars? For example, take the sentence: First, ______________ Next, this happened.

      If you know where the comma ends and where the next sentence begins, you also know the exact pixel width in between. Now assume there are no proper nouns in the missing text. There probably aren't many combinations of letters that fit the space exactly while still forming valid words, given that different letters have different pixel widths.

      But the idea wouldn't even be to come up with the CORRECT answer. It would be to assign a score to different options of what it could possibly say.

      I agree you couldn't do paragraphs.
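The width-fitting idea above can be sketched as a filter over a shortlist of candidate phrases, assuming you know the font's per-character pixel widths. The width table below is invented for illustration, not measured from any actual report.

```python
# Invented per-character pixel widths (a real attempt would measure the
# report's actual font); used to filter candidates by rendered width.
CHAR_WIDTH = {c: 10 for c in "abcdefghijklmnopqrstuvwxyz"}
CHAR_WIDTH.update({"i": 4, "l": 4, "j": 5, "f": 6, "t": 6, "m": 15, "w": 15, " ": 5})

def text_width(text):
    """Total pixel width of a phrase under the invented width table."""
    return sum(CHAR_WIDTH[c] for c in text)

def candidates_that_fit(phrases, target_px, tolerance=2):
    """Keep only candidates whose rendered width matches the redaction bar."""
    return [p for p in phrases if abs(text_width(p) - target_px) <= tolerance]

# Score a shortlist of plausible fillers against a measured bar width.
options = ["he met them", "she left early", "they all met"]
print(candidates_that_fit(options, target_px=text_width("he met them")))
```

In practice the surviving candidates would then be ranked by a language model, combining the width constraint with the context constraint from the comment above.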

  • ShamelessC 7 years ago

    https://threadreaderapp.com/thread/1119118085443559425.html

    This guy appears to be attempting exactly this.
