Show HN: Fincher, a steganography tool for text
github.com
Very interesting tool, although storing the data as typos does seem a bit visible and prone to mistaken 'correction'. Other approaches to consider might be:
* Changing punctuation for visually identical, but different characters. This would not work for printed documents however.
* Encoding only 'believable' typos, e.g. it's vs. its. You could encode a binary stream across all instances of it(')s, or other substitutions.
* Encoding the stream in whitespace, e.g. two spaces vs. one after a full stop. Printed documents would be lossy, though (full stops at line endings would be ambiguous), although error detection/correction codes can help with that.
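To make the whitespace idea from the last bullet concrete, here is a minimal sketch in Python (not how Fincher works, just an illustration). It assumes one space after a full stop means 0 and two spaces means 1, and it skips full stops at line ends or at the end of the text, since those carry no spacing. The names `embed_bits` and `extract_bits` are made up for this example.

```python
import re

# A full stop followed by one or two spaces and then a non-space character.
# Full stops at a line end or at the end of the text do not match, so they carry no bit.
_GAP = re.compile(r"\. {1,2}(?=\S)")


def embed_bits(cover: str, bits: str) -> str:
    """Encode bits as one space (0) or two spaces (1) after each full stop."""
    slots = len(_GAP.findall(cover))
    if len(bits) > slots:
        raise ValueError(f"cover text has {slots} slots, need {len(bits)}")
    remaining = iter(bits)

    def repl(match):
        bit = next(remaining, "0")  # unused slots fall back to a single space
        return ".  " if bit == "1" else ". "

    return _GAP.sub(repl, cover)


def extract_bits(stego: str) -> str:
    """Read one bit per usable full stop: two spaces -> 1, one space -> 0."""
    return "".join("1" if m.group() == ".  " else "0" for m in _GAP.finditer(stego))


cover = "One. Two. Three. Four."
stego = embed_bits(cover, "101")
print(repr(stego))
print(extract_bits(stego))
```

Note that extraction returns one bit per slot, so a real scheme would also need a length prefix or terminator, plus the error correction mentioned above if the text is going to be reflowed or printed.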
Typical OCR errors would be interesting too: confusing the letter "n" with the letters "ri", for example.
It would be visually challenging to detect (and also, maybe, difficult for an OCR engine).
Yeah, I need to work on making the displacements and replacements a bit more context-aware (& probably linguistically aware). There are cases where it can "replace" a character with the same character, for example.
I do like your idea about visually similar but distinct character replacement. That would be a really fun one to implement.
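For what it's worth, a rough sketch of the lookalike-character idea in Python might look like the following. The replacement table is purely an assumption for illustration: how close these code points look depends on the font, and the cover text is assumed not to contain any of the lookalikes already. `hide`, `reveal`, and `restore` are hypothetical names, not part of Fincher.

```python
# Hypothetical lookalike table: ASCII punctuation -> a visually similar code point.
LOOKALIKES = {
    ",": "\u201a",  # SINGLE LOW-9 QUOTATION MARK
    ";": "\u037e",  # GREEK QUESTION MARK
    "-": "\u2010",  # HYPHEN
}
REVERSE = {lookalike: plain for plain, lookalike in LOOKALIKES.items()}


def hide(cover: str, bits: str) -> str:
    """Each replaceable punctuation mark carries one bit: swapped = 1, untouched = 0."""
    pending = list(bits)
    out = []
    for ch in cover:
        if ch in LOOKALIKES and pending:
            out.append(LOOKALIKES[ch] if pending.pop(0) == "1" else ch)
        else:
            out.append(ch)
    if pending:
        raise ValueError("cover text has too few punctuation marks")
    return "".join(out)


def reveal(stego: str) -> str:
    """Read the bits back: a lookalike is 1, the plain ASCII mark is 0."""
    bits = []
    for ch in stego:
        if ch in REVERSE:
            bits.append("1")
        elif ch in LOOKALIKES:
            bits.append("0")
    return "".join(bits)


def restore(stego: str) -> str:
    """Map every lookalike back to its ASCII original."""
    return "".join(REVERSE.get(ch, ch) for ch in stego)


cover = "Well, yes; it is a well-known trick, honestly."
stego = hide(cover, "1011")
print(reveal(stego))
```

One caveat: Unicode normalization can undo some swaps. U+037E, for instance, is canonically equivalent to the ordinary semicolon, so NFC normalization anywhere along the way would erase that bit.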
I worked on something very similar; my version also mutated punctuation, swapped common phrases/words for synonyms, and re-ordered sentences. Instead of steganography, the purpose was to create identifiable mutations in the text, acting as a canary to tie disclosures back to specific recipients. Each party receiving a confidential document got slight mutations unique to their own copy, so a copy/paste of even a fairly small fragment (or fragments) could be used to identify the owner of that version.
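As a toy illustration of that kind of per-recipient canary (not the actual system described above, whose details aren't given), here is a sketch in Python. It assumes a small hand-picked list of interchangeable word pairs and gives each recipient the bits of their position in the recipient list; `VARIANTS`, `mark`, and `identify` are all made up for the example.

```python
import re

# Hypothetical interchangeable word pairs; position 0 encodes bit 0, position 1 encodes bit 1.
VARIANTS = [("begin", "start"), ("large", "big"), ("quickly", "rapidly"), ("show", "display")]
# Maps each word to (slot index, bit value).
WORD_INFO = {word: (slot, bit) for slot, pair in enumerate(VARIANTS) for bit, word in enumerate(pair)}
WORD_RE = re.compile(r"\b(" + "|".join(WORD_INFO) + r")\b")


def fingerprint(index: int) -> list:
    """Bits of the recipient's index, least significant bit first, one per slot."""
    return [(index >> slot) & 1 for slot in range(len(VARIANTS))]


def mark(text: str, index: int) -> str:
    """Rewrite every variant word so it spells out the recipient's fingerprint."""
    bits = fingerprint(index)

    def choose(match):
        slot, _ = WORD_INFO[match.group()]
        return VARIANTS[slot][bits[slot]]

    return WORD_RE.sub(choose, text)


def identify(fragment: str, recipients: list) -> list:
    """Return the recipients whose fingerprint agrees with every slot seen in the fragment."""
    observed = {}
    for match in WORD_RE.finditer(fragment):
        slot, bit = WORD_INFO[match.group()]
        observed[slot] = bit
    return [name for index, name in enumerate(recipients)
            if all(fingerprint(index)[slot] == bit for slot, bit in observed.items())]


recipients = ["alice", "bob", "carol"]
text = "We begin with a large sample and quickly show results."
leaked = mark(text, 1)  # bob's copy
print(leaked)
print(identify(leaked, recipients))
```

A short fragment only narrows down the candidates to those consistent with the slots it happens to contain, which is why a real scheme would want many more mutation points than recipients.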
This seems like a useful tool. Is it a product?
No, sorry. It was constructed to catch an employee leaking confidential company information to the media. I do not know how you could make this into a product and still maintain its reliability -- the more widely known the mutations are, the easier it would be to mitigate the watermarking.
I did one of these many years ago, basically just abusing lex/flex: https://github.com/countrygeek/stegparty/blob/master/stegpar...
This is similar to steganos (https://github.com/fastforwardlabs/steganos), which tries to limit itself to changes that do not change the meaning of the text.
Oh, very cool! I like the data model for the changes. I've been thinking about adding an analysis pass using something similar to make it possible to implement more sophisticated strategies. The tricky bit will be retaining the stream-based approach.
First Crystal codebase I've seen! niccce.