Invisible Characters

invisible-characters.com

131 points by 0xbkt 3 years ago · 35 comments

orbital-decay 3 years ago

My favorite is U+202E Right-to-Left Override, which doesn't appear to be listed there. A surprising number of UIs (apps, sites) can be broken with it, as they were never tested with right-to-left writing direction in mind. Even a Unicode reference website that I just used to recall the code is broken by it. [0] Entering RLO into arbitrary input forms for fun can bend spacetime, I swear.

[0] https://unicode-table.com/en/202E/
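
For anyone who wants to check their own input handling against the RLO trick described above, here is a minimal Python sketch that flags explicit bidirectional control characters in a string. The name `find_bidi_controls` and the sample filename are made up for illustration, and the set deliberately covers only the explicit embedding, override, and isolate controls, not implicit marks such as U+200E or U+200F.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# pop, and the isolate controls. Illustrative set, not exhaustive.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(text):
    """Return (index, code point, name) for each bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# Hypothetical filename: the RLO makes the tail display reversed in
# bidi-aware UIs, so the visible name can appear to end in "jpg".
sample = "invoice_\u202egpj.exe"
print(find_bidi_controls(sample))
```

Whether to reject, strip, or escape such characters depends on the application; legitimate right-to-left text can contain some of them.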

  • emsixteen 3 years ago

    I've blissfully ignored RTL so far, but I know that I shouldn't - I just can't imagine the pain of actually figuring out how to deal with it. :')

    My favourite character is the Greek question mark;

  • iamevn 3 years ago

    > Even a Unicode reference website that I just used to recall the code is broken by it.

    I like to imagine that sites like this have noticed the bug but have the sense of humor to choose not to fix it.

  • andreareina 3 years ago
interroboink 3 years ago

This is another good reason to have a text editor you really trust, which can show you these things. Whether it's different line-endings or weird invisible space stuff, I know I can just open it in Vim and figure out what's really going on pretty quickly. Wasted a lot of time earlier in my life on that nonsense (:

  • TacticalCoder 3 years ago

    I agree with you.

    I've got my Emacs set up to display in "bold, fluo foreground and a dark background underlined by a pink line" (yes, literally that obnoxious) any character which is not part of a list of characters I consider to be acceptable. And it's configured to show any "zero width" character as if it had a width. So any "invisible character" as well as any "invisible zero width character" does appear as a black square, underlined with a pink line.

    And that for any buffer/file.

    • enchiridion 3 years ago

      Can you share that config? Sounds useful!

      • TacticalCoder 3 years ago

        Sure... For a start I have my scratch buffer showing a few Unicode characters, one trailing spacing character on purpose (to be sure I can see it's highlighted), a zero-width-non-joiner 0x200C and a Hangul filler 0x3164 (may add some from TFA btw). This helps me quickly verify, upon startup, that my setup is working.

        I configured all that literally years ago so I don't remember where's what but here's what I've got:

            ;; probably cargo-culted from somewhere
            (update-glyphless-char-display 'glyphless-char-display-control '((format-control . empty-box) (no-font . empty-box)))
            
            ;; See https://emacs.stackexchange.com/questions/65108
            (set-face-background 'glyphless-char "purple")
        
        And then I've got this too (requires markchars.el):

            (markchars-global-mode)
        
        With:

            (defface markchars-heavy
              '((t :underline "magenta"))
              "Heavy face for `markchars-mode' char marking."
              :group 'markchars)
        
        It should get you started.

        (and, yup, I know it's overkill but I like it that way)

  • nervuri 3 years ago

    Vim does not display them all. The only program I checked which displays all such characters is `less -U`. You can test using this file:

    https://gitlab.com/nervuri/nervuri.net/-/raw/master/gopher/z...

    • interroboink 3 years ago

      Thanks for this! Good to add to the ol' repertoire (:

      Looks like the only one Vim misses is U+17B5? Though there could be more not listed there. Unicode is a deep dark forest.

      ----

      For other readers, here's a non-gopher version of the article linked inside: https://nervuri.net/stega

  • userbinator 3 years ago

    I use a DOS text editor for this, where no Unicode support is an advantage. The majority of the time I'm dealing with plain ASCII anyway.

abrudz 3 years ago

Great for doing tacit programming[1] in JavaScript:

  avg=ㅤ=>ㅤ.reduce((ㅤㅤ,ㅤㅤㅤ)=>ㅤㅤ+ㅤㅤㅤ)/ㅤ.length
  avg([3,1,4,1,5])
  2.8

[1] https://en.wikipedia.org/wiki/Tacit_programming

  • shepherdjerred 3 years ago

    I love point-free functions in languages that support them, but please tell me you're not actually doing this

csswizardry 3 years ago

https://csswizardry.com/2014/01/use-zero-width-spaces-to-sto...

Mockapapella 3 years ago

A while back I used these kinds of characters to encode programs into invisible text: https://www.thelisowe.com/sleeper-cell-a-method-of-embedding...

It doesn't do much on its own. I feel like it could, but the most effective use case I've come up with is that you can invisibly plant a piece of code in some piece of text, then later on run another script that looks for that piece of code and runs it. I'm guessing that splitting the code up like this would make it harder to detect (not to mention that this code could even reside in other programs' comments undetected).

nervuri 3 years ago

Zero-width characters can be used to covertly watermark text and to figure out who copied text from a page and pasted it somewhere else. Server software can encode a hidden number between every few words, which corresponds to a server log entry with your username (if logged in), IP address, browser fingerprint, etc. I wrote more about this here:

https://nervuri.net/stega

I think the best solution to this type of problem would be a clipboard utility that warns you when you copy text which contains hidden characters, homoglyphs, rarely used whitespace characters, etc.
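
To make the watermarking idea concrete, here is a rough Python sketch, not the method from the linked article. It hides a number as a run of zero-width characters after the first word, reads it back, and strips it again. The names `embed_id`, `extract_id`, and `strip_hidden`, the 16-bit width, and the choice of U+200B for 0 and U+200C for 1 are all assumptions made for illustration; a real scheme would repeat the payload between every few words as described above.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0 (arbitrary choice)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1 (arbitrary choice)


def embed_id(text, ident, bits=16):
    """Hide the low `bits` bits of ident after the first word of text."""
    payload = "".join(
        ONE if (ident >> i) & 1 else ZERO for i in reversed(range(bits))
    )
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract_id(text):
    """Recover the hidden number, or None if no payload is present."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    return int(bits, 2) if bits else None


def strip_hidden(text):
    """Remove the two carrier characters used by this sketch."""
    return text.replace(ZERO, "").replace(ONE, "")


marked = embed_id("hello world", 1234)
print(len("hello world"), len(marked))  # the marked copy is 16 characters longer
print(extract_id(marked))               # the hidden number comes back out
print(strip_hidden(marked) == "hello world")
```

A clipboard checker along the lines suggested above could call something like `extract_id` or simply compare the text against its stripped form before pasting.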

ludovicianul 3 years ago

I've built a tool specifically to test if these kinds of characters will reach API backends: https://github.com/Endava/cats. My idea was that APIs should explicitly reject or sanitise input containing such characters.
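
As an illustration of the "reject or sanitise" idea (this is not how the tool above works, just a generic sketch), a backend could screen string fields with Python's standard `unicodedata` module. The helper names and the `EXTRA_INVISIBLE` list are assumptions for the example: Unicode general category `Cf` catches the format characters such as zero-width spaces and bidi controls, while fillers like U+3164 are ordinary letters by category and have to be listed by hand.

```python
import unicodedata

# Blank-looking characters that are NOT in category Cf.
# Illustrative and incomplete; extend for your own threat model.
EXTRA_INVISIBLE = {"\u3164", "\uffa0", "\u115f", "\u1160", "\u2800"}


def is_suspicious(ch):
    """True for format characters (Cf) and the hand-listed fillers."""
    return unicodedata.category(ch) == "Cf" or ch in EXTRA_INVISIBLE


def sanitise(value):
    """Return value with suspicious characters removed."""
    return "".join(ch for ch in value if not is_suspicious(ch))


def validate(value):
    """Return value unchanged, or raise ValueError naming the offenders."""
    bad = sorted({f"U+{ord(ch):04X}" for ch in value if is_suspicious(ch)})
    if bad:
        raise ValueError(f"rejected invisible characters: {', '.join(bad)}")
    return value


print(sanitise("ad\u200bmin"))
print(validate("admin"))
```

Note that blanket removal of `Cf` characters also strips the zero-width joiners used in emoji sequences and in some scripts, so rejecting with a clear error may be the safer default for identifiers, and a more permissive rule may be needed for free text.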

thirtyseven 3 years ago

So I guess the only future-proof solution to check for this is to render user input off screen and count the number of solid pixels, at least until "falsehoods programmers believe about names" gets updated to include "Names must consist of at least one readable glyph".

30minAdayHN 3 years ago

Back in the 90s on Windows, our secret directory used to be alt+255 (it looks like a space but isn't a space, I think)

  • lifthrasiir 3 years ago

    That is a non-breaking space (U+00A0), which is mapped to 0xFF in code page 437. (You need a leading zero to access Unicode code points, like alt+0255 for U+00FF ÿ.)

  • brunorsini 3 years ago

    I did the same thing. One more of those habits that didn't survive the transition to Mac OS X.

dezen0ts 3 years ago

A great way for QA’s to mess with developers

  • squaredot 3 years ago

    The problem is maybe more when QA doesn't mess with developers, than when it does.

    • bombcar 3 years ago

      If QA doesn’t do it malicious users and crackers will.

      • 8n4vidtmkvmk 3 years ago

        yeah.. please do eff around with the staging environment so that i can get traces and poke at it. prod is too locked down.

franky47 3 years ago

𝅷𝅶 [1]

[1] https://twitter.com/fortysevenfx/status/1599483273864187904

Minor49er 3 years ago

𝅵

saliagato 3 years ago

    
hamiltonians 3 years ago

Useful for impersonation scammers, like on Twitter
