Fix: remove a ZERO WIDTH NO-BREAK SPACE in front of an inline literal by JulienPalard · Pull Request #332 · spyder-ide/spyder-docs

This is literally the smallest PR I've ever done.

It removes a zero width no-break space.

But this character was breaking the inline literal next to it. On the rendered page, ``is_dark_font_color`` should have been interpreted by Sphinx as an inline literal and rendered in red.

The removed character is obviously not rendered in GitHub's "Files changed" interface, nor in git diff or git show --color-words. Not in your editor, and not in your terminal either. The character is a space. And a space with no width!!!

If you really want to see it, git show | cat -A can be helpful; you'll see something like:

-in the ``mainwindow.py`` file we import the M-oM-;M-?``is_dark_font_color``
+in the ``mainwindow.py`` file we import the ``is_dark_font_color``

But the paragraph is way longer than that so it's a bit hard to spot.
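
If squinting at cat -A output is too tedious, a few lines of Python can point at the culprit instead. This is only a sketch: `find_zwnbsp` is a name made up for this example, and the sample string is just the line from the diff above.

```python
def find_zwnbsp(text):
    """Return 1-based (line, column) pairs for every U+FEFF found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\ufeff":
                hits.append((lineno, col))
    return hits


# The offending line from the diff, with the invisible character spelled out.
sample = "in the ``mainwindow.py`` file we import the \ufeff``is_dark_font_color``\n"
print(find_zwnbsp(sample))
```

For a real file, read it with `encoding="utf-8"` (not `utf-8-sig`, which would silently drop a leading U+FEFF) and pass the contents to the helper.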

For the curious, the M-... notation denotes bytes in the range [128; 255]. The first 32 of this range are treated as if they were in the range [0; 31] and displayed using the ^ notation, so \x80 is M-^@. For the other ones, 128 is simply subtracted, so \xa0 is M- (yes, M- followed by a space).

So M-o is \x6f + 128 = \xef (\x6f is the value of o in the ASCII table). Likewise, M-; is \xbb and M-? is \xbf. That gives us the sequence \xef\xbb\xbf.
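
The notation described above can be sketched in Python. This mirrors the explanation rather than cat's actual source, and `caret_meta` is a hypothetical helper name; unlike cat -A, it would show a newline as ^J instead of ending the line with $.

```python
def caret_meta(byte):
    """Render one byte value (0-255) using the ^ and M- notation."""
    prefix = ""
    if byte >= 128:
        # High bytes get the M- prefix, then are shown as (byte - 128).
        prefix = "M-"
        byte -= 128
    if byte < 32:
        # Control range: ^@ for 0, ^A for 1, and so on.
        return prefix + "^" + chr(byte + 64)
    if byte == 127:
        # DEL is conventionally written ^?.
        return prefix + "^?"
    return prefix + chr(byte)


# The three bytes that UTF-8 uses for U+FEFF.
print("".join(caret_meta(b) for b in "\ufeff".encode("utf-8")))
```

Fed the three bytes of the removed character, this prints the same M-oM-;M-? that shows up in the diff.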

Still curious? The file is encoded using UTF-8, so to decode this UTF-8 sequence we need to extract relevant bits from it. In binary it looks like:

11101111 10111011 10111111

The leading 1110 means "there are 3 bytes for this char" (count the ones: three ones → three bytes; the zero is just a delimiter). The two trailing bytes each start with "10", meaning "we're trailing bytes".

If we drop those markers (1110 and 10 in front of the bytes) and keep the remaining bits, we're left with 1111111011111111, which evaluates to 65279, or 0xfeff in hexadecimal. Yes, you recognize it, it's a BOM. Because yes, a BOM is just a ZERO WIDTH NO-BREAK SPACE, isn't it beautiful?
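
The same bit manipulation, written out in Python as a minimal sketch. `decode_three_byte` is a made-up name, and it only handles the three-byte case discussed here, not UTF-8 in general.

```python
def decode_three_byte(raw):
    """Decode a single three-byte UTF-8 sequence into its code point."""
    lead, cont1, cont2 = raw
    # Check the markers: 1110xxxx for the lead byte, 10xxxxxx for the others.
    if lead >> 4 != 0b1110 or cont1 >> 6 != 0b10 or cont2 >> 6 != 0b10:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Drop the markers and glue the remaining 4 + 6 + 6 bits together.
    return ((lead & 0x0F) << 12) | ((cont1 & 0x3F) << 6) | (cont2 & 0x3F)


code_point = decode_three_byte(b"\xef\xbb\xbf")
print(code_point, hex(code_point), format(code_point, "b"))
```

In everyday code `b"\xef\xbb\xbf".decode("utf-8")` does the same job; the manual version is only there to make the masking visible.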

Do we really have to do the bit manipulation to discover what this character was? Obviously not, just use Emacs' M-x describe-char on it:

             position: 4646 of 14699 (32%), column: 380
            character:  (displayed as ) (codepoint 65279, #o177377, #xfeff)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0xFEFF
               script: arabic
               syntax: w 	which means: word
             to input: type "C-x 8 RET feff" or "C-x 8 RET ZERO WIDTH NO-BREAK SPACE"
          buffer code: #xEF #xBB #xBF
            file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)
              display: by this font (glyph code):
    ftcrhb:-GOOG-Noto Naskh Arabic UI-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x5D5)

Character code properties: customize what to show
  name: ZERO WIDTH NO-BREAK SPACE
  old-name: BYTE ORDER MARK
  general-category: Cf (Other, Format)
  decomposition: (65279) ('')
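
No Emacs at hand? Python's standard unicodedata module gives a rough equivalent of the fields above. A small sketch, where `describe` is a hypothetical helper written for this example:

```python
import unicodedata


def describe(ch):
    """Collect a few of the properties that describe-char reports."""
    return {
        "codepoint": hex(ord(ch)),
        "name": unicodedata.name(ch),
        "general-category": unicodedata.category(ch),
        "file code": ch.encode("utf-8"),
    }


for key, value in describe("\ufeff").items():
    print(f"{key}: {value}")
```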

And this is literally the longest PR description I've written.