The Unicode Text Debugger @ unicode.run



Try an example!

🧑🏾‍❤️‍💋‍🧑🏻 The most complex emoji in the current Unicode standard is composed of 10 code points, including skin color modifiers, zero-width joiners, and a variation selector.
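The different "lengths" of that emoji can be checked with a minimal Python sketch. The string below is written with escapes, and its code points are an assumption about what the example contains (two people with skin tone modifiers, a heart with a variation selector, a kiss mark, and three zero-width joiners). The helper name `text_lengths` is hypothetical.

```python
# The example emoji written as escapes so that no invisible code point
# is lost in copy/paste. The exact sequence is an assumption.
EMOJI = (
    "\U0001F9D1\U0001F3FE"  # adult + medium-dark skin tone modifier
    "\u200D"                # zero-width joiner
    "\u2764\uFE0F"          # heavy black heart + variation selector-16
    "\u200D"                # zero-width joiner
    "\U0001F48B"            # kiss mark
    "\u200D"                # zero-width joiner
    "\U0001F9D1\U0001F3FB"  # adult + light skin tone modifier
)


def text_lengths(text):
    """Return the length of text measured three different ways."""
    return {
        # Python's len() counts code points.
        "code_points": len(text),
        # Each UTF-16 code unit is two bytes; no BOM with the -le codec.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(text_lengths(EMOJI))
# {'code_points': 10, 'utf16_units': 15, 'utf8_bytes': 35}
```

A language whose strings are indexed by UTF-16 code units would report 15 for this single visible character, while Python reports 10.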

S̶t̶r̶i̶k̶e̶о𝘂𝘁 See how combining characters and misused look-alike characters can create interesting text effects and homographs.

Å != Å Learn about composing characters and normalized forms.
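As a rough illustration of why two identical-looking characters can compare unequal, here is a small Python sketch using the standard `unicodedata` module. It assumes the two forms in the example are the precomposed letter and the letter followed by a combining ring, and `same_text` is a hypothetical helper name.

```python
import unicodedata

PRECOMPOSED = "\u00C5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
DECOMPOSED = "A\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ: one code point versus two.
print(PRECOMPOSED == DECOMPOSED)           # False
print(len(PRECOMPOSED), len(DECOMPOSED))   # 1 2

# After normalization they compare equal, in either direction.
print(same_text(PRECOMPOSED, DECOMPOSED, "NFC"))  # True
print(same_text(PRECOMPOSED, DECOMPOSED, "NFD"))  # True
```

NFC composes to the shortest canonical form and NFD decomposes fully; which one to store is a design choice, as long as both sides of a comparison use the same form.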

‮12345‬ This text renders backwards from the order of its characters using BIDI control code points. Inspired by https://trojansource.codes/.
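One way to spot such hidden controls is to scan for code points whose bidirectional class is an embedding, override, or isolate control. The Python sketch below is only an approximation of that kind of check, not how Unicode.run does it. It assumes the example wraps the digits in a right-to-left override and a pop directional formatting character, and `find_bidi_controls` is a hypothetical name.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",  # embeddings, overrides, pop
    "LRI", "RLI", "FSI", "PDI",         # isolates
}


def find_bidi_controls(text):
    """Return (index, code point, name) for each explicit bidi control."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


# Assumed content of the example: RLO, the digits, then PDF.
sample = "\u202E12345\u202C"
for hit in find_bidi_controls(sample):
    print(hit)
# (0, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')
# (6, 'U+202C', 'POP DIRECTIONAL FORMATTING')
```

This sketch does not flag the left-to-right and right-to-left marks (U+200E, U+200F), because their bidi classes are plain "L" and "R"; a fuller check would list them explicitly.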

Hi! ‏(שלום!)‏ This example contains bidirectional text with BIDI glyph mirroring and right-to-left markers. Inspired by https://blog.georeactor.com/osm-1.

↙ ~ ↙️ and 你好! ~ 你好!︁ Examples of an emoji variation sequence and an East Asian punctuation positional variant using variation selectors.

Send me other interesting Unicode examples at @josh@joshdata.me on Mastodon.


About Unicode.run

Text is unexpectedly complicated. Use Unicode.run to debug text.

Here are some things you can do here:

  • See each code point’s escape code in a variety of programming languages.
  • See the “length” of the text as it would be reported in different programming languages.
  • See when characters (technically “extended grapheme clusters”) are composed of multiple code points.
  • Click code points in the debugger output to highlight them in the text. (In Firefox you can also select text to highlight the code points in the debugger output.)
  • Switch between the text and its UTF-32 or UTF-16BE hex encodings at the top of the page.
  • See where text changes direction in bidirectional text, and get warnings when text direction depends on where it is used. Mirrored glyphs in bidirectional text are also noted.
  • Get warnings about hidden code points that can alter the display of the text (see https://trojansource.codes/), invalidly placed combining code points, invalid code points, and characters that are not in normalized form.
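A per-code-point report like the one described in the list above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the site's implementation: `describe` and its field names are hypothetical, and only the Python escape syntax is shown.

```python
import unicodedata


def describe(text):
    """Return one dict per code point with its name, escape, and hex encodings."""
    rows = []
    for ch in text:
        cp = ord(ch)
        rows.append({
            "code_point": f"U+{cp:04X}",
            "name": unicodedata.name(ch, "<unnamed>"),
            # Python uses \uXXXX up to U+FFFF and \UXXXXXXXX above it.
            "python_escape": f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}",
            # Encoding fails for lone surrogates; this sketch does not handle them.
            "utf16be_hex": ch.encode("utf-16-be").hex(),
            "utf32be_hex": ch.encode("utf-32-be").hex(),
        })
    return rows


for row in describe("\u2199\uFE0F\U0001F48B"):
    print(row)
```

For the kiss mark (U+1F48B), the UTF-16BE column shows the surrogate pair `d83ddc8b`, while the UTF-32BE column shows `0001f48b`, which is why the two encodings give different unit counts for the same text.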

This is a project by JoshData.

Source code is at https://github.com/JoshData/unicode.run.

Thanks to ucd-full (based on Unicode 15.1), stdlib-js/string-split-grapheme-clusters (based on Unicode 13), bidi-js (based on Unicode 13), html-entities, and the Inter Typeface.

Nikita Prokopov’s The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!) was inspiration for this project.