
Show HN: unicode.style

unicode.style

125 points by ekmartin 7 years ago · 69 comments


JimDabell 7 years ago

Screen readers don’t do very well with this, so please only use it for novelty purposes. Otherwise you will unnecessarily be locking people out from what you write.

  • nothrabannosir 7 years ago

    To be fair, combining some of the HN comments:

      $ python
      Python 3.7.0 (default, Jul 22 2018, 21:11:34)
      [Clang 9.1.0 (clang-902.0.39.2)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import unicodedata as ud
      >>> ud.normalize('NFKD', '''It's a kinda ascii-art thing that 𝔩𝔢𝔱𝔰 𝔶𝔬𝔲 𝔞𝔫𝔰𝔴𝔢𝔯 𝔥𝔫 𝔠𝔬𝔪𝔪𝔢𝔫𝔱𝔰 𝔞𝔩𝔩 𝔣𝔞𝔫𝔠𝔶 𝔩𝔦𝔨𝔢 𝔱𝔥𝔦𝔰. 𝕆𝕣 𝕝𝕚𝕜𝕖 𝕥𝕙𝕚𝕤 𝕚𝕗 𝕪𝕠𝕦 𝕨𝕒𝕟𝕥 𝕝𝕖𝕤𝕤 𝕘𝕠𝕥𝕙𝕚𝕔 𝕞𝕠𝕣𝕖 𝕠𝕦𝕥𝕝𝕚𝕟𝕖.
      ...
      ... 𝙸𝚜𝚗'𝚝 𝚞𝚗𝚒𝚌𝚘𝚍𝚎 𝚐𝚛𝚎𝚊𝚝?''')
      "It's a kinda ascii-art thing that lets you answer hn comments all fancy like this. Or like this if you want less gothic more outline.\n\nIsn't unicode great?"
      >>>
    
    You'd hope a screen reader would have more effort put into it than a 3 second read of a HN thread?
    • baddox 7 years ago

      I often wonder how many of the people who bring up the capabilities of screen readers are actually familiar with screen readers. (I'm admittedly not at all familiar with screen readers or their capabilities).

    • Someone 7 years ago

      As a counterpoint, selecting that text and choosing “speak” from the pop-up menu doesn’t speak that text on iOS.

      VoiceOver on iOS doesn’t speak it, either.

  • wgjordan 7 years ago

    > Screen readers don't do very well with this

    Badly-implemented screen readers don't do very well with this. The Unicode Standard provides a Normalization Form for Compatibility Decomposition (NFKD / NFKC) that screen readers definitely should adopt in their Unicode implementation [1].

    [1] http://www.unicode.org/reports/tr15/

  • netsharc 7 years ago

    Well, look forward to spammers using this to bypass the spam filters...

    • floatrock 7 years ago

      Also using similar unicode characters in your username for nefarious spoofing attacks or worse -- https://labs.spotify.com/2013/06/18/creative-usernames/

      This is called a Homoglyph attack.

      If you accept unicode for strings that should be "unique" (eg username), there are various normalization schemes that basically convert equivalent-ish looking characters into a consistent hash.

      I have no doubt spam filters use this.

    • TeMPOraL 7 years ago

      Followed by spam filters keying in on use of such alternative/modifier characters and marking them as spam. Better check your spam folder if you get a lot of e-mails with mathematical formulas.

    • vthriller 7 years ago

      Don't they already do that? I can clearly remember subjects that mixed latin, cyrillic, greek and some more exotic scripts to do just that.

  • cryptonector 7 years ago

    That really has to get fixed though.

    • SEMW 7 years ago

      It doesn't and shouldn't.

      The whole point of having Mathematical Alphanumeric Symbols as separate unicode code points, rather than just using normal latin characters with style markup, is so they can be used when the different letters have semantically different meanings -- in particular in maths when 𝘹 and 𝘅 can be in the same formula, representing different concepts. They're not a replacement for style markup.

      In other words, they're different characters specifically so that screen readers can know to read them out loud differently!

      Trying to 'fix' screenreaders by having them read these characters as if they were normal latin characters, to accommodate people who like using the Mathematical Symbols block for fun in places which only allow plain text, would completely defeat the actual purpose of them.

      https://www.unicode.org/faq/ligature_digraph.html#Pf6

      • slavik81 7 years ago

        How do screen readers handle other aspects of mathematical equations? For example, if I have an equation like "A equals B to the C", how do I represent this such that a screen reader will say it correctly? As far as I can tell, I can't.

        I see the FAQ you linked states that I'm supposed to use markup for this. Unfortunately, that means the screen reader needs to understand math markup to work correctly. If that's the case, it seems like we could just include other concerns like Bold X vs X in the math markup as well.

      • cryptonector 7 years ago

        Am I confused as to what unicode.style is doing? Oh, yes I am. It's using U+1D400 MATHEMATICAL BOLD CAPITAL A and such. I see. Ok, thanks for setting me straight!

    • robin_reala 7 years ago

      It’s really not a screenreader’s job to guess what a character might be being pressed into service to represent in a given scenario, and to try to work out when that’s inappropriate and it should be using the original specified meaning.

      • bonoboTP 7 years ago

        Well, it kind of is. In my mind, a screen reader should read anything that a healthy human can. Ideally even text in images and any sort of weirdness.

        • gumby 7 years ago

          Then again, different healthy humans read texts differently. There's no such thing as a canonical reading, and any reading brings in prior semantics.

          There was a sign posted in the break room of a lab I used to work at. It read, "<long complex equation> is easy for you to read, but not for everyone. Volunteer to read for the blind." It was a good double whammy.

  • alphakappa 7 years ago

    Speaking of styling things for novelty, there's also this

    https://beta.observablehq.com/@mbostock/text-styles
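The disagreement between wgjordan and SEMW is easy to see with Python's standard `unicodedata` module: the two x's SEMW uses (U+1D639 and U+1D605) carry distinct character names, yet compatibility normalization (NFKC, a sibling of the NFKD used in the REPL session above) folds both to a plain `x`. A minimal sketch; `describe` is a hypothetical helper name:

```python
import unicodedata

def describe(text: str) -> list:
    """For each character, return (char, Unicode name, NFKC form)."""
    return [
        (ch, unicodedata.name(ch, "<unnamed>"), unicodedata.normalize("NFKC", ch))
        for ch in text
    ]

# U+1D639 and U+1D605: the two x's from the formula example.
for ch, name, folded in describe("\U0001D639\U0001D605"):
    print(f"{ch}  {name:40}  NFKC -> {folded!r}")
```

Reading the NFKC form keeps styled prose intelligible but says "x" twice for the formula; reading names keeps the distinction but is tedious for ordinary text. The sketch only shows what each option yields, not which one a screen reader should pick.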

alpb 7 years ago

I'm having trouble understanding what this does. Apparently there's a popover when you select the text you typed. I had to go to https://github.com/ekmartin/unicode-style to figure this out.

  • floatrock 7 years ago

    It's a kinda ascii-art thing that 𝔩𝔢𝔱𝔰 𝔶𝔬𝔲 𝔞𝔫𝔰𝔴𝔢𝔯 𝔥𝔫 𝔠𝔬𝔪𝔪𝔢𝔫𝔱𝔰 𝔞𝔩𝔩 𝔣𝔞𝔫𝔠𝔶 𝔩𝔦𝔨𝔢 𝔱𝔥𝔦𝔰. 𝕆𝕣 𝕝𝕚𝕜𝕖 𝕥𝕙𝕚𝕤 𝕚𝕗 𝕪𝕠𝕦 𝕨𝕒𝕟𝕥 𝕝𝕖𝕤𝕤 𝕘𝕠𝕥𝕙𝕚𝕔 𝕞𝕠𝕣𝕖 𝕠𝕦𝕥𝕝𝕚𝕟𝕖.

    𝙸𝚜𝚗'𝚝 𝚞𝚗𝚒𝚌𝚘𝚍𝚎 𝚐𝚛𝚎𝚊𝚝?

  • nmstoker 7 years ago

    Yup, appears relatively hard to use via Chrome mobile with Gboard. You just about can, but most words get swallowed and the site popover is in the same place as the Android text assistance popover

  • rglover 7 years ago

    ๐”‡๐”ฌ๐”ฑ๐”ฅ ๐”จ๐”ซ๐”ฌ๐”ด ๐”ฅ๐”ฌ๐”ด ๐”ฑ๐”ฌ ๐”ฒ๐”ฐ๐”ข ๐”ฐ๐”ž๐”ฆ๐”ก ๐”ž๐”ฒ๐”ฑ๐”ฌ๐”ช๐”ž๐”ฑ๐”ข๐”ก ๐” ๐”ฌ๐”ช๐”ญ๐”ฒ๐”ฑ๐”ข๐”ฏ ๐”ฌ๐”ญ๐”ข๐”ฏ๐”ž๐”ฑ๐”ฆ๐”ฌ๐”ซ?

  • 8xde0wcNwpslOw 7 years ago

    Also seems to support the "standard" shortcuts for formatting (Ctrl+I/U/B, perhaps others), but yeah, the introduction might be lacking.

braythwayt 7 years ago

I have been using https://yaytext.com for quite a while. Only for novelty purposes, of course.

mejari 7 years ago

Underlines a̲r̲̲̲e̲̲̲̲̲̲̲ ̲̲̲̲̲̲̲a̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲ ̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲l̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲̲ittle nasty when re-selecting already underlined characters from the left
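The counts in the underline example (1, 3, 7, 15, 31 low lines) fit a tool that appends U+0332 COMBINING LOW LINE after every code point in the selection, including the low lines already there: a letter carrying k marks is k + 1 code points, so re-underlining gives it 2k + 1. That is an inference from the output, not from the unicode.style source. A sketch of that naive behavior next to an idempotent variant that strips existing low lines first (both function names are hypothetical):

```python
COMBINING_LOW_LINE = "\u0332"

def underline_naive(text: str) -> str:
    """Append a low line after every code point, existing low lines included."""
    return "".join(ch + COMBINING_LOW_LINE for ch in text)

def underline(text: str) -> str:
    """Idempotent: drop existing low lines, then add exactly one per code point."""
    base = text.replace(COMBINING_LOW_LINE, "")
    return "".join(ch + COMBINING_LOW_LINE for ch in base)

s = "a"
for _ in range(3):
    s = underline_naive(s)
print(s.count(COMBINING_LOW_LINE))  # 7

print(underline(underline("are")) == underline("are"))  # True
```

The idempotent version still works on code points, so text that already carries other combining marks gets a low line after each of those marks as well.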

Too 7 years ago

This seems like a major source of security issues. How can I protect myself from it?

Am I supposed to normalize ALL untrusted user input, or will that break normal text in some language I'm not familiar with? Or should I only normalize things that are supposed to be unique, like URLs, usernames and other identifiers?
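One common answer, sketched here as a hypothetical policy rather than a complete defence: normalize only the comparison key of identifiers, not the text people read. The `identifier_key` helper below (a hypothetical name) applies NFKC via Python's `unicodedata` and then `str.casefold`; the last line shows why applying NFKC to all input is a bad idea, since it rewrites ordinary characters such as superscripts and ligatures.

```python
import unicodedata

def identifier_key(name: str) -> str:
    """Hypothetical comparison key for usernames and other identifiers.

    NFKC folds compatibility variants (mathematical letters, full-width
    forms, ligatures) to plain equivalents; casefold() removes case
    differences. Store this key under a unique index, but keep showing
    the name exactly as the user typed it.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    # casefold() can produce text that is no longer NFKC-normalized,
    # so normalize once more to get a stable key.
    return unicodedata.normalize("NFKC", folded)

print(identifier_key("\U0001D4D0\U0001D4D1\U0001D4D2\U0001D4D3"))  # abcd
print(identifier_key("\u0430bcd") == identifier_key("abcd"))        # False (Cyrillic а)
# NFKC is lossy for ordinary prose, which is why it is applied only to keys:
print(unicodedata.normalize("NFKC", "x\u00b2 + \ufb01le"))          # x2 + file
```

NFKC does not merge look-alike letters from different scripts (the Cyrillic `а` stays distinct), so homoglyph checks need the separate confusables "skeleton" mapping from Unicode Technical Standard #39, which the Python standard library does not provide.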

kowdermeister 7 years ago

I would make it more clear what the app does and that you should highlight text to format it.

I discovered that by accident.

tambourine_man 7 years ago

On iOS, the selection popover interferes with the one you designed.

  • ekmartinOP 7 years ago

    Good point. I’m planning on implementing a static (always-enabled) toolbar for mobile devices. Hopefully that’ll make it better.

    • pohl 7 years ago

      Maybe if your options appeared below the selection rather than above, they wouldn’t interfere.

  • echelon 7 years ago

    Same on Android.

    Very neat utility, though!

  • richrichardsson 7 years ago

    On macOS (High Sierra) the Gothic and next font after that (sorry I forgot what the icon was) don't work for capital letters (get a rectangle with some small digits in). The "Courier" font and handwriting style ones work fine.

PeterisP 7 years ago

Doesn't work with accented Latin alphabet characters ("ü𝕓𝕖𝕣") - is it a fundamental restriction of the styled Unicode charset or a mapping issue? Seems like it should be supported but would need a transition from precomposed characters to the equivalent combination of letter + diacritic.

  • Veedrac 7 years ago

    It's not a fundamental restriction in the character set; "𝕦̈𝕓𝕖𝕣" is valid.

  • sebazzz 7 years ago

    It does not work with numbers either, though that can be worked around a little bit by using the number-in-circle variants.
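PeterisP's guess can be checked with a short sketch: decompose with NFD so "ü" becomes "u" followed by U+0308 COMBINING DIAERESIS, style the base letter, and pass the mark through, which yields the same kind of string as Veedrac's example. It also covers sebazzz's point, since the block does contain styled digits (Mathematical Bold digits start at U+1D7CE). The function name `bold` is hypothetical, and Mathematical Bold is used because its letters have no reserved gaps.

```python
import unicodedata

BOLD_CAPITAL_A = 0x1D400   # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A     # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT_ZERO = 0x1D7CE  # MATHEMATICAL BOLD DIGIT ZERO

def bold(text: str) -> str:
    """Map ASCII letters and digits to Mathematical Bold, keeping diacritics."""
    out = []
    # NFD splits precomposed letters into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            out.append(ch)  # combining marks, punctuation, other scripts
    return "".join(out)

styled = bold("über 42")
print(styled)
print(unicodedata.normalize("NFKC", styled))  # über 42
```

Letters with no canonical decomposition, such as "ø" or "ł", come through unstyled; whether to leave them or skip styling for the whole word is a design choice.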

app4soft 7 years ago

What about "strikethrough" and "slashthrough"?[0,1,2]

P.S.: It looks like you reinvented the YayText[3] website ;-)

[0] http://adamvarga.com/strike/

[1] https://yaytext.com/strike/

[2] https://yaytext.com/slash/

[3] https://yaytext.com/
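Strikethrough and slash-through are usually produced not with separate letters but by adding a combining overlay mark after each character. This minimal sketch assumes U+0336 COMBINING LONG STROKE OVERLAY and U+0338 COMBINING LONG SOLIDUS OVERLAY, a common approach; it is not taken from the linked sites' code, and `overlay` is a hypothetical name.

```python
import unicodedata

LONG_STROKE_OVERLAY = "\u0336"   # strikethrough
LONG_SOLIDUS_OVERLAY = "\u0338"  # slash-through

def overlay(text: str, mark: str) -> str:
    """Put `mark` after every base character; existing combining marks pass through."""
    out = []
    for ch in text:
        out.append(ch)
        if unicodedata.combining(ch) == 0:  # 0 means not a combining mark
            out.append(mark)
    return "".join(out)

print(overlay("strike", LONG_STROKE_OVERLAY))
print(overlay("slash", LONG_SOLIDUS_OVERLAY))
```

Spaces get the overlay too, so the line reads as continuous; how well stacked overlays render depends heavily on the font.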

waynenilsen 7 years ago

https://github.com/ekmartin/unicode-style/blob/master/src/tr...

I learned something new today, thanks!

lucideer 7 years ago

Android Firefox is just showing me black boxes for those Unicode ranges, so this definitely can't be very effectively used everywhere (I haven't tried yet, but I'm guessing some native messaging apps may have similar issues).

nebulous1 7 years ago

You should keep the toolbar on the screen.

na85 7 years ago

Neat idea but completely unusable on Firefox mobile due to the context menu that pops up whenever text is selected and obscures the site's menu that does the same.

EdSharkey 7 years ago

There is something about rendering these Unicode code points that seems to make Firefox jank more than usual (when scrolling this page up and down). Is rendering these code points especially costly for some reason?

ChrisGranger 7 years ago

People might find Unicode Text Converter useful as well.

http://qaz.wtf/u/convert.cgi?text=Hacker_News

bradleybuda 7 years ago

𝔥𝔦 𝕥𝕙𝕖𝕣𝕖 𝓯𝓻𝓲𝓮𝓷𝓭

  • neogodless 7 years ago

    It was not initially 𝗮𝗽𝗽𝗮𝗿𝗲𝗻𝘁 to me that I could 𝘴𝘵𝘺𝘭𝘦 things that I typed.

pimlottc 7 years ago

Not clear what this does? Also, the copy button doesn’t work on iOS.

  • gkoberger 7 years ago

    It lets you 𝗯𝗼𝗹𝗱 or 𝕤𝕥𝕪𝕝𝕖 stuff in unicode, so plain-text places (like Hacker News) can get formatting.

    • Avery3R 7 years ago

      HN supports italics; it's not entirely plain text.

      • function_seven 7 years ago

        𝙾𝚗𝚎 𝚘𝚏 𝚝𝚑𝚎 𝚐𝚘𝚝𝚌𝚑𝚊𝚜 𝚘𝚗 𝙷𝙽 𝚝𝚘𝚍𝚊𝚢 𝚒𝚜 𝚠𝚑𝚎𝚗 𝚌𝚘𝚖𝚖𝚎𝚗𝚝𝚘𝚛𝚜 𝚞𝚜𝚎 𝚌𝚘𝚍𝚎 𝚏𝚘𝚛𝚖𝚊𝚝𝚝𝚒𝚗𝚐 𝚝𝚘 𝚖𝚊𝚔𝚎 𝚊 𝚋𝚕𝚘𝚌𝚔𝚚𝚞𝚘𝚝𝚎, 𝚝𝚑𝚎𝚗 𝚜𝚘𝚖𝚎𝚘𝚗𝚎 𝚎𝚕𝚜𝚎 𝚑𝚊𝚜 𝚝𝚘 𝚛𝚎𝚙𝚕𝚢 𝚠𝚒𝚝𝚑 𝚒𝚝 𝚞𝚗𝚏𝚘𝚛𝚖𝚊𝚝𝚝𝚎𝚍 𝚏𝚘𝚛 𝚝𝚑𝚎 𝚋𝚎𝚗𝚎𝚏𝚒𝚝 𝚘𝚏 𝚖𝚘𝚋𝚒𝚕𝚎 𝚞𝚜𝚎𝚛𝚜. 𝙸 𝚌𝚘𝚞𝚕𝚍 𝚜𝚎𝚎 𝚝𝚑𝚒𝚜 𝚋𝚎𝚒𝚗𝚐 𝚊 𝚠𝚘𝚛𝚔𝚊𝚛𝚘𝚞𝚗𝚍 𝚏𝚘𝚛 𝚝𝚑𝚊𝚝 𝚕𝚒𝚖𝚒𝚝𝚊𝚝𝚒𝚘𝚗. 𝙵𝚒𝚡𝚎𝚍 𝚠𝚒𝚍𝚝𝚑 𝚝𝚎𝚡𝚝 𝚠𝚒𝚝𝚑𝚘𝚞𝚝 𝚝𝚑𝚎 𝚑𝚘𝚛𝚒𝚣𝚘𝚗𝚝𝚊𝚕 𝚜𝚌𝚛𝚘𝚕𝚕𝚒𝚗𝚐 𝚘𝚗 𝚗𝚊𝚛𝚛𝚘𝚠 𝚜𝚌𝚛𝚎𝚎𝚗𝚜.

        • e_proxus 7 years ago

          Or perhaps HN should fix its formatting after all these years? Code gets posted here pretty regularly after all...

  • Groxx 7 years ago

    it's a wysiwyg-style editor for bold / italics / etc where it replaces the selected range with the unicode bold/etc characters.

    type some text, then select it. that'll give you a format editor.
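Groxx's description can be sketched as plain offset arithmetic: each styled alphabet is a contiguous run in the Mathematical Alphanumeric Symbols block, so styling a letter means adding its distance from `A` or `a` to the run's first code point. This is an illustration, not unicode.style's actual source; the three styles were chosen because their A-Z and a-z runs have no reserved gaps, and `stylize` and `style_selection` are hypothetical names.

```python
# First code points (capital A, small a) of three gap-free alphabets.
STYLES = {
    "bold": (0x1D400, 0x1D41A),
    "sans_bold": (0x1D5D4, 0x1D5EE),
    "monospace": (0x1D670, 0x1D68A),
}

def stylize(text: str, style: str) -> str:
    """Map ASCII letters to the chosen alphabet; leave everything else alone."""
    capital_a, small_a = STYLES[style]
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(capital_a + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(small_a + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def style_selection(text: str, start: int, end: int, style: str) -> str:
    """Replace text[start:end] (Python str indices) with its styled form."""
    return text[:start] + stylize(text[start:end], style) + text[end:]

print(style_selection("It lets you bold stuff", 12, 16, "sans_bold"))
# It lets you 𝗯𝗼𝗹𝗱 stuff
```

One practical wrinkle: each styled letter lies outside the Basic Multilingual Plane, so it is a single Python `str` index but two UTF-16 code units in JavaScript, and a browser-based editor has to recompute selection offsets after replacing text.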

jotato 7 years ago

Nice work. +1 I would 𝗹𝗼𝘃𝗲 to see this as a chrome plugin.

burgerdev 7 years ago

Is it just me, or does the BB style render the most interesting characters as rectangles? (Chrome on OS X)

𝔸 𝔹 [U+1D53A] 𝔻 𝔼 𝔽 𝔾 [U+1D53F] 𝕀 𝕁 𝕂 𝕃 𝕄 [U+1D545] 𝕆 [U+1D547] [U+1D548] [U+1D549] 𝕊 𝕋 𝕌 𝕍 𝕎 𝕏 𝕐 [U+1D551]
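The rectangles in burgerdev's list fall exactly on C, H, N, P, Q, R and Z. Those double-struck capitals were already encoded in the Letterlike Symbols block (ℂ, ℍ, ℕ, ℙ, ℚ, ℝ, ℤ), so the matching slots in the Mathematical Alphanumeric Symbols block are reserved, and a converter that simply adds an offset to "A" lands on unassigned code points that fonts have no glyph for. The Fraktur ("Gothic") alphabet has the same kind of gaps at C, H, I, R and Z, which would fit richrichardsson's report of broken capitals. A sketch of the check and the fix, with hypothetical helper names:

```python
import string
import unicodedata

DOUBLE_STRUCK_CAPITAL_A = 0x1D538  # MATHEMATICAL DOUBLE-STRUCK CAPITAL A

# Letters whose double-struck forms live in the Letterlike Symbols block.
LETTERLIKE_DOUBLE_STRUCK = {
    "C": "\u2102",  # ℂ
    "H": "\u210D",  # ℍ
    "N": "\u2115",  # ℕ
    "P": "\u2119",  # ℙ
    "Q": "\u211A",  # ℚ
    "R": "\u211D",  # ℝ
    "Z": "\u2124",  # ℤ
}

def naive_double_struck(letter: str) -> str:
    """Plain offset arithmetic; hits reserved code points for seven letters."""
    return chr(DOUBLE_STRUCK_CAPITAL_A + ord(letter) - ord("A"))

def double_struck(letter: str) -> str:
    """Offset arithmetic with the Letterlike Symbols exceptions filled in."""
    if letter in LETTERLIKE_DOUBLE_STRUCK:
        return LETTERLIKE_DOUBLE_STRUCK[letter]
    return naive_double_struck(letter)

# Unassigned code points have no character name.
holes = [c for c in string.ascii_uppercase
         if unicodedata.name(naive_double_struck(c), None) is None]
print(holes)  # ['C', 'H', 'N', 'P', 'Q', 'R', 'Z']
print("".join(double_struck(c) for c in string.ascii_uppercase))
```

The Letterlike forms still NFKC-normalize to plain letters, so the fix does not change how the text folds back to ASCII.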

jtolmar 7 years ago

This is entertaining but it's completely broken on Chrome mobile in a way I've never seen before.

tmalsburg2 7 years ago

This would be great as an Emacs extension. Is there something like this?

username3 7 years ago

𝔲𝔫𝔦𝔠𝔬𝔡𝔢.𝔰𝔱𝔶𝔩𝔢

𝔉𝔬𝔯𝔪𝔞𝔱 𝔱𝔢𝔵𝔱 𝔲𝔰𝔦𝔫𝔤 𝔲𝔫𝔦𝔠𝔬𝔡𝔢 𝔠𝔥𝔞𝔯𝔞𝔠𝔱𝔢𝔯𝔰. 𝔓𝔞𝔰𝔱𝔢 𝔞𝔫𝔶𝔴𝔥𝔢𝔯𝔢 𝔱𝔥𝔞𝔱 𝔞𝔠𝔠𝔢𝔭𝔱𝔰 𝔭𝔩𝔞𝔦𝔫 𝔱𝔢𝔵𝔱.

tambourine_man 7 years ago

Neat. Suggestion: maybe don’t “underline” characters such as y or g

jancsika 7 years ago

This reveals all kinds of bugs in current software.

For example, if I use the tool to make a url italic, then pasting that url into Chrome's url bar gives me back a bunch of unicode rectangles.

But that's not what I wanted. I wanted Chrome's url bar to interpret those unicode code points as an italic version of the actual unicode code points I want. Chrome should add a check for edge cases like these and add branches to map the string to the corresponding non-styled code points automatically.

Someone needs to send lots of bug reports to all the relevant pieces of software that currently have this bug. Firefox, Chromium, probably Edge, Webkit. Those are just the browsers, but I'm sure there are more. I'm not actually sure about Firefox tbh, but maybe just send the bug report first and see if it gets accepted to find out.

Ooh, here's another one-- if you paste some unicode.style'd text into LibreOffice does it convert it to the "normal" code points and add the relevant styling? If not, it should, otherwise it's broken.

Actually, I just realized another issue. If I type something in the url bar that is styled with unicode.style, then there is no way for Chrome to know whether I want it displayed styled or not.

For example, maybe I'm pasting it there temporarily so that I can copy/paste it later in a Tweet. In that case I probably want to keep the current styling for the tweet.

So Chrome should map to the normalized unicode code points (just in case I'm typing a url or want to instantiate a search), but still display the styled version. Then when I copy it again, it should put the unicode.style version into the buffer. And the app which receives the pasted buffer should receive the unicode.style code points. And of course that app should also normalize it underneath while retaining the styled display for the same reasons.

To deal with this complexity, there should probably be a standardized way for all apps to deal with styled text.

Please help by testing every app and filing relevant bug reports.

  • SEMW 7 years ago

    > But that's not what I wanted. I wanted Chrome's url bar to interpret those unicode code points as an italic version of the actual unicode code points I want. Chrome should add a check for edge cases like these and add branches to map the string to the corresponding non-styled code points automatically.

    No, it shouldn't. They are semantically different code points. The _whole point_ is that they are semantically different code points (they're from the Mathematical Alphanumeric Symbols block, the purpose of which is for e.g. when you have a formula containing 𝘹 and 𝘅 as semantically different characters, where that difference needs to be preserved in copying & pasting, conveyed to screen readers, etc.).

    > Ooh, here's another one-- if you paste some unicode.style'd text into LibreOffice does it convert it to the "normal" code points and add the relevant styling? If not, it should, otherwise it's broken.

    It really, really, really shouldn't, for the same reason as above.

    Mathematical symbols are not a replacement for text styles and markup. Trying to make them that will destroy the thing they're actually useful for, the thing they can do that text styling can't do: preserve their semantics when transmitted in plain text (including for accessibility purposes).

    This is not styling. It cannot be normalized away. They are semantically different characters.

    https://www.unicode.org/faq/ligature_digraph.html#Pf6

  • bjt2n3904 7 years ago

    Ugh. This just further poisons me against unicode being a good thing. ASCII or bust.

    "ABCD" and "𝓐𝓑𝓒𝓓" are the same thing, but they also aren't. Am I supposed to normalize everything on username creation to prevent people from making duplicates?

    • function_seven 7 years ago

      You either have to do that, or restrict usernames to ASCII subset only (and face the wrath of non-Latin alphabet users), or restrict to a whitelist of ranges that only represent alpha characters across all languages (i.e. not math symbols, nor box-drawing, nor emoji, etc.)

      i18n is complicated because the diversity of language is itself complicated. I feel your pain, though, don't get me wrong.

    • codebje 7 years ago

      Not normalise, just set the rules you want to apply. Internationalised domain names suffer this issue, so cribbing their fix probably would work for you - pick a script, restrict allowable characters to it.

      You probably never want to allow unrestricted use of any character set for a username, even ASCII - otherwise I could take the username 'bjt2n3904 '.

  • function_seven 7 years ago

    I can't tell if you're being serious or tongue-in-cheek, but I like this comment nonetheless.

    Did Unicode ever have any business assigning separate code points for italic versions of latin glyphs? Am I ignorant as to their true purpose?

    EDIT: SEMW just answered my second question. Math. Makes sense.

    • wgjordan 7 years ago

      > Did Unicode ever have any business assigning separate code points for italic versions of latin glyphs?

      Here's the official answer to this question from the Unicode spec [1]:

      "Mathematical notation requires a number of Latin and Greek alphabets that initially appear to be mere font variations of one another. For example, the letter H can appear as plain or upright (H), bold (𝐇), italic (𝐻), and script (ℋ). However, in any given document, these characters have distinct, and usually unrelated, mathematical semantics. For example, a normal H represents a different variable from a bold H, etc. If these attributes are dropped in plain text, the distinctions are lost and the meaning of the text is altered."

      [1] https://www.unicode.org/versions/Unicode11.0.0/UnicodeStanda...

    • jancsika 7 years ago

      > I can't tell if you're being serious or tongue-in-cheek, but I like this comment nonetheless.

      Thanks! I started by interpreting the tool as a serious and important feature of putting styling where UIs don't allow it. Then I did a quick stream-of-consciousness advocacy for fitting the feature in all the places where it isn't allowed.

      The results are obviously bad, which is reassuring.
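codebje's suggestion (pick a script and restrict usernames to it) can be sketched in a few lines. Python's standard library does not expose the Unicode Script property, so this hypothetical `is_latin_username` uses character names beginning with `LATIN ` as a rough stand-in; a production check would more likely use real Script data from a third-party library or ICU.

```python
import unicodedata

def is_latin_username(name: str) -> bool:
    """Hypothetical policy: only Latin letters and ASCII digits are allowed.

    Rejects spaces (codebje's 'bjt2n3904 '), punctuation, mathematical
    alphanumerics and letters from other scripts such as Cyrillic.
    """
    if not name:
        return False
    for ch in unicodedata.normalize("NFC", name):
        if ch.isascii() and ch.isdigit():
            continue
        is_letter = unicodedata.category(ch).startswith("L")
        if is_letter and unicodedata.name(ch, "").startswith("LATIN "):
            continue
        return False
    return True

print(is_latin_username("bjt2n3904"))   # True
print(is_latin_username("bjt2n3904 "))  # False
```

Even with this check, "Bob" and "bob" remain distinct strings, so uniqueness should still be enforced on a case-folded key.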
