Unicode 15 Released

lwn.net

58 points by asjo 3 years ago · 21 comments

pacaro 3 years ago

https://blog.unicode.org/2022/09/announcing-unicode-standard...

Seems like it might be a better link

  • gmfawcett 3 years ago

    > Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc.,

    TIL that CORBA is still around, and that someone considers it modern.

    • rektide 3 years ago

      Repeating my general request: someone please do a Speaking for the Dead on CORBA that includes the good, the hopeful, the smart.

      Also the bad. These days it's just a sign, a symbol for bad, but in this hyperreality we don't know the referent, and we don't know the lessons of the bad either.

      But I want to hear the good most of all, and how the good was avoided, missed, unfulfilled. Because I think it, like so much else, might have gone great if things, times, and adoption had been only slightly different. That it was just too soon for open source is a huge possibility in my mind.

      Also comparison versus DCOM/OLE would be informative, for historical context.

teddyh 3 years ago

> UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.

Good, I guess, but I wonder how much will break.

  • rurban 3 years ago

    These are not used, outside of attacks. And the serious languages all forbid them already. And the hacker languages do not care at all about Unicode security so far.
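For anyone wanting to see whether their own identifiers would be affected, here is a minimal Python sketch that flags ZWJ (U+200D) and ZWNJ (U+200C) in a string. The function names `find_joiners` and `is_allowed` are made up for illustration, and this checks only those two characters; it is not an implementation of the full UTS #39 General Security Profile.

```python
# Hypothetical helper: flags only the two joiner characters mentioned in the
# UTS #39 change. It does not implement the General Security Profile.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
JOINER_NAMES = {ZWNJ: "ZWNJ", ZWJ: "ZWJ"}


def find_joiners(identifier: str) -> list[tuple[int, str]]:
    """Return (index, name) for every ZWJ/ZWNJ found in the identifier."""
    return [
        (index, JOINER_NAMES[char])
        for index, char in enumerate(identifier)
        if char in JOINER_NAMES
    ]


def is_allowed(identifier: str) -> bool:
    """True if the identifier contains neither ZWJ nor ZWNJ."""
    return not find_joiners(identifier)


if __name__ == "__main__":
    for name in ["payload", "pay\u200dload", "\u200cx\u200d"]:
        print(repr(name), is_allowed(name), find_joiners(name))
```

Running a check like this over an existing codebase or username table would give a rough idea of how much might break.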

AdamH12113 3 years ago

> This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.

That seems like a lot of new CJK characters! How did they end up with so many new characters after so long? Is there some gradual process of adding historical or extremely rare characters, or were some deliberately left out of earlier versions?
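To put the quoted numbers in proportion, here is a quick bit of arithmetic in Python. It uses only the figures from the announcement and assumes the emoji and CJK counts do not overlap; the variable names are just for illustration.

```python
# Figures taken from the quoted announcement text.
added = 4_489        # characters added in this version
total = 149_186      # total characters after this version
cjk = 4_193          # new CJK ideographs
emoji = 20           # new emoji characters

# Derived values (assumes the CJK and emoji counts are disjoint).
previous_total = total - added          # characters before this version
other = added - cjk - emoji             # everything else, incl. the two new scripts
cjk_share = round(100 * cjk / added, 1) # percentage of additions that are CJK

if __name__ == "__main__":
    print(f"previous total: {previous_total}")
    print(f"non-CJK, non-emoji additions: {other}")
    print(f"CJK share of additions: {cjk_share}%")
```

So the CJK ideographs account for the overwhelming majority of this release, and everything else combined is only a few hundred characters.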

torstenvl 3 years ago

Wish there were more/better CJK compatibility characters. There are lots of documents in old CJK encodings that simply cannot be converted to Unicode losslessly without special PUA markings to indicate the round-trip mechanics.

rektide 3 years ago

Relaying some LWN comments:

> Why do we get to use the "bottom left part of glyph is damaged" modifier only for hieroglyphs? :(

(It'd be interesting to consider modifiers not always necessarily needing to be pre-encoded... to allow casual flexibility in implementation)

Wifi and Honk are new symbols.

(TikTok/Instagram will be thrilled... "can I get a honkkkk". I'm OK with the idea, but honking is in general a nuisance that needs to be shut the f down; it's a huge anti-social problem.)

rurban 3 years ago

Does anyone have an idea about the extraordinary delay? Usually the yearly standard is released in April or May, but this time it came out in September. All string processing libs and languages had to wait.

jwilk 3 years ago

Summary of changes:

https://unicode.org/versions/Unicode15.0.0/

pie_flavor 3 years ago

New astrological signs!

anthk 3 years ago

GNU Unifont will handle it well enough.

  • unknownaccount 3 years ago

    Unifont is missing tons of glyphs. I don't think any single font covers all the characters.

    • anthk 3 years ago

      It will. Unifont's versioning scheme is bound to the Unicode version numbers.

      • unknownaccount 3 years ago

        Only if someone volunteers and submits those glyphs, which isn't guaranteed to happen. As I mentioned, many Unicode 14 code points still don't have glyphs in Unifont.

joshxyz 3 years ago

Time flies, thank you for sharing.
