Unicode 15 Released

58 points by asjo 4 years ago · 21 comments

pacaro 4 years ago

https://blog.unicode.org/2022/09/announcing-unicode-standard...

Seems like it might be a better link

gmfawcett 4 years ago

> Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc.,
TIL that CORBA is still around, and that someone considers it modern.
- rektide 4 years ago
  
  Repeating my general request: someone please do a Speaking for the Dead on CORBA that includes the good, the hopeful, the smart.
  Also the bad. These days it's just a symbol for bad, but in this hyperreality we don't know the referent, and we don't know the lessons of the bad either.
  But I want to hear the good most of all, and how the good was avoided, missed, unfulfilled. Because I think it, like so much else, might have gone great if things, times, and adoption had been only slightly different. That it was just too soon for open source is a huge possibility in my mind.
  Also, a comparison with DCOM/OLE would be informative for historical context.
  - gmfawcett 4 years ago
    
    I wasn't a CORBA developer in those days, just an interested observer, so my opinion isn't worth much. Ultimately I think it suffered because of:
    https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
    The design of CORBA was based on critical assumptions that simply weren't true, leaving the application developer to deal with all the essential complexity.

teddyh 4 years ago

> UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.

Good, I guess, but I wonder how much will break.
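
For a sense of what tooling would have to catch, here is a minimal Python sketch that flags the two characters in question inside an identifier. The helper name `find_joiners` is made up for illustration, and this is not the full General Security Profile check, which works from the UTS #39 identifier status data rather than a hard-coded pair of characters.

```python
# Hypothetical helper, not part of any library: flags the two characters
# that UTS #39 moved to Identifier_Status=Restricted.
JOINERS = {
    "\u200c": "ZWNJ",  # U+200C ZERO WIDTH NON-JOINER
    "\u200d": "ZWJ",   # U+200D ZERO WIDTH JOINER
}


def find_joiners(identifier: str) -> list[tuple[int, str]]:
    """Return (index, short name) for every ZWJ/ZWNJ in the identifier."""
    return [(i, JOINERS[ch]) for i, ch in enumerate(identifier) if ch in JOINERS]


plain = "counter"
sneaky = "coun\u200dter"  # contains an invisible ZWJ after "coun"

print(find_joiners(plain))   # []
print(find_joiners(sneaky))  # [(4, 'ZWJ')]
```

Anything that compares identifiers as raw strings would treat `plain` and `sneaky` as different names, which is the kind of confusion the restriction is aimed at.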

rurban 4 years ago

These are not used outside of attacks. The serious languages all forbid them already, and the hacker languages do not care about Unicode security at all so far.

AdamH12113 4 years ago

> This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.

That seems like a lot of new CJK characters! How did they end up with so many new characters after so long? Is there some gradual process of adding historical or extremely rare characters, or were some deliberately left out of earlier versions?
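
To put the quoted numbers in proportion, a quick back-of-the-envelope calculation in Python. It uses only the figures from the announcement and assumes the 4,193 ideographs and the 20 emoji are disjoint subsets of the 4,489 additions.

```python
# Figures quoted in the announcement.
added = 4489      # characters new in this version
total = 149186    # characters after this version
cjk = 4193        # new CJK ideographs
emoji = 20        # new emoji characters

previous_total = total - added   # character count before this release
other = added - cjk - emoji      # additions that are neither CJK nor emoji
cjk_share = 100 * cjk / added    # percentage of additions that are CJK

print(previous_total)        # 144697
print(other)                 # 276
print(round(cjk_share, 1))   # 93.4
```

So, under that assumption, roughly 93% of the release is CJK ideographs, and everything else (the two new scripts included) fits in fewer than 300 characters.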

lifthrasiir 4 years ago

More like the former. There was indeed a deliberate omission in the past standard, called Han unification [1], but it has been pretty much toned down thanks to the expansion of the Unicode codepoint space in 2.0, the subsequent disunification processes, and the eventual introduction of the Ideographic Variation Database [2] to handle the remaining cases.
[1] https://en.wikipedia.org/wiki/Han_unification
[2] https://unicode.org/ivd/
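
To make the IVD part concrete: an ideographic variation sequence is a base ideograph followed by a variation selector from the Variation Selectors Supplement block (U+E0100 to U+E01EF). The Python sketch below only groups such pairs; it does not consult the IVD, so it cannot tell whether a given pair is actually registered. The function names and the sample pair (U+845B followed by U+E0100) are chosen for illustration.

```python
# Variation Selectors Supplement block (VS17..VS256), used by the IVD.
VS_SUP_START, VS_SUP_END = 0xE0100, 0xE01EF


def is_ivs_selector(ch: str) -> bool:
    """True if ch is a variation selector in U+E0100..U+E01EF."""
    return VS_SUP_START <= ord(ch) <= VS_SUP_END


def split_variation_sequences(text: str) -> list[str]:
    """Group each character with a selector that directly follows it."""
    units: list[str] = []
    for ch in text:
        # Attach a selector to the preceding unit unless that unit
        # already ends in a selector (or there is no preceding unit).
        if is_ivs_selector(ch) and units and not is_ivs_selector(units[-1][-1]):
            units[-1] += ch
        else:
            units.append(ch)
    return units


sample = "\u845b\U000e0100x"  # base ideograph + selector, then a plain "x"
units = split_variation_sequences(sample)
print(len(sample))                    # 3 code points
print([len(unit) for unit in units])  # [2, 1]
```

The point is that the variant choice lives in a second code point, so the base character stays unified while the text can still request a specific glyph.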

ks2048 4 years ago

On the Wikipedia page for "CJK Unified Ideographs Extension H", under "History" (https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extensi...), you can find dozens of linked documents describing why someone thought the characters should be added.
One random example I opened (https://www.unicode.org/L2/L2017/17099-haifeng-county-uax45....) is a 9-page PDF proposing a single character used for "congee shop signs in Haifeng County".

ksec 4 years ago

I continue to think Han Unification, or Unicode for CJK in general, is not the best solution to the problem.

torstenvl 4 years ago

Wish there were more/better CJK compatibility characters. There are lots of documents in old CJK encodings that simply cannot be converted to Unicode losslessly without special PUA markings to preserve the information needed for a round trip.
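
One rough way to find the problem bytes in a given file is to test whether decoding and re-encoding gives back the original. A minimal Python sketch, assuming the codec names below are the ones you care about; `round_trips` is a made-up helper, and the result reflects Python's codec tables rather than any authoritative mapping.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """True if data survives bytes -> str -> bytes unchanged in this codec."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not even valid in this encoding.
        return False
    try:
        return text.encode(encoding) == data
    except UnicodeEncodeError:
        # Decoded to something the codec cannot encode back.
        return False


sample = "日本語".encode("shift_jis")
print(round_trips(sample, "shift_jis"))  # True
print(round_trips(b"\xff", "ascii"))     # False
```

A `False` for bytes that decode fine is the interesting case: it means two legacy byte sequences collapsed onto one code point, which is where the PUA workarounds come in.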

rektide 4 years ago

Relaying some LWN comments:

> Why do we get to use the "bottom left part of glyph is damaged" modifier only for hieroglyphs? :(

(It'd be interesting to consider modifiers not always necessarily needing to be pre-encoded... to allow casual flexibility in implementation)

Wifi and Honk are new symbols.

(TikTok/Instagram will be thrilled... "can I get a honkkkk". (I'm OK with the idea, but honking is in general a nuisance that needs to be shut the f down; it's a huge anti-social problem.))

rurban 4 years ago

Does anyone have an idea about the extraordinary delay? Usually the yearly standard is released in April or May, but this time it came in September. All string processing libs and languages had to wait.
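
If you want to see which version your own runtime has caught up to, Python exposes the version of its bundled Unicode database as `unicodedata.unidata_version`. A small sketch; `supports_unicode` is a hypothetical helper name, and the answer depends on the Python build you run it on.

```python
import unicodedata


def supports_unicode(major: int, minor: int = 0) -> bool:
    """True if the bundled Unicode database is at least major.minor."""
    parts = tuple(int(p) for p in unicodedata.unidata_version.split("."))
    return parts[:2] >= (major, minor)


print(unicodedata.unidata_version)  # version string of the bundled database
print(supports_unicode(15))         # True only once the runtime ships Unicode 15 data
```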

jwilk 4 years ago

Summary of changes:

https://unicode.org/versions/Unicode15.0.0/

pie_flavor 4 years ago

New astrological signs!

anthk 4 years ago

GNU Unifont will handle it well enough.

unknownaccount 4 years ago

Unifont is missing tons of glyphs. I don't think any single font covers all the characters.
- anthk 4 years ago
  
  It will. Unifont's versioning scheme is bound to the Unicode version numbers.
  - unknownaccount 4 years ago
    
    Only if someone volunteers and submits those glyphs, which isn't guaranteed to happen. As I mentioned, many Unicode 14 codepoints still don't have glyphs in Unifont.

joshxyz 4 years ago

Time flies, thank you for sharing.
