Unicode 15 Released
lwn.net
https://blog.unicode.org/2022/09/announcing-unicode-standard...
Seems like it might be a better link
> Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc.,
TIL that CORBA is still around, and that someone considers it modern.
Repeating my general request: someone please do a Speaking for the Dead on CORBA that includes the good, the hopeful, the smart.
The bad too. These days it's just a sign, a symbol for bad, but in this hyperreality we don't know the referent, and don't know the lessons of the bad either.
But I want to hear the good most of all, and how the good was avoided, missed, unfulfilled. Because I think it, like so much else, might have gone great if things, times, and adoption had been only slightly different. That it was just too soon for open source is a huge possibility in my mind.
Also comparison versus DCOM/OLE would be informative, for historical context.
I wasn't a CORBA developer in those days, just an interested observer, so my opinion isn't worth much. Ultimately I think it suffered because of:
https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
The design of CORBA was based on critical assumptions that simply weren't true, leaving the application developer to deal with all the essential complexity.
> UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.
Good, I guess, but I wonder how much will break.
These are not used, outside of attacks. And the serious languages all forbid them already. And the hacker languages do not care at all about Unicode security so far.
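If anyone wants to see whether their own identifiers would be affected, here is a minimal Python sketch. It only looks for the two characters named in the quote above (ZWJ, U+200D, and ZWNJ, U+200C); it is not an implementation of the UTS #39 General Security Profile, and the helper name `find_joiners` is made up for illustration.

```python
# Hypothetical helper, not part of UTS #39 or any library.
# It only flags the two characters mentioned in the quoted change.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

JOINER_NAMES = {ZWNJ: "ZWNJ", ZWJ: "ZWJ"}


def find_joiners(identifier):
    """Return a list of (index, name) for every ZWJ/ZWNJ in the identifier."""
    return [
        (index, JOINER_NAMES[ch])
        for index, ch in enumerate(identifier)
        if ch in JOINER_NAMES
    ]


if __name__ == "__main__":
    # "user" + ZWJ + "name" renders like "username" in many fonts.
    print(find_joiners("user\u200dname"))  # [(4, 'ZWJ')]
    print(find_joiners("username"))        # []
```

A real checker would work from the Identifier_Status data published with UTS #39 rather than a hardcoded pair of characters.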
> This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.
That seems like a lot of new CJK characters! How did they end up with so many new characters after so long? Is there some gradual process of adding historical or extremely rare characters, or were some deliberately left out of earlier versions?
More like the former. There was indeed a deliberate omission in earlier versions of the standard, called Han unification [1], but it has been toned down considerably since: Unicode 2.0 expanded the codepoint space, disunification processes followed, and the Ideographic Variation Database [2] was eventually introduced to handle the remaining cases.
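For anyone who hasn't seen one, an ideographic variation sequence is just a unified base ideograph followed by a variation selector from the Variation Selectors Supplement block (U+E0100..U+E01EF). A rough Python sketch, with made-up helper names; it does not consult the actual IVD registry, so it cannot tell you whether a given sequence is registered, and the sample sequence is only there for illustration.

```python
# Variation Selectors Supplement block (VS17..VS256), the selectors
# used by ideographic variation sequences.
IVS_FIRST = 0xE0100
IVS_LAST = 0xE01EF


def is_ivs_selector(ch):
    """True if ch is a variation selector in U+E0100..U+E01EF."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def strip_ivs_selectors(text):
    """Drop the selectors, leaving only the unified base characters."""
    return "".join(ch for ch in text if not is_ivs_selector(ch))


if __name__ == "__main__":
    # Illustrative sequence: base ideograph U+845B followed by U+E0100.
    # Whether this pair is registered in the IVD is NOT checked here.
    sequence = "\u845b\U000E0100"
    print(len(sequence))                                # 2 code points
    print(strip_ivs_selectors(sequence) == "\u845b")    # True
```

Software that ignores the selector still sees the unified character, which is why this approach could be layered on top of unification without disunifying every variant.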
On the Wikipedia page for "CJK Unified Ideographs Extension H", under "History" (https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extensi...), you can find dozens of linked documents describing why someone thought they should be added.
One random example I opened (https://www.unicode.org/L2/L2017/17099-haifeng-county-uax45....) is a 9-page PDF proposing a single character used for "congee shop signs in Haifeng County".
I continue to think Han Unification, or Unicode for CJK, is not the best solution to the problem.
Wish there were more/better CJK compatibility characters. Lots of documents in old CJK encodings simply cannot be converted to Unicode losslessly without special PUA markings to make the round trip work.
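A quick way to test a given document for this is to decode it and re-encode it, then compare bytes. A minimal Python sketch, assuming the legacy encoding is one Python ships a codec for; `round_trips` is a hypothetical helper name, and the sample bytes below are ordinary text that does survive the trip, not one of the problem cases.

```python
def round_trips(raw, encoding):
    """Return True if raw decodes to Unicode and re-encodes to the same bytes.

    Returns False when the bytes cannot be decoded, cannot be re-encoded,
    or come back different (a lossy mapping).
    """
    try:
        text = raw.decode(encoding)
        return text.encode(encoding) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False


if __name__ == "__main__":
    # Three common kanji, encoded with Python's shift_jis codec.
    sample = "\u65e5\u672c\u8a9e".encode("shift_jis")
    print(round_trips(sample, "shift_jis"))
    # A byte that is not valid ASCII cannot be decoded at all.
    print(round_trips(b"\xff", "ascii"))
```

This only detects that a round trip fails; it says nothing about how to mark the affected characters so they can be restored later.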
Relaying some LWN comments:
> Why do we get to use the "bottom left part of glyph is damaged" modifier only for hieroglyphs? :(
(It'd be interesting to consider modifiers not always necessarily needing to be pre-encoded... to allow casual flexibility in implementation)
Wifi and Honk are new symbols.
(TikTok/Instagram will be thrilled... "can I get a honkkkk". I'm OK with the idea, but honking is in general a nuisance that needs to be shut the f down, a huge anti-social problem.)
Does anyone have an idea about the extraordinary delay? Usually the yearly standard is released in April or May, but this time it came in September. All string processing libs and languages had to wait.
Summary of changes:
New astrological signs!
GNU Unifont will handle it well enough.
Unifont is missing tons of glyphs. I don't think any single font covers all the characters.
It will. Unifont's versioning scheme is bound to the Unicode version numbers.
Only if someone volunteers and submits those glyphs, which isn't guaranteed to happen. As I mentioned, many Unicode 14 codepoints still don't have glyphs in Unifont.
Time flies, thank you for sharing.