Information about a8

Published on January 21, 2008

Author: Quintilliano

Source: authorstream.com

Surrogate Support in Microsoft Products
Michael S. Kaplan
Software Design Engineer
Trigeminal Software, Inc.

What are surrogates?
"A coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate."

High/low surrogate?
High: U+D800 - U+DBFF
Low: U+DC00 - U+DFFF
Terminology: "surrogate pair" is preferred over "surrogate character".

Conversion example #1
The first surrogate pair (D800, DC00) converted to UTF-32:
1. D800: binary 1101100000000000 (lower ten bits: 0000000000)
2. DC00: binary 1101110000000000 (lower ten bits: 0000000000)
3. Concatenate the two ten-bit pieces: 0000000000 + 0000000000 = x00000
4. Add x10000
Result: U+10000. This makes sense, since the first character encoded with a surrogate pair follows immediately after the last character in the 16-bit Unicode range (U+FFFF).

Conversion example #2
You have a Unicode character such as U+2040A (a CJK character in Plane 2) and wish to encode it in UTF-16:
1. Subtract x10000. Result: x1040A
2. Split into two ten-bit pieces: 0001000001 0000001010
3. Add 1101100000000000 (D800) to the high ten-bit piece (0001000001). Result: 1101100001000001 (D841)
4. Add 1101110000000000 (DC00) to the low ten-bit piece (0000001010). Result: 1101110000001010 (DC0A)
Your surrogate pair: D841, DC0A

UTF-8 conversions
Illegal conversion: six-byte UTF-8 (the two surrogate code units of UTF-16, each converted separately)
Legal conversion: four-byte UTF-8 (one UTF-32 code point)

UTF-8 example
Treating the two halves of a surrogate pair as independent 16-bit values
aaaabbbbbbcccccc, zzzzyyyyyyxxxxxx
produces incorrect UTF-8 totaling 6 bytes:
1110aaaa 10bbbbbb 10cccccc 1110zzzz 10yyyyyy 10xxxxxx
Instead, take the surrogate pair
110110wwwwzzzzyy, 110111yyyyxxxxxx
and convert it to UTF-8 totaling 4 bytes (below, uuuuu is defined as wwww + 1):
11110uuu 10uuzzzz 10yyyyyy 10xxxxxx

Encoding choices for MS
UTF-16, mostly
Occasionally UTF-8
Even more occasionally, UTF-32
Reasons:
There was already an existing, well-tested set of APIs that support UCS-2, which is a subset of UTF-16. A completely new API set was not required.
A move to UTF-32 would require twice as much space for all characters.
A move to UTF-8 would require even more than twice as much space in many cases.

The products...
Mostly the new generation of products:
Windows 2000/XP
Office XP (some support in Office 2000)
Most of these products supported Unicode already:
a little bit of extra work was needed for surrogate pairs
usually just UTF-8 support was needed

Windows 2000/XP
Uniscribe/GDI+ support for rendering
Each surrogate pair is a single grapheme
APIs like CharPrev/CharNext not changed
Extensions to fallback fonts in XP
Font CMAP extensions in XP
Lots of UTF-8 issues fixed in XP
No specific surrogate font/IME (yet)

Collation for supplementary characters
All Plane 1 (non-ideographic) characters sort after all the other non-ideographic scripts but before the ideographs.
All Plane 2 (ideographic) characters sort after all the ideographs on the BMP.
All Plane 3-14 characters (currently not assigned) are treated like any other unassigned characters. (This includes the Plane 14 language tags.)
All characters encoded in Planes 15-16 (private use) sort after all other characters.

Other system components
MLang
Internet Explorer
IIS 5.0/6.0

The downlevel story
No good support for Unicode, let alone supplementary characters
Uniscribe/RichEdit does improve the downlevel story, for display purposes at least
Officially, no surrogate support on Win9x

The Office suite
Word
FrontPage
Excel/Access
Outlook
RichEdit 4.0

Specific features
Insertion/deletion of text - All
Cursor movement - All
Font linking/fallback - All (Word's is best)
UTF-8 issues fixed - All
Enhanced word breaking - All (Word/RichEdit)
Vertical text - Word/PowerPoint/Publisher/RichEdit
Direct entry (Alt+nnnnnn, hhhhh + Alt+x) - Word/RichEdit

CHS/CHT/CHP Office
The product and the langpacks support an extended Unicode IME that handles supplementary characters
An Extension B font is also included

Visual Studio[.NET]
String class and globalization namespace
StringInfo
GetTextElementEnumerator
Handles supplementary characters
Also handles composite characters
GDI+
IDE support

SQL Server
Past - no support
Present - surrogate "safe" (neutral)
Future - surrogate aware

Items not supported
Character Map
Graph 10
Outlook 10 mail headers
Collations for supplementary characters
Fonts/IMEs

Questions?
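The arithmetic in the two conversion examples and in the UTF-8 example can be sketched in a few lines of Python. This is only an illustration of the bit manipulation the slides describe, not code from the presentation or from any Microsoft product; the function names are made up for this sketch.

```python
from typing import Tuple

# Hypothetical helper names; only the bit arithmetic comes from the slides.

def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into one code point (example #1)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    # Keep the lower ten bits of each unit, concatenate them, then add x10000.
    return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000


def encode_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into a surrogate pair (example #2)."""
    if not (0x10000 <= code_point <= 0x10FFFF):
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # step 1: subtract x10000
    high = 0xD800 + (offset >> 10)         # step 3: high ten bits onto D800
    low = 0xDC00 + (offset & 0x3FF)        # step 4: low ten bits onto DC00
    return high, low


def surrogate_pair_to_utf8(high: int, low: int) -> bytes:
    """Produce the legal four-byte UTF-8 form, never two three-byte sequences."""
    cp = decode_surrogate_pair(high, low)
    return bytes([
        0xF0 | (cp >> 18),                 # 11110uuu
        0x80 | ((cp >> 12) & 0x3F),        # 10uuzzzz
        0x80 | ((cp >> 6) & 0x3F),         # 10yyyyyy
        0x80 | (cp & 0x3F),                # 10xxxxxx
    ])


if __name__ == "__main__":
    print(hex(decode_surrogate_pair(0xD800, 0xDC00)))
    print([hex(unit) for unit in encode_surrogate_pair(0x2040A)])
    print(surrogate_pair_to_utf8(0xD841, 0xDC0A).hex())
```

Decoding the pair first and only then emitting UTF-8 is what avoids the illegal six-byte form: the encoder never sees the surrogate code units as separate characters.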
