UTF-8: The Secret of Character Encoding

Published on November 30, 2008

Author: Dextro

Source: slideshare.net

Bert: web developer at Netlash, a.k.a. Dextro (http://dextrose.be, http://twitter.com/dextro).

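Some history (sorry): ASCII uses 7 bits, giving 2^7 = 128 possible characters. A byte has 8 bits, so 1 bit is left over, used as a parity bit. Extended ASCII (the ISO 8859 series) uses that 8th bit for additional characters. Asia: DBCS (double-byte character sets).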
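To make the "spare bit as parity bit" idea concrete, here is a minimal sketch assuming even parity (odd parity was also used). The helper name with_even_parity() is hypothetical: it sets bit 7 so that the total number of 1-bits in the byte is even.

```python
def with_even_parity(ch: str) -> int:
    """Return the 8-bit value of a 7-bit ASCII character with an even-parity bit in bit 7."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    ones = bin(code).count("1")      # number of 1-bits among the 7 data bits
    parity = ones % 2                # 1 when odd, so the byte total becomes even
    return code | (parity << 7)      # store the parity in the spare 8th bit

print(2 ** 7)                        # 128 possible 7-bit values
for ch in "AC":
    print(ch, f"{with_even_parity(ch):08b}")   # A 01000001, C 11000011
```

Once ISO 8859 repurposed bit 7 for extra characters, it could no longer carry parity.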

Unicode is NOT "1 character = 16 bits". Each letter maps to a code point, an abstract number written as U+ followed by hex digits (e.g. U+0234). "Hello" is U+0048 U+0065 U+006C U+006C U+006F.
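A small sketch of the letter-to-code-point mapping, using Python's ord(); the helper name code_points() is made up for illustration. The second call shows why "16 bits per character" is wrong: U+1F600 needs more than 16 bits.

```python
def code_points(text: str) -> list[str]:
    """Format each character's Unicode code point in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points("Hello"))        # ['U+0048', 'U+0065', 'U+006C', 'U+006C', 'U+006F']
print(code_points("\U0001F600"))   # ['U+1F600'], larger than 0xFFFF
```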

Storing Unicode in memory (2 bytes per code point): "Hello" becomes 00 48 00 65 00 6C 00 6C 00 6F (big endian) or 48 00 65 00 6C 00 6C 00 6F 00 (little endian). Byte Order Mark: FF FE (little endian), FE FF (big endian).
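The two byte orders and the BOM can be checked with Python's built-in codecs. This is a sketch using the standard utf-16-be / utf-16-le codecs and the codecs.BOM_* constants; the generic "utf-16" decoder reads the BOM to decide the byte order.

```python
import codecs

word = "Hello"
be = word.encode("utf-16-be")   # most significant byte first
le = word.encode("utf-16-le")   # least significant byte first

print(be.hex(" "))   # 00 48 00 65 00 6c 00 6c 00 6f
print(le.hex(" "))   # 48 00 65 00 6c 00 6c 00 6f 00

# The BOM is U+FEFF written in the file's own byte order.
print(codecs.BOM_UTF16_LE.hex(" "))  # ff fe
print(codecs.BOM_UTF16_BE.hex(" "))  # fe ff

# The generic "utf-16" codec uses the BOM to pick the byte order, then drops it.
print((codecs.BOM_UTF16_BE + be).decode("utf-16"))  # Hello
```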

Ways of encoding Unicode: UCS-4 (UTF-32), big endian or little endian; UCS-2, big endian or little endian.
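A sketch of the fixed-width encodings: UTF-32 always spends 4 bytes per code point in either byte order, while UCS-2 has exactly 16 bits per character. Python ships no UCS-2 codec, so the hypothetical fits_ucs2() helper only checks whether every code point is at most U+FFFF.

```python
def fits_ucs2(text: str) -> bool:
    """UCS-2 stores every character in exactly 16 bits, so it only covers U+0000..U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)

# UTF-32 / UCS-4: always 4 bytes per code point, in either byte order.
print("A".encode("utf-32-be").hex(" "))  # 00 00 00 41
print("A".encode("utf-32-le").hex(" "))  # 41 00 00 00

print(fits_ucs2("Hello"))        # True
print(fits_ucs2("\U0001F600"))   # False: needs more than 16 bits
```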

More ways of encoding Unicode: UTF-16 uses 2 or 4 bytes per code point; UTF-8 uses 1, 2, 3 or 4 bytes; UTF-7 is meant for SMTP mail traffic, which was designed for 7-bit data.
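The byte counts above can be compared directly. This sketch uses "utf-16-le" so no BOM is added to the count; for A, é, € and 😀 the loop should report UTF-16 sizes of 2, 2, 2, 4 and UTF-8 sizes of 1, 2, 3, 4. The last lines check that UTF-7 output stays within 7-bit bytes.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, é, €, 😀

for ch in samples:
    utf16 = len(ch.encode("utf-16-le"))   # "-le" so no BOM is added
    utf8 = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: UTF-16 {utf16} bytes, UTF-8 {utf8} bytes")

# UTF-7 output uses only 7-bit bytes, so it survives 7-bit mail transports.
utf7 = "caf\u00e9".encode("utf-7")
print(utf7, all(b < 0x80 for b in utf7))
```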

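UTF-8: U+0000 to U+007F (0 to 127) take 1 byte; higher code points take 2, 3 or 4 bytes (the original design allowed up to 6, but RFC 3629 limits UTF-8 to 4). ASCII → UTF-8: no difference, because ASCII text is already valid UTF-8.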
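To show where the 1/2/3/4-byte boundaries come from, here is a hand-written encoder for a single code point, checked against Python's built-in codec. It is a teaching sketch (the name utf8_encode() is hypothetical); it does not reject surrogate code points, which a real encoder must.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (RFC 3629: at most 4 bytes).

    Simplification: surrogate code points (U+D800..U+DFFF) are not rejected here.
    """
    if cp < 0x80:                      # 0xxxxxxx: identical to ASCII
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point outside the Unicode range")

for cp in (0x48, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

The first branch is why ASCII text needs no conversion at all.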

UTF-8: sorting byte by byte gives the same order as sorting by code point; it is the default encoding for XML (and XHTML) documents; and it is easy for an algorithm to recognize.
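Both properties can be demonstrated in a few lines. The sort comparison uses Python's default string order (by code point) as the reference; UTF-16 breaks it because surrogate pairs (D8..DF) sort below BMP characters such as U+FB01. The recognizer looks_like_utf8() is a hypothetical, simplified structural check (lead byte plus the right number of 10xxxxxx continuation bytes); it does not reject every invalid sequence, so production code should simply try data.decode("utf-8").

```python
def looks_like_utf8(data: bytes) -> bool:
    """Simplified structural check: every lead byte must be followed by the right
    number of continuation bytes (10xxxxxx). Overlong 3/4-byte forms and encoded
    surrogates are NOT rejected."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            length = 1
        elif 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:                       # stray continuation byte or invalid lead byte
            return False
        if i + length > n:          # sequence cut off at the end
            return False
        if any(data[k] & 0xC0 != 0x80 for k in range(i + 1, i + length)):
            return False
        i += length
    return True

words = ["z", "\u00e9", "a", "\U0001F600", "\u20ac", "\ufb01"]
by_code_point = sorted(words)   # Python compares str by code point
by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))
by_utf16 = sorted(words, key=lambda w: w.encode("utf-16-be"))
print(by_utf8 == by_code_point)    # True
print(by_utf16 == by_code_point)   # False: surrogate D8 3D sorts before FB 01

print(looks_like_utf8("caf\u00e9".encode("utf-8")))       # True
print(looks_like_utf8("caf\u00e9".encode("iso-8859-1")))  # False
```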

Side note: what if a character is not available in the chosen encoding?
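What happens depends on the tool. In Python, for example, encoding raises UnicodeEncodeError by default, and errors="replace" substitutes "?". This sketch shows both; other languages and editors may silently substitute or drop characters instead.

```python
text = "caf\u00e9 \u20ac"   # "café €"

try:
    text.encode("iso-8859-1")          # é exists in ISO-8859-1, € does not
except UnicodeEncodeError as err:
    print("cannot encode:", err.object[err.start:err.end])

print(text.encode("iso-8859-1", errors="replace"))  # b'caf\xe9 ?'
print(text.encode("ascii", errors="replace"))       # b'caf? ?'
```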

In practice: which encoding should you choose?

Questions to ask: Which characters am I going to use? In which encodings can my editor save files? Which encodings are supported by the various components in my publishing chain? Which encodings are supported by browsers?

1. Character range: a single language or multiple languages? (Curly) quotation marks, dashes and other special punctuation? Mathematical or other special symbols?
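One way to answer the character-range question is to test your actual text against each candidate encoding. The helper unencodable() below is hypothetical: it lists, in order of first appearance, the characters an encoding cannot represent. With this sample, ISO-8859-1 fails on the curly quotes, the dash and ≤, while UTF-8 covers everything.

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """Return, in order of first appearance, the characters of text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

sample = "\u201cD\u00e9j\u00e0 vu\u201d \u2013 2 \u00d7 3 \u2264 7"   # “Déjà vu” – 2 × 3 ≤ 7
for enc in ("us-ascii", "iso-8859-1", "utf-8"):
    print(enc, unencodable(sample, enc))
```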

2. Text editor: is the encoding fixed, or can you choose? Zend Studio for Eclipse, for example, offers ISO-8859-1, US-ASCII, UTF-16, UTF-16BE, UTF-16LE and UTF-8.
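When you don't know what an editor wrote, a leading BOM is one clue. This is a sketch with a hypothetical sniff_bom() helper built on the codecs.BOM_* constants; UTF-32 BOMs are checked first because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE). No BOM means the file could still be US-ASCII, ISO-8859-1 or UTF-8, so another check is needed.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None

raw = "Caf\u00e9".encode("utf-16")   # Python writes a BOM in native byte order
enc, n = sniff_bom(raw)
print(enc, raw[n:].decode(enc))      # the byte order depends on the machine
```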

3. Other components: web server, programming (or scripting) language, database, ...
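Every component has to agree on the encoding. For the web server, that means encoding the body and declaring the same charset in the Content-Type header. A minimal sketch as a Python WSGI application (the app and page content are made up for illustration), called directly with a stand-in start_response so its output can be inspected:

```python
def app(environ, start_response):
    """Minimal WSGI app: encode the body as UTF-8 AND say so in the Content-Type header."""
    body = "<p>Caf\u00e9 \u20ac</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app directly with a stand-in start_response to inspect what it sends.
sent = {}
def fake_start_response(status, headers):
    sent["status"], sent["headers"] = status, dict(headers)

chunks = app({}, fake_start_response)
print(sent["headers"]["Content-Type"])   # text/html; charset=utf-8
print(b"".join(chunks))
```

To serve it for real, wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library would do.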

4. Browser support: US-ASCII, the ISO 8859 series and UTF-8 are no problem; avoid the others (and arguably US-ASCII too...).
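A browser can only decode a page correctly if it learns the encoding, typically from the charset parameter of the Content-Type header. This sketch shows the receiving side using email.message.Message.get_content_charset(), which returns the parameter lower-cased. The helper names and the UTF-8 fallback are assumptions of the sketch; real browsers apply their own, more elaborate detection rules.

```python
from email.message import Message

def declared_charset(content_type: str, fallback=None):
    """Read the charset parameter from a Content-Type value (lower-cased), or return fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(fallback)

def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    # The fallback is an assumption of this sketch, not browser behaviour.
    return body.decode(declared_charset(content_type, fallback))

print(declared_charset("text/html; charset=UTF-8"))   # utf-8
print(declared_charset("text/html"))                  # None
```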

Character not available? Use a named entity (&copy; &euml; &aacute;) or a numeric character reference, NCR (&#169; or &#xA9;). Drawbacks: more bytes, harder to read, and possibly SEO?
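Python can produce both forms. This sketch uses the standard "xmlcharrefreplace" error handler for decimal NCRs and html.entities.codepoint2name (the HTML 4 entity table) for named entities; to_ncr() and to_entities() are hypothetical helpers, and they only handle non-ASCII characters (escaping &, < and > is a separate step). The last line shows the size cost.

```python
import html
from html.entities import codepoint2name

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def to_entities(text: str) -> str:
    """Prefer named entities, fall back to hex NCRs, for non-ASCII characters only."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#x{cp:X};")
    return "".join(out)

sample = "\u00a9 \u00eb \u00e1"                 # © ë á
print(to_entities(sample))                      # &copy; &euml; &aacute;
print(to_ncr(sample))                           # &#169; &#235; &#225;
print(html.unescape("&copy; &#169; &#xA9;"))    # © © ©

# The cost: "©" is 2 bytes in UTF-8, "&#169;" is 6.
print(len("\u00a9".encode("utf-8")), len(to_ncr("\u00a9")))
```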

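Biggest problem: PHP 5, whose strings are byte strings. Full Unicode support was at least planned for PHP 6 (that version was later abandoned and never released).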
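The practical pain with byte strings is that length and slicing work on bytes, not characters; PHP's strlen() counts bytes, for example, while mbstring's mb_strlen() counts characters. The same pitfall is illustrated here in Python with an explicit bytes object. truncate_utf8() is a hypothetical helper that cuts to a byte budget without leaving half a character: slicing valid UTF-8 can only break the final sequence, and errors="ignore" drops exactly that incomplete tail.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    cut = text.encode("utf-8")[:max_bytes]
    # A valid UTF-8 string cut short can only be broken at its very end;
    # errors="ignore" drops that incomplete trailing sequence.
    return cut.decode("utf-8", errors="ignore")

word = "caf\u00e9"                                # café
print(len(word), len(word.encode("utf-8")))       # 4 5: characters vs bytes
print(truncate_utf8(word, 4))                     # caf
```

A naive word.encode("utf-8")[:4].decode("utf-8") would raise UnicodeDecodeError instead, because the cut lands inside "é".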
