UTF-16

The meaning of «UTF-16»

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid character code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 arose from an earlier, now obsolete, fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 2^16 (65,536) code points were needed.[1]
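
The "one or two 16-bit code units" rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name utf16_code_units is hypothetical, and the sketch assumes the standard surrogate ranges (high surrogates U+D800 to U+DBFF, low surrogates U+DC00 to U+DFFF). Code points below U+10000 map to a single code unit; anything above is split into a surrogate pair. The same ranges account for the 1,112,064 figure: 17 planes of 65,536 code points, minus the 2,048 surrogate code points.

```python
# Sketch: encode one Unicode code point as UTF-16 code units.
# utf16_code_units is a hypothetical helper written for illustration.

def utf16_code_units(cp: int) -> list[int]:
    """Return the 16-bit code unit(s) that encode code point cp in UTF-16."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for pairs and are not characters.
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp < 0x10000:
        # Basic Multilingual Plane: one code unit with the same value.
        return [cp]
    v = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return [high, low]

# 17 planes of 65,536 code points, minus the 2,048 surrogate code points.
VALID_CODE_POINTS = 17 * 0x10000 - (0xDFFF - 0xD800 + 1)

print(VALID_CODE_POINTS)                             # 1112064
print([hex(u) for u in utf16_code_units(0x41)])      # ['0x41']
print([hex(u) for u in utf16_code_units(0x1F600)])   # ['0xd83d', '0xde00']
```

For comparison, Python's built-in codec produces the same result: "\U0001F600".encode("utf-16-be") yields the bytes D8 3D DE 00.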

UTF-16 is used internally by systems such as Microsoft Windows, the Java programming language and JavaScript/ECMAScript. It is also often used for plain text and for word-processing data files on Microsoft Windows, but rarely for files on Unix-like systems. As of May 2019, Microsoft reversed its earlier practice of emphasizing only UTF-16 for Unicode; for Windows applications, it now recommends and supports UTF-8 (e.g. for Universal Windows Platform (UWP) apps).[2]

UTF-16 is the only web encoding incompatible with ASCII,[3] and it never gained popularity on the web, where it is used by under 0.002% (just over one thousandth of one percent) of web pages.[4] UTF-8, by comparison, is used by 97% of all web pages.[5] The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and holds that, for security reasons, browser applications should not use UTF-16.[6]
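
A short Python comparison shows what "incompatible with ASCII" means in practice. It relies only on the standard str.encode codecs; utf-16-le is an arbitrary choice for this sketch, picked because the plain utf-16 codec prepends a byte order mark that would obscure the comparison.

```python
# Sketch: compare how UTF-8 and UTF-16 encode plain ASCII text.
text = "Hi"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte order mark

print(utf8)    # b'Hi'
print(utf16)   # b'H\x00i\x00'

# UTF-8 bytes for ASCII text are identical to the ASCII bytes; UTF-16 bytes are not.
print(utf8 == text.encode("ascii"))    # True
print(utf16 == text.encode("ascii"))   # False
print(b"\x00" in utf16)                # True: NUL bytes appear in the output
```

The embedded zero bytes are why software that expects ASCII-compatible input (for example, code that treats a NUL byte as the end of a string) cannot simply read UTF-16 data.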

In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" (UCS) that would replace earlier language-specific encodings with one coordinated system. The goal was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings, which required 1 byte per character, with an encoding using 65,536 (2^16) values, which would require 2 bytes (16 bits) per character.

Two groups worked on this in parallel: ISO/IEC JTC 1/SC 2 and the Unicode Consortium, the latter representing mostly manufacturers of computing equipment. The two groups attempted to synchronize their character assignments so that the developing encodings would be mutually compatible. The early 2-byte encoding was originally called "Unicode", but is now called "UCS-2".[7]

When it became increasingly clear that 2^16 characters would not suffice,[1] ISO/IEC JTC 1/SC 2 introduced a larger 31-bit space and an encoding (UCS-4) that would require 4 bytes per character. This was resisted by the Unicode Consortium, both because 4 bytes per character wasted a lot of memory and disk space, and because some manufacturers were already heavily invested in 2-byte-per-character technology. The UTF-16 encoding scheme was developed as a compromise and introduced with version 2.0 of the Unicode standard in July 1996.[8] It is fully specified in RFC 2781, published in 2000 by the IETF.[9][10]
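
The size trade-off behind this compromise can be illustrated by counting bytes per character. This sketch assumes UTF-32 as a stand-in for UCS-4, since Python's standard library has no codec named UCS-4; within the Unicode range both use a single 32-bit value per character. The helper encoded_sizes is hypothetical.

```python
# Sketch: bytes needed per character in UTF-16 versus a fixed 4-byte encoding.
# Assumption: UTF-32 stands in for UCS-4 (Python has no "ucs-4" codec).

def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-16 bytes, UTF-32 bytes) for one character, without a BOM."""
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-32-le"))

samples = {
    "U+0041 LATIN CAPITAL LETTER A": "A",
    "U+20AC EURO SIGN": "\u20ac",
    "U+1F600 GRINNING FACE": "\U0001F600",
}

for label, ch in samples.items():
    utf16_bytes, utf32_bytes = encoded_sizes(ch)
    print(f"{label}: UTF-16 {utf16_bytes} bytes, UTF-32 {utf32_bytes} bytes")
```

Characters in the Basic Multilingual Plane (the range UCS-2 could already represent) keep their 2-byte size in UTF-16; only supplementary characters pay the 4-byte cost that a fixed 4-byte encoding imposes on every character.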

© 2015-2021, Wikiwordbook.info