UTF-8

The meaning of «utf-8»

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.[1]

UTF-8 is capable of encoding all 1,112,064[nb 1] valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as / (slash) in filenames, \ (backslash) in escape sequences, and % in printf.
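The length and compatibility properties described above can be checked with a short Python sketch. The sample characters and the helper names `utf8_length` and `has_ascii_byte` are illustrative choices, not part of any standard.

```python
# One sample code point per UTF-8 length class (illustrative choices).
samples = {
    "A": 1,            # U+0041, ASCII range
    "\u00e9": 2,       # U+00E9, Latin small letter e with acute
    "\u20ac": 3,       # U+20AC, euro sign
    "\U0001F600": 4,   # U+1F600, an emoji outside the Basic Multilingual Plane
}

def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode one code point."""
    return len(ch.encode("utf-8"))

def has_ascii_byte(ch: str) -> bool:
    """Return True if any byte of the UTF-8 encoding is in the ASCII range."""
    return any(b < 0x80 for b in ch.encode("utf-8"))

for ch, expected in samples.items():
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), expected {expected}")

# ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "path/to/file %d\\n"
ascii_is_utf8 = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII bytes unchanged:", ascii_is_utf8)

# Non-ASCII code points never produce bytes that look like '/', '\\' or '%'.
non_ascii = [ch for ch in samples if ord(ch) > 0x7F]
print("ASCII bytes in non-ASCII encodings:",
      any(has_ascii_byte(ch) for ch in non_ascii))
```

Because every byte of a multi-byte sequence has its high bit set, code that scans a byte string for a slash or backslash cannot match in the middle of a non-ASCII character.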

UTF-8 was designed as a superior alternative to UTF-1, a proposed variable-width encoding with partial ASCII compatibility that lacked some features, including self-synchronization and fully ASCII-compatible handling of characters such as slashes. Ken Thompson and Rob Pike produced the first implementation for the Plan 9 operating system in September 1992.[2][3] This led to its adoption by X/Open as its specification for FSS-UTF, which was first officially presented at USENIX in January 1993. The Internet Engineering Task Force (IETF) subsequently adopted it in RFC 2277 (BCP 18) for future Internet standards work, replacing single-byte character sets such as Latin-1 in older RFCs.

UTF-8 is by far the most common encoding for the World Wide Web, accounting for 97% of all web pages, and up to 100% for some languages, as of 2021.[4]

The official Internet Assigned Numbers Authority (IANA) code for the encoding is "UTF-8".[5] All letters are upper-case, and the name is hyphenated. This spelling is used in all the Unicode Consortium documents relating to the encoding.

Alternatively, the name "utf-8" may be used by all standards conforming to the IANA list (which include CSS, HTML, XML, and HTTP headers),[6] as the declaration is case insensitive.[5]

Other variants, such as those that omit the hyphen or replace it with a space, i.e. "utf8" or "UTF 8", are not accepted as correct by the governing standards.[7] Despite this, most web browsers can understand them, and so standards intended to describe existing practice (such as HTML5) may effectively require their recognition.[8]
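As one example of software that is lenient about these labels, Python's codec registry resolves several spellings to the same codec. This sketch shows Python's own behaviour only and says nothing about what the IANA list or any web standard permits; the helper name `same_encoding` is hypothetical.

```python
import codecs

def same_encoding(label_a: str, label_b: str) -> bool:
    """Return True if Python's codec registry maps both labels to one codec."""
    return codecs.lookup(label_a).name == codecs.lookup(label_b).name

# The official spelling and its lower-case form resolve to the same codec.
print(same_encoding("UTF-8", "utf-8"))

# Python also accepts the hyphen-less variant, although the governing
# standards do not list it as a correct name.
print(same_encoding("UTF-8", "utf8"))

# The canonical name Python reports for the codec.
print(codecs.lookup("UTF-8").name)
```

When writing a label into a document or header, the official spelling "UTF-8" remains the safe choice, whatever a particular library happens to tolerate on input.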

Unofficially, the names UTF-8-BOM and UTF-8-NOBOM are sometimes used for text files that do or do not contain a byte order mark (BOM), respectively. In Japan especially, UTF-8 encoding without a BOM is sometimes called "UTF-8N".[9][10]
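The difference between the two file flavours is three leading bytes. A minimal Python sketch, using the standard-library codec name `utf-8-sig` (Python's own label for UTF-8 with a BOM, not an IANA name) and a hypothetical helper `starts_with_bom`:

```python
import codecs

def starts_with_bom(data: bytes) -> bool:
    """Return True if the byte string begins with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

text = "hi"
with_bom = text.encode("utf-8-sig")   # writes the BOM before the text
without_bom = text.encode("utf-8")    # plain UTF-8, no BOM

print(with_bom)      # the BOM bytes EF BB BF followed by the text
print(without_bom)

print(starts_with_bom(with_bom), starts_with_bom(without_bom))

# Decoding with "utf-8-sig" drops a leading BOM if one is present,
# and accepts input without a BOM as well.
decoded_a = with_bom.decode("utf-8-sig")
decoded_b = without_bom.decode("utf-8-sig")

# Decoding the BOM-prefixed bytes as plain "utf-8" keeps U+FEFF in the text.
decoded_raw = with_bom.decode("utf-8")
print(repr(decoded_raw))
```

Reading with `utf-8-sig` is a convenient way to handle both kinds of file when the source is unknown, at the cost of silently discarding the mark.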

Related Searches

Unicode equivalence, UTF-16, UTF-32
UTF-7, UTF-EBCDIC, UTF-1
Han unification, April Fools' Day Request for Comments

© 2015-2021, Wikiwordbook.info