
UTF-7

The meaning of «utf-7»

UTF-7 (7-bit Unicode Transformation Format) is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable.
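As a minimal sketch of what "a stream of ASCII characters" means in practice, the following uses the `utf-7` codec that ships with Python's standard library. The helper name `to_utf7` and the sample string are illustrative assumptions, not part of any specification.

```python
def to_utf7(text):
    """Encode text with Python's built-in UTF-7 codec (hypothetical helper)."""
    return text.encode("utf-7")

# Sample text with one non-ASCII character (U+263A, a smiley).
sample = "Hi Mom -\u263a-!"
wire = to_utf7(sample)

# Every byte of the result is in the 7-bit ASCII range.
print(all(b < 128 for b in wire))

# Decoding restores the original text.
print(wire.decode("utf-7") == sample)

# Pure ASCII passes through unchanged, except that a literal "+" is
# written as "+-" because "+" opens a base64 run.
print(to_utf7("a+b"))
```

Non-ASCII characters are carried inside a run that starts with `+` and is usually closed by `-`; everything else stays readable as plain ASCII.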

UTF-7 (according to its RFC) isn't a "Unicode Transformation Format", as the definition can only encode code points in the BMP (the first 65536 Unicode code points, which does not include emojis and many other characters). However, if a UTF-7 translator converts to or from UTF-16, it can (and probably does) encode each surrogate half as though it were a 16-bit code point, and can thus encode all code points. It is unclear whether other UTF-7 software (such as translators to UTF-32 or UTF-8) supports this.
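The surrogate behaviour can be observed with Python's built-in `utf-7` codec, which is one implementation among many and not necessarily representative of others. The sketch below strips the `+` and `-` delimiters from the encoded form, decodes the base64 payload, and compares it with the UTF-16 (big-endian) encoding of the same character. The helper name `utf7_payload` is an assumption made for illustration, and it only handles a single non-ASCII character.

```python
import base64

def utf7_payload(ch):
    """Return the raw bytes carried in the base64 run of a single
    non-ASCII character encoded with Python's UTF-7 codec."""
    encoded = ch.encode("utf-7")
    # Drop the leading "+" and the closing "-" if present.
    payload = encoded[1:].rstrip(b"-")
    # UTF-7 omits base64 padding, so restore it before decoding.
    payload += b"=" * (-len(payload) % 4)
    return base64.b64decode(payload)

emoji = "\U0001F600"  # a code point outside the BMP
raw = utf7_payload(emoji)

# The payload is the UTF-16BE form: two 16-bit surrogate halves.
print(raw == emoji.encode("utf-16-be"))
print(len(raw))  # 4 bytes, i.e. two 16-bit units

high = int.from_bytes(raw[:2], "big")
print(0xD800 <= high <= 0xDBFF)  # first unit is a high surrogate
```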

UTF-7 has never been an official standard of the Unicode Consortium. It is known to have security issues, which is why software has been changed to disable its use.[citation needed] It is prohibited in HTML 5.[1][2]

MIME, the modern standard of E-mail format, forbids encoding of headers using byte values above the ASCII range. Although MIME allows encoding the message body in various character sets (broader than ASCII), the underlying transmission infrastructure (SMTP, the main E-mail transfer standard) is still not guaranteed to be 8-bit clean. Therefore, a non-trivial content transfer encoding has to be applied in case of doubt. Unfortunately, base64 has the disadvantage of making even US-ASCII characters unreadable in non-MIME clients. On the other hand, UTF-8 combined with quoted-printable produces a very size-inefficient format, requiring 6–9 bytes for non-ASCII characters from the BMP and 12 bytes for characters outside the BMP.
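The byte counts quoted above can be checked with a short sketch using Python's standard `quopri` module. The helper name `encoded_sizes` and the three sample characters are assumptions chosen for illustration; the UTF-7 sizes are those produced by Python's codec for a single isolated character and include the `+` and `-` delimiters.

```python
import quopri

def encoded_sizes(ch):
    """Return (bytes as UTF-8 + quoted-printable, bytes as UTF-7)
    for one character. Hypothetical helper for comparison only."""
    qp = quopri.encodestring(ch.encode("utf-8"))
    u7 = ch.encode("utf-7")
    return len(qp), len(u7)

samples = {
    "2-byte UTF-8 (BMP)": "\u00e9",       # e with acute accent
    "3-byte UTF-8 (BMP)": "\u20ac",       # euro sign
    "4-byte UTF-8 (non-BMP)": "\U0001F600",
}

for label, ch in samples.items():
    qp_len, u7_len = encoded_sizes(ch)
    print(label, "quoted-printable:", qp_len, "UTF-7:", u7_len)
```

Each UTF-8 byte above the ASCII range becomes a three-character `=XX` escape in quoted-printable, which is where the 6, 9 and 12 byte figures come from.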

Provided certain rules are followed during encoding, UTF-7 can be sent in e-mail without using an underlying MIME transfer encoding, but it must still be explicitly identified as the text character set. In addition, if used within e-mail headers such as "Subject:", UTF-7 must be contained in MIME encoded words identifying the character set. Since encoded words force use of either quoted-printable or base64, UTF-7 was designed not to use the = sign as an escape character, which avoids double escaping when it is combined with quoted-printable (or its variant, the RFC 2047/1522 ?Q?-encoding of headers).
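As an illustration of UTF-7 inside an encoded word, the sketch below decodes a hand-written header value with `email.header.decode_header` from Python's standard library. The header text and the helper name `decode_utf7_subject` are made up for the example. Note that the UTF-7 run `+IKw-` sits inside the `?Q?` encoding without any `=XX` escapes, because UTF-7 does not use the = sign.

```python
from email.header import decode_header

def decode_utf7_subject(raw):
    """Decode a header value that may contain RFC 2047 encoded words.
    Hypothetical helper; falls back to ASCII when no charset is given."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)

# An assumed example header value: "_" stands for a space in Q-encoding,
# and "+IKw-" is the UTF-7 form of the euro sign (U+20AC).
subject = decode_utf7_subject("=?UTF-7?Q?Price:_+IKw-5?=")
print(subject)
```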

UTF-7 is generally not used as a native representation within applications as it is very awkward to process. Despite its size advantage over the combination of UTF-8 with either quoted-printable or base64, the now defunct Internet Mail Consortium recommended against its use.[3]

8BITMIME has also been introduced, which reduces the need to encode message bodies in a 7-bit format.

A modified form of UTF-7 (sometimes dubbed 'mUTF-7'[citation needed]) is currently used in the IMAP e-mail retrieval protocol for mailbox names.[4]
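The IMAP variant can be sketched as follows, based on the rules in RFC 3501 section 5.1.3: `&` replaces `+` as the shift character, `,` replaces `/` in the base64 alphabet, a literal `&` is written `&-`, and every base64 run is explicitly closed with `-`. The function name `imap_utf7_encode` is an assumption; this is an illustrative encoder only, not a complete or hardened implementation.

```python
import base64

def imap_utf7_encode(name):
    """Encode a mailbox name in IMAP's modified UTF-7 (sketch only)."""
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no "=" padding, "," instead of "/".
            b64 = b64.rstrip("=").replace("/", ",")
            out.append("&" + b64 + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable ASCII represents itself, except "&".
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

# Mailbox name used as an example in RFC 3501.
print(imap_utf7_encode("~peter/mail/\u53f0\u5317/\u65e5\u672c\u8a9e"))
```

Because `/` is commonly used as a mailbox hierarchy separator, replacing it with `,` inside base64 runs keeps encoded names from being mistaken for nested mailboxes.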

© 2015-2021, Wikiwordbook.info