What does UTF-8 mean?
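UTF stands for “Unicode Transformation Format,” and the 8 means the encoding works in 8-bit (one-byte) code units. There are other encodings for Unicode besides UTF-8, such as UTF-16 and UTF-32, but UTF-8 is distinctive because its basic unit is a single byte: each character is stored as a sequence of one to four bytes.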
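To make the one-to-four-byte rule concrete, here is a minimal Python sketch of the UTF-8 bit layout. `encode_code_point` is a hypothetical helper written only for illustration (in real code you would simply call `str.encode("utf-8")`); the loop checks it against Python's built-in codec.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 by hand (illustration only)."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    manual = encode_code_point(ord(ch))
    assert manual == ch.encode("utf-8")  # matches Python's built-in codec
    print(f"U+{ord(ch):04X}: {len(manual)} byte(s) {manual.hex(' ')}")
```

This prints one line per character, with byte counts of 1, 2, 3 and 4 (for example `U+20AC: 3 byte(s) e2 82 ac`). `bytes.hex` with a separator argument needs Python 3.8 or later.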
Why is UTF-8 the default character encoding in XML?
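The World Wide Web Consortium (W3C) recommends UTF-8 as the default encoding for XML and HTML, and recommends not only using UTF-8 but also declaring it in the document's metadata, “even when all characters are in the ASCII range. Using non-UTF-8 encodings can have unexpected results.” The XML 1.0 specification also makes UTF-8 the fallback: unless external information such as an HTTP header says otherwise, a document that has neither a byte order mark nor an encoding declaration must be UTF-8. Many other standards support only UTF-8; for example, JSON exchanged between systems that are not part of a closed ecosystem must be encoded in UTF-8 (RFC 8259).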
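As a small illustration using Python's standard-library `xml.etree.ElementTree` and `json` modules (Python 3.8 or later for the `xml_declaration` argument), a parser reads an undeclared XML document as UTF-8, the serializer can state the encoding explicitly in the XML declaration, and JSON is exchanged as UTF-8 bytes. This is a sketch of the behavior, not a statement of how every XML or JSON library works.

```python
import json
import xml.etree.ElementTree as ET

# No encoding declaration and no BOM: the parser must treat the bytes as UTF-8.
undeclared = "<greeting>caf\u00e9</greeting>".encode("utf-8")
root = ET.fromstring(undeclared)
print(root.text)

# Serialize with an explicit XML declaration, so the encoding is also
# stated in the document's metadata.
declared = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(declared)  # bytes repr shows the raw UTF-8 bytes for the accented letter

# JSON for interchange: serialize to text, then encode that text as UTF-8.
payload = json.dumps({"greeting": root.text}, ensure_ascii=False).encode("utf-8")
print(json.loads(payload))  # json.loads accepts UTF-8 bytes directly
```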
What is a high surrogate in UTF-8 encoding?
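A high surrogate is a code point in the range U+D800 to U+DBFF. Surrogates come from UTF-16: a high surrogate followed by a low surrogate (U+DC00 to U+DFFF) together represent one character above U+FFFF. A surrogate is not a character on its own, so a UTF-8 encoder cannot validly encode an unpaired one. For example, in the .NET documentation example for UTF8Encoding, the two UTF8Encoding instances encode a character array that contains two high surrogates (U+D801 and U+D802) in a row, which is an invalid character sequence; a high surrogate should always be followed by a low surrogate.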
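The following Python sketch is an analogue of that situation, not the .NET example itself. `to_surrogate_pair` is a hypothetical helper that shows how UTF-16 splits a code point above U+FFFF into a high and a low surrogate; the rest shows that Python's strict codecs reject the invalid two-high-surrogate sequence, both when encoding to UTF-8 and when decoding UTF-16 bytes.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use a surrogate pair")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits, lands in U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits, lands in U+DC00..U+DFFF
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> U+{high:04X} U+{low:04X}")  # U+1F600 -> U+D83D U+DE00
# The same pair appears in the big-endian UTF-16 bytes Python produces.
assert "\U0001F600".encode("utf-16-be") == bytes(
    [high >> 8, high & 0xFF, low >> 8, low & 0xFF])

# Two high surrogates in a row (U+D801, U+D802). A Python str can hold them,
# but the strict UTF-8 codec refuses to encode surrogates.
lone = "\ud801\ud802"
try:
    lone.encode("utf-8")
except UnicodeEncodeError as err:
    print("UTF-8 encode failed:", err.reason)

# The same sequence as UTF-16 bytes is rejected when decoding.
try:
    b"\xd8\x01\xd8\x02".decode("utf-16-be")
except UnicodeDecodeError as err:
    print("UTF-16 decode failed:", err.reason)
```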
What are the most common character encodings?
The most commonly used encodings are UTF-8 and UTF-16. In UTF-8, a character is 1 to 4 bytes long; in UTF-16, it is one or two 16-bit code units (2 or 4 bytes). UTF-8 can represent every character in the Unicode standard, and it is backward compatible with ASCII: any ASCII text is already valid UTF-8, byte for byte. UTF-8 is the preferred encoding for e-mail and web pages.
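A quick way to see the size difference between the two encodings is to encode a few sample strings in Python. `byte_sizes` is a hypothetical helper; it uses the `utf-16-le` codec rather than `utf-16` so that the 2-byte byte order mark is not counted. Note that Python's `len` on a `str` counts code points, not bytes.

```python
def byte_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" is used instead of "utf-16" so no BOM is included.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII": "hello",
    "Latin": "caf\u00e9",
    "CJK": "\u4e2d\u6587",
    "emoji": "\U0001F600",
}
for name, text in samples.items():
    utf8, utf16 = byte_sizes(text)
    print(f"{name:6} {len(text)} code point(s): UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")

# Backward compatibility: pure ASCII text has the same bytes in ASCII and UTF-8.
assert "hello".encode("ascii") == "hello".encode("utf-8")
```

The CJK row shows the trade-off: characters in the range U+0800 to U+FFFF take 3 bytes in UTF-8 but only 2 in UTF-16, while ASCII-heavy text is half the size in UTF-8.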