Package net.sf.saxon.serialize.charcode
Class UTF8CharacterSet
java.lang.Object
net.sf.saxon.serialize.charcode.UTF8CharacterSet
- All Implemented Interfaces:
CharacterSet
This class defines properties of the UTF-8 character set.
-
Method Summary
Modifier and Type          Method                                              Description
static int                 decodeUTF8(byte[] in, int used)                     Decode a UTF-8 character
String                     getCanonicalName()                                  Get the preferred Java name of the character set.
static UTF8CharacterSet    getInstance()                                       Get the singular instance of this class
static int                 getUTF8Encoding(char in, char in2, byte[] out)      Static method to generate the UTF-8 representation of a Unicode character
boolean                    inCharset(int c)                                    Determine if a character is present in the character set
-
Method Details
-
getInstance
public static UTF8CharacterSet getInstance()
Get the singular instance of this class.
- Returns:
- the singular instance of this class
-
inCharset
public boolean inCharset(int c)
Description copied from interface: CharacterSet
Determine if a character is present in the character set.
- Specified by:
inCharset in interface CharacterSet
-
getCanonicalName
public String getCanonicalName()
Description copied from interface: CharacterSet
Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
- Specified by:
getCanonicalName in interface CharacterSet
-
getUTF8Encoding
public static int getUTF8Encoding(char in, char in2, byte[] out)
Static method to generate the UTF-8 representation of a Unicode character.
- Parameters:
in - the Unicode character, or the high half of a surrogate pair
in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
out - an array of at least 4 bytes to hold the UTF-8 representation
- Returns:
- the number of bytes in the UTF-8 representation
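The contract described above (one char or a surrogate pair in, up to four bytes out, byte count returned) can be illustrated with a standalone sketch. This is not Saxon's implementation: the class name Utf8EncodeSketch and the method name encode are hypothetical, and the sketch assumes that in2 is a valid low surrogate whenever in is a high surrogate.

```java
final class Utf8EncodeSketch {

    /**
     * Writes the UTF-8 bytes for one character into out and returns the
     * number of bytes written. in2 is consulted only when in is a high
     * surrogate. No validation is done: a lone low surrogate is encoded
     * as an ordinary three-byte sequence.
     */
    static int encode(char in, char in2, byte[] out) {
        int cp = in;
        if (Character.isHighSurrogate(in)) {
            // Combine the two halves into a supplementary code point.
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {
            // 0xxxxxxx
            out[0] = (byte) cp;
            return 1;
        } else if (cp < 0x800) {
            // 110xxxxx 10xxxxxx
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        } else if (cp < 0x10000) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        } else {
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xF0 | (cp >> 18));
            out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (byte) (0x80 | (cp & 0x3F));
            return 4;
        }
    }
}
```

Because the output buffer is supplied by the caller, a single four-byte array can be reused across calls; only the first n bytes, where n is the return value, are meaningful after each call.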
-
decodeUTF8
public static int decodeUTF8(byte[] in, int used)
Decode a UTF-8 character.
- Parameters:
in - array of bytes representing a single UTF-8 encoded character
used - number of bytes in the array that are actually used
- Returns:
- the Unicode codepoint of this character
- Throws:
IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
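The decoding contract above (a byte array holding one encoded character, a count of used bytes, and an IllegalArgumentException on malformed input) can be sketched as follows. This is an illustration only, not Saxon's code: the names Utf8DecodeSketch and decode are hypothetical, and which inputs the real method rejects is not stated in this page. The sketch checks the lead byte, the sequence length and the continuation bytes, but does not reject overlong encodings or encoded surrogates.

```java
final class Utf8DecodeSketch {

    /**
     * Decodes the single UTF-8 character held in in[0..used-1] and
     * returns its code point.
     */
    static int decode(byte[] in, int used) {
        if (used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("UTF-8 sequence must be 1 to 4 bytes");
        }
        int b0 = in[0] & 0xFF;
        int expected;
        int cp;
        // The lead byte determines the sequence length and supplies the high bits.
        if (b0 < 0x80) {
            expected = 1;
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            expected = 2;
            cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            expected = 3;
            cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {
            expected = 4;
            cp = b0 & 0x07;
        } else {
            throw new IllegalArgumentException("Invalid UTF-8 lead byte");
        }
        if (expected != used) {
            throw new IllegalArgumentException("Wrong number of bytes for lead byte");
        }
        // Each continuation byte has the form 10xxxxxx and adds six bits.
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                throw new IllegalArgumentException("Invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }
}
```

Returning an int rather than a char lets the result carry supplementary code points above U+FFFF, which a four-byte sequence produces.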
-