Package net.sf.saxon.regex
Class UnicodeString
java.lang.Object
net.sf.saxon.regex.UnicodeString
- All Implemented Interfaces:
CharSequence, Comparable<UnicodeString>, AtomicMatchKey
- Direct Known Subclasses:
BMPString, EmptyString, GeneralUnicodeString, LatinString
public abstract class UnicodeString
extends Object
implements CharSequence, Comparable<UnicodeString>, AtomicMatchKey
An abstract class that efficiently handles Unicode strings, including
those containing non-BMP characters. Three of its subclasses handle
strings whose maximum character code is 255, 65535, or 1114111, respectively.
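The three ranges in the description can be illustrated with a small stand-alone sketch. This is not Saxon's implementation: the class `WidthClassSketch`, the enum `Width`, and the method `widthClassFor` are hypothetical names, and the sketch only shows how the maximum codepoint of a CharSequence maps onto the limits 255, 65535, and 1114111.

```java
final class WidthClassSketch {
    // Hypothetical labels for the three ranges named in the class description.
    enum Width { UP_TO_255, UP_TO_65535, UP_TO_1114111 }

    // Classify a CharSequence by its largest Unicode codepoint.
    static Width widthClassFor(CharSequence in) {
        // codePoints() combines each valid surrogate pair into one codepoint.
        int max = in.codePoints().max().orElse(0);
        if (max <= 255) {
            return Width.UP_TO_255;
        }
        if (max <= 65535) {
            return Width.UP_TO_65535;
        }
        return Width.UP_TO_1114111;
    }

    public static void main(String[] args) {
        System.out.println(widthClassFor("abc"));
        System.out.println(widthClassFor("\u20AC"));
        System.out.println(widthClassFor("\uD83D\uDE00"));
    }
}
```

In this sketch an empty input falls into the narrowest range; the real class hierarchy lists a separate EmptyString subclass.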
-
Field Summary
Fields inherited from interface net.sf.saxon.expr.sort.AtomicMatchKey
NaN_MATCH_KEY
Constructor Summary
UnicodeString()
Method Summary
Modifier and Type, Method - Description
asAtomic() - Get an atomic value that encapsulates this match key.
int compareTo(UnicodeString other) - Compare two Unicode strings in codepoint collating sequence
static boolean containsSurrogatePairs(CharSequence value) - Test whether a CharSequence contains Unicode codepoints outside the BMP range
boolean equals(Object) - Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
int hashCode() - Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
abstract boolean isEnd(int pos) - Ask whether a given position is at (or beyond) the end of the string
static UnicodeString makeUnicodeString(int[] in) - Make a UnicodeString for a given array of codepoints
static UnicodeString makeUnicodeString(CharSequence in) - Make a UnicodeString for a given CharSequence
abstract int uCharAt(int pos) - Get the character at a specified position
abstract int uIndexOf(int search, int start) - Get the first match for a given character
abstract int uLength() - Get the length of the string, in Unicode codepoints
abstract UnicodeString uSubstring(int beginIndex, int endIndex) - Get a substring of this string
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
charAt, chars, codePoints, isEmpty, length, subSequence, toString
-
Constructor Details
-
UnicodeString
public UnicodeString()
-
-
Method Details
-
makeUnicodeString
Make a UnicodeString for a given CharSequence
- Parameters:
in - the input CharSequence
- Returns:
a UnicodeString using an appropriate implementation class
-
makeUnicodeString
Make a UnicodeString for a given array of codepoints
- Parameters:
in - the input array of codepoints
- Returns:
a UnicodeString using an appropriate implementation class
-
containsSurrogatePairs
Test whether a CharSequence contains Unicode codepoints outside the BMP range
- Parameters:
value - the string to be tested
- Returns:
true if the string contains non-BMP codepoints
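A test of this kind can be sketched with the standard library alone. The following is a minimal illustration, not Saxon's code; `SurrogateCheckSketch` and `hasSurrogates` are hypothetical names. It assumes that detecting any UTF-16 surrogate code unit is an acceptable proxy for "contains non-BMP codepoints", which also means an unpaired surrogate is reported as true.

```java
final class SurrogateCheckSketch {
    // Return true if any 16-bit char in the sequence is a surrogate code unit.
    static boolean hasSurrogates(CharSequence value) {
        for (int i = 0; i < value.length(); i++) {
            if (Character.isSurrogate(value.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSurrogates("plain text"));
        System.out.println(hasSurrogates("smile \uD83D\uDE00"));
    }
}
```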
-
uSubstring
Get a substring of this string
- Parameters:
beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
endIndex - the index of the first character to be NOT included (counting codepoints, not 16-bit characters)
- Returns:
a substring
- Throws:
IndexOutOfBoundsException- if the selection goes off the start or end of the string (this function follows the semantics of String.substring(), not the XPath semantics)
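The difference between codepoint indexes and 16-bit character indexes can be shown on a plain java.lang.String. This is a hedged sketch rather than the Saxon implementation: `CodepointSubstringSketch` and `substringByCodepoints` are hypothetical names, and the bounds check simply mirrors the String.substring() style behavior described above.

```java
final class CodepointSubstringSketch {
    // Substring where both indexes count codepoints, not 16-bit chars.
    static String substringByCodepoints(String s, int beginIndex, int endIndex) {
        int count = s.codePointCount(0, s.length());
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(
                    "begin " + beginIndex + ", end " + endIndex + ", length " + count);
        }
        // Translate codepoint positions into char offsets.
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // three codepoints, four chars
        System.out.println(substringByCodepoints(s, 2, 3));
    }
}
```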
-
uIndexOf
public abstract int uIndexOf(int search, int start)
Get the first match for a given character
- Parameters:
search - the character to look for
start - the first position to search
- Returns:
the position of the first occurrence of the sought character, or -1 if not found
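A search of this shape can be sketched over a plain array of codepoints. This is an illustration only, not the Saxon implementation; `CodepointIndexOfSketch` and `indexOfCodepoint` are hypothetical names, and treating a negative start as 0 is an assumption of the sketch. Positions count codepoints, so the result can differ from String.indexOf when the string contains non-BMP characters.

```java
final class CodepointIndexOfSketch {
    // Find the first position >= start holding the codepoint 'search', or -1.
    static int indexOfCodepoint(int[] codepoints, int search, int start) {
        // Assumption: a negative start is treated as 0.
        for (int i = Math.max(start, 0); i < codepoints.length; i++) {
            if (codepoints[i] == search) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00ba";
        int[] cps = s.codePoints().toArray();
        // Codepoint position versus 16-bit char position of 'b'.
        System.out.println(indexOfCodepoint(cps, 'b', 0));
        System.out.println(s.indexOf('b'));
    }
}
```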
-
uCharAt
public abstract int uCharAt(int pos)
Get the character at a specified position
- Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
- Returns:
a character (Unicode codepoint) at the specified position.
-
uLength
public abstract int uLength()
Get the length of the string, in Unicode codepoints
- Returns:
the number of codepoints in the string
-
isEnd
public abstract boolean isEnd(int pos)
Ask whether a given position is at (or beyond) the end of the string
- Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
- Returns:
true if the specified index is at or beyond the end of the string
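How uLength, uCharAt, and isEnd fit together can be sketched with a toy class backed by an int array. This is not a Saxon class: `CodepointArraySketch` and `countOccurrences` are hypothetical, and the three method names are reused only to mirror the signatures documented above.

```java
final class CodepointArraySketch {
    private final int[] codepoints;

    CodepointArraySketch(CharSequence in) {
        // One array slot per codepoint; surrogate pairs are combined.
        this.codepoints = in.codePoints().toArray();
    }

    int uLength() {
        return codepoints.length;
    }

    int uCharAt(int pos) {
        return codepoints[pos];
    }

    boolean isEnd(int pos) {
        return pos >= codepoints.length;
    }

    // Typical scanning loop: advance until isEnd reports the end.
    static int countOccurrences(CodepointArraySketch s, int codepoint) {
        int count = 0;
        for (int pos = 0; !s.isEnd(pos); pos++) {
            if (s.uCharAt(pos) == codepoint) {
                count++;
            }
        }
        return count;
    }
}
```

A scanning loop written against isEnd(pos) does not need to call uLength() at all, which may be why the method is offered separately.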
-
hashCode
public int hashCode()
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
-
equals
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
-
compareTo
Compare two Unicode strings in codepoint collating sequence
- Specified by:
compareTo in interface Comparable<UnicodeString>
- Parameters:
other - the object to be compared
- Returns:
less than 0, 0, or greater than 0 depending on the ordering of the two strings
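Codepoint ordering differs from the 16-bit ordering used by String.compareTo when one string holds a non-BMP character and the other holds a BMP character above the surrogate range. The sketch below shows the idea on plain strings; it is not Saxon's implementation, and `CodepointCompareSketch` and `compareByCodepoint` are hypothetical names.

```java
final class CodepointCompareSketch {
    // Compare two strings codepoint by codepoint.
    static int compareByCodepoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            // Step over one or two chars depending on the codepoint.
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are equal).
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";          // U+FFFD
        String nonBmp = "\uD83D\uDE00"; // U+1F600
        // The two orderings disagree in sign for this pair.
        System.out.println(compareByCodepoint(bmp, nonBmp) < 0);
        System.out.println(bmp.compareTo(nonBmp) < 0);
    }
}
```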
-
asAtomic
Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
- Specified by:
asAtomic in interface AtomicMatchKey
- Returns:
an atomic value that encapsulates this match key
-