Package net.sf.saxon.regex
Class UnicodeString
java.lang.Object
net.sf.saxon.regex.UnicodeString
- All Implemented Interfaces:
CharSequence, Comparable<UnicodeString>, AtomicMatchKey
- Direct Known Subclasses:
BMPString, EmptyString, GeneralUnicodeString, LatinString
public abstract class UnicodeString
extends Object
implements CharSequence, Comparable<UnicodeString>, AtomicMatchKey
An abstract class that efficiently handles Unicode strings, including
non-BMP characters. Three of its subclasses handle strings whose maximum
character code is 255, 65535, or 1114111, respectively.
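The three-way split described above can be illustrated with a minimal sketch that uses only the Java standard library. The class `WidthChooser` and its method are hypothetical names invented for this example, and the mapping of each range to a storage width in bytes is an assumption, not a statement about how the Saxon subclasses store their data.

```java
// Hypothetical helper, not part of Saxon: classifies a string by its largest codepoint.
final class WidthChooser {
    /** Returns 1, 2, or 4: an assumed number of bytes per character for the string. */
    static int bytesPerChar(CharSequence s) {
        // codePoints() combines surrogate pairs, so non-BMP characters count once.
        int max = s.codePoints().max().orElse(0);
        if (max <= 255) {
            return 1;   // maximum character code is at most 255
        }
        if (max <= 65535) {
            return 2;   // maximum character code is at most 65535 (the BMP)
        }
        return 4;       // codepoints up to 1114111
    }

    public static void main(String[] args) {
        System.out.println(bytesPerChar("caf\u00e9"));      // prints 1
        System.out.println(bytesPerChar("\u4e2d\u6587"));   // prints 2
        System.out.println(bytesPerChar("\uD83D\uDE00"));   // prints 4
    }
}
```

Scanning for the maximum codepoint once, at construction time, is what would let a factory pick the narrowest representation that still gives constant-time access by codepoint index.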
-
Field Summary
Fields inherited from interface net.sf.saxon.expr.sort.AtomicMatchKey
NaN_MATCH_KEY
-
Constructor Summary
Constructors
UnicodeString()
-
Method Summary
Modifier and Type, Method, Description
asAtomic()
Get an atomic value that encapsulates this match key.
int compareTo(UnicodeString other)
Compare two Unicode strings in codepoint collating sequence.
static boolean containsSurrogatePairs(CharSequence value)
Test whether a CharSequence contains Unicode codepoints outside the BMP range.
boolean equals(Object obj)
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
int hashCode()
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
abstract boolean isEnd(int pos)
Ask whether a given position is at (or beyond) the end of the string.
static UnicodeString makeUnicodeString(int[] in)
Make a UnicodeString for a given array of codepoints.
static UnicodeString makeUnicodeString(CharSequence in)
Make a UnicodeString for a given CharSequence.
abstract int uCharAt(int pos)
Get the character at a specified position.
abstract int uIndexOf(int search, int start)
Get the first match for a given character.
abstract int uLength()
Get the length of the string, in Unicode codepoints.
abstract UnicodeString uSubstring(int beginIndex, int endIndex)
Get a substring of this string.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
charAt, chars, codePoints, isEmpty, length, subSequence, toString
-
Constructor Details
-
UnicodeString
public UnicodeString()
-
-
Method Details
-
makeUnicodeString
public static UnicodeString makeUnicodeString(CharSequence in)
Make a UnicodeString for a given CharSequence.
- Parameters:
in - the input CharSequence
- Returns:
a UnicodeString using an appropriate implementation class
-
makeUnicodeString
public static UnicodeString makeUnicodeString(int[] in)
Make a UnicodeString for a given array of codepoints.
- Parameters:
in - the input array of codepoints
- Returns:
a UnicodeString using an appropriate implementation class
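The two factory methods accept either UTF-16 text or an array of codepoints. As a rough illustration of how those two forms relate, the sketch below converts between them with standard Java calls only. `CodepointArrays` is a hypothetical name, and nothing here reflects how Saxon performs the conversion internally.

```java
// Hypothetical helper, not part of Saxon: converts between UTF-16 text and codepoint arrays.
final class CodepointArrays {
    /** Expands a CharSequence into one int per Unicode codepoint. */
    static int[] toCodepoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    /** Rebuilds a String from an array of codepoints. */
    static String fromCodepoints(int[] codepoints) {
        return new String(codepoints, 0, codepoints.length);
    }

    public static void main(String[] args) {
        int[] cps = toCodepoints("a\uD83D\uDE00");   // 'a' followed by U+1F600
        System.out.println(cps.length);              // prints 2
        System.out.println(fromCodepoints(cps).length()); // prints 3 (UTF-16 code units)
    }
}
```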
-
containsSurrogatePairs
public static boolean containsSurrogatePairs(CharSequence value)
Test whether a CharSequence contains Unicode codepoints outside the BMP range.
- Parameters:
value - the string to be tested
- Returns:
true if the string contains non-BMP codepoints
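A test of this kind can be sketched by scanning for surrogate code units, since in UTF-16 every non-BMP codepoint is encoded as a surrogate pair. The class `SurrogateCheck` is a hypothetical stand-in written for this example. It is an assumption that treating any surrogate (even an unpaired one) as a hit is acceptable, and the actual Saxon method may behave differently in that edge case.

```java
// Hypothetical helper, not part of Saxon: detects characters outside the BMP.
final class SurrogateCheck {
    /** Returns true if any UTF-16 code unit in the value is a surrogate. */
    static boolean hasNonBmp(CharSequence value) {
        for (int i = 0; i < value.length(); i++) {
            if (Character.isSurrogate(value.charAt(i))) {
                return true;   // a surrogate signals a non-BMP codepoint (or malformed input)
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNonBmp("plain text"));        // prints false
        System.out.println(hasNonBmp("smile \uD83D\uDE00")); // prints true
    }
}
```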
-
uSubstring
public abstract UnicodeString uSubstring(int beginIndex, int endIndex)
Get a substring of this string.
- Parameters:
beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
endIndex - the index of the first character to be NOT included (counting codepoints, not 16-bit characters)
- Returns:
a substring
- Throws:
IndexOutOfBoundsException - if the selection goes off the start or end of the string (this function follows the semantics of String.substring(), not the XPath semantics)
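To make the difference between codepoint indexes and 16-bit character indexes concrete, here is a minimal sketch of a codepoint-indexed substring over a plain `java.lang.String`. `CodepointSubstring` is a hypothetical name, and the bounds checks are an assumption modelled on `String.substring()`, as the description above suggests.

```java
// Hypothetical helper, not part of Saxon: substring addressed by codepoint positions.
final class CodepointSubstring {
    /** Returns codepoints [beginIndex, endIndex) of s, counting codepoints, not chars. */
    static String substring(String s, int beginIndex, int endIndex) {
        int count = s.codePointCount(0, s.length());
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(
                    "range " + beginIndex + ".." + endIndex + " outside 0.." + count);
        }
        // Translate codepoint positions into UTF-16 offsets before slicing.
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        // Codepoints: 'a', U+1F600, 'b', 'c'. Positions 1..3 select U+1F600 and 'b'.
        String result = substring("a\uD83D\uDE00bc", 1, 3);
        System.out.println(result.equals("\uD83D\uDE00b")); // prints true
    }
}
```

With `String.substring(1, 3)` on the same input, the result would be the two halves of the surrogate pair without the `b`, which is the mismatch that codepoint-based indexing avoids.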
-
uIndexOf
public abstract int uIndexOf(int search, int start)
Get the first match for a given character.
- Parameters:
search - the character to look for
start - the first position to look at
- Returns:
the position of the first occurrence of the sought character, or -1 if not found
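The search described above can be sketched as a linear scan over an array of codepoints. `CodepointSearch` is a hypothetical class for illustration only, and clamping a negative `start` to zero is an assumption that the documentation does not confirm.

```java
// Hypothetical helper, not part of Saxon: finds a codepoint by codepoint position.
final class CodepointSearch {
    /** Returns the codepoint index of the first match at or after start, or -1. */
    static int indexOf(int[] codepoints, int search, int start) {
        // Assumption: a negative start is treated as 0.
        for (int i = Math.max(start, 0); i < codepoints.length; i++) {
            if (codepoints[i] == search) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00ab";                 // U+1F600, 'a', 'b'
        int[] cps = s.codePoints().toArray();
        System.out.println(indexOf(cps, 'b', 0));    // prints 2 (codepoint position)
        System.out.println(s.indexOf('b'));          // prints 3 (UTF-16 position)
    }
}
```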
-
uCharAt
public abstract int uCharAt(int pos)
Get the character at a specified position.
- Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
- Returns:
a character (Unicode codepoint) at the specified position
-
uLength
public abstract int uLength()
Get the length of the string, in Unicode codepoints.
- Returns:
the number of codepoints in the string
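The distinction between uCharAt/uLength and the inherited charAt/length is easiest to see on a string that contains a non-BMP character. The sketch below reproduces the codepoint-based view with standard `String` methods. `CodepointView` and its method names are hypothetical and only approximate the behaviour described here.

```java
// Hypothetical helper, not part of Saxon: codepoint-based length and access on a String.
final class CodepointView {
    /** Number of Unicode codepoints in s (a surrogate pair counts once). */
    static int codepointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Codepoint at the given codepoint position (not the UTF-16 position). */
    static int codepointAt(String s, int pos) {
        return s.codePointAt(s.offsetByCodePoints(0, pos));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";                  // 'a', U+1F600, 'b'
        System.out.println(s.length());               // prints 4
        System.out.println(codepointLength(s));       // prints 3
        System.out.println(Integer.toHexString(codepointAt(s, 1))); // prints 1f600
    }
}
```

Note that `offsetByCodePoints` walks the string from the start, so this sketch is linear in `pos`. A representation that stores one array element per codepoint avoids that cost.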
-
isEnd
public abstract boolean isEnd(int pos)
Ask whether a given position is at (or beyond) the end of the string.
- Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
- Returns:
true if and only if the specified index is at or beyond the end of the string
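A typical use of an end-of-string test is as the loop condition when scanning codepoint by codepoint. The sketch below shows that pattern on a small stand-in class. `Codepoints` is hypothetical and merely mimics the shape of the uCharAt and isEnd methods documented here; it is not the Saxon class.

```java
// Hypothetical demo, not part of Saxon: scanning with an isEnd-style loop condition.
final class CodepointCursorDemo {
    /** Minimal stand-in that stores one int per codepoint. */
    static final class Codepoints {
        private final int[] cps;

        Codepoints(CharSequence s) {
            this.cps = s.codePoints().toArray();
        }

        int uCharAt(int pos) {
            return cps[pos];
        }

        boolean isEnd(int pos) {
            return pos >= cps.length;   // true at or beyond the end
        }
    }

    /** Counts the codepoints that Character.isDigit accepts. */
    static int countDigits(CharSequence s) {
        Codepoints u = new Codepoints(s);
        int digits = 0;
        for (int pos = 0; !u.isEnd(pos); pos++) {
            if (Character.isDigit(u.uCharAt(pos))) {
                digits++;
            }
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22"));   // prints 3
    }
}
```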
-
hashCode
public int hashCode()
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
-
equals
public boolean equals(Object obj)
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
-
compareTo
public int compareTo(UnicodeString other)
Compare two Unicode strings in codepoint collating sequence.
- Specified by:
compareTo in interface Comparable<UnicodeString>
- Parameters:
other - the object to be compared
- Returns:
less than 0, 0, or greater than 0 depending on the ordering of the two strings
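Codepoint collating sequence is not the same as the UTF-16 code unit order used by `String.compareTo`, because surrogates (0xD800 to 0xDFFF) sort below BMP characters in the range U+E000 to U+FFFF. The sketch below illustrates the difference with a hypothetical `CodepointOrder` class. It is one straightforward way to compare by codepoint, not necessarily the approach Saxon takes.

```java
// Hypothetical helper, not part of Saxon: compares two strings codepoint by codepoint.
final class CodepointOrder {
    /** Returns a negative number, 0, or a positive number in codepoint order. */
    static int compare(CharSequence a, CharSequence b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // All shared positions are equal, so the shorter string sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";            // U+FF5E, inside the BMP
        String astral = "\uD83D\uDE00";   // U+1F600, outside the BMP
        System.out.println(compare(bmp, astral) < 0);   // prints true: 0xFF5E < 0x1F600
        System.out.println(bmp.compareTo(astral) > 0);  // prints true: 0xFF5E > 0xD83D
    }
}
```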
-
asAtomic
Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
- Specified by:
asAtomic in interface AtomicMatchKey
- Returns:
an atomic value that encapsulates this match key
-