Class UnicodeString

java.lang.Object
net.sf.saxon.regex.UnicodeString
All Implemented Interfaces:
CharSequence, Comparable<UnicodeString>, AtomicMatchKey
Direct Known Subclasses:
BMPString, EmptyString, GeneralUnicodeString, LatinString

public abstract class UnicodeString extends Object implements CharSequence, Comparable<UnicodeString>, AtomicMatchKey
An abstract class that efficiently handles Unicode strings, including those containing non-BMP characters. Three of its subclasses handle strings whose maximum character code is 255, 65535, or 1114111 respectively.
  • Constructor Details

    • UnicodeString

      public UnicodeString()
  • Method Details

    • makeUnicodeString

      public static UnicodeString makeUnicodeString(CharSequence in)
      Make a UnicodeString for a given CharSequence
      Parameters:
      in - the input CharSequence
      Returns:
      a UnicodeString using an appropriate implementation class
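
The idea of picking "an appropriate implementation class" can be illustrated with JDK-only code. This is a minimal sketch, not Saxon's implementation: the class name RepresentationSketch, the Width enum, and the classify method are hypothetical, and the thresholds are simply the three maximum character codes quoted in the class description.

```java
final class RepresentationSketch {
    /** Hypothetical labels for illustration; these are not Saxon class names. */
    enum Width { EMPTY, LATIN, BMP, GENERAL }

    /** Classify a CharSequence by the largest codepoint it contains. */
    static Width classify(CharSequence in) {
        if (in.length() == 0) {
            return Width.EMPTY;
        }
        // codePoints() combines well-formed surrogate pairs into one codepoint.
        int max = in.codePoints().max().orElse(0);
        if (max <= 255) {
            return Width.LATIN;    // every character fits in 8 bits
        }
        if (max <= 65535) {
            return Width.BMP;      // every character fits in 16 bits
        }
        return Width.GENERAL;      // at least one non-BMP codepoint
    }

    public static void main(String[] args) {
        System.out.println(classify("abc"));
        System.out.println(classify("\u0416"));
        System.out.println(classify("\uD83D\uDE00"));
    }
}
```

Choosing the narrowest width that fits lets codepoint positions map directly to array indexes without the cost of storing every character in 32 bits.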
    • makeUnicodeString

      public static UnicodeString makeUnicodeString(int[] in)
      Make a UnicodeString for a given array of codepoints
      Parameters:
      in - the input array of codepoints
      Returns:
      a UnicodeString using an appropriate implementation class
    • containsSurrogatePairs

      public static boolean containsSurrogatePairs(CharSequence value)
      Test whether a CharSequence contains Unicode codepoints outside the BMP range
      Parameters:
      value - the string to be tested
      Returns:
      true if the string contains non-BMP codepoints
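
A test of this kind can be approximated with the JDK alone. The sketch below is an assumption about how such a check might look, not the Saxon source: SurrogateCheck and hasSurrogates are hypothetical names, and the sketch reports any surrogate code unit, paired or not.

```java
final class SurrogateCheck {
    /**
     * Returns true if any 16-bit code unit in the value is a surrogate.
     * In well-formed UTF-16 this means a non-BMP codepoint is present.
     */
    static boolean hasSurrogates(CharSequence value) {
        for (int i = 0; i < value.length(); i++) {
            if (Character.isSurrogate(value.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSurrogates("plain text"));
        System.out.println(hasSurrogates("smile \uD83D\uDE00"));
    }
}
```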
    • uSubstring

      public abstract UnicodeString uSubstring(int beginIndex, int endIndex)
      Get a substring of this string
      Parameters:
      beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
      endIndex - the index of the first character to be NOT included (counting codepoints, not 16-bit characters)
      Returns:
      a substring
      Throws:
      IndexOutOfBoundsException - if the selection goes off the start or end of the string (this function follows the semantics of String.substring(), not the XPath semantics)
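
The difference between codepoint indexes and 16-bit character indexes can be shown on a plain java.lang.String. This is a sketch under stated assumptions: CodepointSubstring and its substring method are hypothetical helpers built on String.offsetByCodePoints, and they mirror the documented bounds behaviour rather than reproduce Saxon's code.

```java
final class CodepointSubstring {
    /** Substring whose begin and end indexes count codepoints, not chars. */
    static String substring(String s, int beginIndex, int endIndex) {
        int count = s.codePointCount(0, s.length());
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(
                "range " + beginIndex + ".." + endIndex + " of " + count);
        }
        // Translate codepoint positions into UTF-16 offsets.
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";   // three codepoints, four chars
        System.out.println(substring(s, 1, 2).length());
    }
}
```

With the input above, codepoint range 1..2 selects the whole supplementary character, whereas String.substring(1, 2) would return only its high surrogate.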
    • uIndexOf

      public abstract int uIndexOf(int search, int start)
      Get the first match for a given character
      Parameters:
      search - the character to look for
      start - the position at which the search starts
      Returns:
      the position of the first occurrence of the sought character, or -1 if not found
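
A codepoint-based search returns different positions from String.indexOf whenever non-BMP characters precede the match. The following is a minimal sketch, assuming positions count codepoints; CodepointSearch and its indexOf method are hypothetical and are not part of Saxon.

```java
final class CodepointSearch {
    /** Position (in codepoints) of the first match at or after start, or -1. */
    static int indexOf(CharSequence s, int search, int start) {
        int[] cps = s.codePoints().toArray();
        for (int i = Math.max(start, 0); i < cps.length; i++) {
            if (cps[i] == search) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00a\uD83D\uDE00a";
        // Codepoint position versus UTF-16 position of the first 'a'.
        System.out.println(indexOf(s, 'a', 0) + " vs " + s.indexOf('a'));
    }
}
```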
    • uCharAt

      public abstract int uCharAt(int pos)
      Get the character at a specified position
      Parameters:
      pos - the index of the required character (counting codepoints, not 16-bit characters)
      Returns:
      the character (Unicode codepoint) at the specified position
    • uLength

      public abstract int uLength()
      Get the length of the string, in Unicode codepoints
      Returns:
      the number of codepoints in the string
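
To see why a codepoint count differs from CharSequence.length(), the string can be expanded into an array of codepoints. This sketch is illustrative only: CodepointView is a hypothetical class, and expanding to int[] is one possible representation, not necessarily the one any Saxon subclass uses.

```java
final class CodepointView {
    private final int[] codepoints;

    CodepointView(CharSequence in) {
        // One array slot per codepoint, so positions need no surrogate arithmetic.
        this.codepoints = in.codePoints().toArray();
    }

    /** Length in codepoints. */
    int length() {
        return codepoints.length;
    }

    /** Codepoint at a position counted in codepoints. */
    int charAt(int pos) {
        return codepoints[pos];
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        CodepointView v = new CodepointView(s);
        System.out.println(s.length() + " chars, " + v.length() + " codepoints");
        System.out.println(Integer.toHexString(v.charAt(1)));
    }
}
```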
    • isEnd

      public abstract boolean isEnd(int pos)
      Ask whether a given position is at (or beyond) the end of the string
      Parameters:
      pos - the position to be tested (counting codepoints, not 16-bit characters)
      Returns:
      true if and only if the specified position is at or beyond the end of the string
    • hashCode

      public int hashCode()
      Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
      Overrides:
      hashCode in class Object
      Returns:
      a hashCode that distinguishes this UnicodeString from others
    • equals

      public boolean equals(Object obj)
      Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
      Overrides:
      equals in class Object
      Parameters:
      obj - the object to be compared
      Returns:
      true if obj is a UnicodeString containing the same codepoints
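
The contract described for equals and hashCode can be illustrated with a small stand-alone class. This is a sketch of the general pattern, not Saxon's code: CodepointString is a hypothetical type that compares equal only to other instances of itself, so an ordinary String with the same characters does not match.

```java
import java.util.Arrays;

final class CodepointString {
    private final int[] codepoints;

    CodepointString(CharSequence in) {
        this.codepoints = in.codePoints().toArray();
    }

    @Override
    public boolean equals(Object obj) {
        // Only another CodepointString with the same codepoints is equal.
        return obj instanceof CodepointString
            && Arrays.equals(codepoints, ((CodepointString) obj).codepoints);
    }

    @Override
    public int hashCode() {
        // Derived from the same data as equals, so equal objects hash alike.
        return Arrays.hashCode(codepoints);
    }

    public static void main(String[] args) {
        CodepointString a = new CodepointString("abc");
        System.out.println(a.equals(new CodepointString("abc")));
        System.out.println(a.equals("abc"));
    }
}
```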
    • compareTo

      public int compareTo(UnicodeString other)
      Compare two Unicode strings in codepoint collating sequence
      Specified by:
      compareTo in interface Comparable<UnicodeString>
      Parameters:
      other - the object to be compared
      Returns:
      less than 0, 0, or greater than 0 depending on the ordering of the two strings
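
Codepoint order and UTF-16 code unit order disagree when a non-BMP character is compared with a BMP character above the surrogate range. The sketch below shows one way to compare in codepoint order using only the JDK; CodepointOrder and its compare method are hypothetical names, not Saxon API.

```java
import java.util.PrimitiveIterator;

final class CodepointOrder {
    /** Compare two sequences codepoint by codepoint; a prefix sorts first. */
    static int compare(CharSequence a, CharSequence b) {
        PrimitiveIterator.OfInt ia = a.codePoints().iterator();
        PrimitiveIterator.OfInt ib = b.codePoints().iterator();
        while (ia.hasNext() && ib.hasNext()) {
            int c = Integer.compare(ia.nextInt(), ib.nextInt());
            if (c != 0) {
                return c;
            }
        }
        if (ia.hasNext()) {
            return 1;
        }
        return ib.hasNext() ? -1 : 0;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";            // U+FF5E, inside the BMP
        String nonBmp = "\uD83D\uDE00";   // U+1F600, outside the BMP
        System.out.println(compare(bmp, nonBmp) < 0);
        System.out.println(bmp.compareTo(nonBmp) < 0);
    }
}
```

String.compareTo works on 16-bit code units, so the high surrogate 0xD83D sorts before 0xFF5E even though the codepoint U+1F600 is larger than U+FF5E.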
    • asAtomic

      public AtomicValue asAtomic()
      Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
      Specified by:
      asAtomic in interface AtomicMatchKey
      Returns:
      an atomic value that encapsulates this match key