Class AhoCorasickDoubleArrayTrie<V>

java.lang.Object
com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
All Implemented Interfaces:
Serializable

public class AhoCorasickDoubleArrayTrie<V> extends Object implements Serializable
An implementation of Aho Corasick algorithm based on Double Array Trie
See Also:
  • Field Details

    • check

      protected int[] check
      check array of the Double Array Trie structure
    • base

      protected int[] base
      base array of the Double Array Trie structure
    • fail

      protected int[] fail
      fail table of the Aho Corasick automata
    • output

      protected int[][] output
      output table of the Aho Corasick automata
    • v

      protected V[] v
      outer value array
    • l

      protected int[] l
      the length of every key
    • size

      protected int size
      the size of base and check array
  • Constructor Details

    • AhoCorasickDoubleArrayTrie

      public AhoCorasickDoubleArrayTrie()
  • Method Details

    • parseText

      Parse text
      Parameters:
      text - The text
      Returns:
      a list of outputs
    • parseText

      public void parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
      Parse text
      Parameters:
      text - The text
      processor - A processor which handles the output
    • parseText

      public void parseText(CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)
      Parse text
      Parameters:
      text - The text
      processor - A processor which handles the output
    • parseText

      public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
      Parse text
      Parameters:
      text - The text
      processor - A processor which handles the output
    • parseText

      public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
      Parse text
      Parameters:
      text - The text
      processor - A processor which handles the output
    • matches

      public boolean matches(String text)
      Checks that string contains at least one substring
      Parameters:
      text - source text to check
      Returns:
      true if string contains at least one substring
    • findFirst

      public AhoCorasickDoubleArrayTrie.Hit<V> findFirst(String text)
      Search first match in string
      Parameters:
      text - source text to check
      Returns:
      first match or null if there are no matches
    • save

      public void save(ObjectOutputStream out) throws IOException
      Save
      Parameters:
      out - An ObjectOutputStream object
      Throws:
      IOException - Some IOException
    • load

      public void load(ObjectInputStream in) throws IOException, ClassNotFoundException
      Load
      Parameters:
      in - An ObjectInputStream object
      Throws:
      IOException
      ClassNotFoundException
    • get

      public V get(String key)
      Get value by a String key, just like a map.get() method
      Parameters:
      key - The key
      Returns:
    • set

      public boolean set(String key, V value)
      Update a value corresponding to a key
      Parameters:
      key - the key
      value - the value
      Returns:
      successful or not(failure if there is no key)
    • get

      public V get(int index)
      Pick the value by index in value array
      Notice that to be more efficiently, this method DO NOT check the parameter
      Parameters:
      index - The index
      Returns:
      The value
    • getState

      private int getState(int currentState, char character)
      transmit state, supports failure function
      Parameters:
      currentState -
      character -
      Returns:
    • storeEmits

      private void storeEmits(int position, int currentState, List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)
      store output
      Parameters:
      position -
      currentState -
      collectedEmits -
    • transition

      protected int transition(int current, char c)
      transition of a state
      Parameters:
      current -
      c -
      Returns:
    • transitionWithRoot

      protected int transitionWithRoot(int nodePos, char c)
      transition of a state, if the state is root and it failed, then returns the root
      Parameters:
      nodePos -
      c -
      Returns:
    • build

      public void build(Map<String,V> map)
      Build a AhoCorasickDoubleArrayTrie from a map
      Parameters:
      map - a map containing key-value pairs
    • exactMatchSearch

      public int exactMatchSearch(String key)
      match exactly by a key
      Parameters:
      key - the key
      Returns:
      the index of the key, you can use it as a perfect hash function
    • exactMatchSearch

      private int exactMatchSearch(String key, int pos, int len, int nodePos)
      match exactly by a key
      Parameters:
      key -
      pos -
      len -
      nodePos -
      Returns:
    • exactMatchSearch

      private int exactMatchSearch(char[] keyChars, int pos, int len, int nodePos)
      match exactly by a key
      Parameters:
      keyChars - the char array of the key
      pos - the begin index of char array
      len - the length of the key
      nodePos - the starting position of the node for searching
      Returns:
      the value index of the key, minus indicates null
    • size

      public int size()
      Get the size of the keywords
      Returns: