Class SpellingCheckRule

java.lang.Object
org.languagetool.rules.Rule
org.languagetool.rules.spelling.SpellingCheckRule
Direct Known Subclasses:
HunspellRule, MorfologikSpellerRule, SymSpellRule

public abstract class SpellingCheckRule extends Rule
An abstract base class for spell-checking rules.
  • Field Details

    • LANGUAGETOOL

      public static final String LANGUAGETOOL
      The string LanguageTool.
      Since:
      2.3
    • LANGUAGETOOLER

      public static final String LANGUAGETOOLER
      The string LanguageTooler.
      Since:
      4.4
    • language

      protected final Language language
    • languageModel

      @Nullable @Experimental protected LanguageModel languageModel
      Optional language model for rules from Language.getRelevantLanguageModelCapableRules; when set, it allows e.g. better suggestions.
      Since:
      4.5
    • wordListLoader

      protected final CachingWordListLoader wordListLoader
    • SPELLING_IGNORE_FILE

      private static final String SPELLING_IGNORE_FILE
    • SPELLING_FILE

      private static final String SPELLING_FILE
    • CUSTOM_SPELLING_FILE

      private static final String CUSTOM_SPELLING_FILE
    • GLOBAL_SPELLING_FILE

      private static final String GLOBAL_SPELLING_FILE
    • SPELLING_PROHIBIT_FILE

      private static final String SPELLING_PROHIBIT_FILE
    • CUSTOM_SPELLING_PROHIBIT_FILE

      private static final String CUSTOM_SPELLING_PROHIBIT_FILE
    • SPELLING_FILE_VARIANT

      private static final String SPELLING_FILE_VARIANT
    • STRING_LENGTH_COMPARATOR

      private static final Comparator<String> STRING_LENGTH_COMPARATOR
    • userConfig

      private final UserConfig userConfig
    • wordsToBeIgnored

      private final Set<String> wordsToBeIgnored
    • wordsToBeProhibited

      private final Set<String> wordsToBeProhibited
    • altRules

      private final List<RuleWithLanguage> altRules
    • wordsToBeIgnoredDictionary

      private Map<String,Set<String>> wordsToBeIgnoredDictionary
    • wordsToBeIgnoredDictionaryIgnoreCase

      private Map<String,Set<String>> wordsToBeIgnoredDictionaryIgnoreCase
    • antiPatterns

      private List<DisambiguationPatternRule> antiPatterns
    • considerIgnoreWords

      private boolean considerIgnoreWords
    • convertsCase

      private boolean convertsCase
    • ignoreWordsWithLength

      protected int ignoreWordsWithLength
  • Constructor Details

  • Method Details

    • addSuggestionsToRuleMatch

      protected static void addSuggestionsToRuleMatch(String word, List<String> userCandidates, List<String> candidates, @Nullable SuggestionsOrderer orderer, RuleMatch match)
      Parameters:
      word - misspelled word that suggestions should be generated for
      userCandidates - candidates from personal dictionary
      candidates - candidates from default dictionary
      orderer - model to rank suggestions / extract features, or null
      match - rule match to add suggestions to
    • createWrongSplitMatch

      protected RuleMatch createWrongSplitMatch(AnalyzedSentence sentence, List<RuleMatch> ruleMatchesSoFar, int pos, String coveredWord, String suggestion1, String suggestion2, int prevPos)
    • getId

      public abstract String getId()
      Description copied from class: Rule
      A string used to identify the rule in e.g. configuration files. This string is supposed to be unique and to stay the same in all upcoming versions of LanguageTool. It's supposed to contain only the characters A-Z and the underscore.
      Specified by:
      getId in class Rule
    • getDescription

      public abstract String getDescription()
      Description copied from class: Rule
      A short description of the error this rule can detect, usually in the language of the text that is checked.
      Specified by:
      getDescription in class Rule
    • match

      public abstract RuleMatch[] match(AnalyzedSentence sentence) throws IOException
      Description copied from class: Rule
      Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule. Note that the order in which this method is called is not always guaranteed, i.e. the sentence order in the text may be different than the order in which you get the sentences (this may be the case when LanguageTool is used as a LibreOffice/OpenOffice add-on, for example).
      Specified by:
      match in class Rule
      Parameters:
      sentence - a pre-analyzed sentence
      Returns:
      an array of RuleMatch objects
      Throws:
      IOException
    • isMisspelled

      @Experimental public abstract boolean isMisspelled(String word) throws IOException
      Throws:
      IOException
      Since:
      4.8
    • isDictionaryBasedSpellingRule

      public boolean isDictionaryBasedSpellingRule()
      Description copied from class: Rule
      Whether this is a spelling rule that uses a dictionary. Rules that return true here are basically rules that work like a simple hunspell-like spellchecker: they check words without considering the words' context.
      Overrides:
      isDictionaryBasedSpellingRule in class Rule
    • addIgnoreTokens

      public void addIgnoreTokens(List<String> tokens)
      Add the given words to the list of words to be ignored during spell check. You might want to use acceptPhrases(List) instead, as only that can also deal with phrases.
    • updateIgnoredWordDictionary

      private void updateIgnoredWordDictionary()
    • setConsiderIgnoreWords

      public void setConsiderIgnoreWords(boolean considerIgnoreWords)
      Set whether the list of words to be explicitly ignored (set with addIgnoreTokens(List)) is considered at all.
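      The interplay between addIgnoreTokens(List) and setConsiderIgnoreWords(boolean) can be pictured with a small stand-alone sketch. It is not LanguageTool's implementation: the class IgnoreListSketch is hypothetical, and the initial value of the flag (true) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class IgnoreListSketch {
  private final Set<String> wordsToBeIgnored = new HashSet<>();
  // Assumption: the ignore list is considered unless it is explicitly switched off.
  private boolean considerIgnoreWords = true;

  void addIgnoreTokens(List<String> tokens) {
    wordsToBeIgnored.addAll(tokens);
  }

  void setConsiderIgnoreWords(boolean considerIgnoreWords) {
    this.considerIgnoreWords = considerIgnoreWords;
  }

  // True when the word is on the ignore list and the list is currently considered.
  boolean ignoreWord(String word) {
    return considerIgnoreWords && wordsToBeIgnored.contains(word);
  }

  public static void main(String[] args) {
    IgnoreListSketch sketch = new IgnoreListSketch();
    sketch.addIgnoreTokens(Arrays.asList("foobar"));
    System.out.println(sketch.ignoreWord("foobar"));
    sketch.setConsiderIgnoreWords(false);
    System.out.println(sketch.ignoreWord("foobar"));
  }
}
```

      In this sketch the ignored words stay stored when the flag is switched off, so switching it back on restores the previous behaviour.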
    • getAdditionalTopSuggestions

      protected List<String> getAdditionalTopSuggestions(List<String> suggestions, String word) throws IOException
      Get additional suggestions added before other suggestions (note the rule may choose to re-order the suggestions anyway). Only add suggestions here that you know are spelled correctly, as they will not be checked again before being shown to the user.
      Throws:
      IOException
    • getAdditionalSuggestions

      protected List<String> getAdditionalSuggestions(List<String> suggestions, String word)
      Get additional suggestions added after other suggestions (note the rule may choose to re-order the suggestions anyway).
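      The ordering implied by getAdditionalTopSuggestions and getAdditionalSuggestions (top suggestions first, then the regular ones, then the additional ones) might be sketched as follows. The class SuggestionMergeSketch and the merge method are hypothetical, and dropping duplicates while keeping the first occurrence is an assumption of this example, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class SuggestionMergeSketch {

  // Concatenates top, regular and additional suggestions in that order.
  // Assumption: a word that occurs more than once keeps its earliest position.
  static List<String> merge(List<String> top, List<String> regular, List<String> additional) {
    Set<String> ordered = new LinkedHashSet<>();
    ordered.addAll(top);
    ordered.addAll(regular);
    ordered.addAll(additional);
    return new ArrayList<>(ordered);
  }

  public static void main(String[] args) {
    List<String> merged = merge(
        Arrays.asList("LanguageTool"),
        Arrays.asList("Language Tool", "LanguageTool"),
        Arrays.asList("Languagetool"));
    System.out.println(merged);
  }
}
```

      As the descriptions above note, a rule may still re-order the combined list afterwards.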
    • ignoreToken

      protected boolean ignoreToken(AnalyzedTokenReadings[] tokens, int idx) throws IOException
      Returns true iff the token at the given position should be ignored by the spell checker.
      Throws:
      IOException
    • ignoreWord

      protected boolean ignoreWord(String word) throws IOException
      Returns true iff the word should be ignored by the spell checker. If possible, use ignoreToken(AnalyzedTokenReadings[], int) instead.
      Throws:
      IOException
    • isIgnoredNoCase

      private boolean isIgnoredNoCase(String word)
    • ignoreWord

      protected boolean ignoreWord(List<String> words, int idx) throws IOException
      Returns true iff the word at the given position should be ignored by the spell checker. If possible, use ignoreToken(AnalyzedTokenReadings[], int) instead.
      Throws:
      IOException
      Since:
      2.6
    • setConvertsCase

      public void setConvertsCase(boolean convertsCase)
      Used to determine whether the dictionary will use case conversions for spell checking.
      Parameters:
      convertsCase - if true, then conversions are used.
      Since:
      2.5
    • isUrl

      protected boolean isUrl(String token)
    • isEMail

      protected boolean isEMail(String token)
    • filterDupes

      protected void filterDupes(List<String> words)
    • init

      protected void init() throws IOException
      Throws:
      IOException
    • getIgnoreFileName

      protected String getIgnoreFileName()
      Get the name of the ignore file, which lists words to be accepted, even when the spell checker would not accept them. Unlike with getSpellingFileName() the words in this file will not be used for creating suggestions for misspelled words.
      Since:
      2.7
    • getSpellingFileName

      public String getSpellingFileName()
      Get the name of the spelling file, which lists words to be accepted and used for suggestions, even when the spell checker would not accept them.
      Since:
      2.9, public since 3.5
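      The difference between the ignore file and the spelling file described above (both make a word acceptable, but only spelling-file words feed suggestions) can be illustrated with a toy model. Everything here is hypothetical: the class WordListsSketch, its methods, and the deliberately naive "same first letter" suggestion source are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class WordListsSketch {
  private final Set<String> ignoreWords = new TreeSet<>();   // accepted only
  private final Set<String> spellingWords = new TreeSet<>(); // accepted and suggested

  WordListsSketch(Collection<String> ignoreWords, Collection<String> spellingWords) {
    this.ignoreWords.addAll(ignoreWords);
    this.spellingWords.addAll(spellingWords);
  }

  // Words from either list are accepted.
  boolean isAccepted(String word) {
    return ignoreWords.contains(word) || spellingWords.contains(word);
  }

  // Toy suggestion source (assumption): spelling-file words sharing the first letter.
  // Words from the ignore list are never offered.
  List<String> suggestionsFor(String misspelled) {
    List<String> result = new ArrayList<>();
    if (misspelled.isEmpty()) {
      return result;
    }
    for (String candidate : spellingWords) {
      if (!candidate.isEmpty() && candidate.charAt(0) == misspelled.charAt(0)) {
        result.add(candidate);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    WordListsSketch lists = new WordListsSketch(
        Arrays.asList("kubectl"), Arrays.asList("Kubernetes", "Kotlin"));
    System.out.println(lists.isAccepted("kubectl"));
    System.out.println(lists.suggestionsFor("Kubernets"));
  }
}
```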
    • getAdditionalSpellingFileNames

      public List<String> getAdditionalSpellingFileNames()
      Get the names of additional spelling files, which list words to be accepted and used for suggestions, even when the spell checker would not accept them.
      Since:
      4.8
    • getLanguageVariantSpellingFileName

      public String getLanguageVariantSpellingFileName()
      Get the name of the spelling file for a language variant (e.g., en-US or de-AT), which lists words to be accepted and used for suggestions, even when the spell checker would not accept them.
      Since:
      4.3
    • getProhibitFileName

      protected String getProhibitFileName()
      Get the name of the prohibit file, which lists words not to be accepted, even when the spell checker would accept them.
      Since:
      2.8
    • getAdditionalProhibitFileNames

      protected List<String> getAdditionalProhibitFileNames()
      Get the names of additional prohibit files, which list words not to be accepted, even when the spell checker would accept them.
      Since:
      2.8
    • isProhibited

      protected boolean isProhibited(String word)
      Whether the word is prohibited, i.e. whether it should be marked as a spelling error even if the spell checker would accept it. (This is useful to improve our spell checker without waiting for the upstream checker to be updated.)
      Since:
      2.8
    • filterSuggestions

      protected List<String> filterSuggestions(List<String> suggestions, AnalyzedSentence sentence, int i)
      Remove prohibited words from suggestions.
      Since:
      2.8
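      The effect of a prohibit list, as described for isProhibited and filterSuggestions, might look like this in a simplified form. The class ProhibitListSketch is hypothetical, the dictionary is a plain set, and the filterSuggestions signature is reduced to the two arguments the example needs (the real method also takes the sentence and a position).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class ProhibitListSketch {

  // A prohibited word counts as an error even if the dictionary accepts it.
  static boolean isMisspelled(String word, Set<String> dictionary, Set<String> prohibited) {
    return prohibited.contains(word) || !dictionary.contains(word);
  }

  // Returns the suggestions without prohibited words, keeping the original order.
  static List<String> filterSuggestions(List<String> suggestions, Set<String> prohibited) {
    List<String> kept = new ArrayList<>();
    for (String suggestion : suggestions) {
      if (!prohibited.contains(suggestion)) {
        kept.add(suggestion);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Set<String> dictionary = new HashSet<>(Arrays.asList("teh", "the", "ten"));
    Set<String> prohibited = new HashSet<>(Arrays.asList("teh"));
    System.out.println(isMisspelled("teh", dictionary, prohibited));
    System.out.println(filterSuggestions(Arrays.asList("teh", "the", "ten"), prohibited));
  }
}
```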
    • isProperNoun

      private boolean isProperNoun(String wordWithoutS)
    • addIgnoreWords

      protected void addIgnoreWords(String line)
      Parameters:
      line - the line as read from spelling.txt.
      Since:
      2.9, signature modified in 3.9
    • addProhibitedWords

      protected void addProhibitedWords(List<String> words)
      Parameters:
      words - list of words to be prohibited.
      Since:
      4.2
    • expandLine

      protected List<String> expandLine(String line)
      Expand suffixes in a line. By default, the line is not expanded. Implementations might e.g. turn bicycle/S into [bicycle, bicycles].
      Since:
      3.0
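      The suffix expansion mentioned for expandLine can be sketched in plain Java. This is only an illustration: the class ExpandLineSketch is hypothetical, and the rule that the flag S appends "s" is an assumption chosen to reproduce the bicycle/S example; actual implementations define their own flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class ExpandLineSketch {

  // Default behaviour as documented: the line is returned unexpanded.
  static List<String> expandLineDefault(String line) {
    List<String> result = new ArrayList<>();
    result.add(line);
    return result;
  }

  // Assumed flag semantics, for illustration only: "/S" adds a form ending in "s".
  static List<String> expandLineWithPluralFlag(String line) {
    List<String> result = new ArrayList<>();
    int slash = line.lastIndexOf('/');
    if (slash <= 0) {
      // No flag part (or nothing before the slash): keep the line as it is.
      result.add(line);
      return result;
    }
    String stem = line.substring(0, slash);
    String flags = line.substring(slash + 1);
    result.add(stem);
    if (flags.contains("S")) {
      result.add(stem + "s");
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(expandLineWithPluralFlag("bicycle/S")); // prints [bicycle, bicycles]
  }
}
```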
    • getAlternativeLangSpellingRules

      protected List<RuleWithLanguage> getAlternativeLangSpellingRules(List<Language> alternativeLanguages)
    • acceptedInAlternativeLanguage

      protected Language acceptedInAlternativeLanguage(String word) throws IOException
      Throws:
      IOException
    • acceptPhrases

      public void acceptPhrases(List<String> phrases)
      Accept (case-sensitively, unless at the start of a sentence) the given phrases even though they are not in the built-in dictionary. Use this to avoid false alarms on e.g. names and technical terms. Unlike addIgnoreTokens(List), this can deal with phrases. It can be called like this: rule.acceptPhrases(Arrays.asList("duodenal atresia")). This way, checking would not create an error for "duodenal atresia", but it would still create an error for "duodenal" or "atresia" if they appear on their own.
      Since:
      3.3
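      The phrase behaviour described for acceptPhrases (the whole phrase is accepted, its parts on their own are not) can be modelled with a toy checker. This is a sketch under stated assumptions, not how LanguageTool implements it: the class PhraseAcceptSketch is hypothetical, the dictionary is a plain set, and tokens are split on whitespace with no punctuation handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class PhraseAcceptSketch {

  // Returns the tokens that would still be reported as unknown.
  static List<String> misspelled(String sentence, Set<String> dictionary, List<String> phrases) {
    String[] tokens = sentence.trim().split("\\s+");
    boolean[] immune = new boolean[tokens.length];
    for (String phrase : phrases) {
      String[] parts = phrase.trim().split("\\s+");
      for (int start = 0; start + parts.length <= tokens.length; start++) {
        if (matchesAt(tokens, start, parts)) {
          // Every token covered by an accepted phrase is exempt from checking.
          Arrays.fill(immune, start, start + parts.length, true);
        }
      }
    }
    List<String> result = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      if (!immune[i] && !dictionary.contains(tokens[i])) {
        result.add(tokens[i]);
      }
    }
    return result;
  }

  // Case-sensitive comparison, except that the very first token of the
  // sentence may also start with an uppercase letter.
  private static boolean matchesAt(String[] tokens, int start, String[] parts) {
    for (int j = 0; j < parts.length; j++) {
      String token = tokens[start + j];
      boolean same = token.equals(parts[j]);
      boolean sentenceStart = start + j == 0 && token.equals(capitalize(parts[j]));
      if (!same && !sentenceStart) {
        return false;
      }
    }
    return true;
  }

  private static String capitalize(String s) {
    return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
  }
}
```

      With a dictionary containing "the", "patient" and "has", and the accepted phrase "duodenal atresia", this sketch reports nothing for "the patient has duodenal atresia" but still reports "atresia" in "the patient has atresia".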
    • getTokensForSentenceStart

      private List<PatternToken> getTokensForSentenceStart(String[] parts)
    • getAntiPatterns

      public List<DisambiguationPatternRule> getAntiPatterns()
      Description copied from class: Rule
      Override this to avoid false alarms by ignoring these patterns. Note that your Rule.match(AnalyzedSentence) method needs to call Rule.getSentenceWithImmunization(org.languagetool.AnalyzedSentence) for this to be used, and you need to check AnalyzedTokenReadings.isImmunized().
      Overrides:
      getAntiPatterns in class Rule
    • startsWithIgnoredWord

      protected int startsWithIgnoredWord(String word, boolean caseSensitive)
      Checks whether a word starts with an ignored word. Note that a minimum word-length of 4 characters is expected. (This is for better performance. Moreover, such short words are most likely contained in the dictionary.)
      Parameters:
      word - entire word
      caseSensitive - determines whether the check is case-sensitive
      Returns:
      length of the ignored word (i.e., return value is 0, if the word does not start with an ignored word). If there are several matches from the set of ignored words, the length of the longest matching word is returned.
      Since:
      3.5
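      The contract of startsWithIgnoredWord (length of the longest ignored word that is a prefix of the given word, or 0) can be sketched as below. The class IgnoredPrefixSketch is hypothetical, the ignored words are passed in as a plain set, and a simple linear scan is used; applying the 4-character minimum to both the word and the ignored words is an assumption of this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce LanguageTool classes.
final class IgnoredPrefixSketch {
  // Taken from the documented minimum word length of 4 characters.
  private static final int MIN_LENGTH = 4;

  // Returns the length of the longest ignored word that the given word starts with, or 0.
  static int startsWithIgnoredWord(Set<String> ignoredWords, String word, boolean caseSensitive) {
    if (word.length() < MIN_LENGTH) {
      return 0;
    }
    String probe = caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    int longest = 0;
    for (String ignored : ignoredWords) {
      if (ignored.length() < MIN_LENGTH) {
        continue; // assumption: short ignored words are skipped as well
      }
      String candidate = caseSensitive ? ignored : ignored.toLowerCase(Locale.ROOT);
      if (probe.startsWith(candidate) && candidate.length() > longest) {
        longest = candidate.length();
      }
    }
    return longest;
  }

  public static void main(String[] args) {
    Set<String> ignored = new HashSet<>(Arrays.asList("Language", "LanguageTool"));
    System.out.println(startsWithIgnoredWord(ignored, "LanguageToolish", true)); // prints 12
  }
}
```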
    • reorderSuggestions

      @Experimental protected List<String> reorderSuggestions(List<String> suggestions, String word)