java.lang.Object
org.apache.lucene.analysis.hunspell.Stemmer

final class Stemmer extends Object
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It conforms to the original hunspell algorithm, including recursive suffix stripping.
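To illustrate what "affix rules" means here, the sketch below parses a single hunspell `.aff` suffix line of the form `SFX flag strip affix condition` and runs it backwards, the way a stemmer does: it drops the affix, restores the stripped characters, and checks the condition against the end of the resulting root. The class `SuffixRule` and its methods are hypothetical and not part of Lucene. The sketch assumes single-character flags and ignores continuation flags such as `ies/X`. A full stemmer would also look the root up in the dictionary and check that the root carries the rule's flag.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: one hunspell-style SFX line and its reverse application.
final class SuffixRule {
  final char flag;          // single-character flag (assumes the default FLAG format)
  final String strip;       // removed from the root when the suffix is added ("0" in the .aff file means none)
  final String affix;       // appended to the root to form the surface word
  final Pattern condition;  // must match the end of the root

  private SuffixRule(char flag, String strip, String affix, Pattern condition) {
    this.flag = flag;
    this.strip = strip;
    this.affix = affix;
    this.condition = condition;
  }

  // Parses a line such as "SFX Y y ies [^aeiou]y"; continuation flags ("ies/X") are not handled.
  static SuffixRule parse(String line) {
    String[] f = line.trim().split("\\s+");
    if (f.length < 5 || !f[0].equals("SFX")) {
      throw new IllegalArgumentException("not an SFX rule line: " + line);
    }
    String strip = f[2].equals("0") ? "" : f[2];
    String affix = f[3].equals("0") ? "" : f[3];
    // Hunspell conditions (letters, [...], [^...], '.') are also valid Java regex syntax;
    // anchoring with '$' makes the condition apply to the end of the root.
    return new SuffixRule(f[1].charAt(0), strip, affix, Pattern.compile(f[4] + "$"));
  }

  // Stemming runs the rule backwards: drop the affix, restore the strip, then check the condition.
  String unapply(String word) {
    if (word.length() <= affix.length() || !word.endsWith(affix)) {
      return null;
    }
    String root = word.substring(0, word.length() - affix.length()) + strip;
    return condition.matcher(root).find() ? root : null;
  }
}
```

For example, the rule `SFX Y y ies [^aeiou]y` maps "ponies" back to "pony", but rejects "monkeies" because the root "monkey" has a vowel before the final `y`.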
  • Field Details

    • dictionary

      private final Dictionary dictionary
    • formStep

      private final int formStep
  • Constructor Details

    • Stemmer

      public Stemmer(Dictionary dictionary)
      Constructs a new Stemmer which will use the provided Dictionary to create its stems.
      Parameters:
      dictionary - Dictionary that will be used to create the stems
  • Method Details

    • stem

      public List<CharsRef> stem(String word)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      Returns:
      List of stems for the word
    • stem

      public List<CharsRef> stem(char[] word, int length)
      Find the stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      length - Number of characters at the start of word that make up the word
      Returns:
      List of stems for the word
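The two `stem` overloads follow the common Lucene convention of passing a `char[]` buffer together with a length, so a caller can reuse one buffer that is larger than the current word. The sketch below shows that convention only. `BufferStemmerDemo` and its "strip a trailing s" rule are hypothetical stand-ins for real affix processing, and having the `String` overload delegate to the `char[]` overload is an assumption about how such a pair is typically wired, not a statement about Lucene's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the toy rule below is NOT a hunspell rule.
final class BufferStemmerDemo {
  // Only word[0 .. length) is the word; anything after it is unused buffer space.
  static List<String> stem(char[] word, int length) {
    List<String> stems = new ArrayList<>();
    if (length > 1 && word[length - 1] == 's') {
      stems.add(new String(word, 0, length - 1)); // toy rule: drop a trailing 's'
    }
    return stems;
  }

  // Convenience overload: convert the String and delegate to the buffer form.
  static List<String> stem(String word) {
    return stem(word.toCharArray(), word.length());
  }
}
```

Calling `stem(buf, 4)` on a 16-character buffer that starts with "cats" ignores the 12 trailing slots and yields `[cat]`, the same as `stem("cats")`.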
    • analyze

      void analyze(char[] word, int length, Stemmer.RootProcessor processor)
    • varyCase

      boolean varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor)
    • caseOf

      WordCase caseOf(char[] word, int length)
      Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
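A minimal sketch of a three-way case classification using the type names from the description above. `CaseClassifier` and its nested enum are hypothetical, and the rules are an assumed simplification: a word with no lowercase letters and at least one uppercase letter is UPPER_CASE, a word whose first character is uppercase and whose remaining characters contain no uppercase letters is TITLE_CASE, and everything else (lowercase, mixed case, no letters) is EXACT_CASE. Lucene's actual `WordCase` handling may distinguish more cases.

```java
// Hypothetical classifier; mirrors only the three types named in the caseOf description.
final class CaseClassifier {
  enum CaseType { EXACT_CASE, TITLE_CASE, UPPER_CASE }

  static CaseType caseOf(char[] word, int length) {
    if (length == 0) {
      return CaseType.EXACT_CASE;
    }
    boolean hasLower = false;      // any lowercase letter anywhere
    boolean restHasUpper = false;  // any uppercase letter after position 0
    for (int i = 0; i < length; i++) {
      char c = word[i];
      if (Character.isLowerCase(c)) {
        hasLower = true;
      } else if (i > 0 && Character.isUpperCase(c)) {
        restHasUpper = true;
      }
    }
    boolean firstUpper = Character.isUpperCase(word[0]);
    if (!hasLower && (firstUpper || restHasUpper)) {
      return CaseType.UPPER_CASE;  // e.g. "HELLO"
    }
    if (firstUpper && !restHasUpper) {
      return CaseType.TITLE_CASE;  // e.g. "Hello"
    }
    return CaseType.EXACT_CASE;    // e.g. "hello", "McDonald"
  }
}
```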
    • caseFoldTitle

      private char[] caseFoldTitle(char[] word, int length)
      Folds the title-case variant of the word into titleBuffer.
    • caseFoldLower

      private char[] caseFoldLower(char[] word, int length)
      Folds the lowercase variant of the (title-cased) word into lowerBuffer.
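The two folding helpers write into reusable buffers (`titleBuffer`, `lowerBuffer`) instead of allocating a new array per call; callers read the result together with the original length. The sketch below, with the hypothetical class `CaseFolder`, shows one way to do that under stated assumptions: title folding keeps the first character and lowercases the rest, and lower folding takes an already title-cased word and lowercases only its first character. It uses per-`char` `Character.toLowerCase`, so it ignores locale-specific rules (such as Turkish dotless i) and characters outside the BMP.

```java
// Hypothetical sketch of buffer-reusing case folding; not Lucene's implementation.
final class CaseFolder {
  private char[] titleBuffer = new char[8];
  private char[] lowerBuffer = new char[8];

  // "HELLO" -> "Hello": keep the first char, lowercase the rest.
  char[] caseFoldTitle(char[] word, int length) {
    if (titleBuffer.length < length) {
      titleBuffer = new char[length]; // grow only when the word does not fit
    }
    System.arraycopy(word, 0, titleBuffer, 0, length);
    for (int i = 1; i < length; i++) {
      titleBuffer[i] = Character.toLowerCase(titleBuffer[i]);
    }
    return titleBuffer; // valid contents are titleBuffer[0 .. length)
  }

  // "Hello" -> "hello": expects a title-cased word, lowercases the first char only.
  char[] caseFoldLower(char[] word, int length) {
    if (lowerBuffer.length < length) {
      lowerBuffer = new char[length];
    }
    System.arraycopy(word, 0, lowerBuffer, 0, length);
    if (length > 0) {
      lowerBuffer[0] = Character.toLowerCase(lowerBuffer[0]);
    }
    return lowerBuffer; // valid contents are lowerBuffer[0 .. length)
  }
}
```

Because the buffers are reused, a returned array is only valid until the next call to the same method; copy it if it must outlive that call.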
    • capitalizeAfterApostrophe

      private static char[] capitalizeAfterApostrophe(char[] word, int length)
    • varySharpS

      private boolean varySharpS(char[] word, int length, Stemmer.CaseVariationProcessor processor)
    • doStem

      boolean doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor)
    • uniqueStems

      public List<CharsRef> uniqueStems(char[] word, int length)
      Find the unique stem(s) of the provided word.
      Parameters:
      word - Word to find the stems for
      length - Number of characters at the start of word that make up the word
      Returns:
      List of stems for the word
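Different affix paths can lead to the same stem, so a "unique stems" result removes duplicates. A minimal sketch, using a hypothetical `UniqueStems` helper over plain strings: a `LinkedHashSet` drops repeats while keeping the order in which stems were first produced. This sketch compares stems exactly; whether duplicates should be matched case-insensitively depends on the dictionary configuration and is not modeled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: order-preserving de-duplication of stems.
final class UniqueStems {
  static List<String> uniqueStems(List<String> stems) {
    Set<String> seen = new LinkedHashSet<>(stems); // keeps first-occurrence order
    return new ArrayList<>(seen);
  }
}
```

For instance, `uniqueStems(List.of("run", "ran", "run"))` returns `[run, ran]`.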
    • stemException

      private String stemException(int morphDataId)
    • newStem

      private CharsRef newStem(CharsRef stem, int morphDataId)
    • removeAffixes

      boolean removeAffixes(char[] word, int offset, int length, boolean doPrefix, int outerPrefix, int innerPrefix, int outerSuffix, Stemmer.StemCandidateProcessor processor)
      Generates a list of stems for the provided word. It is called recursively as affixes are applied one by one, with the outerPrefix, innerPrefix, and outerSuffix parameters set to non-negative values as each affix is applied.
      Parameters:
      word - Word to generate the stems for
      doPrefix - true if we should remove prefixes
      Returns:
      whether the processing should be continued
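The recursion can be sketched as follows: strip one suffix, check the candidate root against the dictionary, then recurse on that candidate to try stripping a second suffix, stopping at a fixed depth (hunspell supports at most two suffix levels). `RecursiveSuffixStripper`, its `Rule` type, and `MAX_SUFFIX_DEPTH` are hypothetical. This simplified version ignores prefixes, affix conditions, and the continuation flags that real hunspell uses to decide which outer suffix may follow which inner one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of depth-limited recursive suffix stripping.
final class RecursiveSuffixStripper {
  static final int MAX_SUFFIX_DEPTH = 2; // at most two suffixes are stripped

  static final class Rule {
    final String strip; // chars restored to the root
    final String affix; // suffix removed from the word
    Rule(String strip, String affix) {
      this.strip = strip;
      this.affix = affix;
    }
  }

  static List<String> stems(String word, List<Rule> rules, Set<String> dictionary) {
    List<String> out = new ArrayList<>();
    strip(word, 0, rules, dictionary, out);
    return out;
  }

  private static void strip(String word, int depth, List<Rule> rules,
                            Set<String> dictionary, List<String> out) {
    if (depth == MAX_SUFFIX_DEPTH) {
      return; // no further suffix levels allowed
    }
    for (Rule r : rules) {
      if (word.length() > r.affix.length() && word.endsWith(r.affix)) {
        String candidate = word.substring(0, word.length() - r.affix.length()) + r.strip;
        if (dictionary.contains(candidate)) {
          out.add(candidate); // a root found at this level
        }
        strip(candidate, depth + 1, rules, dictionary, out); // try an inner suffix
      }
    }
  }
}
```

With rules for "-ness" and "-ful" and a dictionary containing "hope" and "hopeful", "hopefulness" yields `[hopeful, hope]`; "hopefulnessness" yields only `[hopeful]`, because reaching "hope" would need a third suffix.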
    • stripAffix

      private char[] stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
      Returns:
      null if the affix conditions aren't met; a reference to the same char[] if the affix has no strip data and can therefore simply be removed; or a new char[] containing the word with the affix removed and the strip data added.
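The three-way return contract can be illustrated for the suffix case. In the sketch below (hypothetical class `StripSuffixSketch`, not Lucene's code), the method returns `null` when the condition fails, returns the caller's own array when there is nothing to restore (so the caller can keep using its offset and the shortened length), and allocates a new array only when strip characters must be appended. The condition is assumed to be a `Pattern` anchored at the end of the root, and the sketch builds a temporary `String` for the check, which an allocation-conscious implementation would avoid.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of stripAffix's return contract, suffix case only.
final class StripSuffixSketch {
  // word[offset .. offset + length) is the surface word; its last affixLen chars are the suffix.
  static char[] stripSuffix(char[] word, int offset, int length, int affixLen,
                            String strip, Pattern condition) {
    int rootLen = length - affixLen;
    String root = new String(word, offset, rootLen) + strip;
    if (!condition.matcher(root).find()) {
      return null;               // affix condition not met
    }
    if (strip.isEmpty()) {
      return word;               // nothing to restore: reuse the same array
    }
    return root.toCharArray();   // strip chars added: needs a new array
  }
}
```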
    • isAffixCompatible

      private boolean isAffixCompatible(int affix, boolean isPrefix, int outerPrefix, int outerSuffix, WordContext context)
    • applyAffix

      private boolean applyAffix(char[] word, int offset, int length, int affix, boolean prefix, int outerPrefix, int innerPrefix, int outerSuffix, Stemmer.StemCandidateProcessor processor)
      Applies the affix rule to the given word, producing a list of stems if any are found. Non-negative outerPrefix, innerPrefix, and outerSuffix parameters indicate affixes that have already been applied.
      Parameters:
      word - Char array containing the word with the affix removed and the strip added
      offset - where the word actually starts in the array
      length - the length of the stripped word
      affix - the id of the affix in Dictionary.affixData
      prefix - true if we are removing a prefix (false if it's a suffix)
      Returns:
      whether the processing should be continued
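After an affix has been removed and its strip data restored, the candidate only counts as a stem if it is a dictionary root that is allowed to take that affix. In hunspell this is expressed with flags: each dictionary entry lists the affix flags it accepts. A minimal sketch of that check, with a hypothetical `AffixFlagCheck` class and the dictionary modeled as a map from root to flag set (an assumption; Lucene stores this data in a far more compact form):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: accept a candidate root only if it carries the affix's flag.
final class AffixFlagCheck {
  static List<String> applyAffix(String candidateRoot, char affixFlag,
                                 Map<String, Set<Character>> dictionary) {
    List<String> stems = new ArrayList<>();
    Set<Character> flags = dictionary.get(candidateRoot);
    if (flags != null && flags.contains(affixFlag)) {
      stems.add(candidateRoot); // root exists and permits this affix
    }
    return stems;
  }
}
```

With a dictionary entry `pony` carrying flag `S`, the candidate "pony" is accepted for an `S` affix but rejected for any other flag.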
    • isRootCompatibleWithContext

      private boolean isRootCompatibleWithContext(WordContext context, int lastAffix, int entryId)
    • morphDataId

      private int morphDataId(IntsRef forms, int i)
    • needsAnotherAffix

      private boolean needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId)