Class HMMChineseTokenizerFactory


public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer

Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenate off), or use the SmartChinese stoplist with a StopFilterFactory via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
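
The stoplist option from the note above could be wired up programmatically along the following lines. This is a minimal sketch, not part of the original documentation: it assumes Lucene 5.0 or later with the analysis-common and smartcn modules on the classpath, that CustomAnalyzer's default resource loader can see the smartcn stoplist, and the class name HmmChineseStopExample and its methods are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; the name is not part of Lucene.
public class HmmChineseStopExample {

    // Builds an analyzer that tokenizes with HMMChineseTokenizer and then
    // applies a StopFilter loaded from the SmartChinese stoplist.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
            .withTokenizer(HMMChineseTokenizerFactory.class)
            .addTokenFilter(StopFilterFactory.class,
                "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
            .build();
    }

    // Runs the analyzer over the text and collects the emitted terms.
    static List<String> analyze(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                        // finalize end-of-stream state
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze("我是中国人。"));
    }
}
```

The field name "body" is arbitrary here, since CustomAnalyzer applies the same chain to every field.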

Since:
4.10.0
  • Field Details

  • Constructor Details

    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory(Map<String,String> args)
      Creates a new HMMChineseTokenizerFactory
    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory()
      Default constructor for compatibility with SPI
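
As a rough illustration of the Map-based constructor, the sketch below instantiates the factory directly and reads tokens from the tokenizer it creates. It assumes Lucene 5.0 or later, where TokenizerFactory offers a no-argument create() and the input is supplied through setReader; the 4.10 API that this class dates from passed a Reader to create instead. The argument map is left empty and mutable on the assumption that the factory consumes entries from it and accepts no tokenizer-specific options. The class name HmmChineseFactoryExample is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; the name is not part of Lucene.
public class HmmChineseFactoryExample {

    // Tokenizes the text with a tokenizer created by the factory.
    // No filters are applied, so punctuation tokens are still present.
    static List<String> tokenize(String text) throws IOException {
        // Mutable map: analysis factories remove the keys they consume.
        Map<String, String> args = new HashMap<>();
        HMMChineseTokenizerFactory factory = new HMMChineseTokenizerFactory(args);

        List<String> terms = new ArrayList<>();
        try (Tokenizer tokenizer = factory.create()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();               // required before incrementToken()
            while (tokenizer.incrementToken()) {
                terms.add(term.toString());
            }
            tokenizer.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("我是中国人。"));
    }
}
```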
  • Method Details