Class PersianAnalyzer

All Implemented Interfaces:
Closeable, AutoCloseable

public final class PersianAnalyzer extends StopwordAnalyzerBase
Analyzer for Persian.

This Analyzer uses PersianCharFilter, which means text is tokenized around the zero-width non-joiner (ZWNJ) as well as whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
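The behavior described above can be sketched in plain Java, with no Lucene dependency. This is a minimal illustration, not Lucene's implementation: the class PersianPipelineSketch and its analyze method are hypothetical, tokenizing is simplified to a whitespace split, and the sketch assumes the standardization maps Farsi yeh (U+06CC) to Arabic yeh (U+064A) and keheh (U+06A9) to Arabic kaf (U+0643). The real analyzer may apply further normalization not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the ZWNJ split, letter standardization, and stopword removal.
final class PersianPipelineSketch {
    // Zero-width non-joiner (U+200C): treated as a token boundary, like whitespace.
    static final char ZWNJ = '\u200C';
    // Variant letters and the standard forms this sketch ASSUMES they map to.
    static final char FARSI_YEH = '\u06CC'; // Farsi yeh
    static final char YEH = '\u064A';       // Arabic yeh
    static final char KEHEH = '\u06A9';     // keheh
    static final char KAF = '\u0643';       // Arabic kaf

    private PersianPipelineSketch() {}

    // Returns the normalized tokens of text that are not in stopwords.
    // stopwords is expected to hold already-normalized forms.
    static List<String> analyze(String text, Set<String> stopwords) {
        // 1. Char-filter stage: turn every ZWNJ into a space.
        String filtered = text.replace(ZWNJ, ' ');
        List<String> tokens = new ArrayList<>();
        // 2. Tokenize on runs of whitespace (a simplification of Lucene's tokenizer).
        for (String raw : filtered.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // only happens for empty or all-whitespace input
            }
            // 3. Standardize the variant letter forms.
            String token = raw.replace(FARSI_YEH, YEH).replace(KEHEH, KAF);
            // 4. Drop stopwords; this is the analyzer's only form of "stemming".
            if (!stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, analyze("a\u200Cb  c", Set.of("c")) yields the tokens "a" and "b": the ZWNJ splits "a" from "b", and "c" is dropped as a stopword.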

  • Field Details

    • DEFAULT_STOPWORD_FILE

      public static final String DEFAULT_STOPWORD_FILE
      File containing default Persian stopwords. The default stopword list is taken from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed.
    • STOPWORDS_COMMENT

      public static final String STOPWORDS_COMMENT
      The comment character in the stopwords file. All lines prefixed with this character are ignored.
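Together, these two fields describe a plain-text file with one stopword per line plus a comment prefix. A minimal pure-Java sketch of reading such a file follows. StopwordFileSketch and load are hypothetical names, the actual value of STOPWORDS_COMMENT is not shown on this page ("#" in the usage below is an assumption), and skipping blank lines and trimming whitespace are choices of this sketch rather than documented behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader for a one-word-per-line stopword file with comment lines.
final class StopwordFileSketch {
    private StopwordFileSketch() {}

    // Reads one stopword per line. Lines that start with the comment prefix are
    // ignored, as are blank lines; the result is an unmodifiable set.
    static Set<String> load(Reader source, String comment) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(comment)) {
                    continue; // comment line
                }
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return Collections.unmodifiableSet(words);
    }
}
```

Usage: load(new StringReader("# header\nword\n"), "#") returns a set containing only "word". Note that a line with leading whitespace before the comment character is not treated as a comment, because only true prefixes are checked.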
  • Constructor Details

    • PersianAnalyzer

      public PersianAnalyzer(Version matchVersion)
      Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
    • PersianAnalyzer

      public PersianAnalyzer(Version matchVersion, CharArraySet stopwords)
      Builds an analyzer with the given stop words.
      Parameters:
      matchVersion - Lucene compatibility version
      stopwords - a stopword set
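The two constructors differ only in where the stop set comes from. The sketch below mirrors that shape in plain Java: the no-argument constructor delegates to the one that takes a set. StopSetConstructorsSketch, DEFAULT_STOPWORDS, and removeStopwords are hypothetical, the default set's contents are placeholders rather than the real file, and the Version parameter is omitted. Taking an unmodifiable copy of the caller's set is a design choice of this sketch; this page does not say whether PersianAnalyzer copies its argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class showing the default-vs-custom stop set constructor pattern.
final class StopSetConstructorsSketch {
    // Placeholder stand-in for the set loaded from DEFAULT_STOPWORD_FILE.
    static final Set<String> DEFAULT_STOPWORDS =
        Set.of("\u0627\u0632", "\u0628\u0647", "\u0648");

    private final Set<String> stopwords;

    // Counterpart of PersianAnalyzer(Version): use the default stop words.
    StopSetConstructorsSketch() {
        this(DEFAULT_STOPWORDS);
    }

    // Counterpart of PersianAnalyzer(Version, CharArraySet): use the caller's set.
    // The unmodifiable copy means later changes to the caller's set have no effect.
    StopSetConstructorsSketch(Set<String> stopwords) {
        this.stopwords = Collections.unmodifiableSet(new LinkedHashSet<>(stopwords));
    }

    // Keeps only the tokens that are not stopwords, preserving order.
    List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```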
  • Method Details

    • getDefaultStopSet

      public static CharArraySet getDefaultStopSet()
      Returns an unmodifiable instance of the default stop-words set.
      Returns:
      an unmodifiable instance of the default stop-words set.
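getDefaultStopSet() promises an unmodifiable set. One common way to meet that contract while loading the set lazily is the initialization-on-demand holder idiom, sketched below in plain Java. DefaultStopSetSketch and DefaultSetHolder are hypothetical names, the two entries stand in for the parsed stopword file, and whether PersianAnalyzer is implemented this way is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of a lazily built, unmodifiable default stop set.
final class DefaultStopSetSketch {
    private DefaultStopSetSketch() {}

    // Initialization-on-demand holder: the nested class is initialized, and the set
    // built, only on the first call to getDefaultStopSet(). JVM class initialization
    // is thread-safe, so no explicit locking is needed.
    private static final class DefaultSetHolder {
        static final Set<String> DEFAULT_STOP_SET;

        static {
            Set<String> words = new LinkedHashSet<>();
            // Placeholder entries standing in for the parsed stopword file.
            words.add("\u0627\u0632");
            words.add("\u0628\u0647");
            DEFAULT_STOP_SET = Collections.unmodifiableSet(words);
        }
    }

    // Every call returns the same unmodifiable instance; add/remove throw
    // UnsupportedOperationException.
    static Set<String> getDefaultStopSet() {
        return DefaultSetHolder.DEFAULT_STOP_SET;
    }
}
```

Because the returned set is shared, callers that need a different stop list should build their own set (for example, a modified copy) and pass it to the two-argument constructor rather than trying to mutate the default.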