Class LuceneTextIndexCreator
java.lang.Object
  org.apache.pinot.segment.local.segment.index.text.AbstractTextIndexCreator
    org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator
All Implemented Interfaces:
Closeable, AutoCloseable, TextIndexCreator, IndexCreator
public class LuceneTextIndexCreator extends AbstractTextIndexCreator
Creates a Lucene-based text index. Used for both offline segments (from SegmentColumnarIndexCreator) and realtime segments (from RealtimeLuceneTextIndex).
Field Summary
- static org.apache.lucene.analysis.CharArraySet ENGLISH_STOP_WORDS_SET
- static String LUCENE_INDEX_DOC_ID_COLUMN_NAME
Constructor Summary
- LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, List<String> stopWordsInclude, List<String> stopWordsExclude, boolean useCompoundFile, int maxBufferSizeMB): Called by SegmentColumnarIndexCreator when building an offline segment.
- LuceneTextIndexCreator(IndexCreationContext context, TextIndexConfig indexConfig)
Method Summary
- void add(String document)
- void add(String[] documents, int length)
- void close()
- static HashSet<String> getDefaultEnglishStopWordsSet()
- org.apache.lucene.index.IndexWriter getIndexWriter()
- void seal()
Methods inherited from class org.apache.pinot.segment.local.segment.index.text.AbstractTextIndexCreator
add, add
Field Detail
LUCENE_INDEX_DOC_ID_COLUMN_NAME
public static final String LUCENE_INDEX_DOC_ID_COLUMN_NAME
See Also: Constant Field Values
ENGLISH_STOP_WORDS_SET
public static final org.apache.lucene.analysis.CharArraySet ENGLISH_STOP_WORDS_SET
Constructor Detail
LuceneTextIndexCreator
public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, @Nullable List<String> stopWordsInclude, @Nullable List<String> stopWordsExclude, boolean useCompoundFile, int maxBufferSizeMB)
Called by SegmentColumnarIndexCreator when building an offline segment. Just as SegmentColumnarIndexCreator creates a per-column dictionary, forward index, and inverted index, it also creates a text index for any column that has text search enabled.
Parameters:
- column - column name
- segmentIndexDir - segment index directory
- commit - true if the index should be committed (at the end, after all documents have been added); false if it should not be committed.
  Note on commit: once SegmentColumnarIndexCreator finishes indexing all documents/rows for the segment, we need to commit and close the Lucene index, which internally persists the index on disk, performs the necessary resource cleanup, and so on. We commit during InvertedIndexCreator.seal() and close during Closeable.close(). This Lucene index writer is used by both offline and realtime segments (both while indexing the in-memory MutableSegment and later during conversion to offline). Since realtime segment conversion goes through the offline indexing path again and does everything there (indexing, commit, close, etc.), there is no need to commit the index from the realtime side. So when the realtime segment is destroyed (which happens after it has been committed and converted to offline), we close this Lucene index writer to release resources but do not commit.
- stopWordsInclude - words to add to the default stop word list
- stopWordsExclude - words to remove from the default stop word list
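The stopWordsInclude and stopWordsExclude parameters adjust the default English stop word list. The Java sketch below shows one way to compute the effective set. It is a hypothetical helper (StopWordSetSketch), not Pinot's implementation: the five-word default list is only a stand-in for getDefaultEnglishStopWordsSet(), and applying the include list before the exclude list (so a word that appears in both ends up excluded) is an assumption this page does not confirm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating how stopWordsInclude / stopWordsExclude are
// documented to adjust the default stop word list. Not Pinot's implementation.
final class StopWordSetSketch {
  // Small stand-in for the default English list; the real default comes from
  // LuceneTextIndexCreator.getDefaultEnglishStopWordsSet().
  static final Set<String> DEFAULT_STOP_WORDS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "and", "the", "of")));

  private StopWordSetSketch() {}

  // Returns the defaults plus the include list, minus the exclude list.
  // Null lists are treated as empty, mirroring the @Nullable parameters.
  // Assumption: include is applied first, so exclude wins on conflicts.
  static Set<String> effectiveStopWords(Set<String> defaults, List<String> include, List<String> exclude) {
    Set<String> result = new HashSet<>(defaults); // copy; never mutate the defaults
    if (include != null) {
      result.addAll(include);
    }
    if (exclude != null) {
      result.removeAll(exclude);
    }
    return result;
  }
}
```

For example, effectiveStopWords(DEFAULT_STOP_WORDS, Arrays.asList("pinot"), Arrays.asList("the")) returns a set that contains "pinot" and "and" but not "the". Copying the defaults first keeps the shared default set unchanged across columns.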
LuceneTextIndexCreator
public LuceneTextIndexCreator(IndexCreationContext context, TextIndexConfig indexConfig)
Method Detail
getIndexWriter
public org.apache.lucene.index.IndexWriter getIndexWriter()
add
public void add(String document)
add
public void add(String[] documents, int length)
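The two add overloads index one row at a time. Below is a minimal in-memory model of that contract. It assumes (this page does not state it) that each call produces exactly one document tagged with the next sequential doc ID, presumably the value stored under LUCENE_INDEX_DOC_ID_COLUMN_NAME, and that the multi-value overload indexes only the first length entries of documents into that single document. TextDocBufferSketch and Doc are hypothetical names; no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of the add(...) contract: each call creates one
// "document" tagged with the next sequential doc ID. Not Pinot's code.
final class TextDocBufferSketch {
  // One indexed row: its doc ID plus the text values that belong to it.
  static final class Doc {
    final int docId;
    final List<String> values;

    Doc(int docId, List<String> values) {
      this.docId = docId;
      this.values = Collections.unmodifiableList(values);
    }
  }

  private final List<Doc> docs = new ArrayList<>();
  private int nextDocId = 0;

  // Single-value row: one text value becomes one document.
  void add(String document) {
    docs.add(new Doc(nextDocId++, Collections.singletonList(document)));
  }

  // Multi-value row (assumed semantics): only the first `length` entries are
  // used, and all of them share one document and therefore one doc ID.
  void add(String[] documents, int length) {
    if (length < 0 || length > documents.length) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    docs.add(new Doc(nextDocId++, new ArrayList<>(Arrays.asList(documents).subList(0, length))));
  }

  List<Doc> docs() {
    return Collections.unmodifiableList(docs);
  }
}
```

Keeping doc IDs aligned with row order is what would let a text search hit be mapped back to a segment row; the explicit length argument lets a caller reuse a larger buffer array across rows.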
seal
public void seal()
close
public void close() throws IOException
Throws:
IOException
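The commit note on the first constructor describes two lifecycles: offline segments commit in seal() and then close, while realtime segments only close. The sketch below records those steps in a hypothetical TextIndexLifecycleSketch class. It models the documented contract only, does not use Lucene, and assumes that seal() is simply a no-op when commit is false.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit/close contract described for the first
// constructor. It does not touch Lucene; it only records what would happen.
final class TextIndexLifecycleSketch implements Closeable {
  private final boolean commit;
  private final List<String> events = new ArrayList<>();
  private boolean closed = false;

  TextIndexLifecycleSketch(boolean commit) {
    this.commit = commit;
  }

  void add(String document) {
    if (closed) {
      throw new IllegalStateException("writer already closed");
    }
    events.add("add");
  }

  // Offline path: persist the index once all rows have been added.
  // Assumption: with commit == false this does nothing.
  void seal() {
    if (commit) {
      events.add("commit");
    }
  }

  // Both paths: release resources. The realtime path (commit == false)
  // reaches close() without ever committing. Idempotent by design.
  @Override
  public void close() {
    if (!closed) {
      closed = true;
      events.add("close");
    }
  }

  List<String> events() {
    return events;
  }
}
```

An offline build would run add, seal, and close, recording [add, commit, close]; a realtime writer constructed with commit set to false would run add and close, recording [add, close]. Because the sketch implements Closeable, the realtime path also fits a try-with-resources block, which closes the writer without committing when the block exits.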