Class LuceneTextIndexCreator
- java.lang.Object
-
- org.apache.pinot.segment.local.segment.creator.impl.text.LuceneTextIndexCreator
-
- All Implemented Interfaces:
Closeable, AutoCloseable, TextIndexCreator
public class LuceneTextIndexCreator extends Object implements TextIndexCreator
This is used to create a Lucene-based text index. Used for both offline (from SegmentColumnarIndexCreator) and realtime (from RealtimeLuceneTextIndex).
-
-
Field Summary
Fields
Modifier and Type                                 Field                              Description
static org.apache.lucene.analysis.CharArraySet    ENGLISH_STOP_WORDS_SET
static String                                     LUCENE_INDEX_DOC_ID_COLUMN_NAME
-
Constructor Summary
Constructors
Constructor and Description
LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, List<String> stopWordsInclude, List<String> stopWordsExclude)
    Called by SegmentColumnarIndexCreator when building an offline segment.
-
Method Summary
Modifier and Type                       Method                                 Description
void                                    add(String document)
void                                    add(String[] documents, int length)
void                                    close()
static HashSet<String>                  getDefaultEnglishStopWordsSet()
org.apache.lucene.index.IndexWriter     getIndexWriter()
void                                    seal()
-
-
-
Field Detail
-
LUCENE_INDEX_DOC_ID_COLUMN_NAME
public static final String LUCENE_INDEX_DOC_ID_COLUMN_NAME
- See Also:
- Constant Field Values
-
ENGLISH_STOP_WORDS_SET
public static final org.apache.lucene.analysis.CharArraySet ENGLISH_STOP_WORDS_SET
-
-
Constructor Detail
-
LuceneTextIndexCreator
public LuceneTextIndexCreator(String column, File segmentIndexDir, boolean commit, @Nullable List<String> stopWordsInclude, @Nullable List<String> stopWordsExclude)
Called by SegmentColumnarIndexCreator when building an offline segment. Similar to how it creates a per-column dictionary, forward index and inverted index, a text index is also created if text search is enabled on a column.
- Parameters:
column - column name
segmentIndexDir - segment index directory
commit - true if the index should be committed (at the end, after all documents have been added), false if the index should not be committed.
Note on commit: Once SegmentColumnarIndexCreator finishes indexing all documents/rows for the segment, we need to commit and close the Lucene index, which will internally persist the index on disk, do the necessary resource cleanup, etc. We commit during InvertedIndexCreator.seal() and close during Closeable.close(). This Lucene index writer is used by both offline and realtime (both during indexing of the in-memory MutableSegment and later during conversion to offline). Since realtime segment conversion goes through the offline indexing path again and does everything there (indexing, commit, close, etc.), there is no need to commit the index from the realtime side. So when the realtime segment is destroyed (which happens after the realtime segment has been committed and converted to offline), we close this Lucene index writer to release resources but don't commit.
stopWordsInclude - the words to include in addition to the default stop word list
stopWordsExclude - the words to exclude from the default stop word list
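The effect of stopWordsInclude and stopWordsExclude on the default stop word list can be illustrated with a minimal sketch in plain Java. The class name StopWordsSketch, the helper effectiveStopWords and the short default list are hypothetical and chosen for illustration only. The order of operations (add the included words first, then remove the excluded ones) is also an assumption, not something this page confirms.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: not part of Pinot and not its actual implementation.
class StopWordsSketch {
    // Hypothetical default list; the real defaults are not listed on this page.
    static final List<String> DEFAULT_STOP_WORDS = Arrays.asList("a", "an", "the", "and", "or");

    // Start from the defaults, add the included words, then remove the excluded words.
    // Either argument may be null, mirroring the @Nullable constructor parameters.
    static Set<String> effectiveStopWords(List<String> include, List<String> exclude) {
        Set<String> result = new LinkedHashSet<>(DEFAULT_STOP_WORDS);
        if (include != null) {
            result.addAll(include);
        }
        if (exclude != null) {
            result.removeAll(exclude);
        }
        return result;
    }

    public static void main(String[] args) {
        // "foo" becomes a stop word; "the" stops being one.
        Set<String> words = effectiveStopWords(Arrays.asList("foo"), Arrays.asList("the"));
        System.out.println(words);
    }
}
```

Under the assumed ordering, a word that appears in both lists ends up excluded, because the removal step runs last.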
-
-
Method Detail
-
getIndexWriter
public org.apache.lucene.index.IndexWriter getIndexWriter()
-
add
public void add(String document)
- Specified by:
add in interface TextIndexCreator
-
add
public void add(String[] documents, int length)
- Specified by:
add in interface TextIndexCreator
-
seal
public void seal()
- Specified by:
seal in interface TextIndexCreator
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
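The add, seal and close sequence described in the constructor's note on commit can be modeled with a small stand-in class. This is a simplified sketch in plain Java, not Pinot's implementation. The class CommitLifecycleSketch and its in-memory lists are hypothetical, and treating seal() as a no-op when commit is false is an assumption drawn from the note, not a confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the creator's lifecycle; it does not use Lucene or Pinot.
class CommitLifecycleSketch implements AutoCloseable {
    private final boolean commit;
    private final List<String> pending = new ArrayList<>();   // documents added but not yet committed
    private final List<String> persisted = new ArrayList<>(); // stands in for the on-disk index
    private boolean closed;

    CommitLifecycleSketch(boolean commit) {
        this.commit = commit;
    }

    void add(String document) {
        if (closed) {
            throw new IllegalStateException("creator is closed");
        }
        pending.add(document);
    }

    // Persist the buffered documents only when built with commit = true (assumed behavior).
    void seal() {
        if (commit) {
            persisted.addAll(pending);
            pending.clear();
        }
    }

    // Release resources; anything still uncommitted is dropped, never persisted.
    @Override
    public void close() {
        pending.clear();
        closed = true;
    }

    int persistedCount() {
        return persisted.size();
    }

    public static void main(String[] args) {
        // Offline path: commit = true, so seal() persists what was added.
        try (CommitLifecycleSketch offline = new CommitLifecycleSketch(true)) {
            offline.add("first document");
            offline.add("second document");
            offline.seal();
            System.out.println("offline persisted: " + offline.persistedCount());
        }

        // Realtime path: commit = false, so close() only releases resources.
        CommitLifecycleSketch realtime = new CommitLifecycleSketch(false);
        realtime.add("first document");
        realtime.seal();
        realtime.close();
        System.out.println("realtime persisted: " + realtime.persistedCount());
    }
}
```

In this model the realtime instance never persists anything, which matches the note's point that the later offline conversion is responsible for the commit.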
-
-