public interface TextTokenizer
A stop-word based string tokenizer.
See Also:
TextIndexingService, EnglishTextTokenizer, NitriteBuilder.textTokenizer(TextTokenizer)

| Modifier and Type | Method and Description |
|---|---|
| java.util.Set<java.lang.String> | stopWords() Gets all stop-words for a language. |
| java.util.Set<java.lang.String> | tokenize(java.lang.String text) Tokenizes a text and discards all stop-words from it. |

java.util.Set<java.lang.String> tokenize(java.lang.String text)
throws java.io.IOException
Tokenizes a text and discards all stop-words from it.
Parameters:
text - the text to tokenize
Throws:
java.io.IOException - if a low-level I/O error occurs.

java.util.Set<java.lang.String> stopWords()
Gets all stop-words for a language.
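
The contract of the two methods can be illustrated with a minimal sketch. It makes several assumptions: the `TextTokenizer` interface declared here is a local stand-in that only mirrors the two documented signatures (so the example compiles without the library on the classpath), `SimpleStopWordTokenizer` and `TokenizerSketch` are hypothetical names, the stop-word list is a tiny illustrative sample, and lower-casing the input and splitting on non-alphanumeric characters are choices of this sketch rather than documented behavior.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Local stand-in mirroring the two documented method signatures (assumption:
// the real interface is not imported here, to keep the sketch self-contained).
interface TextTokenizer {
    Set<String> tokenize(String text) throws IOException;
    Set<String> stopWords();
}

// Hypothetical implementation with a tiny, illustrative stop-word list.
class SimpleStopWordTokenizer implements TextTokenizer {
    private static final Set<String> STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the")));

    @Override
    public Set<String> stopWords() {
        return STOP_WORDS;
    }

    // An overriding method may drop the checked exception; this in-memory
    // version performs no I/O, so it declares none.
    @Override
    public Set<String> tokenize(String text) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> tokens = new LinkedHashSet<>();
        if (text == null) {
            return tokens;
        }
        // Sketch-level choice: lower-case, then split on runs of characters
        // that are neither letters nor digits.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}

class TokenizerSketch {
    public static void main(String[] args) throws IOException {
        TextTokenizer tokenizer = new SimpleStopWordTokenizer();
        // prints [quick, fox, lazy, dog]
        System.out.println(tokenizer.tokenize("The quick fox and the lazy dog"));
    }
}
```

An implementation of the real interface would presumably be supplied through NitriteBuilder.textTokenizer(TextTokenizer), which the See Also list references; that call is not shown here because its surrounding setup is outside this page.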