| Interface | Description |
|---|---|
| Vectorizer | |
| Weight | |

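
The `Weight` interface and the `TF` and `TFIDF` classes in this package turn raw term counts into vector entries. The sketch below is a rough, self-contained illustration and not Mahout's code. It defines a hypothetical `TermWeight` interface with two implementations: raw term frequency, and the textbook form of TF-IDF, tf × ln(numDocs / df). Mahout's `TFIDF` class may use a different variant (for example, with smoothing), so treat that formula as an assumption.

```java
public class WeightSketch {

    /** Hypothetical term-weighting scheme; an assumption, not Mahout's Weight interface. */
    interface TermWeight {
        /**
         * @param tf      occurrences of the term in the document
         * @param df      number of documents containing the term
         * @param numDocs total number of documents in the corpus
         */
        double calculate(int tf, int df, int numDocs);
    }

    /** Weight based on term frequency only: the raw count is used unchanged. */
    static final class TermFrequency implements TermWeight {
        @Override
        public double calculate(int tf, int df, int numDocs) {
            return tf;
        }
    }

    /** Textbook TF-IDF, tf * ln(numDocs / df); assumed formula, not necessarily Mahout's. */
    static final class TermFrequencyInverseDocFrequency implements TermWeight {
        @Override
        public double calculate(int tf, int df, int numDocs) {
            if (df <= 0) {
                return 0.0; // term absent from the corpus statistics: give it no weight
            }
            return tf * Math.log((double) numDocs / df);
        }
    }

    public static void main(String[] args) {
        TermWeight tf = new TermFrequency();
        TermWeight tfidf = new TermFrequencyInverseDocFrequency();
        // A term that occurs 3 times in a document and appears in 10 of 1000 documents.
        System.out.println(tf.calculate(3, 10, 1000));    // prints 3.0
        System.out.println(tfidf.calculate(3, 10, 1000)); // 3 * ln(100), about 13.82
    }
}
```

A term that appears in every document gets an IDF factor of ln(1) = 0, which is the usual motivation for preferring TF-IDF over raw term frequency when common words dominate the counts.
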
| Class | Description |
|---|---|
| DictionaryVectorizer | This class converts a set of input documents in the sequence file format to vectors. |
| DocumentProcessor | This class converts a set of input documents in the sequence file format into StringTuples. The SequenceFile input should have a Text key containing the unique document identifier and a Text value containing the whole document. |
| EncodedVectorsFromSequenceFiles | Converts a given set of sequence files into SparseVectors. |
| EncodingMapper | The Mapper that does the work of encoding text. |
| HighDFWordsPruner | |
| SimpleTextEncodingVectorizer | Runs a Map/Reduce job that encodes the input using a FeatureVectorEncoder and writes it to the output as a sequence file. |
| SparseVectorsFromSequenceFiles | Converts a given set of sequence files into SparseVectors. |
| TF | Weight based on term frequency only. |
| TFIDF | |
| VectorizerConfig | The config for a Vectorizer. |
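
The dictionary-based path (`DocumentProcessor` tokenizing each document, then `DictionaryVectorizer` turning tokens into vectors) can be pictured in memory as three steps: tokenize, assign each distinct term an index, and count terms by index. The sketch below is a single-machine illustration under stated assumptions, not the Map/Reduce implementation: a plain `Map<String, String>` stands in for the SequenceFile of (document id, document text) pairs, tokenization is a simple lowercase split on non-word characters (Mahout uses an analyzer instead), and `tokenize`, `buildDictionary`, and `vectorize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class DictionarySketch {

    /** Step 1 (DocumentProcessor analogue): split each document into lowercase tokens. */
    static Map<String, List<String>> tokenize(Map<String, String> docs) {
        Map<String, List<String>> tokenized = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            List<String> tokens = new ArrayList<>();
            for (String t : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) { // a leading separator yields an empty first token
                    tokens.add(t);
                }
            }
            tokenized.put(e.getKey(), tokens);
        }
        return tokenized;
    }

    /** Step 2: give every distinct term an index, in sorted order so results are deterministic. */
    static Map<String, Integer> buildDictionary(Map<String, List<String>> tokenized) {
        TreeSet<String> terms = new TreeSet<>();
        tokenized.values().forEach(terms::addAll);
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String term : terms) {
            dictionary.put(term, dictionary.size());
        }
        return dictionary;
    }

    /** Step 3 (DictionaryVectorizer analogue): sparse term counts keyed by dictionary index. */
    static SortedMap<Integer, Integer> vectorize(List<String> tokens, Map<String, Integer> dictionary) {
        SortedMap<Integer, Integer> vector = new TreeMap<>();
        for (String token : tokens) {
            Integer index = dictionary.get(token);
            if (index != null) {
                vector.merge(index, 1, Integer::sum);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "The quick brown fox");
        docs.put("doc2", "the lazy dog and the fox");
        Map<String, List<String>> tokenized = tokenize(docs);
        Map<String, Integer> dictionary = buildDictionary(tokenized);
        System.out.println(dictionary);
        for (Map.Entry<String, List<String>> e : tokenized.entrySet()) {
            System.out.println(e.getKey() + " -> " + vectorize(e.getValue(), dictionary));
        }
    }
}
```

Running `main` prints the dictionary `{and=0, brown=1, dog=2, fox=3, lazy=4, quick=5, the=6}`, then `doc1 -> {1=1, 3=1, 5=1, 6=1}` and `doc2 -> {0=1, 2=1, 3=1, 4=1, 6=2}`. Only nonzero entries are stored, which is why the output is described as sparse vectors; raw counts like these are what a weighting scheme such as TF or TFIDF would then rescale.
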
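
`SimpleTextEncodingVectorizer` and `EncodingMapper` encode text with a `FeatureVectorEncoder` instead of first building a dictionary. Assuming that encoder follows the feature-hashing approach (hash each token straight to an index of a fixed-size vector), the idea can be sketched as below. This is an illustration only: `String.hashCode`, a single hash probe, the tokenizer, and the `slot` and `encode` names are all assumptions, not the encoder's actual behavior or API.

```java
import java.util.Arrays;
import java.util.Locale;

public class HashingSketch {

    /** Map a token to a slot in [0, cardinality); String.hashCode is an assumed hash. */
    static int slot(String token, int cardinality) {
        return Math.floorMod(token.hashCode(), cardinality);
    }

    /** Encode a document into a fixed-size vector of token counts, without a dictionary. */
    static double[] encode(String text, int cardinality) {
        double[] vector = new double[cardinality];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                vector[slot(token, cardinality)] += 1.0; // colliding tokens share a slot
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = encode("the lazy dog and the fox", 16);
        System.out.println(Arrays.toString(v));
    }
}
```

The design trade-off: hashing needs only one pass and no shared dictionary between mappers, at the cost of occasional collisions, where unrelated tokens land in the same slot and their counts merge.
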
Copyright © 2008–2017 The Apache Software Foundation. All rights reserved.