package utils
Type Members
- abstract class AbstractTextClassificationParams extends Serializable
- case class TextClassificationParams(baseDir: String = "./", maxSequenceLength: Int = 500, maxWordsNum: Int = 5000, trainingSplit: Double = 0.8, batchSize: Int = 128, embeddingDim: Int = 200, learningRate: Double = 0.01, partitionNum: Int = 4) extends AbstractTextClassificationParams with Product with Serializable
- baseDir
The root directory containing the training and embedding data
- maxSequenceLength
maximum number of tokens per sequence
- maxWordsNum
maximum number of words to be included
- trainingSplit
fraction of the data used for training (for example, 0.8 for 80%)
- batchSize
size of the mini-batch
- embeddingDim
size of the embedding vector
- learningRate
learning rate
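The parameters above can be exercised with a short sketch. It uses a local stand-in case class that mirrors the documented signature (without the `AbstractTextClassificationParams` parent) so that it runs on its own; in real code you would import the library's class instead. The `ParamsExample` object, the `trainingCount` helper, and the `/tmp/news20` path are hypothetical and exist only for illustration, and the helper assumes `trainingSplit` is a fraction between 0 and 1.

```scala
// Local stand-in mirroring the documented constructor signature (assumption:
// the real class lives in the library's `utils` package and should be imported).
case class TextClassificationParams(
  baseDir: String = "./",
  maxSequenceLength: Int = 500,
  maxWordsNum: Int = 5000,
  trainingSplit: Double = 0.8,
  batchSize: Int = 128,
  embeddingDim: Int = 200,
  learningRate: Double = 0.01,
  partitionNum: Int = 4)

object ParamsExample {
  // Hypothetical helper: how many samples fall into the training set,
  // assuming trainingSplit is a fraction in [0, 1].
  def trainingCount(params: TextClassificationParams, totalSamples: Int): Int =
    math.round(totalSamples * params.trainingSplit).toInt

  // All defaults, as listed in the signature.
  val defaults: TextClassificationParams = TextClassificationParams()

  // Override a few fields by name; the path is a placeholder.
  val custom: TextClassificationParams =
    defaults.copy(baseDir = "/tmp/news20", batchSize = 64, trainingSplit = 0.9)
}
```

Because every field has a default, named arguments or `copy` let a caller change only the settings that differ from the defaults.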
- class TextClassifier extends Serializable
This example uses a pre-trained GloVe embedding to convert words to vectors, and uses them to train a text classification model on the 20 Newsgroups dataset with 20 different categories. This model can achieve around 90% accuracy after 2 epochs of training.
- case class WordMeta(count: Int, index: Int) extends Product with Serializable
- count
frequency of the word.
- index
index of the word when words are ranked by frequency from high to low.
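As an illustration of how `count` and `index` relate, the following sketch builds a word-to-`WordMeta` map from a plain token sequence. It is not the library's implementation: `WordMeta` is redefined locally so the block is self-contained, `WordMetaExample` and `buildWordMeta` are hypothetical names, and the 1-based index and alphabetical tie-breaking are assumptions made here for determinism.

```scala
// Local stand-in mirroring the documented signature.
case class WordMeta(count: Int, index: Int)

object WordMetaExample {
  // Count each word, sort by descending frequency (ties broken alphabetically),
  // keep at most maxWordsNum words, and assign a 1-based rank as the index.
  def buildWordMeta(tokens: Seq[String], maxWordsNum: Int): Map[String, WordMeta] =
    tokens
      .groupBy(identity)
      .toSeq
      .map { case (word, occurrences) => (word, occurrences.size) }
      .sortBy { case (word, count) => (-count, word) }
      .take(maxWordsNum)
      .zipWithIndex
      .map { case ((word, count), rank) => word -> WordMeta(count, rank + 1) }
      .toMap
}
```

For example, `WordMetaExample.buildWordMeta(Seq("the", "cat", "the", "dog", "the", "cat"), 2)` keeps only the two most frequent words, with "the" ranked first and "cat" second.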
Value Members
- object SimpleTokenizer