| Package | Description |
|---|---|
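| com.hankcs.hanlp.mining.word2vec | A Java port of word2vec that stays as close as possible to the original implementation. Most of the code comes from https://github.com/kojisekig/word2vec-lucene , with some additional performance optimizations. |
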
| Modifier and Type | Method and Description |
|---|---|
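| Word2VecTrainer | Word2VecTrainer.setDownSamplingRate(float downSampleRate): Sets the down-sampling rate for high-frequency words. A word whose frequency exceeds this rate is randomly skipped during training. When no stop-word dictionary is used, stop words meet the criterion for high-frequency words. |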
| Word2VecTrainer | Word2VecTrainer.setInitialLearningRate(float initialLearningRate): Sets the initial learning rate. |
| Word2VecTrainer | Word2VecTrainer.setLayerSize(int layerSize): Sets the dimensionality of the word vectors (equal to the size of the hidden layer of the neural network model). |
| Word2VecTrainer | Word2VecTrainer.setMinVocabFrequency(int minFrequency): Sets the minimum word frequency; words that occur fewer times than this value are filtered out. |
| Word2VecTrainer | Word2VecTrainer.setNumIterations(int iterations): Sets the number of training iterations. |
| Word2VecTrainer | Word2VecTrainer.setWindowSize(int windowSize): Sets the window size. |
| Word2VecTrainer | Word2VecTrainer.type(NeuralNetworkType type): Sets the neural network type. |
| Word2VecTrainer | Word2VecTrainer.useHierarchicalSoftmax(): Enables hierarchical softmax. |
| Word2VecTrainer | Word2VecTrainer.useNegativeSamples(int negativeSamples): Sets the number of negative samples, typically between 5 and 10. |
| Word2VecTrainer | Word2VecTrainer.useNumThreads(int numThreads): Sets the number of threads used for parallel training. |
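
The description of setDownSamplingRate says that words above the threshold are randomly skipped. The sketch below shows how the original C word2vec tool turns that threshold into a keep probability. It is an illustration only and does not call HanLP: the class name SubsamplingSketch and its methods are hypothetical, and whether the HanLP port applies exactly this formula is an assumption based on the package note that it follows the original closely.

```java
import java.util.Random;

// Illustrative sketch only; this is not HanLP code.
// It mirrors the sub-sampling rule of the original C word2vec tool.
final class SubsamplingSketch {

    /**
     * Probability of keeping one occurrence of a word.
     *
     * @param freq   relative frequency of the word in the corpus (count / total words), must be > 0
     * @param sample the down-sampling threshold (the value one would pass as downSampleRate)
     */
    static double keepProbability(double freq, double sample) {
        if (sample <= 0) {
            return 1.0; // a non-positive threshold disables down-sampling
        }
        // Same expression as the C tool: (sqrt(f / t) + 1) * t / f
        double p = (Math.sqrt(freq / sample) + 1.0) * (sample / freq);
        return Math.min(1.0, p); // rare words are always kept
    }

    /** Randomly decides whether a single occurrence is kept for training. */
    static boolean keep(double freq, double sample, Random rng) {
        return rng.nextDouble() < keepProbability(freq, sample);
    }

    public static void main(String[] args) {
        double sample = 1e-4;
        double[] frequencies = {0.05, 0.01, 1e-3, 1e-5};
        for (double f : frequencies) {
            System.out.printf("freq=%g keepProbability=%.4f%n", f, keepProbability(f, sample));
        }
    }
}
```

Under this rule, a word that makes up 1% of the corpus is kept about 11% of the time at a threshold of 1e-4, while a word rarer than the threshold is always kept. This is why very frequent stop-like words are thinned out even without a stop-word dictionary.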
Copyright © 2014–2021 码农场. All rights reserved.