public class NewWordDiscover extends Object
| Constructor and Description |
|---|
| NewWordDiscover() |
| NewWordDiscover(int max_word_len, float min_freq, float min_entropy, float min_aggregation, boolean filter)<br>Constructs a new-word discovery tool. |
public NewWordDiscover()
public NewWordDiscover(int max_word_len,
float min_freq,
float min_entropy,
float min_aggregation,
boolean filter)
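Parameters:
max_word_len - maximum word length
min_freq - minimum word frequency
min_entropy - minimum word entropy
min_aggregation - minimum mutual information of a word
filter - whether to filter out words that already exist in HanLP's dictionary

public List<WordInfo> discover(BufferedReader reader, int size) throws IOException

Parameters:
reader - the large text to read
size - number of words to extract
Throws:
IOException

Copyright © 2014–2017 码农场. All rights reserved.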
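The constructor thresholds min_freq, min_entropy and min_aggregation are named above but not defined. The sketch below shows one common way such statistics are computed for a candidate string: occurrence count, entropy of the neighbouring characters, and a mutual-information style cohesion ratio. It is an illustration under assumptions, not HanLP's implementation: the class NewWordStatsSketch and its methods are hypothetical, only the right-hand neighbours are considered, and the exact scale HanLP expects for each threshold (for example raw count versus relative frequency) is not stated in this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HanLP.
final class NewWordStatsSketch {

    /** Number of (possibly overlapping) occurrences of word in text. */
    static int frequency(String text, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        int count = 0;
        for (int i = text.indexOf(word); i >= 0; i = text.indexOf(word, i + 1)) {
            count++;
        }
        return count;
    }

    /**
     * Shannon entropy (natural log) of the characters that directly follow
     * each occurrence of word. Higher values mean the candidate appears in
     * more varied contexts, which suggests it is a free-standing word.
     */
    static double rightEntropy(String text, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        Map<Character, Integer> neighbours = new HashMap<>();
        int total = 0;
        for (int i = text.indexOf(word); i >= 0; i = text.indexOf(word, i + 1)) {
            int next = i + word.length();
            if (next < text.length()) {
                neighbours.merge(text.charAt(next), 1, Integer::sum);
                total++;
            }
        }
        double entropy = 0.0;
        for (int n : neighbours.values()) {
            double p = (double) n / total;
            entropy -= p * Math.log(p);
        }
        return entropy;
    }

    /**
     * Cohesion of word: the minimum, over all two-way splits, of
     * P(word) / (P(left) * P(right)), with probabilities estimated as
     * occurrence count divided by text length. Returns 0 if word is absent.
     */
    static double aggregation(String text, String word) {
        if (word.length() < 2) {
            throw new IllegalArgumentException("word needs at least two characters");
        }
        int wordCount = frequency(text, word);
        if (wordCount == 0) {
            return 0.0;
        }
        double n = text.length();
        double pWord = wordCount / n;
        double min = Double.MAX_VALUE;
        for (int split = 1; split < word.length(); split++) {
            double pLeft = frequency(text, word.substring(0, split)) / n;
            double pRight = frequency(text, word.substring(split)) / n;
            min = Math.min(min, pWord / (pLeft * pRight));
        }
        return min;
    }

    public static void main(String[] args) {
        String text = "abxabyabz";
        System.out.println("frequency   = " + frequency(text, "ab"));
        System.out.println("entropy     = " + rightEntropy(text, "ab"));
        System.out.println("aggregation = " + aggregation(text, "ab"));
    }
}
```

A candidate would pass a filter of this kind only if all three values meet their respective minimums; raising any threshold yields fewer but more reliable candidates.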