public class NewWordDiscover extends Object
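The constructor listed below takes thresholds named min_freq and min_entropy. As a rough illustration of what a frequency count and a neighbor-entropy score can look like, here is a minimal, self-contained sketch. It is not HanLP's implementation: the class `NeighborEntropySketch`, its method names, the overlapping-match counting, and the use of the natural logarithm are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of HanLP.
public class NeighborEntropySketch {

    /** Counts occurrences of candidate in text (overlapping matches included). */
    public static int frequency(String text, String candidate) {
        requireNonEmpty(candidate);
        int count = 0;
        for (int i = text.indexOf(candidate); i >= 0; i = text.indexOf(candidate, i + 1)) {
            count++;
        }
        return count;
    }

    /**
     * Shannon entropy (natural log) of the characters that immediately follow
     * each occurrence of candidate. A higher value means more varied right
     * neighbors, which suggests the candidate stands on its own as a word.
     */
    public static double rightEntropy(String text, String candidate) {
        requireNonEmpty(candidate);
        Map<Character, Integer> next = new HashMap<>();
        int total = 0;
        for (int i = text.indexOf(candidate); i >= 0; i = text.indexOf(candidate, i + 1)) {
            int after = i + candidate.length();
            if (after < text.length()) {
                // Record the character directly to the right of this occurrence.
                next.merge(text.charAt(after), 1, Integer::sum);
                total++;
            }
        }
        double entropy = 0.0;
        for (int c : next.values()) {
            double p = (double) c / total;
            entropy -= p * Math.log(p);
        }
        return entropy;
    }

    private static void requireNonEmpty(String candidate) {
        if (candidate.isEmpty()) {
            throw new IllegalArgumentException("candidate must not be empty");
        }
    }

    public static void main(String[] args) {
        String text = "abcabdabe";
        System.out.println(frequency(text, "ab"));
        System.out.println(rightEntropy(text, "ab"));
    }
}
```

In this sketch, "ab" occurs three times in "abcabdabe" and is followed by three different characters, so its right entropy is ln 3. In "abcabcabc" it is always followed by "c", so the entropy is 0. A threshold in the spirit of min_entropy would keep the first candidate and reject the second.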
| Constructor and Description |
|---|
| NewWordDiscover() |
| NewWordDiscover(int max_word_len, float min_freq, float min_entropy, float min_aggregation, boolean filter): constructs a new-word discovery tool |
| Modifier and Type | Method and Description |
|---|---|
| List<WordInfo> | discover(BufferedReader reader, int size): extracts words |
| List<WordInfo> | discover(String doc, int size): extracts words |
public NewWordDiscover()
public NewWordDiscover(int max_word_len,
float min_freq,
float min_entropy,
float min_aggregation,
boolean filter)
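Parameters:
max_word_len - maximum word length
min_freq - minimum word frequency
min_entropy - minimum word entropy
min_aggregation - minimum word mutual information
filter - whether to filter out words that already exist in HanLP's dictionary

public List<WordInfo> discover(BufferedReader reader, int size) throws IOException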
Parameters:
reader - the large text to read
size - number of words to extract
Throws:
IOException

Copyright © 2014–2021 码农场. All rights reserved.