public class TextProcessUtility extends Object
| Constructor and Description |
|---|
| TextProcessUtility() |
| Modifier and Type | Method and Description |
|---|---|
| static String[] | extractKeywords(String text)<br>Extracts keywords. In real-world applications, phrases should also be taken into account. |
| static Map<String,Integer> | getKeywordCounts(String[] keywordArray)<br>Counts the frequency of each word. |
| static Map<String,String[]> | loadCorpus(String path)<br>Loads all corpus files under a folder. |
| static Map<String,String[]> | loadCorpusWithException(String corpusPath) |
| static Map<String,String[]> | loadCorpusWithException(String folderPath, String charsetName)<br>Loads all corpus files under a folder. |
| static String | preprocess(String text)<br>Preprocesses the text, removing punctuation, whitespace, and stop words. |
| static String | readTxt(File file, String charsetName) |
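This page lists only signatures and one-line descriptions, so the following is a minimal, hypothetical sketch in plain Java of how the documented steps could fit together: preprocess, then extractKeywords, then getKeywordCounts, plus a folder-based corpus loader. It is not HanLP's implementation. Assumptions: the class name KeywordPipelineSketch and the STOP_WORDS set are invented for illustration; whitespace splitting stands in for a real word segmenter; and the corpus layout (one subfolder per category, one document per regular file) is assumed, not documented here.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for TextProcessUtility; not HanLP's implementation. */
public class KeywordPipelineSketch {
    // Assumed stop-word list; the real class's stop-word dictionary is not shown on this page.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "is"));

    /** Replaces punctuation with spaces, collapses whitespace, lower-cases, and drops stop words. */
    public static String preprocess(String text) {
        String noPunct = text.replaceAll("\\p{P}", " ");
        return Arrays.stream(noPunct.trim().split("\\s+"))
                .map(w -> w.toLowerCase(Locale.ROOT))
                .filter(w -> !w.isEmpty() && !STOP_WORDS.contains(w))
                .collect(Collectors.joining(" "));
    }

    /** Splits the preprocessed text on spaces (a real version would use a word segmenter). */
    public static String[] extractKeywords(String text) {
        String cleaned = preprocess(text);
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    /** Counts how often each keyword occurs. */
    public static Map<String, Integer> getKeywordCounts(String[] keywordArray) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : keywordArray) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed layout: each subfolder is a category; each regular file in it is one document. */
    public static Map<String, String[]> loadCorpusWithException(String folderPath, String charsetName)
            throws IOException {
        Charset charset = Charset.forName(charsetName);
        Map<String, String[]> corpus = new TreeMap<>();
        try (DirectoryStream<Path> categories = Files.newDirectoryStream(Paths.get(folderPath))) {
            for (Path category : categories) {
                if (!Files.isDirectory(category)) continue;
                List<Path> docs;
                try (Stream<Path> files = Files.list(category)) {
                    docs = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
                }
                String[] texts = new String[docs.size()];
                for (int i = 0; i < texts.length; i++) {
                    texts[i] = new String(Files.readAllBytes(docs.get(i)), charset);
                }
                corpus.put(category.getFileName().toString(), texts);
            }
        }
        return corpus;
    }

    public static void main(String[] args) {
        String[] keywords = extractKeywords("The cat, the dog, and the cat!");
        System.out.println(Arrays.toString(keywords));             // [cat, dog, cat]
        System.out.println(getKeywordCounts(keywords).get("cat")); // 2
    }
}
```

Because the sketch tokenizes on whitespace, it only suits space-delimited text; Chinese input would first need a word segmenter.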
public static String[] extractKeywords(String text)
Parameters: text

public static Map<String,Integer> getKeywordCounts(String[] keywordArray)
Parameters: keywordArray

public static Map<String,String[]> loadCorpusWithException(String folderPath, String charsetName) throws IOException
Parameters: folderPath
Throws: IOException

public static String readTxt(File file, String charsetName) throws IOException
Throws: IOException

public static Map<String,String[]> loadCorpusWithException(String corpusPath) throws IOException
Throws: IOException

Copyright © 2014–2018 码农场. All rights reserved.