public interface IDataSet extends Iterable<Document>
| Modifier and Type | Method and Description |
|---|---|
| IDataSet | add(Map<String,String[]> testingDataSet) |
| Document | add(String category, String text): adds a document to the training set |
| void | clear(): clears the data set |
| Document | convert(String category, String text): converts a plain-text document into the internal general-purpose document, using this data set's lexicon and catalog |
| Catalog | getCatalog(): returns the catalog (category table) |
| Lexicon | getLexicon(): returns the lexicon (vocabulary) |
| ITokenizer | getTokenizer(): returns the tokenizer |
| boolean | isTestingDataSet(): whether this is a testing data set |
| IDataSet | load(String folderPath): loads the data set |
| IDataSet | load(String folderPath, double rate) |
| IDataSet | load(String folderPath, String charsetName): loads the data set |
| IDataSet | load(String folderPath, String charsetName, double percentage) |
| IDataSet | setTokenizer(ITokenizer tokenizer): sets the tokenizer |
| IDataSet | shrink(int[] idMap) |
| int | size(): number of samples in the data set |

Methods inherited from interface java.lang.Iterable: forEach, iterator, spliterator

IDataSet load(String folderPath) throws IllegalArgumentException, IOException
Parameters: folderPath - root directory of the classification corpus; the directory must follow a prescribed structure
Throws: IllegalArgumentException, IOException

IDataSet load(String folderPath, double rate) throws IllegalArgumentException, IOException

IDataSet load(String folderPath, String charsetName) throws IllegalArgumentException, IOException
Parameters: folderPath - root directory of the classification corpus; the directory must follow a prescribed structure
charsetName - file encoding
Throws: IllegalArgumentException, IOException

IDataSet load(String folderPath, String charsetName, double percentage) throws IllegalArgumentException, IOException
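
The load overloads can be pictured with a minimal, self-contained sketch. It is not the library's implementation. It assumes, since the layout is not spelled out here, that every subdirectory of the root is one category and every regular file inside it is one document, and that percentage is a fraction in (0, 1] of each category's files to keep. The class name FolderCorpusSketch and the Map-based return type are hypothetical stand-ins for IDataSet.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: NOT the real IDataSet implementation.
public class FolderCorpusSketch {

    // Assumed layout: root/<category>/<document file>.
    // Returns category name -> document texts, keeping the first
    // `percentage` share of each category's files (sorted by path).
    static Map<String, List<String>> load(String folderPath, String charsetName,
                                          double percentage) throws IOException {
        if (folderPath == null) {
            throw new IllegalArgumentException("folderPath must not be null");
        }
        if (percentage <= 0 || percentage > 1) {
            throw new IllegalArgumentException("percentage must be in (0, 1]");
        }
        Path root = Paths.get(folderPath);
        if (!Files.isDirectory(root)) {
            throw new IllegalArgumentException(folderPath + " is not a directory");
        }
        Charset charset = Charset.forName(charsetName);
        Map<String, List<String>> corpus = new TreeMap<>();
        // Only subdirectories count as categories; stray files in the root are skipped.
        try (DirectoryStream<Path> categories = Files.newDirectoryStream(root, Files::isDirectory)) {
            for (Path categoryDir : categories) {
                List<Path> files;
                try (Stream<Path> entries = Files.list(categoryDir)) {
                    files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
                }
                int keep = (int) Math.ceil(files.size() * percentage);
                List<String> texts = new ArrayList<>();
                for (Path file : files.subList(0, keep)) {
                    texts.add(new String(Files.readAllBytes(file), charset));
                }
                corpus.put(categoryDir.getFileName().toString(), texts);
            }
        }
        return corpus;
    }
}
```

Under these assumptions, the shorter overloads would simply delegate to this one with a default charset and a percentage of 1.0.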

Document convert(String category, String text)
Parameters: category, text

IDataSet setTokenizer(ITokenizer tokenizer)
Parameters: tokenizer

int size()

ITokenizer getTokenizer()

Catalog getCatalog()

Lexicon getLexicon()

void clear()

boolean isTestingDataSet()

IDataSet shrink(int[] idMap)
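
To make the relationship between add, convert, the lexicon and the catalog concrete, here is a small in-memory sketch. It is an illustration under stated assumptions, not the library's code: DataSetSketch, IdTable and Doc are hypothetical stand-ins for IDataSet, Lexicon/Catalog and Document; tokenization is plain whitespace splitting; add is assumed to grow both tables while convert only looks words up and drops unknown ones; clear is assumed to discard documents but keep the tables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: NOT the real IDataSet, Lexicon, Catalog or Document.
public class DataSetSketch {

    // Stand-in for Lexicon and Catalog: maps a string to a dense integer id.
    static final class IdTable {
        private final Map<String, Integer> ids = new HashMap<>();

        int addOrGet(String key) {
            Integer id = ids.get(key);
            if (id == null) {
                id = ids.size();
                ids.put(key, id);
            }
            return id;
        }

        int get(String key) {
            return ids.getOrDefault(key, -1); // -1 means "unknown"
        }
    }

    // Stand-in for Document: a category id plus word id -> frequency.
    static final class Doc {
        final int category;
        final Map<Integer, Integer> termFrequency = new TreeMap<>();

        Doc(int category) {
            this.category = category;
        }
    }

    final IdTable lexicon = new IdTable();
    final IdTable catalog = new IdTable();
    final List<Doc> documents = new ArrayList<>();

    // Assumed behaviour: adding a training document extends both tables.
    Doc add(String category, String text) {
        Doc doc = new Doc(catalog.addOrGet(category));
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            doc.termFrequency.merge(lexicon.addOrGet(token), 1, Integer::sum);
        }
        documents.add(doc);
        return doc;
    }

    // Assumed behaviour: conversion reuses the existing tables and skips unknown words.
    Doc convert(String category, String text) {
        Doc doc = new Doc(catalog.get(category));
        for (String token : text.trim().split("\\s+")) {
            int id = lexicon.get(token);
            if (id >= 0) doc.termFrequency.merge(id, 1, Integer::sum);
        }
        return doc;
    }

    int size() {
        return documents.size();
    }

    void clear() {
        documents.clear(); // tables are kept (an assumption of this sketch)
    }
}
```

The design point the sketch tries to capture is that convert depends on the vocabulary and categories already collected by the data set, which is why it is an instance method rather than a static utility.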
Copyright © 2014–2017 码农场. All rights reserved.