public abstract class AbstractSegmentation extends Object implements DictionaryBasedSegmentation
| Modifier and Type | Field and Description |
|---|---|
| protected org.slf4j.Logger | LOGGER |
| Constructor and Description |
|---|
| AbstractSegmentation() |
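| Modifier and Type | Method and Description |
|---|---|
| protected void | addWord(List<Word> result, String text, int start, int len): Adds the recognized word to the queue. |
| protected void | addWord(Stack<Word> result, String text, int start, int len): Pushes the recognized word onto the stack. |
| Dictionary | getDictionary(): Returns the dictionary operation interface. |
| int | getInterceptLength(): Maximum length of the string intercepted during segmentation. |
| protected Word | getWord(String text, int start, int len): Returns a word that has already been recognized. |
| boolean | isParallelSeg() |
| static void | main(String[] args) |
| Map<List<Word>,Float> | ngram(List<Word>... sentences): Scores segmentation results using ngram. |
| boolean | ngramEnabled(): Whether ngram is enabled. |
| List<Word> | seg(String text): Default segmentation algorithm: (1) split the text to be segmented at punctuation marks; (2) segment each resulting fragment; (3) combine the segmentation results. |
| abstract List<Word> | segImpl(String text): The concrete segmentation implementation, left to subclasses. |
| void | setDictionary(Dictionary dictionary): Sets the dictionary operation interface for the dictionary-based Chinese word segmentation interface. |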
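
The three steps listed for seg(String text) follow a template-method shape: the base class handles splitting and combining, while segImpl(String text) supplies the algorithm. The sketch below illustrates that shape only. It does not use the library: `SegFlowSketch`, `SketchSegmentation`, and `perCharacter()` are hypothetical stand-ins, plain strings replace `Word`, the punctuation rule is an assumed simplification, and punctuation is simply dropped here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from the library.
class SegFlowSketch {

    // Stand-in for an abstract segmentation base class.
    public abstract static class SketchSegmentation {
        // Subclasses provide the algorithm for one punctuation-free fragment.
        public abstract List<String> segImpl(String text);

        // Template method: split at punctuation, segment each fragment, combine.
        public List<String> seg(String text) {
            List<String> result = new ArrayList<>();
            // Step 1: split at Unicode punctuation and whitespace (assumed rule).
            for (String fragment : text.split("[\\p{P}\\s]+")) {
                if (fragment.isEmpty()) {
                    continue; // a leading delimiter yields an empty first fragment
                }
                // Steps 2 and 3: segment the fragment and append to the combined result.
                result.addAll(segImpl(fragment));
            }
            return result;
        }
    }

    // Toy subclass: every character becomes its own word.
    public static SketchSegmentation perCharacter() {
        return new SketchSegmentation() {
            @Override
            public List<String> segImpl(String text) {
                List<String> words = new ArrayList<>();
                for (int i = 0; i < text.length(); i++) {
                    words.add(text.substring(i, i + 1));
                }
                return words;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(perCharacter().seg("你好,世界!")); // prints [你, 好, 世, 界]
    }
}
```

A real subclass would replace the per-character rule with a dictionary lookup and leave the splitting and combining to the base class.
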
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Inherited interface method: getSegmentationAlgorithm

public boolean isParallelSeg()

public void setDictionary(Dictionary dictionary)
Specified by: setDictionary in interface DictionaryBasedSegmentation
Parameters: dictionary - the dictionary operation interface

public Dictionary getDictionary()
Specified by: getDictionary in interface DictionaryBasedSegmentation

public abstract List<Word> segImpl(String text)
Parameters: text - the text

public boolean ngramEnabled()

public Map<List<Word>,Float> ngram(List<Word>... sentences)
Parameters: sentences - multiple segmentation results

public int getInterceptLength()
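
The page says only that getInterceptLength() returns the maximum length of the string intercepted during segmentation. One common way such a limit is used is as the window size in forward maximum matching, and the sketch below assumes that reading. `InterceptLengthSketch`, `forwardMaxMatch`, and the sample dictionary are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a maximum window length in forward maximum matching.
class InterceptLengthSketch {

    public static List<String> forwardMaxMatch(String text, Set<String> dictionary, int interceptLength) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // The window never exceeds interceptLength or the remaining text.
            int len = Math.min(interceptLength, text.length() - start);
            // Shrink the window until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(text.substring(start, start + len))) {
                len--;
            }
            // Words are found left to right, so appending keeps them in order.
            result.add(text.substring(start, start + len));
            start += len;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("中文", "分词", "中文分词");
        System.out.println(forwardMaxMatch("中文分词", dictionary, 4)); // prints [中文分词]
        System.out.println(forwardMaxMatch("中文分词", dictionary, 2)); // prints [中文, 分词]
    }
}
```

A smaller limit means fewer dictionary lookups per position, but words longer than the limit can never be matched, as the second call shows.
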
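public List<Word> seg(String text)
Specified by: seg in interface Segmentation
Parameters: text - the text

protected void addWord(List<Word> result, String text, int start, int len)
Parameters: result - the queue; text - the text; start - start index of the word; len - length of the word

protected void addWord(Stack<Word> result, String text, int start, int len)
Parameters: result - the stack; text - the text; start - start index of the word; len - length of the word

protected Word getWord(String text, int start, int len)
Parameters: text - the text; start - start index of the word; len - length of the word

public static void main(String[] args)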
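
The two addWord overloads differ only in the container: a List used as a queue, or a Stack. A plausible reason for the Stack variant, assumed here and not stated on this page, is reverse matching: scanning from the end of the text finds words in reverse order, so pushing them onto a stack and popping afterwards restores left-to-right order. `ReverseMatchSketch`, `reverseMaxMatch`, and the sample dictionary are hypothetical, and plain strings stand in for `Word`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.Stack;

// Hypothetical illustration of why a stack suits right-to-left matching.
class ReverseMatchSketch {

    public static List<String> reverseMaxMatch(String text, Set<String> dictionary, int maxLen) {
        Stack<String> stack = new Stack<>();
        int end = text.length();
        while (end > 0) {
            // Take the longest window that ends at 'end'.
            int len = Math.min(maxLen, end);
            // Shrink from the left until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(text.substring(end - len, end))) {
                len--;
            }
            stack.push(text.substring(end - len, end)); // last word is pushed first
            end -= len;
        }
        // Popping yields the words in their original left-to-right order.
        List<String> result = new ArrayList<>();
        while (!stack.isEmpty()) {
            result.add(stack.pop());
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(reverseMaxMatch("研究生命起源", dictionary, 3)); // prints [研究, 生命, 起源]
    }
}
```
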
Copyright © 2014–2015 APDPlat. All rights reserved.