public class AbstractLexicalAnalyzer extends CharacterBasedSegment implements LexicalAnalyzer
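This class extends CharacterBasedSegment. The roughSegSentence and segSentence entries on this page describe how work is split between them: pure segmentation models implement roughSegSentence, while joint segmentation and POS-tagging models override segSentence directly. The sketch below illustrates that template-method split. It uses hypothetical classes (CharSegmentSketch, BigramChunker, JointToyModel) and returns List<String> instead of HanLP's List<Term>. It is not HanLP's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CharacterBasedSegment contract described on this
// page; the real HanLP methods return List<Term>, simplified to String here.
abstract class CharSegmentSketch {
    /** Template method: fixes the overall flow and delegates the model step. */
    protected List<String> segSentence(char[] sentence) {
        if (sentence.length == 0) return new ArrayList<>();
        // A real base class could post-process the rough result here.
        return roughSegSentence(sentence);
    }

    /** Pure segmentation models implement only this step: words, no tags. */
    protected abstract List<String> roughSegSentence(char[] sentence);

    public static void main(String[] args) {
        System.out.println(new BigramChunker().segSentence("商品和服务".toCharArray())); // [商品, 和服, 务]
        System.out.println(new JointToyModel().segSentence("你好".toCharArray()));       // [你/x, 好/x]
    }
}

/** Toy "model": cuts the text into two-character chunks. */
class BigramChunker extends CharSegmentSketch {
    @Override
    protected List<String> roughSegSentence(char[] sentence) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < sentence.length; i += 2) {
            words.add(new String(sentence, i, Math.min(2, sentence.length - i)));
        }
        return words;
    }
}

/** Toy joint segmentation + tagging model: overrides segSentence directly. */
class JointToyModel extends CharSegmentSketch {
    @Override
    protected List<String> segSentence(char[] sentence) {
        List<String> out = new ArrayList<>();
        for (char c : sentence) out.add(c + "/x"); // each char becomes a word tagged "x"
        return out;
    }

    @Override
    protected List<String> roughSegSentence(char[] sentence) {
        throw new UnsupportedOperationException("a joint model bypasses this step");
    }
}
```

The toy chunker is deliberately naive. The point is only that the base class owns the flow and a subclass swaps in the model step.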
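| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `enableRuleBasedSegment`<br>Whether to run rule-based segmentation (rule-based preprocessing of English text, digits, punctuation, etc.). Rules are always ugly; off by default. |
| `protected NERecognizer` | `neRecognizer` |
| `protected POSTagger` | `posTagger` |
| `protected Segmenter` | `segmenter` |
| `protected static byte[]` | `typeTable`<br>Character type table |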
Fields inherited from class Segment: `config`, `customDictionary`

| Modifier | Constructor and Description |
|---|---|
| `protected` | `AbstractLexicalAnalyzer()` |
| `public` | `AbstractLexicalAnalyzer(Segmenter segmenter)` |
| `public` | `AbstractLexicalAnalyzer(Segmenter segmenter, POSTagger posTagger)` |
| `public` | `AbstractLexicalAnalyzer(Segmenter segmenter, POSTagger posTagger, NERecognizer neRecognizer)` |
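The constructors let a caller supply only a segmenter, a segmenter plus a POS tagger, or all three components. Below is a minimal sketch of how such optional stages might compose. It assumes, without confirmation from this page, that a missing component is stored as null and its stage is skipped during analysis. SegmenterSketch, TaggerSketch, RecognizerSketch, PipelineSketch and the toy components are hypothetical. They only mirror the roles of Segmenter, POSTagger and NERecognizer, not their real methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical single-method interfaces mirroring the ROLES of HanLP's
// Segmenter, POSTagger and NERecognizer; the real interfaces differ.
interface SegmenterSketch { List<String> segment(String text); }
interface TaggerSketch { String[] tag(List<String> words); }
interface RecognizerSketch { String[] recognize(String[] words, String[] tags); }

class PipelineSketch {
    private final SegmenterSketch segmenter;
    private final TaggerSketch tagger;          // null = no POS tagging
    private final RecognizerSketch recognizer;  // null = no NER

    // Shorter constructors delegate with null for the missing components (assumption).
    PipelineSketch(SegmenterSketch segmenter) { this(segmenter, null, null); }

    PipelineSketch(SegmenterSketch segmenter, TaggerSketch tagger) { this(segmenter, tagger, null); }

    PipelineSketch(SegmenterSketch segmenter, TaggerSketch tagger, RecognizerSketch recognizer) {
        this.segmenter = segmenter;
        this.tagger = tagger;
        this.recognizer = recognizer;
    }

    /** Runs segment, then tag, then recognize, skipping stages whose component is null. */
    List<String> analyze(String text) {
        List<String> words = segmenter.segment(text);
        String[] tags = (tagger == null) ? null : tagger.tag(words);
        // NER here consumes POS tags, so it only runs when tagging ran too.
        String[] ner = (recognizer == null || tags == null)
                ? null
                : recognizer.recognize(words.toArray(new String[0]), tags);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            StringBuilder sb = new StringBuilder(words.get(i));
            if (tags != null) sb.append('/').append(tags[i]);
            if (ner != null) sb.append('/').append(ner[i]);
            out.add(sb.toString());
        }
        return out;
    }

    // ---- Toy components for the demo; none of them is a real model. ----
    static List<String> bySpaces(String text) { return Arrays.asList(text.split(" ")); }

    static String[] lookupTags(List<String> words) {
        String[] tags = new String[words.size()];
        for (int i = 0; i < tags.length; i++) {
            switch (words.get(i)) {
                case "我": tags[i] = "r"; break;
                case "爱": tags[i] = "v"; break;
                case "北京": tags[i] = "ns"; break;
                default: tags[i] = "n";
            }
        }
        return tags;
    }

    static String[] placeNames(String[] words, String[] tags) {
        String[] ner = new String[words.length];
        for (int i = 0; i < ner.length; i++) ner[i] = "ns".equals(tags[i]) ? "S-ns" : "O";
        return ner;
    }

    public static void main(String[] args) {
        System.out.println(new PipelineSketch(PipelineSketch::bySpaces).analyze("我 爱 北京"));
        System.out.println(new PipelineSketch(PipelineSketch::bySpaces, PipelineSketch::lookupTags,
                PipelineSketch::placeNames).analyze("我 爱 北京"));
    }
}
```

Running main prints `[我, 爱, 北京]` for the segmenter-only pipeline and `[我/r/O, 爱/v/O, 北京/ns/S-ns]` for the full one. The `S-ns`/`O` labels are illustrative. The tag set HanLP actually uses is whatever getNERTagSet() returns.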
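| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `acceptCustomWord(int begin, int end, CoreDictionary.Attribute value)`<br>Deprecated. Deprecated since 1.6.7: in forcing mode the longest match is used; otherwise words are merged according to the segmentation result. |
| `Sentence` | `analyze(String sentence)`<br>Performs lexical analysis on a sentence. |
| `protected List<CoreDictionary.Attribute>` | `combineWithCustomDictionary(List<String> vertexList)`<br>Merges the coarse segmentation result using the user dictionary. |
| `AbstractLexicalAnalyzer` | `enableRuleBasedSegment(boolean enableRuleBasedSegment)`<br>Whether to run rule-based segmentation (rule-based preprocessing of English text, digits, punctuation, etc.). Rules are always ugly; off by default. |
| `NERTagSet` | `getNERTagSet()` |
| `String[]` | `recognize(String[] wordArray, String[] posArray)`<br>Named entity recognition. |
| `protected List<Term>` | `roughSegSentence(char[] sentence)`<br>Pure segmentation models implement this method; it outputs words only. |
| `List<String>` | `segment(String sentence)`<br>Chinese word segmentation. |
| `List<String>` | `segment(String sentence, String normalized)`<br>This method queries the user dictionary. |
| `void` | `segment(String sentence, String normalized, List<String> wordList)` |
| `protected void` | `segment(String sentence, String normalized, List<String> wordList, List<CoreDictionary.Attribute> attributeList)`<br>Segments text into words. |
| `protected void` | `segmentAfterRule(String sentence, String normalized, List<String> wordList)`<br>The ugly rule system. |
| `protected List<Term>` | `segSentence(char[] sentence)`<br>Used by pure segmentation models; joint segmentation and POS-tagging models override segSentence directly. |
| `String[]` | `tag(List<String> wordList)`<br>Part-of-speech tagging. |
| `String[]` | `tag(String... words)`<br>Part-of-speech tagging. |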
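The deprecation note on acceptCustomWord distinguishes two ways of applying the user dictionary. Forcing mode uses the longest match. Otherwise, words are merged along the model's segmentation. The hypothetical CustomDictMergeSketch below illustrates that difference on plain strings. HanLP's own combineWithCustomDictionary works on its internal dictionary structures and returns CoreDictionary.Attribute values, and this page does not document its exact algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CustomDictMergeSketch {
    /**
     * Non-forcing mode (sketch): merge runs of whole words produced by the model,
     * preferring the longest run whose concatenation is a dictionary word.
     * Word boundaries chosen by the model are never split.
     */
    static List<String> mergeByWords(List<String> words, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int bestEnd = -1; // exclusive end of the longest multi-word match
            StringBuilder run = new StringBuilder(words.get(i));
            for (int j = i + 1; j < words.size(); j++) {
                run.append(words.get(j));
                if (dict.contains(run.toString())) bestEnd = j + 1;
            }
            int end = bestEnd > 0 ? bestEnd : i + 1;
            out.add(String.join("", words.subList(i, end)));
            i = end;
        }
        return out;
    }

    /**
     * Forcing mode (sketch): forward longest match on the raw characters, so a
     * dictionary word wins even if it crosses a model word boundary. Other
     * stretches keep the model's boundaries, except that a model word is cut
     * where a dictionary word starts inside it.
     */
    static List<String> forceLongestMatch(String text, List<String> modelWords, Set<String> dict) {
        if (!String.join("", modelWords).equals(text)) {
            throw new IllegalArgumentException("model words must concatenate to the text");
        }
        boolean[] boundary = new boolean[text.length() + 1];
        boundary[0] = true;
        int pos = 0;
        for (String w : modelWords) {
            pos += w.length();
            boundary[pos] = true;
        }
        int maxLen = dict.stream().mapToInt(String::length).max().orElse(0);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = longestMatchAt(text, i, dict, maxLen);
            int end;
            if (len > 0) {
                end = i + len; // the dictionary word overrides the model
            } else {
                end = i + 1;
                // Extend to the next model boundary, stopping where a dictionary word begins.
                while (!boundary[end] && longestMatchAt(text, end, dict, maxLen) == 0) end++;
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    /** Length of the longest dictionary word (at least 2 chars) starting at i, or 0. */
    private static int longestMatchAt(String text, int i, Set<String> dict, int maxLen) {
        for (int len = Math.min(maxLen, text.length() - i); len >= 2; len--) {
            if (dict.contains(text.substring(i, i + len))) return len;
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> model = Arrays.asList("商品", "和服", "务");
        Set<String> dict = new HashSet<>(Arrays.asList("服务"));
        System.out.println(mergeByWords(model, dict));                  // [商品, 和服, 务]
        System.out.println(forceLongestMatch("商品和服务", model, dict)); // [商品, 和, 服务]
    }
}
```

With the model output `[商品, 和服, 务]` and the dictionary `{服务}`, mergeByWords leaves the words unchanged because 服务 crosses a model boundary. forceLongestMatch, by contrast, yields `[商品, 和, 服务]`.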
Methods inherited from class CharacterBasedSegment: guessAttribute, toVertexList

Methods inherited from class Segment: atomSegment, combineByCustomDictionary, combineByCustomDictionary, combineByCustomDictionary, combineByCustomDictionary, convert, enableAllNamedEntityRecognize, enableCustomDictionary, enableCustomDictionary, enableCustomDictionaryForcing, enableIndexMode, enableIndexMode, enableJapaneseNameRecognize, enableMultithreading, enableMultithreading, enableNameRecognize, enableNumberQuantifierRecognize, enableOffset, enableOrganizationRecognize, enablePartOfSpeechTagging, enablePlaceRecognize, enableTranslatedNameRecognize, mergeNumberQuantifier, quickAtomSegment, seg, seg, seg2sentence, seg2sentence, simpleAtomSegment

protected Segmenter segmenter
protected POSTagger posTagger
protected NERecognizer neRecognizer
protected static byte[] typeTable
protected boolean enableRuleBasedSegment
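The typeTable and enableRuleBasedSegment fields suggest a per-character type lookup that feeds an optional rule pass over English text, digits and punctuation. The sketch below shows one hypothetical way to do that. It uses a 65,536-entry byte table indexed by `char`, with made-up type codes, and a rule pass that keeps letter and digit runs whole, isolates punctuation, and hands everything else to a stand-in model function. HanLP's real type codes and rules are not shown on this page and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class RuleSegmentSketch {
    // Hypothetical type codes; HanLP's real typeTable uses its own values.
    static final byte OTHER = 0, CHINESE = 1, LETTER = 2, DIGIT = 3, PUNCT = 4;

    // One byte per UTF-16 code unit: 65,536 entries cover the whole BMP.
    static final byte[] TYPE_TABLE = new byte[Character.MAX_VALUE + 1];

    static {
        for (int c = 0; c <= Character.MAX_VALUE; c++) TYPE_TABLE[c] = classify(c);
    }

    private static byte classify(int c) {
        if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) return CHINESE;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return LETTER;
        if (c >= '0' && c <= '9') return DIGIT;
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return PUNCT;
            default:
                return OTHER;
        }
    }

    private static boolean isRuleType(byte t) { return t == LETTER || t == DIGIT || t == PUNCT; }

    /** Rule pass: letter/digit runs stay whole, punctuation stands alone, the rest goes to the model. */
    static List<String> segment(String text, Function<String, List<String>> model) {
        List<String> out = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            byte t = TYPE_TABLE[text.charAt(i)];
            int j = i + 1;
            if (t == LETTER || t == DIGIT) {
                while (j < n && TYPE_TABLE[text.charAt(j)] == t) j++;
                out.add(text.substring(i, j));
            } else if (t == PUNCT) {
                out.add(text.substring(i, j));
            } else {
                while (j < n && !isRuleType(TYPE_TABLE[text.charAt(j)])) j++;
                out.addAll(model.apply(text.substring(i, j)));
            }
            i = j;
        }
        return out;
    }

    /** Toy stand-in for a statistical segmenter: one word per character. */
    static List<String> perChar(String s) {
        List<String> words = new ArrayList<>();
        for (char c : s.toCharArray()) words.add(String.valueOf(c));
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("HanLP发布于2014年。", RuleSegmentSketch::perChar));
    }
}
```

For example, `segment("HanLP发布于2014年。", RuleSegmentSketch::perChar)` returns `[HanLP, 发, 布, 于, 2014, 年, 。]`, with the per-character function standing in for a real segmenter. Building the table once trades 64 KiB of memory for an O(1) type lookup per character.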
protected AbstractLexicalAnalyzer()
public AbstractLexicalAnalyzer(Segmenter segmenter)
public AbstractLexicalAnalyzer(Segmenter segmenter, POSTagger posTagger, NERecognizer neRecognizer)
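protected void segment(String sentence, String normalized, List<String> wordList, List<CoreDictionary.Attribute> attributeList)
Segments text into words.
Parameters: sentence - the text; normalized - the normalized text; wordList - receives the segmented words; attributeList - receives the parts of speech found in the user dictionary; pass null to skip the user-dictionary lookup.

public String[] recognize(String[] wordArray, String[] posArray)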
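Description copied from interface: NERecognizer
Named entity recognition.
Specified by: recognize in interface NERecognizer
Parameters: wordArray - the words; posArray - the parts of speech

public NERTagSet getNERTagSet()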
Specified by: getNERTagSet in interface NERecognizer

public Sentence analyze(String sentence)
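Description copied from interface: LexicalAnalyzer
Performs lexical analysis on a sentence.
Specified by: analyze in interface LexicalAnalyzer
Parameters: sentence - a plain-text sentence

public List<String> segment(String sentence, String normalized)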
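This method queries the user dictionary.
Parameters: sentence, normalized

protected boolean acceptCustomWord(int begin, int end, CoreDictionary.Attribute value)
Deprecated. Deprecated since 1.6.7: in forcing mode the longest match is used; otherwise words are merged according to the segmentation result.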
Parameters: begin - start position; end - end position; value - part of speech

protected List<Term> roughSegSentence(char[] sentence)
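Description copied from class: CharacterBasedSegment
Pure segmentation models implement this method; it outputs words only.
Specified by: roughSegSentence in class CharacterBasedSegment

protected List<Term> segSentence(char[] sentence)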
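Description copied from class: CharacterBasedSegment
Used by pure segmentation models; joint segmentation and POS-tagging models override segSentence directly.
Overrides: segSentence in class CharacterBasedSegment
Parameters: sentence - the sentence to segment

protected void segmentAfterRule(String sentence, String normalized, List<String> wordList)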
The ugly rule system.
Parameters: sentence, normalized, wordList

protected List<CoreDictionary.Attribute> combineWithCustomDictionary(List<String> vertexList)
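Merges the coarse segmentation result using the user dictionary.
Parameters: vertexList - the coarse segmentation result

public AbstractLexicalAnalyzer enableRuleBasedSegment(boolean enableRuleBasedSegment)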
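Whether to run rule-based segmentation (rule-based preprocessing of English text, digits, punctuation, etc.). Rules are always ugly; off by default.
Parameters: enableRuleBasedSegment - whether to enable it

Copyright © 2014–2021 码农场. All rights reserved.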