| Modifier and Type | Method and Description |
|---|---|
| `static List<Term>` | `HanLP.segment(String text)`<br>Performs word segmentation |
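A minimal sketch of the segmentation entry point listed above, assuming the `com.hankcs.hanlp` package layout of HanLP 1.x and that the jar plus its data files are on the classpath:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;

// Sketch: one-call segmentation. The exact output depends on the
// dictionary and model files loaded at runtime.
public class SegmentDemo {
    public static void main(String[] args) {
        List<Term> terms = HanLP.segment("商品和服务");
        for (Term term : terms) {
            // Each Term carries the surface word and its part of speech.
            System.out.println(term.word + "\t" + term.nature);
        }
    }
}
```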
| Modifier and Type | Method and Description |
|---|---|
| `void` | `Occurrence.addAll(List<Term> resultList)` |
| Modifier and Type | Method and Description |
|---|---|
| `static CoNLLSentence` | `MaxEntDependencyParser.compute(List<Term> termList)`<br>Deprecated. Parses the dependency syntax of a sentence |
| `static CoNLLSentence` | `WordNatureDependencyParser.compute(List<Term> termList)`<br>Parses the dependency syntax of a sentence |
| `CoNLLSentence` | `MinimumSpanningTreeParser.parse(List<Term> termList)` |
| `CoNLLSentence` | `IDependencyParser.parse(List<Term> termList)`<br>Parses the dependency syntax of a sentence |
| Constructor and Description |
|---|
| `Node(Term term, int id)` |
| Modifier and Type | Method and Description |
|---|---|
| `static CoNLLSentence` | `NeuralNetworkDependencyParser.compute(List<Term> termList)`<br>Parses the dependency syntax of a sentence |
| `CoNLLSentence` | `NeuralNetworkDependencyParser.parse(List<Term> termList)` |
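A sketch of feeding a segmentation result into the neural dependency parser; the import paths (`corpus.dependency.CoNll`, `dependency.nnparser`) are assumptions based on HanLP 1.x's usual layout:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLSentence;
import com.hankcs.hanlp.corpus.dependency.CoNll.CoNLLWord;
import com.hankcs.hanlp.dependency.nnparser.NeuralNetworkDependencyParser;
import com.hankcs.hanlp.seg.common.Term;

// Sketch: segment first, then parse the term list into a CoNLL tree.
public class DependencyDemo {
    public static void main(String[] args) {
        List<Term> termList = HanLP.segment("他每天都坚持锻炼身体");
        CoNLLSentence sentence = NeuralNetworkDependencyParser.compute(termList);
        for (CoNLLWord word : sentence) {
            // Each word points at its syntactic head via DEPREL.
            System.out.printf("%s --(%s)--> %s%n", word.LEMMA, word.DEPREL, word.HEAD.LEMMA);
        }
    }
}
```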
| Modifier and Type | Method and Description |
|---|---|
| `static List<String>` | `PosTagUtil.to863(List<Term> termList)`<br>Converts to the 863 POS tag set; the meaning of each 863 tag is listed in the accompanying table |
| Modifier and Type | Method and Description |
|---|---|
| `CoNLLSentence` | `KBeamArcEagerDependencyParser.parse(List<Term> termList)` |
| `CoNLLSentence` | `KBeamArcEagerDependencyParser.parse(List<Term> termList, int beamWidth, int numOfThreads)`<br>Performs syntactic parsing |
| Modifier and Type | Method and Description |
|---|---|
| `static List<Long[]>` | `CoreSynonymDictionaryEx.convert(List<Term> sentence, boolean withUndefinedItem)`<br>Converts a segmentation result into a synonym list |
| `static List<CommonSynonymDictionary.SynonymItem>` | `CoreSynonymDictionary.createSynonymList(List<Term> sentence, boolean withUndefinedItem)`<br>Converts a segmentation result into a synonym list |
| Modifier and Type | Method and Description |
|---|---|
| `static List<Term>` | `CoreStopWordDictionary.apply(List<Term> termList)`<br>Applies stop-word filtering to a segmentation result |
| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `StopWordDictionary.shouldInclude(Term term)` |
| `boolean` | `Filter.shouldInclude(Term term)`<br>Whether this term should be included in the computation |
| `static boolean` | `CoreStopWordDictionary.shouldInclude(Term term)`<br>Whether this term should be included in the computation |
| `static boolean` | `CoreStopWordDictionary.shouldRemove(Term term)`<br>Whether this word should be removed |
| Modifier and Type | Method and Description |
|---|---|
| `static List<Term>` | `CoreStopWordDictionary.apply(List<Term> termList)`<br>Applies stop-word filtering to a segmentation result |
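A sketch of the stop-word filter listed above; the `dictionary.stopword` package path is an assumption based on HanLP 1.x conventions:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary;
import com.hankcs.hanlp.seg.common.Term;

// Sketch: strip stop words (e.g. particles such as "的") from a
// segmentation result before downstream processing.
public class StopWordDemo {
    public static void main(String[] args) {
        List<Term> termList = HanLP.segment("小区居民有的反对喂养流浪猫");
        CoreStopWordDictionary.apply(termList); // filters the list of terms
        System.out.println(termList);
    }
}
```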
| Modifier and Type | Method and Description |
|---|---|
| `void` | `TfIdfCounter.add(List<Term> termList)` |
| `void` | `TermFrequencyCounter.add(List<Term> termList)` |
| `void` | `TfIdfCounter.add(Object id, List<Term> termList)` |
| `List<String>` | `TfIdfCounter.getKeywords(List<Term> termList, int size)` |
| `List<String>` | `TermFrequencyCounter.getKeywords(List<Term> termList, int size)`<br>Extracts keywords (not thread-safe) |
| `List<Map.Entry<String,Double>>` | `TfIdfCounter.getKeywordsWithTfIdf(List<Term> termList, int size)` |
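A sketch of frequency-based keyword extraction over a pre-segmented text; the `mining.word` package path is an assumption from HanLP 1.x:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.mining.word.TermFrequencyCounter;
import com.hankcs.hanlp.seg.common.Term;

// Sketch: count term frequencies, then pull the top keywords.
// Note the Javadoc above: getKeywords is not thread-safe.
public class KeywordDemo {
    public static void main(String[] args) {
        TermFrequencyCounter counter = new TermFrequencyCounter();
        List<Term> termList = HanLP.segment("女排夺冠,举国欢庆,女排精神永存");
        counter.add(termList);                                    // accumulate counts
        List<String> keywords = counter.getKeywords(termList, 3); // top 3 by frequency
        System.out.println(keywords);
    }
}
```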
| Modifier and Type | Method and Description |
|---|---|
| `protected static List<Term>` | `WordBasedSegment.convert(List<Vertex> vertexList)`<br>Converts a path into the final result |
| `protected static List<Term>` | `Segment.convert(List<Vertex> vertexList, boolean offsetEnabled)`<br>Converts a path into the final result |
| `protected List<Term>` | `WordBasedSegment.decorateResultForIndexMode(List<Vertex> vertexList, WordNet wordNetAll)`<br>Decorates the result for index mode |
| `List<Term>` | `SegmentPipeline.flow(String input)` |
| `protected abstract List<Term>` | `CharacterBasedSegment.roughSegSentence(char[] sentence)`<br>Pure segmentation models implement this method and output words only |
| `List<Term>` | `Segment.seg(char[] text)`<br>Performs word segmentation |
| `List<Term>` | `Segment.seg(String text)`<br>Performs word segmentation; this method is thread-safe |
| `List<Term>` | `SegmentPipeline.seg(String text)` |
| `List<List<Term>>` | `Segment.seg2sentence(String text)`<br>Segments text and splits it into sentences; output is in sentence form |
| `List<List<Term>>` | `Segment.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `protected abstract List<Term>` | `Segment.segSentence(char[] sentence)`<br>Segments a single sentence |
| `protected List<Term>` | `SegmentPipeline.segSentence(char[] sentence)` |
| `protected List<Term>` | `CharacterBasedSegment.segSentence(char[] sentence)`<br>This method serves pure segmentation models; joint segmentation/POS-tagging models override segSentence directly |
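A sketch of the `Segment` workflow above, using `HanLP.newSegment()` as the factory and `seg2sentence` to get per-sentence term lists; the `enableNameRecognize` toggle is an optional feature switch, not required:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.Segment;
import com.hankcs.hanlp.seg.common.Term;

// Sketch: configure a Segment, then split a paragraph into sentences,
// each sentence already segmented into terms.
public class SentenceSegDemo {
    public static void main(String[] args) {
        Segment segment = HanLP.newSegment().enableNameRecognize(true);
        List<List<Term>> sentences = segment.seg2sentence("今天天气真好。我们去郊游吧!");
        for (List<Term> sentence : sentences) {
            System.out.println(sentence);
        }
    }
}
```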
| Modifier and Type | Method and Description |
|---|---|
| `static CoreDictionary.Attribute` | `CharacterBasedSegment.guessAttribute(Term term)`<br>Looks up or guesses a word's attributes: checks the dictionary first, then judges letter and digit strings, and finally guesses for out-of-vocabulary words |
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Vertex>` | `CharacterBasedSegment.toVertexList(List<Term> wordList, boolean appendStart)`<br>Converts intermediate results into word-lattice vertices, so that Vertex-based features such as POS tagging and NER can be reused |
| Modifier and Type | Method and Description |
|---|---|
| `Term` | `SegmentWrapper.next()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Term>` | `CRFSegment.roughSegSentence(char[] sentence)`<br>Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| `List<Term>` | `DijkstraSegment.segSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Term>` | `HMMSegment.roughSegSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `static List<Term>` | `NShortSegment.parse(String text)`<br>Segments a sentence of text |
| `List<Term>` | `NShortSegment.segSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Term>` | `DoubleArrayTrieSegment.segSentence(char[] sentence)` |
| `protected List<Term>` | `AhoCorasickDoubleArrayTrieSegment.segSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Term>` | `ViterbiSegment.segSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `KeywordExtractor.shouldInclude(Term term)`<br>Whether this term should be included in the computation: its part of speech must be a noun, verb, adverb, or adjective |
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `KeywordExtractor.filter(List<Term> termList)` |
| `List<String>` | `TextRankKeyword.getKeywords(List<Term> termList, int size)` |
| `abstract List<String>` | `KeywordExtractor.getKeywords(List<Term> termList, int size)` |
| `Map<String,Float>` | `TextRankKeyword.getTermAndRank(List<Term> termList)`<br>Computes ranks from an already-segmented word list |
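A sketch of running TextRank over an existing segmentation rather than raw text; the `summary` package path is an assumption from HanLP 1.x:

```java
import java.util.List;
import java.util.Map;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.summary.TextRankKeyword;

// Sketch: reuse a segmentation result for TextRank scoring.
public class TextRankDemo {
    public static void main(String[] args) {
        List<Term> termList = HanLP.segment("程序员是从事程序开发、程序维护的专业人员");
        Map<String, Float> ranks = new TextRankKeyword().getTermAndRank(termList);
        // A higher rank means the word is more central to the text.
        ranks.forEach((word, rank) -> System.out.println(word + "=" + rank));
    }
}
```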
| Modifier and Type | Method and Description |
|---|---|
| `static List<List<Term>>` | `BasicTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `NLPTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `IndexTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `TraditionalChineseTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `SpeedTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `StandardTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `NotionalTokenizer.seg2sentence(String text)`<br>Splits into sentence form |
| `static List<List<Term>>` | `BasicTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `NLPTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `IndexTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `TraditionalChineseTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `SpeedTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `StandardTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `NotionalTokenizer.seg2sentence(String text, boolean shortest)`<br>Segments text and splits it into sentences; output is in sentence form |
| `static List<List<Term>>` | `NotionalTokenizer.seg2sentence(String text, Filter... filterArrayChain)`<br>Splits into sentence form |
| `static List<Term>` | `BasicTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `NLPTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `IndexTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `TraditionalChineseTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `SpeedTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `StandardTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `NotionalTokenizer.segment(char[] text)`<br>Performs word segmentation |
| `static List<Term>` | `BasicTokenizer.segment(String text)`<br>Performs word segmentation |
| `static List<Term>` | `NLPTokenizer.segment(String text)` |
| `static List<Term>` | `URLTokenizer.segment(String text)`<br>Performs word segmentation |
| `static List<Term>` | `IndexTokenizer.segment(String text)` |
| `static List<Term>` | `TraditionalChineseTokenizer.segment(String text)` |
| `static List<Term>` | `SpeedTokenizer.segment(String text)` |
| `static List<Term>` | `StandardTokenizer.segment(String text)`<br>Performs word segmentation |
| `static List<Term>` | `NotionalTokenizer.segment(String text)` |
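The tokenizer facades above all expose the same static `segment(String)` entry point and differ only in the model behind it. A sketch comparing two of them, assuming the `com.hankcs.hanlp.tokenizer` package of HanLP 1.x:

```java
import java.util.List;

import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.tokenizer.SpeedTokenizer;
import com.hankcs.hanlp.tokenizer.StandardTokenizer;

// Sketch: the same text through an accuracy-oriented and a
// speed-oriented tokenizer; results may differ in granularity.
public class TokenizerDemo {
    public static void main(String[] args) {
        String text = "江西鄱阳湖干枯,中国最大淡水湖变成大草原";
        List<Term> standard = StandardTokenizer.segment(text); // accuracy-oriented
        List<Term> speedy = SpeedTokenizer.segment(text);      // speed-oriented
        System.out.println(standard);
        System.out.println(speedy);
    }
}
```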
| Modifier and Type | Method and Description |
|---|---|
| `protected List<Term>` | `AbstractLexicalAnalyzer.roughSegSentence(char[] sentence)` |
| `protected List<Term>` | `AbstractLexicalAnalyzer.segSentence(char[] sentence)` |
| Modifier and Type | Method and Description |
|---|---|
| `static CoreDictionary.Attribute` | `LexiconUtility.getAttribute(Term term)`<br>Retrieves a word's attributes from HanLP's lexicon (both the core dictionary and the user dictionary) |
| Modifier and Type | Method and Description |
|---|---|
| `static boolean` | `SentencesUtil.hasNature(List<Term> sentence, Nature nature)`<br>Whether the sentence contains the given part of speech |
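A sketch of `hasNature`, checking whether a segmented sentence contains a person name (`Nature.nr`); the `utility` and `corpus.tag` package paths are assumptions from HanLP 1.x:

```java
import java.util.List;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.corpus.tag.Nature;
import com.hankcs.hanlp.seg.common.Term;
import com.hankcs.hanlp.utility.SentencesUtil;

// Sketch: scan a term list for a target part-of-speech tag.
public class NatureDemo {
    public static void main(String[] args) {
        List<Term> sentence = HanLP.segment("王先生来北京参观");
        boolean hasPersonName = SentencesUtil.hasNature(sentence, Nature.nr);
        System.out.println(hasPersonName);
    }
}
```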
Copyright © 2014–2021 码农场. All rights reserved.