public class DynamicCustomDictionary extends Object
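| Modifier and Type | Field and Description |
|---|---|
| `DoubleArrayTrie<CoreDictionary.Attribute>` | `dat`: stores the entries loaded from dictionary files |
| `String[]` | `path`: the paths this dictionary was loaded from |
| `BinTrie<CoreDictionary.Attribute>` | `trie`: binary-search trie (BinTrie) that stores entries inserted dynamically by the user |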
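The field table describes a split between entries loaded from files (`dat`) and entries inserted at runtime (`trie`). The following is a minimal, library-free sketch of that split, not HanLP's implementation: plain maps stand in for `DoubleArrayTrie` and `BinTrie`, attributes are simplified to strings, and the class name `HybridDictionarySketch`, the method `insertDynamic`, and the lookup order (file-backed store first) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: plain maps stand in for DoubleArrayTrie (dat) and BinTrie (trie).
class HybridDictionarySketch {
    // Stands in for `dat`: entries loaded from dictionary files, fixed after loading.
    private final Map<String, String> fileEntries;
    // Stands in for `trie`: entries inserted at runtime, never written back to disk.
    private final Map<String, String> dynamicEntries = new HashMap<>();

    HybridDictionarySketch(Map<String, String> loadedFromFiles) {
        this.fileEntries = new TreeMap<>(loadedFromFiles);
    }

    // Hypothetical helper: records a word in the dynamic store only.
    void insertDynamic(String word, String attribute) {
        dynamicEntries.put(word, attribute);
    }

    // Assumed lookup order: file-backed store first, then the dynamic store.
    String get(String key) {
        String attribute = fileEntries.get(key);
        return attribute != null ? attribute : dynamicEntries.get(key);
    }

    boolean contains(String key) {
        return get(key) != null;
    }
}
```

Keeping the two stores separate lets the file-backed structure stay immutable after loading while runtime edits touch only the smaller dynamic structure.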
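| Constructor and Description |
|---|
| `DynamicCustomDictionary()`: constructs a dictionary object and loads the dictionaries listed in `com.hankcs.hanlp.HanLP.Config#CustomDictionaryPath` |
| `DynamicCustomDictionary(DoubleArrayTrie<CoreDictionary.Attribute> dat, BinTrie<CoreDictionary.Attribute> trie, String[] path)`: constructs a dictionary object from the supplied data structures (advanced use) and loads the dictionaries at the given paths |
| `DynamicCustomDictionary(String... path)`: constructs a dictionary object and loads the dictionaries at the given paths |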
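| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `add(String word)`: inserts a new word into the custom dictionary (non-overwrite mode). Dynamic additions and removals are not persisted to the dictionary file. |
| `boolean` | `add(String word, String natureWithFrequency)`: inserts a new word into the custom dictionary (non-overwrite mode). Dynamic additions and removals are not persisted to the dictionary file. |
| `LinkedList<Map.Entry<String,CoreDictionary.Attribute>>` | `commonPrefixSearch(char[] chars, int begin)`: prefix search |
| `LinkedList<Map.Entry<String,CoreDictionary.Attribute>>` | `commonPrefixSearch(String key)`: prefix search |
| `boolean` | `contains(String key)`: whether the dictionary contains the word |
| `CoreDictionary.Attribute` | `get(String key)`: looks up a word |
| `BaseSearcher` | `getSearcher(char[] charArray)`: gets a searcher over the BinTrie |
| `BaseSearcher` | `getSearcher(String text)` |
| `BinTrie<CoreDictionary.Attribute>` | `getTrie()`: **Deprecated.** Use with caution; this interface may be removed. |
| `boolean` | `insert(String word)`: adds a new word in overwrite mode. Dynamic additions and removals are not persisted to the dictionary file. |
| `boolean` | `insert(String word, String natureWithFrequency)`: inserts a new word into the custom dictionary (overwrite mode). Dynamic additions and removals are not persisted to the dictionary file. |
| `static boolean` | `isDicNeedUpdate(String mainPath, String[] path)`: gets the update status of the local dictionary |
| `boolean` | `load(String... path)`: loads the dictionaries at the given paths |
| `static boolean` | `load(String path, Nature defaultNature, TreeMap<String,CoreDictionary.Attribute> map, LinkedHashSet<Nature> customNatureCollector)`: loads a user dictionary (append) |
| `static boolean` | `loadDat(String path, DoubleArrayTrie<CoreDictionary.Attribute> dat)` |
| `static boolean` | `loadDat(String path, String[] customDicPath, DoubleArrayTrie<CoreDictionary.Attribute> dat)`: loads the double array from disk |
| `boolean` | `loadMainDictionary(String mainPath)`: loads the given dictionary, using the dictionary path as the cache path |
| `static boolean` | `loadMainDictionary(String mainPath, String[] path, DoubleArrayTrie<CoreDictionary.Attribute> dat, boolean isCache)`: loads the dictionary |
| `void` | `parseLongestText(String text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)`: longest match |
| `void` | `parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)`: parses a text. Storage is currently a BinTrie + DAT hybrid; this method unifies the two data structures. |
| `void` | `parseText(String text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)`: parses a text. Storage is currently a BinTrie + DAT hybrid; this method unifies the two data structures. |
| `boolean` | `reload()`: hot update (reload). In a cluster environment (or with another IOAdapter), the cache file must be deleted manually (path = `HanLP.Config.CustomDictionaryPath[0] + Predefine.BIN_EXT`). |
| `void` | `remove(String key)`: removes a word. Dynamic additions and removals are not persisted to the dictionary file. |
| `String` | `toString()` |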
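The method summary distinguishes `add` (non-overwrite mode) from `insert` (overwrite mode), and both accept a `natureWithFrequency` string such as "nz 1 v 2". The sketch below models that contract with a plain map so the difference is easy to see. It is not HanLP's code: the class name `AddInsertSketch`, the helper `parseNatureWithFrequency`, and the choice to return `false` from `add` when the word already exists are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of non-overwrite (add) versus overwrite (insert) semantics.
class AddInsertSketch {
    private final Map<String, Map<String, Integer>> entries = new LinkedHashMap<>();

    // Parses "nz 1 v 2" into {nz=1, v=2}; null falls back to "nz 1", as the documentation states.
    static Map<String, Integer> parseNatureWithFrequency(String natureWithFrequency) {
        String source = natureWithFrequency == null ? "nz 1" : natureWithFrequency.trim();
        String[] tokens = source.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected nature/frequency pairs: " + source);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            result.put(tokens[i], Integer.parseInt(tokens[i + 1]));
        }
        return result;
    }

    // Non-overwrite mode: an existing entry is kept (returning false here is an assumption).
    boolean add(String word, String natureWithFrequency) {
        if (entries.containsKey(word)) {
            return false;
        }
        entries.put(word, parseNatureWithFrequency(natureWithFrequency));
        return true;
    }

    // Overwrite mode: any existing entry is replaced.
    boolean insert(String word, String natureWithFrequency) {
        entries.put(word, parseNatureWithFrequency(natureWithFrequency));
        return true;
    }

    Map<String, Integer> get(String word) {
        return entries.get(word);
    }
}
```

In this model, calling `add("裸婚", null)` followed by `add("裸婚", "v 2")` leaves the first entry untouched, whereas `insert("裸婚", "nz 1 v 2")` replaces it. As in the documented class, nothing is written back to a dictionary file.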
public BinTrie<CoreDictionary.Attribute> trie
public DoubleArrayTrie<CoreDictionary.Attribute> dat
public String[] path

public DynamicCustomDictionary()
Constructs a dictionary object and loads the dictionaries listed in com.hankcs.hanlp.HanLP.Config#CustomDictionaryPath.

public DynamicCustomDictionary(String... path)
path - dictionary paths

public DynamicCustomDictionary(DoubleArrayTrie<CoreDictionary.Attribute> dat, BinTrie<CoreDictionary.Attribute> trie, String[] path)
dat - double-array trie
trie - trie
path - dictionary paths

public boolean load(String... path)
path - dictionary paths

public static boolean loadMainDictionary(String mainPath, String[] path, DoubleArrayTrie<CoreDictionary.Attribute> dat, boolean isCache)
mainPath - file name of the cache file
path - custom dictionaries
isCache - whether to cache the result

public boolean loadMainDictionary(String mainPath)
mainPath - dictionary path (appending .bin gives the cache path)

public static boolean load(String path, Nature defaultNature, TreeMap<String,CoreDictionary.Attribute> map, LinkedHashSet<Nature> customNatureCollector)
path - dictionary path
defaultNature - default part of speech
customNatureCollector - collects user-defined parts of speech

public boolean add(String word, String natureWithFrequency)
word - the new word, e.g. "裸婚"
natureWithFrequency - parts of speech with their frequencies, e.g. "nz 1 v 2"; null means "nz 1"

public boolean add(String word)
word - the new word, e.g. "裸婚"

public boolean insert(String word, String natureWithFrequency)
word - the new word, e.g. "裸婚"
natureWithFrequency - parts of speech with their frequencies, e.g. "nz 1 v 2"; null means "nz 1"

public boolean insert(String word)
word -

public static boolean loadDat(String path, DoubleArrayTrie<CoreDictionary.Attribute> dat)

public static boolean loadDat(String path, String[] customDicPath, DoubleArrayTrie<CoreDictionary.Attribute> dat)
path - main dictionary path
customDicPath - user dictionary paths

public static boolean isDicNeedUpdate(String mainPath, String[] path)
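The loading methods refer to a binary cache whose path is the dictionary path plus `.bin`, and `reload()` notes that the cache file may have to be deleted manually. The sketch below shows one way such a cache check could work. It is a hypothetical model, not the logic of `isDicNeedUpdate`: the class `DictionaryCacheSketch`, the timestamp comparison rule, and the literal value `".bin"` for `Predefine.BIN_EXT` (inferred from the "+.bin" note on `loadMainDictionary`) are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative cache handling; the staleness rule is an assumption, not HanLP's.
class DictionaryCacheSketch {
    // Assumed value of Predefine.BIN_EXT.
    static final String BIN_EXT = ".bin";

    // The cache file sits next to the main dictionary: mainPath + ".bin".
    static Path cachePath(String mainPath) {
        return Paths.get(mainPath + BIN_EXT);
    }

    // Hypothetical rule: rebuild when the cache is missing or older than any source file.
    static boolean needsUpdate(String mainPath, String[] sourcePaths) throws IOException {
        Path cache = cachePath(mainPath);
        if (!Files.exists(cache)) {
            return true;
        }
        long cacheTime = Files.getLastModifiedTime(cache).toMillis();
        for (String source : sourcePaths) {
            Path sourceFile = Paths.get(source);
            if (Files.exists(sourceFile)
                    && Files.getLastModifiedTime(sourceFile).toMillis() > cacheTime) {
                return true;
            }
        }
        return false;
    }

    // Removes the cache so that the next load rebuilds it from the text dictionaries.
    static boolean deleteCache(String mainPath) throws IOException {
        return Files.deleteIfExists(cachePath(mainPath));
    }
}
```

With a helper like `deleteCache`, a deployment that cannot rely on an automatic check could clear the cache for `HanLP.Config.CustomDictionaryPath[0]` before calling `reload()`.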

public CoreDictionary.Attribute get(String key)
key -

public void remove(String key)
key -

public LinkedList<Map.Entry<String,CoreDictionary.Attribute>> commonPrefixSearch(String key)
key -

public LinkedList<Map.Entry<String,CoreDictionary.Attribute>> commonPrefixSearch(char[] chars, int begin)
chars -
begin -

public BaseSearcher getSearcher(String text)
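The parameters of `commonPrefixSearch` are undocumented above, so the following sketch illustrates what a common-prefix search generally does: it returns every dictionary entry that is a prefix of the input, starting at offset `begin`. This is a generic illustration over a plain map with string attributes, not HanLP's trie-based implementation; the class name `CommonPrefixSearchSketch` and the shortest-first result order are assumptions.

```java
import java.util.AbstractMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.TreeMap;

// Generic common-prefix search over a map; a trie would avoid the repeated substring lookups.
class CommonPrefixSearchSketch {
    private final Map<String, String> entries = new TreeMap<>();

    void put(String word, String attribute) {
        entries.put(word, attribute);
    }

    // Returns every entry that is a prefix of chars[begin..], shortest first.
    LinkedList<Map.Entry<String, String>> commonPrefixSearch(char[] chars, int begin) {
        LinkedList<Map.Entry<String, String>> hits = new LinkedList<>();
        for (int end = begin + 1; end <= chars.length; end++) {
            String candidate = new String(chars, begin, end - begin);
            String attribute = entries.get(candidate);
            if (attribute != null) {
                hits.add(new AbstractMap.SimpleEntry<>(candidate, attribute));
            }
        }
        return hits;
    }

    // Convenience overload: search from the start of the key.
    LinkedList<Map.Entry<String, String>> commonPrefixSearch(String key) {
        return commonPrefixSearch(key.toCharArray(), 0);
    }
}
```

A real trie walks the input once and collects a hit at each node that ends a word, which is why trie structures are the usual choice for this query.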

public boolean contains(String key)
key - the word

public BaseSearcher getSearcher(char[] charArray)
charArray - the text

public BinTrie<CoreDictionary.Attribute> getTrie()

public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)
text - the text
processor - the hit handler

public void parseText(String text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)
text - the text
processor - the hit handler

public void parseLongestText(String text, AhoCorasickDoubleArrayTrie.IHit<CoreDictionary.Attribute> processor)
text - the text
processor - the hit handler

public boolean reload()
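To contrast `parseText` with `parseLongestText`, here is a simplified sketch of the two matching strategies: reporting every dictionary word found in a text, versus reporting only the longest word at each position and then skipping past it. It is a model under stated assumptions, not HanLP's implementation: the interface `Hit` is a hypothetical stand-in for `AhoCorasickDoubleArrayTrie.IHit`, attributes are plain strings, and the skip-ahead behaviour after a longest match is assumed.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of "all matches" versus "longest match" dictionary scanning.
class LongestMatchSketch {
    // Hypothetical stand-in for IHit: receives the span [begin, end) and the matched value.
    interface Hit {
        void hit(int begin, int end, String value);
    }

    private final Map<String, String> entries = new TreeMap<>();
    private int maxLength = 0;

    void put(String word, String attribute) {
        entries.put(word, attribute);
        maxLength = Math.max(maxLength, word.length());
    }

    // Reports every dictionary word found in the text, including overlapping ones.
    void parseText(String text, Hit processor) {
        for (int begin = 0; begin < text.length(); begin++) {
            int limit = Math.min(text.length(), begin + maxLength);
            for (int end = begin + 1; end <= limit; end++) {
                String value = entries.get(text.substring(begin, end));
                if (value != null) {
                    processor.hit(begin, end, value);
                }
            }
        }
    }

    // Scans left to right, reports only the longest word at each position, then skips past it.
    void parseLongestText(String text, Hit processor) {
        int begin = 0;
        while (begin < text.length()) {
            int limit = Math.min(text.length(), begin + maxLength);
            int bestEnd = -1;
            for (int end = begin + 1; end <= limit; end++) {
                if (entries.containsKey(text.substring(begin, end))) {
                    bestEnd = end;
                }
            }
            if (bestEnd < 0) {
                begin++;
                continue;
            }
            processor.hit(begin, bestEnd, entries.get(text.substring(begin, bestEnd)));
            begin = bestEnd;
        }
    }
}
```

Callers pass a callback rather than receiving a list, which mirrors the `processor` parameter documented above and avoids allocating a result collection for long texts.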
Copyright © 2014–2021 码农场. All rights reserved.