public class TfIdf extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | TfIdf.Normalization: normalization scheme for the tf-idf vector |
| static class | TfIdf.TfType: term-frequency counting scheme |
| Constructor and Description |
|---|
| TfIdf() |
| Modifier and Type | Method and Description |
|---|---|
| static <TERM> Map<TERM,Double> | idf(Iterable<Iterable<TERM>> documentVocabularies): smoothed inverse document frequencies for a collection of documents |
| static <TERM> Map<TERM,Double> | idf(Iterable<Iterable<TERM>> documentVocabularies, boolean smooth, boolean addOne): inverse document frequencies for a collection of documents |
| static <TERM> Map<TERM,Double> | idfFromTfs(Iterable<Map<TERM,Double>> tfs): builds inverse document frequencies from a collection of term-frequency maps (smoothing enabled by default, with add-one smoothing for tf-idf) |
| static <TERM> Map<TERM,Double> | idfFromTfs(Iterable<Map<TERM,Double>> tfs, boolean smooth, boolean addOne): builds inverse document frequencies from a collection of term-frequency maps |
| static <TERM> Map<TERM,Double> | tf(Collection<TERM> document): term frequencies of a single document |
| static <TERM> Map<TERM,Double> | tf(Collection<TERM> document, TfIdf.TfType type): term frequencies of a single document |
| static <TERM> Map<TERM,Double> | tfIdf(Map<TERM,Double> tf, Map<TERM,Double> idf): computes the tf-idf of a document (no normalization) |
| static <TERM> Map<TERM,Double> | tfIdf(Map<TERM,Double> tf, Map<TERM,Double> idf, TfIdf.Normalization normalization): computes the tf-idf of a document |
| static <TERM> Iterable<Map<TERM,Double>> | tfs(Iterable<Collection<TERM>> documents): term frequencies of multiple documents |
| static <TERM> Iterable<Map<TERM,Double>> | tfs(Iterable<Collection<TERM>> documents, TfIdf.TfType type): term frequencies of multiple documents |
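The methods listed above form a pipeline: term frequencies per document, inverse document frequencies across documents, then their product. The following is a minimal standalone sketch of that pipeline, not this class's implementation. The class name `TfIdfSketch` is hypothetical, term frequency is taken as a raw count, no normalization is applied, and the formula idf(t) = ln(n / df(t)) + 1 (with `smooth` adding one virtual document that contains every term) is an assumption inferred from the `smooth` and `addOne` parameter descriptions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Standalone illustration only; this is NOT the TfIdf class documented here.
final class TfIdfSketch {

    // Raw term counts for one bag-of-words document.
    static <T> Map<T, Double> tf(Collection<T> document) {
        Map<T, Double> counts = new HashMap<>();
        for (T term : document) {
            counts.merge(term, 1.0, Double::sum);
        }
        return counts;
    }

    // Assumed formula: idf(t) = ln(n / df(t)) + (addOne ? 1 : 0).
    // With smooth == true, one virtual document containing every term is
    // added, so n and every df(t) each grow by 1.
    static <T> Map<T, Double> idf(Iterable<? extends Iterable<T>> vocabularies,
                                  boolean smooth, boolean addOne) {
        int extra = smooth ? 1 : 0;
        int n = extra;
        Map<T, Integer> df = new HashMap<>();
        for (Iterable<T> vocabulary : vocabularies) {
            n++;
            Set<T> seen = new HashSet<>();
            for (T term : vocabulary) {
                // Count each term at most once per document.
                if (seen.add(term)) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        Map<T, Double> weights = new HashMap<>();
        for (Map.Entry<T, Integer> e : df.entrySet()) {
            double f = e.getValue() + extra;
            weights.put(e.getKey(), Math.log(n / f) + (addOne ? 1.0 : 0.0));
        }
        return weights;
    }

    // Unnormalized tf-idf: product of tf and idf per term (idf 0 if unknown).
    static <T> Map<T, Double> tfIdf(Map<T, Double> tf, Map<T, Double> idf) {
        Map<T, Double> result = new HashMap<>();
        for (Map.Entry<T, Double> e : tf.entrySet()) {
            Double w = idf.get(e.getKey());
            result.put(e.getKey(), e.getValue() * (w == null ? 0.0 : w));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        Map<String, Double> weights = idf(docs, true, true);
        for (List<String> doc : docs) {
            System.out.println(tfIdf(tf(doc), weights));
        }
    }
}
```

In this sketch a term that appears in every document still gets a non-zero weight when `addOne` is true, which is the usual motivation for add-one smoothing; with both flags false, such a term's weight drops to zero.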
public static <TERM> Map<TERM,Double> tf(Collection<TERM> document, TfIdf.TfType type)

TERM - term type
document - the document as a bag of words
type - term-frequency counting scheme

public static <TERM> Map<TERM,Double> tf(Collection<TERM> document)

TERM - term type
document - the document as a bag of words

public static <TERM> Iterable<Map<TERM,Double>> tfs(Iterable<Collection<TERM>> documents, TfIdf.TfType type)

TERM - term type
documents - multiple documents, each one a bag of words
type - term-frequency counting scheme

public static <TERM> Iterable<Map<TERM,Double>> tfs(Iterable<Collection<TERM>> documents)

TERM - term type
documents - multiple documents, each one a bag of words

public static <TERM> Map<TERM,Double> idf(Iterable<Iterable<TERM>> documentVocabularies, boolean smooth, boolean addOne)

TERM - term type
documentVocabularies - the vocabularies of the documents
smooth - smoothing parameter; the corpus is treated as if it had one extra document containing smooth occurrences of every term
addOne - whether to apply add-one smoothing to tf-idf

public static <TERM> Map<TERM,Double> idf(Iterable<Iterable<TERM>> documentVocabularies)

TERM - term type
documentVocabularies - the vocabularies of the documents

public static <TERM> Map<TERM,Double> tfIdf(Map<TERM,Double> tf, Map<TERM,Double> idf, TfIdf.Normalization normalization)

TERM - term type
tf - term frequencies
idf - inverse document frequencies
normalization - normalization scheme

public static <TERM> Map<TERM,Double> tfIdf(Map<TERM,Double> tf, Map<TERM,Double> idf)

TERM - term type
tf - term frequencies
idf - inverse document frequencies

public static <TERM> Map<TERM,Double> idfFromTfs(Iterable<Map<TERM,Double>> tfs, boolean smooth, boolean addOne)

TERM - term type
tfs - collection of term-frequency maps
smooth - smoothing parameter; the corpus is treated as if it had one extra document containing smooth occurrences of every term
addOne - whether to apply add-one smoothing to tf-idf

Copyright © 2014–2021 码农场. All rights reserved.