public class TFIDF extends AbstractStatisticalTokenDistance

| Modifier and Type | Class and Description |
|---|---|
| protected class | TFIDF.UnitVector: Marker class extending BagOfTokens. |
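
Inherited fields, grouped by declaring superclass from nearest (AbstractStatisticalTokenDistance) outward: collectionSize, documentFrequency, totalTokenCount; tokenizer

| Modifier and Type | Method and Description |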
|---|---|
| protected TFIDF.UnitVector | asUnitVector(StringWrapper w) |
| String | explainScore(StringWrapper s, StringWrapper t): Explain how the distance was computed. |
| int | getCollectionSize() |
| int | getDocumentFrequency(Token token): Get the document frequency of the token. |
| Token[] | getTokens(): Access the tokens of the last prepare()-ed string. |
| int | getVocabularySize() |
| double | getWeight(Token token): Access the weight of a token in the vector created for the last prepare()-ed string. |
| static void | main(String[] argv) |
| StringWrapper | prepare(String s): Preprocess a string by finding tokens and giving them TFIDF weights. |
| double | score(StringWrapper s, StringWrapper t): Compute the score for the two strings (implements the method that AbstractStringDistance requires subclasses to provide). |
| void | setCollectionSize(int n): Set the size of the collection that this TFIDF measure was trained on (collectionSize) to some value. |
| void | setDocumentFrequency(Token token, int df): Set the document frequency of the token to some value. |
| void | setTokenCount(int tc) |
| String | toString() |

Inherited methods, grouped by declaring superclass from nearest (AbstractStatisticalTokenDistance) outward: checkTrainingHasHappened, tokenIterator, train; asBagOfTokens, prepare, setStringWrapperPool; addExample, doMain, explainScore, getDistance, hasNextQuery, nextQuery, prepare, score, setDistanceInstancePool

public TFIDF(Tokenizer tokenizer)
public TFIDF()
public double score(StringWrapper s, StringWrapper t)
Description copied from class AbstractStringDistance. Declared in: score in interface StringDistance; score in class AbstractStringDistance.
protected TFIDF.UnitVector asUnitVector(StringWrapper w)
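The score and asUnitVector entries suggest that each prepared string is turned into a TFIDF.UnitVector before two strings are compared. A common way to score two unit-length TF-IDF vectors is to take their dot product, which equals their cosine similarity. The Java sketch below assumes that approach and is not taken from this class's source. The class UnitVectorCosine and its methods are hypothetical, and plain String-to-Double maps stand in for Token and StringWrapper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the TFIDF source): normalize sparse token weights
// to unit length, then score two vectors by their dot product (cosine).
final class UnitVectorCosine {

    // Scale the weights so the vector has Euclidean length 1.
    static Map<String, Double> toUnitVector(Map<String, Double> weights) {
        double sumSq = 0.0;
        for (double w : weights.values()) {
            sumSq += w * w;
        }
        Map<String, Double> unit = new HashMap<>();
        if (sumSq == 0.0) {
            return unit; // all-zero vector: nothing to normalize
        }
        double norm = Math.sqrt(sumSq);
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            unit.put(e.getKey(), e.getValue() / norm);
        }
        return unit;
    }

    // Dot product over shared tokens; for unit vectors this is the cosine.
    static double score(Map<String, Double> s, Map<String, Double> t) {
        double sim = 0.0;
        for (Map.Entry<String, Double> e : s.entrySet()) {
            Double other = t.get(e.getKey());
            if (other != null) {
                sim += e.getValue() * other;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        Map<String, Double> a = new HashMap<>();
        a.put("william", 3.0);
        a.put("cohen", 4.0);
        Map<String, Double> b = new HashMap<>();
        b.put("cohen", 1.0);
        System.out.println(score(toUnitVector(a), toUnitVector(b))); // prints 0.8
    }
}
```

The sketch normalizes both vectors first, so the score stays between 0 and 1 when all weights are non-negative. Longer strings do not get higher scores just because they contain more tokens.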
public StringWrapper prepare(String s)
Declared in: prepare in interface StringDistance; prepare in class AbstractStringDistance.
public Token[] getTokens()
public double getWeight(Token token)
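prepare is described as finding tokens and giving them TFIDF weights. getTokens and getWeight then expose the vector built for the last prepared string. The sketch below shows one common weighting scheme and rests on several assumptions: lowercase whitespace tokenization, a weight of log(tf + 1) * log(N / df), unseen tokens treated as df = 1, and a final scaling to unit length. The formula and tokenizer that TFIDF actually uses may differ, and TfidfWeigher is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "find tokens and give them TFIDF weights".
// Assumptions, not taken from the TFIDF class: lowercase whitespace
// tokenization, weight = log(tf + 1) * log(N / df), unseen tokens use df = 1,
// and the final vector is scaled to unit Euclidean length.
final class TfidfWeigher {
    private final int collectionSize;            // N: number of training strings
    private final Map<String, Integer> docFreq;  // df: training strings containing each token

    TfidfWeigher(int collectionSize, Map<String, Integer> docFreq) {
        if (collectionSize <= 0) {
            throw new IllegalArgumentException("collectionSize must be positive");
        }
        this.collectionSize = collectionSize;
        this.docFreq = docFreq;
    }

    Map<String, Double> prepare(String s) {
        // Term frequencies, kept in first-seen token order.
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String tok : s.toLowerCase().trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        // Raw TF-IDF weight per token.
        Map<String, Double> weights = new LinkedHashMap<>();
        double sumSq = 0.0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = Math.max(1, docFreq.getOrDefault(e.getKey(), 0));
            double w = Math.log(e.getValue() + 1.0) * Math.log((double) collectionSize / df);
            weights.put(e.getKey(), w);
            sumSq += w * w;
        }
        // Normalize to unit length; an all-zero vector is returned unchanged.
        if (sumSq > 0.0) {
            double norm = Math.sqrt(sumSq);
            weights.replaceAll((tok, w) -> w / norm);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 4);    // appears in every training string
        df.put("cohen", 1);  // rare token
        TfidfWeigher weigher = new TfidfWeigher(4, df);
        // "the" gets weight 0; "william" (unseen) and "cohen" get equal weights.
        System.out.println(weigher.prepare("the William Cohen"));
    }
}
```

A token that occurs in every training string gets an IDF factor of log(1) = 0, so it contributes nothing to later scores. This is how TF-IDF discounts common words.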
public int getDocumentFrequency(Token token)
Declared in: getDocumentFrequency in class AbstractStatisticalTokenDistance.
public void setDocumentFrequency(Token token, int df)
public void setTokenCount(int tc)
public int getCollectionSize()
public void setCollectionSize(int n)
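The document-frequency and collection-size accessors hold the corpus statistics that the IDF part of a TF-IDF weight depends on. These are N, the number of training strings, and df, the number of those strings that contain a given token. The sketch below shows one way such counts could be gathered. It assumes lowercase whitespace tokenization. DocFrequencyTable is hypothetical and is not the inherited train method.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: gather the collection size N and per-token document
// frequencies df from training strings. Assumes lowercase whitespace
// tokenization; this is not the library's own train() method.
final class DocFrequencyTable {
    private int collectionSize = 0;
    private final Map<String, Integer> documentFrequency = new HashMap<>();

    void train(List<String> strings) {
        for (String s : strings) {
            collectionSize++;
            // Count each distinct token at most once per string.
            Set<String> seen = new HashSet<>();
            for (String tok : s.toLowerCase().trim().split("\\s+")) {
                if (!tok.isEmpty() && seen.add(tok)) {
                    documentFrequency.merge(tok, 1, Integer::sum);
                }
            }
        }
    }

    int getCollectionSize() {
        return collectionSize;
    }

    int getDocumentFrequency(String token) {
        return documentFrequency.getOrDefault(token, 0);
    }

    // Inverse document frequency log(N / df); unseen tokens use df = 1 (assumption).
    double idf(String token) {
        int df = Math.max(1, getDocumentFrequency(token));
        return Math.log((double) collectionSize / df);
    }

    public static void main(String[] args) {
        DocFrequencyTable table = new DocFrequencyTable();
        table.train(List.of("acme corp", "acme inc", "widget corp corp"));
        System.out.println(table.getCollectionSize());          // 3
        System.out.println(table.getDocumentFrequency("corp")); // 2: repeats within one string count once
        System.out.println(table.idf("acme"));                  // natural log of 3/2, about 0.405
    }
}
```

Each token is counted at most once per string. That is what separates document frequency from a raw count of token occurrences.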
public int getVocabularySize()
public String explainScore(StringWrapper s, StringWrapper t)
Declared in: explainScore in interface StringDistance; explainScore in class AbstractStringDistance.
public static void main(String[] argv)
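explainScore is documented only as explaining how the distance was computed. For a dot-product score, one natural explanation lists each shared token's contribution, followed by the total. The sketch below takes unit-length token-to-weight maps as input. ScoreExplainer and its output format are hypothetical and do not reproduce what this method actually prints.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: explain a dot-product score between two unit-length
// token -> weight maps by listing each shared token's contribution.
// The output format is invented for illustration.
final class ScoreExplainer {

    static String explain(Map<String, Double> s, Map<String, Double> t) {
        StringBuilder out = new StringBuilder();
        double total = 0.0;
        // A TreeMap copy gives alphabetical, reproducible token order.
        for (Map.Entry<String, Double> e : new TreeMap<>(s).entrySet()) {
            Double tw = t.get(e.getKey());
            if (tw == null) {
                continue; // tokens absent from t contribute nothing
            }
            double contribution = e.getValue() * tw;
            total += contribution;
            out.append(String.format(Locale.ROOT, "%s: %.3f * %.3f = %.3f%n",
                    e.getKey(), e.getValue(), tw, contribution));
        }
        out.append(String.format(Locale.ROOT, "score = %.3f%n", total));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> s = Map.of("william", 0.6, "cohen", 0.8);
        Map<String, Double> t = Map.of("cohen", 1.0);
        System.out.print(explain(s, t));
    }
}
```

Running main prints the line "cohen: 0.800 * 1.000 = 0.800" followed by "score = 0.800". The token "william" is omitted because it has no counterpart in t.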
Copyright © 2016. All rights reserved.