public class SourcedTFIDF extends AbstractSourcedStatisticalTokenDistance
| Modifier and Type | Class and Description |
|---|---|
| protected class | SourcedTFIDF.UnitVector: Marker class extending BagOfTokens |
Inherited fields:
collectionSize, documentFrequency, totalTokenCount
tokenizer

| Constructor and Description |
|---|
| SourcedTFIDF() |
| SourcedTFIDF(SourcedTokenizer tokenizer) |
| Modifier and Type | Method and Description |
|---|---|
| protected SourcedTFIDF.UnitVector | asUnitVector(SourcedStringWrapper w) |
| String | explainScore(StringWrapper s, StringWrapper t): Explain how the distance was computed. |
| int | getCollectionSize() |
| int | getDocumentFrequency(Token token): Get the document frequency of the token. |
| Token[] | getTokens(): Access the tokens of the last prepare()-ed string. |
| double | getWeight(Token token): Access the weight of a token in the vector created for the last prepare()-ed string. |
| static void | main(String[] argv) |
| StringWrapper | prepare(String s): Preprocess a string by finding tokens and giving them TFIDF weights. |
| double | score(StringWrapper s0, StringWrapper t0): This method needs to be implemented by subclasses. |
| void | setCollectionSize(int n): Set the size of the collection that this TFIDF measure was trained on to some value. |
| void | setDocumentFrequency(Token token, int df): Set the document frequency of the token to some value. |
| String | toString() |

Inherited methods:
checkTrainingHasHappened, train
asBagOfSourcedTokens, prepare, setStringWrapperPool
addExample, doMain, explainScore, getDistance, hasNextQuery, nextQuery, prepare, score, setDistanceInstancePool

public SourcedTFIDF(SourcedTokenizer tokenizer)
public SourcedTFIDF()
public double score(StringWrapper s0, StringWrapper t0)
Description copied from class: AbstractStringDistance
Specified by: score in interface StringDistance
Overrides: score in class AbstractStringDistance
protected SourcedTFIDF.UnitVector asUnitVector(SourcedStringWrapper w)
public StringWrapper prepare(String s)
Specified by: prepare in interface StringDistance
Overrides: prepare in class AbstractStringDistance
public Token[] getTokens()
public double getWeight(Token token)
public int getDocumentFrequency(Token token)
Overrides: getDocumentFrequency in class AbstractSourcedStatisticalTokenDistance
public void setDocumentFrequency(Token token, int df)
public int getCollectionSize()
public void setCollectionSize(int n)
public String explainScore(StringWrapper s, StringWrapper t)
Specified by: explainScore in interface StringDistance
Overrides: explainScore in class AbstractStringDistance
public static void main(String[] argv)
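
The summaries above say that prepare() gives tokens TFIDF weights and that the class tracks a collection size and per-token document frequencies, but they do not spell out the arithmetic. The following standalone sketch illustrates one common way such a measure can work; it is not the SourcedTFIDF implementation and does not use this library's classes. The class name TfidfSketch, the whitespace tokenizer, the weighting formula log(tf + 1) * log(N / df), and the handling of unseen tokens are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative TF-IDF cosine similarity; hypothetical, not the library's code. */
class TfidfSketch {
    // Number of training strings that contain each token.
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    // Number of training strings seen so far.
    private int collectionSize = 0;

    /** Hypothetical tokenizer: lowercase, then split on whitespace. */
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Accumulate document frequencies and the collection size from a corpus. */
    void train(List<String> corpus) {
        for (String doc : corpus) {
            Set<String> distinct = new HashSet<>(tokenize(doc));
            for (String tok : distinct) {
                documentFrequency.merge(tok, 1, Integer::sum);
            }
            collectionSize++;
        }
    }

    /** Turn a string into a unit-length vector of TF-IDF weights. */
    Map<String, Double> prepare(String s) {
        Map<String, Integer> termFrequency = new HashMap<>();
        for (String tok : tokenize(s)) {
            termFrequency.merge(tok, 1, Integer::sum);
        }
        int n = Math.max(1, collectionSize); // avoid log(0) when untrained
        Map<String, Double> weights = new HashMap<>();
        double sumSquares = 0.0;
        for (Map.Entry<String, Integer> e : termFrequency.entrySet()) {
            // Assumption: tokens never seen in training are treated as df = 1.
            int df = Math.max(1, documentFrequency.getOrDefault(e.getKey(), 0));
            // Assumed weighting: log(tf + 1) * log(N / df).
            double w = Math.log(e.getValue() + 1.0) * Math.log((double) n / df);
            weights.put(e.getKey(), w);
            sumSquares += w * w;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm > 0.0) {
            weights.replaceAll((tok, w) -> w / norm); // scale to unit length
        }
        return weights;
    }

    /** Dot product of two unit vectors, i.e. their cosine similarity. */
    double score(Map<String, Double> s, Map<String, Double> t) {
        double sim = 0.0;
        for (Map.Entry<String, Double> e : s.entrySet()) {
            Double other = t.get(e.getKey());
            if (other != null) {
                sim += e.getValue() * other;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        TfidfSketch m = new TfidfSketch();
        m.train(Arrays.asList("william cohen", "william smith", "john smith", "mary jones"));
        System.out.println(m.score(m.prepare("william cohen"), m.prepare("john cohen")));
    }
}
```

In this sketch a token that occurs in every training string gets weight zero, so it contributes nothing to the score, while rarer tokens dominate. Overriding the stored document frequencies or the collection size, as the setters listed above allow for the real class, would shift these weights in the same way.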
Copyright © 2016. All rights reserved.