| Class | Description |
|---|---|
| AbbreviationAlignment | Abbreviation distance metric which evaluates the probability of a short-form string being an abbreviation/acronym of another long-form string. |
| AbstractSourcedStatisticalTokenDistance | Abstract token distance metric that uses frequency statistics. |
| AbstractSourcedTokenizedStringDistance | Abstract distance metric for tokenized strings. |
| AbstractStatisticalTokenDistance | Abstract token distance metric that uses frequency statistics. |
| AbstractStringDistance | Abstract class which implements StringDistanceLearner as well as StringDistance. |
| AbstractTokenizedStringDistance | Abstract distance metric for tokenized strings. |
| AffineGap | Affine-gap string distance, following Durbin et al. |
| ApproxMemoMatrix | Variant of MemoMatrix that only stores values near the diagonal, for better efficiency. |
| ApproxNeedlemanWunsch | Needleman-Wunsch string distance, following Durbin et al. |
| AveragedStringDistanceLearner | Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. |
| BasicDistanceInstanceIterator | A simple DistanceInstanceIterator implementation. |
| BasicSourcedStringWrapperIterator | A simple StringWrapperIterator implementation. |
| BasicStringWrapper | An extensible (non-final) class that implements some of the functionality of a string. |
| BasicStringWrapperIterator | A simple StringWrapperIterator implementation. |
| CharMatchScore | Abstract distance between characters. |
| CombinedStringDistanceLearner | Abstract StringDistanceLearner class which combines results of a number of inner distance metrics, learned by a number of inner distance learners. |
| DirichletJS | Jensen-Shannon distance of two unigram language models, smoothed using a Dirichlet prior. |
| DistanceLearnerFactory | Creates distance metric learners from string descriptions. |
| Jaccard | Jaccard distance implementation. |
| Jaro | Jaro distance metric. |
| JaroTFIDF | Soft TFIDF-based distance metric, extended to use "soft" token-matching with the Jaro distance metric. |
| JaroWinkler | Jaro distance metric, as extended by Winkler. |
| JaroWinklerTFIDF | Soft TFIDF-based distance metric, extended to use "soft" token-matching with the JaroWinkler distance metric. |
| JelinekMercerJS | Jensen-Shannon distance of two unigram language models, smoothed using a Jelinek-Mercer mixture model. |
| JensenShannonDistance | Distance metrics based on Jensen-Shannon distance of two smoothed unigram language models. |
| Level2 | Generic version of Monge & Elkan's "level 2" recursive field matching. |
| Level2Jaro | "Level 2" recursive field matching algorithm, based on Jaro distance. |
| Level2JaroWinkler | "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. |
| Level2Levenstein | "Level 2" recursive field matching algorithm using Levenshtein distance. |
| Level2MongeElkan | Monge & Elkan's "level 2" recursive field matching algorithm. |
| Levenstein | Levenshtein string distance. |
| MemoMatrix | A matrix of doubles, defined recursively by the compute(i,j) method, that will not be recomputed more than necessary. |
| Mixture | Mixture-based distance metric. |
| MongeElkan | The match method proposed by Monge and Elkan. |
| MongeElkanTFIDF | Soft TFIDF-based distance metric, extended to use "soft" token-matching with the MongeElkan distance metric. |
| MultiStringAvgDistance | StringDistance defined over Strings that are broken into fields, with distance defined as the average distance between any field. |
| MultiStringDistance | Abstract StringDistance defined over Strings that are broken into fields. |
| MultiStringWrapper | A StringWrapper that stores a version of the string that has been either (a) split into a number of distinct fields, or (b) duplicated k times, so that k different StringDistances can preprocess it, or (c) both of the above. |
| NeedlemanWunsch | Needleman-Wunsch string distance, following Durbin et al. |
| PrintfFormat | PrintfFormat allows the formatting of an array of objects embedded within a string. |
| ScaledLevenstein | Levenshtein string distance. |
| SmithWaterman | Smith-Waterman string distance, following Durbin et al. |
| SoftTFIDF | TFIDF-based distance metric, extended to use "soft" token-matching. |
| SoftTokenFelligiSunter | Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| SourcedSoftTFIDF | TFIDF-based distance metric, extended to use "soft" token-matching. |
| SourcedTFIDF | Sourced-based distance metric. |
| TagLink | |
| TagLink.Candidates | |
| TFIDF | TFIDF-based distance metric. |
| TokenFelligiSunter | Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
| UnsmoothedJS | Jensen-Shannon distance of two unsmoothed unigram language models. |
| WinklerRescorer | Winkler's reweighting scheme for distance metrics. |

A StringDistance is the basic class
for computing distances. The score() function of this class outputs a
distance measure between its two arguments. The other methods are
there for efficiency, so that preprocessing steps (like tokenization)
can be amortized over multiple comparisons with the same string.

Preprocessing steps are saved by creating a StringWrapper object, which contains the
preprocessed string plus whatever else needs to be cached. To do
this, extend the default implementation of StringWrapper.
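
The amortization idea described above can be illustrated with a small self-contained sketch. It does not use this package's classes: `TokenWrapperSketch`, `JaccardSketch`, and their `prepare`/`score`/`tokens` methods are hypothetical names chosen for illustration, and the whitespace tokenizer and Jaccard-style score are assumptions, not the library's implementation. The point is only that the wrapper tokenizes a string once and the cached result is reused across many comparisons.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical wrapper (not the package's StringWrapper): caches a token set. */
class TokenWrapperSketch {
    private final String original;
    private final Set<String> tokens;

    TokenWrapperSketch(String s) {
        this.original = s;
        // Preprocessing happens once, here: lowercase and split on whitespace.
        this.tokens = new HashSet<>(Arrays.asList(s.trim().toLowerCase().split("\\s+")));
    }

    String unwrap() { return original; }

    Set<String> tokens() { return tokens; }
}

/** Hypothetical metric: Jaccard overlap of the cached token sets, in [0, 1]. */
class JaccardSketch {
    TokenWrapperSketch prepare(String s) {
        return new TokenWrapperSketch(s);
    }

    // Scores two already-prepared strings without re-tokenizing either one.
    double score(TokenWrapperSketch s, TokenWrapperSketch t) {
        Set<String> intersection = new HashSet<>(s.tokens());
        intersection.retainAll(t.tokens());
        int unionSize = s.tokens().size() + t.tokens().size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Convenience form: prepares both arguments on every call.
    double score(String s, String t) {
        return score(prepare(s), prepare(t));
    }
}

class WrapperSketchDemo {
    public static void main(String[] args) {
        JaccardSketch metric = new JaccardSketch();
        // The query is tokenized once and compared against several candidates.
        TokenWrapperSketch query = metric.prepare("William W. Cohen");
        for (String candidate : new String[] {"william cohen", "W. Cohen", "Stephen Fienberg"}) {
            System.out.println(candidate + " -> " + metric.score(query, metric.prepare(candidate)));
        }
    }
}
```

In this sketch the two-`String` overload of `score` is the simple entry point, while the wrapper overload is the one that avoids repeated preprocessing when the same string is compared many times.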

Almost everything in this package implements StringDistance. The only
(public) exceptions are StringWrapper; PrintfFormat, pilfered from Sun
to make the explanations easier; CharMatchScore, which is a character-based
distance metric; and MemoMatrix, a
utility for defining edit-distance-based methods.
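
The class table describes MemoMatrix as a matrix of doubles defined recursively by a compute(i,j) method and never recomputed more than necessary. A minimal sketch of that pattern follows, assuming a plain unit-cost edit distance. `MemoMatrixSketch` and `EditDistanceSketch` are hypothetical names, and the constructor, `get`, and `distance` methods are assumptions for illustration; the package's actual MemoMatrix API may differ.

```java
/** Hypothetical memoized matrix: each cell is computed on first access only. */
abstract class MemoMatrixSketch {
    protected final String s;
    protected final String t;
    private final double[][] value;
    private final boolean[][] computed;

    MemoMatrixSketch(String s, String t) {
        this.s = s;
        this.t = t;
        this.value = new double[s.length() + 1][t.length() + 1];
        this.computed = new boolean[s.length() + 1][t.length() + 1];
    }

    // Returns the cached cell, calling compute(i, j) only the first time.
    double get(int i, int j) {
        if (!computed[i][j]) {
            value[i][j] = compute(i, j);
            computed[i][j] = true;
        }
        return value[i][j];
    }

    // Subclasses define a cell in terms of other cells, fetched via get().
    abstract double compute(int i, int j);
}

/** Unit-cost edit distance expressed as a recursive cell definition. */
class EditDistanceSketch extends MemoMatrixSketch {
    EditDistanceSketch(String s, String t) {
        super(s, t);
    }

    @Override
    double compute(int i, int j) {
        if (i == 0) return j; // insert the first j characters of t
        if (j == 0) return i; // delete the first i characters of s
        double substitute = get(i - 1, j - 1) + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
        double delete = get(i - 1, j) + 1;
        double insert = get(i, j - 1) + 1;
        return Math.min(substitute, Math.min(delete, insert));
    }

    double distance() {
        return get(s.length(), t.length());
    }
}
```

Because the recursion in this sketch goes as deep as the combined length of the two strings, it is suited to short strings such as names; a production version would likely fill the matrix iteratively.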
Copyright © 2016. All rights reserved.