Package de.undercouch.citeproc.helper
Class Levenshtein
java.lang.Object
    de.undercouch.citeproc.helper.Levenshtein
public class Levenshtein extends java.lang.Object

Uses LevenshteinDistance.apply(CharSequence, CharSequence) to calculate the edit distance between two strings. Provides useful helper methods to traverse a set of strings and select the most similar ones to a given input string.

Author:
- Michel Kraemer
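
The edit distance mentioned in the class description can be illustrated with a plain dynamic-programming routine. This is a minimal sketch for orientation only, not the LevenshteinDistance implementation the class delegates to; the names EditDistanceSketch and distance are hypothetical.

```java
// Illustrative stand-in, NOT the library's code: classic two-row
// dynamic-programming Levenshtein distance.
class EditDistanceSketch {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Keeping only two rows holds memory at O(length of b) while the running time stays proportional to the product of both lengths.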
Constructor Summary
- Levenshtein()
Method Summary
- static <T extends java.lang.CharSequence> T findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t): Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t.
- static <T extends java.lang.CharSequence> java.util.Collection<T> findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t, int n, int threshold): Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t.
- static <T extends java.lang.CharSequence> java.util.Collection<T> findSimilar(java.util.Collection<T> ss, java.lang.CharSequence t): Searches the given collection of strings and returns a collection of strings similar to a given string t.
Method Detail
findMinimum
public static <T extends java.lang.CharSequence> T findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t)

Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t. If the collection contains multiple strings with the same distance to t, only the first one will be returned.

Type Parameters:
- T - the type of the strings in the given collection
Parameters:
- ss - the collection to search
- t - the second string
Returns:
- the string with the lowest Levenshtein distance
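
The "first one wins" rule for ties can be sketched as a single pass with a strict comparison. This is an assumption-laden illustration rather than the library's source: FindMinimumSketch and its helper distance are hypothetical names, and returning null for an empty collection is a choice made here, since the documentation does not describe that case.

```java
import java.util.Arrays;
import java.util.Collection;

// Illustrative sketch, NOT the library's code.
class FindMinimumSketch {

    // Hypothetical helper: two-row dynamic-programming edit distance.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the first element with the smallest distance to t.
    // Assumption: an empty collection yields null.
    static <T extends CharSequence> T findMinimum(Collection<T> ss, CharSequence t) {
        T best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (T s : ss) {
            int d = distance(s, t);
            // Strict '<' means a later element with an equal distance
            // never replaces an earlier one.
            if (d < bestDistance) {
                bestDistance = d;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "cat" and "bat" are both one edit away from "hat";
        // the earlier one is kept.
        System.out.println(findMinimum(Arrays.asList("cat", "bat", "dog"), "hat")); // prints cat
    }
}
```

Which element counts as "first" depends on the iteration order of the collection, so an unordered collection such as a HashSet gives no stable tie-breaking in this sketch.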
findMinimum
public static <T extends java.lang.CharSequence> java.util.Collection<T> findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t, int n, int threshold)

Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t. The returned collection is sorted by distance, with the closest string at the first position.

Type Parameters:
- T - the type of the strings in the given collection
Parameters:
- ss - the collection to search
- t - the string to compare to
- n - the maximum number of strings to return
- threshold - a threshold for individual item distances. Only items with a distance below this threshold will be included in the result.
Returns:
- the strings with the lowest Levenshtein distance
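
The combination of threshold, ordering, and the limit n can be sketched as filter, sort, then truncate. Again this is only a sketch under stated assumptions, not the library's implementation: FindClosestSketch and distance are hypothetical names, ties keep their input order because List.sort is stable, and a negative n is treated as zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch, NOT the library's code.
class FindClosestSketch {

    // Hypothetical helper: two-row dynamic-programming edit distance.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static <T extends CharSequence> List<T> findMinimum(
            Collection<T> ss, CharSequence t, int n, int threshold) {
        // 1. Keep only items whose distance is strictly below the threshold.
        List<T> candidates = new ArrayList<>();
        for (T s : ss) {
            if (distance(s, t) < threshold) {
                candidates.add(s);
            }
        }
        // 2. Sort ascending by distance (stable, so ties keep input order).
        //    The distance is recomputed here for brevity; caching it
        //    would avoid the repeated work.
        candidates.sort(Comparator.comparingInt((T s) -> distance(s, t)));
        // 3. Return at most n items.
        int limit = Math.max(0, Math.min(n, candidates.size()));
        return new ArrayList<>(candidates.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cart", "dog", "bat", "cat");
        // Distances to "cat": cart=1, dog=3, bat=1, cat=0.
        System.out.println(findMinimum(words, "cat", 2, 2)); // prints [cat, cart]
    }
}
```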
findSimilar
public static <T extends java.lang.CharSequence> java.util.Collection<T> findSimilar(java.util.Collection<T> ss, java.lang.CharSequence t)

Searches the given collection of strings and returns a collection of strings similar to a given string t. Uses reasonable default values for human-readable strings. The returned collection is sorted by similarity, with the best match at the first position.

Type Parameters:
- T - the type of the strings in the given collection
Parameters:
- ss - the collection to search
- t - the string to compare to
Returns:
- a collection with similar strings