Class Levenshtein


  • public class Levenshtein
    extends java.lang.Object
    Uses LevenshteinDistance.apply(CharSequence, CharSequence) to calculate the edit distance between two strings. Provides static helper methods that search a collection of strings and select the ones most similar to a given input string.
    Author:
    Michel Kraemer
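The edit distance computed by LevenshteinDistance.apply is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. For illustration, here is a minimal sketch of the textbook two-row dynamic-programming computation. EditDistanceSketch is a hypothetical name, and this is not the implementation that Levenshtein delegates to.

```java
// Hypothetical illustration class; not part of the documented API.
final class EditDistanceSketch {

    private EditDistanceSketch() {}

    /**
     * Textbook Levenshtein distance: the minimum number of single-character
     * insertions, deletions and substitutions that turn s into t.
     * Only two rows of the dynamic-programming table are kept in memory.
     */
    static int distance(CharSequence s, CharSequence t) {
        int n = t.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        // Row 0: building t[0..j) from an empty prefix of s takes j insertions.
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            // Column 0: reducing s[0..i) to the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // delete s[i - 1]
                                 curr[j - 1] + 1), // insert t[j - 1]
                        prev[j - 1] + cost);       // substitute, or keep a match
            }
            // The current row becomes the previous row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n];
    }
}
```

For example, distance("kitten", "sitting") is 3: substitute k with s, substitute e with i, and append g.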
    • Constructor Summary

      Constructor       Description
      Levenshtein()
    • Method Summary

      Modifier and Type      Method      Description

      static <T extends java.lang.CharSequence> T
      findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t)
      Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t.

      static <T extends java.lang.CharSequence> java.util.Collection<T>
      findMinimum(java.util.Collection<T> ss, java.lang.CharSequence t, int n, int threshold)
      Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t.

      static <T extends java.lang.CharSequence> java.util.Collection<T>
      findSimilar(java.util.Collection<T> ss, java.lang.CharSequence t)
      Searches the given collection of strings and returns a collection of strings similar to a given string t.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Levenshtein

        public Levenshtein()
    • Method Detail

      • findMinimum

        public static <T extends java.lang.CharSequence> T findMinimum(java.util.Collection<T> ss,
                                                                       java.lang.CharSequence t)
        Searches the given collection of strings and returns the string that has the lowest Levenshtein distance to a given second string t. If the collection contains multiple strings with the same distance to t, only the first one (in the collection's iteration order) is returned.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the string to compare to
        Returns:
        the string with the lowest Levenshtein distance
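The behavior described above (a single result, ties resolved in favor of the first string) can be illustrated with a linear scan. This is a minimal sketch under stated assumptions, not the library's source: FindMinimumSketch and its distance helper are hypothetical names, the distance is the textbook dynamic-programming Levenshtein distance rather than a call to LevenshteinDistance, and returning null for an empty collection is an assumption, since the documentation does not cover that case.

```java
import java.util.Collection;

// Hypothetical sketch of findMinimum(ss, t); not the library's source.
final class FindMinimumSketch {

    private FindMinimumSketch() {}

    // Two-row dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int distance(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Linear scan. The strict '<' means that, among strings with equal distance,
    // the first one in the collection's iteration order is kept.
    // Assumption: returns null for an empty collection (not specified by the docs).
    static <T extends CharSequence> T findMinimum(Collection<T> ss, CharSequence t) {
        T best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (T s : ss) {
            int d = distance(s, t);
            if (d < bestDistance) {
                best = s;
                bestDistance = d;
            }
        }
        return best;
    }
}
```

With the candidates "cat", "dog" and "cot" and the input "cut", both "cat" and "cot" are one substitution away, so the sketch returns "cat" because it comes first.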
      • findMinimum

        public static <T extends java.lang.CharSequence> java.util.Collection<T> findMinimum(java.util.Collection<T> ss,
                                                                                             java.lang.CharSequence t,
                                                                                             int n,
                                                                                             int threshold)
        Searches the given collection of strings and returns a collection of at most n strings that have the lowest Levenshtein distance to a given string t. The returned collection is sorted by ascending distance, so the string with the lowest distance comes first.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the string to compare to
        n - the maximum number of strings to return
        threshold - a threshold for individual distances. Only strings whose distance to t is strictly less than this threshold are included in the result.
        Returns:
        the strings with the lowest Levenshtein distance
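One straightforward way to obtain the documented behavior (filter by threshold, sort by ascending distance, keep at most n) is sketched below. FindMinimumNSketch, Scored and distance are hypothetical names, and this is not the library's source. How the real method orders strings with equal distance is not documented, so keeping the collection's iteration order for ties is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of findMinimum(ss, t, n, threshold); not the library's source.
final class FindMinimumNSketch {

    private FindMinimumNSketch() {}

    // A candidate string paired with its distance to t.
    private static final class Scored<T> {
        final T value;
        final int distance;

        Scored(T value, int distance) {
            this.value = value;
            this.distance = distance;
        }
    }

    // Two-row dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int distance(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    static <T extends CharSequence> Collection<T> findMinimum(
            Collection<T> ss, CharSequence t, int n, int threshold) {
        // Keep only candidates whose distance is strictly below the threshold.
        List<Scored<T>> scored = new ArrayList<>();
        for (T s : ss) {
            int d = distance(s, t);
            if (d < threshold) {
                scored.add(new Scored<>(s, d));
            }
        }
        // ArrayList.sort is stable, so equal distances keep the iteration order
        // (assumed tie-breaking; the documentation does not specify it).
        scored.sort((a, b) -> Integer.compare(a.distance, b.distance));
        // Return at most n strings, closest first.
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, scored.size()); i++) {
            result.add(scored.get(i).value);
        }
        return result;
    }
}
```

Sorting every candidate that passes the threshold is the simplest choice; for very large collections, a bounded priority queue of size n would avoid sorting strings that can never make it into the result.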
      • findSimilar

        public static <T extends java.lang.CharSequence> java.util.Collection<T> findSimilar(java.util.Collection<T> ss,
                                                                                             java.lang.CharSequence t)
        Searches the given collection of strings and returns a collection of strings similar to a given string t, using default values that are reasonable for human-readable strings. The returned collection is sorted by similarity, with the best match first.
        Type Parameters:
        T - the type of the strings in the given collection
        Parameters:
        ss - the collection to search
        t - the string to compare to
        Returns:
        a collection of similar strings
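The documentation does not state which defaults findSimilar uses. The sketch below only shows the general shape of such a method, assuming it filters and ranks candidates like the (n, threshold) variant of findMinimum. MAX_RESULTS = 5 and a threshold of max(2, length of t / 2) are invented values for illustration, not the library's, and FindSimilarSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of findSimilar(ss, t). The defaults below are invented
// for illustration; they are NOT the values used by the documented class.
final class FindSimilarSketch {

    private FindSimilarSketch() {}

    // Assumed default: return at most this many suggestions.
    static final int MAX_RESULTS = 5;

    // Assumed default: scale the threshold with the length of t so that
    // short inputs tolerate fewer edits than long ones.
    static int threshold(CharSequence t) {
        return Math.max(2, t.length() / 2);
    }

    // Two-row dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int distance(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    static <T extends CharSequence> Collection<T> findSimilar(Collection<T> ss, CharSequence t) {
        int limit = threshold(t);
        // Pair each candidate below the threshold with its distance to t.
        List<Map.Entry<T, Integer>> scored = new ArrayList<>();
        for (T s : ss) {
            int d = distance(s, t);
            if (d < limit) {
                scored.add(Map.entry(s, d));
            }
        }
        // Best match first; ArrayList.sort is stable, so ties keep iteration order.
        scored.sort(Map.Entry.comparingByValue());
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(MAX_RESULTS, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }
}
```

Scaling the threshold with the length of t is one way to keep short inputs from matching almost everything: a string within two edits of a three-letter word is usually a weak suggestion.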