Class OptimaizeLangDetector

java.lang.Object
org.apache.tika.language.detect.LanguageDetector
org.apache.tika.langdetect.optimaize.OptimaizeLangDetector

public class OptimaizeLangDetector extends org.apache.tika.language.detect.LanguageDetector
Implementation of the LanguageDetector API that uses the Optimaize language-detector library (https://github.com/optimaize/language-detector).
  • Field Details

    • DEFAULT_MAX_CHARS_FOR_DETECTION

      public static final int DEFAULT_MAX_CHARS_FOR_DETECTION
    • DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION

      public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
  • Constructor Details

    • OptimaizeLangDetector

      public OptimaizeLangDetector()
    • OptimaizeLangDetector

      public OptimaizeLangDetector(int maxCharsForDetection)
  • Method Details

    • loadModels

      public org.apache.tika.language.detect.LanguageDetector loadModels()
      Specified by:
      loadModels in class org.apache.tika.language.detect.LanguageDetector
    • loadModels

      public org.apache.tika.language.detect.LanguageDetector loadModels(Set<String> languages) throws IOException
      Specified by:
      loadModels in class org.apache.tika.language.detect.LanguageDetector
      Throws:
      IOException
    • hasModel

      public boolean hasModel(String language)
      Specified by:
      hasModel in class org.apache.tika.language.detect.LanguageDetector
    • setPriors

      public org.apache.tika.language.detect.LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException
      Specified by:
      setPriors in class org.apache.tika.language.detect.LanguageDetector
      Throws:
      IOException
    • reset

      public void reset()
      Specified by:
      reset in class org.apache.tika.language.detect.LanguageDetector
    • addText

      public void addText(char[] cbuf, int off, int len)
      Specified by:
      addText in class org.apache.tika.language.detect.LanguageDetector
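
The (cbuf, off, len) signature mirrors java.io.Reader.read, so text can be streamed into a detector chunk by chunk instead of being held in one String. The following is a minimal sketch of that pattern, not code from Tika: CharSink is a hypothetical interface standing in for any object that exposes addText(char[], int, int), and ChunkFeeder and feed are illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ChunkFeeder {
    // Hypothetical stand-in for anything exposing addText(char[], int, int).
    interface CharSink {
        void addText(char[] cbuf, int off, int len);
    }

    // Reads the reader to exhaustion and forwards each chunk to the sink.
    // Returns the total number of chars forwarded.
    static int feed(Reader reader, CharSink sink, int bufferSize) {
        char[] cbuf = new char[bufferSize];
        int total = 0;
        try {
            int n;
            while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
                // Only the first n chars of cbuf are valid for this chunk.
                sink.addText(cbuf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        StringBuilder collected = new StringBuilder();
        int count = feed(new StringReader("hello world"), collected::append, 4);
        System.out.println(count + " chars: " + collected);
    }
}
```

With a real detector instance, a method reference such as detector::addText could be passed as the sink, assuming the overload resolves to the (char[], int, int) variant.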
    • detectAll

      public List<org.apache.tika.language.detect.LanguageResult> detectAll()
      Specified by:
      detectAll in class org.apache.tika.language.detect.LanguageDetector
      Returns:
      the list of detected languages
      Throws:
      IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)
    • hasEnoughText

      public boolean hasEnoughText()
      Overrides:
      hasEnoughText in class org.apache.tika.language.detect.LanguageDetector
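
The method list above implies a call order: load models, add text, detect, then reset before reusing the instance, with detectAll() throwing IllegalStateException when no models have been loaded. The sketch below illustrates only that contract. ToyLangDetector is a hypothetical stand-in, not the Tika class: its "models" are single marker words invented for the example, it performs no statistical detection, and it returns language codes as plain strings rather than LanguageResult objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for illustration only; NOT the Tika implementation.
class ToyLangDetector {
    // Hypothetical "models": one marker word per language code.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("en", "the");
        MARKERS.put("de", "der");
        MARKERS.put("fr", "les");
    }

    private final Map<String, String> loaded = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean modelsLoaded = false;

    // Returns this so calls can be chained, as the documented signatures suggest.
    ToyLangDetector loadModels(Set<String> languages) {
        for (Map.Entry<String, String> e : MARKERS.entrySet()) {
            if (languages.contains(e.getKey())) {
                loaded.put(e.getKey(), e.getValue());
            }
        }
        modelsLoaded = true;
        return this;
    }

    boolean hasModel(String language) {
        return loaded.containsKey(language);
    }

    void addText(char[] cbuf, int off, int len) {
        text.append(cbuf, off, len);
    }

    // Discards accumulated text but keeps the loaded models.
    void reset() {
        text.setLength(0);
    }

    List<String> detectAll() {
        if (!modelsLoaded) {
            throw new IllegalStateException("no models loaded; call loadModels first");
        }
        String padded = " " + text.toString().toLowerCase(Locale.ROOT) + " ";
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : loaded.entrySet()) {
            if (padded.contains(" " + e.getValue() + " ")) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyLangDetector detector =
                new ToyLangDetector().loadModels(new HashSet<>(Arrays.asList("en", "de")));
        char[] buf = "the cat sat".toCharArray();
        detector.addText(buf, 0, buf.length);
        System.out.println(detector.detectAll());
        detector.reset(); // ready for the next document
    }
}
```

Keeping the models across reset() is a choice made for this sketch so that one instance can process several documents; whether the real class behaves the same way is not stated in this page.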