Package org.apache.tika.language
Class LanguageIdentifier
java.lang.Object
org.apache.tika.language.LanguageIdentifier
Deprecated.
Identifier of the language that best matches a given content profile.
The content profile is compared to generic language profiles based on
material from various sources.
- Since:
- Apache Tika 0.5
Constructor Summary
Constructors
- LanguageIdentifier(String content): Deprecated. Constructs a language identifier based on a String of text content.
- LanguageIdentifier(LanguageProfile profile): Deprecated. Constructs a language identifier based on a LanguageProfile.
-
Method Summary
Methods (modifier and type, method, description)
- static void addProfile(String language, LanguageProfile profile): Deprecated. Adds a single language profile.
- static void clearProfiles(): Deprecated. Clears the current map of language profiles.
- static String getErrors(): Deprecated. Returns a string of error messages related to initializing language profiles.
- String getLanguage(): Deprecated. Gets the identified language.
- static Set<String> getSupportedLanguages(): Deprecated. Returns what languages are supported for language identification.
- static boolean hasErrors(): Deprecated. Tests whether there were errors initializing language config.
- static void initProfiles(): Deprecated. Builds the language profiles.
- static void initProfiles(Map<String, LanguageProfile> profilesMap): Deprecated. Initializes the language profiles from a user supplied initialized Map.
- boolean isReasonablyCertain(): Deprecated. Tries to judge whether the identification is certain enough to be trusted.
- String toString(): Deprecated.
-
Constructor Details
-
LanguageIdentifier
Deprecated. Constructs a language identifier based on a LanguageProfile.
- Parameters:
- profile - the language profile
-
LanguageIdentifier
Deprecated. Constructs a language identifier based on a String of text content.
- Parameters:
- content - the text
-
-
Method Details
-
addProfile
Deprecated. Adds a single language profile.
- Parameters:
- language - an ISO 639 code representing the language
- profile - the language profile
-
getLanguage
Deprecated. Gets the identified language.
- Returns:
- an ISO 639 code representing the detected language
-
isReasonablyCertain
public boolean isReasonablyCertain()
Deprecated. Tries to judge whether the identification is certain enough to be trusted. WARNING: Will never return true for small amounts of input text.
- Returns:
- true if the distance is smaller than 0.022, false otherwise
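
The return rule above can be illustrated with a minimal stand-alone sketch. It is not Tika source code: the class name CertaintySketch and the helper taking a precomputed distance are hypothetical, and only the 0.022 threshold comes from the description above. How the distance itself is computed is not covered here.

```java
// Stand-alone illustration; CertaintySketch is a hypothetical name, not a Tika class.
final class CertaintySketch {

    // Threshold quoted in the isReasonablyCertain() description.
    static final double CERTAINTY_THRESHOLD = 0.022;

    // Assumption: the distance between the content profile and the best-matching
    // language profile has already been computed elsewhere.
    static boolean isReasonablyCertain(double distance) {
        // Strictly smaller than the threshold counts as certain.
        return distance < CERTAINTY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isReasonablyCertain(0.010)); // prints true
        System.out.println(isReasonablyCertain(0.050)); // prints false
    }
}
```

A distance exactly equal to 0.022 is treated as not certain in this sketch, following the "smaller than" wording.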
-
initProfiles
public static void initProfiles()
Deprecated. Builds the language profiles. The list of languages is fetched from a property file named "tika.language.properties". If a file called "tika.language.override.properties" is found on the classpath, it is used instead. The property file contains a key "languages" whose value is a comma-separated list of language codes.
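
The property-file format described above (a "languages" key holding comma-separated codes) can be read with the standard java.util.Properties class. The following is a minimal sketch of that parsing step only, not Tika's own loading code: the class name LanguageListSketch and the method parseLanguages are hypothetical, and the sample content "languages=en, fr,de" is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Stand-alone illustration; LanguageListSketch is a hypothetical name, not a Tika class.
final class LanguageListSketch {

    // Parses property-file text and returns the codes listed under the "languages" key.
    static List<String> parseLanguages(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> codes = new ArrayList<>();
        // A missing key is treated as an empty list (an assumption of this sketch).
        for (String code : props.getProperty("languages", "").split(",")) {
            String trimmed = code.trim();
            if (!trimmed.isEmpty()) {
                codes.add(trimmed);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // Sample content is invented for the example.
        System.out.println(parseLanguages("languages=en, fr,de")); // prints [en, fr, de]
    }
}
```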
initProfiles
Deprecated. Initializes the language profiles from a user-supplied, initialized Map. This overrides the default set of profiles initialized at startup and provides an alternative to configuring profiles through a property file.
- Parameters:
- profilesMap - map of language profiles
-
clearProfiles
public static void clearProfiles()
Deprecated. Clears the current map of language profiles.
-
hasErrors
public static boolean hasErrors()
Deprecated. Tests whether there were errors initializing the language configuration.
- Returns:
- true if there are errors. Use getErrors() to retrieve.
-
getErrors
Deprecated. Returns a string of error messages related to initializing language profiles.
- Returns:
- the String containing the error messages
-
getSupportedLanguages
Deprecated. Returns the languages that are supported for language identification.
- Returns:
- A set of Strings being the ISO 639 language codes
-
toString
Deprecated.
-
LanguageDetector