public class Tesseract4OcrEngineProperties extends OcrEngineProperties

Properties used by the IOcrEngine.

| Constructor and Description |
|---|
| Tesseract4OcrEngineProperties() Creates a new Tesseract4OcrEngineProperties instance. |
| Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other) Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor). |

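
The copy constructor is useful when several configurations share a base and differ in one or two settings. The following is a minimal sketch of that pattern only: PropertiesCopySketch is a hypothetical stand-in that mirrors two of the documented properties so the example runs without the library. Whether the real class copies nested objects deeply is not stated on this page.

```java
import java.io.File;

// Hypothetical stand-in for illustration only; this is NOT the real
// Tesseract4OcrEngineProperties and mirrors just two documented properties.
final class PropertiesCopySketch {
    private File pathToTessData;
    private Integer pageSegMode;

    PropertiesCopySketch() {
    }

    // Copy constructor: start from the values currently held by 'other'.
    PropertiesCopySketch(PropertiesCopySketch other) {
        this.pathToTessData = other.pathToTessData;
        this.pageSegMode = other.pageSegMode;
    }

    PropertiesCopySketch setPathToTessData(File tessData) {
        this.pathToTessData = tessData;
        return this; // assumed: setters return the same instance for chaining
    }

    PropertiesCopySketch setPageSegMode(Integer mode) {
        this.pageSegMode = mode;
        return this;
    }

    File getPathToTessData() {
        return pathToTessData;
    }

    Integer getPageSegMode() {
        return pageSegMode;
    }

    public static void main(String[] args) {
        PropertiesCopySketch base = new PropertiesCopySketch()
                .setPathToTessData(new File("tessdata"))
                .setPageSegMode(3);
        // Derive a variant; changing the copy leaves the base untouched.
        PropertiesCopySketch perDocument = new PropertiesCopySketch(base).setPageSegMode(6);
        System.out.println(base.getPageSegMode());
        System.out.println(perDocument.getPageSegMode());
        System.out.println(perDocument.getPathToTessData());
    }
}
```

With the real class the same shape would apply: construct a base instance, then pass it to Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other) and adjust the copy.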
| Modifier and Type | Method and Description |
|---|---|
| String | getDefaultLanguage() Gets default language for OCR. |
| String | getDefaultUserWordsSuffix() Gets default user words suffix. |
| ImagePreprocessingOptions | getImagePreprocessingOptions() |
| int | getMinimalConfidenceLevel() Gets minimal confidence level for HOCR line to be considered as properly recognized. |
| Integer | getPageSegMode() Gets Page Segmentation Mode. |
| File | getPathToTessData() Gets path to directory with tess data. |
| TextPositioning | getTextPositioning() Defines the way text is retrieved from Tesseract output using TextPositioning. |
| boolean | isPreprocessingImages() Checks whether image preprocessing is needed. |
| boolean | isUseTxtToImproveHocrParsing() |
| Tesseract4OcrEngineProperties | setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions) |
| Tesseract4OcrEngineProperties | setMinimalConfidenceLevel(int minimalConfidenceLevel) Sets minimal confidence level for HOCR line to be considered as properly recognized. |
| Tesseract4OcrEngineProperties | setPageSegMode(Integer mode) Sets Page Segmentation Mode. |
| Tesseract4OcrEngineProperties | setPathToTessData(File tessData) Sets path to directory with tess data. |
| Tesseract4OcrEngineProperties | setPreprocessingImages(boolean preprocess) Sets true if image preprocessing is needed. |
| Tesseract4OcrEngineProperties | setTextPositioning(TextPositioning positioning) Defines the way text is retrieved from Tesseract output using TextPositioning. |
| Tesseract4OcrEngineProperties | setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing) |
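
setPageSegMode takes the mode as a plain Integer. Assuming the value corresponds to Tesseract's own page segmentation modes (the numbers accepted by its --psm option; this page does not state the mapping), named constants can make call sites easier to read. PageSegModeSketch below is a hypothetical helper, not part of the library; its names follow Tesseract's PSM_* constants.

```java
// Hypothetical helper enum, not part of the documented API.
// Values follow Tesseract's page segmentation modes (--psm 0..13).
enum PageSegModeSketch {
    OSD_ONLY(0),               // orientation and script detection only
    AUTO_OSD(1),               // automatic segmentation with OSD
    AUTO_ONLY(2),              // automatic segmentation, no OSD or OCR
    AUTO(3),                   // fully automatic segmentation, no OSD
    SINGLE_COLUMN(4),          // single column of text of variable sizes
    SINGLE_BLOCK_VERT_TEXT(5), // single uniform block of vertical text
    SINGLE_BLOCK(6),           // single uniform block of text
    SINGLE_LINE(7),            // single text line
    SINGLE_WORD(8),            // single word
    CIRCLE_WORD(9),            // single word in a circle
    SINGLE_CHAR(10),           // single character
    SPARSE_TEXT(11),           // as much text as possible, in no particular order
    SPARSE_TEXT_OSD(12),       // sparse text with OSD
    RAW_LINE(13);              // single line, bypassing Tesseract-specific hacks

    private final Integer value;

    PageSegModeSketch(int value) {
        this.value = value;
    }

    // Returns the numeric mode as Integer, the type setPageSegMode expects.
    Integer value() {
        return value;
    }
}
```

Under that assumption, a call such as setPageSegMode(PageSegModeSketch.SINGLE_LINE.value()) would request single-line recognition.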

Methods inherited from class OcrEngineProperties: getLanguages, setLanguages

public Tesseract4OcrEngineProperties()
Creates a new Tesseract4OcrEngineProperties instance.

public Tesseract4OcrEngineProperties(Tesseract4OcrEngineProperties other)
Creates a new Tesseract4OcrEngineProperties instance based on another Tesseract4OcrEngineProperties instance (copy constructor).
Parameters: other - the other Tesseract4OcrEngineProperties instance

public final String getDefaultLanguage()
Gets default language for OCR.

public final String getDefaultUserWordsSuffix()
Gets default user words suffix.

public final File getPathToTessData()
Gets path to directory with tess data.
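
The matching setter, setPathToTessData, is documented on this page as throwing PdfOcrTesseract4Exception when the path is null or empty, does not exist, or is not a directory. A rough sketch of those checks follows, useful for validating a path before handing it to the setter. TessDataPathCheck is a hypothetical helper, and IllegalArgumentException stands in for the library's exception so the example has no dependencies; the real implementation may order or word its checks differently.

```java
import java.io.File;

// Hypothetical pre-check mirroring the conditions listed for setPathToTessData.
final class TessDataPathCheck {

    private TessDataPathCheck() {
    }

    // Returns the same File when it passes all checks.
    // IllegalArgumentException is a stand-in for PdfOcrTesseract4Exception.
    static File requireTessDataDirectory(File tessData) {
        if (tessData == null || tessData.getPath().isEmpty()) {
            throw new IllegalArgumentException("Path to tess data is null or empty");
        }
        if (!tessData.exists()) {
            throw new IllegalArgumentException("Tess data directory does not exist: " + tessData);
        }
        if (!tessData.isDirectory()) {
            throw new IllegalArgumentException("Tess data path is not a directory: " + tessData);
        }
        return tessData;
    }

    public static void main(String[] args) {
        // The system temp directory is used only because it is known to exist.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(requireTessDataDirectory(dir));
    }
}
```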

public final Tesseract4OcrEngineProperties setPathToTessData(File tessData)
Sets path to directory with tess data.
Parameters: tessData - path to the tess data directory as File
Returns: the Tesseract4OcrEngineProperties instance
Throws: PdfOcrTesseract4Exception - if the path to the tess data directory is null or empty, or the provided directory does not exist or is not a directory

public final Integer getPageSegMode()
Gets Page Segmentation Mode.
Returns: the Page Segmentation Mode as Integer

public final Tesseract4OcrEngineProperties setPageSegMode(Integer mode)
Sets Page Segmentation Mode.
Parameters: mode - Page Segmentation Mode (PSM) as Integer
Returns: the Tesseract4OcrEngineProperties instance

public final boolean isPreprocessingImages()
Checks whether image preprocessing is needed.

public final Tesseract4OcrEngineProperties setPreprocessingImages(boolean preprocess)
Sets true if image preprocessing is needed.
Parameters: preprocess - true if images need to be preprocessed, false otherwise
Returns: the Tesseract4OcrEngineProperties instance

public final TextPositioning getTextPositioning()
Defines the way text is retrieved from Tesseract output using TextPositioning.

public final Tesseract4OcrEngineProperties setTextPositioning(TextPositioning positioning)
Defines the way text is retrieved from Tesseract output using TextPositioning.
Parameters: positioning - the way text is retrieved
Returns: the Tesseract4OcrEngineProperties instance

public final boolean isUseTxtToImproveHocrParsing()
Gets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for Thai and some Chinese dialects, where every character is interpreted as a single word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Returns: useTxtToImproveHocrParsing

public final Tesseract4OcrEngineProperties setUseTxtToImproveHocrParsing(boolean useTxtToImproveHocrParsing)
Sets useTxtToImproveHocrParsing. Used to make the HOCR recognition result more precise. This is needed for Thai and some Chinese dialects, where every character is interpreted as a single word. For more information see https://github.com/tesseract-ocr/tesseract/issues/2702
Parameters: useTxtToImproveHocrParsing - useTxtToImproveHocrParsing
Returns: the Tesseract4OcrEngineProperties instance

public final ImagePreprocessingOptions getImagePreprocessingOptions()
Returns: ImagePreprocessingOptions

public final Tesseract4OcrEngineProperties setImagePreprocessingOptions(ImagePreprocessingOptions imagePreprocessingOptions)
Parameters: imagePreprocessingOptions - ImagePreprocessingOptions
Returns: the Tesseract4OcrEngineProperties instance

public final int getMinimalConfidenceLevel()
Gets minimal confidence level for HOCR line to be considered as properly recognized.
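
The minimal confidence level acts as a threshold on recognized HOCR lines. How the engine applies it is not described on this page, so the following is only an illustrative sketch under stated assumptions: confidences are integers on a 0 to 100 scale, the comparison is inclusive, and lines below the threshold are simply dropped. ConfidenceFilterSketch and its Line type are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of threshold filtering; not the library's code.
final class ConfidenceFilterSketch {

    // One recognized line: its text and an assumed 0..100 confidence.
    static final class Line {
        final String text;
        final int confidence;

        Line(String text, int confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    private ConfidenceFilterSketch() {
    }

    // Keeps the text of lines whose confidence is at least the threshold
    // (inclusive comparison is an assumption).
    static List<String> keepConfident(List<Line> lines, int minimalConfidenceLevel) {
        List<String> kept = new ArrayList<>();
        for (Line line : lines) {
            if (line.confidence >= minimalConfidenceLevel) {
                kept.add(line.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line("Invoice 2024", 95),
                new Line("~~#@", 40),
                new Line("Total due", 70));
        System.out.println(keepConfident(lines, 70));
    }
}
```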

public final Tesseract4OcrEngineProperties setMinimalConfidenceLevel(int minimalConfidenceLevel)
Sets minimal confidence level for HOCR line to be considered as properly recognized.
Parameters: minimalConfidenceLevel - minimal confidence level value
Returns: the Tesseract4OcrEngineProperties instance

Copyright © 1998–2024 Apryse Group NV. All rights reserved.