java.lang.Object
org.sejda.impl.sambox.component.PdfTextExtractorByArea
Stateless component responsible for extracting text from a given area of a document page.
- Author:
- Andrea Vacondio
-
Constructor Summary
Constructors
PdfTextExtractorByArea()
Method Summary
Modifier and Type, Method, Description
- extractAddedText(org.sejda.sambox.pdmodel.PDPage page, Point2D position)
- extractFooterText(org.sejda.sambox.pdmodel.PDPage page)
- extractHeaderText(org.sejda.sambox.pdmodel.PDPage page)
- extractTextFromArea(org.sejda.sambox.pdmodel.PDPage page, Rectangle2D area)
  Extracts the text found in a specific page bound to a specific rectangle area. E.g. extract footer text from a certain page.
- extractTextFromAreas(org.sejda.sambox.pdmodel.PDPage page, List<Rectangle> areas)
-
Constructor Details
-
PdfTextExtractorByArea
public PdfTextExtractorByArea()
-
-
Method Details
-
extractHeaderText
- Throws:
TaskIOException
-
extractAddedText
public String extractAddedText(org.sejda.sambox.pdmodel.PDPage page, Point2D position) throws TaskIOException
- Throws:
TaskIOException
-
extractTextFromArea
public String extractTextFromArea(org.sejda.sambox.pdmodel.PDPage page, Rectangle2D area) throws TaskIOException
Extracts the text found in a specific page bound to a specific rectangle area. E.g. extract footer text from a certain page.
- Parameters:
page - the page to extract the text from
area - the rectangular area to extract
- Returns:
the extracted text
- Throws:
TaskIOException
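
The footer use case mentioned above comes down to building a suitable Rectangle2D before calling extractTextFromArea. The following is a minimal sketch of that step only. AreaSketch and footerBand are hypothetical names, not part of Sejda, and the page size and band height are illustrative values. It also assumes the area uses a top-left origin with y growing downwards, which this page does not state and should be checked against the implementation.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper, not part of Sejda: builds a candidate area for extractTextFromArea. */
final class AreaSketch {

    private AreaSketch() {
    }

    /**
     * Returns a full-width band of the given height at the bottom of the page.
     * Assumption: top-left origin, y growing downwards.
     */
    static Rectangle2D footerBand(double pageWidth, double pageHeight, double bandHeight) {
        if (bandHeight <= 0 || bandHeight > pageHeight) {
            throw new IllegalArgumentException("bandHeight must be in (0, pageHeight]");
        }
        return new Rectangle2D.Double(0, pageHeight - bandHeight, pageWidth, bandHeight);
    }

    public static void main(String[] args) {
        // A4 page in PDF points (595 x 842) and a 40pt band, both illustrative.
        Rectangle2D area = footerBand(595, 842, 40);
        System.out.println(area);
        // With a loaded PDPage, the area would then be passed on as:
        // String text = new PdfTextExtractorByArea().extractTextFromArea(page, area);
    }
}
```

Since the component is described as stateless, one instance could presumably be reused across pages, with a new rectangle computed per page size.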
-
extractTextFromAreas
public List<String> extractTextFromAreas(org.sejda.sambox.pdmodel.PDPage page, List<Rectangle> areas) throws TaskIOException
- Throws:
TaskIOException
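
extractTextFromAreas takes a List<Rectangle>, so a caller needs to prepare several integer-based areas up front. As a rough sketch of one way to do that, the code below splits a page into equal horizontal bands. BandsSketch and horizontalBands are hypothetical names, not part of Sejda, and the page dimensions are illustrative. Whether the returned strings follow the order of the input areas is not stated on this page, so the comment about it is an assumption.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Sejda: builds a list of areas for extractTextFromAreas. */
final class BandsSketch {

    private BandsSketch() {
    }

    /** Splits a page of the given size into full-width horizontal bands, top to bottom. */
    static List<Rectangle> horizontalBands(int pageWidth, int pageHeight, int bands) {
        if (pageWidth <= 0 || pageHeight <= 0 || bands <= 0 || bands > pageHeight) {
            throw new IllegalArgumentException("invalid page size or band count");
        }
        List<Rectangle> areas = new ArrayList<>(bands);
        int bandHeight = pageHeight / bands;
        for (int i = 0; i < bands; i++) {
            int y = i * bandHeight;
            // The last band absorbs the remainder left by integer division.
            int height = (i == bands - 1) ? pageHeight - y : bandHeight;
            areas.add(new Rectangle(0, y, pageWidth, height));
        }
        return areas;
    }

    public static void main(String[] args) {
        List<Rectangle> areas = horizontalBands(595, 842, 3);
        areas.forEach(System.out::println);
        // With a loaded PDPage, the list would then be passed on as:
        // List<String> texts = new PdfTextExtractorByArea().extractTextFromAreas(page, areas);
        // Assumption: texts.get(i) corresponds to areas.get(i).
    }
}
```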