Class PdfTextExtractorByArea

java.lang.Object
org.sejda.impl.sambox.component.PdfTextExtractorByArea

public class PdfTextExtractorByArea extends Object
Stateless component responsible for extracting text from a given area of a document page.
Author:
Andrea Vacondio
  • Constructor Details

    • PdfTextExtractorByArea

      public PdfTextExtractorByArea()
  • Method Details

    • extractFooterText

      public String extractFooterText(org.sejda.sambox.pdmodel.PDPage page) throws TaskIOException
      Parameters:
      page - the page to extract the footer text from
      Returns:
      the text extracted from the footer of the page, assuming a footer height of 50
      Throws:
      TaskIOException
    • extractHeaderText

      public String extractHeaderText(org.sejda.sambox.pdmodel.PDPage page) throws TaskIOException
      Throws:
      TaskIOException
    • extractAddedText

      public String extractAddedText(org.sejda.sambox.pdmodel.PDPage page, Point2D position) throws TaskIOException
      Throws:
      TaskIOException
    • extractTextFromArea

      public String extractTextFromArea(org.sejda.sambox.pdmodel.PDPage page, Rectangle2D area) throws TaskIOException
      Extracts the text found within a specific rectangular area of a page, e.g. the footer text of a certain page.
      Parameters:
      page - the page to extract the text from
      area - the rectangular area to extract
      Returns:
      the extracted text
      Throws:
      TaskIOException
    • extractTextFromAreas

      public List<String> extractTextFromAreas(org.sejda.sambox.pdmodel.PDPage page, List<Rectangle> areas) throws TaskIOException
      Throws:
      TaskIOException
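
The area-based methods above take plain java.awt geometry, so the main work for a caller is building the rectangles. The following is a minimal sketch of how such areas might be computed, not the library's own implementation. The class AreaSketch and its methods footerArea and horizontalBands are hypothetical names introduced for illustration. The sketch assumes a top-left origin with y growing downward, and reuses the footer height of 50 mentioned for extractFooterText; neither the coordinate convention nor the way the library derives its footer area is stated on this page.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sejda: builds areas that could be passed
// to extractTextFromArea / extractTextFromAreas.
final class AreaSketch {

    // Footer height taken from the extractFooterText description (assumption: same units as the page size).
    static final int FOOTER_HEIGHT = 50;

    /** Bottom strip of the page, assuming a top-left origin with y growing downward. */
    static Rectangle2D footerArea(double pageWidth, double pageHeight) {
        return new Rectangle2D.Double(0, pageHeight - FOOTER_HEIGHT, pageWidth, FOOTER_HEIGHT);
    }

    /** Splits the page into horizontal bands, top to bottom; the last band absorbs any remainder. */
    static List<Rectangle> horizontalBands(int pageWidth, int pageHeight, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        List<Rectangle> bands = new ArrayList<>();
        int bandHeight = pageHeight / count;
        for (int i = 0; i < count; i++) {
            int y = i * bandHeight;
            // Integer division may leave a few units over; give them to the last band.
            int height = (i == count - 1) ? pageHeight - y : bandHeight;
            bands.add(new Rectangle(0, y, pageWidth, height));
        }
        return bands;
    }

    public static void main(String[] args) {
        // 595 x 842 is used purely as sample page dimensions.
        System.out.println(footerArea(595, 842));
        System.out.println(horizontalBands(595, 842, 3));
    }
}
```

With Sejda and SAMBox on the classpath, the Rectangle2D from footerArea could be handed to extractTextFromArea(page, area) and the list from horizontalBands to extractTextFromAreas(page, areas). Obtaining the PDPage and its dimensions is left out here because it depends on how the document is loaded.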