Class PDFTextStreamEngine

java.lang.Object
org.sejda.sambox.contentstream.PDFStreamEngine
org.sejda.sambox.text.PDFTextStreamEngine
Direct Known Subclasses:
PDFMarkedContentExtractor, PDFTextStripper

public class PDFTextStreamEngine extends PDFStreamEngine
PDFStreamEngine subclass for advanced processing of text via TextPosition.
Author:
Ben Litchfield, John Hewson
  • Constructor Details

  • Method Details

    • processPage

      public void processPage(PDPage page) throws IOException
      Initialises the engine and processes the content stream of the given page.
      Overrides:
      processPage in class PDFStreamEngine
      Parameters:
      page - the page to process
      Throws:
      IOException - if there is an error accessing the stream.
    • showGlyph

      protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement) throws IOException
      Called when a glyph is to be processed. The heuristic calculations here were originally written by Ben Litchfield for PDFStreamEngine.
      Overrides:
      showGlyph in class PDFStreamEngine
      Parameters:
      textRenderingMatrix - the current text rendering matrix, Trm
      font - the current font
      code - internal PDF character code for the glyph
      displacement - the displacement (i.e. advance) of the glyph in text space
      Throws:
      IOException - if the glyph cannot be processed
    • computeFontHeight

      protected float computeFontHeight(PDFont font) throws IOException
      Computes the font height. Override this method to supply your own calculation.
      Parameters:
      font - the font.
      Returns:
      the font height.
      Throws:
      IOException - if there is an error while getting the font bounding box.
    • processTextPosition

      protected void processTextPosition(TextPosition text)
      Event hook called when text needs to be processed. Subclasses override it to perform their own handling of the text.
      Parameters:
      text - The text to be processed.
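
The hook pattern described for processTextPosition can be illustrated with a minimal, self-contained sketch. It does not use the SAMBox classes: GlyphEvent, SketchTextEngine and CollectingEngine are hypothetical stand-ins for TextPosition, PDFTextStreamEngine and a user subclass, and the no-op default for the hook is an assumption made for the illustration. The sketch only shows the control flow, in which the engine walks the page and calls the hook once per piece of text while a subclass decides what to do with it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for TextPosition: the text of one glyph and where it sits.
final class GlyphEvent {
    final String unicode;
    final float x;
    final float y;

    GlyphEvent(String unicode, float x, float y) {
        this.unicode = unicode;
        this.x = x;
        this.y = y;
    }
}

// Hypothetical stand-in for the engine. A real engine would derive the events
// from the page's content stream; here they are passed in directly.
abstract class SketchTextEngine {
    // Plays the role of processPage: emits one event per glyph, in order.
    public void processPage(List<GlyphEvent> glyphs) {
        for (GlyphEvent glyph : glyphs) {
            processTextPosition(glyph);
        }
    }

    // Plays the role of processTextPosition. Assumed to do nothing by default,
    // so subclasses opt in to whatever handling they need.
    protected void processTextPosition(GlyphEvent text) {
    }
}

// Example subclass: collects the text of every event it receives.
final class CollectingEngine extends SketchTextEngine {
    private final StringBuilder collected = new StringBuilder();

    @Override
    protected void processTextPosition(GlyphEvent text) {
        collected.append(text.unicode);
    }

    String collected() {
        return collected.toString();
    }
}

final class HookSketchDemo {
    public static void main(String[] args) {
        CollectingEngine engine = new CollectingEngine();
        engine.processPage(Arrays.asList(
                new GlyphEvent("H", 10f, 700f),
                new GlyphEvent("i", 16f, 700f)));
        System.out.println(engine.collected());
    }
}
```

With the real classes, the same shape would presumably apply: a subclass overrides processTextPosition(TextPosition) and calls processPage(PDPage) to drive it. Constructor visibility and the available TextPosition accessors should be checked against the SAMBox version in use.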