Class PDFTextStripperByArea


  • public class PDFTextStripperByArea
    extends PDFTextStripper
    This will extract text from a specified region in the PDF.
    • Constructor Detail

      • PDFTextStripperByArea

        public PDFTextStripperByArea()
                              throws IOException
        Creates a stripper with no regions defined.
        Throws:
        IOException - If there is an error loading properties.
    • Method Detail

      • setShouldSeparateByBeads

        public void setShouldSeparateByBeads(boolean aShouldSeparateByBeads)
        This method does nothing in this derived class, because beads and regions are incompatible. Beads are ignored when stripping by area.
        Overrides:
        setShouldSeparateByBeads in class PDFTextStripper
        Parameters:
        aShouldSeparateByBeads - The new grouping of beads.
      • addRegion

        public void addRegion(String regionName,
                              android.graphics.RectF rect)
        Add a new region to group text by.
        Parameters:
        regionName - The name of the region.
        rect - The rectangle area to retrieve the text from.
      • getRegions

        public List<String> getRegions()
        Get the list of regions that have been set up.
        Returns:
        A list of strings identifying the region names.
      • getTextForRegion

        public String getTextForRegion(String regionName)
        Get the text for the region. Call this only after extractRegions() has processed the page.
        Parameters:
        regionName - The name of the region to get the text from.
        Returns:
        The text that was identified in that region.
      • extractRegions

        public void extractRegions(PDPage page)
                            throws IOException
        Process the page to extract the region text.
        Parameters:
        page - The page to extract the regions from.
        Throws:
        IOException - If there is an error while extracting text.
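        The call order described above (addRegion, then extractRegions, then getTextForRegion) can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the library's implementation: RegionWorkflowSketch, Glyph and Rect are hypothetical stand-ins for the stripper, TextPosition and android.graphics.RectF, and the containment rule is assumed for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the region workflow; NOT the library's code.
class RegionWorkflowSketch {
    // Hypothetical glyph: a piece of text and the position it is drawn at.
    static final class Glyph {
        final String text;
        final float x;
        final float y;
        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Hypothetical rectangle described by its four edges.
    static final class Rect {
        final float left, top, right, bottom;
        Rect(float left, float top, float right, float bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
        // Assumed rule: left/top edges inclusive, right/bottom edges exclusive.
        boolean contains(float x, float y) {
            return x >= left && x < right && y >= top && y < bottom;
        }
    }

    private final Map<String, Rect> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    void addRegion(String regionName, Rect rect) {
        regions.put(regionName, rect);
    }

    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Discards earlier results, then appends each glyph to every region containing it.
    void extractRegions(List<Glyph> pageGlyphs) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph g : pageGlyphs) {
            for (Map.Entry<String, Rect> e : regions.entrySet()) {
                if (e.getValue().contains(g.x, g.y)) {
                    regionText.get(e.getKey()).append(g.text);
                }
            }
        }
    }

    // In this sketch, returns null until extractRegions has run (or for unknown names).
    String getTextForRegion(String regionName) {
        StringBuilder sb = regionText.get(regionName);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        RegionWorkflowSketch sketch = new RegionWorkflowSketch();
        sketch.addRegion("header", new Rect(0f, 0f, 100f, 20f));
        List<Glyph> page = new ArrayList<>();
        page.add(new Glyph("H", 10f, 5f));
        page.add(new Glyph("i", 16f, 5f));
        page.add(new Glyph("x", 10f, 50f)); // outside the region, so it is dropped
        sketch.extractRegions(page);
        System.out.println(sketch.getTextForRegion("header"));
    }
}
```

        The point of the model is the ordering: region text only exists once a page has been processed, so a lookup made earlier has nothing to return.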
      • processTextPosition

        protected void processTextPosition(TextPosition text)
        This will process a TextPosition object and add the text to the list of characters on a page. It takes care of overlapping text.
        Overrides:
        processTextPosition in class PDFTextStripper
        Parameters:
        text - The text to process.
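        As a rough illustration of what "takes care of overlapping text" can mean, the sketch below drops a glyph when the same character has already been kept at nearly the same position (as happens when a PDF draws text twice to fake bold). It is a simplified model, not the library's code: OverlapSuppressionSketch and Glyph are hypothetical names, and the tolerance of one third of the glyph width is an assumption of this sketch. The inherited stripper's actual rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of duplicate-overlap suppression; NOT the library's code.
class OverlapSuppressionSketch {
    // Hypothetical glyph: text, position, and rendered width.
    static final class Glyph {
        final String text;
        final float x;
        final float y;
        final float width;
        Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    private final List<Glyph> kept = new ArrayList<>();

    // Returns true if the glyph was kept, false if it was treated as a duplicate.
    boolean process(Glyph g) {
        // Assumed tolerance: a third of the glyph's width.
        float tolerance = g.width / 3.0f;
        for (Glyph k : kept) {
            boolean sameText = k.text.equals(g.text);
            boolean nearX = Math.abs(k.x - g.x) < tolerance;
            boolean nearY = Math.abs(k.y - g.y) < tolerance;
            if (sameText && nearX && nearY) {
                return false; // same character drawn again on top of itself
            }
        }
        kept.add(g);
        return true;
    }

    // Concatenates the kept glyphs in the order they were processed.
    String text() {
        StringBuilder sb = new StringBuilder();
        for (Glyph k : kept) {
            sb.append(k.text);
        }
        return sb.toString();
    }
}
```

        A linear scan keeps the sketch short; a real implementation handling large pages would more likely index kept glyphs by character and position.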
      • writePage

        protected void writePage()
                          throws IOException
        This will print the processed page text to the output stream.
        Overrides:
        writePage in class PDFTextStripper
        Throws:
        IOException - If there is an error writing the text.
      • showText

        protected void showText(byte[] string)
                         throws IOException
        Description copied from class: PDFStreamEngine
        Process text from the PDF Stream. You should override this method if you want to perform an action when encoded text is being processed.
        Overrides:
        showText in class PDFStreamEngine
        Parameters:
        string - the encoded text
        Throws:
        IOException - if there is an error processing the string
      • showGlyph

        protected void showGlyph(Matrix textRenderingMatrix,
                                 PDFont font,
                                 int code,
                                 String unicode,
                                 Vector displacement)
                          throws IOException
        This method was originally written by Ben Litchfield for PDFStreamEngine.
        Overrides:
        showGlyph in class PDFStreamEngine
        Parameters:
        textRenderingMatrix - the current text rendering matrix, Trm
        font - the current font
        code - internal PDF character code for the glyph
        unicode - the Unicode text for this glyph, or null if the PDF does not provide it
        displacement - the displacement (i.e. advance) of the glyph in text space
        Throws:
        IOException - if the glyph cannot be processed