Class PDFLayoutTextStripper


  • public class PDFLayoutTextStripper
    extends org.apache.pdfbox.text.PDFTextStripper
    Javadoc to be completed.
    Author:
    Jonathan Link
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static boolean DEBUG  
      static int OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT  
      • Fields inherited from class org.apache.pdfbox.text.PDFTextStripper

        charactersByArticle, document, LINE_SEPARATOR, output
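
      The field table above gives no description for OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT. Going only by its name, it may denote how many points of horizontal page space one output space character stands for. The sketch below illustrates that reading in plain Java, without PDFBox. The class LayoutColumnSketch, its methods columnFor and place, and the value 4 are all hypothetical and chosen for illustration; none of them is taken from this class.

```java
public class LayoutColumnSketch {
    // Assumed value, for illustration only; this page lists the field but not its value.
    public static final int OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT = 4;

    /** Maps a horizontal position in points to a zero-based output column. */
    public static int columnFor(float xInPt) {
        return Math.max(0, (int) (xInPt / OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT));
    }

    /** Writes text into a line, starting at the column derived from xInPt. */
    public static String place(String line, String text, float xInPt) {
        int start = columnFor(xInPt);
        StringBuilder sb = new StringBuilder(line);
        // Pad with spaces so the target range exists before overwriting it.
        while (sb.length() < start + text.length()) {
            sb.append(' ');
        }
        sb.replace(start, start + text.length(), text);
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "";
        line = place(line, "Name", 0f);    // x = 0 pt  -> column 0
        line = place(line, "Total", 40f);  // x = 40 pt -> column 10
        System.out.println(line);
    }
}
```

      Under this assumption, a smaller width per space spreads the same page over more output columns, and a larger one compresses it.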
    • Method Summary

      Methods
      Modifier and Type Method Description
      protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
      void processPage(org.apache.pdfbox.pdmodel.PDPage page)
      protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4)
      protected void writePage()  
      • Methods inherited from class org.apache.pdfbox.text.PDFTextStripper

        endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
      • Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine

        addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
    • Constructor Detail

    • Method Detail

      • processPage

        public void processPage(org.apache.pdfbox.pdmodel.PDPage page)
                         throws IOException
        Overrides:
        processPage in class org.apache.pdfbox.text.PDFTextStripper
        Parameters:
        page - page to parse
        Throws:
        IOException
      • writePage

        protected void writePage()
                          throws IOException
        Overrides:
        writePage in class org.apache.pdfbox.text.PDFTextStripper
        Throws:
        IOException
      • showGlyph

        protected void showGlyph(org.apache.pdfbox.util.Matrix arg0,
                                 org.apache.pdfbox.pdmodel.font.PDFont arg1,
                                 int arg2,
                                 String arg3,
                                 org.apache.pdfbox.util.Vector arg4)
                          throws IOException
        Overrides:
        showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
        Throws:
        IOException
      • computeFontHeight

        protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
                                   throws IOException
        Throws:
        IOException