Package io.github.jonathanlink
Class PDFLayoutTextStripper
java.lang.Object
  org.apache.pdfbox.contentstream.PDFStreamEngine
    org.apache.pdfbox.text.PDFTextStripper
      io.github.jonathanlink.PDFLayoutTextStripper

public class PDFLayoutTextStripper extends org.apache.pdfbox.text.PDFTextStripper

Java doc to be completed

- Author:
- Jonathan Link
-
-
Field Summary
Fields
Modifier and Type    Field                                 Description
static boolean       DEBUG
static int           OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT
-
Constructor Summary
Constructors
Constructor                Description
PDFLayoutTextStripper()    Constructor
-
Method Summary
Methods
Modifier and Type    Method
protected float      computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
void                 processPage(org.apache.pdfbox.pdmodel.PDPage page)
protected void       showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4)
protected void       writePage()
-
Methods inherited from class org.apache.pdfbox.text.PDFTextStripper
endArticle, endDocument, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPages, processTextPosition, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
-
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
-
Field Detail
-
DEBUG
public static final boolean DEBUG
- See Also:
- Constant Field Values
-
OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT
public static final int OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT
- See Also:
- Constant Field Values
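
The page lists OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT without a description or a value. Going by the name alone, it is presumably the width, in points, that one output space character stands for when text is laid out on a character grid. The following is a minimal sketch of that idea under stated assumptions, not the library's implementation: the class LayoutColumnSketch, its methods, and the value 4 are all hypothetical, and the sketch uses only the Java standard library.

```java
import java.util.Arrays;

public class LayoutColumnSketch {
    // Hypothetical stand-in for OUTPUT_SPACE_CHARACTER_WIDTH_IN_PT.
    // The real value is not given on this page; 4 is an assumption.
    public static final int SPACE_WIDTH_PT = 4;

    /** Maps a horizontal position in points to a character column. */
    public static int columnFor(float xInPt) {
        return (int) (xInPt / SPACE_WIDTH_PT);
    }

    /** Places each glyph at the column derived from its x position. */
    public static String renderLine(float pageWidthPt, float[] xs, char[] glyphs) {
        char[] line = new char[columnFor(pageWidthPt)];
        Arrays.fill(line, ' ');
        for (int i = 0; i < xs.length; i++) {
            int col = columnFor(xs[i]);
            // Ignore glyphs that fall outside the page width.
            if (col >= 0 && col < line.length) {
                line[col] = glyphs[i];
            }
        }
        // Drop trailing padding so the line ends at the last glyph.
        return new String(line).replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        float[] xs = {8f, 12f, 40f, 44f};
        char[] glyphs = {'I', 'D', 'O', 'K'};
        // Prints "[  ID      OK]": columns 2, 3, 10 and 11 are filled.
        System.out.println("[" + renderLine(80f, xs, glyphs) + "]");
    }
}
```

With this mapping, a gap of 28 pt between two words becomes a run of spaces in the output, which is how a fixed points-per-space ratio can preserve horizontal layout in plain text.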
-
-
Constructor Detail
-
PDFLayoutTextStripper
public PDFLayoutTextStripper() throws IOException
Constructor
- Throws:
- IOException
-
-
Method Detail
-
processPage
public void processPage(org.apache.pdfbox.pdmodel.PDPage page) throws IOException
- Overrides:
- processPage in class org.apache.pdfbox.text.PDFTextStripper
- Parameters:
- page - page to parse
- Throws:
- IOException
-
writePage
protected void writePage() throws IOException
- Overrides:
- writePage in class org.apache.pdfbox.text.PDFTextStripper
- Throws:
- IOException
-
showGlyph
protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4) throws IOException
- Overrides:
- showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
- Throws:
- IOException
-
computeFontHeight
protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0) throws IOException
- Throws:
- IOException
-
-