java.lang.Object
  org.sejda.sambox.contentstream.PDFStreamEngine
    org.sejda.sambox.text.PDFTextStreamEngine
      org.sejda.sambox.text.PDFTextStripper
        org.sejda.impl.sambox.component.PdfVisibleTextStripper
- All Implemented Interfaces:
Closeable, AutoCloseable
public class PdfVisibleTextStripper
extends org.sejda.sambox.text.PDFTextStripper
implements Closeable
A custom text stripper that extracts only visible text and unloads the decoded page stream once it has been used.
- Author:
- Andrea Vacondio
Field Summary
Fields inherited from class org.sejda.sambox.text.PDFTextStripper
charactersByArticle, document, LINE_SEPARATOR, output
Constructor Summary
Constructors
PdfVisibleTextStripper
Method Summary
Modifier and Type    Method
void                 close()
protected void       endPage(org.sejda.sambox.pdmodel.PDPage page)
void                 extract(org.sejda.sambox.pdmodel.PDPage page)
protected void       processTextPosition(org.sejda.sambox.text.TextPosition text)
Methods inherited from class org.sejda.sambox.text.PDFTextStripper
endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator
Methods inherited from class org.sejda.sambox.text.PDFTextStreamEngine
computeFontHeight, showGlyph
Methods inherited from class org.sejda.sambox.contentstream.PDFStreamEngine
addOperator, addOperatorIfAbsent, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processStream, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
Constructor Details
PdfVisibleTextStripper
- Throws:
IOException
Method Details
processTextPosition
protected void processTextPosition(org.sejda.sambox.text.TextPosition text)
- Overrides:
processTextPosition in class org.sejda.sambox.text.PDFTextStripper
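This page does not say which criteria `processTextPosition` uses to decide whether text is visible. As a minimal sketch of the override pattern only, the stand-in code below (not the Sejda or SAMBox API) filters glyphs in a per-glyph hook and drops those drawn with PDF text rendering mode 3, which the PDF specification defines as invisible (neither fill nor stroke). The `Glyph`, `TextCollector`, and `VisibleOnlyCollector` types are hypothetical, and the real class may apply different or additional checks.

```java
import java.util.List;

public class VisibleTextDemo {
    // Two painted glyphs followed by one invisible glyph (rendering mode 3).
    static final List<Glyph> SAMPLE = List.of(
            new Glyph("H", 0), new Glyph("i", 0), new Glyph("X", 3));

    public static void main(String[] args) {
        System.out.println(new TextCollector().collect(SAMPLE));        // HiX
        System.out.println(new VisibleOnlyCollector().collect(SAMPLE)); // Hi
    }
}

// Hypothetical stand-in for a positioned glyph; not org.sejda.sambox.text.TextPosition.
final class Glyph {
    final String unicode;
    final int renderingMode;

    Glyph(String unicode, int renderingMode) {
        this.unicode = unicode;
        this.renderingMode = renderingMode;
    }
}

// Hypothetical base class: calls processTextPosition once per glyph and,
// by default, appends every glyph to the output.
class TextCollector {
    protected final StringBuilder out = new StringBuilder();

    protected void processTextPosition(Glyph glyph) {
        out.append(glyph.unicode);
    }

    String collect(List<Glyph> glyphs) {
        out.setLength(0);
        for (Glyph g : glyphs) {
            processTextPosition(g);
        }
        return out.toString();
    }
}

// Overrides the per-glyph hook and forwards only glyphs that are painted.
class VisibleOnlyCollector extends TextCollector {
    // PDF text rendering mode 3 (Tr operator): neither fill nor stroke, i.e. invisible.
    private static final int INVISIBLE_MODE = 3;

    @Override
    protected void processTextPosition(Glyph glyph) {
        if (glyph.renderingMode != INVISIBLE_MODE) {
            super.processTextPosition(glyph);
        }
    }
}
```

Delegating to `super` for accepted glyphs keeps the base class's output logic intact; only the filtering decision lives in the subclass.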
extract
public void extract(org.sejda.sambox.pdmodel.PDPage page) throws TaskIOException
- Throws:
TaskIOException
endPage
protected void endPage(org.sejda.sambox.pdmodel.PDPage page)
- Overrides:
endPage in class org.sejda.sambox.text.PDFTextStripper
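The class description says the decoded page stream is unloaded once used, and `endPage` is the per-page hook this class overrides. A minimal sketch of that idea, assuming the decoded data is held in a field that is simply dropped at the end of each page so the garbage collector can reclaim it; `PageTextExtractor`, `decodedStream`, and the byte-array page model are hypothetical and do not reflect the actual Sejda implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PageUnloadDemo {
    public static void main(String[] args) {
        PageTextExtractor extractor = new PageTextExtractor();
        List<String> pages = extractor.extractAll(List.of(
                "page one".getBytes(StandardCharsets.UTF_8),
                "page two".getBytes(StandardCharsets.UTF_8)));
        System.out.println(pages);                        // [page one, page two]
        System.out.println(extractor.hasDecodedStream()); // false
    }
}

// Hypothetical extractor: decodes one page at a time and releases the
// decoded bytes in endPage, so at most one decoded page is held in memory.
class PageTextExtractor {
    private byte[] decodedStream; // decoded content of the page currently being processed

    List<String> extractAll(List<byte[]> encodedPages) {
        List<String> result = new ArrayList<>();
        for (byte[] encoded : encodedPages) {
            startPage(encoded);
            result.add(new String(decodedStream, StandardCharsets.UTF_8));
            endPage();
        }
        return result;
    }

    private void startPage(byte[] encoded) {
        // Stand-in for real stream decoding (e.g. decompression): a plain copy.
        decodedStream = encoded.clone();
    }

    protected void endPage() {
        // Drop the reference so the decoded data becomes eligible for garbage collection.
        decodedStream = null;
    }

    boolean hasDecodedStream() {
        return decodedStream != null;
    }
}
```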
close
public void close()
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
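Because `PdfVisibleTextStripper` implements `Closeable`, a caller can tie its lifetime to a try-with-resources block so that `close()` runs even when extraction fails. The sketch below uses a hypothetical stand-in (`StripperStandIn`) rather than the real class, since this page does not list the constructor's parameters; it only demonstrates the resource-management pattern and the order in which the calls happen.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseDemo {
    // Records lifecycle events so the call order is visible.
    static final List<String> EVENTS = new ArrayList<>();

    static void run(boolean fail) {
        // The resource is closed automatically when the try block exits,
        // before the catch block runs.
        try (StripperStandIn stripper = new StripperStandIn()) {
            stripper.extract("page 1");
            if (fail) {
                throw new IllegalStateException("extraction failed");
            }
            stripper.extract("page 2");
        } catch (IllegalStateException e) {
            EVENTS.add("caught: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        run(false);
        run(true);
        // [extract page 1, extract page 2, close, extract page 1, close, caught: extraction failed]
        System.out.println(EVENTS);
    }
}

// Hypothetical stand-in for a Closeable text stripper; not the Sejda API.
class StripperStandIn implements Closeable {
    void extract(String page) {
        CloseDemo.EVENTS.add("extract " + page);
    }

    @Override
    public void close() {
        CloseDemo.EVENTS.add("close");
    }
}
```

Like the documented `close()`, the stand-in's `close()` declares no checked exception, so the try-with-resources block needs no `IOException` handler.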