Class PdfVisibleTextStripper

java.lang.Object
  org.sejda.sambox.contentstream.PDFStreamEngine
    org.sejda.sambox.text.PDFTextStreamEngine
      org.sejda.sambox.text.PDFTextStripper
        org.sejda.impl.sambox.component.PdfVisibleTextStripper
All Implemented Interfaces:
Closeable, AutoCloseable

public class PdfVisibleTextStripper extends org.sejda.sambox.text.PDFTextStripper implements Closeable
A custom text stripper that extracts only visible text and unloads the decoded page stream once it has been used.
Author:
Andrea Vacondio
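This page does not define what counts as "visible" text. One plausible criterion, assumed here and not confirmed by the Sejda source, is the PDF text rendering mode set by the Tr operator: modes 3 (invisible, commonly used for OCR text layers) and 7 (add to clipping path only) paint nothing, while modes 0 to 2 and 4 to 6 paint the glyphs. The sketch below shows how a processTextPosition-style hook could apply that rule. Glyph is a hypothetical stand-in for SAMBox's TextPosition, and the real override may use different or additional criteria.

```java
import java.util.List;

/**
 * Hypothetical sketch of a visibility filter like the one a processTextPosition override might apply.
 * Glyph stands in for org.sejda.sambox.text.TextPosition; the rendering-mode rule is an assumption.
 */
public class VisibleGlyphFilter {

    /** Hypothetical stand-in for a positioned glyph: its Unicode text and the active Tr mode. */
    record Glyph(String unicode, int renderingMode) {}

    /** Per the PDF specification, Tr modes 3 (invisible) and 7 (clip only) paint nothing. */
    static boolean isVisible(int renderingMode) {
        if (renderingMode < 0 || renderingMode > 7) {
            throw new IllegalArgumentException("Invalid text rendering mode: " + renderingMode);
        }
        return renderingMode != 3 && renderingMode != 7;
    }

    private final StringBuilder kept = new StringBuilder();

    /** Mirrors the shape of a processTextPosition override: skip invisible glyphs, keep the rest. */
    void processTextPosition(Glyph glyph) {
        if (isVisible(glyph.renderingMode())) {
            kept.append(glyph.unicode());
        }
    }

    String text() {
        return kept.toString();
    }

    public static void main(String[] args) {
        VisibleGlyphFilter filter = new VisibleGlyphFilter();
        List<Glyph> glyphs = List.of(
                new Glyph("H", 0),
                new Glyph("i", 2),
                new Glyph("X", 3), // invisible, e.g. part of an OCR text layer
                new Glyph("!", 4));
        glyphs.forEach(filter::processTextPosition);
        System.out.println(filter.text()); // prints "Hi!"
    }
}
```

Running main prints Hi!, because the glyph drawn in rendering mode 3 is skipped while the fill-and-clip glyph in mode 4 is kept.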
  • Field Summary

    Fields inherited from class org.sejda.sambox.text.PDFTextStripper

    charactersByArticle, document, LINE_SEPARATOR, output
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type    Method
    void                 close()
    protected void       endPage(org.sejda.sambox.pdmodel.PDPage page)
    void                 extract(org.sejda.sambox.pdmodel.PDPage page)
    protected void       processTextPosition(org.sejda.sambox.text.TextPosition text)

    Methods inherited from class org.sejda.sambox.text.PDFTextStripper

    endArticle, endDocument, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCharactersByArticle, getCurrentPageNo, getDropThreshold, getEndBookmark, getEndPage, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getStartPage, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPage, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setEndPage, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startDocument, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeText, writeWordSeparator

    Methods inherited from class org.sejda.sambox.text.PDFTextStreamEngine

    computeFontHeight, showGlyph

    Methods inherited from class org.sejda.sambox.contentstream.PDFStreamEngine

    addOperator, addOperatorIfAbsent, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processStream, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

  • Method Details

    • processTextPosition

      protected void processTextPosition(org.sejda.sambox.text.TextPosition text)
      Overrides:
      processTextPosition in class org.sejda.sambox.text.PDFTextStripper
    • extract

      public void extract(org.sejda.sambox.pdmodel.PDPage page) throws TaskIOException
      Throws:
      TaskIOException
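extract is declared to throw Sejda's checked TaskIOException. A common way to arrive at that signature, assumed here rather than documented on this page, is to call the inherited page processing (which in PDFBox, the library SAMBox is forked from, declares IOException) and rethrow any failure wrapped with its original cause. The sketch uses hypothetical stand-ins, TaskIOExceptionStandIn and PageProcessor, so that it compiles without Sejda or SAMBox on the classpath.

```java
import java.io.IOException;

public class ExtractWrapping {

    /** Hypothetical stand-in for Sejda's TaskIOException. */
    static class TaskIOExceptionStandIn extends Exception {
        TaskIOExceptionStandIn(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for the inherited processPage(PDPage) call. */
    @FunctionalInterface
    interface PageProcessor {
        void processPage(int pageNumber) throws IOException;
    }

    /** Runs page processing and translates a low-level IOException into the task-level exception. */
    static void extract(PageProcessor processor, int pageNumber) throws TaskIOExceptionStandIn {
        try {
            processor.processPage(pageNumber);
        } catch (IOException e) {
            // Keep the original exception as the cause so the failure stays diagnosable
            throw new TaskIOExceptionStandIn("Unable to extract text from page " + pageNumber, e);
        }
    }

    public static void main(String[] args) {
        try {
            extract(n -> { throw new IOException("corrupted content stream"); }, 5);
        } catch (TaskIOExceptionStandIn e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

Wrapping rather than swallowing the IOException lets callers handle one checked exception type per task while keeping the underlying parse error available through getCause().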
    • endPage

      protected void endPage(org.sejda.sambox.pdmodel.PDPage page)
      Overrides:
      endPage in class org.sejda.sambox.text.PDFTextStripper
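According to the class description, the decoded page stream is unloaded once it has been used. In PDFBox, from which SAMBox is forked, PDFTextStripper calls endPage after a page's text has been processed and written, which makes an endPage override a natural place to release per-page data; how this class actually unloads the stream is not documented here. The sketch illustrates the pattern with a hypothetical DecodedPage that decodes its content lazily, caches the bytes, and drops them in endPage, so that pages processed one after another do not keep their decoded content alive.

```java
import java.nio.charset.StandardCharsets;

public class PageUnloadSketch {

    /** Hypothetical holder for a page whose content stream is decoded lazily and cached. */
    static class DecodedPage {
        private final String rawContent;
        private byte[] decoded; // cached decoded stream; null when not loaded

        DecodedPage(String rawContent) {
            this.rawContent = rawContent;
        }

        byte[] decodedContent() {
            if (decoded == null) {
                // Stand-in for real filter decoding (e.g. FlateDecode) of the content stream
                decoded = rawContent.getBytes(StandardCharsets.US_ASCII);
            }
            return decoded;
        }

        boolean isLoaded() {
            return decoded != null;
        }

        void unload() {
            decoded = null;
        }
    }

    private int processedBytes;

    /** Mirrors the per-page flow: consume the decoded content, then invoke the endPage hook. */
    void processPage(DecodedPage page) {
        processedBytes += page.decodedContent().length;
        endPage(page);
    }

    /** Mirrors an endPage override: the decoded bytes are no longer needed, so drop them. */
    void endPage(DecodedPage page) {
        page.unload();
    }

    public static void main(String[] args) {
        PageUnloadSketch sketch = new PageUnloadSketch();
        DecodedPage page = new DecodedPage("BT /F1 12 Tf (Hi) Tj ET");
        sketch.processPage(page);
        System.out.println(sketch.processedBytes + " bytes processed, still loaded: " + page.isLoaded());
    }
}
```

Because decoding is lazy, an unloaded page can still be decoded again on demand; the trade-off is repeated decoding work in exchange for bounded memory on large documents.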
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
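Because the class implements Closeable, try-with-resources is the idiomatic way to make sure close() runs even when extraction fails. The close() signature above has no throws clause, so closing itself needs no IOException handler, although extract can still throw TaskIOException. What close() releases is not described on this page and the constructor is not listed, so the sketch uses a hypothetical StripperStandIn that only records its lifecycle calls.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

public class CloseableUsageSketch {

    /** Hypothetical stand-in for PdfVisibleTextStripper that logs its lifecycle calls. */
    static class StripperStandIn implements Closeable {
        final List<String> calls;

        StripperStandIn(List<String> calls) {
            this.calls = calls;
        }

        void extract(int pageNumber) {
            calls.add("extract " + pageNumber);
        }

        // Narrowed signature: no throws clause, matching the documented close()
        @Override
        public void close() {
            calls.add("close");
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // try-with-resources calls close() on exit, whether extract returns normally or throws
        try (StripperStandIn stripper = new StripperStandIn(calls)) {
            for (int page = 1; page <= 2; page++) {
                stripper.extract(page);
            }
        }
        System.out.println(calls); // prints "[extract 1, extract 2, close]"
    }
}
```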