public class PDFDomTree extends PDFBoxTree
| Modifier and Type | Class and Description |
|---|---|
| protected class | PDFDomTree.HtmlDivLine: Maps an input line to an HTML div rectangle, since HTML does not support standard lines. |
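
The idea behind mapping a line to a div rectangle can be sketched as follows. This is only an illustration, not the actual HtmlDivLine code: the class name `LineAsDiv`, the `pt` unit, and the choice to draw the stroke as a single CSS border are all assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the library): a line expressed as a positioned div. */
public class LineAsDiv {
    public final float left, top, width, height;
    public final boolean horizontal;

    public LineAsDiv(float x1, float y1, float x2, float y2) {
        // Assumption: a line counts as horizontal when it spans at least as much in x as in y.
        horizontal = Math.abs(x2 - x1) >= Math.abs(y2 - y1);
        // The box starts at the smaller coordinate on each axis.
        left = Math.min(x1, x2);
        top = Math.min(y1, y2);
        // The box has zero thickness; the visible stroke comes from the border.
        width = horizontal ? Math.abs(x2 - x1) : 0f;
        height = horizontal ? 0f : Math.abs(y2 - y1);
    }

    /** Builds an inline CSS style; the unit "pt" is an assumption of this sketch. */
    public String toCss(float lineWidth, String color) {
        String border = horizontal ? "border-top" : "border-left";
        return String.format(Locale.ROOT,
                "position:absolute;left:%.1fpt;top:%.1fpt;width:%.1fpt;height:%.1fpt;%s:%.1fpt solid %s;",
                left, top, width, height, border, lineWidth, color);
    }

    public static void main(String[] args) {
        LineAsDiv line = new LineAsDiv(10f, 20f, 110f, 20f);
        System.out.println("<div style=\"" + line.toCss(1f, "#000000") + "\"></div>");
    }
}
```

A sketch like this only covers axis-aligned lines, which matches the "horizontal or vertical line" wording used for createLineElement; slanted lines would need a CSS transform or an image instead.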
| Modifier and Type | Field and Description |
|---|---|
| protected Element | body: The body element of the resulting document. |
| protected PDFDomTreeConfig | config |
| protected Element | curpage: The element representing the page currently being created in the resulting document. |
| protected String | defaultStyle: Default style placed at the beginning of the resulting document. |
| protected Document | doc: The resulting document representing the PDF file. |
| protected Element | globalStyle: The global style element of the resulting document. |
| protected Element | head: The head element of the resulting document. |
| protected int | pagecnt: Page counter for assigning IDs to the pages. |
| protected int | textcnt: Text element counter for assigning IDs to the text elements. |
| protected Element | title: The title element of the resulting document. |
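
To show how fields like these fit together, here is a minimal sketch that builds an HTML skeleton with the standard W3C DOM API and uses two counters to assign element IDs. It is not the library's code: the class `HtmlSkeleton`, the method `addText`, the ID formats (`page_N`, `pN`), and the class names are assumptions chosen for the example; only the field names mirror the table above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: an HTML DOM skeleton with counter-based element IDs. */
public class HtmlSkeleton {
    public final Document doc;
    public final Element head, title, globalStyle, body;
    public Element curpage;   // page currently being filled
    public int pagecnt = 0;   // next page ID
    public int textcnt = 0;   // next text element ID

    public HtmlSkeleton() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element html = doc.createElement("html");
        doc.appendChild(html);
        head = doc.createElement("head");
        title = doc.createElement("title");
        globalStyle = doc.createElement("style");
        globalStyle.setAttribute("type", "text/css");
        head.appendChild(title);
        head.appendChild(globalStyle);
        body = doc.createElement("body");
        html.appendChild(head);
        html.appendChild(body);
    }

    /** Appends a new page element to the body and makes it the current page. */
    public Element startNewPage() {
        curpage = doc.createElement("div");
        curpage.setAttribute("id", "page_" + pagecnt++); // ID format is an assumption
        curpage.setAttribute("class", "page");
        body.appendChild(curpage);
        return curpage;
    }

    /** Appends a text box to the current page; startNewPage() must be called first. */
    public Element addText(String data) {
        Element el = doc.createElement("div");
        el.setAttribute("id", "p" + textcnt++); // ID format is an assumption
        el.setAttribute("class", "p");
        el.setTextContent(data);
        curpage.appendChild(el);
        return el;
    }

    public static void main(String[] args) {
        HtmlSkeleton s = new HtmlSkeleton();
        s.startNewPage();
        s.addText("Hello");
        System.out.println(s.body.getChildNodes().getLength() + " page(s) created");
    }
}
```

In this sketch the text counter keeps running across pages, so text IDs stay unique within the whole document rather than within a page.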
Fields inherited from class PDFBoxTree: cssFontFamily, cssFontStyle, cssFontWeight, cur_x, cur_y, curstyle, disableGraphics, disableImageData, disableImages, endPage, fontTable, graphicsPath, lastDia, lastText, path_start_x, path_start_y, path_x, path_y, pdFontType, pdpage, startPage, style, textLine, textMetrics, UNIT

| Constructor and Description |
|---|
| PDFDomTree(): Creates a new PDF DOM parser. |
| PDFDomTree(PDFDomTreeConfig config): Creates a new PDF DOM parser. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | createDocument(): Creates a new empty HTML document tree. |
| Document | createDOM(org.apache.pdfbox.pdmodel.PDDocument doc): Loads a PDF document and creates a DOM tree from it. |
| protected String | createFontFaces() |
| protected String | createGlobalStyle(): Generates the global CSS style for the whole document. |
| protected Element | createImageElement(float x, float y, float width, float height, ImageResource resource): Creates an element that represents an image drawn at the specified coordinates in the page. |
| protected Element | createLineElement(float x1, float y1, float x2, float y2): Creates an element that represents a horizontal or vertical line. |
| protected Element | createPageElement(): Creates an element that represents a single page. |
| protected Element | createPathImage(List<PathSegment> path) |
| protected Element | createRectangleElement(float x, float y, float width, float height, boolean stroke, boolean fill): Creates an element that represents a rectangle drawn at the specified coordinates in the page. |
| protected Element | createTextElement(float width): Creates an element that represents a single positioned box with no content. |
| protected Element | createTextElement(String data, float width): Creates an element that represents a single positioned box containing the specified text string. |
| protected void | endDocument(org.apache.pdfbox.pdmodel.PDDocument document) |
| Document | getDocument(): Obtains the resulting document tree. |
| protected void | renderImage(float x, float y, float width, float height, ImageResource resource): Adds an image to the current page. |
| protected void | renderPath(List<PathSegment> path, boolean stroke, boolean fill): Adds a rectangle to the current page at the specified position. |
| protected void | renderText(String data, TextMetrics metrics): Creates a new text box in the current page. |
| protected void | showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4) |
| void | startDocument(org.apache.pdfbox.pdmodel.PDDocument document) |
| protected void | startNewPage(): Adds a new page to the resulting document and makes it the current (active) page. |
| protected void | updateFontTable(): Updates the font table by adding new fonts used on the current page. |
| void | writeText(org.apache.pdfbox.pdmodel.PDDocument doc, Writer outputStream): Parses a PDF document and serializes the resulting DOM tree to an output. |
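
Since createDOM and getDocument return a DOM Document, a caller will often want to turn that tree into HTML text. The following sketch does this with the JDK's standard javax.xml.transform API. It makes no claim about how writeText serializes its output; the class `DomToHtml` and its helper `sample()` are hypothetical, and the small document built there merely stands in for a tree obtained from the parser.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: serializes a W3C DOM tree as HTML text. */
public class DomToHtml {

    public static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // The "html" output method avoids XML-style empty tags and the XML declaration.
            t.setOutputProperty(OutputKeys.METHOD, "html");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    /** Builds a tiny stand-in document; a real caller would pass the parser's tree. */
    public static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element page = doc.createElement("div");
            page.setAttribute("id", "page_0");
            page.setTextContent("Hello");
            body.appendChild(page);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

With a real parser instance, the argument to `serialize` would be the Document returned by createDOM or getDocument instead of `sample()`.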
Methods inherited from class PDFBoxTree: colorString, colorString, colorString, createCurrentPageTransformation, finishBox, floatValue, getCurrentMediaBox, getDisableGraphics, getDisableImageData, getDisableImages, getEndPage, getLength, getStartPage, getTextDirectionality, getTextDirectionality, getTitle, intValue, isReversed, processImageOperation, processOperator, processPage, processTextPosition, setDisableGraphics, setDisableImageData, setDisableImages, setEndPage, setStartPage, stringValue, toRectangle, transformLength, transformPosition, updateStyle

Methods inherited from class org.apache.pdfbox.text.PDFTextStripper: endArticle, endPage, getAddMoreFormatting, getArticleEnd, getArticleStart, getAverageCharTolerance, getCurrentPageNo, getDropThreshold, getEndBookmark, getCharactersByArticle, getIndentThreshold, getLineSeparator, getListItemPatterns, getOutput, getPageEnd, getPageStart, getParagraphEnd, getParagraphStart, getSeparateByBeads, getSortByPosition, getSpacingTolerance, getStartBookmark, getSuppressDuplicateOverlappingText, getText, getWordSeparator, matchPattern, processPages, setAddMoreFormatting, setArticleEnd, setArticleStart, setAverageCharTolerance, setDropThreshold, setEndBookmark, setIndentThreshold, setLineSeparator, setListItemPatterns, setPageEnd, setPageStart, setParagraphEnd, setParagraphStart, setShouldSeparateByBeads, setSortByPosition, setSpacingTolerance, setStartBookmark, setSuppressDuplicateOverlappingText, setWordSeparator, startArticle, startArticle, startPage, writeCharacters, writeLineSeparator, writePage, writePageEnd, writePageStart, writeParagraphEnd, writeParagraphSeparator, writeParagraphStart, writeString, writeString, writeWordSeparator

Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine: addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getResources, getTextLineMatrix, getTextMatrix, operatorException, processAnnotation, processChildStream, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showForm, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator

protected String defaultStyle
protected Document doc
protected Element head
protected Element body
protected Element title
protected Element globalStyle
protected Element curpage
protected int textcnt
protected int pagecnt
protected PDFDomTreeConfig config

public PDFDomTree() throws IOException, ParserConfigurationException

public PDFDomTree(PDFDomTreeConfig config) throws IOException, ParserConfigurationException

protected void createDocument() throws ParserConfigurationException
Throws: ParserConfigurationException

public Document getDocument()

public void startDocument(org.apache.pdfbox.pdmodel.PDDocument document) throws IOException
Overrides: startDocument in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

protected void endDocument(org.apache.pdfbox.pdmodel.PDDocument document) throws IOException
Overrides: endDocument in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

public void writeText(org.apache.pdfbox.pdmodel.PDDocument doc, Writer outputStream) throws IOException
Overrides: writeText in class org.apache.pdfbox.text.PDFTextStripper
Throws: IOException

public Document createDOM(org.apache.pdfbox.pdmodel.PDDocument doc) throws IOException
Parameters:
doc - the source document
Throws: IOException

protected void startNewPage()
Overrides: startNewPage in class PDFBoxTree

protected void renderText(String data, TextMetrics metrics)
Description copied from class PDFBoxTree: [...] PDFBoxTree.curstyle property.
Overrides: renderText in class PDFBoxTree
Parameters:
data - the text contents

protected void renderPath(List<PathSegment> path, boolean stroke, boolean fill) throws IOException
Overrides: renderPath in class PDFBoxTree
Parameters:
stroke - whether the shape is stroked
fill - whether the rectangle is filled
Throws: IOException

protected void renderImage(float x, float y, float width, float height, ImageResource resource) throws IOException
Overrides: renderImage in class PDFBoxTree
Parameters:
x - the X coordinate of the image
y - the Y coordinate of the image
width - the width of the image
height - the height of the image
resource - the image data depending on the specified type
Throws: IOException

protected Element createPageElement()

protected Element createTextElement(float width)

protected Element createTextElement(String data, float width)
Parameters:
data - the text string to be contained in the created box

protected Element createRectangleElement(float x, float y, float width, float height, boolean stroke, boolean fill)
Parameters:
x - the X coordinate of the rectangle
y - the Y coordinate of the rectangle
width - the width of the rectangle
height - the height of the rectangle
stroke - whether the rectangle is stroked
fill - whether the rectangle is filled

protected Element createLineElement(float x1, float y1, float x2, float y2)
Parameters: x1, y1, x2, y2

protected Element createPathImage(List<PathSegment> path) throws IOException
Throws: IOException

protected Element createImageElement(float x, float y, float width, float height, ImageResource resource) throws IOException
Parameters:
x - the X coordinate of the image
y - the Y coordinate of the image
width - the width of the image
height - the height of the image
type - the image type: "png" or "jpeg"
resource - the image data depending on the specified type
Throws: IOException

protected String createGlobalStyle()

protected void updateFontTable()
Overrides: updateFontTable in class PDFBoxTree

protected String createFontFaces()

protected void showGlyph(org.apache.pdfbox.util.Matrix arg0, org.apache.pdfbox.pdmodel.font.PDFont arg1, int arg2, String arg3, org.apache.pdfbox.util.Vector arg4) throws IOException
Overrides: showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
Throws: IOException

Copyright © 2019. All rights reserved.