Class PDFDomTree


  • public class PDFDomTree
    extends PDFBoxTree
    A DOM representation of a PDF file.
    Author:
    burgetr
    • Field Detail

      • defaultStyle

        protected String defaultStyle
        Default style placed at the beginning of the resulting document.
      • doc

        protected Document doc
        The resulting document representing the PDF file.
      • head

        protected Element head
        The head element of the resulting document.
      • body

        protected Element body
        The body element of the resulting document.
      • title

        protected Element title
        The title element of the resulting document.
      • globalStyle

        protected Element globalStyle
        The global style element of the resulting document.
      • curpage

        protected Element curpage
        The element representing the page currently being created in the resulting document.
      • textcnt

        protected int textcnt
        Text element counter for assigning IDs to the text elements.
      • pagecnt

        protected int pagecnt
        Page counter for assigning IDs to the pages.
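        The sketch below shows one way to build a skeleton like the one described by the doc, head, title, globalStyle and body fields, using the JDK's built-in DOM API. The html/head/title/style/body layout and the class name DomSkeletonSketch are assumptions made for illustration. This is not PDFDomTree's actual code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: an html/head/title/style/body skeleton mirroring the
// doc, head, title, globalStyle and body fields described above.
class DomSkeletonSketch {
    final Document doc;
    final Element head;
    final Element title;
    final Element globalStyle;
    final Element body;

    DomSkeletonSketch(String titleText, String defaultStyle) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element html = doc.createElement("html");
        doc.appendChild(html);

        head = doc.createElement("head");
        html.appendChild(head);

        title = doc.createElement("title");
        title.setTextContent(titleText);
        head.appendChild(title);

        // The default CSS is placed in a <style> element inside the head.
        globalStyle = doc.createElement("style");
        globalStyle.setAttribute("type", "text/css");
        globalStyle.setTextContent(defaultStyle);
        head.appendChild(globalStyle);

        body = doc.createElement("body");
        html.appendChild(body);
    }
}
```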
    • Method Detail

      • getDocument

        public Document getDocument()
        Obtains the resulting document tree.
        Returns:
        The resulting DOM Document.
      • startDocument

        public void startDocument(org.apache.pdfbox.pdmodel.PDDocument document)
                           throws IOException
        Overrides:
        startDocument in class org.apache.pdfbox.text.PDFTextStripper
        Throws:
        IOException
      • endDocument

        protected void endDocument(org.apache.pdfbox.pdmodel.PDDocument document)
                            throws IOException
        Overrides:
        endDocument in class org.apache.pdfbox.text.PDFTextStripper
        Throws:
        IOException
      • writeText

        public void writeText(org.apache.pdfbox.pdmodel.PDDocument doc,
                              Writer outputStream)
                       throws IOException
        Parses a PDF document and serializes the resulting DOM tree to the given Writer. This requires a DOM Level 3 capable implementation to be available.
        Overrides:
        writeText in class org.apache.pdfbox.text.PDFTextStripper
        Throws:
        IOException
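        writeText depends on a DOM Level 3 implementation for serialization. To show what that requirement involves, the sketch below uses the standard DOM Level 3 Load and Save interfaces (org.w3c.dom.ls), which ship with the JDK, to write a Document to a Writer. The class DomWriterSketch and its demo document are hypothetical. PDFDomTree may configure its serializer differently.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class DomWriterSketch {
    // Serializes a DOM Document to a Writer using the DOM Level 3 Load and Save API.
    static void write(Document doc, Writer out) {
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Level 3 Load and Save is not supported");
        }
        DOMImplementationLS ls = (DOMImplementationLS) feature;
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setCharacterStream(out); // write characters straight into the Writer
        serializer.write(doc, output);
    }

    // Builds a tiny html/body document and returns its serialized form.
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.setTextContent("Hello");
        html.appendChild(body);
        doc.appendChild(html);

        StringWriter sw = new StringWriter();
        write(doc, sw);
        return sw.toString();
    }
}
```

        If getFeature("LS", "3.0") returns null, the DOM implementation lacks Load and Save support. That is the situation the requirement above warns about.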
      • createDOM

        public Document createDOM(org.apache.pdfbox.pdmodel.PDDocument doc)
                           throws IOException
        Loads a PDF document and creates a DOM tree from it.
        Parameters:
        doc - the source document
        Returns:
        a DOM Document representing the DOM tree
        Throws:
        IOException
      • startNewPage

        protected void startNewPage()
        Description copied from class: PDFBoxTree
        Adds a new page to the resulting document and makes it the current (active) page.
        Specified by:
        startNewPage in class PDFBoxTree
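        The sketch below illustrates the page bookkeeping described above. It assumes that each page is a div appended to the body and that its ID is built from a page counter (compare the curpage and pagecnt fields). The element name, the class name and the "page_N" ID format are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of the startNewPage() pattern: create a page element,
// append it to the body and remember it as the current page.
class PageSketch {
    final Document doc;
    final Element body;
    Element curpage;  // the page currently receiving content
    int pagecnt = 0;  // page counter used to build unique IDs

    PageSketch() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element html = doc.createElement("html");
        body = doc.createElement("body");
        html.appendChild(body);
        doc.appendChild(html);
    }

    void startNewPage() {
        Element page = doc.createElement("div");
        page.setAttribute("class", "page");
        // Assumed ID scheme "page_N"; the format used by PDFDomTree may differ.
        page.setAttribute("id", "page_" + (pagecnt++));
        body.appendChild(page);
        curpage = page; // later content is appended to this element
    }
}
```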
      • renderText

        protected void renderText(String data,
                                  TextMetrics metrics)
        Description copied from class: PDFBoxTree
        Creates a new text box in the current page. The style and position of the text are contained in the PDFBoxTree.curstyle property.
        Specified by:
        renderText in class PDFBoxTree
        Parameters:
        data - The text contents.
      • renderPath

        protected void renderPath(List<PathSegment> path,
                                  boolean stroke,
                                  boolean fill)
                           throws IOException
        Description copied from class: PDFBoxTree
        Adds a rectangle to the current page at the specified position.
        Specified by:
        renderPath in class PDFBoxTree
        Parameters:
        stroke - should there be a stroke around?
        fill - should the rectangle be filled?
        Throws:
        IOException
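        The copied description refers to a rectangle, but the method receives a list of path segments. The sketch below shows one way to decide whether such a path can be drawn as a rectangle: exactly four axis-aligned segments that alternate between horizontal and vertical and close the loop. The Segment class is a hypothetical stand-in for pdf2dom's PathSegment, and this check is not necessarily the one pdf2dom uses.

```java
import java.util.List;

// Hypothetical sketch: decides whether a path can be drawn as one rectangle.
class PathSketch {
    // Assumed minimal segment type; not pdf2dom's PathSegment class.
    static final class Segment {
        final float x1, y1, x2, y2;

        Segment(float x1, float y1, float x2, float y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        boolean isHorizontal() { return y1 == y2 && x1 != x2; }

        boolean isVertical() { return x1 == x2 && y1 != y2; }
    }

    // Returns {minX, minY, maxX, maxY} if the path consists of four non-degenerate,
    // axis-aligned segments that alternate direction and form a closed loop;
    // returns null otherwise.
    static float[] toRectangle(List<Segment> path) {
        if (path.size() != 4) {
            return null;
        }
        for (int i = 0; i < 4; i++) {
            Segment s = path.get(i);
            Segment next = path.get((i + 1) % 4);
            if (!s.isHorizontal() && !s.isVertical()) {
                return null; // diagonal or zero-length segment
            }
            if (s.isHorizontal() == next.isHorizontal()) {
                return null; // directions must alternate
            }
            if (s.x2 != next.x1 || s.y2 != next.y1) {
                return null; // each segment must end where the next one starts
            }
        }
        float minX = Float.POSITIVE_INFINITY, minY = Float.POSITIVE_INFINITY;
        float maxX = Float.NEGATIVE_INFINITY, maxY = Float.NEGATIVE_INFINITY;
        for (Segment s : path) {
            minX = Math.min(minX, Math.min(s.x1, s.x2));
            minY = Math.min(minY, Math.min(s.y1, s.y2));
            maxX = Math.max(maxX, Math.max(s.x1, s.x2));
            maxY = Math.max(maxY, Math.max(s.y1, s.y2));
        }
        return new float[] {minX, minY, maxX, maxY};
    }
}
```

        Requiring the directions to alternate rules out degenerate loops, such as a segment drawn forward and then back. Without that check, the bounding box of such a loop would be mistaken for a rectangle.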
      • renderImage

        protected void renderImage(float x,
                                   float y,
                                   float width,
                                   float height,
                                   ImageResource resource)
                            throws IOException
        Description copied from class: PDFBoxTree
        Adds an image to the current page.
        Specified by:
        renderImage in class PDFBoxTree
        Parameters:
        x - the X coordinate of the image
        y - the Y coordinate of the image
        width - the width of the image
        height - the height of the image
        resource - the image resource holding the image data
        Throws:
        IOException
      • createPageElement

        protected Element createPageElement()
        Creates an element that represents a single page.
        Returns:
        the resulting DOM element
      • createTextElement

        protected Element createTextElement(float width)
        Creates an element that represents a single positioned box with no content.
        Returns:
        the resulting DOM element
      • createTextElement

        protected Element createTextElement(String data,
                                            float width)
        Creates an element that represents a single positioned box containing the specified text string.
        Parameters:
        data - the text string to be contained in the created box.
        Returns:
        the resulting DOM element
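        The sketch below shows one way to build such a positioned text box. It assumes an absolutely positioned div whose id comes from a text counter (compare the textcnt field). Unlike the real method, it takes the position as explicit arguments rather than reading it from PDFBoxTree.curstyle. The class "p", the ID format and the order of the style properties are hypothetical.

```java
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of a positioned text box with a counter-based ID.
class TextBoxSketch {
    final Document doc;
    int textcnt = 0; // counter for unique text element IDs

    TextBoxSketch() {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates a box holding the given text; x and y are explicit here (assumption).
    Element createTextElement(String data, float x, float y, float width) {
        Element el = doc.createElement("div");
        el.setAttribute("class", "p");
        el.setAttribute("id", "p" + (textcnt++));
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        el.setAttribute("style", String.format(Locale.ROOT,
                "top:%.2fpt;left:%.2fpt;width:%.2fpt;", y, x, width));
        el.appendChild(doc.createTextNode(data));
        return el;
    }
}
```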
      • createRectangleElement

        protected Element createRectangleElement(float x,
                                                 float y,
                                                 float width,
                                                 float height,
                                                 boolean stroke,
                                                 boolean fill)
        Creates an element that represents a rectangle drawn at the specified coordinates in the page.
        Parameters:
        x - the X coordinate of the rectangle
        y - the Y coordinate of the rectangle
        width - the width of the rectangle
        height - the height of the rectangle
        stroke - should there be a stroke around?
        fill - should the rectangle be filled?
        Returns:
        the resulting DOM element
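        To illustrate how the stroke and fill flags might map to CSS, the sketch below builds an inline style string for an absolutely positioned box. The colours, the 1px border width and the use of points as units are assumptions. They are not values taken from PDFDomTree.

```java
import java.util.Locale;

// Hypothetical sketch: builds the inline CSS for a rectangle box.
class RectStyleSketch {
    static String rectangleStyle(float x, float y, float width, float height,
                                 boolean stroke, boolean fill) {
        StringBuilder css = new StringBuilder();
        css.append(String.format(Locale.ROOT,
                "position:absolute;left:%.1fpt;top:%.1fpt;width:%.1fpt;height:%.1fpt;",
                x, y, width, height));
        // Assumed colours; a real converter would take them from the graphics state.
        if (stroke) {
            css.append("border:1px solid black;");
        }
        if (fill) {
            css.append("background-color:gray;");
        }
        return css.toString();
    }
}
```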
      • createLineElement

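        protected Element createLineElement(float x1,
                                            float y1,
                                            float x2,
                                            float y2)
        Creates an element that represents a horizontal or vertical line.
        Parameters:
        x1 - the X coordinate of the first end point of the line
        y1 - the Y coordinate of the first end point of the line
        x2 - the X coordinate of the second end point of the line
        y2 - the Y coordinate of the second end point of the line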
        Returns:
        the created DOM element
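        A horizontal or vertical line can be represented as a thin box. The sketch below converts the end points, which may come in either order, into left, top, width and height values. The thickness parameter is an assumption, as is the choice to start the box at the line coordinate rather than centre it on the line.

```java
// Hypothetical sketch: normalizes a horizontal or vertical line into a box.
class LineBoxSketch {
    // Returns {left, top, width, height}; thickness is an assumed parameter.
    static float[] lineToBox(float x1, float y1, float x2, float y2, float thickness) {
        if (x1 != x2 && y1 != y2) {
            throw new IllegalArgumentException("only horizontal or vertical lines are supported");
        }
        float left = Math.min(x1, x2);
        float top = Math.min(y1, y2);
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        // The zero extent across the line is replaced by the line thickness.
        if (width == 0) {
            width = thickness;
        }
        if (height == 0) {
            height = thickness;
        }
        return new float[] {left, top, width, height};
    }
}
```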
      • createImageElement

        protected Element createImageElement(float x,
                                             float y,
                                             float width,
                                             float height,
                                             ImageResource resource)
                                      throws IOException
        Creates an element that represents an image drawn at the specified coordinates in the page.
        Parameters:
        x - the X coordinate of the image
        y - the Y coordinate of the image
        width - the width of the image
        height - the height of the image
        resource - the image resource holding the image data
        Returns:
        the resulting DOM element
        Throws:
        IOException
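        One common way to put image data directly into an HTML document is to use a base64 data URI as the src attribute of an img element. The sketch below assumes that approach purely for illustration. This documentation does not say whether PDFDomTree embeds images this way or writes them to separate files. The class ImageSketch and its helpers are hypothetical.

```java
import java.util.Base64;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: an <img> element whose image bytes are embedded inline.
class ImageSketch {
    // Encodes raw image bytes as a data URI, e.g. "data:image/png;base64,...".
    static String dataUri(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageBytes);
    }

    // Creates an absolutely positioned <img> element with an embedded source.
    static Element createImageElement(Document doc, float x, float y, float width, float height,
                                      String mimeType, byte[] imageBytes) {
        Element img = doc.createElement("img");
        img.setAttribute("style", String.format(Locale.ROOT,
                "position:absolute;left:%.1fpt;top:%.1fpt;width:%.1fpt;height:%.1fpt;",
                x, y, width, height));
        img.setAttribute("src", dataUri(mimeType, imageBytes));
        return img;
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```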
      • createGlobalStyle

        protected String createGlobalStyle()
        Generates the global CSS style for the whole document.
        Returns:
        the CSS code used in the generated document header
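        The sketch below generates a global CSS block from an ordered map of selectors to declarations. The selectors used in the example and the class name GlobalStyleSketch are hypothetical. This documentation does not list the rules that PDFDomTree actually emits.

```java
import java.util.Map;

// Hypothetical sketch: renders a selector -> declarations map as CSS text.
class GlobalStyleSketch {
    // Rules are emitted in the map's iteration order, so pass a LinkedHashMap
    // when the order matters.
    static String createGlobalStyle(Map<String, String> rules) {
        StringBuilder css = new StringBuilder();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            css.append(rule.getKey()).append(" {").append(rule.getValue()).append("}\n");
        }
        return css.toString();
    }
}
```

        For example, a map from ".page" to "position:relative;" produces the line `.page {position:relative;}`.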
      • updateFontTable

        protected void updateFontTable()
        Description copied from class: PDFBoxTree
        Updates the font table by adding new fonts used on the current page.
        Overrides:
        updateFontTable in class PDFBoxTree
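        The sketch below shows the "add only new fonts" behaviour. Each font used on the current page is registered once, and later pages reuse the existing entries. Keying the table by font name and generating family names such as "font0" and "font1" are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a font table that only grows with fonts not seen before.
class FontTableSketch {
    // Maps a font name to an assumed CSS family name.
    private final Map<String, String> fonts = new LinkedHashMap<>();

    // Adds the fonts used on the current page and returns how many were new.
    int update(List<String> fontsOnPage) {
        int added = 0;
        for (String name : fontsOnPage) {
            if (!fonts.containsKey(name)) {
                fonts.put(name, "font" + fonts.size());
                added++;
            }
        }
        return added;
    }

    String familyFor(String name) {
        return fonts.get(name);
    }

    int size() {
        return fonts.size();
    }
}
```

        For example, calling update with Helvetica and Times-Roman, and then with Times-Roman and Courier, first adds two fonts and then one, leaving three entries.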
      • createFontFaces

        protected String createFontFaces()
      • showGlyph

        protected void showGlyph(org.apache.pdfbox.util.Matrix arg0,
                                 org.apache.pdfbox.pdmodel.font.PDFont arg1,
                                 int arg2,
                                 String arg3,
                                 org.apache.pdfbox.util.Vector arg4)
                          throws IOException
        Overrides:
        showGlyph in class org.apache.pdfbox.contentstream.PDFStreamEngine
        Throws:
        IOException
      • computeFontHeight

        protected float computeFontHeight(org.apache.pdfbox.pdmodel.font.PDFont arg0)
                                   throws IOException
        Throws:
        IOException