Class LagartoDOMBuilderTagVisitor

  • All Implemented Interfaces:
    TagVisitor

    public class LagartoDOMBuilderTagVisitor
    extends java.lang.Object
    implements TagVisitor
    Lagarto tag visitor that builds a DOM tree. It does not (yet) build the tree fully according to the HTML specification, but it works well enough for any sane HTML out there. In the default mode, the tree builder does not change the order of elements, so the returned tree reflects the input. If the input contains crazy stuff, the tree will be weird, too :)
    • Constructor Detail

      • LagartoDOMBuilderTagVisitor

        public LagartoDOMBuilderTagVisitor​(LagartoDOMBuilder domBuilder)
    • Method Detail

      • getDocument

        public Document getDocument()
        Returns the root document node of the parsed DOM tree.
      • start

        public void start()
        Starts DOM building. Creates the root Document node.
        Specified by:
        start in interface TagVisitor
      • end

        public void end()
        Finishes the tree building. Closes unclosed tags.
        Specified by:
        end in interface TagVisitor
      • createElementNode

        protected Element createElementNode​(Tag tag)
        Creates a new element with the correct configuration.
      • tag

        public void tag​(Tag tag)
        Visits tags.
        Specified by:
        tag in interface TagVisitor
      • removeLastChildNodeIfEmptyText

        protected void removeLastChildNodeIfEmptyText​(Node parentNode,
                                                      boolean closedTag)
        Removes the last child node if it contains only empty text.
      • findMatchingParentOpenTag

        protected Node findMatchingParentOpenTag​(java.lang.String tagName)
        Finds the matching open parent tag, or returns null if none is found.
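
        The lookup described above can be pictured as a walk from the current node towards the root. The following is a minimal sketch of that idea only, not the Lagarto implementation: SimpleNode, findMatchingOpenTag and the case-insensitive name comparison are all assumptions made for illustration.

```java
class MatchingParentSketch {

    /** Hypothetical minimal node type; this is NOT the Jodd Node class. */
    static final class SimpleNode {
        final String name;
        final SimpleNode parent;

        SimpleNode(String name, SimpleNode parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /**
     * Walks from the given node towards the root and returns the first
     * node (the start node included) whose name matches, or null if no
     * node on that path matches. Case-insensitive matching is an assumption.
     */
    static SimpleNode findMatchingOpenTag(SimpleNode current, String tagName) {
        for (SimpleNode n = current; n != null; n = n.parent) {
            if (n.name.equalsIgnoreCase(tagName)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleNode html = new SimpleNode("html", null);
        SimpleNode div = new SimpleNode("div", html);
        SimpleNode span = new SimpleNode("span", div);

        // A closing </div> seen while inside <span> resolves to the open <div>.
        System.out.println(findMatchingOpenTag(span, "div") == div);
        // A closing </table> has no open counterpart on the path to the root.
        System.out.println(findMatchingOpenTag(span, "table") == null);
    }
}
```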
      • fixUnclosedTagsUpToMatchingParent

        protected void fixUnclosedTagsUpToMatchingParent​(Tag tag,
                                                         Node matchingParent)
        Fixes all unclosed tags up to the matching parent. Missing end tags are added just before the parent tag is closed, so that the whole inner content becomes the tag body.

        Tags that can be closed implicitly are checked and closed.

        There is an optional check for detecting orphan tags inside tables or lists. If set, tags can be closed beyond the border of the table or list, and this is reported as an orphan tag.

        This is just a generic solution, the closest to the rules.
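
        The closing strategy described above can be illustrated with a stack of open elements. This is a simplified sketch under stated assumptions, not the Lagarto code: the class name, the event log and the string-based tags are hypothetical, and the implicit-close rules and the table/list orphan check are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class UnclosedTagsSketch {

    /** Currently open elements, innermost on top. */
    private final Deque<String> open = new ArrayDeque<>();

    /** Log of what the builder did, kept only for illustration. */
    final List<String> events = new ArrayList<>();

    void startTag(String name) {
        open.push(name);
        events.add("open " + name);
    }

    void endTag(String name) {
        if (!open.contains(name)) {
            // No matching open parent: the end tag is an orphan.
            events.add("orphan end tag " + name);
            return;
        }
        // Close every still-open element nested inside the matching parent,
        // so its missing end tag lands just before the parent is closed.
        while (!open.peek().equals(name)) {
            events.add("auto-close " + open.pop());
        }
        // Then close the matching parent itself.
        events.add("close " + open.pop());
    }

    /** Closes whatever is still open once the input ends. */
    void end() {
        while (!open.isEmpty()) {
            events.add("auto-close " + open.pop());
        }
    }

    public static void main(String[] args) {
        UnclosedTagsSketch sketch = new UnclosedTagsSketch();
        // Input shaped like: <div><p><b>text</div>
        sketch.startTag("div");
        sketch.startTag("p");
        sketch.startTag("b");
        sketch.endTag("div");
        sketch.end();
        sketch.events.forEach(System.out::println);
    }
}
```

        In this sketch, the unclosed b and p elements are closed before div, so everything between their start tags and the closing div ends up as their body.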

      • script

        public void script​(Tag tag,
                           java.lang.CharSequence body)
        Description copied from interface: TagVisitor
        Invoked on script tag.
        Specified by:
        script in interface TagVisitor
      • comment

        public void comment​(java.lang.CharSequence comment)
        Description copied from interface: TagVisitor
        Invoked on comment.
        Specified by:
        comment in interface TagVisitor
      • text

        public void text​(java.lang.CharSequence text)
        Description copied from interface: TagVisitor
        Invoked on text i.e. anything other than a tag.
        Specified by:
        text in interface TagVisitor
      • cdata

        public void cdata​(java.lang.CharSequence cdata)
        Description copied from interface: TagVisitor
        Invoked on CDATA sequence.
        Specified by:
        cdata in interface TagVisitor
      • xml

        public void xml​(java.lang.CharSequence version,
                        java.lang.CharSequence encoding,
                        java.lang.CharSequence standalone)
        Description copied from interface: TagVisitor
        Invoked on xml declaration.
        Specified by:
        xml in interface TagVisitor
      • doctype

        public void doctype​(Doctype doctype)
        Description copied from interface: TagVisitor
        Invoked on DOCTYPE directive.
        Specified by:
        doctype in interface TagVisitor
      • condComment

        public void condComment​(java.lang.CharSequence expression,
                                boolean isStartingTag,
                                boolean isHidden,
                                boolean isHiddenEndTag)
        Description copied from interface: TagVisitor
        Invoked on IE conditional comment. By default, the parser does not process conditional comments, so you need to turn them on. Once conditional comments are enabled, this event will be fired.

        The following conditional comments are recognized:
        <!--[if IE 6]>one<![endif]-->
        <!--[if IE 6]><!-->two<!---<![endif]-->
        <!--[if IE 6]>three<!--xx<![endif]-->
        <![if IE 6]>four<![endif]>

        Specified by:
        condComment in interface TagVisitor
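
        To make the conditional comment forms listed for condComment more concrete, here is a small regex-based sketch that tells the comment-wrapped opening (<!--[if ...]>) from the bare opening (<![if ...]>) and extracts the expression. It is only an illustration: the class, the pattern and the "hidden"/"revealed" labels are assumptions and say nothing about how the Lagarto parser detects these constructs or fills the method's boolean flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CondCommentSketch {

    // Group 1 is "--" for the comment-wrapped form and null otherwise;
    // group 2 is the condition expression, for example "IE 6".
    private static final Pattern OPENING =
            Pattern.compile("<!(--)?\\[if\\s+([^\\]]+)\\]>");

    /**
     * Returns "hidden:EXPR" for <!--[if EXPR]>, "revealed:EXPR" for
     * <![if EXPR]>, or null when no conditional opening is found.
     */
    static String describeOpening(String html) {
        Matcher m = OPENING.matcher(html);
        if (!m.find()) {
            return null;
        }
        String kind = (m.group(1) != null) ? "hidden" : "revealed";
        return kind + ":" + m.group(2).trim();
    }

    public static void main(String[] args) {
        System.out.println(describeOpening("<!--[if IE 6]>one<![endif]-->"));
        System.out.println(describeOpening("<![if IE 6]>four<![endif]>"));
        System.out.println(describeOpening("<p>no conditional here</p>"));
    }
}
```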
      • errorEnabled

        protected boolean errorEnabled()
        Returns true if error logging or collecting is enabled.
      • error

        public void error​(java.lang.String message)
        Collects and logs error messages.
        Specified by:
        error in interface TagVisitor
        Parameters:
        message - parsing error message
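
        The interplay of errorEnabled and error might look roughly like the sketch below, where a message is stored and/or logged depending on two switches. The class, both switches and the use of java.util.logging are assumptions for illustration; the actual builder configuration and logging facility are not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

class ErrorCollectorSketch {

    private static final Logger LOG =
            Logger.getLogger(ErrorCollectorSketch.class.getName());

    private final boolean collectErrors; // hypothetical switch
    private final boolean logErrors;     // hypothetical switch

    /** Collected messages, in the order they were reported. */
    final List<String> errors = new ArrayList<>();

    ErrorCollectorSketch(boolean collectErrors, boolean logErrors) {
        this.collectErrors = collectErrors;
        this.logErrors = logErrors;
    }

    /** True if a reported error would be collected or logged. */
    boolean errorEnabled() {
        return collectErrors || logErrors;
    }

    /** Stores and/or logs the message, depending on the switches. */
    void error(String message) {
        if (collectErrors) {
            errors.add(message);
        }
        if (logErrors) {
            LOG.warning(message);
        }
    }

    public static void main(String[] args) {
        ErrorCollectorSketch sketch = new ErrorCollectorSketch(true, false);
        // Callers can skip building an expensive message when nothing listens.
        if (sketch.errorEnabled()) {
            sketch.error("Orphan closed tag ignored: </span>");
        }
        System.out.println(sketch.errors);
    }
}
```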