Package jodd.lagarto.dom
Class LagartoDOMBuilderTagVisitor
java.lang.Object
    jodd.lagarto.dom.LagartoDOMBuilderTagVisitor
All Implemented Interfaces:
TagVisitor
public class LagartoDOMBuilderTagVisitor extends java.lang.Object implements TagVisitor
Lagarto tag visitor that builds a DOM tree. It (still) does not build the tree fully according to the HTML specs; however, it works well enough for any sane HTML out there. In the default mode, the tree builder does not change the order of the elements, so the returned tree reflects the input. So if the input contains crazy stuff, the tree will be weird, too :)
-
-
Field Summary
Fields (Modifier and Type, Field - Description)
protected LagartoDOMBuilder domBuilder
protected boolean enabled - While enabled, nodes will be added to the DOM tree.
protected HtmlCCommentExpressionMatcher htmlCCommentExpressionMatcher
protected HtmlVoidRules htmlVoidRules
protected HtmlImplicitClosingRules implRules
protected Node parentNode
protected Document rootNode
-
Constructor Summary
Constructors (Constructor - Description)
LagartoDOMBuilderTagVisitor(LagartoDOMBuilder domBuilder)
-
Method Summary
Methods (Modifier and Type, Method - Description)
void cdata(java.lang.CharSequence cdata) - Invoked on CDATA sequence.
void comment(java.lang.CharSequence comment) - Invoked on comment.
void condComment(java.lang.CharSequence expression, boolean isStartingTag, boolean isHidden, boolean isHiddenEndTag) - Invoked on IE conditional comment.
protected Element createElementNode(Tag tag) - Creates new element with correct configuration.
void doctype(Doctype doctype) - Invoked on DOCTYPE directive.
void end() - Finishes the tree building.
void error(java.lang.String message) - Actually collects and logs the error messages.
protected boolean errorEnabled() - Returns true if error logging or collecting is enabled.
protected Node findMatchingParentOpenTag(java.lang.String tagName) - Finds matching parent open tag or null if not found.
protected void fixUnclosedTagsUpToMatchingParent(Tag tag, Node matchingParent) - Fixes all unclosed tags up to matching parent.
Document getDocument() - Returns root document node of parsed DOM tree.
protected void removeLastChildNodeIfEmptyText(Node parentNode, boolean closedTag) - Removes last child node if it contains just empty text.
void script(Tag tag, java.lang.CharSequence body) - Invoked on script tag.
void start() - Starts with DOM building.
void tag(Tag tag) - Visits tags.
void text(java.lang.CharSequence text) - Invoked on text, i.e. anything other than a tag.
void xml(java.lang.CharSequence version, java.lang.CharSequence encoding, java.lang.CharSequence standalone) - Invoked on xml declaration.
-
-
-
Field Detail
-
domBuilder
protected final LagartoDOMBuilder domBuilder
-
implRules
protected final HtmlImplicitClosingRules implRules
-
htmlVoidRules
protected HtmlVoidRules htmlVoidRules
-
rootNode
protected Document rootNode
-
parentNode
protected Node parentNode
-
enabled
protected boolean enabled
While enabled, nodes will be added to the DOM tree. Useful for skipping some tags.
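The effect of such a flag can be illustrated with a minimal, self-contained sketch. It does not use the Lagarto classes: `EnabledFlagSketch` and its `nodes` list are hypothetical stand-ins, and the real visitor adds nodes to a DOM tree rather than to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree-building visitor; not part of Jodd.
final class EnabledFlagSketch {
    // While true, visited content is recorded; while false, it is skipped.
    boolean enabled = true;

    // Stand-in for the DOM tree: the recorded text nodes, in input order.
    final List<String> nodes = new ArrayList<>();

    // Called for each piece of text; ignored while the flag is off.
    void text(String text) {
        if (!enabled) {
            return;
        }
        nodes.add(text);
    }
}
```

Switching the flag off before a region and back on after it leaves that region out of the result, which is the kind of skipping the field description refers to.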
-
htmlCCommentExpressionMatcher
protected HtmlCCommentExpressionMatcher htmlCCommentExpressionMatcher
-
-
Constructor Detail
-
LagartoDOMBuilderTagVisitor
public LagartoDOMBuilderTagVisitor(LagartoDOMBuilder domBuilder)
-
-
Method Detail
-
start
public void start()
Starts with DOM building. Creates root Document node.
Specified by:
start in interface TagVisitor
-
end
public void end()
Finishes the tree building. Closes unclosed tags.
Specified by:
end in interface TagVisitor
-
createElementNode
protected Element createElementNode(Tag tag)
Creates new element with correct configuration.
-
tag
public void tag(Tag tag)
Visits tags.
Specified by:
tag in interface TagVisitor
-
removeLastChildNodeIfEmptyText
protected void removeLastChildNodeIfEmptyText(Node parentNode, boolean closedTag)
Removes last child node if it contains just empty text.
-
findMatchingParentOpenTag
protected Node findMatchingParentOpenTag(java.lang.String tagName)
Finds matching parent open tag or null if not found.
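A lookup of this kind amounts to walking the parent chain. The sketch below is a self-contained illustration under assumptions: `ParentLookupSketch` and its nested `SketchNode` are hypothetical types, not the Jodd `Node` class, and the case-insensitive comparison is an assumption rather than documented behavior.

```java
// Hypothetical illustration of an ancestor lookup; not the Jodd implementation.
final class ParentLookupSketch {

    // Minimal stand-in for a DOM node: a name and a link to its parent.
    static final class SketchNode {
        final String name;
        final SketchNode parent;

        SketchNode(String name, SketchNode parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    // Starts at the current node and walks towards the root. Returns the first
    // node whose name matches tagName (ignoring case), or null if none matches.
    static SketchNode findMatchingParentOpenTag(SketchNode current, String tagName) {
        for (SketchNode n = current; n != null; n = n.parent) {
            if (n.name != null && n.name.equalsIgnoreCase(tagName)) {
                return n;
            }
        }
        return null;
    }
}
```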
-
fixUnclosedTagsUpToMatchingParent
protected void fixUnclosedTagsUpToMatchingParent(Tag tag, Node matchingParent)
Fixes all unclosed tags up to the matching parent. Missing end tags are added just before the parent tag is closed, so the whole inner content becomes the tag body. Tags that can be closed implicitly are checked and closed.
There is an optional check for detecting orphan tags inside tables or lists. If set, tags can be closed beyond the border of the table or the list, and this is reported as an orphan tag.
This is just a generic solution, closest to the rules.
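The general idea of closing everything that is still open above a matching parent can be sketched with a stack of open element names. This is a simplified illustration, not the Jodd algorithm: `UnclosedTagSketch` is a hypothetical class, and it ignores the implicit-closing rules and the orphan check described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of closing unclosed tags; not part of Jodd.
final class UnclosedTagSketch {
    // Currently open elements, innermost on top of the stack.
    final Deque<String> open = new ArrayDeque<>();

    // Names of elements that were closed without an explicit end tag.
    final List<String> implicitlyClosed = new ArrayList<>();

    void startTag(String name) {
        open.push(name);
    }

    // Handles an end tag. Returns false and changes nothing when no open
    // element matches; otherwise closes every element opened after the
    // matching one, innermost first, and then the matching element itself.
    boolean endTag(String name) {
        if (!open.contains(name)) {
            return false; // stray end tag
        }
        while (!open.peek().equals(name)) {
            implicitlyClosed.add(open.pop());
        }
        open.pop();
        return true;
    }
}
```

For input such as `<div><b><i>text</div>`, the end tag for `div` closes `i` and then `b` before `div` itself, so the inner content stays inside the elements that were opened around it.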
-
script
public void script(Tag tag, java.lang.CharSequence body)
Description copied from interface: TagVisitor
Invoked on script tag.
Specified by:
script in interface TagVisitor
-
comment
public void comment(java.lang.CharSequence comment)
Description copied from interface: TagVisitor
Invoked on comment.
Specified by:
comment in interface TagVisitor
-
text
public void text(java.lang.CharSequence text)
Description copied from interface: TagVisitor
Invoked on text, i.e. anything other than a tag.
Specified by:
text in interface TagVisitor
-
cdata
public void cdata(java.lang.CharSequence cdata)
Description copied from interface: TagVisitor
Invoked on CDATA sequence.
Specified by:
cdata in interface TagVisitor
-
xml
public void xml(java.lang.CharSequence version, java.lang.CharSequence encoding, java.lang.CharSequence standalone)
Description copied from interface: TagVisitor
Invoked on xml declaration.
Specified by:
xml in interface TagVisitor
-
doctype
public void doctype(Doctype doctype)
Description copied from interface: TagVisitor
Invoked on DOCTYPE directive.
Specified by:
doctype in interface TagVisitor
-
condComment
public void condComment(java.lang.CharSequence expression, boolean isStartingTag, boolean isHidden, boolean isHiddenEndTag)
Description copied from interface: TagVisitor
Invoked on IE conditional comment. By default, the parser does not process conditional comments, so you need to turn them on. Once conditional comments are enabled, this event will be fired.
The following conditional comments are recognized:
<!--[if IE 6]>one<![endif]-->
<!--[if IE 6]><!-->two<!---<![endif]-->
<!--[if IE 6]>three<!--xx<![endif]-->
<![if IE 6]>four<![endif]>
Specified by:
condComment in interface TagVisitor
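To make the listed forms more concrete, here is a small self-contained sketch that inspects only the opening marker of a conditional comment. It is not how the Lagarto parser works: `CondCommentSketch`, `expressionOf` and `isHiddenStart` are hypothetical names, and treating the `<!--[if ...]>` form as "hidden" and the `<![if ...]>` form as "revealed" follows the common downlevel-hidden / downlevel-revealed terminology, which is an assumption about what the `isHidden` parameter reports.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jodd.
final class CondCommentSketch {
    // Opening marker of a conditional comment:
    //   <!--[if EXPR]>   comment-style (downlevel-hidden) start
    //   <![if EXPR]>     bare (downlevel-revealed) start
    private static final Pattern START =
            Pattern.compile("^<!(--)?\\[(if [^\\]]+)\\]>");

    // Returns the text between the brackets (for example "if IE 6"),
    // or null when the input does not begin with a conditional marker.
    static String expressionOf(String input) {
        Matcher m = START.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    // True when the marker uses the comment-style "<!--[" opening.
    static boolean isHiddenStart(String input) {
        Matcher m = START.matcher(input);
        return m.find() && m.group(1) != null;
    }
}
```

The sketch deliberately ignores the end markers and the `<!-->` variants shown in the list; it only separates the two ways a conditional comment can begin.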
-
errorEnabled
protected boolean errorEnabled()
Returns true if error logging or collecting is enabled.
-
error
public void error(java.lang.String message)
Actually collects and logs the error messages.
Specified by:
error in interface TagVisitor
Parameters:
message - parsing error message
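The relationship between errorEnabled() and error(String) can be sketched as follows. This is a self-contained illustration under assumptions: `ErrorSinkSketch`, its `collect` and `log` flags, and the message prefix are hypothetical, and the real visitor reads its settings from the DOM builder configuration instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of collecting and logging parse errors; not part of Jodd.
final class ErrorSinkSketch {
    private final boolean collect; // keep messages for later inspection
    private final boolean log;     // print messages as they arrive

    // Messages gathered while collecting is enabled.
    final List<String> errors = new ArrayList<>();

    ErrorSinkSketch(boolean collect, boolean log) {
        this.collect = collect;
        this.log = log;
    }

    // True if either reporting channel is active, so callers can skip
    // building an error message that nobody would see.
    boolean errorEnabled() {
        return collect || log;
    }

    void error(String message) {
        if (collect) {
            errors.add(message);
        }
        if (log) {
            System.err.println("parsing error: " + message);
        }
    }
}
```

A caller would typically check errorEnabled() first and only then format and pass the message to error(String).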
-
-