Package com.helger.html.parser
Class XHTMLParser
java.lang.Object
    com.helger.html.parser.XHTMLParser
Constructor Summary
XHTMLParser(EHTMLVersion eHTMLVersion)
Method Summary
Modifier and Type | Method | Description
static com.helger.xml.serialize.read.SAXReaderSettings | createDefaultSAXReaderSettings() |
com.helger.xml.serialize.read.SAXReaderSettings | getAdditionalSAXReaderSettings() | Deprecated, for removal: This API element is subject to removal in a future version. Use getSAXReaderSettings() instead.
EHTMLVersion | getHTMLVersion() |
com.helger.xml.serialize.read.SAXReaderSettings | getSAXReaderSettings() |
boolean | isValidXHTMLFragment(String sXHTMLFragment) | Check if the given fragment is valid XHTML 1.1 mark-up.
static boolean | looksLikeXHTML(String sText) | Check whether the passed text looks like it contains XHTML code.
com.helger.xml.microdom.IMicroDocument | parseXHTMLDocument(String sXHTML) | This method parses a full HTML document into an IMicroDocument, using the additional SAX reader settings and always the HTMLEntityResolver as the entity resolver.
com.helger.xml.microdom.IMicroDocument | parseXHTMLFragment(String sXHTMLFragment) | Parse the given fragment as XHTML 1.1.
void | setAdditionalSAXReaderSettings(com.helger.xml.serialize.read.ISAXReaderSettings aAdditionalSaxReaderSettings) | Deprecated, for removal: This API element is subject to removal in a future version. Use setSAXReaderSettings(ISAXReaderSettings) instead.
XHTMLParser | setSAXReaderSettings(com.helger.xml.serialize.read.ISAXReaderSettings aAdditionalSaxReaderSettings) | Set additional SAX reader settings that are used when an XHTML fragment is read.
com.helger.xml.microdom.IMicroContainer | unescapeXHTMLFragment(String sXHTML) | Interpret the passed XHTML fragment as HTML and retrieve a result container with all body elements.
Constructor Detail
XHTMLParser
public XHTMLParser(@Nonnull EHTMLVersion eHTMLVersion)
Method Detail
createDefaultSAXReaderSettings
@Nonnull @ReturnsMutableCopy public static com.helger.xml.serialize.read.SAXReaderSettings createDefaultSAXReaderSettings()
getHTMLVersion
@Nonnull public EHTMLVersion getHTMLVersion()
Returns:
    The HTML version as specified in the constructor. Never null.
getAdditionalSAXReaderSettings
@Deprecated(forRemoval=true, since="9.1.1") @Nonnull @ReturnsMutableCopy public com.helger.xml.serialize.read.SAXReaderSettings getAdditionalSAXReaderSettings()
Deprecated, for removal: This API element is subject to removal in a future version. Use getSAXReaderSettings() instead.
Returns:
    A copy of the additional SAX reader settings that are used for parsing. By default, secure processing is active, which disallows inline DTDs in HTML documents.
getSAXReaderSettings
@Nonnull @ReturnsMutableCopy public com.helger.xml.serialize.read.SAXReaderSettings getSAXReaderSettings()
Returns:
    A copy of the additional SAX reader settings that are used for parsing. By default, secure processing is active, which disallows inline DTDs in HTML documents.
Since:
    9.1.1
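The secure default mentioned above can be pictured with the JDK's own SAX parser. This is a minimal sketch, not ph-html code: it assumes the JDK's built-in Xerces-based parser, which recognizes the `http://apache.org/xml/features/disallow-doctype-decl` feature, and the class `SecureSaxDemo` is hypothetical. It rejects every DOCTYPE, which is stricter than only forbidding inline DTDs; the exact settings the library applies are not listed in this reference.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: illustrates "secure processing" that refuses inline DTDs.
final class SecureSaxDemo {
  private SecureSaxDemo() {}

  /** Returns true if the text parses under a DOCTYPE-rejecting SAX configuration. */
  static boolean parsesSecurely(String xml) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      // Reject any <!DOCTYPE ...> declaration, and with it any inline DTD
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      SAXParser parser = factory.newSAXParser();
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
      return true;
    } catch (Exception e) {
      // A rejected DOCTYPE surfaces as a fatal SAXParseException
      return false;
    }
  }
}
```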
setAdditionalSAXReaderSettings
@Deprecated(forRemoval=true, since="9.1.1") public void setAdditionalSAXReaderSettings(@Nullable com.helger.xml.serialize.read.ISAXReaderSettings aAdditionalSaxReaderSettings)
Deprecated, for removal: This API element is subject to removal in a future version. Use setSAXReaderSettings(ISAXReaderSettings) instead.
Set additional SAX reader settings that are used when an XHTML fragment is read. All settings are reused when parsing, except for the entity resolver, which is always set to the default HTMLEntityResolver.
Parameters:
    aAdditionalSaxReaderSettings - The settings to be used. May be null.
setSAXReaderSettings
@Nonnull public XHTMLParser setSAXReaderSettings(@Nullable com.helger.xml.serialize.read.ISAXReaderSettings aAdditionalSaxReaderSettings)
Set additional SAX reader settings that are used when an XHTML fragment is read. All settings are reused when parsing, except for the entity resolver, which is always set to the default HTMLEntityResolver.
Parameters:
    aAdditionalSaxReaderSettings - The settings to be used. May be null.
Returns:
    this for chaining
Since:
    9.1.1
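The contract described above (settings are reused, the entity resolver is always replaced, and `this` is returned for chaining) can be sketched with plain JDK types. `ReaderSettings`, `FragmentParser` and `FIXED_RESOLVER` are hypothetical stand-ins for ISAXReaderSettings, XHTMLParser and HTMLEntityResolver. That the setter stores a copy, and that the resolver override happens when a parse starts rather than when the settings are stored, are assumptions of this sketch.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-in for ISAXReaderSettings (not the ph-html type).
final class ReaderSettings {
  EntityResolver entityResolver; // may be set by the caller
  boolean namespaceAware;        // an example of an ordinary, reused setting

  ReaderSettings copy() {
    ReaderSettings c = new ReaderSettings();
    c.entityResolver = entityResolver;
    c.namespaceAware = namespaceAware;
    return c;
  }
}

// Hypothetical stand-in for XHTMLParser, showing only the settings contract.
final class FragmentParser {
  // Stand-in for HTMLEntityResolver: answers every lookup with empty content
  static final EntityResolver FIXED_RESOLVER =
      (publicId, systemId) -> new InputSource(new StringReader(""));

  private ReaderSettings settings = new ReaderSettings();

  FragmentParser setReaderSettings(ReaderSettings s) {
    // Store a copy (null means "defaults") so later caller changes do not leak in
    this.settings = s == null ? new ReaderSettings() : s.copy();
    return this; // returning this enables call chaining
  }

  ReaderSettings getReaderSettings() {
    return settings.copy(); // mutable copy, as getSAXReaderSettings() documents
  }

  // What a parse would use: every stored setting, except that the entity
  // resolver is always forced to FIXED_RESOLVER.
  ReaderSettings effectiveSettings() {
    ReaderSettings c = settings.copy();
    c.entityResolver = FIXED_RESOLVER;
    return c;
  }
}
```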
looksLikeXHTML
public static boolean looksLikeXHTML(@Nullable String sText)
Check whether the passed text looks like it contains XHTML code. This is a heuristic check only and does not perform actual parsing!
Parameters:
    sText - The text to check.
Returns:
    true if the text looks like HTML
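To illustrate what a heuristic, non-parsing check can look like, here is a minimal sketch that merely searches for a tag-like token. The regular expression and the class `XhtmlHeuristic` are assumptions made for this example; the rules the library actually applies are not documented here.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic in the spirit of looksLikeXHTML: it scans for a
// tag-like token and never parses, so it is cheap but can be fooled.
final class XhtmlHeuristic {
  private XhtmlHeuristic() {}

  // Opening, closing or self-closing tag such as <p>, </div>, <br/> or <a href="x">
  private static final Pattern TAG_LIKE =
      Pattern.compile("</?[A-Za-z][A-Za-z0-9:-]*(\\s[^<>]*)?/?>");

  static boolean looksLikeXHTML(String text) {
    if (text == null || text.isEmpty()) {
      return false; // nothing to inspect
    }
    return TAG_LIKE.matcher(text).find();
  }
}
```

Because nothing is parsed, input such as `x<y>z` is reported as markup; trading accuracy for speed in this way is why such a check can only be a heuristic.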
isValidXHTMLFragment
public boolean isValidXHTMLFragment(@Nullable String sXHTMLFragment)
Check if the given fragment is valid XHTML 1.1 mark-up. This method tries to parse the XHTML fragment, so it is potentially slow!
Parameters:
    sXHTMLFragment - The XHTML fragment to parse. Whether the value looks like HTML is not checked beforehand.
Returns:
    true if the fragment is valid, false otherwise.
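The cost noted above comes from running a real parse. A rough JDK-only approximation is sketched below; it is not the library's implementation. It wraps the fragment in a hypothetical `<root>` element and checks only XML well-formedness, without validating against the XHTML 1.1 DTD. Treating `null` as invalid is a choice of this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: "valid" here only means well-formed XML once the
// fragment is wrapped in a single root element; no XHTML 1.1 DTD is applied.
final class FragmentCheck {
  private FragmentCheck() {}

  static boolean isWellFormedFragment(String fragment) {
    if (fragment == null) {
      return false; // sketch's choice: null is never valid
    }
    try {
      // A fragment may have several top-level nodes, so give it one root
      String wrapped = "<root>" + fragment + "</root>";
      SAXParserFactory.newInstance()
          .newSAXParser()
          .parse(new InputSource(new StringReader(wrapped)), new DefaultHandler());
      return true; // a complete parse ran, which is what makes this check costly
    } catch (Exception e) {
      return false;
    }
  }
}
```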
parseXHTMLFragment
@Nullable public com.helger.xml.microdom.IMicroDocument parseXHTMLFragment(@Nullable String sXHTMLFragment)
Parse the given fragment as XHTML 1.1. This is a convenience method that parses the fragment with the predefined XHTML 1.1 document type.
Parameters:
    sXHTMLFragment - The XHTML fragment to parse. May be null.
Returns:
    The parsed document, or null if parsing failed.
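One plausible way to build such a convenience method is to embed the fragment in a fixed document skeleton before parsing. The sketch below assumes such a skeleton (XHTML 1.1 DOCTYPE, `html`, `head`, `title`, `body`); it is not taken from the library, and the class `Xhtml11FragmentParser` is hypothetical. It also relies on the JDK Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` to avoid downloading the DTD, and it returns a W3C DOM `Document` where the real method returns an IMicroDocument.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: embed the fragment in a fixed XHTML 1.1 skeleton and
// parse it, returning null instead of throwing when parsing fails.
final class Xhtml11FragmentParser {
  private Xhtml11FragmentParser() {}

  private static final String PREFIX =
      "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\""
          + " \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
          + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title></title></head><body>";
  private static final String SUFFIX = "</body></html>";

  static Document parseFragment(String fragment) {
    if (fragment == null) {
      return null;
    }
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      // Do not download the XHTML 1.1 DTD; the DOCTYPE is only declarative here
      dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
      DocumentBuilder db = dbf.newDocumentBuilder();
      db.setErrorHandler(new DefaultHandler()); // no stderr noise; fatal errors still throw
      return db.parse(new InputSource(new StringReader(PREFIX + fragment + SUFFIX)));
    } catch (Exception e) {
      return null; // mirrors the documented "null if parsing failed"
    }
  }
}
```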
parseXHTMLDocument
@Nullable public com.helger.xml.microdom.IMicroDocument parseXHTMLDocument(@Nullable String sXHTML)
This method parses a full HTML document into an IMicroDocument, using the additional SAX reader settings and always the HTMLEntityResolver as the entity resolver.
Parameters:
    sXHTML - The complete XHTML document as a string. May be null.
Returns:
    null if interpretation failed
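Routing all entity resolution through one fixed resolver lets DTDs, and the entities they declare, be served locally instead of from the network. The sketch below shows that idea with a JDK `DocumentBuilder`. The in-memory DTD, its system id `http://example.com/mini-xhtml.dtd` and the class `FullDocumentParser` are made up for illustration and say nothing about what HTMLEntityResolver actually contains.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: parse a complete document while always using one fixed
// entity resolver that serves DTDs from memory instead of the network.
final class FullDocumentParser {
  private FullDocumentParser() {}

  // Made-up DTD that declares a single HTML-style entity
  private static final String MINI_DTD = "<!ENTITY copy \"&#169;\">";

  // Fixed resolver: the known DTD from memory, anything else as empty content
  private static final EntityResolver RESOLVER = (publicId, systemId) -> {
    boolean known = systemId != null && systemId.endsWith("/mini-xhtml.dtd");
    InputSource src = new InputSource(new StringReader(known ? MINI_DTD : ""));
    src.setSystemId(systemId);
    return src;
  };

  static Document parseDocument(String xhtml) {
    if (xhtml == null) {
      return null;
    }
    try {
      DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      db.setEntityResolver(RESOLVER); // set unconditionally, never taken from settings
      db.setErrorHandler(new DefaultHandler()); // no stderr noise; fatal errors still throw
      return db.parse(new InputSource(new StringReader(xhtml)));
    } catch (Exception e) {
      return null; // "null if interpretation failed"
    }
  }
}
```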
unescapeXHTMLFragment
@Nullable public com.helger.xml.microdom.IMicroContainer unescapeXHTMLFragment(@Nullable String sXHTML)
Interpret the passed XHTML fragment as HTML and retrieve a result container with all body elements.
Parameters:
    sXHTML - The XHTML text fragment. This fragment is parsed as an HTML body and must therefore not contain the <body> tag.
Returns:
    null if the passed text could not be interpreted as XHTML or if no body element was found; otherwise an IMicroContainer with all body children.
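The body-wrapping behaviour described above can be sketched as follows, assuming a plain `List<Node>` as a stand-in for IMicroContainer and a hypothetical class `BodyExtractor`. The wrapper markup is an assumption, and the library's own SAX reader settings and HTML-specific handling are not reproduced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: treat the text as the content of <body> and hand back
// the body's children, using a plain List as a stand-in for IMicroContainer.
final class BodyExtractor {
  private BodyExtractor() {}

  static List<Node> bodyChildren(String fragment) {
    if (fragment == null) {
      return null;
    }
    try {
      DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      db.setErrorHandler(new DefaultHandler()); // no stderr noise; fatal errors still throw
      // The caller's text must not contain <body> itself; it is placed inside one
      String wrapped = "<html><body>" + fragment + "</body></html>";
      Document doc = db.parse(new InputSource(new StringReader(wrapped)));
      NodeList bodies = doc.getElementsByTagName("body");
      if (bodies.getLength() == 0) {
        return null; // cannot happen with this wrapper; mirrors the documented contract
      }
      NodeList kids = bodies.item(0).getChildNodes();
      List<Node> result = new ArrayList<>();
      for (int i = 0; i < kids.getLength(); i++) {
        result.add(kids.item(i));
      }
      return result;
    } catch (Exception e) {
      return null; // the text could not be interpreted as XML
    }
  }
}
```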