Package org.ccil.cowan.tagsoup
Class Parser
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.ccil.cowan.tagsoup.Parser
All Implemented Interfaces:
ScanHandler, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLReader
public class Parser extends org.xml.sax.helpers.DefaultHandler implements ScanHandler, org.xml.sax.XMLReader, org.xml.sax.ext.LexicalHandler
The SAX parser class.
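Because Parser implements org.xml.sax.XMLReader, it is used like any other SAX2 reader: register a ContentHandler, then call parse. The sketch below shows that pattern under one stated assumption. So that it compiles and runs without TagSoup on the classpath, it takes its reader from the JDK's SAXParserFactory, which accepts only well-formed XML. With TagSoup available, the reader line would instead be `XMLReader reader = new org.ccil.cowan.tagsoup.Parser();`, which is what makes messy real-world HTML parseable. `ElementNameCollector` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementNameCollector {
    // Parses the markup with a SAX2 XMLReader and returns element names in document order.
    static List<String> elementNames(String markup) throws Exception {
        // Assumption: the JDK reader stands in for TagSoup so this runs stand-alone.
        // With TagSoup on the classpath: XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(localName);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }
}
```

For example, `elementNames("<html><body><p>hi</p></body></html>")` returns html, body and p, in that order.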
Field Summary
All fields are static java.lang.String constants.
- autoDetectorProperty: Specifies the AutoDetector (for encoding detection) this Parser uses.
- bogonsEmptyFeature: A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
- CDATAElementsFeature: A value of "true" indicates that the parser will treat CDATA elements specially.
- defaultAttributesFeature: A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
- externalGeneralEntitiesFeature: Reports whether this parser processes external general entities (it doesn't).
- externalParameterEntitiesFeature: Reports whether this parser processes external parameter entities (it doesn't).
- ignorableWhitespaceFeature: A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
- ignoreBogonsFeature: A value of "true" indicates that the parser will ignore unknown elements.
- isStandaloneFeature: May be examined only during a parse, after the startDocument() callback has been completed; read-only.
- lexicalHandlerParameterEntitiesFeature: A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
- lexicalHandlerProperty: Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
- namespacePrefixesFeature: A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
- namespacesFeature: A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
- resolveDTDURIsFeature: A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
- restartElementsFeature: A value of "true" indicates that the parser will attempt to restart the restartable elements.
- rootBogonsFeature: A value of "true" indicates that the parser will allow unknown elements to be the root element.
- scannerProperty: Specifies the Scanner object this Parser uses.
- schemaProperty: Specifies the Schema object this Parser uses.
- stringInterningFeature: Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
- translateColonsFeature: A value of "true" indicates that the parser will translate colons into underscores in names.
- unicodeNormalizationCheckingFeature: Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
- useAttributes2Feature: Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
- useEntityResolver2Feature: Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
- useLocator2Feature: Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
- validationFeature: Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
- XML11Feature: Returns "true" if the parser supports both XML 1.1 and XML 1.0.
- xmlnsURIsFeature: Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
Constructor Summary
- Parser()
Method Summary
- void adup(char[] buff, int offset, int length): Reports an attribute name without a value.
- void aname(char[] buff, int offset, int length): Reports an attribute name; a value will follow.
- void aval(char[] buff, int offset, int length): Reports an attribute value.
- void cdsect(char[] buff, int offset, int length): Reports the content of a CDATA section (not a CDATA element).
- void cmnt(char[] buff, int offset, int length): Reports a comment.
- void comment(char[] ch, int start, int length)
- void decl(char[] buff, int offset, int length): Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
- void endCDATA()
- void endDTD()
- void endEntity(java.lang.String name)
- void entity(char[] buff, int offset, int length): Reports an entity reference or character reference.
- void eof(char[] buff, int offset, int length): Reports EOF.
- void etag(char[] buff, int offset, int length): Reports an end-tag.
- void etag_basic(char[] buff, int offset, int length)
- boolean etag_cdata(char[] buff, int offset, int length)
- org.xml.sax.ContentHandler getContentHandler()
- org.xml.sax.DTDHandler getDTDHandler()
- int getEntity(): Returns the value of the last entity or character reference reported.
- org.xml.sax.EntityResolver getEntityResolver()
- org.xml.sax.ErrorHandler getErrorHandler()
- boolean getFeature(java.lang.String name)
- java.lang.Object getProperty(java.lang.String name)
- void gi(char[] buff, int offset, int length): Reports the general identifier (element type name) of a start-tag.
- void parse(java.lang.String systemid)
- void parse(org.xml.sax.InputSource input)
- void pcdata(char[] buff, int offset, int length): Reports character content.
- void pi(char[] buff, int offset, int length): Reports the data part of a processing instruction.
- void pitarget(char[] buff, int offset, int length): Reports the target part of a processing instruction.
- void setContentHandler(org.xml.sax.ContentHandler handler)
- void setDTDHandler(org.xml.sax.DTDHandler handler)
- void setEntityResolver(org.xml.sax.EntityResolver resolver)
- void setErrorHandler(org.xml.sax.ErrorHandler handler)
- void setFeature(java.lang.String name, boolean value)
- void setProperty(java.lang.String name, java.lang.Object value)
- void stagc(char[] buff, int offset, int length): Reports the close of a start-tag.
- void stage(char[] buff, int offset, int length): Reports the close of an empty-tag.
- void startCDATA()
- void startDTD(java.lang.String name, java.lang.String publicid, java.lang.String systemid)
- void startEntity(java.lang.String name)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
Field Detail
-
namespacesFeature
public static final java.lang.String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
See Also: Constant Field Values
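The namespaces feature is one of the two feature URIs that every SAX2 reader must recognize. Its standard URI is `http://xml.org/sax/features/namespaces`, which is presumably the value of namespacesFeature (this page does not show the constant's value). When it is on, startElement receives the namespace URI and the unprefixed local name. A minimal sketch, again using the JDK's reader as a stand-in for Parser; `NamespacesFeatureDemo` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespacesFeatureDemo {
    // Standard SAX2 URI of the namespaces feature.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Returns "{uri}localName" for the root element as reported to startElement.
    static String rootName(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true); // request namespace URIs and local names
        StringBuilder out = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (out.length() == 0) { // record only the first (root) element
                    out.append('{').append(uri).append('}').append(localName);
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```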
-
namespacePrefixesFeature
public static final java.lang.String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.
See Also: Constant Field Values
-
externalGeneralEntitiesFeature
public static final java.lang.String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
See Also: Constant Field Values
-
externalParameterEntitiesFeature
public static final java.lang.String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
See Also: Constant Field Values
-
isStandaloneFeature
public static final java.lang.String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)
See Also: Constant Field Values
-
lexicalHandlerParameterEntitiesFeature
public static final java.lang.String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
See Also: Constant Field Values
-
resolveDTDURIsFeature
public static final java.lang.String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)
See Also: Constant Field Values
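Absolutizing a system ID means resolving it against the base URI of the document that contains the declaration. Since the description says Parser reports this feature as true without doing the work, a client that needs absolute system IDs would have to resolve them itself, for instance with java.net.URI. A sketch with a hypothetical helper name:

```java
import java.net.URI;

class SystemIdResolution {
    // Resolves a possibly relative system ID against the document's base URI.
    // An already absolute system ID is returned unchanged.
    static String absolutize(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }
}
```

With a base of `http://example.com/docs/page.html`, the relative ID `strict.dtd` resolves to `http://example.com/docs/strict.dtd`.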
-
stringInterningFeature
public static final java.lang.String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
See Also: Constant Field Values
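Interning is what makes identity comparison safe: Java string literals are always interned, so an interned name can be compared with == against a literal instead of calling equals(). The helper below is a hypothetical illustration of that check, not code from TagSoup, and it is only correct when the argument is known to be interned, as the description says Parser guarantees.

```java
class InternedNameCheck {
    // Identity comparison against a literal; valid only if 'name' is interned.
    // A non-interned string with the same characters compares as different.
    static boolean isParagraph(String name) {
        return name == "p";
    }
}
```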
-
useAttributes2Feature
public static final java.lang.String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)
See Also: Constant Field Values
-
useLocator2Feature
public static final java.lang.String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)
See Also: Constant Field Values
-
useEntityResolver2Feature
public static final java.lang.String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)
See Also: Constant Field Values
-
validationFeature
public static final java.lang.String validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
See Also: Constant Field Values
-
unicodeNormalizationCheckingFeature
public static final java.lang.String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)
See Also: Constant Field Values
-
xmlnsURIsFeature
public static final java.lang.String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)
See Also: Constant Field Values
-
XML11Feature
public static final java.lang.String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)
See Also: Constant Field Values
-
ignoreBogonsFeature
public static final java.lang.String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
See Also: Constant Field Values
-
bogonsEmptyFeature
public static final java.lang.String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
See Also: Constant Field Values
-
rootBogonsFeature
public static final java.lang.String rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
See Also: Constant Field Values
-
defaultAttributesFeature
public static final java.lang.String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
See Also: Constant Field Values
-
translateColonsFeature
public static final java.lang.String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
See Also: Constant Field Values
-
restartElementsFeature
public static final java.lang.String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
See Also: Constant Field Values
-
ignorableWhitespaceFeature
public static final java.lang.String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
See Also: Constant Field Values
-
CDATAElementsFeature
public static final java.lang.String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.
See Also: Constant Field Values
-
lexicalHandlerProperty
public static final java.lang.String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
See Also: Constant Field Values
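The standard SAX2 URI for this property is `http://xml.org/sax/properties/lexical-handler`. This page does not list the constant's value, so treating lexicalHandlerProperty as equal to it is an assumption. The sketch registers a comment handler through setProperty, using the JDK's reader so that it runs without TagSoup, and org.xml.sax.ext.DefaultHandler2, which implements both ContentHandler and LexicalHandler. `CommentCollector` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector {
    // Standard SAX2 property URI for registering a LexicalHandler.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // Returns the text of every comment in document order.
    static List<String> comments(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        List<String> found = new ArrayList<>();
        // DefaultHandler2 implements LexicalHandler, so it is a valid property value.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length)); // copy the buffer slice
            }
        };
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }
}
```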
-
scannerProperty
public static final java.lang.String scannerProperty
Specifies the Scanner object this Parser uses.
See Also: Constant Field Values
-
schemaProperty
public static final java.lang.String schemaProperty
Specifies the Schema object this Parser uses.
See Also: Constant Field Values
-
autoDetectorProperty
public static final java.lang.String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
See Also: Constant Field Values
Method Detail
-
getFeature
public boolean getFeature(java.lang.String name) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: getFeature in interface org.xml.sax.XMLReader
Throws: org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
-
setFeature
public void setFeature(java.lang.String name, boolean value) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: setFeature in interface org.xml.sax.XMLReader
Throws: org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
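Features are read and written by URI, and the two checked exceptions separate an unknown URI (SAXNotRecognizedException) from a known feature that cannot take the requested value (SAXNotSupportedException). A defensive helper might look like the following; `FeatureToggle` is hypothetical and the JDK reader again stands in for Parser. With TagSoup, the same helper could be called with constants such as Parser.ignoreBogonsFeature. Which exception, if any, Parser throws when asked to change one of the fixed features listed under Field Detail is not stated on this page.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureToggle {
    // Tries to set a feature and reports whether the reader accepted it.
    static boolean trySetFeature(XMLReader reader, String uri, boolean value) {
        try {
            reader.setFeature(uri, value);
            return true;
        } catch (SAXNotRecognizedException e) {
            return false; // the reader does not know this feature URI
        } catch (SAXNotSupportedException e) {
            return false; // known feature, but this value cannot be set (now)
        }
    }

    // Stand-in reader; with TagSoup this could be: return new org.ccil.cowan.tagsoup.Parser();
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }
}
```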
-
getProperty
public java.lang.Object getProperty(java.lang.String name) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: getProperty in interface org.xml.sax.XMLReader
Throws: org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
-
setProperty
public void setProperty(java.lang.String name, java.lang.Object value) throws org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
Specified by: setProperty in interface org.xml.sax.XMLReader
Throws: org.xml.sax.SAXNotRecognizedException, org.xml.sax.SAXNotSupportedException
-
setEntityResolver
public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Specified by: setEntityResolver in interface org.xml.sax.XMLReader
-
getEntityResolver
public org.xml.sax.EntityResolver getEntityResolver()
Specified by: getEntityResolver in interface org.xml.sax.XMLReader
-
setDTDHandler
public void setDTDHandler(org.xml.sax.DTDHandler handler)
Specified by: setDTDHandler in interface org.xml.sax.XMLReader
-
getDTDHandler
public org.xml.sax.DTDHandler getDTDHandler()
Specified by: getDTDHandler in interface org.xml.sax.XMLReader
-
setContentHandler
public void setContentHandler(org.xml.sax.ContentHandler handler)
Specified by: setContentHandler in interface org.xml.sax.XMLReader
-
getContentHandler
public org.xml.sax.ContentHandler getContentHandler()
Specified by: getContentHandler in interface org.xml.sax.XMLReader
-
setErrorHandler
public void setErrorHandler(org.xml.sax.ErrorHandler handler)
Specified by: setErrorHandler in interface org.xml.sax.XMLReader
-
getErrorHandler
public org.xml.sax.ErrorHandler getErrorHandler()
Specified by: getErrorHandler in interface org.xml.sax.XMLReader
-
parse
public void parse(org.xml.sax.InputSource input) throws java.io.IOException, org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
Throws: java.io.IOException, org.xml.sax.SAXException
-
parse
public void parse(java.lang.String systemid) throws java.io.IOException, org.xml.sax.SAXException
Specified by: parse in interface org.xml.sax.XMLReader
Throws: java.io.IOException, org.xml.sax.SAXException
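Under the SAX2 XMLReader contract, parse(String systemId) is a shortcut for parse(new InputSource(systemId)), so the argument is a URI, not raw markup. The sketch below writes a temporary file and parses it through its file: URI. The JDK reader stands in for Parser, and `SystemIdParse` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SystemIdParse {
    // Writes the markup to a temporary file, parses it by system ID, counts elements.
    static int countElements(String markup) throws Exception {
        Path file = Files.createTempFile("sample", ".xml");
        try {
            Files.write(file, markup.getBytes(StandardCharsets.UTF_8));
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            int[] count = {0};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            // Equivalent to reader.parse(new InputSource(systemId)).
            reader.parse(file.toUri().toString());
            return count[0];
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```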
-
adup
public void adup(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name without a value.
Specified by: adup in interface ScanHandler
Throws: org.xml.sax.SAXException
-
aname
public void aname(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.
Specified by: aname in interface ScanHandler
Throws: org.xml.sax.SAXException
-
aval
public void aval(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute value.
Specified by: aval in interface ScanHandler
Throws: org.xml.sax.SAXException
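The ScanHandler callbacks do not receive strings. Each one gets a shared char[] buffer plus an offset and a length, so a handler should copy the slice (for example with new String(buff, offset, length)) before returning. The toy below illustrates that convention and a plausible call order for one start-tag, following the ScanHandler descriptions on this page: aname when a value will follow, adup for a name without a value. It is a hypothetical, heavily simplified stand-in. MiniScanHandler, ToyTagScanner and EventRecorder are not TagSoup types, and TagSoup's real Scanner may sequence the calls differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduced stand-in for ScanHandler: same (buff, offset, length) shape.
interface MiniScanHandler {
    void gi(char[] buff, int offset, int length);
    void aname(char[] buff, int offset, int length);
    void aval(char[] buff, int offset, int length);
    void adup(char[] buff, int offset, int length);
    void stagc(char[] buff, int offset, int length);
}

class ToyTagScanner {
    // Scans one start-tag such as <input type="checkbox" checked>, reporting each
    // piece as a slice of a single shared buffer. Assumes a leading '<', a closing
    // '>', quoted values, and no whitespace around '='.
    static void scan(String tag, MiniScanHandler h) {
        char[] buf = tag.toCharArray();
        int i = 1; // skip '<'
        int start = i;
        while (i < buf.length && !Character.isWhitespace(buf[i]) && buf[i] != '>') i++;
        h.gi(buf, start, i - start);
        while (true) {
            while (i < buf.length && Character.isWhitespace(buf[i])) i++;
            if (i >= buf.length || buf[i] == '>') break;
            start = i;
            while (i < buf.length && !Character.isWhitespace(buf[i])
                    && buf[i] != '=' && buf[i] != '>') i++;
            int nameLength = i - start;
            if (i < buf.length && buf[i] == '=') {
                h.aname(buf, start, nameLength); // a value will follow
                char quote = buf[++i];           // opening quote
                int valueStart = ++i;
                while (buf[i] != quote) i++;
                h.aval(buf, valueStart, i - valueStart);
                i++;                             // skip closing quote
            } else {
                h.adup(buf, start, nameLength);  // name without a value
            }
        }
        h.stagc(buf, i, 0);
    }
}

class EventRecorder implements MiniScanHandler {
    final List<String> events = new ArrayList<>();

    // Copy each slice at once: a real scanner may reuse the buffer afterwards.
    private void record(String kind, char[] b, int off, int len) {
        events.add(kind + ":" + new String(b, off, len));
    }

    public void gi(char[] b, int off, int len)    { record("gi", b, off, len); }
    public void aname(char[] b, int off, int len) { record("aname", b, off, len); }
    public void aval(char[] b, int off, int len)  { record("aval", b, off, len); }
    public void adup(char[] b, int off, int len)  { record("adup", b, off, len); }
    public void stagc(char[] b, int off, int len) { events.add("stagc"); }
}
```

Scanning `<input type="checkbox" checked>` records gi:input, aname:type, aval:checkbox, adup:checked, stagc.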
-
entity
public void entity(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an entity reference or character reference.
Specified by: entity in interface ScanHandler
Throws: org.xml.sax.SAXException
-
eof
public void eof(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports EOF.
Specified by: eof in interface ScanHandler
Throws: org.xml.sax.SAXException
-
etag
public void etag(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an end-tag.
Specified by: etag in interface ScanHandler
Throws: org.xml.sax.SAXException
-
etag_cdata
public boolean etag_cdata(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Throws: org.xml.sax.SAXException
-
etag_basic
public void etag_basic(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Throws: org.xml.sax.SAXException
-
decl
public void decl(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
Specified by: decl in interface ScanHandler
Throws: org.xml.sax.SAXException
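For the simple cases the description has in mind, the useful parts of a document type declaration are the root element name and the public and system identifiers from the ExternalID production. These correspond to the name, publicid and systemid parameters of startDTD. The regex-based sketch below extracts them from a complete declaration string. It is a hypothetical helper, not TagSoup's implementation, and it ignores any internal subset. It also accepts a PUBLIC identifier without a following system literal, because HTML doctypes often omit it even though XML 1.0 requires it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoctypeSketch {
    // Name, then an optional ExternalID: SYSTEM "sys" | PUBLIC "pub" ["sys"].
    // Group 1 = name, 2 = SYSTEM literal, 3 = PUBLIC literal, 4 = system literal after PUBLIC.
    private static final Pattern DOCTYPE = Pattern.compile(
        "<!DOCTYPE\\s+([^\\s\\[>]+)"
        + "(?:\\s+(?:SYSTEM\\s+(\"[^\"]*\"|'[^']*')"
        + "|PUBLIC\\s+(\"[^\"]*\"|'[^']*')(?:\\s+(\"[^\"]*\"|'[^']*'))?))?",
        Pattern.CASE_INSENSITIVE);

    // Returns {name, publicId, systemId}; absent parts are null. Returns null if no match.
    static String[] parse(String decl) {
        Matcher m = DOCTYPE.matcher(decl);
        if (!m.lookingAt()) return null;
        String system = m.group(2) != null ? m.group(2) : m.group(4);
        return new String[] { m.group(1), unquote(m.group(3)), unquote(system) };
    }

    // Strips the surrounding quote characters from a literal.
    private static String unquote(String literal) {
        return literal == null ? null : literal.substring(1, literal.length() - 1);
    }
}
```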
-
gi
public void gi(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.
Specified by: gi in interface ScanHandler
Throws: org.xml.sax.SAXException
-
cdsect
public void cdsect(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).
Specified by: cdsect in interface ScanHandler
Throws: org.xml.sax.SAXException
-
pcdata
public void pcdata(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports character content.
Specified by: pcdata in interface ScanHandler
Throws: org.xml.sax.SAXException
-
pitarget
public void pitarget(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.
Specified by: pitarget in interface ScanHandler
Throws: org.xml.sax.SAXException
-
pi
public void pi(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.
Specified by: pi in interface ScanHandler
Throws: org.xml.sax.SAXException
-
stagc
public void stagc(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of a start-tag.
Specified by: stagc in interface ScanHandler
Throws: org.xml.sax.SAXException
-
stage
public void stage(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of an empty-tag.
Specified by: stage in interface ScanHandler
Throws: org.xml.sax.SAXException
-
cmnt
public void cmnt(char[] buff, int offset, int length) throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports a comment.
Specified by: cmnt in interface ScanHandler
Throws: org.xml.sax.SAXException
-
getEntity
public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.
Specified by: getEntity in interface ScanHandler
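The entity callback only names the reference, and getEntity later returns a single int. A natural reading is that this int is the Unicode code point of the referenced character; the sketch below makes that assumption explicit. It handles decimal and hexadecimal character references plus a tiny, hypothetical table of named entities, and returns 0 for anything it cannot resolve. That 0 convention belongs to this sketch and is not documented TagSoup behavior; `EntityValueSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class EntityValueSketch {
    // Tiny, hypothetical subset of named entities; a real HTML schema defines many more.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", 38);
        NAMED.put("lt", 60);
        NAMED.put("gt", 62);
        NAMED.put("quot", 34);
        NAMED.put("nbsp", 160);
    }

    // Maps the reference text between '&' and ';' (given as a buffer slice)
    // to a code point. Returns 0 when the reference is not recognized.
    static int lookup(char[] buff, int offset, int length) {
        if (length < 1) return 0;
        String ref = new String(buff, offset, length);
        try {
            if (ref.startsWith("#x") || ref.startsWith("#X")) {
                return Integer.parseInt(ref.substring(2), 16); // hexadecimal reference
            }
            if (ref.startsWith("#")) {
                return Integer.parseInt(ref.substring(1));     // decimal reference
            }
        } catch (NumberFormatException e) {
            return 0;
        }
        return NAMED.getOrDefault(ref, 0);
    }
}
```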
-
comment
public void comment(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by: comment in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
endCDATA
public void endCDATA() throws org.xml.sax.SAXException
Specified by: endCDATA in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
endDTD
public void endDTD() throws org.xml.sax.SAXException
Specified by: endDTD in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
endEntity
public void endEntity(java.lang.String name) throws org.xml.sax.SAXException
Specified by: endEntity in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
startCDATA
public void startCDATA() throws org.xml.sax.SAXException
Specified by: startCDATA in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
startDTD
public void startDTD(java.lang.String name, java.lang.String publicid, java.lang.String systemid) throws org.xml.sax.SAXException
Specified by: startDTD in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
-
startEntity
public void startEntity(java.lang.String name) throws org.xml.sax.SAXException
Specified by: startEntity in interface org.xml.sax.ext.LexicalHandler
Throws: org.xml.sax.SAXException
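Under the SAX2 LexicalHandler contract, startCDATA and endCDATA only mark the boundaries of a CDATA section, and its text still arrives through ContentHandler.characters. A client that wants to know which text came from a CDATA section therefore tracks state between the two calls, as in this sketch. It registers the handler through the standard lexical-handler property URI and uses the JDK's reader as a stand-in for Parser; `CdataCollector` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CdataCollector extends DefaultHandler2 {
    private boolean inCdata;
    final StringBuilder cdataText = new StringBuilder();
    int sections;

    @Override
    public void startCDATA() { inCdata = true; sections++; }

    @Override
    public void endCDATA() { inCdata = false; }

    @Override
    public void characters(char[] ch, int start, int length) {
        // CDATA content arrives here; startCDATA/endCDATA only mark its boundaries.
        if (inCdata) cdataText.append(ch, start, length);
    }

    static CdataCollector parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CdataCollector handler = new CdataCollector();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```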