Package org.ccil.cowan.tagsoup
Class Parser
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.ccil.cowan.tagsoup.Parser
- All Implemented Interfaces:
ScanHandler, ContentHandler, DTDHandler, EntityResolver, ErrorHandler, LexicalHandler, XMLReader
The SAX parser class.
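Because Parser implements XMLReader, it can be driven like any other SAX reader. The sketch below is written against the XMLReader interface only, so it compiles without TagSoup on the classpath; the class ElementLister and its method are illustrative names, not part of TagSoup. In a TagSoup project the reader argument would presumably be a new org.ccil.cowan.tagsoup.Parser().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: lists element names in document order. */
class ElementLister {

    static List<String> listElements(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // SAX allows qName to be empty in some feature combinations,
                // so fall back to localName when that happens.
                names.add(qName.isEmpty() ? localName : qName);
            }
        });
        // Wrap the markup in a character stream; no encoding detection is needed.
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }
}
```

Passing the reader in as a parameter keeps the handler code independent of which SAX implementation produces the events.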
-
Field Summary
Fields
Modifier and Type | Field | Description
static final String | autoDetectorProperty | Specifies the AutoDetector (for encoding detection) this Parser uses.
static final String | bogonsEmptyFeature | A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
static final String | CDATAElementsFeature | A value of "true" indicates that the parser will treat CDATA elements specially.
static final String | defaultAttributesFeature | A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
static final String | externalGeneralEntitiesFeature | Reports whether this parser processes external general entities (it doesn't).
static final String | externalParameterEntitiesFeature | Reports whether this parser processes external parameter entities (it doesn't).
static final String | ignorableWhitespaceFeature | A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
static final String | ignoreBogonsFeature | A value of "true" indicates that the parser will ignore unknown elements.
static final String | isStandaloneFeature | May be examined only during a parse, after the startDocument() callback has been completed; read-only.
static final String | lexicalHandlerParameterEntitiesFeature | A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
static final String | lexicalHandlerProperty | Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
static final String | namespacePrefixesFeature | A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
static final String | namespacesFeature | A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
static final String | resolveDTDURIsFeature | A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
static final String | restartElementsFeature | A value of "true" indicates that the parser will attempt to restart the restartable elements.
static final String | rootBogonsFeature | A value of "true" indicates that the parser will allow unknown elements to be the root element.
static final String | scannerProperty | Specifies the Scanner object this Parser uses.
static final String | schemaProperty | Specifies the Schema object this Parser uses.
static final String | stringInterningFeature | Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
static final String | translateColonsFeature | A value of "true" indicates that the parser will translate colons into underscores in names.
static final String | unicodeNormalizationCheckingFeature | Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
static final String | useAttributes2Feature | Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
static final String | useEntityResolver2Feature | Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
static final String | useLocator2Feature | Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
static final String | validationFeature | Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
static final String | XML11Feature | Returns "true" if the parser supports both XML 1.1 and XML 1.0.
static final String | xmlnsURIsFeature | Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
-
Constructor Summary
Constructors
Parser()
-
Method Summary
Modifier and Type | Method | Description
void | adup(char[] buff, int offset, int length) | Reports an attribute name without a value.
void | aname(char[] buff, int offset, int length) | Reports an attribute name; a value will follow.
void | aval(char[] buff, int offset, int length) | Reports an attribute value.
void | cdsect(char[] buff, int offset, int length) | Reports the content of a CDATA section (not a CDATA element).
void | cmnt(char[] buff, int offset, int length) | Reports a comment.
void | comment(char[] ch, int start, int length) |
void | decl(char[] buff, int offset, int length) | Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
void | endCDATA() |
void | endDTD() |
void | endEntity(String name) |
void | entity(char[] buff, int offset, int length) | Reports an entity reference or character reference.
void | eof(char[] buff, int offset, int length) | Reports EOF.
void | etag(char[] buff, int offset, int length) | Reports an end-tag.
void | etag_basic(char[] buff, int offset, int length) |
boolean | etag_cdata(char[] buff, int offset, int length) |
ContentHandler | getContentHandler() |
DTDHandler | getDTDHandler() |
int | getEntity() | Returns the value of the last entity or character reference reported.
EntityResolver | getEntityResolver() |
ErrorHandler | getErrorHandler() |
boolean | getFeature(String name) |
Object | getProperty(String name) |
void | gi(char[] buff, int offset, int length) | Reports the general identifier (element type name) of a start-tag.
void | parse(String systemid) |
void | parse(InputSource input) |
void | pcdata(char[] buff, int offset, int length) | Reports character content.
void | pi(char[] buff, int offset, int length) | Reports the data part of a processing instruction.
void | pitarget(char[] buff, int offset, int length) | Reports the target part of a processing instruction.
void | setContentHandler(ContentHandler handler) |
void | setDTDHandler(DTDHandler handler) |
void | setEntityResolver(EntityResolver resolver) |
void | setErrorHandler(ErrorHandler handler) |
void | setFeature(String name, boolean value) |
void | setProperty(String name, Object value) |
void | stagc(char[] buff, int offset, int length) | Reports the close of a start-tag.
void | stage(char[] buff, int offset, int length) | Reports the close of an empty-tag.
void | startCDATA() |
void | startDTD(String name, String publicid, String systemid) |
void | startEntity(String name) |
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Field Details
-
namespacesFeature
A value of "true" indicates that namespace URIs and unprefixed local names for element and attribute names will be available.
-
namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.
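What the namespaces feature changes in practice is the uri and localName arguments of startElement. A minimal sketch, assuming the standard SAX feature URI http://xml.org/sax/features/namespaces (this page does not show the constant's value, so with TagSoup the safer choice is to pass Parser.namespacesFeature itself); NamespaceProbe is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: shows the namespace-qualified name of the root element. */
class NamespaceProbe {

    // Assumption: the standard SAX2 feature URI for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Returns the first element's name in "{uri}localName" form. */
    static String firstElementName(XMLReader reader, String markup)
            throws IOException, SAXException {
        reader.setFeature(NAMESPACES, true);
        final String[] first = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (first[0] == null) {
                    first[0] = "{" + uri + "}" + localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return first[0];
    }
}
```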
-
externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
-
externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
-
isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)
-
lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
-
resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)
-
stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
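The identity comparison that interning permits might look like the sketch below. NameMatcher is a hypothetical helper; in a real handler the interned flag would come from a getFeature call for this feature rather than being passed in by hand.

```java
/** Hypothetical helper: compares a parser-reported name with a known constant. */
class NameMatcher {

    static boolean matches(String reported, String constant, boolean interned) {
        // Reference comparison is only safe when both sides are interned.
        // String literals already are, so intern() is cheap for them.
        return interned
                ? reported == constant.intern()
                : reported.equals(constant);
    }
}
```

Falling back to equals() when the feature is off keeps the handler correct with readers that do not intern names.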
-
useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)
-
useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)
-
useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)
-
validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
-
unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)
-
xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)
-
XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)
-
ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
-
bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
-
rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
-
defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
-
translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
-
restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
-
ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
-
CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.
-
lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
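Registering a LexicalHandler goes through setProperty rather than a dedicated setter. A minimal sketch that collects comments, assuming the standard SAX property URI http://xml.org/sax/properties/lexical-handler (the constant's value is not shown on this page; with TagSoup, pass Parser.lexicalHandlerProperty). CommentCollector is a hypothetical name; DefaultHandler2 is the JDK's no-op base class that implements LexicalHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper: gathers the text of every comment in the input. */
class CommentCollector {

    // Assumption: the standard SAX2 property URI for the lexical handler.
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    static List<String> collectComments(XMLReader reader, String markup)
            throws IOException, SAXException {
        final List<String> comments = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                // Copy the slice; the array belongs to the parser.
                comments.add(new String(ch, start, length));
            }
        };
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return comments;
    }
}
```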
-
scannerProperty
Specifies the Scanner object this Parser uses.
-
schemaProperty
Specifies the Schema object this Parser uses.
-
autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
-
-
Constructor Details
-
Parser
public Parser()
-
-
Method Details
-
getFeature
public boolean getFeature(String name) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
getFeature in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
setFeature
public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
setFeature in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
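Both methods declare SAXNotRecognizedException and SAXNotSupportedException, so callers that probe optional features usually wrap them. The sketch below shows one way to do that against any XMLReader; FeatureProbe and its method names are hypothetical, and which feature names a given reader accepts is implementation-specific.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: queries and sets SAX features without propagating exceptions. */
class FeatureProbe {

    /** Returns "true", "false", "unrecognized", or "unsupported". */
    static String describe(XMLReader reader, String featureName) {
        try {
            return String.valueOf(reader.getFeature(featureName));
        } catch (SAXNotRecognizedException e) {
            return "unrecognized";   // the reader does not know this name
        } catch (SAXNotSupportedException e) {
            return "unsupported";    // known name, but not readable right now
        }
    }

    /** Returns true if the reader accepted the new value. */
    static boolean trySet(XMLReader reader, String featureName, boolean value) {
        try {
            reader.setFeature(featureName, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }
}
```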
-
getProperty
public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
getProperty in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
setProperty
public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
setProperty in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
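setEntityResolver
public void setEntityResolver(EntityResolver resolver)
- Specified by:
setEntityResolver in interface XMLReader
-
getEntityResolver
public EntityResolver getEntityResolver()
- Specified by:
getEntityResolver in interface XMLReader
-
setDTDHandler
public void setDTDHandler(DTDHandler handler)
- Specified by:
setDTDHandler in interface XMLReader
-
getDTDHandler
public DTDHandler getDTDHandler()
- Specified by:
getDTDHandler in interface XMLReader
-
setContentHandler
public void setContentHandler(ContentHandler handler)
- Specified by:
setContentHandler in interface XMLReader
-
getContentHandler
public ContentHandler getContentHandler()
- Specified by:
getContentHandler in interface XMLReader
-
setErrorHandler
public void setErrorHandler(ErrorHandler handler)
- Specified by:
setErrorHandler in interface XMLReader
-
getErrorHandler
public ErrorHandler getErrorHandler()
- Specified by:
getErrorHandler in interface XMLReader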
-
parse
- Specified by:
parse in interface XMLReader
- Throws:
IOException
SAXException
-
parse
- Specified by:
parse in interface XMLReader
- Throws:
IOException
SAXException
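For the InputSource overload, the source can wrap a byte stream together with an encoding hint and a system identifier. A minimal sketch using only JDK classes; SourceFactory and its methods are hypothetical names, and how a particular reader weighs the encoding hint against its own detection is implementation-specific.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: builds InputSource objects and extracts text content. */
class SourceFactory {

    /** Wraps raw bytes, recording the encoding and a system ID for error messages. */
    static InputSource fromBytes(byte[] data, String encoding, String systemId) {
        InputSource source = new InputSource(new ByteArrayInputStream(data));
        source.setEncoding(encoding);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses the source and returns all character content concatenated. */
    static String text(XMLReader reader, InputSource source)
            throws IOException, SAXException {
        final StringBuilder out = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        reader.parse(source);
        return out.toString();
    }
}
```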
-
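adup
public void adup(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute name without a value.
- Specified by:
adup in interface ScanHandler
- Throws:
SAXException
-
aname
public void aname(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.
- Specified by:
aname in interface ScanHandler
- Throws:
SAXException
-
aval
public void aval(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute value.
- Specified by:
aval in interface ScanHandler
- Throws:
SAXException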
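The ScanHandler callbacks all receive their text as a (buff, offset, length) triple, the same convention as SAX characters(). The sketch below shows only how such a slice can be turned into a String; ScanEventLog is a stand-in class rather than a ScanHandler implementation, so it compiles without TagSoup, and the remark about buffer reuse is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder for (buff, offset, length) style callbacks. */
class ScanEventLog {

    private final List<String> events = new ArrayList<>();

    /** Records one event, e.g. record("aname", buff, offset, length). */
    void record(String event, char[] buff, int offset, int length) {
        // Copy the slice right away: assume the caller may reuse buff
        // once the callback returns.
        events.add(event + ":" + new String(buff, offset, length));
    }

    List<String> events() {
        return events;
    }
}
```

A real ScanHandler implementation could forward each callback, such as aname or aval, to a method of this shape.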
-
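entity
public void entity(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an entity reference or character reference.
- Specified by:
entity in interface ScanHandler
- Throws:
SAXException
-
eof
public void eof(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports EOF.
- Specified by:
eof in interface ScanHandler
- Throws:
SAXException
-
etag
public void etag(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an end-tag.
- Specified by:
etag in interface ScanHandler
- Throws:
SAXException
-
etag_cdata
public boolean etag_cdata(char[] buff, int offset, int length) throws SAXException
- Throws:
SAXException
-
etag_basic
public void etag_basic(char[] buff, int offset, int length) throws SAXException
- Throws:
SAXException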
-
decl
public void decl(char[] buff, int offset, int length) throws SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
- Specified by:
decl in interface ScanHandler
- Throws:
SAXException
-
gi
public void gi(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.
- Specified by:
gi in interface ScanHandler
- Throws:
SAXException
-
cdsect
public void cdsect(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).
- Specified by:
cdsect in interface ScanHandler
- Throws:
SAXException
-
pcdata
public void pcdata(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports character content.
- Specified by:
pcdata in interface ScanHandler
- Throws:
SAXException
-
pitarget
public void pitarget(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.
- Specified by:
pitarget in interface ScanHandler
- Throws:
SAXException
-
pi
public void pi(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.
- Specified by:
pi in interface ScanHandler
- Throws:
SAXException
-
stagc
public void stagc(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the close of a start-tag.
- Specified by:
stagc in interface ScanHandler
- Throws:
SAXException
-
stage
public void stage(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the close of an empty-tag.
- Specified by:
stage in interface ScanHandler
- Throws:
SAXException
-
cmnt
public void cmnt(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports a comment.
- Specified by:
cmnt in interface ScanHandler
- Throws:
SAXException
-
getEntity
public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.
- Specified by:
getEntity in interface ScanHandler
-
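comment
public void comment(char[] ch, int start, int length) throws SAXException
- Specified by:
comment in interface LexicalHandler
- Throws:
SAXException
-
endCDATA
public void endCDATA() throws SAXException
- Specified by:
endCDATA in interface LexicalHandler
- Throws:
SAXException
-
endDTD
public void endDTD() throws SAXException
- Specified by:
endDTD in interface LexicalHandler
- Throws:
SAXException
-
endEntity
public void endEntity(String name) throws SAXException
- Specified by:
endEntity in interface LexicalHandler
- Throws:
SAXException
-
startCDATA
public void startCDATA() throws SAXException
- Specified by:
startCDATA in interface LexicalHandler
- Throws:
SAXException
-
startDTD
public void startDTD(String name, String publicid, String systemid) throws SAXException
- Specified by:
startDTD in interface LexicalHandler
- Throws:
SAXException
-
startEntity
public void startEntity(String name) throws SAXException
- Specified by:
startEntity in interface LexicalHandler
- Throws:
SAXException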
-