Package org.ccil.cowan.tagsoup
Class Parser
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.ccil.cowan.tagsoup.Parser
- All Implemented Interfaces:
ScanHandler, ContentHandler, DTDHandler, EntityResolver, ErrorHandler, LexicalHandler, XMLReader
public class Parser extends DefaultHandler implements ScanHandler, XMLReader, LexicalHandler
The SAX parser class.
-
-
Field Summary
Fields (Modifier and Type, Field: Description)
- static String autoDetectorProperty: Specifies the AutoDetector (for encoding detection) this Parser uses.
- static String bogonsEmptyFeature: A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
- static String CDATAElementsFeature: A value of "true" indicates that the parser will treat CDATA elements specially.
- static String defaultAttributesFeature: A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
- static String externalGeneralEntitiesFeature: Reports whether this parser processes external general entities (it doesn't).
- static String externalParameterEntitiesFeature: Reports whether this parser processes external parameter entities (it doesn't).
- static String ignorableWhitespaceFeature: A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
- static String ignoreBogonsFeature: A value of "true" indicates that the parser will ignore unknown elements.
- static String isStandaloneFeature: May be examined only during a parse, after the startDocument() callback has been completed; read-only.
- static String lexicalHandlerParameterEntitiesFeature: A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
- static String lexicalHandlerProperty: Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
- static String namespacePrefixesFeature: A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
- static String namespacesFeature: A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
- static String resolveDTDURIsFeature: A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
- static String restartElementsFeature: A value of "true" indicates that the parser will attempt to restart the restartable elements.
- static String rootBogonsFeature: A value of "true" indicates that the parser will allow unknown elements to be the root element.
- static String scannerProperty: Specifies the Scanner object this Parser uses.
- static String schemaProperty: Specifies the Schema object this Parser uses.
- static String stringInterningFeature: Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
- static String translateColonsFeature: A value of "true" indicates that the parser will translate colons into underscores in names.
- static String unicodeNormalizationCheckingFeature: Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
- static String useAttributes2Feature: Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
- static String useEntityResolver2Feature: Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
- static String useLocator2Feature: Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
- static String validationFeature: Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
- static String XML11Feature: Returns "true" if the parser supports both XML 1.1 and XML 1.0.
- static String xmlnsURIsFeature: Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
-
Constructor Summary
Constructors
- Parser()
-
Method Summary
Methods (Modifier and Type, Method: Description)
- void adup(char[] buff, int offset, int length): Reports an attribute name without a value.
- void aname(char[] buff, int offset, int length): Reports an attribute name; a value will follow.
- void aval(char[] buff, int offset, int length): Reports an attribute value.
- void cdsect(char[] buff, int offset, int length): Reports the content of a CDATA section (not a CDATA element).
- void cmnt(char[] buff, int offset, int length): Reports a comment.
- void comment(char[] ch, int start, int length)
- void decl(char[] buff, int offset, int length): Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
- void endCDATA()
- void endDTD()
- void endEntity(String name)
- void entity(char[] buff, int offset, int length): Reports an entity reference or character reference.
- void eof(char[] buff, int offset, int length): Reports EOF.
- void etag(char[] buff, int offset, int length): Reports an end-tag.
- void etag_basic(char[] buff, int offset, int length)
- boolean etag_cdata(char[] buff, int offset, int length)
- ContentHandler getContentHandler()
- DTDHandler getDTDHandler()
- int getEntity(): Returns the value of the last entity or character reference reported.
- EntityResolver getEntityResolver()
- ErrorHandler getErrorHandler()
- boolean getFeature(String name)
- Object getProperty(String name)
- void gi(char[] buff, int offset, int length): Reports the general identifier (element type name) of a start-tag.
- void parse(String systemid)
- void parse(InputSource input)
- void pcdata(char[] buff, int offset, int length): Reports character content.
- void pi(char[] buff, int offset, int length): Reports the data part of a processing instruction.
- void pitarget(char[] buff, int offset, int length): Reports the target part of a processing instruction.
- void setContentHandler(ContentHandler handler)
- void setDTDHandler(DTDHandler handler)
- void setEntityResolver(EntityResolver resolver)
- void setErrorHandler(ErrorHandler handler)
- void setFeature(String name, boolean value)
- void setProperty(String name, Object value)
- void stagc(char[] buff, int offset, int length): Reports the close of a start-tag.
- void stage(char[] buff, int offset, int length): Reports the close of an empty-tag.
- void startCDATA()
- void startDTD(String name, String publicid, String systemid)
- void startEntity(String name)
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
-
Field Detail
-
namespacesFeature
public static final String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
- See Also: Constant Field Values
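Because Parser implements XMLReader, a feature such as this one is switched on through setFeature before parsing. The following is a minimal sketch, not TagSoup's own code: it uses the JDK's built-in SAX parser as a stand-in so that it runs without TagSoup on the classpath, and it assumes that namespacesFeature holds the standard SAX2 URI http://xml.org/sax/features/namespaces (the constant's value is not shown on this page). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class NamespacesFeatureSketch {
    // Assumption: the value of Parser.namespacesFeature (standard SAX2 feature URI).
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Parses the markup and returns "namespaceURI|localName" for every start tag. */
    static List<String> startTags(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Stand-in reader; with TagSoup this would be a Parser instance.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);

        List<String> seen = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // With the feature on, uri and localName are populated.
                seen.add(uri + "|" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

With TagSoup available, the same calls should apply to a Parser instance, passing Parser.namespacesFeature instead of the literal URI.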
-
namespacePrefixesFeature
public static final String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.
- See Also: Constant Field Values
-
externalGeneralEntitiesFeature
public static final String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
- See Also: Constant Field Values
-
externalParameterEntitiesFeature
public static final String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
- See Also: Constant Field Values
-
isStandaloneFeature
public static final String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)
- See Also: Constant Field Values
-
lexicalHandlerParameterEntitiesFeature
public static final String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
- See Also: Constant Field Values
-
resolveDTDURIsFeature
public static final String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)
- See Also: Constant Field Values
-
stringInterningFeature
public static final String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
- See Also: Constant Field Values
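The benefit of interning can be illustrated with plain Java strings. The sketch below is an illustration only (the class and method names are hypothetical): it shows that a reference comparison against a string literal succeeds for an interned name and fails for an equal but non-interned copy, which is why the shortcut is safe only when the parser guarantees interning.

```java
final class InterningSketch {
    /** Reference comparison against a literal; only reliable for interned names. */
    static boolean isBody(String internedName) {
        return internedName == "body"; // no String.equals() call
    }

    /** Returns { result for a non-interned copy, result for the interned copy }. */
    static boolean[] demo() {
        String fresh = new String("body"); // equal content, distinct object
        return new boolean[] { isBody(fresh), isBody(fresh.intern()) };
    }
}
```

Here demo() yields false for the fresh copy and true for the interned one, since string literals are themselves interned.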
-
useAttributes2Feature
public static final String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)
- See Also: Constant Field Values
-
useLocator2Feature
public static final String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)
- See Also: Constant Field Values
-
useEntityResolver2Feature
public static final String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)
- See Also: Constant Field Values
-
validationFeature
public static final String validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
- See Also: Constant Field Values
-
unicodeNormalizationCheckingFeature
public static final String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)
- See Also: Constant Field Values
-
xmlnsURIsFeature
public static final String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)
- See Also: Constant Field Values
-
XML11Feature
public static final String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)
- See Also: Constant Field Values
-
ignoreBogonsFeature
public static final String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
- See Also: Constant Field Values
-
bogonsEmptyFeature
public static final String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
- See Also: Constant Field Values
-
rootBogonsFeature
public static final String rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
- See Also: Constant Field Values
-
defaultAttributesFeature
public static final String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
- See Also: Constant Field Values
-
translateColonsFeature
public static final String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
- See Also: Constant Field Values
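The effect described here can be pictured with a one-line transformation. The helper below is hypothetical and only illustrates the documented outcome (colons become underscores); it is not taken from the parser's source, which may treat edge cases such as repeated colons differently.

```java
final class ColonTranslationSketch {
    /** Hypothetical helper: replaces every colon in a name with an underscore. */
    static String translateColons(String name) {
        return name.replace(':', '_');
    }
}
```

Under this sketch a name such as svg:rect would be reported as svg_rect, while names without colons pass through unchanged.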
-
restartElementsFeature
public static final String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
- See Also: Constant Field Values
-
ignorableWhitespaceFeature
public static final String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
- See Also: Constant Field Values
-
CDATAElementsFeature
public static final String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.
- See Also: Constant Field Values
-
lexicalHandlerProperty
public static final String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
- See Also: Constant Field Values
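A LexicalHandler is attached through setProperty rather than through a dedicated setter. The sketch below collects comment text this way. It is a sketch under assumptions: the JDK's built-in SAX parser stands in for Parser so the example runs on its own, and the property URI is the standard SAX2 one, which lexicalHandlerProperty is assumed to hold. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class LexicalHandlerSketch {
    // Assumption: the value of Parser.lexicalHandlerProperty (standard SAX2 property URI).
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns the text of every comment reported while parsing the markup. */
    static List<String> comments(String xml) throws Exception {
        // Stand-in reader; with TagSoup this would be a Parser instance.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        List<String> found = new ArrayList<>();

        // DefaultHandler2 implements LexicalHandler with no-op methods,
        // so only the callback of interest needs overriding.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }
}
```

Passing an object that does not implement LexicalHandler would violate the requirement stated above, so the handler type is worth checking before the call.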
-
scannerProperty
public static final String scannerProperty
Specifies the Scanner object this Parser uses.
- See Also: Constant Field Values
-
schemaProperty
public static final String schemaProperty
Specifies the Schema object this Parser uses.
- See Also: Constant Field Values
-
autoDetectorProperty
public static final String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
- See Also: Constant Field Values
-
-
Method Detail
-
getFeature
public boolean getFeature(String name) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by: getFeature in interface XMLReader
- Throws: SAXNotRecognizedException, SAXNotSupportedException
-
setFeature
public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by: setFeature in interface XMLReader
- Throws: SAXNotRecognizedException, SAXNotSupportedException
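Both exceptions come from the XMLReader contract, so callers that probe features usually handle them separately. The following is a minimal sketch of such a probe, written against the XMLReader interface so that a Parser instance could be passed in. The helper names are hypothetical, and newReader() returns the JDK's built-in SAX parser only so that the example runs without TagSoup.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

final class FeatureProbeSketch {
    /** Returns "true" or "false" for a readable feature, or the reason the lookup failed. */
    static String describe(XMLReader reader, String featureUri) {
        try {
            return String.valueOf(reader.getFeature(featureUri));
        } catch (SAXNotRecognizedException e) {
            return "not recognized"; // the reader does not know this URI at all
        } catch (SAXNotSupportedException e) {
            return "not supported";  // known URI, but its value is unavailable right now
        }
    }

    /** Stand-in reader for the sketch; with TagSoup this would be a Parser instance. */
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }
}
```

The same two catch clauses apply to setFeature, getProperty, and setProperty, which declare the same exceptions.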
-
getProperty
public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by: getProperty in interface XMLReader
- Throws: SAXNotRecognizedException, SAXNotSupportedException
-
setProperty
public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by: setProperty in interface XMLReader
- Throws: SAXNotRecognizedException, SAXNotSupportedException
-
setEntityResolver
public void setEntityResolver(EntityResolver resolver)
- Specified by: setEntityResolver in interface XMLReader
-
getEntityResolver
public EntityResolver getEntityResolver()
- Specified by: getEntityResolver in interface XMLReader
-
setDTDHandler
public void setDTDHandler(DTDHandler handler)
- Specified by: setDTDHandler in interface XMLReader
-
getDTDHandler
public DTDHandler getDTDHandler()
- Specified by: getDTDHandler in interface XMLReader
-
setContentHandler
public void setContentHandler(ContentHandler handler)
- Specified by: setContentHandler in interface XMLReader
-
getContentHandler
public ContentHandler getContentHandler()
- Specified by: getContentHandler in interface XMLReader
-
setErrorHandler
public void setErrorHandler(ErrorHandler handler)
- Specified by: setErrorHandler in interface XMLReader
-
getErrorHandler
public ErrorHandler getErrorHandler()
- Specified by: getErrorHandler in interface XMLReader
-
parse
public void parse(InputSource input) throws IOException, SAXException
- Specified by: parse in interface XMLReader
- Throws: IOException, SAXException
-
parse
public void parse(String systemid) throws IOException, SAXException
- Specified by: parse in interface XMLReader
- Throws: IOException, SAXException
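The usual calling sequence for either parse overload is to install a ContentHandler first and then start the parse. The sketch below shows this with an in-memory InputSource. It is written against the XMLReader interface, so a Parser instance could be passed where the reader is expected; the jdkReader() helper is a hypothetical stand-in that returns the JDK's built-in SAX parser so the example runs without TagSoup, which also means the sample input must be well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ParseSketch {
    /** Returns an outline such as "html>body>p" built from start-tag events. */
    static String outline(XMLReader reader, String markup) throws Exception {
        StringBuilder out = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (out.length() > 0) {
                    out.append('>');
                }
                out.append(qName);
            }
        });
        // An InputSource can wrap a character stream, a byte stream, or a system ID.
        reader.parse(new InputSource(new StringReader(markup)));
        return out.toString();
    }

    /** Stand-in reader for the sketch; with TagSoup this would be a Parser instance. */
    static XMLReader jdkReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }
}
```

The parse(String systemid) overload follows the same pattern, with the system ID naming the document to read instead of an InputSource.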
-
adup
public void adup(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute name without a value.
- Specified by: adup in interface ScanHandler
- Throws: SAXException
-
aname
public void aname(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.
- Specified by: aname in interface ScanHandler
- Throws: SAXException
-
aval
public void aval(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an attribute value.
- Specified by: aval in interface ScanHandler
- Throws: SAXException
-
entity
public void entity(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an entity reference or character reference.
- Specified by: entity in interface ScanHandler
- Throws: SAXException
-
eof
public void eof(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports EOF.
- Specified by: eof in interface ScanHandler
- Throws: SAXException
-
etag
public void etag(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports an end-tag.
- Specified by: etag in interface ScanHandler
- Throws: SAXException
-
etag_cdata
public boolean etag_cdata(char[] buff, int offset, int length) throws SAXException
- Throws: SAXException
-
etag_basic
public void etag_basic(char[] buff, int offset, int length) throws SAXException
- Throws: SAXException
-
decl
public void decl(char[] buff, int offset, int length) throws SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
- Specified by: decl in interface ScanHandler
- Throws: SAXException
-
gi
public void gi(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.
- Specified by: gi in interface ScanHandler
- Throws: SAXException
-
cdsect
public void cdsect(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).
- Specified by: cdsect in interface ScanHandler
- Throws: SAXException
-
pcdata
public void pcdata(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports character content.
- Specified by: pcdata in interface ScanHandler
- Throws: SAXException
-
pitarget
public void pitarget(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.
- Specified by: pitarget in interface ScanHandler
- Throws: SAXException
-
pi
public void pi(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.
- Specified by: pi in interface ScanHandler
- Throws: SAXException
-
stagc
public void stagc(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the close of a start-tag.
- Specified by: stagc in interface ScanHandler
- Throws: SAXException
-
stage
public void stage(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports the close of an empty-tag.
- Specified by: stage in interface ScanHandler
- Throws: SAXException
-
cmnt
public void cmnt(char[] buff, int offset, int length) throws SAXException
Description copied from interface: ScanHandler
Reports a comment.
- Specified by: cmnt in interface ScanHandler
- Throws: SAXException
-
getEntity
public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.
- Specified by: getEntity in interface ScanHandler
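The int returned here is easiest to picture for numeric character references, where the value is the referenced code point. The helper below is a hypothetical sketch of that decoding step, not the parser's own logic: it assumes the reference body arrives without the leading ampersand and trailing semicolon, and it returns 0 for anything it cannot decode. Named entities, which would need a lookup table, are outside its scope.

```java
final class CharRefSketch {
    /**
     * Hypothetical helper: decodes the body of a numeric character reference,
     * such as "#65" or "#x41", to its code point. Returns 0 when the text is
     * not a well-formed numeric reference.
     */
    static int decode(String ref) {
        if (ref.length() < 2 || ref.charAt(0) != '#') {
            return 0;
        }
        boolean hex = ref.charAt(1) == 'x' || ref.charAt(1) == 'X';
        String digits = hex ? ref.substring(2) : ref.substring(1);
        try {
            int value = Integer.parseInt(digits, hex ? 16 : 10);
            return value < 0 ? 0 : value; // reject signed input such as "#-5"
        } catch (NumberFormatException e) {
            return 0; // empty or non-numeric digits
        }
    }
}
```

Under this sketch both "#65" and "#x41" decode to 65, the code point of the letter A.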
-
comment
public void comment(char[] ch, int start, int length) throws SAXException
- Specified by: comment in interface LexicalHandler
- Throws: SAXException
-
endCDATA
public void endCDATA() throws SAXException
- Specified by: endCDATA in interface LexicalHandler
- Throws: SAXException
-
endDTD
public void endDTD() throws SAXException
- Specified by: endDTD in interface LexicalHandler
- Throws: SAXException
-
endEntity
public void endEntity(String name) throws SAXException
- Specified by: endEntity in interface LexicalHandler
- Throws: SAXException
-
startCDATA
public void startCDATA() throws SAXException
- Specified by: startCDATA in interface LexicalHandler
- Throws: SAXException
-
startDTD
public void startDTD(String name, String publicid, String systemid) throws SAXException
- Specified by: startDTD in interface LexicalHandler
- Throws: SAXException
-
startEntity
public void startEntity(String name) throws SAXException
- Specified by: startEntity in interface LexicalHandler
- Throws: SAXException
-
-