Package org.apache.tika.sax
Class SafeContentHandler
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.apache.tika.sax.ContentHandlerDecorator
org.apache.tika.sax.SafeContentHandler
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
XHTMLContentHandler, XMPContentHandler
Content handler decorator that makes sure that the character events
(characters(char[], int, int) or ignorableWhitespace(char[], int, int))
passed to the decorated content handler contain only valid XML characters.
All invalid characters are replaced with the Unicode replacement character
U+FFFD (though a subclass may change this by overriding the
writeReplacement(Output) method).
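The replacement behaviour described above can be sketched with the JDK's own SAX classes. This is a minimal illustration, not Tika's implementation: the class name ReplacingHandlerSketch and its helper methods are hypothetical, it forwards only the two character events, and passing surrogate halves through unchanged is an assumption of the sketch.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; this is not Tika's SafeContentHandler.
class ReplacingHandlerSketch extends DefaultHandler {
    private static final char REPLACEMENT = 0xFFFD; // Unicode replacement character

    private final ContentHandler delegate;

    ReplacingHandlerSketch(ContentHandler delegate) {
        this.delegate = delegate;
    }

    // Single-char check: control characters other than tab, LF and CR,
    // plus U+FFFE and U+FFFF. Surrogate halves are passed through
    // unchanged (an assumption of this sketch).
    static boolean isInvalid(char c) {
        if (c < 0x20) {
            return c != 0x9 && c != 0xA && c != 0xD;
        }
        return c > 0xFFFD;
    }

    // Returns a copy of the requested range with invalid chars replaced.
    static char[] replaceInvalid(char[] ch, int start, int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            char c = ch[start + i];
            out[i] = isInvalid(c) ? REPLACEMENT : c;
        }
        return out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(replaceInvalid(ch, start, length), 0, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        delegate.ignorableWhitespace(replaceInvalid(ch, start, length), 0, length);
    }
}
```

Copying the range instead of editing it in place leaves the caller's buffer untouched, which is the safer choice when the parser reuses that buffer.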
The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
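As a rough sketch, the ranges above translate directly into a predicate over Unicode code points. The class and method names here are hypothetical and are not part of Tika's API.

```java
// Hypothetical helper; mirrors the character ranges listed above.
final class XmlCharRanges {
    private XmlCharRanges() {
    }

    // Returns true if the code point falls in one of the valid XML ranges:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

Note that the predicate takes an int code point rather than a char, because the last range lies outside what a single Java char can represent.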
Note that this class currently detects only those invalid characters whose UTF-16 representation fits in a single char. It also does not verify that the UTF-16 encoding of incoming characters is well formed.
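Because UTF-16 correctness is not checked by this class, a caller that cares about it may want to validate text separately. The following is one possible sketch using only java.lang.Character; the class and method names are hypothetical and nothing like this is provided by Tika as far as this page states.

```java
// Hypothetical helper for callers that want to check UTF-16 pairing themselves.
final class SurrogateCheck {
    private SurrogateCheck() {
    }

    // Counts surrogate chars that are not part of a well-formed
    // high-surrogate/low-surrogate pair.
    static int countUnpairedSurrogates(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                if (paired) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    count++;
                }
            } else if (Character.isLowSurrogate(c)) {
                count++; // low surrogate with no preceding high surrogate
            }
        }
        return count;
    }
}
```

A result of zero means every surrogate in the text belongs to a complete pair; it says nothing about whether the resulting code points are valid XML characters.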
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type    Method    Description
void    characters(char[] ch, int start, int length)
void    endDocument()
void    endElement(String uri, String localName, String name)
void    ignorableWhitespace(char[] ch, int start, int length)
void    startElement(String uri, String localName, String name, Attributes atts)
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Constructor Details
-
SafeContentHandler
-
-
Method Details
-
startElement
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class ContentHandlerDecorator
- Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String name) throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class ContentHandlerDecorator
- Throws:
SAXException
-
endDocument
public void endDocument() throws SAXException
- Specified by:
endDocument in interface ContentHandler
- Overrides:
endDocument in class ContentHandlerDecorator
- Throws:
SAXException
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class ContentHandlerDecorator
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
- Specified by:
ignorableWhitespace in interface ContentHandler
- Overrides:
ignorableWhitespace in class ContentHandlerDecorator
- Throws:
SAXException
-