Package org.apache.tika.sax
Class SafeContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.tika.sax.ContentHandlerDecorator
      - org.apache.tika.sax.SafeContentHandler
- All Implemented Interfaces:
  ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
  XHTMLContentHandler, XMPContentHandler
public class SafeContentHandler extends ContentHandlerDecorator
Content handler decorator that makes sure that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with the Unicode replacement character U+FFFD (though a subclass may change this by overriding the writeReplacement(Output) method).

The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
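The production above amounts to a range test on a single Unicode code point. The following Java sketch expresses it directly; the class XmlChars and the method isValidXmlChar are hypothetical names chosen for illustration and are not part of Tika's API.

```java
/**
 * Hypothetical helper (not part of Tika): tests a code point against the
 * XML Char production #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
 * | [#x10000-#x10FFFF].
 */
final class XmlChars {
    private XmlChars() {}

    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)     // excludes the surrogate block
                || (cp >= 0xE000 && cp <= 0xFFFD)   // excludes U+FFFE and U+FFFF
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValidXmlChar('A'));     // true
        System.out.println(isValidXmlChar(0x0));     // false (NUL)
        System.out.println(isValidXmlChar(0xFFFE));  // false
        System.out.println(isValidXmlChar(0x1F600)); // true (outside the BMP)
    }
}
```

Note that every supplementary code point (U+10000 to U+10FFFF) is valid, so all invalid characters lie in the Basic Multilingual Plane.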
Note that this class currently detects only those invalid characters whose UTF-16 representation fits in a single char. Also, this class does not ensure that the UTF-16 encoding of incoming characters is correct.
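To illustrate the replacement behavior, the sketch below copies a char range and substitutes U+FFFD for every invalid character. It is an illustration under stated assumptions, not Tika's implementation: the names XmlSanitizer and sanitize are hypothetical, and because it walks the input by code point with Character.codePointAt, it keeps well-formed surrogate pairs and also replaces unpaired surrogates. That stricter handling is a design choice of the sketch, not a documented guarantee of SafeContentHandler.

```java
/**
 * Hypothetical sketch (not Tika's code): returns a copy of ch[start, start+length)
 * in which every code point outside the XML Char production becomes U+FFFD.
 */
final class XmlSanitizer {
    private XmlSanitizer() {}

    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(char[] ch, int start, int length) {
        int end = start + length;
        StringBuilder out = new StringBuilder(length);
        int i = start;
        while (i < end) {
            // A well-formed surrogate pair is combined into one code point; an
            // unpaired surrogate is returned unchanged and fails the check below.
            int cp = Character.codePointAt(ch, i, end);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append('\uFFFD');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', NUL, 'b', lone high surrogate, 'c', then the valid pair for U+1F600
        char[] input = {'a', 0x0000, 'b', 0xD800, 'c', 0xD83D, 0xDE00};
        String out = sanitize(input, 0, input.length);
        System.out.println(out.equals("a\uFFFDb\uFFFDc\uD83D\uDE00")); // true
    }
}
```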
Constructor Summary
- SafeContentHandler(ContentHandler handler)
Method Summary
- void characters(char[] ch, int start, int length)
- void endDocument()
- void endElement(String uri, String localName, String name)
- void ignorableWhitespace(char[] ch, int start, int length)
- void startElement(String uri, String localName, String name, Attributes atts)
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
Constructor Detail
SafeContentHandler
public SafeContentHandler(ContentHandler handler)
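The constructor takes the handler that receives the filtered events, which is the decorator pattern described in the class overview. The following sketch shows that pattern using only the JDK's org.xml.sax classes. SanitizingHandler and its replacement() hook are hypothetical stand-ins: the hook loosely mirrors the documented writeReplacement(Output) extension point, whose real signature is not reproduced here. Valid runs are forwarded unchanged and each invalid code point becomes one replacement event; only characters and ignorableWhitespace are decorated, whereas a complete decorator would forward every ContentHandler event.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical decorator sketch (not Tika's code): forwards character events to
 * a wrapped handler after replacing code points that are not valid XML chars.
 * Other ContentHandler events are NOT forwarded in this minimal version.
 */
class SanitizingHandler extends DefaultHandler {
    private final ContentHandler delegate;

    SanitizingHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    /** Hypothetical hook: subclasses may return a different replacement. */
    protected char[] replacement() {
        return new char[] {'\uFFFD'};
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        filter(ch, start, length, false);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        filter(ch, start, length, true);
    }

    /** Emits valid runs unchanged and one replacement per invalid code point. */
    private void filter(char[] ch, int start, int length, boolean whitespace)
            throws SAXException {
        int end = start + length;
        int runStart = start; // first char of the current run of valid chars
        int i = start;
        while (i < end) {
            int cp = Character.codePointAt(ch, i, end);
            int next = i + Character.charCount(cp);
            if (!isValidXmlChar(cp)) {
                emit(ch, runStart, i - runStart, whitespace); // valid chars so far
                char[] r = replacement();
                emit(r, 0, r.length, whitespace);
                runStart = next;
            }
            i = next;
        }
        emit(ch, runStart, end - runStart, whitespace); // trailing valid run
    }

    private void emit(char[] ch, int start, int length, boolean whitespace)
            throws SAXException {
        if (length == 0) {
            return; // skip empty runs
        }
        if (whitespace) {
            delegate.ignorableWhitespace(ch, start, length);
        } else {
            delegate.characters(ch, start, length);
        }
    }

    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) throws SAXException {
        StringBuilder seen = new StringBuilder();
        ContentHandler collector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        char[] text = {'o', 'k', 0x0001, '!'};
        new SanitizingHandler(collector).characters(text, 0, text.length);
        System.out.println(seen.toString().equals("ok\uFFFD!")); // true
    }
}
```

Forwarding valid runs as slices of the original array, rather than copying into a new buffer, keeps the common case (no invalid characters) to a single delegate call.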
Method Detail

startElement
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
- Specified by:
  startElement in interface ContentHandler
- Overrides:
  startElement in class ContentHandlerDecorator
- Throws:
  SAXException

endElement
public void endElement(String uri, String localName, String name) throws SAXException
- Specified by:
  endElement in interface ContentHandler
- Overrides:
  endElement in class ContentHandlerDecorator
- Throws:
  SAXException

endDocument
public void endDocument() throws SAXException
- Specified by:
  endDocument in interface ContentHandler
- Overrides:
  endDocument in class ContentHandlerDecorator
- Throws:
  SAXException

characters
public void characters(char[] ch, int start, int length) throws SAXException
- Specified by:
  characters in interface ContentHandler
- Overrides:
  characters in class ContentHandlerDecorator
- Throws:
  SAXException

ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
- Specified by:
  ignorableWhitespace in interface ContentHandler
- Overrides:
  ignorableWhitespace in class ContentHandlerDecorator
- Throws:
  SAXException