public class BoilerpipeHTMLContentHandler extends Object implements ContentHandler

A ContentHandler used by BoilerpipeSAXInput. It can be used with different parser implementations, e.g. NekoHTML and TagSoup.

| Constructor and Description |
|---|
| BoilerpipeHTMLContentHandler() Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap. |
| BoilerpipeHTMLContentHandler(TagActionMap tagActions) Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap. |
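
The second constructor accepts a TagActionMap, which suggests that element handling is looked up by tag name. As a rough illustration of that idea only, the sketch below dispatches on tag names with a plain JDK SAX handler. The names TagDispatchSketch, TagRule, and defaultRules are hypothetical stand-ins introduced here; they are not boilerpipe's TagAction or TagActionMap types, and the real rules may differ.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TagDispatchSketch extends DefaultHandler {
    /** Hypothetical per-tag rule; not part of boilerpipe. */
    public interface TagRule {
        boolean ignoresContent();
    }

    private final Map<String, TagRule> rules;
    private final StringBuilder text = new StringBuilder();
    private int ignoreDepth = 0; // > 0 while inside an ignored element

    public TagDispatchSketch(Map<String, TagRule> rules) {
        this.rules = rules;
    }

    /** Assumed default rules: drop the content of script and style elements. */
    public static Map<String, TagRule> defaultRules() {
        Map<String, TagRule> m = new HashMap<>();
        m.put("script", () -> true);
        m.put("style", () -> true);
        return m;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        TagRule rule = rules.get(qName.toLowerCase(Locale.ROOT));
        if (rule != null && rule.ignoresContent()) ignoreDepth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        TagRule rule = rules.get(qName.toLowerCase(Locale.ROOT));
        if (rule != null && rule.ignoresContent()) ignoreDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep character data only when no ignored element is open.
        if (ignoreDepth == 0) text.append(ch, start, length);
    }

    /** Parses well-formed markup and returns the text that the rules let through. */
    public static String extract(String xhtml, Map<String, TagRule> rules) {
        try {
            TagDispatchSketch handler = new TagDispatchSketch(rules);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a different map to the constructor changes which elements are skipped without touching the handler, which is presumably the reason the real class takes its map as a constructor argument.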

| Modifier and Type | Method and Description |
|---|---|
| void | addLabelAction(LabelAction la) |
| protected void | addTextBlock(TextBlock tb) |
| void | addWhitespaceIfNecessary() |
| void | characters(char[] ch, int start, int length) |
| void | endDocument() |
| void | endElement(String uri, String localName, String qName) |
| void | endPrefixMapping(String prefix) |
| void | flushBlock() |
| String | getTitle() |
| void | ignorableWhitespace(char[] ch, int start, int length) |
| void | processingInstruction(String target, String data) |
| void | recycle() Recycles this instance. |
| void | setDocumentLocator(Locator locator) |
| void | setTitle(String s) |
| void | skippedEntity(String name) |
| void | startDocument() |
| void | startElement(String uri, String localName, String qName, Attributes atts) |
| void | startPrefixMapping(String prefix, String uri) |
| TextDocument | toTextDocument() Returns a TextDocument containing the extracted TextBlocks. |
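
The method names above (characters, flushBlock, getTitle, endDocument) hint at a buffer-and-flush pattern: character data accumulates until a block boundary, then is emitted as one block. The following JDK-only sketch is a guess at that general pattern and is not boilerpipe's code. BlockCollectorSketch, its BLOCK_TAGS set, and getBlocks are hypothetical names, and plain strings stand in for TextBlock objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BlockCollectorSketch extends DefaultHandler {
    // Assumption: these elements delimit blocks of text.
    private static final Set<String> BLOCK_TAGS =
            new HashSet<>(Arrays.asList("p", "div", "h1", "li", "title", "body"));

    private final List<String> blocks = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String title;

    private static boolean isBlockTag(String qName) {
        return BLOCK_TAGS.contains(qName.toLowerCase(Locale.ROOT));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isBlockTag(qName)) flushBlock();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equalsIgnoreCase(qName)) {
            // The title is stored separately instead of becoming a block.
            title = buffer.toString().trim();
            buffer.setLength(0);
        } else if (isBlockTag(qName)) {
            flushBlock();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // inline tags such as <b> do not split the buffer
    }

    @Override
    public void endDocument() {
        flushBlock(); // emit whatever is still buffered
    }

    /** Turns the buffered text into one block, skipping whitespace-only buffers. */
    public void flushBlock() {
        String text = buffer.toString().trim();
        buffer.setLength(0);
        if (!text.isEmpty()) blocks.add(text);
    }

    public String getTitle() {
        return title;
    }

    public List<String> getBlocks() {
        return blocks;
    }

    /** Parses well-formed markup with the JDK SAX parser and returns the filled handler. */
    public static BlockCollectorSketch parse(String xhtml) {
        try {
            BlockCollectorSketch handler = new BlockCollectorSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch uses the JDK parser, so its input must be well-formed. Real-world HTML would need a tolerant SAX parser such as the NekoHTML or TagSoup implementations named in the class description.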

public BoilerpipeHTMLContentHandler()
Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.

public BoilerpipeHTMLContentHandler(TagActionMap tagActions)
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
Parameters: tagActions - The TagActionMap to use, e.g. DefaultTagActionMap.

public void recycle()
Recycles this instance.
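
A recycle() method implies that a handler keeps per-document state and can be reset for reuse. The sketch below shows one plausible lifecycle (parse, read the result, recycle, parse again) using only the JDK. RecycleSketch, result, and parseTwo are hypothetical names, and the IllegalStateException guard is an assumption made for illustration, not documented behaviour of this class.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RecycleSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean finished = false;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        finished = true;
    }

    /** Returns the collected text; assumed to be valid only after parsing has ended. */
    public String result() {
        if (!finished) throw new IllegalStateException("result() called before parsing finished");
        return text.toString();
    }

    /** Clears all per-document state so the same instance can handle another document. */
    public void recycle() {
        text.setLength(0);
        finished = false;
    }

    private static void parseInto(String xml, RecycleSketch handler) throws Exception {
        // A fresh parser per document keeps the sketch simple; only the handler is reused.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }

    /** Parses two documents with one handler instance, recycling it in between. */
    public static String[] parseTwo(String first, String second) {
        try {
            RecycleSketch handler = new RecycleSketch();
            parseInto(first, handler);
            String a = handler.result();
            handler.recycle(); // without this, the second result would also contain the first text
            parseInto(second, handler);
            String b = handler.result();
            return new String[] {a, b};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```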

public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Throws: SAXException

public void endPrefixMapping(String prefix) throws SAXException
Specified by: endPrefixMapping in interface ContentHandler
Throws: SAXException

public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Specified by: ignorableWhitespace in interface ContentHandler
Throws: SAXException

public void processingInstruction(String target, String data) throws SAXException
Specified by: processingInstruction in interface ContentHandler
Throws: SAXException

public void setDocumentLocator(Locator locator)
Specified by: setDocumentLocator in interface ContentHandler

public void skippedEntity(String name) throws SAXException
Specified by: skippedEntity in interface ContentHandler
Throws: SAXException

public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Throws: SAXException

public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Throws: SAXException

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Throws: SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Throws: SAXException

public void characters(char[] ch, int start, int length) throws SAXException
Specified by: characters in interface ContentHandler
Throws: SAXException

public void flushBlock()

protected void addTextBlock(TextBlock tb)

public String getTitle()

public void setTitle(String s)

public TextDocument toTextDocument()
Returns a TextDocument containing the extracted TextBlocks. NOTE: Only call this after parsing.
Returns: the TextDocument

public void addWhitespaceIfNecessary()

public void addLabelAction(LabelAction la) throws IllegalStateException
Throws: IllegalStateException

Copyright © 2013-2014. All Rights Reserved.