public abstract class ExtractorBase extends Object implements BoilerpipeExtractor
| Constructor and Description |
|---|
| ExtractorBase() |
| Modifier and Type | Method and Description |
|---|---|
| String | getText(InputSource is): Extracts text from the HTML code available from the given InputSource. |
| String | getText(Reader r): Extracts text from the HTML code available from the given Reader. |
| String | getText(String html): Extracts text from the HTML code given as a String. |
| String | getText(TextDocument doc): Extracts text from the given TextDocument object. |
| String | getText(URL url): Extracts text from the HTML code available from the given URL. |
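
The getText(InputSource) and getText(Reader) overloads listed above accept HTML that the caller has already fetched. As a minimal sketch, the following shows one way to wrap such content in an org.xml.sax.InputSource using only JDK classes. The class InputSourceSketch and its methods fromBytes and fromString are hypothetical helpers introduced for illustration; they are not part of boilerpipe, and how the underlying parser uses the encoding hint is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of boilerpipe): builds InputSource objects from content fetched elsewhere. */
final class InputSourceSketch {

    /** Wraps raw response bytes and records the charset, for example one taken from a Content-Type header. */
    static InputSource fromBytes(byte[] body, Charset charset) {
        InputSource source = new InputSource(new ByteArrayInputStream(body));
        source.setEncoding(charset.name());
        return source;
    }

    /** Wraps HTML that is already decoded into a String; no encoding hint is set. */
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    public static void main(String[] args) {
        byte[] body = "<html><body><p>Hello</p></body></html>".getBytes(StandardCharsets.UTF_8);
        InputSource source = fromBytes(body, StandardCharsets.UTF_8);
        System.out.println(source.getEncoding()); // prints UTF-8

        // Assumption: with boilerpipe on the classpath, the source would then be passed
        // to an instance of a concrete ExtractorBase subclass, e.g.
        //   String text = extractor.getText(source);
    }
}
```

Keeping the download step separate from extraction leaves timeouts, redirects, and charset detection under the caller's control; the extractor only sees the prepared InputSource.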
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from implemented interfaces: process

public String getText(String html) throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
Specified by: getText in interface BoilerpipeExtractor
Parameters: html - The HTML code as a String.
Throws: BoilerpipeProcessingException

public String getText(InputSource is) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
Specified by: getText in interface BoilerpipeExtractor
Parameters: is - The InputSource containing the HTML.
Throws: BoilerpipeProcessingException

public String getText(URL url) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given URL.
NOTE: This method is mainly intended for showcase purposes. If you are going to crawl the Web, consider using getText(InputSource) instead.
Parameters: url - The URL pointing to the HTML code.
Throws: BoilerpipeProcessingException

public String getText(Reader r) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
Specified by: getText in interface BoilerpipeExtractor
Parameters: r - The Reader containing the HTML.
Throws: BoilerpipeProcessingException

public String getText(TextDocument doc) throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
Specified by: getText in interface BoilerpipeExtractor
Parameters: doc - The TextDocument.
Throws: BoilerpipeProcessingException

Copyright © 2013-2014. All Rights Reserved.