public final class MediaExtractor extends Object

| Modifier and Type | Field and Description |
|---|---|
| static MediaExtractor | INSTANCE |

| Constructor and Description |
|---|
| MediaExtractor() |
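
The class exposes both a static INSTANCE field and a public constructor, and the method summary also lists a static getInstance() accessor. A minimal sketch of that shape follows, using a hypothetical stand-in class named SharedExtractorSketch (not part of the library). That getInstance() simply returns INSTANCE is an assumption; the page does not show the method body.

```java
// Hypothetical stand-in that mirrors the member layout listed above.
// It is not the library's class and carries no extraction logic.
final class SharedExtractorSketch {
    /** Shared, eagerly created instance (plays the role of INSTANCE). */
    public static final SharedExtractorSketch INSTANCE = new SharedExtractorSketch();

    /** Public constructor: callers can still create their own instances. */
    public SharedExtractorSketch() {
    }

    /** Assumed behavior of getInstance(): hand back the shared instance. */
    public static SharedExtractorSketch getInstance() {
        return INSTANCE;
    }

    public static void main(String[] args) {
        // The accessor and the field refer to the same object.
        System.out.println(getInstance() == INSTANCE);               // prints true
        // A freshly constructed object is a different instance.
        System.out.println(new SharedExtractorSketch() == INSTANCE); // prints false
    }
}
```

Because the constructor is public, a class shaped like this offers a convenient shared instance but does not enforce a single one.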

| Modifier and Type | Method and Description |
|---|---|
| static MediaExtractor | getInstance() |
| List<Media> | process(String doc, BoilerpipeExtractor extractor) - Parses the media (picture, video) out of doc. |
| List<Media> | process(TextDocument doc, InputSource is) - Processes the given TextDocument and the original HTML text (as an InputSource). |
| List<Media> | process(TextDocument doc, String origHTML) - Processes the given TextDocument and the original HTML text (as a String). |
| List<Media> | process(URL url, BoilerpipeExtractor extractor) - Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor. |
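
To illustrate what scanning markup supplied as an InputSource for media can look like, here is a minimal sketch built only on the JDK's SAX parser. It is not this class's implementation: MediaScanSketch and findMediaSources are hypothetical names, the choice of img and video elements with a src attribute is an assumption, and the JDK parser only accepts well-formed (XHTML-style) markup, whereas a real extractor would presumably need an HTML-tolerant parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of the documented library.
final class MediaScanSketch {

    /** Collects the src attribute of every img and video element, in document order. */
    static List<String> findMediaSources(InputSource is)
            throws IOException, SAXException, ParserConfigurationException {
        final List<String> sources = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                boolean isMedia = "img".equalsIgnoreCase(qName)
                        || "video".equalsIgnoreCase(qName);
                String src = atts.getValue("src");
                if (isMedia && src != null) {
                    sources.add(src);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(is, handler);
        return sources;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Text</p>"
                + "<img src=\"a.png\"/><video src=\"b.mp4\"></video>"
                + "</body></html>";
        InputSource is = new InputSource(new StringReader(xhtml));
        System.out.println(findMediaSources(is)); // prints [a.png, b.mp4]
    }
}
```

Wrapping a String in a StringReader and then in an InputSource, as main does here, is one way the String and InputSource forms of an input can be treated uniformly.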

public static final MediaExtractor INSTANCE

public static MediaExtractor getInstance()
Returns the singleton instance of MediaExtractor.

public List<Media> process(TextDocument doc, String origHTML) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as a String).
Parameters:
doc - The processed TextDocument.
origHTML - The original HTML document.
Returns:
Images
Throws:
BoilerpipeProcessingException - if an error occurs during extraction

public List<Media> process(TextDocument doc, InputSource is) throws BoilerpipeProcessingException
Processes the given TextDocument and the original HTML text (as an InputSource).
Parameters:
doc - The processed TextDocument.
is - The original HTML document.
Returns:
Images
Throws:
BoilerpipeProcessingException

public List<Media> process(URL url, BoilerpipeExtractor extractor) throws IOException, BoilerpipeProcessingException, SAXException
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
Parameters:
url - the URL of the document to fetch
extractor - extractor to use
Returns:
Images
Throws:
IOException
BoilerpipeProcessingException
SAXException

public List<Media> process(String doc, BoilerpipeExtractor extractor)
Parses the media (picture, video) out of doc.
Parameters:
doc - document to parse the media out of
extractor - extractor to use

Copyright © 2013-2014. All Rights Reserved.