| Package | Description |
|---|---|
| de.l3s.boilerpipe.extractors | This package contains some standard extractors (i.e., completely piped BoilerpipeFilters). |
| de.l3s.boilerpipe.sax | Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments. |

| Modifier and Type | Class and Description |
|---|---|
| class | `ArticleExtractor`: A full-text extractor which is tuned towards news articles. |
| class | `ArticleSentencesExtractor`: A full-text extractor which is tuned towards extracting sentences from news articles. |
| class | `CanolaExtractor` |
| class | `DefaultExtractor`: A quite generic full-text extractor. |
| class | `ExtractorBase`: The base class of Extractors. |
| class | `KeepEverythingExtractor`: Marks everything as content. |
| class | `KeepEverythingWithMinKWordsExtractor`: A full-text extractor which extracts the largest text component of a page. |
| class | `LargestContentExtractor`: A full-text extractor which extracts the largest text component of a page. |
| class | `NumWordsRulesExtractor`: A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block). |
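
The `NumWordsRulesExtractor` entry above describes a classification driven only by the word counts of the current, previous and next block. The following is a minimal, self-contained sketch of that general idea, not Boilerpipe's implementation: the class name `WordCountRulesSketch`, both thresholds and the rule itself are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordCountRulesSketch {

    // Hypothetical thresholds for illustration only; not Boilerpipe's values.
    static final int MIN_WORDS_ALONE = 15;
    static final int MIN_WORDS_WITH_NEIGHBOR = 5;

    /** Counts whitespace-separated tokens in a block of text. */
    static int countWords(String block) {
        String trimmed = block.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Labels each block as content (true) or boilerplate (false) using only
     * the word counts of the current, the previous and the next block.
     */
    static List<Boolean> classify(List<String> blocks) {
        List<Boolean> isContent = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            int curr = countWords(blocks.get(i));
            // Missing neighbors at either end count as zero words.
            int prev = i > 0 ? countWords(blocks.get(i - 1)) : 0;
            int next = i + 1 < blocks.size() ? countWords(blocks.get(i + 1)) : 0;

            // A long block is content by itself.
            boolean longEnough = curr >= MIN_WORDS_ALONE;
            // A medium block is content only next to a long block.
            boolean supported = curr >= MIN_WORDS_WITH_NEIGHBOR
                    && (prev >= MIN_WORDS_ALONE || next >= MIN_WORDS_ALONE);
            isContent.add(longEnough || supported);
        }
        return isContent;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList(
                "Home News Sport Weather",
                "Storm closes coastal roads on Tuesday",
                "Heavy rain and strong winds forced officials to close several "
                        + "coastal roads on Tuesday morning, police said.",
                "Share this article");
        System.out.println(classify(blocks)); // prints [false, true, true, false]
    }
}
```

In this sketch the short headline survives only because a long paragraph follows it, while the navigation and sharing lines are dropped. That is the kind of neighbor-dependent decision a rule based on three word counts can express.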

| Modifier and Type | Method and Description |
|---|---|
| `List<Media>` | `MediaExtractor.process(String doc, BoilerpipeExtractor extractor)`: Parses the media (pictures, videos) out of `doc`. |
| `List<Media>` | `MediaExtractor.process(URL url, BoilerpipeExtractor extractor)`: Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor. |
| `List<Image>` | `ImageExtractor.process(URL url, BoilerpipeExtractor extractor)`: Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor. |
| `String` | `HTMLHighlighter.process(URL url, BoilerpipeExtractor extractor)`: Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor. |

Copyright © 2013-2014. All Rights Reserved.