| Class | Description |
|---|---|
| AddPrecedingLabelsFilter | Adds the labels of the preceding block to the current block, optionally adding a prefix. |
| ArticleMetadataFilter | |
| BlockProximityFusion | Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit. |
| ContentFusion | |
| DocumentTitleMatchClassifier | Marks `TextBlock`s that contain parts of the HTML `<TITLE>` tag, using heuristics that are quite specific to the news domain. |
| ExpandTitleToContentFilter | Marks as content all `TextBlock`s between the headline and the part that has already been marked as content, provided they are labeled `DefaultLabels.MIGHT_BE_CONTENT`. |
| KeepLargestBlockFilter | Keeps only the largest `TextBlock` (by number of words). |
| LabelFusion | Fuses adjacent blocks if their labels are equal. |
| LargeBlockSameTagLevelToContentFilter | Marks as content all blocks that (1) are on the same tag level as the very likely main content (usually the level of the largest block) and (2) have a significant number of words (currently at least 100). |
| ListAtEndFilter | Marks nested list-item blocks after the end of the main content. |
| SimpleBlockFusionProcessor | Merges two subsequent blocks if their text densities are equal. |
| TrailingHeadlineToBoilerplateFilter | Marks trailing headlines (`TextBlock`s labeled `DefaultLabels.HEADING`) as boilerplate. |
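The three fusion entries in the table (`LabelFusion`, `SimpleBlockFusionProcessor` and `BlockProximityFusion`) share one pattern: walk the block list and merge a block into its predecessor whenever a pairwise condition holds. The sketch below illustrates that pattern. `SimpleBlock`, `FusionSketch` and their members are hypothetical names, not boilerpipe's `TextBlock` API, and the merge rule (concatenate text, add word counts, union labels, extend the block-offset range, keep the first block's density) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical, simplified stand-in for a text block (NOT boilerpipe's TextBlock).
record SimpleBlock(String text, int numWords, double textDensity,
                   Set<String> labels, int offsetStart, int offsetEnd) {

    // Assumed merge rule: concatenate text, add word counts, union labels,
    // and span from this block's start offset to the next block's end offset.
    SimpleBlock mergeNext(SimpleBlock next) {
        Set<String> merged = new HashSet<>(labels);
        merged.addAll(next.labels());
        // Assumption: keep this block's density; exact only when both densities are equal.
        return new SimpleBlock(text + "\n" + next.text(), numWords + next.numWords(),
                textDensity, merged, offsetStart, next.offsetEnd());
    }
}

final class FusionSketch {
    // Merges each block into the previously emitted block while shouldFuse(prev, curr) holds.
    static List<SimpleBlock> fuse(List<SimpleBlock> blocks,
                                  BiPredicate<SimpleBlock, SimpleBlock> shouldFuse) {
        List<SimpleBlock> out = new ArrayList<>();
        for (SimpleBlock curr : blocks) {
            int last = out.size() - 1;
            if (last >= 0 && shouldFuse.test(out.get(last), curr)) {
                out.set(last, out.get(last).mergeNext(curr));
            } else {
                out.add(curr);
            }
        }
        return out;
    }

    // LabelFusion-style condition: the label sets are equal.
    static final BiPredicate<SimpleBlock, SimpleBlock> EQUAL_LABELS =
            (a, b) -> a.labels().equals(b.labels());

    // SimpleBlockFusionProcessor-style condition: the text densities are equal.
    static final BiPredicate<SimpleBlock, SimpleBlock> EQUAL_DENSITY =
            (a, b) -> a.textDensity() == b.textDensity();

    // BlockProximityFusion-style condition: the number of blocks between the two
    // (derived from block offsets) does not exceed maxDistance.
    static BiPredicate<SimpleBlock, SimpleBlock> withinDistance(int maxDistance) {
        return (a, b) -> b.offsetStart() - a.offsetEnd() - 1 <= maxDistance;
    }
}
```

In this sketch the predicate is the only thing that distinguishes the three variants; for instance, `withinDistance(0)` fuses only blocks with no other block between them.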
The BoilerpipeFilters in this package are pure heuristics.
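Because the filters are pure heuristics, their core rules are short. The following sketch approximates four of the labeling filters from the table on a hypothetical `LBlock` type with a content flag, a tag level and a set of string labels. None of these names are boilerpipe's API, the label strings merely stand in for the `DefaultLabels` constants, and details such as tie-breaking and using the largest block's tag level as the reference level are assumptions. Each method mutates the blocks in place and returns whether it changed anything.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, mutable stand-in for a text block (NOT boilerpipe's TextBlock).
final class LBlock {
    final int numWords;
    final int tagLevel;
    boolean content;
    final Set<String> labels = new HashSet<>();

    LBlock(int numWords, int tagLevel, boolean content, String... labels) {
        this.numWords = numWords;
        this.tagLevel = tagLevel;
        this.content = content;
        this.labels.addAll(List.of(labels));
    }
}

final class HeuristicSketches {
    // Hypothetical label names standing in for DefaultLabels constants.
    static final String HEADING = "HEADING";
    static final String TITLE = "TITLE";
    static final String MIGHT_BE_CONTENT = "MIGHT_BE_CONTENT";

    // KeepLargestBlockFilter-style: among content blocks, keep only the one with the
    // most words (the first one wins on ties) and demote all others.
    static boolean keepLargestBlock(List<LBlock> blocks) {
        LBlock largest = null;
        for (LBlock b : blocks) {
            if (b.content && (largest == null || b.numWords > largest.numWords)) largest = b;
        }
        boolean changed = false;
        for (LBlock b : blocks) {
            if (b != largest && b.content) { b.content = false; changed = true; }
        }
        return changed;
    }

    // TrailingHeadlineToBoilerplateFilter-style: walking backwards, demote content blocks
    // labeled HEADING until the first content block without that label is reached.
    static boolean trailingHeadlinesToBoilerplate(List<LBlock> blocks) {
        boolean changed = false;
        for (int i = blocks.size() - 1; i >= 0; i--) {
            LBlock b = blocks.get(i);
            if (!b.content) continue;              // skip trailing non-content blocks
            if (!b.labels.contains(HEADING)) break;
            b.content = false;
            changed = true;
        }
        return changed;
    }

    // LargeBlockSameTagLevelToContentFilter-style: take the tag level of the largest block
    // and mark every block at that level with at least minWords words as content.
    static boolean largeBlocksAtMainLevel(List<LBlock> blocks, int minWords) {
        LBlock largest = null;
        for (LBlock b : blocks) {
            if (largest == null || b.numWords > largest.numWords) largest = b;
        }
        if (largest == null) return false;
        boolean changed = false;
        for (LBlock b : blocks) {
            if (!b.content && b.tagLevel == largest.tagLevel && b.numWords >= minWords) {
                b.content = true;
                changed = true;
            }
        }
        return changed;
    }

    // ExpandTitleToContentFilter-style: between the title block and the first content
    // block, mark blocks labeled MIGHT_BE_CONTENT as content.
    static boolean expandTitleToContent(List<LBlock> blocks) {
        int title = -1, firstContent = -1;
        for (int i = 0; i < blocks.size(); i++) {
            LBlock b = blocks.get(i);
            if (title == -1 && b.labels.contains(TITLE)) title = i;
            if (firstContent == -1 && b.content) firstContent = i;
        }
        if (title == -1 || firstContent <= title) return false;
        boolean changed = false;
        for (int i = title; i < firstContent; i++) {
            LBlock b = blocks.get(i);
            if (!b.content && b.labels.contains(MIGHT_BE_CONTENT)) { b.content = true; changed = true; }
        }
        return changed;
    }
}
```

Calling such methods one after another on the same list, as a filter pipeline would, lets each rule refine the labels left by the previous one; the order in which boilerpipe's own extractors chain the real filters is not specified here.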
Copyright © 2013-2014. All Rights Reserved.