public final class KeepLargestBlockFilter extends Object implements BoilerpipeFilter
Keeps the largest TextBlock only (by the number of words). If several blocks have
the same number of words, the first of them is chosen.
All discarded blocks are marked "not content" and flagged as
DefaultLabels.MIGHT_BE_CONTENT.
Note that, by default, only TextBlocks marked as "content" are taken into consideration.

| Modifier and Type | Field and Description |
|---|---|
| static KeepLargestBlockFilter | INSTANCE |
| static KeepLargestBlockFilter | INSTANCE_EXPAND_TO_SAME_TAGLEVEL |
| static KeepLargestBlockFilter | INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS |
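
The selection rule in the class description (only "content" blocks are candidates, the block with the most words wins, ties go to the first block, every other block is demoted and labeled) can be sketched as follows. This is a minimal, self-contained illustration, not boilerpipe's code: `Block` is a hypothetical stand-in for `TextBlock`, the string `"MIGHT_BE_CONTENT"` stands in for `DefaultLabels.MIGHT_BE_CONTENT`, and counting words as whitespace-separated tokens is an assumption. The sketch returns `true` only when a content block was actually demoted, which is one reasonable reading of the "true if changes have been made" contract.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the "keep the largest block" rule. "Block" is a
// hypothetical stand-in for boilerpipe's TextBlock, not the library class.
final class LargestBlockSketch {

    // Hypothetical simplified block: text, word count, content flag, labels.
    static final class Block {
        final String text;
        final int numWords;
        boolean isContent;
        final List<String> labels = new ArrayList<>();

        Block(String text, boolean isContent) {
            this.text = text;
            this.isContent = isContent;
            // Assumption: words are whitespace-separated tokens.
            String trimmed = text.trim();
            this.numWords = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Keeps only the largest content block; returns true if any content block was demoted.
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // Only blocks already marked as content are candidates.
            // Strict ">" means the FIRST block wins when word counts tie.
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        if (largest == null) {
            return false; // no content blocks at all: nothing to do
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b == largest) {
                continue;
            }
            if (b.isContent) {
                changed = true; // a content block is being discarded
            }
            b.isContent = false;
            // Stand-in for DefaultLabels.MIGHT_BE_CONTENT.
            b.labels.add("MIGHT_BE_CONTENT");
        }
        return changed;
    }
}
```

For example, given a navigation bar, two 10-word paragraphs and a non-content footer, only the first 10-word paragraph keeps its content flag; the second loses the tie.
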

| Constructor and Description |
|---|
| KeepLargestBlockFilter(boolean expandToSameLevelText, int minWords) |

public static final KeepLargestBlockFilter INSTANCE
public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL
public static final KeepLargestBlockFilter INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS
public KeepLargestBlockFilter(boolean expandToSameLevelText,
int minWords)
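
The constructor parameters are not described on this page. Judging only from their names, `expandToSameLevelText` presumably widens the selection to other text at the same tag level as the largest block, and `minWords` presumably sets a minimum word count for those extra blocks. The sketch below encodes that reading under one further assumption: walking outward from the largest block, the expansion stops at the first block with a shallower tag level, and deeper blocks are skipped without being kept. `Block`, `expand` and `visit` are hypothetical names, and the real filter may behave differently.

```java
import java.util.List;

// Hypothetical sketch of what expandToSameLevelText / minWords might do.
// The behavior is inferred from the parameter names, not from the library docs.
final class ExpandSameLevelSketch {

    // Hypothetical block: nesting depth (tag level), word count, content flag.
    static final class Block {
        final int tagLevel;
        final int numWords;
        boolean isContent;

        Block(int tagLevel, int numWords, boolean isContent) {
            this.tagLevel = tagLevel;
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    // Starting at the already-chosen largest block, walk outward in both
    // directions and re-mark same-level blocks with >= minWords words as content.
    static void expand(List<Block> blocks, int largestIndex, int minWords) {
        int level = blocks.get(largestIndex).tagLevel;
        for (int i = largestIndex - 1; i >= 0; i--) {        // walk backwards
            if (!visit(blocks.get(i), level, minWords)) {
                break;
            }
        }
        for (int i = largestIndex + 1; i < blocks.size(); i++) { // walk forwards
            if (!visit(blocks.get(i), level, minWords)) {
                break;
            }
        }
    }

    // Returns false when the walk should stop at this block.
    private static boolean visit(Block b, int level, int minWords) {
        if (b.tagLevel < level) {
            return false; // shallower block: assumed to leave the enclosing element
        }
        if (b.tagLevel == level && b.numWords >= minWords) {
            b.isContent = true; // same-level text that is long enough is kept
        }
        return true; // deeper or too-short blocks are skipped, not kept
    }
}
```
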
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Processes the given document doc.
Specified by: process in interface BoilerpipeFilter
Parameters: doc - The TextDocument that is to be processed.
Returns: true if changes have been made to the TextDocument.
Throws: BoilerpipeProcessingException

Copyright © 2013-2014. All Rights Reserved.
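
The boolean returned by `process` lets a caller tell whether a filter modified the document, for example when running several filters in sequence and asking whether any of them changed anything. The sketch below illustrates only that contract; none of its types are boilerpipe classes. `DocFilter`, `ProcessingException` (standing in for `BoilerpipeProcessingException`), `runAll` and the two toy filters are hypothetical, and the "document" is simplified to a list of strings.

```java
import java.util.List;

// Hypothetical sketch of how a caller can use process()'s boolean result.
// None of these types are boilerpipe classes.
final class FilterContractSketch {

    // Hypothetical checked exception standing in for BoilerpipeProcessingException.
    static final class ProcessingException extends Exception {
        ProcessingException(String message) {
            super(message);
        }
    }

    // Hypothetical filter interface mirroring "boolean process(doc) throws ...".
    interface DocFilter {
        boolean process(List<String> doc) throws ProcessingException;
    }

    // Runs every filter in order; returns true if at least one reported a change.
    static boolean runAll(List<String> doc, List<DocFilter> filters) throws ProcessingException {
        boolean changed = false;
        for (DocFilter filter : filters) {
            // "|=" rather than "||" so every filter runs, even after a change.
            changed |= filter.process(doc);
        }
        return changed;
    }

    // Toy filter: drops blank entries; removeIf reports whether anything was removed.
    static final DocFilter DROP_BLANK = doc -> doc.removeIf(s -> s.trim().isEmpty());

    // Toy filter that never modifies the document.
    static final DocFilter NO_OP = doc -> false;
}
```

Running the same pipeline twice shows the contract: the first pass reports a change, and the second pass, with nothing left to remove, reports none.
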