
A

A - Static variable in class org.cyberneko.html.HTMLElements
 
ABBR - Static variable in class org.cyberneko.html.HTMLElements
 
ACRONYM - Static variable in class org.cyberneko.html.HTMLElements
 
addElement(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.ElementList
Adds an element to the list, resizing if necessary.
addLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock
Adds an arbitrary String label to this TextBlock.
addLabelAction(LabelAction) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addLabels(Set<String>) - Method in class de.l3s.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabels(String...) - Method in class de.l3s.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabelsTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.LabelAction
 
AddPrecedingLabelsFilter - Class in de.l3s.boilerpipe.filters.heuristics
Adds the labels of the preceding block to the current block, optionally adding a prefix.
AddPrecedingLabelsFilter(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
Creates a new AddPrecedingLabelsFilter instance.
ADDRESS - Static variable in class org.cyberneko.html.HTMLElements
 
addTagAction(String, TagAction) - Method in class de.l3s.boilerpipe.sax.TagActionMap
Adds a particular TagAction for a given tag.
addTextBlock(TextBlock) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.ConditionalLabelAction
 
addTo(TextBlock) - Method in class de.l3s.boilerpipe.labels.LabelAction
 
addWhitespaceIfNecessary() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
APPLET - Static variable in class org.cyberneko.html.HTMLElements
 
AREA - Static variable in class org.cyberneko.html.HTMLElements
 
ARTICLE_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors
Works very well for most types of Article-like HTML.
ARTICLE_METADATA - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
ArticleExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which is tuned towards news articles.
ArticleExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleExtractor
 
ArticleMetadataFilter - Class in de.l3s.boilerpipe.filters.heuristics
 
ArticleSentencesExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which is tuned towards extracting sentences from news articles.
ArticleSentencesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 
attributes - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element attributes.
AUGMENTATIONS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Include infoset augmentations.
avgNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics
Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
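
The avgNumWords() description above can be sketched as follows. This is only an illustration of the stated formula (overall number of words divided by the number of blocks); the class BlockStatsSketch, its int[] parameter, and the zero-blocks fallback are assumptions made for this example and are not taken from the boilerpipe source.

```java
// Hypothetical stand-in for illustration; not the boilerpipe implementation.
final class BlockStatsSketch {

    /** Average words per block: overall word count divided by the number of blocks. */
    static float avgNumWords(int[] wordsPerBlock) {
        if (wordsPerBlock.length == 0) {
            return 0f; // assumption: avoid dividing by zero for a document without blocks
        }
        int total = 0;
        for (int n : wordsPerBlock) {
            total += n; // overall number of words in all blocks
        }
        return total / (float) wordsPerBlock.length;
    }

    public static void main(String[] args) {
        // Three blocks with 12, 3 and 45 words: 60 words / 3 blocks
        System.out.println(avgNumWords(new int[] {12, 3, 45})); // prints 20.0
    }
}
```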

B

B - Static variable in class org.cyberneko.html.HTMLElements
 
BASE - Static variable in class org.cyberneko.html.HTMLElements
 
BASEFONT - Static variable in class org.cyberneko.html.HTMLElements
 
BDO - Static variable in class org.cyberneko.html.HTMLElements
 
BGSOUND - Static variable in class org.cyberneko.html.HTMLElements
 
BIG - Static variable in class org.cyberneko.html.HTMLElements
 
BLINK - Static variable in class org.cyberneko.html.HTMLElements
 
BLOCK - Static variable in class org.cyberneko.html.HTMLElements.Element
Block element.
BlockProximityFusion - Class in de.l3s.boilerpipe.filters.heuristics
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
BlockProximityFusion(int, boolean, boolean) - Constructor for class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
Creates a new BlockProximityFusion instance.
BLOCKQUOTE - Static variable in class org.cyberneko.html.HTMLElements
 
BODY - Static variable in class org.cyberneko.html.HTMLElements
 
BoilerpipeDocumentSource - Interface in de.l3s.boilerpipe
Something that can be represented as a TextDocument.
BoilerpipeExtractor - Interface in de.l3s.boilerpipe
Describes a complete filter pipeline.
BoilerpipeFilter - Interface in de.l3s.boilerpipe
A generic BoilerpipeFilter.
BoilerpipeHTMLContentHandler - Class in de.l3s.boilerpipe.sax
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLContentHandler() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
BoilerpipeHTMLParser - Class in de.l3s.boilerpipe.sax
A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser() - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
Constructs a BoilerpipeHTMLParser using a default HTML content handler.
BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
BoilerpipeHTMLParser(boolean) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
 
BoilerpipeInput - Interface in de.l3s.boilerpipe
A source that returns TextDocuments.
BoilerpipeProcessingException - Exception in de.l3s.boilerpipe
Exception for signaling failure in the processing pipeline.
BoilerpipeProcessingException() - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String, Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(Throwable) - Constructor for exception de.l3s.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeSAXInput - Class in de.l3s.boilerpipe.sax
Parses an InputSource using SAX and returns a TextDocument.
BoilerpipeSAXInput(InputSource) - Constructor for class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
BoilerplateBlockFilter - Class in de.l3s.boilerpipe.filters.simple
Removes TextBlocks which have explicitly been marked as "not content".
BoilerplateBlockFilter(String) - Constructor for class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
bounds - Variable in class org.cyberneko.html.HTMLElements.Element
The bounding element code.
BR - Static variable in class org.cyberneko.html.HTMLElements
 
BUTTON - Static variable in class org.cyberneko.html.HTMLElements
 

C

callEndElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Call document handler end element.
callStartElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Call document handler start element.
CANOLA_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors
Trained on krdwrd Canola (different definition of "boilerplate").
CanolaExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor trained on krdwrd Canola.
CanolaExtractor() - Constructor for class de.l3s.boilerpipe.extractors.CanolaExtractor
 
CAPTION - Static variable in class org.cyberneko.html.HTMLElements
 
CENTER - Static variable in class org.cyberneko.html.HTMLElements
 
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
 
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
changesTagLevel() - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
 
changesTagLevel() - Method in interface de.l3s.boilerpipe.sax.TagAction
 
characters(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
characters(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Characters.
CITE - Static variable in class org.cyberneko.html.HTMLElements
 
CLASSIFIER - Static variable in class de.l3s.boilerpipe.extractors.CanolaExtractor
The actual classifier, exposed.
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
classify(TextBlock, TextBlock, TextBlock) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
clone() - Method in class de.l3s.boilerpipe.document.TextBlock
 
clone() - Method in class de.l3s.boilerpipe.document.TextDocument
 
closes - Variable in class org.cyberneko.html.HTMLElements.Element
List of elements this element can close.
closes(short) - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element can close the specified Element.
CODE - Static variable in class org.cyberneko.html.HTMLElements
 
code - Variable in class org.cyberneko.html.HTMLElements.Element
The element code.
COL - Static variable in class org.cyberneko.html.HTMLElements
 
COLGROUP - Static variable in class org.cyberneko.html.HTMLElements
 
COMMENT - Static variable in class org.cyberneko.html.HTMLElements
 
comment(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Comment.
CommonExtractors - Class in de.l3s.boilerpipe.extractors
Provides quick access to common BoilerpipeExtractors.
CommonTagActions - Class in de.l3s.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions.BlockTagLabelAction - Class in de.l3s.boilerpipe.sax
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.BlockTagLabelAction(LabelAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
CommonTagActions.Chained - Class in de.l3s.boilerpipe.sax
 
CommonTagActions.Chained(TagAction, TagAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.Chained
 
CommonTagActions.InlineTagLabelAction - Class in de.l3s.boilerpipe.sax
CommonTagActions for inline elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.InlineTagLabelAction(LabelAction) - Constructor for class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
compareTo(Image) - Method in class de.l3s.boilerpipe.document.Image
 
ConditionalLabelAction - Class in de.l3s.boilerpipe.labels
Adds labels to a TextBlock if the given criteria are met.
ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class de.l3s.boilerpipe.labels.ConditionalLabelAction
 
CONTAINER - Static variable in class org.cyberneko.html.HTMLElements.Element
Container element.
ContentFusion - Class in de.l3s.boilerpipe.filters.heuristics
 
ContentFusion() - Constructor for class de.l3s.boilerpipe.filters.heuristics.ContentFusion
Creates a new ContentFusion instance.
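
The code, closes and closes(short) entries listed under HTMLElements.Element in this section suggest a simple membership test over element codes. A minimal sketch of that idea, assuming the closes list is an array of short codes that may be null; ElementSketch is a hypothetical class written for this example, and its body is not the NekoHTML source.

```java
// Hypothetical sketch; field and method names mirror the index entries,
// but the body is an assumption rather than the NekoHTML implementation.
final class ElementSketch {
    final short code;     // the element code
    final short[] closes; // codes of the elements this element can close (may be null)

    ElementSketch(short code, short[] closes) {
        this.code = code;
        this.closes = closes;
    }

    /** Returns true if this element can close the element with the given code. */
    boolean closes(short tag) {
        if (closes != null) {
            for (short c : closes) {
                if (c == tag) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A linear scan is adequate here because such lists hold only a handful of codes per element.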

D

data - Variable in class org.cyberneko.html.HTMLElements.ElementList
The data in the list.
data - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
The stack data.
DD - Static variable in class org.cyberneko.html.HTMLElements
 
de.l3s.boilerpipe - package de.l3s.boilerpipe
The Boilerpipe top-level package.
de.l3s.boilerpipe.conditions - package de.l3s.boilerpipe.conditions
 
de.l3s.boilerpipe.document - package de.l3s.boilerpipe.document
The classes in this package represent the simple Boilerpipe document model.
de.l3s.boilerpipe.estimators - package de.l3s.boilerpipe.estimators
 
de.l3s.boilerpipe.extractors - package de.l3s.boilerpipe.extractors
This package contains some standard extractors (i.e., completely piped BoilerpipeFilters).
de.l3s.boilerpipe.filters.debug - package de.l3s.boilerpipe.filters.debug
 
de.l3s.boilerpipe.filters.english - package de.l3s.boilerpipe.filters.english
The BoilerpipeFilters in this package have only been tested on English text.
de.l3s.boilerpipe.filters.heuristics - package de.l3s.boilerpipe.filters.heuristics
The BoilerpipeFilters in this package are pure heuristics.
de.l3s.boilerpipe.filters.simple - package de.l3s.boilerpipe.filters.simple
The BoilerpipeFilters in this package are straightforward and probably not really specific to English.
de.l3s.boilerpipe.labels - package de.l3s.boilerpipe.labels
 
de.l3s.boilerpipe.sax - package de.l3s.boilerpipe.sax
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
de.l3s.boilerpipe.util - package de.l3s.boilerpipe.util
Some helper classes.
debugString() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns detailed debugging information about the contained TextBlocks.
DEFAULT_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors
Usually worse than ArticleExtractor, but simpler/no heuristics.
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
DEFAULT_INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
DefaultExtractor - Class in de.l3s.boilerpipe.extractors
A quite generic full-text extractor.
DefaultExtractor() - Constructor for class de.l3s.boilerpipe.extractors.DefaultExtractor
 
DefaultLabels - Class in de.l3s.boilerpipe.labels
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
DefaultTagActionMap - Class in de.l3s.boilerpipe.sax
Default TagActions.
DefaultTagActionMap() - Constructor for class de.l3s.boilerpipe.sax.DefaultTagActionMap
 
DEL - Static variable in class org.cyberneko.html.HTMLElements
 
DensityRulesClassifier - Class in de.l3s.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
DensityRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
DFN - Static variable in class org.cyberneko.html.HTMLElements
 
DIR - Static variable in class org.cyberneko.html.HTMLElements
 
DIV - Static variable in class org.cyberneko.html.HTMLElements
 
DL - Static variable in class org.cyberneko.html.HTMLElements
 
doctypeDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Doctype declaration.
DOCUMENT_FRAGMENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only.
DOCUMENT_FRAGMENT_DEPRECATED - Static variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only (deprecated).
DocumentTitleMatchClassifier - Class in de.l3s.boilerpipe.filters.heuristics
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
DocumentTitleMatchClassifier(String) - Constructor for class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
DT - Static variable in class org.cyberneko.html.HTMLElements
 

E

element - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element.
ELEMENTS - Static variable in class org.cyberneko.html.HTMLElements
Element information as a contiguous list.
ELEMENTS_ARRAY - Static variable in class org.cyberneko.html.HTMLElements
Element information organized by first letter.
EM - Static variable in class org.cyberneko.html.HTMLElements
 
EMBED - Static variable in class org.cyberneko.html.HTMLElements
 
EMPTY - Static variable in class org.cyberneko.html.HTMLElements.Element
Empty element.
EMPTY_END - Static variable in class de.l3s.boilerpipe.document.TextBlock
 
EMPTY_START - Static variable in class de.l3s.boilerpipe.document.TextBlock
 
emptyAttributes() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns a set of empty attributes.
emptyElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Empty element.
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in interface de.l3s.boilerpipe.sax.TagAction
 
endCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End CDATA section.
endDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endDocument(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End document.
endElement(String, String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endElement(QName, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End element.
endGeneralEntity(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End entity.
endPrefixMapping(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endPrefixMapping(String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
End prefix mapping.
equals(Object) - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if the objects are equal.
ERROR_REPORTER - Static variable in class org.cyberneko.html.HTMLTagBalancer
Error reporter.
ExpandTitleToContentFilter - Class in de.l3s.boilerpipe.filters.heuristics
Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
ExpandTitleToContentFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
ExtractorBase - Class in de.l3s.boilerpipe.extractors
The base class of Extractors.
ExtractorBase() - Constructor for class de.l3s.boilerpipe.extractors.ExtractorBase
 

F

fAugmentations - Variable in class org.cyberneko.html.HTMLTagBalancer
Include infoset augmentations.
fDocumentFragment - Variable in class org.cyberneko.html.HTMLTagBalancer
Document fragment balancing only.
fDocumentHandler - Variable in class org.cyberneko.html.HTMLTagBalancer
The document handler.
fDocumentSource - Variable in class org.cyberneko.html.HTMLTagBalancer
The document source.
fElementStack - Variable in class org.cyberneko.html.HTMLTagBalancer
The element stack.
fErrorReporter - Variable in class org.cyberneko.html.HTMLTagBalancer
Error reporter.
fetch(URL) - Static method in class de.l3s.boilerpipe.sax.HTMLFetcher
Fetches the document at the given URL, using URLConnection.
FIELDSET - Static variable in class org.cyberneko.html.HTMLElements
 
fIgnoreOutsideContent - Variable in class org.cyberneko.html.HTMLTagBalancer
Ignore outside content.
fInlineStack - Variable in class org.cyberneko.html.HTMLTagBalancer
The inline stack.
flags - Variable in class org.cyberneko.html.HTMLElements.Element
Informational flags.
flushBlock() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
fNamesAttrs - Variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML attribute names.
fNamesElems - Variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML element names.
fNamespaces - Variable in class org.cyberneko.html.HTMLTagBalancer
Namespaces.
FONT - Static variable in class org.cyberneko.html.HTMLElements
 
fOpenedForm - Variable in class org.cyberneko.html.HTMLTagBalancer
True if a form is in the stack (allows the opening of nested forms to be discarded).
FORM - Static variable in class org.cyberneko.html.HTMLElements
 
FRAGMENT_CONTEXT_STACK - Static variable in class org.cyberneko.html.HTMLTagBalancer
EXPERIMENTAL: may change in next release
Name of the property holding the stack of elements in which context a document fragment should be parsed.
FRAME - Static variable in class org.cyberneko.html.HTMLElements
 
FRAMESET - Static variable in class org.cyberneko.html.HTMLElements
 
fReportErrors - Variable in class org.cyberneko.html.HTMLTagBalancer
Report errors.
fSeenAnything - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen anything.
fSeenBodyElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen <body> element.
fSeenDoctype - Variable in class org.cyberneko.html.HTMLTagBalancer
True if the doctype declaration has been seen.
fSeenHeadElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen <head> element.
fSeenRootElement - Variable in class org.cyberneko.html.HTMLTagBalancer
True if root element has been seen.
fSeenRootElementEnd - Variable in class org.cyberneko.html.HTMLTagBalancer
True if seen the end of the document element.

G

getAlt() - Method in class de.l3s.boilerpipe.document.Image
 
getArea() - Method in class de.l3s.boilerpipe.document.Image
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
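
The getArea() contract above (width * height, or -1 if either value is missing or cannot be parsed) can be illustrated with a small sketch. ImageAreaSketch and its String parameters are hypothetical and assume the dimensions arrive as raw attribute strings; the actual Image class may store and parse them differently.

```java
// Hypothetical illustration of the documented contract; not the boilerpipe source.
final class ImageAreaSketch {

    /** Returns width * height, or -1 if either value is missing or not an integer. */
    static int area(String width, String height) {
        if (width == null || height == null) {
            return -1; // width/height were not both specified
        }
        try {
            return Integer.parseInt(width.trim()) * Integer.parseInt(height.trim());
        } catch (NumberFormatException e) {
            return -1; // at least one value could not be parsed
        }
    }

    public static void main(String[] args) {
        System.out.println(area("640", "480")); // prints 307200
        System.out.println(area("640", null));  // prints -1
        System.out.println(area("wide", "480")); // prints -1
    }
}
```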
getCharset() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
 
getContainedTextElements() - Method in class de.l3s.boilerpipe.document.TextBlock
Returns the containedTextElements BitSet, or null.
getContent() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextDocument's content.
getData() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
 
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
getDocumentHandler() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the document handler.
getDocumentSource() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the document source.
getElement(short) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element code.
getElement(String) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element name.
getElement(String, HTMLElements.Element) - Static method in class org.cyberneko.html.HTMLElements
Returns the element information for the specified element name.
getElement(QName) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns an HTML element.
getElementDepth(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the depth of the open tag associated with the specified element name or -1 if no matching element is found.
getEmbedUrl() - Method in class de.l3s.boilerpipe.document.Video
 
getExtraStyleSheet() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Returns the extra stylesheet definition that will be inserted in the HEAD element.
getFeatureDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the default state for a feature.
getHeight() - Method in class de.l3s.boilerpipe.document.Image
 
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleExtractor
Returns the singleton instance for ArticleExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
Returns the singleton instance for ArticleSentencesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.CanolaExtractor
Returns the singleton instance for CanolaExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.DefaultExtractor
Returns the singleton instance for DefaultExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
Returns the singleton instance for LargestContentExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
Returns the singleton instance for NumWordsRulesExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.debug.PrintDebugFilter
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
Returns the singleton instance for DensityRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
Returns the singleton instance for NumWordsRulesClassifier.
getInstance() - Static method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
Returns the singleton instance for TerminatingBlocksFinder.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
Returns the singleton instance for ExpandTitleToContentFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
Returns the singleton instance for SimpleBlockFusionProcessor.
getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
Returns the singleton instance for BoilerplateBlockFilter.
getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
Returns the singleton instance for SplitParagraphBlocksFilter.
getInstance() - Static method in class de.l3s.boilerpipe.sax.ImageExtractor
Returns the singleton instance of ImageExtractor.
getInstance() - Static method in class de.l3s.boilerpipe.sax.MediaExtractor
 
getLabels() - Method in class de.l3s.boilerpipe.document.TextBlock
Returns the labels associated with this TextBlock, or null if no such labels exist.
getLinkDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getNamesValue(String) - Static method in class org.cyberneko.html.HTMLTagBalancer
Converts HTML names string value to constant value.
getNumWords() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics
Returns the overall number of words in all blocks.
getNumWordsInAnchorText() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getOffsetBlocksEnd() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getOffsetBlocksStart() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getOriginUrl() - Method in class de.l3s.boilerpipe.document.Video
 
getParentDepth(HTMLElements.Element[], short) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the depth of the open tag associated with the specified element parent names or -1 if no matching element is found.
getPostHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted after any highlighted HTML block.
getPotentialTitles() - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
getPreHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted before any highlighted HTML block.
getPropertyDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
Returns the default state for a property.
getRecognizedFeatures() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns recognized features.
getRecognizedProperties() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns recognized properties.
getSrc() - Method in class de.l3s.boilerpipe.document.Image
Gets the src attribute from the image tag in the HTML source.
getTagLevel() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getTagWhitelist() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
 
getText(String) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code given as a String.
getText(InputSource) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given InputSource.
getText(Reader) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
Extracts text from the given TextDocument object.
getText() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getText(boolean, boolean) - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextDocument's content, non-content, or both.
getText(String) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code given as a String.
getText(InputSource) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given InputSource.
getText(URL) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given URL.
getText(Reader) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given Reader.
getText(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
Extracts text from the given TextDocument object.
getTextBlocks() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the TextBlocks of this document.
getTextDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
 
getTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeInput
Returns (somehow) a TextDocument.
getTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using a default HTML parser.
getTextDocument(BoilerpipeHTMLParser) - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using the given HTML parser.
getTitle() - Method in class de.l3s.boilerpipe.document.TextDocument
Returns the "main" title for this document, or null if no such title has been set.
getTitle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
getWidth() - Method in class de.l3s.boilerpipe.document.Image
 

H

H1 - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
H1 - Static variable in class org.cyberneko.html.HTMLElements
 
H2 - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
H2 - Static variable in class org.cyberneko.html.HTMLElements
 
H3 - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
H3 - Static variable in class org.cyberneko.html.HTMLElements
 
H4 - Static variable in class org.cyberneko.html.HTMLElements
 
H5 - Static variable in class org.cyberneko.html.HTMLElements
 
H6 - Static variable in class org.cyberneko.html.HTMLElements
 
hashCode() - Method in class org.cyberneko.html.HTMLElements.Element
Returns a hash code for this object.
hasLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock
Checks whether this TextBlock has the given label.
HEAD - Static variable in class org.cyberneko.html.HTMLElements
 
HEADING - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
HR - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
HR - Static variable in class org.cyberneko.html.HTMLElements
 
HTML - Static variable in class org.cyberneko.html.HTMLElements
 
HTMLDocument - Class in de.l3s.boilerpipe.sax
HTMLDocument(byte[], Charset) - Constructor for class de.l3s.boilerpipe.sax.HTMLDocument
 
HTMLDocument(String) - Constructor for class de.l3s.boilerpipe.sax.HTMLDocument
 
HTMLElements - Class in org.cyberneko.html
Collection of HTML element information.
HTMLElements() - Constructor for class org.cyberneko.html.HTMLElements
 
HTMLElements.Element - Class in org.cyberneko.html
Element information.
HTMLElements.Element(short, String, int, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
HTMLElements.Element(short, String, int, short, short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
HTMLElements.Element(short, String, int, short[], short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
HTMLElements.Element(short, String, int, short[], short, short[]) - Constructor for class org.cyberneko.html.HTMLElements.Element
Constructs an element object.
HTMLElements.ElementList - Class in org.cyberneko.html
Unsynchronized list of elements.
HTMLElements.ElementList() - Constructor for class org.cyberneko.html.HTMLElements.ElementList
 
HTMLFetcher - Class in de.l3s.boilerpipe.sax
A very simple HTTP/HTML fetcher, really just for demo purposes.
HTMLHighlighter - Class in de.l3s.boilerpipe.sax
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
HTMLTagBalancer - Class in org.cyberneko.html
 
HTMLTagBalancer() - Constructor for class org.cyberneko.html.HTMLTagBalancer
 
HTMLTagBalancer.Info - Class in org.cyberneko.html
Element info for each start element.
HTMLTagBalancer.Info(HTMLElements.Element, QName) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
Creates an element information object.
HTMLTagBalancer.Info(HTMLElements.Element, QName, XMLAttributes) - Constructor for class org.cyberneko.html.HTMLTagBalancer.Info
Creates an element information object.
HTMLTagBalancer.InfoStack - Class in org.cyberneko.html
Unsynchronized stack of element information.
HTMLTagBalancer.InfoStack() - Constructor for class org.cyberneko.html.HTMLTagBalancer.InfoStack
 

I

I - Static variable in class org.cyberneko.html.HTMLElements
 
IFRAME - Static variable in class org.cyberneko.html.HTMLElements
 
ignorableWhitespace(char[], int, int) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ignorableWhitespace(XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Ignorable whitespace.
IGNORE_OUTSIDE_CONTENT - Static variable in class org.cyberneko.html.HTMLTagBalancer
Ignore outside content.
IgnoreBlocksAfterContentFilter - Class in de.l3s.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
IgnoreBlocksAfterContentFromEndFilter - Class in de.l3s.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
ILAYER - Static variable in class org.cyberneko.html.HTMLElements
 
Image - Class in de.l3s.boilerpipe.document
Represents an Image resource that is contained in the document.
Image(String, String, String, String) - Constructor for class de.l3s.boilerpipe.document.Image
 
ImageExtractor - Class in de.l3s.boilerpipe.sax
Extracts the images that are enclosed by extracted content.
IMG - Static variable in class org.cyberneko.html.HTMLElements
 
INDICATES_END_OF_TEXT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
INLINE - Static variable in class org.cyberneko.html.HTMLElements.Element
Inline element.
INPUT - Static variable in class org.cyberneko.html.HTMLElements
 
InputSourceable - Interface in de.l3s.boilerpipe.sax
An InputSourceable can return an arbitrary number of new InputSources for a given document.
INS - Static variable in class org.cyberneko.html.HTMLElements
 
INSTANCE - Static variable in class de.l3s.boilerpipe.estimators.SimpleEstimator
Returns the singleton instance of SimpleEstimator
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.CanolaExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.DefaultExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.LargestContentExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.debug.PrintDebugFilter
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ContentFusion
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.LabelFusion
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.ListAtEndFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.InvertedFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
INSTANCE - Static variable in class de.l3s.boilerpipe.sax.DefaultTagActionMap
 
INSTANCE - Static variable in class de.l3s.boilerpipe.sax.ImageExtractor
 
INSTANCE - Static variable in class de.l3s.boilerpipe.sax.MediaExtractor
 
INSTANCE_200 - Static variable in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS - Static variable in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_KEEP_TITLE - Static variable in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE_PRE - Static variable in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
INSTANCE_TEXT - Static variable in class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
 
InvertedFilter - Class in de.l3s.boilerpipe.filters.simple
Inverts the "isContent" flag for all TextBlocks.
isBlock() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is a block element.
isContainer() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is a container element.
isContent() - Method in class de.l3s.boilerpipe.document.TextBlock
 
isEmpty() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is an empty element.
ISINDEX - Static variable in class org.cyberneko.html.HTMLElements
 
isInline() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is an inline element.
isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class de.l3s.boilerpipe.estimators.SimpleEstimator
Given the statistics of the document before and after applying the BoilerpipeExtractor, can we regard the extraction quality (too) low? Works well with DefaultExtractor, ArticleExtractor and others.
isOutputHighlightOnly() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
If true, only HTML enclosed within highlighted content will be returned
isParent(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLElements.Element
Indicates whether the provided element is an accepted parent of the current element.
isSpecial() - Method in class org.cyberneko.html.HTMLElements.Element
Returns true if this element is special -- if its content should be parsed ignoring markup.

K

KBD - Static variable in class org.cyberneko.html.HTMLElements
 
KEEP_EVERYTHING_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors
Dummy Extractor; should return the input text.
KeepEverythingExtractor - Class in de.l3s.boilerpipe.extractors
Marks everything as content.
KeepEverythingWithMinKWordsExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which extracts the largest text component of a page.
KeepEverythingWithMinKWordsExtractor(int) - Constructor for class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
KeepLargestBlockFilter - Class in de.l3s.boilerpipe.filters.heuristics
Keeps the largest TextBlock only (by the number of words).
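The behaviour described for KeepLargestBlockFilter can be pictured with a minimal sketch. It does not use boilerpipe's classes: WordBlock and KeepLargestSketch are hypothetical stand-ins, and the tie-breaking rule (the first of several equally large blocks wins) is an assumption, not something this index states.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a text block; not boilerpipe's TextBlock.
final class WordBlock {
    final int numWords;
    boolean isContent;

    WordBlock(int numWords, boolean isContent) {
        this.numWords = numWords;
        this.isContent = isContent;
    }
}

final class KeepLargestSketch {
    // Keeps only the content block with the most words and marks every
    // other content block as non-content. Returns true if anything changed.
    static boolean process(List<WordBlock> blocks) {
        WordBlock largest = null;
        for (WordBlock b : blocks) {
            // Strict '>' means the first of several equally large blocks wins (assumption).
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        boolean changed = false;
        for (WordBlock b : blocks) {
            if (b != largest && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<WordBlock> blocks = Arrays.asList(
                new WordBlock(12, true), new WordBlock(80, true), new WordBlock(40, false));
        process(blocks);
        for (WordBlock b : blocks) {
            System.out.println(b.numWords + " words, content=" + b.isContent);
        }
    }
}
```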
KeepLargestBlockFilter(boolean, int) - Constructor for class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
KeepLargestFulltextBlockFilter - Class in de.l3s.boilerpipe.filters.english
Keeps the largest TextBlock only (by the number of words).
KeepLargestFulltextBlockFilter() - Constructor for class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
KEYGEN - Static variable in class org.cyberneko.html.HTMLElements
 

L

LABEL - Static variable in class org.cyberneko.html.HTMLElements
 
LabelAction - Class in de.l3s.boilerpipe.labels
Helps add labels to TextBlocks.
LabelAction(String...) - Constructor for class de.l3s.boilerpipe.labels.LabelAction
 
LabelFusion - Class in de.l3s.boilerpipe.filters.heuristics
Fuses adjacent blocks if their labels are equal.
labels - Variable in class de.l3s.boilerpipe.labels.LabelAction
 
LabelToBoilerplateFilter - Class in de.l3s.boilerpipe.filters.simple
Marks all blocks that contain a given label as "boilerplate".
LabelToBoilerplateFilter(String...) - Constructor for class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
LabelToContentFilter - Class in de.l3s.boilerpipe.filters.simple
Marks all blocks that contain a given label as "content".
LabelToContentFilter(String...) - Constructor for class de.l3s.boilerpipe.filters.simple.LabelToContentFilter
 
LargeBlockSameTagLevelToContentFilter - Class in de.l3s.boilerpipe.filters.heuristics
Marks all blocks as content that are on the same tag level as the very likely main content (usually the level of the largest block) and have a significant number of words (currently at least 100).
LARGEST_CONTENT_EXTRACTOR - Static variable in class de.l3s.boilerpipe.extractors.CommonExtractors
Like DefaultExtractor, but keeps the largest text block only.
LargestContentExtractor - Class in de.l3s.boilerpipe.extractors
A full-text extractor which extracts the largest text component of a page.
LAYER - Static variable in class org.cyberneko.html.HTMLElements
 
LEGEND - Static variable in class org.cyberneko.html.HTMLElements
 
LI - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
LI - Static variable in class org.cyberneko.html.HTMLElements
 
LINK - Static variable in class org.cyberneko.html.HTMLElements
 
ListAtEndFilter - Class in de.l3s.boilerpipe.filters.heuristics
Marks nested list-item blocks after the end of the main content.
LISTING - Static variable in class org.cyberneko.html.HTMLElements
 

M

MAP - Static variable in class org.cyberneko.html.HTMLElements
 
MarkEverythingBoilerplateFilter - Class in de.l3s.boilerpipe.filters.simple
Marks all blocks as boilerplate.
MarkEverythingContentFilter - Class in de.l3s.boilerpipe.filters.simple
Marks all blocks as content.
MARKUP_PREFIX - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
MarkupTagAction - Class in de.l3s.boilerpipe.sax
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
MarkupTagAction(boolean) - Constructor for class de.l3s.boilerpipe.sax.MarkupTagAction
 
MARQUEE - Static variable in class org.cyberneko.html.HTMLElements
 
MAX_DISTANCE_1 - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
Media - Class in de.l3s.boilerpipe.document
Media class
Media() - Constructor for class de.l3s.boilerpipe.document.Media
 
MediaExtractor - Class in de.l3s.boilerpipe.sax
Extracts YouTube and Vimeo videos that are enclosed by extracted content.
MediaExtractor() - Constructor for class de.l3s.boilerpipe.sax.MediaExtractor
 
meetsCondition(TextBlock) - Method in interface de.l3s.boilerpipe.conditions.TextBlockCondition
Returns true iff the given TextBlock tb meets the defined condition.
MENU - Static variable in class org.cyberneko.html.HTMLElements
 
mergeNext(TextBlock) - Method in class de.l3s.boilerpipe.document.TextBlock
 
META - Static variable in class org.cyberneko.html.HTMLElements
 
MIGHT_BE_CONTENT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
MinClauseWordsFilter - Class in de.l3s.boilerpipe.filters.simple
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
MinClauseWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinClauseWordsFilter(int, boolean) - Constructor for class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinFulltextWordsFilter - Class in de.l3s.boilerpipe.filters.english
Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
MinFulltextWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
MinWordsFilter - Class in de.l3s.boilerpipe.filters.simple
Keeps only those content blocks which contain at least k words.
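As an illustration of the rule described for MinWordsFilter, here is a minimal self-contained sketch. MinWordsBlock and MinWordsSketch are hypothetical names rather than boilerpipe classes, and counting words by splitting on whitespace is an assumption made only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a text block; not boilerpipe's TextBlock.
final class MinWordsBlock {
    final String text;
    boolean isContent;

    MinWordsBlock(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    // Counts whitespace-separated tokens; blank text counts as 0 words.
    int numWords() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

final class MinWordsSketch {
    // Marks content blocks with fewer than minWords words as non-content.
    // Blocks that are already non-content are left untouched.
    // Returns true if at least one block changed.
    static boolean process(List<MinWordsBlock> blocks, int minWords) {
        boolean changed = false;
        for (MinWordsBlock b : blocks) {
            if (b.isContent && b.numWords() < minWords) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<MinWordsBlock> blocks = Arrays.asList(
                new MinWordsBlock("Home | About", true),
                new MinWordsBlock("This paragraph has clearly more than five words", true));
        process(blocks, 5);
        for (MinWordsBlock b : blocks) {
            System.out.println(b.isContent + ": " + b.text);
        }
    }
}
```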
MinWordsFilter(int) - Constructor for class de.l3s.boilerpipe.filters.simple.MinWordsFilter
 
modifyName(String, short) - Static method in class org.cyberneko.html.HTMLTagBalancer
Modifies the given name based on the specified mode.
MULTICOL - Static variable in class org.cyberneko.html.HTMLElements
 

N

name - Variable in class org.cyberneko.html.HTMLElements.Element
The element name.
NAMES_ATTRS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML attribute names: { "upper", "lower", "default" }.
NAMES_ELEMS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Modify HTML element names: { "upper", "lower", "default" }.
NAMES_LOWERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Lowercase HTML names.
NAMES_MATCH - Static variable in class org.cyberneko.html.HTMLTagBalancer
Match HTML element names.
NAMES_NO_CHANGE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Don't modify HTML names.
NAMES_UPPERCASE - Static variable in class org.cyberneko.html.HTMLTagBalancer
Uppercase HTML names.
NAMESPACES - Static variable in class org.cyberneko.html.HTMLTagBalancer
Namespaces.
newExtractingInstance() - Static method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enclosed markup.
newHighlightingInstance() - Static method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set-up to return the full HTML text, with the extracted text portion highlighted.
NEXTID - Static variable in class org.cyberneko.html.HTMLElements
 
NO_SUCH_ELEMENT - Static variable in class org.cyberneko.html.HTMLElements
No such element.
NOBR - Static variable in class org.cyberneko.html.HTMLElements
 
NOEMBED - Static variable in class org.cyberneko.html.HTMLElements
 
NOFRAMES - Static variable in class org.cyberneko.html.HTMLElements
 
NOLAYER - Static variable in class org.cyberneko.html.HTMLElements
 
NOSCRIPT - Static variable in class org.cyberneko.html.HTMLElements
 
NumWordsRulesClassifier - Class in de.l3s.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
NumWordsRulesClassifier() - Constructor for class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
NumWordsRulesExtractor - Class in de.l3s.boilerpipe.extractors
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
NumWordsRulesExtractor() - Constructor for class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 

O

OBJECT - Static variable in class org.cyberneko.html.HTMLElements
 
OL - Static variable in class org.cyberneko.html.HTMLElements
 
OPTGROUP - Static variable in class org.cyberneko.html.HTMLElements
 
OPTION - Static variable in class org.cyberneko.html.HTMLElements
 
org.cyberneko.html - package org.cyberneko.html
 

P

P - Static variable in class org.cyberneko.html.HTMLElements
 
PARAM - Static variable in class org.cyberneko.html.HTMLElements
 
parent - Variable in class org.cyberneko.html.HTMLElements.Element
Parent elements.
parentCodes - Variable in class org.cyberneko.html.HTMLElements.Element
Parent elements.
peek() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Peeks at the top of the stack.
PLAINTEXT - Static variable in class org.cyberneko.html.HTMLElements
 
pop() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Pops the top item off of the stack.
PRE - Static variable in class org.cyberneko.html.HTMLElements
 
PrintDebugFilter - Class in de.l3s.boilerpipe.filters.debug
Prints debug information about the current state of the TextDocument.
PrintDebugFilter(PrintWriter) - Constructor for class de.l3s.boilerpipe.filters.debug.PrintDebugFilter
Creates a new instance of PrintDebugFilter.
process(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeFilter
Processes the given document doc.
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.CanolaExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.DefaultExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.debug.PrintDebugFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ContentFusion
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.LabelFusion
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.ListAtEndFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.InvertedFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.LabelToContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.MinWordsFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
process(TextDocument) - Method in class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
 
process(TextDocument, String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
process(TextDocument, String) - Method in class de.l3s.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class de.l3s.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class de.l3s.boilerpipe.sax.ImageExtractor
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
process(TextDocument, String) - Method in class de.l3s.boilerpipe.sax.MediaExtractor
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class de.l3s.boilerpipe.sax.MediaExtractor
Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class de.l3s.boilerpipe.sax.MediaExtractor
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
process(String, BoilerpipeExtractor) - Method in class de.l3s.boilerpipe.sax.MediaExtractor
Parses the media (pictures, videos) out of doc.
processingInstruction(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
processingInstruction(String, XMLString, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Processing instruction.
push(HTMLTagBalancer.Info) - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Pushes element information onto the stack.
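The peek(), pop() and push() entries of HTMLTagBalancer.InfoStack describe an unsynchronized stack. A minimal array-backed sketch of such a stack follows. SketchStack is a hypothetical class, not the NekoHTML implementation, and returning null from an empty stack is an assumption made for this example.

```java
import java.util.Arrays;

// Hypothetical unsynchronized stack; not org.cyberneko.html.HTMLTagBalancer.InfoStack.
// No method is synchronized, so callers must confine an instance to one thread.
final class SketchStack<T> {
    private Object[] data = new Object[4];
    private int top = 0;

    // Pushes an item, doubling the backing array when it is full.
    void push(T item) {
        if (top == data.length) {
            data = Arrays.copyOf(data, top * 2);
        }
        data[top++] = item;
    }

    // Returns the top item without removing it, or null if the stack is empty (assumption).
    @SuppressWarnings("unchecked")
    T peek() {
        return top == 0 ? null : (T) data[top - 1];
    }

    // Removes and returns the top item, or null if the stack is empty (assumption).
    @SuppressWarnings("unchecked")
    T pop() {
        if (top == 0) {
            return null;
        }
        T item = (T) data[--top];
        data[top] = null; // drop the reference so it can be garbage-collected
        return item;
    }

    int size() {
        return top;
    }
}
```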

Q

Q - Static variable in class org.cyberneko.html.HTMLElements
 
qname - Variable in class org.cyberneko.html.HTMLTagBalancer.Info
The element qualified name.

R

RB - Static variable in class org.cyberneko.html.HTMLElements
 
RBC - Static variable in class org.cyberneko.html.HTMLElements
 
recycle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Recycles this instance.
removeLabel(String) - Method in class de.l3s.boilerpipe.document.TextBlock
 
REPORT_ERRORS - Static variable in class org.cyberneko.html.HTMLTagBalancer
Report errors.
reset(XMLComponentManager) - Method in class org.cyberneko.html.HTMLTagBalancer
Resets the component.
RP - Static variable in class org.cyberneko.html.HTMLElements
 
RT - Static variable in class org.cyberneko.html.HTMLElements
 
RTC - Static variable in class org.cyberneko.html.HTMLElements
 
RUBY - Static variable in class org.cyberneko.html.HTMLElements
 

S

S - Static variable in class org.cyberneko.html.HTMLElements
 
SAMP - Static variable in class org.cyberneko.html.HTMLElements
 
SCRIPT - Static variable in class org.cyberneko.html.HTMLElements
 
SELECT - Static variable in class org.cyberneko.html.HTMLElements
 
setContentHandler(BoilerpipeHTMLContentHandler) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
 
setContentHandler(ContentHandler) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
 
setDocumentHandler(XMLDocumentHandler) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets the document handler.
setDocumentLocator(Locator) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
setDocumentSource(XMLDocumentSource) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets the document source.
setExtraStyleSheet(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Sets the extra stylesheet definition that will be inserted in the HEAD element.
setFeature(String, boolean) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets a feature.
setIsContent(boolean) - Method in class de.l3s.boilerpipe.document.TextBlock
 
setOutputHighlightOnly(boolean) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
setPostHighlight(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted after any highlighted HTML block.
setPreHighlight(String) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted prior to any highlighted HTML block.
setProperty(String, Object) - Method in class org.cyberneko.html.HTMLTagBalancer
Sets a property.
setTagAction(String, TagAction) - Method in class de.l3s.boilerpipe.sax.TagActionMap
Sets a particular TagAction for a given tag.
setTagLevel(int) - Method in class de.l3s.boilerpipe.document.TextBlock
 
setTagWhitelist(Map<String, Set<String>>) - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
 
setTitle(String) - Method in class de.l3s.boilerpipe.document.TextDocument
Updates the "main" title for this document.
setTitle(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SimpleBlockFusionProcessor - Class in de.l3s.boilerpipe.filters.heuristics
Merges two subsequent blocks if their text densities are equal.
SimpleBlockFusionProcessor() - Constructor for class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
SimpleEstimator - Class in de.l3s.boilerpipe.estimators
Estimates the "goodness" of a BoilerpipeExtractor on a given document.
size - Variable in class org.cyberneko.html.HTMLElements.ElementList
The size of the list.
skippedEntity(String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SMALL - Static variable in class org.cyberneko.html.HTMLElements
 
SOUND - Static variable in class org.cyberneko.html.HTMLElements
 
SPACER - Static variable in class org.cyberneko.html.HTMLElements
 
SPAN - Static variable in class org.cyberneko.html.HTMLElements
 
SPECIAL - Static variable in class org.cyberneko.html.HTMLElements.Element
Special element.
SplitParagraphBlocksFilter - Class in de.l3s.boilerpipe.filters.simple
Splits TextBlocks at paragraph boundaries.
SplitParagraphBlocksFilter() - Constructor for class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.Chained
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.MarkupTagAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface de.l3s.boilerpipe.sax.TagAction
 
startCDATA(Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start CDATA section.
startDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startDocument(XMLLocator, String, NamespaceContext, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start document.
startDocument(XMLLocator, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start document.
startElement(String, String, String, Attributes) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startElement(QName, XMLAttributes, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start element.
startGeneralEntity(String, XMLResourceIdentifier, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start entity.
startPrefixMapping(String, String) - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startPrefixMapping(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Start prefix mapping.
STRICTLY_NOT_CONTENT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
STRIKE - Static variable in class org.cyberneko.html.HTMLElements
 
STRONG - Static variable in class org.cyberneko.html.HTMLElements
 
STYLE - Static variable in class org.cyberneko.html.HTMLElements
 
SUB - Static variable in class org.cyberneko.html.HTMLElements
 
SUP - Static variable in class org.cyberneko.html.HTMLElements
 
SurroundingToContentFilter - Class in de.l3s.boilerpipe.filters.simple
 
SurroundingToContentFilter(TextBlockCondition) - Constructor for class de.l3s.boilerpipe.filters.simple.SurroundingToContentFilter
 
SYNTHESIZED_ITEM - Static variable in class org.cyberneko.html.HTMLTagBalancer
Synthesized event info item.
synthesizedAugs() - Method in class org.cyberneko.html.HTMLTagBalancer
Returns an augmentations object with a synthesized item added.

T

TA_ANCHOR_TEXT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
TA_BLOCK_LEVEL - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
TA_BODY - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
TA_FONT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
TA_IGNORABLE_ELEMENT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Marks this tag as "ignorable", i.e.
TA_INLINE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Deprecated.
TA_INLINE_NO_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
TA_INLINE_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates whitespace, but no new block.
TABLE - Static variable in class org.cyberneko.html.HTMLElements
 
TagAction - Interface in de.l3s.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
TagActionMap - Class in de.l3s.boilerpipe.sax
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
TagActionMap() - Constructor for class de.l3s.boilerpipe.sax.TagActionMap
 
tagBalancingListener - Variable in class org.cyberneko.html.HTMLTagBalancer
 
TBODY - Static variable in class org.cyberneko.html.HTMLElements
 
TD - Static variable in class org.cyberneko.html.HTMLElements
 
TerminatingBlocksFinder - Class in de.l3s.boilerpipe.filters.english
Finds blocks that potentially indicate the end of an article text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
TerminatingBlocksFinder() - Constructor for class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
 
TEXTAREA - Static variable in class org.cyberneko.html.HTMLElements
 
TextBlock - Class in de.l3s.boilerpipe.document
Describes a block of text.
TextBlock(String) - Constructor for class de.l3s.boilerpipe.document.TextBlock
 
TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class de.l3s.boilerpipe.document.TextBlock
 
TextBlockCondition - Interface in de.l3s.boilerpipe.conditions
Evaluates whether a given TextBlock meets a certain condition.
textDecl(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
Text declaration.
TextDocument - Class in de.l3s.boilerpipe.document
A text document, consisting of one or more TextBlocks.
TextDocument(List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks, and no title.
TextDocument(String, List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks and given title.
TextDocumentStatistics - Class in de.l3s.boilerpipe.document
Provides shallow statistics on a given TextDocument.
TextDocumentStatistics(TextDocument, boolean) - Constructor for class de.l3s.boilerpipe.document.TextDocumentStatistics
Computes statistics on a given TextDocument.
TFOOT - Static variable in class org.cyberneko.html.HTMLElements
 
TH - Static variable in class org.cyberneko.html.HTMLElements
 
THEAD - Static variable in class org.cyberneko.html.HTMLElements
 
TITLE - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
TITLE - Static variable in class org.cyberneko.html.HTMLElements
 
toInputSource() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
 
toInputSource() - Method in interface de.l3s.boilerpipe.sax.InputSourceable
 
tokenize(CharSequence) - Static method in class de.l3s.boilerpipe.util.UnicodeTokenizer
Tokenizes the text and returns an array of tokens.
top - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
The top of the stack.
toString() - Method in class de.l3s.boilerpipe.document.Image
 
toString() - Method in class de.l3s.boilerpipe.document.TextBlock
 
toString() - Method in class de.l3s.boilerpipe.document.Video
 
toString() - Method in class de.l3s.boilerpipe.labels.LabelAction
 
toString() - Method in class org.cyberneko.html.HTMLElements.Element
Provides a simple representation to make debugging easier.
toString() - Method in class org.cyberneko.html.HTMLTagBalancer.Info
Simple representation to make debugging easier.
toString() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
Simple representation to make debugging easier.
toTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeDocumentSource
 
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Returns a TextDocument containing the extracted TextBlocks.
toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
Returns a TextDocument containing the extracted TextBlocks.
TR - Static variable in class org.cyberneko.html.HTMLElements
 
TrailingHeadlineToBoilerplateFilter - Class in de.l3s.boilerpipe.filters.heuristics
Marks trailing headlines (TextBlocks that have the label DefaultLabels.HEADING) as boilerplate.
TrailingHeadlineToBoilerplateFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
TT - Static variable in class org.cyberneko.html.HTMLElements
 

U

U - Static variable in class org.cyberneko.html.HTMLElements
 
UL - Static variable in class org.cyberneko.html.HTMLElements
 
UnicodeTokenizer - Class in de.l3s.boilerpipe.util
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
UnicodeTokenizer() - Constructor for class de.l3s.boilerpipe.util.UnicodeTokenizer
 
UNKNOWN - Static variable in class org.cyberneko.html.HTMLElements
 

V

VAR - Static variable in class org.cyberneko.html.HTMLElements
 
VERY_LIKELY_CONTENT - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
 
Video - Class in de.l3s.boilerpipe.document
Represents a video resource that is contained in the document.
Video(String, String) - Constructor for class de.l3s.boilerpipe.document.Video
 
VimeoVideo - Class in de.l3s.boilerpipe.document
Represents a Vimeo video resource that is contained in the document.
VimeoVideo(String, String) - Constructor for class de.l3s.boilerpipe.document.VimeoVideo
 

W

WBR - Static variable in class org.cyberneko.html.HTMLElements
 

X

XML - Static variable in class org.cyberneko.html.HTMLElements
 
xmlDecl(String, String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
XML declaration.
XMP - Static variable in class org.cyberneko.html.HTMLElements
 

Y

YoutubeVideo - Class in de.l3s.boilerpipe.document
Represents a YouTube video resource that is contained in the document.
YoutubeVideo(String, String) - Constructor for class de.l3s.boilerpipe.document.YoutubeVideo
 

Copyright © 2013-2014. All Rights Reserved.