- getAlt() - Method in class de.l3s.boilerpipe.document.Image
-
- getArea() - Method in class de.l3s.boilerpipe.document.Image
-
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
- getCharset() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- getContainedTextElements() - Method in class de.l3s.boilerpipe.document.TextBlock
-
Returns the containedTextElements BitSet, or null.
- getContent() - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getData() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
-
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
- getDefaultInstance() - Static method in class de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
-
- getDocumentHandler() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the document handler.
- getDocumentSource() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the document source.
- getElement(short) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element code.
- getElement(String) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element name.
- getElement(String, HTMLElements.Element) - Static method in class org.cyberneko.html.HTMLElements
-
Returns the element information for the specified element name.
- getElement(QName) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns an HTML element.
- getElementDepth(HTMLElements.Element) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the depth of the open tag associated with the specified
element name or -1 if no matching element is found.
- getEmbedUrl() - Method in class de.l3s.boilerpipe.document.Video
-
- getExtraStyleSheet() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the extra stylesheet definition that will be inserted in the HEAD
element.
- getFeatureDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the default state for a feature.
- getHeight() - Method in class de.l3s.boilerpipe.document.Image
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.ArticleSentencesExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.CanolaExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.DefaultExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.LargestContentExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.extractors.NumWordsRulesExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.filters.debug.PrintDebugFilter
-
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.DensityRulesClassifier
-
Returns the singleton instance for DensityRulesClassifier.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
-
Returns the singleton instance for NumWordsRulesClassifier.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
-
Returns the singleton instance for TerminatingBlocksFinder.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
-
Returns the singleton instance for ExpandTitleToContentFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
-
Returns the singleton instance for SimpleBlockFusionProcessor.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
-
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter
-
Returns the singleton instance for BoilerplateBlockFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter
-
Returns the singleton instance for SplitParagraphBlocksFilter.
- getInstance() - Static method in class de.l3s.boilerpipe.sax.ImageExtractor
-
- getInstance() - Static method in class de.l3s.boilerpipe.sax.MediaExtractor
-
- getLabels() - Method in class de.l3s.boilerpipe.document.TextBlock
-
Returns the labels associated with this TextBlock, or null if no such labels
exist.
- getLinkDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getNamesValue(String) - Static method in class org.cyberneko.html.HTMLTagBalancer
-
Converts HTML names string value to constant value.
- getNumWords() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getNumWords() - Method in class de.l3s.boilerpipe.document.TextDocumentStatistics
-
Returns the overall number of words in all blocks.
- getNumWordsInAnchorText() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getOffsetBlocksEnd() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getOffsetBlocksStart() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getOriginUrl() - Method in class de.l3s.boilerpipe.document.Video
-
- getParentDepth(HTMLElements.Element[], short) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the depth of the open tag associated with the specified
element parent names or -1 if no matching element is found.
- getPostHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted after any highlighted HTML
block.
- getPotentialTitles() - Method in class de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
-
- getPreHighlight() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted before any highlighted HTML
block.
- getPropertyDefault(String) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns the default state for a property.
- getRecognizedFeatures() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns recognized features.
- getRecognizedProperties() - Method in class org.cyberneko.html.HTMLTagBalancer
-
Returns recognized properties.
- getSrc() - Method in class de.l3s.boilerpipe.document.Image
-
Gets the src attribute from the image tag in the HTML source.
- getTagLevel() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getTagWhitelist() - Method in class de.l3s.boilerpipe.sax.HTMLHighlighter
-
- getText(String) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code given as a String.
- getText(InputSource) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given InputSource.
- getText(Reader) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given Reader.
- getText(TextDocument) - Method in interface de.l3s.boilerpipe.BoilerpipeExtractor
-
- getText() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getText(boolean, boolean) - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getText(String) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code given as a String.
- getText(InputSource) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given InputSource.
- getText(URL) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given URL.
- getText(Reader) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given Reader.
- getText(TextDocument) - Method in class de.l3s.boilerpipe.extractors.ExtractorBase
-
- getTextBlocks() - Method in class de.l3s.boilerpipe.document.TextDocument
-
- getTextDensity() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- getTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeInput
-
- getTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
-
- getTextDocument(BoilerpipeHTMLParser) - Method in class de.l3s.boilerpipe.sax.BoilerpipeSAXInput
-
- getTitle() - Method in class de.l3s.boilerpipe.document.TextDocument
-
Returns the "main" title for this document, or null if no
such title has been set.
- getTitle() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
- getWidth() - Method in class de.l3s.boilerpipe.document.Image
-
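The `getArea()` entry for `de.l3s.boilerpipe.document.Image` above documents a sentinel contract: the result is width * height, or -1 when the two dimensions were not both specified or could not be parsed. The sketch below illustrates only that contract. `ImageAreaSketch` and `parseDimension` are hypothetical names and are not part of boilerpipe. It assumes the dimensions arrive as raw attribute strings and that non-integer or negative values count as unparseable. The real class may differ on both points.

```java
// Hypothetical stand-in, NOT boilerpipe's Image class. It mirrors only the
// documented contract of getArea(): width * height, or -1 as a sentinel.
final class ImageAreaSketch {
    private final String width;   // raw attribute value, may be null
    private final String height;  // raw attribute value, may be null

    ImageAreaSketch(String width, String height) {
        this.width = width;
        this.height = height;
    }

    /** Returns width * height, or -1 if either dimension is missing or unparseable. */
    int getArea() {
        int w = parseDimension(width);
        int h = parseDimension(height);
        if (w < 0 || h < 0) {
            return -1;
        }
        return w * h;
    }

    // Assumption: only plain non-negative integers are accepted, so a value
    // such as "100px" is treated as unparseable.
    private static int parseDimension(String raw) {
        if (raw == null) {
            return -1;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value < 0 ? -1 : value;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(new ImageAreaSketch("640", "480").getArea());
        System.out.println(new ImageAreaSketch("640", null).getArea());
    }
}
```

Callers of a method with this kind of contract would typically test for the -1 sentinel before comparing areas, since -1 sorts below every valid area.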
- TA_ANCHOR_TEXT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
- TA_BLOCK_LEVEL - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
- TA_BODY - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
- TA_FONT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Special TagAction for the <FONT> tag, which keeps track of the
absolute and relative font size.
- TA_IGNORABLE_ELEMENT - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as "ignorable", i.e.
- TA_INLINE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
- TA_INLINE_NO_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
- TA_INLINE_WHITESPACE - Static variable in class de.l3s.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates whitespace but no new block.
- TABLE - Static variable in class org.cyberneko.html.HTMLElements
-
- TagAction - Interface in de.l3s.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs
during HTML parsing.
- TagActionMap - Class in de.l3s.boilerpipe.sax
-
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
- TagActionMap() - Constructor for class de.l3s.boilerpipe.sax.TagActionMap
-
- tagBalancingListener - Variable in class org.cyberneko.html.HTMLTagBalancer
-
- TBODY - Static variable in class org.cyberneko.html.HTMLElements
-
- TD - Static variable in class org.cyberneko.html.HTMLElements
-
- TerminatingBlocksFinder - Class in de.l3s.boilerpipe.filters.english
-
- TerminatingBlocksFinder() - Constructor for class de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder
-
- TEXTAREA - Static variable in class org.cyberneko.html.HTMLElements
-
- TextBlock - Class in de.l3s.boilerpipe.document
-
Describes a block of text.
- TextBlock(String) - Constructor for class de.l3s.boilerpipe.document.TextBlock
-
- TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class de.l3s.boilerpipe.document.TextBlock
-
- TextBlockCondition - Interface in de.l3s.boilerpipe.conditions
-
Evaluates whether a given TextBlock meets a certain condition.
- textDecl(String, String, Augmentations) - Method in class org.cyberneko.html.HTMLTagBalancer
-
Text declaration.
- TextDocument - Class in de.l3s.boilerpipe.document
-
A text document, consisting of one or more TextBlocks.
- TextDocument(List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
-
- TextDocument(String, List<TextBlock>) - Constructor for class de.l3s.boilerpipe.document.TextDocument
-
- TextDocumentStatistics - Class in de.l3s.boilerpipe.document
-
Provides shallow statistics on a given TextDocument.
- TextDocumentStatistics(TextDocument, boolean) - Constructor for class de.l3s.boilerpipe.document.TextDocumentStatistics
-
- TFOOT - Static variable in class org.cyberneko.html.HTMLElements
-
- TH - Static variable in class org.cyberneko.html.HTMLElements
-
- THEAD - Static variable in class org.cyberneko.html.HTMLElements
-
- TITLE - Static variable in class de.l3s.boilerpipe.labels.DefaultLabels
-
- TITLE - Static variable in class org.cyberneko.html.HTMLElements
-
- toInputSource() - Method in class de.l3s.boilerpipe.sax.HTMLDocument
-
- toInputSource() - Method in interface de.l3s.boilerpipe.sax.InputSourceable
-
- tokenize(CharSequence) - Static method in class de.l3s.boilerpipe.util.UnicodeTokenizer
-
Tokenizes the text and returns an array of tokens.
- top - Variable in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
The top of the stack.
- toString() - Method in class de.l3s.boilerpipe.document.Image
-
- toString() - Method in class de.l3s.boilerpipe.document.TextBlock
-
- toString() - Method in class de.l3s.boilerpipe.document.Video
-
- toString() - Method in class de.l3s.boilerpipe.labels.LabelAction
-
- toString() - Method in class org.cyberneko.html.HTMLElements.Element
-
Provides a simple representation to make debugging easier
- toString() - Method in class org.cyberneko.html.HTMLTagBalancer.Info
-
Simple representation to make debugging easier
- toString() - Method in class org.cyberneko.html.HTMLTagBalancer.InfoStack
-
Simple representation to make debugging easier
- toTextDocument() - Method in interface de.l3s.boilerpipe.BoilerpipeDocumentSource
-
- toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
- toTextDocument() - Method in class de.l3s.boilerpipe.sax.BoilerpipeHTMLParser
-
- TR - Static variable in class org.cyberneko.html.HTMLElements
-
- TrailingHeadlineToBoilerplateFilter - Class in de.l3s.boilerpipe.filters.heuristics
-
- TrailingHeadlineToBoilerplateFilter() - Constructor for class de.l3s.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
-
- TT - Static variable in class org.cyberneko.html.HTMLElements
-
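The `TA_INLINE_NO_WHITESPACE`, `TA_INLINE_WHITESPACE` and `TA_BLOCK_LEVEL` entries above differ in whether a tag generates whitespace and whether it starts a new block. The sketch below models that distinction in isolation. `TagActionSketch` and its `Kind` enum are hypothetical names, not the `TagAction` interface or `CommonTagActions` constants. The index states only that a block-level tag generates whitespace, so treating it as also starting a new block is an assumption made here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT boilerpipe's TagAction API. It shows only how the
// documented "whitespace" and "new block" properties could affect the output.
final class TagActionSketch {

    enum Kind {
        INLINE_NO_WHITESPACE(false, false), // neither whitespace nor a new block
        INLINE_WHITESPACE(true, false),     // whitespace, but no new block
        BLOCK_LEVEL(true, true);            // new block is an assumption of this sketch

        final boolean addsWhitespace;
        final boolean startsNewBlock;

        Kind(boolean addsWhitespace, boolean startsNewBlock) {
            this.addsWhitespace = addsWhitespace;
            this.startsNewBlock = startsNewBlock;
        }
    }

    private final List<String> blocks = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    /** Appends character data to the block being built. */
    void text(String s) {
        current.append(s);
    }

    /** Applies the effect of encountering a tag of the given kind. */
    void tag(Kind kind) {
        if (kind.startsNewBlock) {
            flush();                 // a block boundary also separates the text
        } else if (kind.addsWhitespace) {
            current.append(' ');
        }
    }

    /** Closes the last block and returns all non-empty blocks collected so far. */
    List<String> finish() {
        flush();
        return blocks;
    }

    private void flush() {
        String trimmed = current.toString().trim();
        if (!trimmed.isEmpty()) {
            blocks.add(trimmed);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        TagActionSketch sketch = new TagActionSketch();
        sketch.text("Hello");
        sketch.tag(Kind.INLINE_WHITESPACE);
        sketch.text("world");
        sketch.tag(Kind.INLINE_NO_WHITESPACE);
        sketch.text("!");
        sketch.tag(Kind.BLOCK_LEVEL);
        sketch.text("Next");
        System.out.println(sketch.finish());
    }
}
```

In this model the whitespace-generating inline tag keeps "Hello" and "world" apart, the no-whitespace inline tag lets "!" attach directly to "world", and the block-level tag ends the first block.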