Package it.unimi.dsi.parser.callback
Class TextExtractor
java.lang.Object
    it.unimi.dsi.parser.callback.DefaultCallback
        it.unimi.dsi.parser.callback.TextExtractor
All Implemented Interfaces:
Callback
public class TextExtractor extends DefaultCallback
Field Summary
Fields
Modifier and Type    Field    Description
MutableString        text     The text resulting from the parsing process.
MutableString        title    The title resulting from the parsing process.
Fields inherited from interface it.unimi.dsi.parser.callback.Callback
EMPTY_CALLBACK_ARRAY
Constructor Summary
Constructors
Constructor        Description
TextExtractor()
-
Method Summary
Modifier and Type, Method, Description
boolean characters(char[] characters, int offset, int length, boolean flowBroken)
    Receive notification of character data inside an element.
void configure(BulletParser parser)
    Configure the parser to parse text.
boolean endElement(Element element)
    Receive notification of the end of an element.
void startDocument()
    Receive notification of the beginning of the document.
boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
    Receive notification of the start of an element.
Methods inherited from class it.unimi.dsi.parser.callback.DefaultCallback
cdata, endDocument, getInstance
Field Detail
-
text
public final MutableString text
The text resulting from the parsing process.
-
title
public final MutableString title
The title resulting from the parsing process.
Method Detail
-
configure
public void configure(BulletParser parser)
Configure the parser to parse text.
Specified by:
configure in interface Callback
Overrides:
configure in class DefaultCallback
-
startDocument
public void startDocument()
Description copied from interface: Callback
Receive notification of the beginning of the document.
The callback must use this method to reset its internal state so that it can be reused. It must be safe to invoke this method several times.
Specified by:
startDocument in interface Callback
Overrides:
startDocument in class DefaultCallback
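The reset requirement above can be illustrated with a minimal, self-contained sketch. ResettableCollector and its methods are hypothetical stand-ins written for this example; they are not part of it.unimi.dsi.parser and do not show how TextExtractor is actually implemented.

```java
// Hypothetical stand-in for a reusable callback; not the library's class.
final class ResettableCollector {
    // Buffer that is reused across documents instead of being reallocated.
    private final StringBuilder text = new StringBuilder();

    /** Clears all per-document state; calling it repeatedly is harmless. */
    void startDocument() {
        text.setLength(0);
    }

    /** Accumulates text for the current document. */
    void append(String s) {
        text.append(s);
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        ResettableCollector c = new ResettableCollector();
        c.startDocument();
        c.append("first document");
        c.startDocument();
        c.startDocument(); // a second call in a row must be safe
        c.append("second document");
        System.out.println(c.text());
    }
}
```

Because the state is cleared in startDocument() rather than in a constructor, one instance can be handed to a parser for many documents in turn.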
-
characters
public boolean characters(char[] characters, int offset, int length, boolean flowBroken)
Description copied from interface: Callback
Receive notification of character data inside an element.
You must not write into text, as it could be passed around to many callbacks.
flowBroken will be true iff the flow was broken before text. This feature makes it possible to quickly extract the text in a document without looking at the elements.
Specified by:
characters in interface Callback
Overrides:
characters in class DefaultCallback
Parameters:
characters - an array containing the character data.
offset - the start position in the array.
length - the number of characters to read from the array.
flowBroken - whether the flow is broken at the start of text.
Returns:
true to keep the parser parsing, false to stop it.
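As a rough illustration of the contract described above, the following sketch copies the requested slice into its own buffer without modifying the source array, and uses flowBroken to decide whether to insert a separator. FlowAwareCollector is a hypothetical stand-in, and rendering a broken flow as a single space is an assumption made for this example, not documented behavior of TextExtractor.

```java
// Hypothetical stand-in that mirrors the shape of
// characters(char[], int, int, boolean); not the library's class.
final class FlowAwareCollector {
    private final StringBuilder text = new StringBuilder();

    boolean characters(char[] characters, int offset, int length, boolean flowBroken) {
        // Assumption: a broken flow is rendered as one separating space.
        if (flowBroken && text.length() > 0 && text.charAt(text.length() - 1) != ' ') {
            text.append(' ');
        }
        // Copy the slice; the source array is only read, never written.
        text.append(characters, offset, length);
        return true; // keep parsing
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        char[] buffer = "HelloWorld".toCharArray();
        FlowAwareCollector c = new FlowAwareCollector();
        c.characters(buffer, 0, 5, false);
        c.characters(buffer, 5, 5, true);
        System.out.println(c.text()); // prints "Hello World"
    }
}
```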
-
endElement
public boolean endElement(Element element)
Description copied from interface: Callback
Receive notification of the end of an element.
Warning: unless specific decorators are used, in general a callback will just receive notifications for elements whose closing tag appears explicitly in the document.
This method will never be called for elements without closing tags, even if such a tag is found.
Specified by:
endElement in interface Callback
Overrides:
endElement in class DefaultCallback
Parameters:
element - the element whose closing tag was found.
Returns:
true to keep the parser parsing, false to stop it.
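The boolean return value acts as a stop signal to whatever is driving the callback. The sketch below shows that convention with a hypothetical driver loop. StopOnFalseDemo and deliver are illustrative names, and the loop is not the parser's real dispatch code.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical driver showing the "return false to stop" convention.
final class StopOnFalseDemo {
    /** Delivers events in order until the handler returns false; returns how many were delivered. */
    static int deliver(List<String> events, Predicate<String> handler) {
        int delivered = 0;
        for (String event : events) {
            delivered++;
            if (!handler.test(event)) {
                break; // false means "stop parsing"
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<String> closingTags = List.of("title", "head", "p", "body");
        // Stop as soon as the closing "head" tag is seen.
        int delivered = deliver(closingTags, tag -> !tag.equals("head"));
        System.out.println(delivered); // prints 2
    }
}
```

A callback that only needs the document head could use this to avoid processing the rest of the input.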
-
startElement
public boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
Description copied from interface: Callback
Receive notification of the start of an element.
For simple elements, this is the only notification that the callback will ever receive.
Specified by:
startElement in interface Callback
Overrides:
startElement in class DefaultCallback
Parameters:
element - the element whose opening tag was found.
attrMapUnused - a map from Attributes to MutableStrings.
Returns:
true to keep the parser parsing, false to stop it.
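One plausible way for a callback to route character data into separate title and text buffers is to toggle a flag on the start and end notifications. The sketch below is an assumption about how such routing could work, using the hypothetical class TitleTracker with plain strings as element names; the source does not describe how TextExtractor separates title from text. It also shows a simple element ("br") that produces a start notification only.

```java
// Hypothetical stand-in; element names are plain strings here,
// not the library's Element type.
final class TitleTracker {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    boolean startElement(String element) {
        if (element.equals("title")) inTitle = true;
        return true; // keep parsing
    }

    boolean endElement(String element) {
        if (element.equals("title")) inTitle = false;
        return true; // keep parsing
    }

    boolean characters(String s) {
        // Route character data to the buffer selected by the current state.
        (inTitle ? title : text).append(s);
        return true;
    }

    String title() { return title.toString(); }

    String text() { return text.toString(); }

    public static void main(String[] args) {
        TitleTracker t = new TitleTracker();
        t.startElement("title");
        t.characters("Home");
        t.endElement("title");
        t.startElement("br"); // simple element: no matching endElement call
        t.characters("Body");
        System.out.println(t.title() + " / " + t.text()); // prints "Home / Body"
    }
}
```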