Class TextDocument

public class TextDocument

Represents a text document extracted from a web page, such as a news article.

Field Summary
Modifier and Type            Field
private final String         url
private String               pageTitle
private String               contentTitle
private String               textContent
private Map<String, String>  additionalFields

Method Summary
Modifier and Type            Method and Description
final String                 getUrl()
                             The URL of the document.
final String                 getPageTitle()
                             The title of the document, taken from its <title> tag.
final Unit                   setPageTitle(String pageTitle)
                             Sets the title of the document, taken from its <title> tag.
final String                 getContentTitle()
                             The title of the content, extracted from the text content.
final Unit                   setContentTitle(String contentTitle)
                             Sets the title of the content, extracted from the text content.
final String                 getTextContent()
                             The extracted text content of the document, usually with links, ads and other irrelevant content removed.
final Unit                   setTextContent(String textContent)
                             Sets the extracted text content of the document, usually with links, ads and other irrelevant content removed.
final Map<String, String>    getAdditionalFields()
                             Additional fields extracted from the document.
final Unit                   setAdditionalFields(Map<String, String> additionalFields)
                             Sets the additional fields extracted from the document.

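To make the shape of the class concrete, here is a minimal Java stand-in that mirrors the fields and accessors listed above. It is a hypothetical sketch, not the library's source. The constructor that takes the URL and the empty default for `additionalFields` are assumptions. The setters return `void`, which is how a Kotlin function returning `Unit` appears to Java callers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the documented fields and accessors.
// NOT the library's source: constructor and defaults are assumptions.
class TextDocument {
    private final String url;          // final: set once, no setter
    private String pageTitle;          // text of the page's <title> tag
    private String contentTitle;       // title extracted from the text content
    private String textContent;        // main text, links/ads usually removed
    // Assumed default: an empty, mutable map.
    private Map<String, String> additionalFields = new HashMap<>();

    // Assumed constructor: because url is final, it is supplied here.
    TextDocument(String url) {
        this.url = url;
    }

    final String getUrl() { return url; }

    final String getPageTitle() { return pageTitle; }
    final void setPageTitle(String pageTitle) { this.pageTitle = pageTitle; }

    final String getContentTitle() { return contentTitle; }
    final void setContentTitle(String contentTitle) { this.contentTitle = contentTitle; }

    final String getTextContent() { return textContent; }
    final void setTextContent(String textContent) { this.textContent = textContent; }

    final Map<String, String> getAdditionalFields() { return additionalFields; }
    final void setAdditionalFields(Map<String, String> additionalFields) {
        this.additionalFields = additionalFields;
    }
}
```

A caller would construct the document with its URL and then fill in the mutable properties as extraction proceeds. `getUrl()` stays fixed for the lifetime of the object.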
Method Detail
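
getUrl
final String getUrl()
The URL of the document. The underlying field is final and there is no setUrl method, so the URL cannot change after the document is created.

getPageTitle
final String getPageTitle()
The title of the document, taken from its HTML <title> tag.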

setPageTitle
final Unit setPageTitle(String pageTitle)
Sets the title of the document, that is, the text of its <title> tag.
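The page title is the text of the HTML `<title>` tag. One naive way to obtain it from raw HTML is a regular expression over the markup. This is shown only as an illustration and is not how the library populates `pageTitle`. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of TextDocument: extracts the first <title>
// element's text with a regex. A naive sketch; real HTML parsers are sturdier.
final class TitleSketch {
    // (?i) = case-insensitive tag names; (?s) = '.' also matches line breaks.
    private static final Pattern TITLE =
            Pattern.compile("(?is)<title[^>]*>(.*?)</title>");

    private TitleSketch() {}

    /** Returns the whitespace-normalized <title> text, or null if absent. */
    static String extractPageTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Titles often span several lines; collapse whitespace runs to one space.
        return m.group(1).trim().replaceAll("\\s+", " ");
    }
}
```

The returned string would then be passed to `setPageTitle`. HTML entities such as `&amp;` are left undecoded. A proper HTML parser copes better with malformed markup. For example, jsoup's `Document.title()` returns the title text.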

getContentTitle
final String getContentTitle()
The title of the content, extracted from the text content. Unlike the page title, it does not come from the <title> tag.

setContentTitle
final Unit setContentTitle(String contentTitle)
Sets the title of the content, that is, the title extracted from the text content.
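A document carries two titles, so a caller that needs a single heading has to pick one. The hypothetical helper below prefers the content title and falls back to the page title. Treating a null or blank content title as missing is an assumption, because the documentation does not say whether either title can be absent.

```java
// Hypothetical helper, not part of TextDocument: picks one title for display.
// Assumption: a null or blank content title means "not extracted".
final class TitleChooser {
    private TitleChooser() {}

    static String displayTitle(String contentTitle, String pageTitle) {
        if (contentTitle != null && !contentTitle.trim().isEmpty()) {
            return contentTitle;   // prefer the title found in the text content
        }
        return pageTitle;          // fall back to the <title> text (may be null)
    }
}
```

Given a `TextDocument`, the call would be `displayTitle(doc.getContentTitle(), doc.getPageTitle())`.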

getTextContent
final String getTextContent()
The extracted text content of the document, usually with links, ads and other irrelevant content removed.

setTextContent
final Unit setTextContent(String textContent)
Sets the extracted text content of the document, which usually has links, ads and other irrelevant content removed.

getAdditionalFields
final Map<String, String> getAdditionalFields()
Additional fields extracted from the document, keyed by field name.

setAdditionalFields
final Unit setAdditionalFields(Map<String, String> additionalFields)
Sets the additional fields extracted from the document, keyed by field name.
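`additionalFields` is a plain `Map<String, String>`, so it is populated and read with ordinary map operations. The sketch below is hypothetical. The keys `author` and `publishTime` are for illustration only and are not keys the library is known to define.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helpers, not part of TextDocument. The keys used here
// ("author", "publishTime") are illustrative assumptions.
final class AdditionalFieldsSketch {
    private AdditionalFieldsSketch() {}

    /** Builds a field map, skipping values that were not extracted (null). */
    static Map<String, String> buildFields(String author, String publishTime) {
        // LinkedHashMap keeps insertion order, so iteration order is stable.
        Map<String, String> fields = new LinkedHashMap<>();
        if (author != null) fields.put("author", author);
        if (publishTime != null) fields.put("publishTime", publishTime);
        return fields;
    }

    /** Reads a field, returning a fallback when it is absent. */
    static String fieldOrDefault(Map<String, String> fields, String key, String fallback) {
        return fields.getOrDefault(key, fallback);
    }
}
```

`doc.setAdditionalFields(buildFields(...))` would store the map. `fieldOrDefault(doc.getAdditionalFields(), "author", "unknown")` would read a value back.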