public abstract class WikipediaPage extends Indexable
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `WikipediaPage.Link` |

| Modifier and Type | Field and Description |
|---|---|
| `protected boolean` | `isArticle` |
| `protected boolean` | `isDisambig` |
| `protected boolean` | `isRedirect` |
| `protected boolean` | `isStub` |
| `protected String` | `language` |
| `protected String` | `mId` |
| `protected String` | `page` |
| `protected int` | `textEnd` |
| `protected int` | `textStart` |
| `protected String` | `title` |
| `static String` | `XML_END_TAG`: End delimiter of the page, which is `</page>`. |
| `protected static String` | `XML_END_TAG_ID`: End delimiter of the id, which is `</id>`. |
| `protected static String` | `XML_END_TAG_NAMESPACE`: End delimiter of the namespace, which is `</ns>`. |
| `protected static String` | `XML_END_TAG_TEXT`: End delimiter of the text, which is `</text>`. |
| `protected static String` | `XML_END_TAG_TITLE`: End delimiter of the title, which is `</title>`. |
| `static String` | `XML_START_TAG`: Start delimiter of the page, which is `<page>`. |
| `protected static String` | `XML_START_TAG_ID`: Start delimiter of the id, which is `<id>`. |
| `protected static String` | `XML_START_TAG_NAMESPACE`: Start delimiter of the namespace, which is `<ns>`. |
| `protected static String` | `XML_START_TAG_TEXT`: Start delimiter of the text, which is `<text xml:space="preserve">`. |
| `protected static String` | `XML_START_TAG_TITLE`: Start delimiter of the title, which is `<title>`. |

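The delimiter fields above describe how a raw `<page>` element is segmented. As a minimal sketch, assuming each tag appears exactly as documented (if a dump adds attributes to `<text>`, the exact-match lookup fails), the hypothetical `DelimiterSketch` class below uses the same strings to pull out the title, namespace, and id, and to compute offsets analogous to `textStart` and `textEnd`. The class and its methods are illustrative only; they are not part of this API and not its actual parser.

```java
// Hypothetical sketch: the tag strings match the documented XML_* fields;
// the lookup logic is an assumption, not the class's parser.
final class DelimiterSketch {
    static final String XML_START_TAG_TITLE = "<title>";
    static final String XML_END_TAG_TITLE = "</title>";
    static final String XML_START_TAG_NAMESPACE = "<ns>";
    static final String XML_END_TAG_NAMESPACE = "</ns>";
    static final String XML_START_TAG_ID = "<id>";
    static final String XML_END_TAG_ID = "</id>";
    static final String XML_START_TAG_TEXT = "<text xml:space=\"preserve\">";
    static final String XML_END_TAG_TEXT = "</text>";

    /** Returns the content between the first start tag and the next end tag, or null. */
    static String between(String page, String startTag, String endTag) {
        int start = page.indexOf(startTag);
        if (start < 0) {
            return null;
        }
        start += startTag.length();
        // Search for the end tag only after the start tag.
        int end = page.indexOf(endTag, start);
        return end < 0 ? null : page.substring(start, end);
    }

    /** Returns {start, end} offsets of the wiki markup (cf. textStart/textEnd), or {-1, -1}. */
    static int[] textOffsets(String page) {
        int start = page.indexOf(XML_START_TAG_TEXT);
        if (start < 0) {
            return new int[] {-1, -1};
        }
        start += XML_START_TAG_TEXT.length();
        int end = page.indexOf(XML_END_TAG_TEXT, start);
        return end < 0 ? new int[] {-1, -1} : new int[] {start, end};
    }

    public static void main(String[] args) {
        String page = "<page><title>Example</title><ns>0</ns><id>42</id>"
                + "<revision><text xml:space=\"preserve\">Some [[wiki]] text.</text></revision></page>";
        int[] offsets = textOffsets(page);
        System.out.println(between(page, XML_START_TAG_TITLE, XML_END_TAG_TITLE)); // Example
        System.out.println(between(page, XML_START_TAG_ID, XML_END_TAG_ID));       // 42
        System.out.println(page.substring(offsets[0], offsets[1]));                // Some [[wiki]] text.
    }
}
```

Searching for the end tag only from the end of the start tag keeps an end tag that happens to precede the start tag from being matched.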
| Constructor and Description |
|---|
| `WikipediaPage()`: Creates an empty `WikipediaPage` object. |

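Because the constructor produces an empty object, a page is presumably populated afterwards through the static `readPage(WikipediaPage, String)` method or the protected abstract `processPage(String)` hook, both summarized in the method table as reading a raw XML string into the object. The sketch below is a hypothetical illustration of that empty-then-fill pattern. `SketchPage`, `SimpleSketchPage`, and the assumption that `readPage` stores the raw XML and then delegates to `processPage` are not taken from the documented API.

```java
// Hypothetical sketch of the "create empty, then fill" pattern.
// SketchPage and SimpleSketchPage are illustrative names, not the documented API.
abstract class SketchPage {
    protected String page;   // raw XML of the page
    protected String title;  // parsed title

    /** Creates an empty page; fields stay null until readPage is called. */
    SketchPage() {}

    /** Assumed behavior: keep the raw XML, then let the subclass parse it. */
    static void readPage(SketchPage target, String s) {
        target.page = s;
        target.processPage(s);
    }

    /** Subclass hook that parses the raw XML string. */
    protected abstract void processPage(String s);

    String getRawXML() { return page; }

    String getTitle() { return title; }
}

final class SimpleSketchPage extends SketchPage {
    @Override
    protected void processPage(String s) {
        int start = s.indexOf("<title>");
        int end = s.indexOf("</title>");
        // Leave title null when the delimiters are missing or out of order.
        if (start >= 0 && end > start) {
            title = s.substring(start + "<title>".length(), end);
        }
    }
}
```

For example, after `SketchPage p = new SimpleSketchPage();` and `SketchPage.readPage(p, "<page><title>Example</title></page>");`, `p.getTitle()` returns `"Example"`. Keeping the parsing hook abstract lets a subclass supply its own parsing rules while callers use a single static entry point.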
| Modifier and Type | Method and Description |
|---|---|
| `List<WikipediaPage.Link>` | `extractLinks()` |
| `List<String>` | `extractLinkTargets()` |
| `String` | `findInterlanguageLink(String lang)`: Returns the inter-language link to a specific language (if any). |
| `String` | `getContent()`: Returns the contents of this page (title + text). |
| `String` | `getDisplayContent()`: Returns the content of the document for display to a human. |
| `String` | `getDisplayContentType()`: Returns the type of the display content, per IANA MIME Media Type (e.g., "text/html"). |
| `String` | `getDocid()`: Returns the article title (i.e., the docid). |
| `String` | `getLanguage()` |
| `String` | `getRawXML()`: Returns the raw XML of this page. |
| `String` | `getTitle()`: Returns the title of this page. |
| `String` | `getWikiMarkup()`: Returns the text of this page. |
| `boolean` | `isArticle()`: Checks whether this page lives in the main/article namespace and not, for example, in "File:", "Category:", or "Wikipedia:". |
| `boolean` | `isDisambiguation()`: Checks whether this page is a disambiguation page. |
| `boolean` | `isEmpty()`: Checks whether this page is an empty page. |
| `boolean` | `isRedirect()`: Checks whether this page is a redirect page. |
| `boolean` | `isStub()`: Checks whether this article is a stub. |
| `protected abstract void` | `processPage(String s)`: Reads a raw XML string into a `WikipediaPage` object. |
| `void` | `readFields(DataInput in)`: Deserializes this object. |
| `static void` | `readPage(WikipediaPage page, String s)`: Reads a raw XML string into a `WikipediaPage` object. |
| `void` | `setLanguage(String language)`: Deprecated. |
| `void` | `write(DataOutput out)`: Serializes this object. |

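The documentation states that a `WikipediaPage` is exactly one of an article, a disambiguation page, a redirect page, or an empty page, and that `isArticle()` concerns the main/article namespace, but it does not say how each case is detected. The following is a hedged sketch that uses common Wikipedia markup conventions as stand-in heuristics: an English `#REDIRECT` line, a `{{disambig...}}` template, a template ending in `-stub`, and namespace `0` for the main namespace. `PageTypeSketch` and every rule in it are assumptions, not this class's logic.

```java
import java.util.Locale;

// Hypothetical heuristics; the documented class does not state its detection rules.
final class PageTypeSketch {
    /** The four mutually exclusive page kinds named in the documentation. */
    enum Kind { EMPTY, REDIRECT, DISAMBIGUATION, ARTICLE }

    /** Classifies wiki markup into exactly one kind; checks run in a fixed order. */
    static Kind classify(String wikiMarkup) {
        if (wikiMarkup == null || wikiMarkup.trim().isEmpty()) {
            return Kind.EMPTY;
        }
        String lower = wikiMarkup.trim().toLowerCase(Locale.ROOT);
        // Assumption: redirects begin with the English magic word "#REDIRECT".
        if (lower.startsWith("#redirect")) {
            return Kind.REDIRECT;
        }
        // Assumption: disambiguation pages use a {{disambig...}} template.
        if (lower.contains("{{disambig")) {
            return Kind.DISAMBIGUATION;
        }
        return Kind.ARTICLE;
    }

    /** Assumption: stub articles use a template whose name ends in "-stub". */
    static boolean isStub(String wikiMarkup) {
        return wikiMarkup != null && wikiMarkup.toLowerCase(Locale.ROOT).contains("-stub}}");
    }

    /** MediaWiki numbers the main (article) namespace 0. */
    static boolean isMainNamespace(String ns) {
        return ns != null && ns.trim().equals("0");
    }
}
```

Testing for emptiness first, then redirects, then disambiguation keeps the four outcomes mutually exclusive, which matches the documented either/or description.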
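`public static final String XML_START_TAG`
Start delimiter of the page, which is `<page>`.

`public static final String XML_END_TAG`
End delimiter of the page, which is `</page>`.

`protected static final String XML_START_TAG_TITLE`
Start delimiter of the title, which is `<title>`.

`protected static final String XML_END_TAG_TITLE`
End delimiter of the title, which is `</title>`.

`protected static final String XML_START_TAG_NAMESPACE`
Start delimiter of the namespace, which is `<ns>`.

`protected static final String XML_END_TAG_NAMESPACE`
End delimiter of the namespace, which is `</ns>`.

`protected static final String XML_START_TAG_ID`
Start delimiter of the id, which is `<id>`.

`protected static final String XML_END_TAG_ID`
End delimiter of the id, which is `</id>`.

`protected static final String XML_START_TAG_TEXT`
Start delimiter of the text, which is `<text xml:space="preserve">`.

`protected static final String XML_END_TAG_TEXT`
End delimiter of the text, which is `</text>`.

`protected String page`

`protected String title`

`protected String mId`

`protected int textStart`

`protected int textEnd`

`protected boolean isRedirect`

`protected boolean isDisambig`

`protected boolean isStub`

`protected boolean isArticle`

`protected String language`
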
`public void write(DataOutput out) throws IOException`
Serializes this object. Throws: `IOException`.

`public void readFields(DataInput in) throws IOException`
Deserializes this object. Throws: `IOException`.

`public String getDocid()`

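The `write(DataOutput)` and `readFields(DataInput)` pair documented above has the shape of Hadoop's `Writable` contract: `write` emits the fields, and `readFields` must consume them in exactly the same order. The sketch below shows that round trip with plain `java.io` streams. `SerializationSketch` and its field layout (a length-prefixed UTF-8 page body followed by the language) are assumptions for illustration, not the class's real wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in with the same write/readFields shape as WikipediaPage;
// the field layout below is an assumption, not the class's real wire format.
final class SerializationSketch {
    String page = "";      // raw XML of the page
    String language = "";  // language code

    /** Serializes: length-prefixed UTF-8 bytes of the raw XML, then the language. */
    void write(DataOutput out) throws IOException {
        byte[] bytes = page.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        out.writeUTF(language);
    }

    /** Deserializes the fields in exactly the order write() emitted them. */
    void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        page = new String(bytes, StandardCharsets.UTF_8);
        language = in.readUTF();
    }

    /** Writes src to an in-memory buffer and reads it back into a fresh object. */
    static SerializationSketch roundTrip(SerializationSketch src) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        src.write(out);
        out.flush();
        SerializationSketch copy = new SerializationSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }
}
```

The page body is written as a length-prefixed byte array rather than with `writeUTF` because `DataOutput.writeUTF` rejects strings whose encoded form exceeds 65,535 bytes, a limit a long article can exceed.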
`@Deprecated public void setLanguage(String language)`
Deprecated.

`public String getLanguage()`

`public String getContent()`
Declared in class `Indexable`.

`public String getDisplayContent()`
Description copied from class `Indexable`, where this method is declared: Returns the content of the document for display to a human.

`public String getDisplayContentType()`
Description copied from class `Indexable`, where this method is declared: Returns the type of the display content, per IANA MIME Media Type (e.g., "text/html"). See http://www.iana.org/assignments/media-types/index.html.

`public String getRawXML()`

`public String getWikiMarkup()`

`public String getTitle()`

`public boolean isDisambiguation()`
A `WikipediaPage` is either an article, a disambiguation page, a redirect page, or an empty page. Returns: `true` if this page is a disambiguation page.

`public boolean isRedirect()`
A `WikipediaPage` is either an article, a disambiguation page, a redirect page, or an empty page. Returns: `true` if this page is a redirect page.

`public boolean isEmpty()`
A `WikipediaPage` is either an article, a disambiguation page, a redirect page, or an empty page. Returns: `true` if this page is an empty page.

`public boolean isStub()`
Returns: `true` if this article is a stub.

`public boolean isArticle()`
Returns: `true` if this page is an actual article.

`public String findInterlanguageLink(String lang)`
Parameters: `lang` - language. Returns: the inter-language link to the given language if one exists, `null` otherwise.

`public List<WikipediaPage.Link> extractLinks()`

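Neither `extractLinks()`, `extractLinkTargets()`, nor the nested `WikipediaPage.Link` class is described here, and `findInterlanguageLink(String lang)` is only summarized. To illustrate the wiki markup these methods presumably operate on, the hypothetical `LinkSketch` class reads `[[Target]]` and `[[Target|anchor]]` links and `[[lang:Title]]` inter-language links. It applies no namespace filtering, so `[[Category:...]]` and inter-language links come back as ordinary targets, and it is not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers illustrating wiki link syntax; not the documented implementation.
final class LinkSketch {
    // Matches [[Target]] and [[Target|anchor]]; group 1 is the target.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\[\\]|]+)(?:\\|[^\\[\\]]*)?\\]\\]");

    /** Returns link targets in document order, without any namespace filtering. */
    static List<String> extractLinkTargets(String markup) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(markup);
        while (m.find()) {
            targets.add(m.group(1).trim());
        }
        return targets;
    }

    /** Returns Title from the first [[lang:Title]] link, or null if there is none. */
    static String findInterlanguageLink(String markup, String lang) {
        String prefix = "[[" + lang + ":";
        int start = markup.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        start += prefix.length();
        int end = markup.indexOf("]]", start);
        return end < 0 ? null : markup.substring(start, end);
    }
}
```

Returning `null` when no link is found mirrors the documented "`null` otherwise" behavior of `findInterlanguageLink`.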
`public static void readPage(WikipediaPage page, String s)`
Reads a raw XML string into a `WikipediaPage` object. Parameters: `page` - the `WikipediaPage` object; `s` - raw XML string.

`protected abstract void processPage(String s)`
Reads a raw XML string into a `WikipediaPage` object. Added for backwards compatibility. Parameters: `s` - raw XML string.

Copyright © 2015. All rights reserved.