public class AnchorText extends Object implements org.apache.hadoop.io.WritableComparable<AnchorText>, AnchorTextConstants, Iterable<Integer>
This data structure represents a line of anchor text. A line of anchor text has some text, a weight, and a set of sources (targets) associated with it. Sources (targets) are the pages a line of anchor text originates from (points to) when the underlying link is an incoming (outgoing) link.
The implemented iterator makes it possible to iterate through the source (or target) documents of a line of anchor text.
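To make the data structure concrete, here is a minimal sketch of one line of anchor text: its text, a weight, and a set of source/target document ids exposed through Iterable<Integer>. The class name AnchorLineSketch and the TreeSet storage are assumptions for illustration. They are not the actual AnchorText internals.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for AnchorText: one line of anchor text
// with its text, a weight and a set of source/target document ids.
// Storing the ids in a TreeSet is an assumption made for this sketch.
class AnchorLineSketch implements Iterable<Integer> {
    private final String text;
    private float weight;
    private final TreeSet<Integer> docnos = new TreeSet<>();

    AnchorLineSketch(String text) {
        this.text = text;
    }

    String getText() { return text; }
    float getWeight() { return weight; }
    void setWeight(float weight) { this.weight = weight; }

    // Adds a source (incoming link) or target (outgoing link) document id.
    void addDocument(int docno) { docnos.add(docno); }

    // Number of distinct source/target documents.
    int getSize() { return docnos.size(); }

    // Iterates over the source/target document ids in ascending order.
    @Override
    public Iterator<Integer> iterator() {
        return docnos.iterator();
    }
}
```

With this sketch, `for (int docno : line)` visits each source or target once. The TreeSet collapses duplicates and yields ascending order; the real class may store and order its ids differently.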
Nested classes/interfaces inherited from interface AnchorTextConstants: AnchorTextConstants.Type

Fields inherited from interface AnchorTextConstants: EMPTY_STRING, MAXIMUM_SOURCES_PER_PACKET

| Constructor | Description |
|---|---|
| AnchorText() | Creates an empty internal incoming-link AnchorText object. |
| AnchorText(byte type, String text) | Creates a new AnchorText object. |
| AnchorText(byte type, String text, int docno) | Creates a new AnchorText object and, if the object is allowed to have text, adds docno as a source/target document. |
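The three constructors can be sketched as follows. ConstructedAnchorSketch and its byte codes are hypothetical, and so is the rule for which types are "allowed to have text": this page does not say which types carry text, so the sketch assumes that only the four link types do and uses hasValidText() as that check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three documented constructors.
class ConstructedAnchorSketch {
    // Hypothetical type codes; the real values come from AnchorTextConstants.
    static final byte INTERNAL_IN_LINK = 0;
    static final byte EXTERNAL_OUT_LINK = 3;
    static final byte IN_DEGREE = 4;

    byte type;
    String text;
    final List<Integer> docnos = new ArrayList<>();

    // Empty object of the internal incoming-link type, as documented.
    ConstructedAnchorSketch() {
        this(INTERNAL_IN_LINK, "");
    }

    ConstructedAnchorSketch(byte type, String text) {
        this.type = type;
        this.text = text;
    }

    // Adds docno only when this type is allowed to have text.
    ConstructedAnchorSketch(byte type, String text, int docno) {
        this(type, text);
        if (hasValidText()) {
            docnos.add(docno);
        }
    }

    // ASSUMPTION: only the four link types (codes 0 to 3 here) carry text.
    boolean hasValidText() {
        return type >= INTERNAL_IN_LINK && type <= EXTERNAL_OUT_LINK;
    }
}
```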
| Modifier and Type | Method | Description |
|---|---|---|
| void | addDocument(int docno) | Adds a new source/target to this anchor text. |
| void | addDocumentsFrom(AnchorText other) | Adds the sources/targets from another AnchorText to the current object. |
| AnchorText | clone() | Clones (deep copies) this object and returns a new AnchorText object. |
| int | compareTo(AnchorText obj) | For sorting purposes, compares only the type and the text of two AnchorText objects. |
| boolean | containsDocument(int docno) | Checks whether a document is a source or a target for this anchor text. |
| boolean | equals(Object obj) | Performs a thorough comparison of two AnchorText objects. |
| boolean | equalsIgnoreSources(AnchorText other) | Checks whether two lines of anchor text are equal, regardless of their source/target lists. |
| int[] | getDocuments() | Returns an array of all the sources/targets. |
| int | getSize() | |
| String | getText() | |
| byte | getType() | |
| float | getWeight() | |
| int | hashCode() | |
| boolean | hasValidText() | |
| boolean | intersects(AnchorText other) | Checks whether two lines of anchor text share a source/target document. |
| boolean | isDocnoField() | |
| boolean | isExternalInLink() | |
| boolean | isExternalOutLink() | |
| boolean | isInDegree() | |
| boolean | isInternalInLink() | |
| boolean | isInternalOutLink() | |
| boolean | isOfOtherTypes() | |
| boolean | isOutDegree() | |
| boolean | isURL() | |
| boolean | isWeighted() | |
| Iterator<Integer> | iterator() | Creates a new iterator over the source/target documents of the current object. |
| void | readFields(DataInput in) | Deserializes an AnchorText object. |
| void | resetToType(byte type) | Clears this object and initializes the type. |
| void | setText(String text) | Sets the text for this line of anchor text. |
| void | setWeight(float weight) | Sets a new weight for this line of anchor text and changes the type to "weighted". |
| String | toString() | |
| void | write(DataOutput out) | Serializes an AnchorText object. |
public AnchorText()

public AnchorText(byte type, String text)
Parameters:
type - Internal or external, incoming or outgoing, etc. (see AnchorTextConstants)
text - Text associated with a line of anchor text

public AnchorText(byte type, String text, int docno)
Parameters:
type - Internal or external, incoming or outgoing, etc. (see AnchorTextConstants)
text - Text associated with a line of anchor text
docno - Source/target document id

public void readFields(DataInput in) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Parameters:
in - Input stream
Throws:
IOException

public void write(DataOutput out) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Parameters:
out - Output stream
Throws:
IOException

public byte getType()
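readFields and write follow Hadoop's Writable contract: write emits the fields to a DataOutput, and readFields overwrites the object from a DataInput in the same order. The sketch below shows that symmetry with a hypothetical field layout (type, text, weight, id count, ids); the real AnchorText encoding is not documented here and may differ. WritableAnchorSketch does not implement org.apache.hadoop.io.Writable so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wire format, NOT the documented one:
// type (byte), text (modified UTF-8), weight (float), count (int), ids (ints).
class WritableAnchorSketch {
    byte type;
    String text = "";
    float weight;
    int[] docnos = new int[0];

    // Same shape as Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeByte(type);
        out.writeUTF(text);
        out.writeFloat(weight);
        out.writeInt(docnos.length);
        for (int d : docnos) {
            out.writeInt(d);
        }
    }

    // Same shape as Writable.readFields(DataInput): every field is
    // overwritten, so one instance can be reused across records.
    void readFields(DataInput in) throws IOException {
        type = in.readByte();
        text = in.readUTF();
        weight = in.readFloat();
        docnos = new int[in.readInt()];
        for (int i = 0; i < docnos.length; i++) {
            docnos[i] = in.readInt();
        }
    }

    // Serializes src to bytes and reads them back into a fresh object.
    static WritableAnchorSketch roundTrip(WritableAnchorSketch src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            WritableAnchorSketch copy = new WritableAnchorSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key design point is that readFields reads exactly what write wrote, in the same order; any change to one method must be mirrored in the other.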
public void resetToType(byte type)
Parameters:
type - New type

public String getText()

public void setText(String text)
Parameters:
text - New text for this anchor text

public float getWeight()

public void setWeight(float weight)
Parameters:
weight - New weight

public int getSize()
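Per the method summary, setWeight also switches the type to "weighted", and resetToType clears the object before setting a new type. A sketch of that state change, with hypothetical type codes (TypedAnchorSketch.WEIGHTED and friends) and the assumption that clearing resets the text, the weight, and the id list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the weight/type interplay described above.
class TypedAnchorSketch {
    // Hypothetical type codes; the real values come from AnchorTextConstants.
    static final byte INTERNAL_IN_LINK = 0;
    static final byte WEIGHTED = 6;

    private byte type;
    private String text;
    private float weight;
    private final List<Integer> docnos = new ArrayList<>();

    TypedAnchorSketch(byte type, String text) {
        this.type = type;
        this.text = text;
    }

    byte getType() { return type; }
    String getText() { return text; }
    float getWeight() { return weight; }
    int getSize() { return docnos.size(); }
    void addDocument(int docno) { docnos.add(docno); }

    // As documented: a new weight also changes the type to "weighted".
    void setWeight(float weight) {
        this.weight = weight;
        this.type = WEIGHTED;
    }

    // As documented: clears the object and initializes the type.
    // ASSUMPTION: "clears" resets text, weight and the source/target ids.
    void resetToType(byte type) {
        this.type = type;
        this.text = "";
        this.weight = 0f;
        docnos.clear();
    }

    boolean isWeighted() { return type == WEIGHTED; }
}
```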
public int[] getDocuments()

public void addDocument(int docno)
Parameters:
docno - The new document id to be added to the source/target list

public void addDocumentsFrom(AnchorText other)
Parameters:
other - The other AnchorText object from which the sources/targets are to be copied

public boolean containsDocument(int docno)
Parameters:
docno - Document to be checked

public boolean intersects(AnchorText other)
Parameters:
other - The other anchor text

public boolean equalsIgnoreSources(AnchorText other)
Parameters:
other - The other anchor text to check against

public boolean equals(Object obj)
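addDocument, addDocumentsFrom, containsDocument, intersects and equalsIgnoreSources treat the sources/targets as a set, as the class description says. The hypothetical SourceSetSketch below models them with a java.util.Set. In this sketch equalsIgnoreSources compares only type and text; whether the real method also compares the weight is not stated here.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the set-style source/target operations.
class SourceSetSketch {
    final byte type;
    final String text;
    // Insertion-ordered set of source/target document ids (an assumption).
    final Set<Integer> docnos = new LinkedHashSet<>();

    SourceSetSketch(byte type, String text) {
        this.type = type;
        this.text = text;
    }

    void addDocument(int docno) {
        docnos.add(docno);
    }

    // Copies every source/target of other; ids already present are skipped.
    void addDocumentsFrom(SourceSetSketch other) {
        docnos.addAll(other.docnos);
    }

    boolean containsDocument(int docno) {
        return docnos.contains(docno);
    }

    // True when the two lines share at least one source/target document.
    boolean intersects(SourceSetSketch other) {
        return !Collections.disjoint(docnos, other.docnos);
    }

    // Equal as lines of anchor text (type and text here), ignoring the id sets.
    boolean equalsIgnoreSources(SourceSetSketch other) {
        return type == other.type && Objects.equals(text, other.text);
    }
}
```

A typical pattern this enables: when two lines satisfy equalsIgnoreSources, merge them with addDocumentsFrom so one line carries all of their sources.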
public int compareTo(AnchorText obj)
Specified by: compareTo in interface Comparable<AnchorText>

public AnchorText clone()
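compareTo looks only at the type and the text, while equals is a thorough comparison (and equalsIgnoreSources exists as the variant that skips the source/target lists). So the natural ordering is presumably inconsistent with equals in the sense of java.lang.Comparable: a TreeSet or TreeMap keeps only one of two lines that differ only in their sources. The hypothetical OrderedAnchorSketch below illustrates that effect together with a deep-copying clone(); which fields the real equals compares is an assumption.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch: ordering by (type, text), equality over all fields.
class OrderedAnchorSketch implements Comparable<OrderedAnchorSketch> {
    final byte type;
    final String text;
    int[] docnos;

    OrderedAnchorSketch(byte type, String text, int... docnos) {
        this.type = type;
        this.text = text;
        this.docnos = docnos.clone();
    }

    // Orders by type, then text; source/target lists are ignored, as documented.
    @Override
    public int compareTo(OrderedAnchorSketch o) {
        int c = Byte.compare(type, o.type);
        return c != 0 ? c : text.compareTo(o.text);
    }

    // "Thorough" comparison: type, text and the id list (an assumption).
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof OrderedAnchorSketch)) return false;
        OrderedAnchorSketch o = (OrderedAnchorSketch) obj;
        return type == o.type && text.equals(o.text) && Arrays.equals(docnos, o.docnos);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, text) * 31 + Arrays.hashCode(docnos);
    }

    // Deep copy: the constructor copies the id array, so edits do not leak.
    @Override
    public OrderedAnchorSketch clone() {
        return new OrderedAnchorSketch(type, text, docnos);
    }
}
```

The trade-off is deliberate-looking: sorting by (type, text) groups identical lines together, which suits a sort-then-merge step, but sorted collections should not be used to deduplicate lines whose sources matter.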
public boolean isExternalInLink()

public boolean isInternalInLink()

public boolean isExternalOutLink()

public boolean isInternalOutLink()

public boolean isWeighted()

public boolean isInDegree()

public boolean isOutDegree()

public boolean isDocnoField()

public boolean isURL()

public boolean isOfOtherTypes()

public boolean hasValidText()
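The is*() predicates read as tests on the single type byte returned by getType(). A hypothetical sketch with made-up code values (the real ones are defined by AnchorTextConstants) and an extra isInLink() helper that is not part of the documented API. isOfOtherTypes() and hasValidText() are left out because their exact rules are not described here.

```java
// Hypothetical sketch of type predicates over one stored type byte.
final class AnchorTypeSketch {
    // Made-up type codes; the actual values live in AnchorTextConstants.
    static final byte INTERNAL_IN_LINK = 0;
    static final byte EXTERNAL_IN_LINK = 1;
    static final byte INTERNAL_OUT_LINK = 2;
    static final byte EXTERNAL_OUT_LINK = 3;
    static final byte IN_DEGREE = 4;
    static final byte OUT_DEGREE = 5;
    static final byte WEIGHTED = 6;
    static final byte DOCNO_FIELD = 7;
    static final byte URL = 8;

    private final byte type;

    AnchorTypeSketch(byte type) {
        this.type = type;
    }

    // Each predicate is a single comparison against the stored type byte.
    boolean isInternalInLink() { return type == INTERNAL_IN_LINK; }
    boolean isExternalInLink() { return type == EXTERNAL_IN_LINK; }
    boolean isInternalOutLink() { return type == INTERNAL_OUT_LINK; }
    boolean isExternalOutLink() { return type == EXTERNAL_OUT_LINK; }
    boolean isInDegree() { return type == IN_DEGREE; }
    boolean isOutDegree() { return type == OUT_DEGREE; }
    boolean isWeighted() { return type == WEIGHTED; }
    boolean isDocnoField() { return type == DOCNO_FIELD; }
    boolean isURL() { return type == URL; }

    // Hypothetical helper (not in the documented API): any kind of in-link.
    boolean isInLink() {
        return isInternalInLink() || isExternalInLink();
    }
}
```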
Copyright © 2015. All rights reserved.