public class Trie
This class implements a trie, or a digital search tree. A trie is a multiway tree (each node can have multiple children) that represents a set of strings.
Each node contains data encapsulated in an object instance. Each edge spells out a character and each path from the root represents a string described by the characters labelling the traversed edges. Moreover, for each string represented, there is a unique path from the root.
The trie of the following example represents the strings 'a', 'd', 'b', 'ac', 'ba', 'be', 'bd', 'bad' and 'bae'.
[0]
--+--
/ | \
a/ d| \b
[1] [2] [4]
| --+--
| / | \
c| a/ e| d\
[3] [5] [6] [9]
--+--
/ \
d/ e\
[7] [8]
It is easy to see that strings with common prefixes will branch off from each other at the first distinguishing character. This feature makes the trie a good data structure to identify and represent phrases of a text such as the ones induced by the Lempel-Ziv familiy of compression algorithms. For instance, the LZ78 version parses the text into phrases, where each phrase is the longest matching phrase seen previously plus one character.
In this implementation, each node is actually an instance of this class. To build a trie, one must first create the root using the public constructor:
Trie root = new Trie (some_object);
Here some_object contains any relevant information encapsulated in an object instance. Typically, that's the only moment the public constructor is used. From now on, all new nodes will be added as a new child of one existing node using the add method:
new_node = any_node.add (some_object, character);
Here character is the character that will label the edge from any_node to new_node. Note that this transition must not already exist, otherwise an exception is raised.
To find the longest prefix of a given string, we follow a path from the root down the tree, character by character, with the spellDown method:
next_node = root;
while (next_node != null)
{
current_node = next_node;
char c = get next character from somewhere
next_node = current_node.spellDown (c);
}
spellDown follows the edge out of current_node labelled by the character c and returns the next node. If there is no such a path, it returns null.
To retrieve the information stored at any node, simply use the getData method.
In fact, there are many ways to implement a trie. To avoid wasting space with multiple pointers at each node, this implementation uses an approach with a linked list of siblings. Each node actually contains a pointer to one of its children and a pointer to one of its siblings only. Together with the pointers, each node also stores the character that labels the edge to the pointed node.
[0]
|
a| d b
[1]---[2]---[4]
| |
c| a| e d
[3] [5]---[6]---[9]
|
d| e
[7]---[8]
In this way, a trie is similar to a binary tree. Although this implementation is more efficient in terms of space, the search for a label with a given character leaving a node n is no more constant but proportional to the number of children of n. In the previous example, it is necessary to traverse three edges to reach node 9 from node 4 with character d.
This class is used by the class FactorSequence to build a linked list of factors of a sequence in a LZ78 fashion, i.e. where each factor is the longest factor previously seen plus one character.
class FactorSequence,
class FactorSequenceprotected Trie son
A pointer to the first of this node's children.
protected char to_son
The character that labels the edge from this node to the child node pointer by son.
protected Trie sibling
A pointer to this node's next sibling.
protected char to_sibling
The character that labels the edge from this node to the sibling pointer by sibling.
protected java.lang.Object data
This node's stored data.
public Trie(java.lang.Object data)
Creates a new trie node with the specified data. This constructor is typically used by the client only once to instantiate the root node. After that, all new nodes are implicitly instantiated by the add method.
data - the data that will be associated with the new nodepublic java.lang.Object getData()
Returns the data associated with this node.
public Trie add(java.lang.Object data, char c)
Adds a new child to this node. The new node will be implicitly instantiated with the data argument, and the edge from this node to the new node will be labelled by the character argument. If this node already have an edge labelled with this character, an exception is raised. Otherwise, the new node created and returned.
If this node have no child, a new node is created straight away. Otherwise, the task is assigned to its first child that will add the new node as a sibling.
data - the data that will be associated with the new nodec - the character that will label the edge from this node to the new nodeIllegalStateException - if this node already have an edge labelled by cprotected Trie addSibling(java.lang.Object data, char c)
Adds a sibling to this node. The new node will be implicitly instantiated with the data argument, and the edge from this node to the new node will be labelled by the character argument. If this node already have a sibling with this character, an exception is raised. Otherwise, the new node is created and returned.
If this node have no direct sibling, a new node is created straight away. Otherwise, the task is assigned to its next sibling.
data - the data that will be associated with the new nodec - the character that will label the edge from this node to the new nodeIllegalStateException - if this node already have an edge labelled by cpublic Trie spellDown(char c)
Follows a path from this node to one of its children by spelling the character supplied as an argument. If there is no such a path, null is returned. Otherwise, the reached child node is returned.
If this node's direct child is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c - the character that labels the path to be followed to this node's childcprotected Trie spellRight(char c)
Follows a path from this node to one of its sibling by spelling the character supplied as an argument. If there is no such a path, null is returned. Otherwise, the reached sibling node is returned.
If this node's direct sibling is reached with an edge labelled by the character, it is returned straight away. Otherwise, it is assigned the task of finding another sibling labelled with that character.
c - the character that labels the path to be followed to the siblingc