Class FST
- java.lang.Object
  - org.apache.pinot.segment.local.utils.nativefst.FST
- All Implemented Interfaces:
Iterable<ByteBuffer>
- Direct Known Subclasses:
ConstantArcSizeFST, ImmutableFST
public abstract class FST extends Object implements Iterable<ByteBuffer>
This is the top-level abstract class for handling finite state automata. These automata are arc-based, a design described in Jan Daciuk's "Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing" (PhD thesis, Technical University of Gdansk).
-
Constructor Summary
- FST()
-
Method Summary
- protected Map<Integer,Integer> buildMap(String inputString): Build a map from a serialized string
- abstract int getArc(int node, byte label)
- abstract byte getArcLabel(int arc)
- abstract int getEndNode(int arc)
- abstract int getFirstArc(int node)
- abstract Set<FSTFlags> getFlags()
- abstract int getNextArc(int arc)
- abstract int getOutputSymbol(int arc): Get output symbol for the given arc
- int getRightLanguageCount(int node)
- abstract int getRootNode()
- Iterable<ByteBuffer> getSequences()
- Iterable<ByteBuffer> getSequences(int node): Returns an iterator over all binary sequences starting at the given FST state (node) and ending in final nodes.
- abstract boolean isArcFinal(int arc)
- abstract boolean isArcLast(int arc)
- abstract boolean isArcTerminal(int arc)
- Iterator<ByteBuffer> iterator(): Returns an iterator over all binary sequences starting from the initial FST state (node) and ending in final nodes.
- static String printToString(FST fst): Print to String
- static FST read(InputStream stream): Wrapper for the main read function
- static FST read(InputStream stream, boolean hasOutputSymbols, PinotDataBufferMemoryManager memoryManager): A factory for reading automata in any of the supported versions.
- static <T extends FST> T read(InputStream stream, Class<? extends T> clazz, boolean hasOutputSymbols): A factory for reading a specific FST subclass, including proper casting.
- protected static byte[] readRemaining(InputStream in, int length)
- int save(FileOutputStream fileOutputStream)
- String toString(): Returns a string representation of this automaton.
- <T extends StateVisitor> T visitInPostOrder(T v): Same as visitInPostOrder(StateVisitor, int), starting from root automaton node.
- <T extends StateVisitor> T visitInPostOrder(T v, int node): Visits all states reachable from node in postorder.
- <T extends StateVisitor> T visitInPreOrder(T v): Same as visitInPreOrder(StateVisitor, int), starting from root automaton node.
- <T extends StateVisitor> T visitInPreOrder(T v, int node): Visits all states in preorder.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Method Detail
-
readRemaining
protected static byte[] readRemaining(InputStream in, int length) throws IOException
- Parameters:
in - The input stream.
length - Length of input to be read.
- Returns:
The remaining bytes, up to length, read from the input stream and returned as a byte array. Null if no data was read.
- Throws:
IOException - Rethrown if an I/O exception occurs.
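The contract described above (read at most length bytes, return null when nothing was read) can be sketched as follows. This is a minimal illustration, not Pinot's implementation: the class ReadRemainingSketch and the method readUpTo are hypothetical names, and the chunk size is an arbitrary assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadRemainingSketch {
  // Hypothetical stand-in for the documented contract of readRemaining.
  static byte[] readUpTo(InputStream in, int length) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int remaining = length;
    while (remaining > 0) {
      // A single read may return fewer bytes than requested, so loop.
      int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
      if (n < 0) {
        break; // end of stream reached before 'length' bytes were available
      }
      out.write(chunk, 0, n);
      remaining -= n;
    }
    // Mirror the documented behaviour: null when no data was read.
    return out.size() == 0 ? null : out.toByteArray();
  }
}
```

The loop matters because InputStream.read may deliver a partial chunk; a single call is not enough to honour "up to length".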
-
read
public static FST read(InputStream stream) throws IOException
Wrapper for the main read function.
- Throws:
IOException
-
read
public static FST read(InputStream stream, boolean hasOutputSymbols, PinotDataBufferMemoryManager memoryManager) throws IOException
A factory for reading automata in any of the supported versions.
- Parameters:
stream - The input stream to read automaton data from. The stream is not closed.
- Returns:
An instantiated automaton. Never null.
- Throws:
IOException - If the input stream does not represent an automaton or is otherwise invalid.
-
read
public static <T extends FST> T read(InputStream stream, Class<? extends T> clazz, boolean hasOutputSymbols) throws IOException
A factory for reading a specific FST subclass, including proper casting.
- Type Parameters:
T - A subclass of FST to cast the read automaton to.
- Parameters:
stream - The input stream to read automaton data from. The stream is not closed.
clazz - A subclass of FST to cast the read automaton to.
- Returns:
An instantiated automaton. Never null.
- Throws:
IOException - If the input stream does not represent an automaton, is otherwise invalid, or the class of the automaton read from the input stream is not assignable to clazz.
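The "not assignable to clazz" failure mode can be illustrated with a small generic helper. This is only a sketch of the checked-cast idea: TypedReadSketch and castOrFail are hypothetical names that do not exist in FST, and the actual method may perform the check differently.

```java
import java.io.IOException;

final class TypedReadSketch {
  // Hypothetical helper: narrows an already-read object to the requested class,
  // reporting a mismatch as an IOException as the documentation describes.
  static <T> T castOrFail(Object automaton, Class<? extends T> clazz) throws IOException {
    if (!clazz.isInstance(automaton)) {
      String actual = automaton == null ? "null" : automaton.getClass().getName();
      throw new IOException("Automaton of class " + actual
          + " is not assignable to " + clazz.getName());
    }
    return clazz.cast(automaton);
  }
}
```

Using Class.isInstance before Class.cast turns what would be a ClassCastException into the checked exception the caller already has to handle.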
-
getRootNode
public abstract int getRootNode()
- Returns:
The identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
-
getFirstArc
public abstract int getFirstArc(int node)
- Parameters:
node - Identifier of the node.
- Returns:
The identifier of the first arc leaving node, or 0 if the node has no outgoing arcs.
-
getNextArc
public abstract int getNextArc(int arc)
- Parameters:
arc - The arc's identifier.
- Returns:
The identifier of the next arc after arc that leaves the same node. Zero is returned if no more arcs are available for the node.
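Because both getFirstArc and getNextArc return 0 when there is nothing left, the outgoing arcs of a node can be enumerated with a plain for loop. The sketch below shows the loop shape on a toy arc table. The table layout (arcs of a node stored contiguously, node identifier equal to the index of its first arc) and the class name ArcWalkSketch are assumptions made for illustration, not a description of Pinot's storage format.

```java
import java.util.ArrayList;
import java.util.List;

final class ArcWalkSketch {
  // Toy arc table (hypothetical). Index 0 is reserved to mean "no arc".
  static final byte[] LABEL = {0, 'a', 'b', 'c'};
  // LAST[i] marks the last outgoing arc of the node that arc i belongs to.
  static final boolean[] LAST = {false, false, false, true};

  // In this toy layout a node identifier is the index of its first arc.
  static int getFirstArc(int node) {
    return node;
  }

  static int getNextArc(int arc) {
    return LAST[arc] ? 0 : arc + 1;
  }

  // Collects the labels of all arcs leaving 'node'.
  static List<Character> labelsOf(int node) {
    List<Character> labels = new ArrayList<>();
    for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
      labels.add((char) LABEL[arc]);
    }
    return labels;
  }
}
```

With the toy table, labelsOf(1) yields [a, b, c]. Against a real FST instance the loop has the same shape, with the calls made on the instance and getArcLabel(arc) used to read the label.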
-
getArc
public abstract int getArc(int node, byte label)
- Parameters:
node - Identifier of the node.
label - The arc's label.
- Returns:
The identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
-
getArcLabel
public abstract byte getArcLabel(int arc)
- Parameters:
arc - The arc's identifier.
- Returns:
The label associated with the given arc.
-
getOutputSymbol
public abstract int getOutputSymbol(int arc)
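Get output symbol for the given arc.
- Parameters:
arc - Arc for which the output symbol is requested.
- Returns:
The output symbol. Because the return type is the primitive int, absence cannot be reported as null; how a missing symbol is signalled is left to the implementing subclass.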
-
isArcFinal
public abstract boolean isArcFinal(int arc)
- Parameters:
arc - The arc's identifier.
- Returns:
true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
-
isArcTerminal
public abstract boolean isArcTerminal(int arc)
- Parameters:
arc - The arc's identifier.
- Returns:
true if this arc does not have a terminating node (getEndNode(int) will throw an exception). Implies isArcFinal(int).
-
getEndNode
public abstract int getEndNode(int arc)
- Parameters:
arc - The arc's identifier.
- Returns:
The end node pointed to by the given arc. Terminal arcs (those that point to a terminal state) have no end node representation; calling this method on one throws a runtime exception.
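The arc accessors documented above (getArc, isArcFinal, isArcTerminal, getEndNode) are typically combined to test whether a byte sequence is stored in the automaton. The sketch below walks a toy automaton holding "a", "ab" and "c". Everything here is an assumption for illustration: the class AcceptSketch, the array layout, and the choice of IllegalStateException as the runtime exception.

```java
import java.nio.charset.StandardCharsets;

final class AcceptSketch {
  // Toy arc table (hypothetical). Index 0 means "no arc" or "no node".
  // Root node 1 has arcs 1 ('a') and 2 ('c'); node 3 has arc 3 ('b').
  static final byte[] LABEL = {0, 'a', 'c', 'b'};
  static final int[] TARGET = {0, 3, 0, 0}; // 0 target marks a terminal arc
  static final boolean[] FINAL = {false, true, true, true};
  static final boolean[] LAST = {false, false, true, true};

  static int getRootNode() {
    return 1;
  }

  // Linear scan over the contiguous arcs of 'node'; 0 if no arc carries 'label'.
  static int getArc(int node, byte label) {
    for (int arc = node; arc != 0; arc = LAST[arc] ? 0 : arc + 1) {
      if (LABEL[arc] == label) {
        return arc;
      }
    }
    return 0;
  }

  static boolean isArcFinal(int arc) {
    return FINAL[arc];
  }

  static boolean isArcTerminal(int arc) {
    return TARGET[arc] == 0;
  }

  static int getEndNode(int arc) {
    if (isArcTerminal(arc)) {
      throw new IllegalStateException("terminal arc has no end node");
    }
    return TARGET[arc];
  }

  // Returns true if 'word' is one of the sequences stored in the toy automaton.
  static boolean accepts(String word) {
    byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
    int node = getRootNode();
    for (int i = 0; i < bytes.length; i++) {
      int arc = getArc(node, bytes[i]);
      if (arc == 0) {
        return false; // no outgoing arc with this label
      }
      if (i == bytes.length - 1) {
        return isArcFinal(arc); // last byte: the arc must end a stored sequence
      }
      if (isArcTerminal(arc)) {
        return false; // input continues but the automaton does not
      }
      node = getEndNode(arc);
    }
    return false; // empty input is not stored in this toy automaton
  }
}
```

Checking isArcTerminal before getEndNode avoids the runtime exception the documentation warns about.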
-
getFlags
public abstract Set<FSTFlags> getFlags()
- Returns:
A set of flags for this FST instance.
-
getRightLanguageCount
public int getRightLanguageCount(int node)
- Parameters:
node - Identifier of the node.
- Returns:
The number of sequences reachable from the given state, if the automaton was compiled with FSTFlags.NUMBERS. In other words, the size of the right language of the state.
- Throws:
UnsupportedOperationException - If the automaton was not compiled with FSTFlags.NUMBERS. The value can then be computed by manually counting the sequences returned by getSequences(int).
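The fallback mentioned above, counting the sequences by hand when the precomputed number is unavailable, might look like the following sketch. The class RightLanguageCountSketch and the method countOrFallback are hypothetical; the precomputed count is passed in as an IntSupplier so the example stays independent of any FST instance.

```java
import java.nio.ByteBuffer;
import java.util.function.IntSupplier;

final class RightLanguageCountSketch {
  // Tries the precomputed count first; if it is unsupported, counts the
  // sequences one by one (linear in the number of sequences).
  static int countOrFallback(IntSupplier precomputed, Iterable<ByteBuffer> sequences) {
    try {
      return precomputed.getAsInt();
    } catch (UnsupportedOperationException e) {
      int count = 0;
      for (ByteBuffer ignored : sequences) {
        count++;
      }
      return count;
    }
  }
}
```

With a real automaton the call would presumably be countOrFallback(() -> fst.getRightLanguageCount(node), fst.getSequences(node)), where fst and node are supplied by the caller.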
-
getSequences
public Iterable<ByteBuffer> getSequences(int node)
Returns an iterable over all binary sequences starting at the given FST state (node) and ending in final nodes. This corresponds to a set of suffixes of a given prefix from all sequences stored in the automaton.
The iterator obtained from it returns a ByteBuffer whose contents change on each call to Iterator.next(). To keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.
Important: It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Parameters:
node - Identifier of the starting node from which to return subsequences.
- Returns:
An iterable over all sequences encoded starting at the given node.
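Since the buffer's contents are overwritten on each call to Iterator.next(), collecting the sequences requires copying each one as it is produced. The sketch below imitates that behaviour with a hypothetical iterable (reusing) that recycles a single array-backed buffer, and shows a copy routine (copyAll) that preserves every sequence. Both names, and the 16-byte buffer size, are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class BufferCopySketch {
  // Hypothetical iterable that reuses one backing array for every element,
  // imitating the documented behaviour. Words must be at most 16 bytes long.
  static Iterable<ByteBuffer> reusing(byte[][] words) {
    return () -> new Iterator<ByteBuffer>() {
      private final byte[] shared = new byte[16];
      private int next = 0;

      @Override
      public boolean hasNext() {
        return next < words.length;
      }

      @Override
      public ByteBuffer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        byte[] w = words[next++];
        System.arraycopy(w, 0, shared, 0, w.length);
        // Content starts at index 0 of the backing array, as documented.
        return ByteBuffer.wrap(shared, 0, w.length);
      }
    };
  }

  // Copies each sequence into its own array before advancing the iterator.
  static List<byte[]> copyAll(Iterable<ByteBuffer> sequences) {
    List<byte[]> out = new ArrayList<>();
    for (ByteBuffer bb : sequences) {
      byte[] copy = new byte[bb.remaining()];
      bb.duplicate().get(copy); // duplicate() leaves the source position untouched
      out.add(copy);
    }
    return out;
  }
}
```

Storing the ByteBuffer references themselves instead of copies would leave every list entry pointing at the same backing array, and therefore at the most recently produced sequence.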
-
getSequences
public final Iterable<ByteBuffer> getSequences()
- Returns:
All sequences encoded in the automaton.
-
iterator
public final Iterator<ByteBuffer> iterator()
Returns an iterator over all binary sequences starting from the initial FST state (node) and ending in final nodes. The iterator returns a ByteBuffer whose contents change on each call to Iterator.next(). To keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.
Important: It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
- Specified by:
iterator in interface Iterable<ByteBuffer>
-
visitInPostOrder
public <T extends StateVisitor> T visitInPostOrder(T v)
Same as visitInPostOrder(StateVisitor, int), starting from root automaton node.
- Type Parameters:
T - A subclass of StateVisitor.
- Parameters:
v - Visitor to receive traversal calls.
- Returns:
The argument (for access to anonymous class fields).
-
visitInPostOrder
public <T extends StateVisitor> T visitInPostOrder(T v, int node)
Visits all states reachable from node in postorder. Returning false from StateVisitor.accept(int) immediately terminates the traversal.
- Type Parameters:
T - A subclass of StateVisitor.
- Parameters:
v - Visitor to receive traversal calls.
node - Identifier of the node.
- Returns:
The argument (for access to anonymous class fields).
-
visitInPreOrder
public <T extends StateVisitor> T visitInPreOrder(T v)
Same as visitInPreOrder(StateVisitor, int), starting from root automaton node.
- Type Parameters:
T - A subclass of StateVisitor.
- Parameters:
v - Visitor to receive traversal calls.
- Returns:
The argument (for access to anonymous class fields).
-
visitInPreOrder
public <T extends StateVisitor> T visitInPreOrder(T v, int node)
Visits all states reachable from node in preorder. Returning false from StateVisitor.accept(int) skips traversal of all sub-states of a given state.
- Type Parameters:
T - A subclass of StateVisitor.
- Parameters:
v - Visitor to receive traversal calls.
node - Identifier of the node.
- Returns:
The argument (for access to anonymous class fields).
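The two traversal orders treat a false return from the visitor differently: preorder skips the sub-states of the rejected state, while postorder stops the whole traversal. The sketch below shows both on a toy state graph. The Visitor interface is a hypothetical stand-in for StateVisitor, the NEXT table is invented, and visiting each shared state only once (tracked with a BitSet) is an assumption of this sketch rather than documented behaviour.

```java
import java.util.BitSet;

final class TraversalSketch {
  // Hypothetical stand-in for StateVisitor.
  interface Visitor {
    boolean accept(int state);
  }

  // Toy state graph: state -> successor states. State 3 is shared by 1 and 2.
  static final int[][] NEXT = {{1, 2}, {3}, {3}, {}};

  // Preorder: visit the state first; a false result skips its sub-states only.
  static void preOrder(Visitor v, int state, BitSet seen) {
    if (seen.get(state)) {
      return;
    }
    seen.set(state);
    if (!v.accept(state)) {
      return;
    }
    for (int s : NEXT[state]) {
      preOrder(v, s, seen);
    }
  }

  // Postorder: visit sub-states first; a false result aborts the whole traversal.
  static boolean postOrder(Visitor v, int state, BitSet seen) {
    if (seen.get(state)) {
      return true;
    }
    seen.set(state);
    for (int s : NEXT[state]) {
      if (!postOrder(v, s, seen)) {
        return false;
      }
    }
    return v.accept(state);
  }
}
```

Starting at state 0 with a visitor that always returns true, preOrder visits 0, 1, 3, 2 and postOrder visits 3, 1, 2, 0. A visitor can record results in its own fields, which is why the documented methods return the visitor argument.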
-
buildMap
protected Map<Integer,Integer> buildMap(String inputString)
Build a map from a serialized string.
- Parameters:
inputString - Serialized string.
-
isArcLast
public abstract boolean isArcLast(int arc)
-
save
public int save(FileOutputStream fileOutputStream)
-