Class ImmutableFST
- java.lang.Object
  - org.apache.pinot.segment.local.utils.nativefst.FST
    - org.apache.pinot.segment.local.utils.nativefst.ImmutableFST
- All Implemented Interfaces:
Iterable<ByteBuffer>
public final class ImmutableFST extends FST
FST binary format implementation.

This version indicates the dictionary was built with these flags: FSTFlags.FLEXIBLE, FSTFlags.STOPBIT and FSTFlags.NEXTBIT. The internal representation of the FST must therefore follow the layout below. Note that this format describes only a single transition (arc), not the entire dictionary file.

---- this node header present only if automaton was compiled with NUMBERS option.

Byte
           +-+-+-+-+-+-+-+-+\
         0 | | | | | | | | | \  LSB
           +-+-+-+-+-+-+-+-+  +
         1 | | | | | | | | |  |      number of strings recognized
           +-+-+-+-+-+-+-+-+  +----- by the automaton starting
           : : : : : : : : :  |      from this node.
           +-+-+-+-+-+-+-+-+  +
     ctl-1 | | | | | | | | | /  MSB
           +-+-+-+-+-+-+-+-+/

---- remaining part of the node

Length of output symbols dictionary -- Integer
. . . (Length)

Byte
           +-+-+-+-+-+-+-+-+\
         0 | | | | | | | | | +------ label
           +-+-+-+-+-+-+-+-+/

                      +------------- node pointed to is next
                      | +----------- the last arc of the node
                      | | +--------- the arc is final
                      | | |
                 +-----------+
                 |    | | |  |
              ___+___ | | |  |
            /       \ | | |  |
           MSB          LSB  |
            7 6 5 4 3 2 1 0  |
           +-+-+-+-+-+-+-+-+ |
         1 | | | | | | | | | \ \
           +-+-+-+-+-+-+-+-+  \ \  LSB
           +-+-+-+-+-+-+-+-+     +
         2 | | | | | | | | |     |
           +-+-+-+-+-+-+-+-+     |
         3 | | | | | | | | |     +----- target node address (in bytes)
           +-+-+-+-+-+-+-+-+     |      (not present except for the byte
           : : : : : : : : :     |       with flags if the node pointed to
           +-+-+-+-+-+-+-+-+     +       is next)
       gtl | | | | | | | | |    /  MSB
           +-+-+-+-+-+-+-+-+   /
     gtl+1                           (gtl = gotoLength)
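The arc layout in the class description can be read back with a few bit operations. The sketch below decodes one arc record from a plain byte[] (the real class reads from an OffHeapMutableBytesStore). ADDRESS_OFFSET = 1 comes from this page. Several other details are assumptions taken from the morfologik FSA5 format this layout appears to be derived from: the flag-bit values (BIT_FINAL_ARC = 1, BIT_LAST_ARC = 2, BIT_TARGET_NEXT = 4) and a goto field that stores the address shifted left by three bits, least significant byte first. The class name ArcRecordDecoder is hypothetical.

```java
// Sketch only: decodes one arc record as drawn in the class description.
// ADDRESS_OFFSET = 1 is documented; the three flag-bit values are ASSUMED
// (borrowed from the morfologik FSA5 format this layout mirrors).
final class ArcRecordDecoder {
  static final int ADDRESS_OFFSET = 1;        // the label occupies byte 0
  static final int BIT_FINAL_ARC = 1 << 0;    // assumed value
  static final int BIT_LAST_ARC = 1 << 1;     // assumed value
  static final int BIT_TARGET_NEXT = 1 << 2;  // assumed value

  /** Reads n bytes starting at start as an unsigned little-endian integer (LSB first). */
  static int decodeLittleEndian(byte[] bytes, int start, int n) {
    int r = 0;
    for (int i = n - 1; i >= 0; i--) {
      r = (r << 8) | (bytes[start + i] & 0xFF);
    }
    return r;
  }

  /** Describes the arc that starts at offset arc, given the goto length (gtl). */
  static String describe(byte[] arcs, int arc, int gotoLength) {
    char label = (char) (arcs[arc] & 0xFF);
    int flags = arcs[arc + ADDRESS_OFFSET] & 0xFF;
    boolean isFinal = (flags & BIT_FINAL_ARC) != 0;
    boolean isLast = (flags & BIT_LAST_ARC) != 0;
    boolean targetIsNext = (flags & BIT_TARGET_NEXT) != 0;
    // With the NEXT bit set only the flags byte is stored; otherwise gotoLength bytes
    // hold (address << 3 | flags), least significant byte first.
    String target = targetIsNext
        ? "next"
        : String.valueOf(decodeLittleEndian(arcs, arc + ADDRESS_OFFSET, gotoLength) >>> 3);
    return "label=" + label + " final=" + isFinal + " last=" + isLast + " target=" + target;
  }

  public static void main(String[] args) {
    // Arc 'a' -> address 300 with FINAL and LAST set: 300 << 3 | 0b011 = 0x0963,
    // stored little-endian in gotoLength = 2 bytes as 0x63, 0x09.
    byte[] arcs = {(byte) 'a', (byte) 0x63, (byte) 0x09};
    System.out.println(describe(arcs, 0, 2)); // label=a final=true last=true target=300
  }
}
```

Because byte 1 carries both the three flag bits and the low bits of the address, a single masked read answers the flag queries. Only arcs without the NEXT bit pay for the full gotoLength bytes.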
Field Summary
- byte _annotation: Annotation character.
- byte _filler: Filler character.
- int _gotoLength: Number of bytes each address takes in full, expanded form (goto length).
- OffHeapMutableBytesStore _mutableBytesStore: An array of bytes with the internal representation of the automaton.
- int _nodeDataLength: The length of the node header structure (if the automaton was compiled with the NUMBERS option).
- Map<Integer,Integer> _outputSymbols
- static int ADDRESS_OFFSET: An offset in the arc structure, where the address and flags field begins.
- static int BIT_FINAL_ARC: Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
- static int BIT_LAST_ARC: Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
- static int BIT_TARGET_NEXT: Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
- static byte DEFAULT_ANNOTATION: Default annotation byte.
- static byte DEFAULT_FILLER: Default filler byte.
- static byte VERSION: Automaton version as in the file header.
Method Summary
- int getArc(int node, byte label)
- byte getArcLabel(int arc)
- int getEndNode(int arc)
- int getFirstArc(int node)
- Set<FSTFlags> getFlags()
- int getNextArc(int arc)
- int getOutputSymbol(int arc): Gets the output symbol for the given arc.
- int getRightLanguageCount(int node): Returns the number encoded at the given node.
- int getRootNode(): Returns the start node of this automaton.
- boolean isArcFinal(int arc)
- boolean isArcLast(int arc): Returns true if this arc has the BIT_LAST_ARC bit set.
- boolean isArcTerminal(int arc)
- boolean isNextSet(int arc)
Methods inherited from class org.apache.pinot.segment.local.utils.nativefst.FST
buildMap, getSequences, getSequences, iterator, printToString, read, read, read, readRemaining, save, toString, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Field Detail
DEFAULT_FILLER
public static final byte DEFAULT_FILLER
Default filler byte.
- See Also: Constant Field Values
DEFAULT_ANNOTATION
public static final byte DEFAULT_ANNOTATION
Default annotation byte.
- See Also: Constant Field Values
VERSION
public static final byte VERSION
Automaton version as in the file header.
- See Also: Constant Field Values
BIT_FINAL_ARC
public static final int BIT_FINAL_ARC
Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
- See Also: Constant Field Values
BIT_LAST_ARC
public static final int BIT_LAST_ARC
Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
- See Also: Constant Field Values
BIT_TARGET_NEXT
public static final int BIT_TARGET_NEXT
Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
- See Also: Constant Field Values
ADDRESS_OFFSET
public static final int ADDRESS_OFFSET
The offset within the arc structure at which the address and flags field begins. In version 5 of FST automata this value is constant: 1, which skips the one-byte label.
- See Also: Constant Field Values
_mutableBytesStore
public final OffHeapMutableBytesStore _mutableBytesStore
The bytes holding the internal representation of the automaton. See the documentation of this class for how this structure is organized.
_nodeDataLength
public final int _nodeDataLength
The length of the node header structure if the automaton was compiled with the NUMBERS option; otherwise zero.
_gotoLength
public final int _gotoLength
Number of bytes each address takes in full, expanded form (goto length).
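_gotoLength is the width of the goto field. The sketch below shows the write-side counterpart of reading that field: it packs an address and the three low flag bits into gotoLength bytes and rejects addresses that do not fit. The 3-bit flag width and the little-endian byte order are assumptions consistent with the layout diagram, and GotoFieldCodec is a hypothetical name rather than part of the Pinot API.

```java
// Sketch only: packs an address plus the 3 flag bits into a goto field of gotoLength
// bytes (LSB first), and decodes the address back. The 3-bit flag width is ASSUMED.
final class GotoFieldCodec {
  static byte[] encode(int address, int flags, int gotoLength) {
    long value = ((long) address << 3) | (flags & 0x07); // flags share the lowest byte
    if (value >>> (8 * gotoLength) != 0) {
      throw new IllegalArgumentException(
          "address " + address + " needs more than " + gotoLength + " bytes");
    }
    byte[] out = new byte[gotoLength];
    for (int i = 0; i < gotoLength; i++) {
      out[i] = (byte) (value >>> (8 * i)); // least significant byte first
    }
    return out;
  }

  static int decodeAddress(byte[] field) {
    int r = 0;
    for (int i = field.length - 1; i >= 0; i--) {
      r = (r << 8) | (field[i] & 0xFF);
    }
    return r >>> 3; // drop the flag bits
  }

  public static void main(String[] args) {
    byte[] field = encode(300, 0b011, 2);
    System.out.printf("%02x %02x -> address %d%n", field[0], field[1], decodeAddress(field));
    // prints "63 09 -> address 300"
  }
}
```

With gotoLength = 2, the largest encodable address is 8191, because the three flag bits consume part of the 16 available bits.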
_filler
public final byte _filler
Filler character.
_annotation
public final byte _annotation
Annotation character.
Method Detail
getRootNode
public int getRootNode()
Returns the start node of this automaton.
- Specified by: getRootNode in class FST
- Returns: the identifier of the root node of this automaton, or 0 if the start node is also the end node (the automaton is empty).
getFirstArc
public int getFirstArc(int node)
- Specified by: getFirstArc in class FST
- Parameters: node - Identifier of the node.
- Returns: the identifier of the first arc leaving node, or 0 if the node has no outgoing arcs.
getNextArc
public int getNextArc(int arc)
- Specified by: getNextArc in class FST
- Parameters: arc - The arc's identifier.
- Returns: the identifier of the next arc after arc that leaves the same node, or 0 if the node has no more arcs.
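getFirstArc and getNextArc together give the usual loop over a node's outgoing arcs. The sketch below assumes a layout like the one in the class description. The first arc starts right after the optional node header (node + nodeDataLength). An arc occupies 2 bytes when BIT_TARGET_NEXT is set and 1 + gotoLength bytes otherwise, and 0 serves as the "no more arcs" sentinel. These rules, the flag values and the class name ArcWalk are assumptions modelled on morfologik FSA5, not confirmed details of ImmutableFST.

```java
// Sketch only: walking the arcs of one node in an FSA5-style byte layout.
// Flag values and the arc-size rule are ASSUMPTIONS.
final class ArcWalk {
  static final int ADDRESS_OFFSET = 1;
  static final int BIT_LAST_ARC = 1 << 1;    // assumed value
  static final int BIT_TARGET_NEXT = 1 << 2; // assumed value

  private final byte[] arcs;
  private final int nodeDataLength; // 0 unless built with the NUMBERS option
  private final int gotoLength;

  ArcWalk(byte[] arcs, int nodeDataLength, int gotoLength) {
    this.arcs = arcs;
    this.nodeDataLength = nodeDataLength;
    this.gotoLength = gotoLength;
  }

  /** Arcs start right after the (optional) node header. */
  int getFirstArc(int node) {
    return node + nodeDataLength;
  }

  /** Returns 0 after the last arc; otherwise skips over the current arc record. */
  int getNextArc(int arc) {
    return isArcLast(arc) ? 0 : skipArc(arc);
  }

  boolean isArcLast(int arc) {
    return (arcs[arc + ADDRESS_OFFSET] & BIT_LAST_ARC) != 0;
  }

  boolean isNextSet(int arc) {
    return (arcs[arc + ADDRESS_OFFSET] & BIT_TARGET_NEXT) != 0;
  }

  /** Label + flags byte when NEXT is set, label + full goto field otherwise. */
  int skipArc(int arc) {
    return arc + 1 + (isNextSet(arc) ? 1 : gotoLength);
  }

  String labels(int node) {
    StringBuilder sb = new StringBuilder();
    for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
      sb.append((char) (arcs[arc] & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Offset 0 is padding so that 0 can mean "no arc". Node at offset 1:
    //   arc 'a' at 1: goto field 0x30 0x00 (address 6, no flags set)
    //   arc 'b' at 4: flags 0x06 (LAST | NEXT), its target node follows at offset 6
    byte[] arcs = {0, 'a', 0x30, 0x00, 'b', 0x06};
    System.out.println(new ArcWalk(arcs, 0, 2).labels(1)); // ab
  }
}
```

Since the target node physically follows the arc, only a node's last arc can plausibly use the NEXT bit. A non-last arc is followed by its sibling.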
getArc
public int getArc(int node, byte label)
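getArc(node, label) carries no description on this page. One plausible implementation is a linear scan over the node's arcs using getFirstArc, getNextArc and getArcLabel, returning 0 when no arc matches. The sketch below shows that approach. The nested ArcNavigator interface and the toy navigator are hypothetical stand-ins for the FST methods, and the actual ImmutableFST implementation may differ.

```java
// Sketch only: getArc(node, label) as a linear scan over a node's arcs. The nested
// ArcNavigator interface is HYPOTHETICAL; it just names the three FST methods used.
final class ArcLookup {
  interface ArcNavigator {
    int getFirstArc(int node);
    int getNextArc(int arc);
    byte getArcLabel(int arc);
  }

  /** Returns the arc leaving node whose label equals label, or 0 if there is none. */
  static int getArc(ArcNavigator fst, int node, byte label) {
    for (int arc = fst.getFirstArc(node); arc != 0; arc = fst.getNextArc(arc)) {
      if (fst.getArcLabel(arc) == label) {
        return arc;
      }
    }
    return 0;
  }

  /** Toy navigator: node 10 owns arcs 11, 12 and 13, labelled 'x', 'y' and 'z'. */
  static ArcNavigator toy() {
    byte[] labels = new byte[14];
    labels[11] = 'x';
    labels[12] = 'y';
    labels[13] = 'z';
    return new ArcNavigator() {
      @Override public int getFirstArc(int node) { return node == 10 ? 11 : 0; }
      @Override public int getNextArc(int arc) { return arc < 13 ? arc + 1 : 0; }
      @Override public byte getArcLabel(int arc) { return labels[arc]; }
    };
  }

  public static void main(String[] args) {
    ArcNavigator fst = toy();
    System.out.println(getArc(fst, 10, (byte) 'y')); // 12
    System.out.println(getArc(fst, 10, (byte) 'q')); // 0
  }
}
```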
getEndNode
public int getEndNode(int arc)
- Specified by: getEndNode in class FST
- Parameters: arc - The arc's identifier.
- Returns: the end node pointed to by arc. Terminal arcs (those that point to a terminal state) have no end node representation, and calling this method on one throws a runtime exception.
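The end node of an arc is found in one of two ways. If BIT_TARGET_NEXT is set, the target starts immediately after the 2-byte arc record. Otherwise, the address is decoded from the goto field. The sketch below also treats a decoded address of 0 as a terminal arc, which is how it makes getEndNode throw and isArcTerminal return true. That 0-address convention, the flag value and the class name EndNodeSketch are assumptions modelled on morfologik FSA5.

```java
// Sketch only: resolving an arc's end node. ASSUMPTIONS: BIT_TARGET_NEXT = 4, the goto
// field stores (address << 3 | flags) LSB first, and address 0 marks a terminal arc.
final class EndNodeSketch {
  static final int ADDRESS_OFFSET = 1;
  static final int BIT_TARGET_NEXT = 1 << 2; // assumed value

  static int destinationOffset(byte[] arcs, int arc, int gotoLength) {
    if ((arcs[arc + ADDRESS_OFFSET] & BIT_TARGET_NEXT) != 0) {
      return arc + 2; // target node starts right after this 2-byte arc (label + flags)
    }
    int r = 0;
    for (int i = gotoLength - 1; i >= 0; i--) {
      r = (r << 8) | (arcs[arc + ADDRESS_OFFSET + i] & 0xFF);
    }
    return r >>> 3; // drop the three flag bits
  }

  static boolean isArcTerminal(byte[] arcs, int arc, int gotoLength) {
    return destinationOffset(arcs, arc, gotoLength) == 0;
  }

  static int getEndNode(byte[] arcs, int arc, int gotoLength) {
    int node = destinationOffset(arcs, arc, gotoLength);
    if (node == 0) {
      throw new IllegalStateException("terminal arc at offset " + arc + " has no end node");
    }
    return node;
  }

  public static void main(String[] args) {
    // arc 'a' at 1: flags 0x01 (FINAL, assumed value) and address 0, so it is terminal
    // arc 'b' at 4: flags 0x06 (LAST | NEXT), so its end node starts at offset 6
    byte[] arcs = {0, 'a', 0x01, 0x00, 'b', 0x06};
    System.out.println(isArcTerminal(arcs, 1, 2)); // true
    System.out.println(getEndNode(arcs, 4, 2));    // 6
  }
}
```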
getArcLabel
public byte getArcLabel(int arc)
- Specified by: getArcLabel in class FST
- Parameters: arc - The arc's identifier.
- Returns: the label associated with arc.
getOutputSymbol
public int getOutputSymbol(int arc)
Description copied from class: FST
Gets the output symbol for the given arc.
- Specified by: getOutputSymbol in class FST
- Parameters: arc - Arc for which the output symbol is requested.
- Returns: the output symbol, null if not present.
isArcFinal
public boolean isArcFinal(int arc)
- Specified by: isArcFinal in class FST
- Parameters: arc - The arc's identifier.
- Returns: true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
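The final and terminal properties work together when enumerating the sequences an automaton accepts. A final arc completes an accepted sequence. A terminal arc has no end node, so a traversal must not descend past it. The depth-first sketch below demonstrates this on a hypothetical table-based toy automaton that accepts "ab", "ac" and "b". It does not reproduce how the inherited getSequences or iterator methods are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a depth-first walk showing how isArcFinal and isArcTerminal cooperate.
// The table-based toy automaton is HYPOTHETICAL and accepts "ab", "ac" and "b".
final class SequenceWalk {
  // Arc ids start at 1 so 0 can mean "no arc"; a TARGET of 0 marks a terminal arc.
  static final char[] LABEL = {0, 'a', 'b', 'b', 'c'};
  static final boolean[] FINAL = {false, false, true, true, true};
  static final boolean[] LAST = {false, false, true, false, true};
  static final int[] TARGET = {0, 2, 0, 0, 0};
  static final int[] FIRST_ARC = {0, 1, 3}; // node 1 is the root, node 2 follows 'a'

  static int getFirstArc(int node) { return FIRST_ARC[node]; }
  static int getNextArc(int arc) { return LAST[arc] ? 0 : arc + 1; }
  static boolean isArcFinal(int arc) { return FINAL[arc]; }
  static boolean isArcTerminal(int arc) { return TARGET[arc] == 0; }

  static int getEndNode(int arc) {
    if (isArcTerminal(arc)) {
      throw new IllegalStateException("terminal arc " + arc);
    }
    return TARGET[arc];
  }

  /** Each final arc completes one accepted sequence; only non-terminal arcs are followed. */
  static void collect(int node, StringBuilder prefix, List<String> out) {
    for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
      prefix.append(LABEL[arc]);
      if (isArcFinal(arc)) {
        out.add(prefix.toString());
      }
      if (!isArcTerminal(arc)) {
        collect(getEndNode(arc), prefix, out);
      }
      prefix.setLength(prefix.length() - 1); // backtrack
    }
  }

  static List<String> sequences() {
    List<String> out = new ArrayList<>();
    collect(1, new StringBuilder(), out);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(sequences()); // [ab, ac, b]
  }
}
```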
isArcTerminal
public boolean isArcTerminal(int arc)
- Specified by: isArcTerminal in class FST
- Parameters: arc - The arc's identifier.
- Returns: true if this arc does not have a terminating node (FST.getEndNode(int) will throw an exception). Implies FST.isArcFinal(int).
getRightLanguageCount
public int getRightLanguageCount(int node)
Returns the number encoded at the given node. The number equals the size of the set of suffixes reachable from node (called its right language).
- Overrides: getRightLanguageCount in class FST
- Parameters: node - Identifier of the node.
- Returns: the number of sequences reachable from the given state if the automaton was compiled with FSTFlags.NUMBERS; in other words, the size of the right language of the state.
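With FSTFlags.NUMBERS, the node header in the layout diagram holds this count in _nodeDataLength bytes, with the least significant byte at byte 0 and the most significant byte at ctl-1. The sketch below decodes such a header from a plain byte[]. The class name and the toy bytes are hypothetical, and the real method reads from the off-heap byte store.

```java
// Sketch only: reading the NUMBERS node header. Per the layout diagram, the count
// occupies nodeDataLength bytes starting at the node offset, least significant first.
final class RightLanguageCount {
  static int getRightLanguageCount(byte[] arcs, int node, int nodeDataLength) {
    int r = 0;
    for (int i = nodeDataLength - 1; i >= 0; i--) {
      r = (r << 8) | (arcs[node + i] & 0xFF); // accumulate from MSB down to LSB
    }
    return r;
  }

  public static void main(String[] args) {
    // Node at offset 1 with a 2-byte header 0x2C 0x01 (= 0x012C = 300),
    // followed by the label of its first arc.
    byte[] arcs = {0, 0x2C, 0x01, 'x'};
    System.out.println(getRightLanguageCount(arcs, 1, 2)); // 300
  }
}
```

Because the header sits in front of the arcs, a node's first arc under this layout would start at node + _nodeDataLength rather than at node itself.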
getFlags
public Set<FSTFlags> getFlags()
For this automaton version, an additional FSTFlags.NUMBERS flag may be set to indicate that the automaton contains extra fields for each node.
isArcLast
public boolean isArcLast(int arc)
Returns true if this arc has the BIT_LAST_ARC bit set.
- Specified by: isArcLast in class FST
- Parameters: arc - The node's arc identifier.
- Returns: true if the argument is the last arc of a node.
- See Also: BIT_LAST_ARC
isNextSet
public boolean isNextSet(int arc)
- Parameters: arc - The node's arc identifier.
- Returns: true if BIT_TARGET_NEXT is set for this arc.
- See Also: BIT_TARGET_NEXT