Class ImmutableFST

  • All Implemented Interfaces:
    Iterable<ByteBuffer>

    public final class ImmutableFST
    extends FST
    FST binary format implementation

    This version indicates the dictionary was built with these flags: FSTFlags.FLEXIBLE, FSTFlags.STOPBIT and FSTFlags.NEXTBIT. The internal representation of the FST must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).

     ---- this node header present only if automaton was compiled with NUMBERS option.
     Byte
            +-+-+-+-+-+-+-+-+\
          0 | | | | | | | | | \  LSB
            +-+-+-+-+-+-+-+-+  +
          1 | | | | | | | | |  |      number of strings recognized
            +-+-+-+-+-+-+-+-+  +----- by the automaton starting
            : : : : : : : : :  |      from this node.
            +-+-+-+-+-+-+-+-+  +
      ctl-1 | | | | | | | | | /  MSB
            +-+-+-+-+-+-+-+-+/
    
     ---- remaining part of the node
     Length of output symbols dictionary -- Integer
     
     
     
     .
     .
     .
      (Length)
    
     Byte
           +-+-+-+-+-+-+-+-+\
         0 | | | | | | | | | +------ label
           +-+-+-+-+-+-+-+-+/
    
                      +------------- node pointed to is next
                      | +----------- the last arc of the node
                      | | +--------- the arc is final
                      | | |
                 +-----------+
                 |    | | |  |
             ___+___  | | |  |
            /       \ | | |  |
           MSB           LSB |
            7 6 5 4 3 2 1 0  |
           +-+-+-+-+-+-+-+-+ |
         1 | | | | | | | | | \ \
           +-+-+-+-+-+-+-+-+  \ \  LSB
           +-+-+-+-+-+-+-+-+     +
         2 | | | | | | | | |     |
           +-+-+-+-+-+-+-+-+     |
         3 | | | | | | | | |     +----- target node address (in bytes)
           +-+-+-+-+-+-+-+-+     |      (not present except for the byte
           : : : : : : : : :     |       with flags if the node pointed to
           +-+-+-+-+-+-+-+-+     +       is next)
       gtl | | | | | | | | |    /  MSB
           +-+-+-+-+-+-+-+-+   /
     gtl+1                           (gtl = gotoLength)
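
The arc layout above can be sketched in code as follows. This is a minimal illustration derived only from the diagram, not the implementation used by ImmutableFST. The bit positions (bit 0 final, bit 1 last, bit 2 next) are read off the picture, and the class name ArcDecoder and its methods are hypothetical.

```java
/** Hypothetical helper that decodes one arc from a raw byte array laid out as in the diagram. */
final class ArcDecoder {
    // Assumed bit positions, read off the diagram: bit 0 = final, bit 1 = last, bit 2 = next.
    static final int BIT_FINAL_ARC = 1 << 0;
    static final int BIT_LAST_ARC = 1 << 1;
    static final int BIT_TARGET_NEXT = 1 << 2;
    // The flags/address field starts right after the one-byte label.
    static final int ADDRESS_OFFSET = 1;

    /** Byte 0 of the arc is its label. */
    static byte label(byte[] arcs, int arc) {
        return arcs[arc];
    }

    /** The three lowest bits of byte 1 carry the flags. */
    static int flags(byte[] arcs, int arc) {
        return arcs[arc + ADDRESS_OFFSET] & 0x07;
    }

    /** Reads gotoLength bytes, least significant byte first, then drops the three flag bits. */
    static int targetAddress(byte[] arcs, int arc, int gotoLength) {
        int value = 0;
        for (int i = gotoLength - 1; i >= 0; i--) {
            value = (value << 8) | (arcs[arc + ADDRESS_OFFSET + i] & 0xFF);
        }
        return value >>> 3;
    }

    /** Label plus flags byte only when the target node is next, otherwise label plus full goto field. */
    static int arcSize(byte[] arcs, int arc, int gotoLength) {
        boolean targetIsNext = (flags(arcs, arc) & BIT_TARGET_NEXT) != 0;
        return targetIsNext ? 1 + 1 : 1 + gotoLength;
    }
}
```

With gotoLength = 2, the bytes 0x63 0x09 following a label decode to the flags FINAL and LAST and to the target address 300, because 0x0963 shifted right by three bits is 300.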
     
    • Field Detail

      • DEFAULT_FILLER

        public static final byte DEFAULT_FILLER
        Default filler byte.
        See Also:
        Constant Field Values
      • DEFAULT_ANNOTATION

        public static final byte DEFAULT_ANNOTATION
        Default annotation byte.
        See Also:
        Constant Field Values
      • VERSION

        public static final byte VERSION
        Automaton version as in the file header.
        See Also:
        Constant Field Values
      • BIT_FINAL_ARC

        public static final int BIT_FINAL_ARC
        Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
        See Also:
        Constant Field Values
      • BIT_LAST_ARC

        public static final int BIT_LAST_ARC
        Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
        See Also:
        Constant Field Values
      • BIT_TARGET_NEXT

        public static final int BIT_TARGET_NEXT
        Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
        See Also:
        Constant Field Values
      • ADDRESS_OFFSET

        public static final int ADDRESS_OFFSET
        The offset within the arc structure at which the address and flags field begins. In version 5 of FST automata, this value is constant (1, to skip the label byte).
        See Also:
        Constant Field Values
      • _mutableBytesStore

        public final OffHeapMutableBytesStore _mutableBytesStore
        An off-heap byte store holding the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
      • _nodeDataLength

        public final int _nodeDataLength
        The length of the node header structure (if the automaton was compiled with NUMBERS option). Otherwise zero.
      • _gotoLength

        public final int _gotoLength
        Number of bytes each address takes in full, expanded form (goto length).
      • _filler

        public final byte _filler
        Filler character.
      • _annotation

        public final byte _annotation
        Annotation character.
    • Method Detail

      • getRootNode

        public int getRootNode()
        Returns the start node of this automaton.
        Specified by:
        getRootNode in class FST
        Returns:
        Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
      • getFirstArc

        public int getFirstArc​(int node)
        Specified by:
        getFirstArc in class FST
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
      • getNextArc

        public int getNextArc​(int arc)
        Specified by:
        getNextArc in class FST
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns the identifier of the arc that follows arc among the arcs leaving the same node. Zero is returned if no more arcs are available for the node.
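
Because both getFirstArc and getNextArc use 0 as the "no arc" marker, all arcs of a node can be enumerated with a plain loop. The sketch below assumes a hypothetical interface ArcIterable that mirrors only the three documented signatures it needs; it is not the real FST class.

```java
/** Hypothetical subset of the traversal methods documented here; not the real FST class. */
interface ArcIterable {
    int getFirstArc(int node);
    int getNextArc(int arc);
    byte getArcLabel(int arc);
}

final class ArcListing {
    /** Returns the labels of all arcs leaving node, in storage order. */
    static byte[] labelsOf(ArcIterable fst, int node) {
        // First pass: count the arcs so the result array can be sized exactly.
        int count = 0;
        for (int arc = fst.getFirstArc(node); arc != 0; arc = fst.getNextArc(arc)) {
            count++;
        }
        // Second pass: copy the labels.
        byte[] labels = new byte[count];
        int i = 0;
        for (int arc = fst.getFirstArc(node); arc != 0; arc = fst.getNextArc(arc)) {
            labels[i++] = fst.getArcLabel(arc);
        }
        return labels;
    }
}
```

A node without outgoing arcs yields an empty array, since getFirstArc returns 0 and neither loop body runs.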
      • getArc

        public int getArc​(int node,
                          byte label)
        Specified by:
        getArc in class FST
        Parameters:
        node - Identifier of the node.
        label - The arc's label.
        Returns:
        Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
      • getEndNode

        public int getEndNode​(int arc)
        Specified by:
        getEndNode in class FST
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns the end node pointed to by the given arc. Terminal arcs (those that point to a terminal state) have no end node representation, so calling this method on one throws a runtime exception.
      • getArcLabel

        public byte getArcLabel​(int arc)
        Specified by:
        getArcLabel in class FST
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns the label associated with the given arc.
      • getOutputSymbol

        public int getOutputSymbol​(int arc)
        Description copied from class: FST
        Get output symbol for the given arc
        Specified by:
        getOutputSymbol in class FST
        Parameters:
        arc - Arc for which the output symbol is requested
        Returns:
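        Output symbol for the arc. Because the return type is the primitive int, null cannot be returned; how a missing output symbol is reported is not specified here.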
      • isArcFinal

        public boolean isArcFinal​(int arc)
        Specified by:
        isArcFinal in class FST
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
      • isArcTerminal

        public boolean isArcTerminal​(int arc)
        Specified by:
        isArcTerminal in class FST
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if this arc does not have a terminating node (FST.getEndNode(int) will throw an exception). Implies FST.isArcFinal(int).
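
The interplay of getArc, isArcFinal, isArcTerminal and getEndNode can be illustrated with a membership test that follows one arc per input byte. This is a sketch under assumptions: the interface Traversable is hypothetical and mirrors only the documented signatures, and treating the empty sequence as not accepted is a choice made here because the final flag lives on arcs.

```java
/** Hypothetical subset of the methods documented here; not the real FST class. */
interface Traversable {
    int getRootNode();
    int getArc(int node, byte label);
    int getEndNode(int arc);
    boolean isArcFinal(int arc);
    boolean isArcTerminal(int arc);
}

final class SequenceMatcher {
    /** Returns true if the automaton marks the last byte of sequence with a final arc. */
    static boolean accepts(Traversable fst, byte[] sequence) {
        int node = fst.getRootNode();
        // Root 0 means an empty automaton; an empty input has no arc that could be final.
        if (node == 0 || sequence.length == 0) {
            return false;
        }
        for (int i = 0; i < sequence.length; i++) {
            int arc = fst.getArc(node, sequence[i]);
            if (arc == 0) {
                return false; // no outgoing arc with this label
            }
            if (i == sequence.length - 1) {
                return fst.isArcFinal(arc);
            }
            if (fst.isArcTerminal(arc)) {
                return false; // input continues but the arc has no end node
            }
            node = fst.getEndNode(arc);
        }
        return false;
    }
}
```

Checking isArcTerminal before getEndNode avoids the runtime exception described for terminal arcs.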
      • getRightLanguageCount

        public int getRightLanguageCount​(int node)
        Returns the number encoded at the given node. The number equals the count of the set of suffixes reachable from node (called its right language).
        Overrides:
        getRightLanguageCount in class FST
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the number of sequences reachable from the given state (that is, the size of the state's right language) if the automaton was compiled with FSTFlags.NUMBERS.
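
According to the class-level diagram, the node header stores this number in consecutive bytes, least significant byte first. A minimal decoding sketch follows; it assumes that a node identifier is the byte offset of the node header in a plain byte array, and the class name NodeNumbers is hypothetical.

```java
/** Hypothetical decoder for the NUMBERS node header shown in the class-level diagram. */
final class NodeNumbers {
    /** Reads nodeDataLength bytes starting at node, least significant byte first. */
    static int rightLanguageCount(byte[] data, int node, int nodeDataLength) {
        int count = 0;
        // Walk from the most significant byte down so each step shifts in one more byte.
        for (int i = nodeDataLength - 1; i >= 0; i--) {
            count = (count << 8) | (data[node + i] & 0xFF);
        }
        return count;
    }
}
```

For a two-byte header holding 0x10 0x27, the decoded count is 10000. A header length of zero, which is the case when NUMBERS was not used, yields 0.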
      • getFlags

        public Set<FSTFlags> getFlags()

        For this automaton version, an additional FSTFlags.NUMBERS flag may be set to indicate the automaton contains extra fields for each node.

        Specified by:
        getFlags in class FST
        Returns:
        Returns a set of flags for this FST instance.
      • isArcLast

        public boolean isArcLast​(int arc)
        Returns true if this arc has the LAST bit set.
        Specified by:
        isArcLast in class FST
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if the argument is the last arc of a node.
        See Also:
        BIT_LAST_ARC
      • isNextSet

        public boolean isNextSet​(int arc)
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if BIT_TARGET_NEXT is set for this arc.
        See Also:
        BIT_TARGET_NEXT