Class RegexpMatcher


  • public class RegexpMatcher
    extends Object
    RegexpMatcher is a helper that retrieves the values matching a given regexp query. The regexp query is converted into an automaton, and the matching algorithm is run against an FST. The two main functions of this class are regexMatchOnFST(), which runs the matching on the FST (see the method comments for details), and match(input), which builds the automaton and matches the given input against it.
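A minimal sketch of the automaton side, i.e. roughly what match(input) does once an automaton exists. It assumes a hand-built deterministic automaton for the hypothetical pattern ab*c instead of compiling the regexp with a parser as the real class does; DfaMatchSketch and its transition table are illustrative names, not part of Pinot or Lucene. The input matches only if every character has a transition and the walk ends in an accepting state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a hand-built DFA for the regexp "ab*c".
// The real RegexpMatcher obtains its automaton from a regexp parser instead.
final class DfaMatchSketch {
    // TRANSITIONS.get(state).get(ch) = next state; a missing entry means "no transition".
    private static final Map<Integer, Map<Character, Integer>> TRANSITIONS = new HashMap<>();
    private static final int START = 0;
    private static final int ACCEPT = 2;

    static {
        addTransition(0, 'a', 1); // "a" moves from the start state to state 1
        addTransition(1, 'b', 1); // "b*" loops on state 1
        addTransition(1, 'c', 2); // "c" reaches the accepting state
    }

    private static void addTransition(int from, char ch, int to) {
        TRANSITIONS.computeIfAbsent(from, k -> new HashMap<>()).put(ch, to);
    }

    // Returns true if the whole input drives the DFA from START into ACCEPT.
    static boolean match(String input) {
        int state = START;
        for (char ch : input.toCharArray()) {
            Integer next = TRANSITIONS.getOrDefault(state, Map.of()).get(ch);
            if (next == null) {
                return false; // dead end: no transition for this character
            }
            state = next;
        }
        return state == ACCEPT;
    }
}
```

For example, DfaMatchSketch.match("abbc") returns true, while DfaMatchSketch.match("ab") returns false because the walk stops in a non-accepting state.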
    • Constructor Detail

      • RegexpMatcher

        public RegexpMatcher(String regexQuery,
                             org.apache.lucene.util.fst.FST<Long> fst)
    • Method Detail

      • match

        public boolean match(String input)
      • regexMatchOnFST

        public List<Long> regexMatchOnFST()
                                   throws IOException
        This function runs the matching algorithm on the automaton built from regexQuery and on the FST. The FST maps each key (a String) to a value (a Long). Both are state machines whose transitions are driven by an input character. The algorithm starts with a queue containing the pair (automaton start state, FST start node). At each step an entry is popped from the queue:
        1) If the automaton state is accepting and the FST node is final (i.e. it ends a stored key), the value stored for that key is added to the result set.
        2) Whether or not a value was collected, the outgoing transitions of the automaton state are gathered; for each transition, the target node for that character is looked up from the current FST node, and each resulting (automaton state, FST node) pair is added to the queue.
        The process is guaranteed to terminate because every step advances along the FST, which is a DAG, towards its final nodes.
        Returns:
        the list of values stored in the FST for the keys that match the regexp query
        Throws:
        IOException
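The traversal described for regexMatchOnFST() can be sketched in plain Java under simplifying assumptions: a trie of String keys with a Long at each final node stands in for the Lucene FST (which is a minimized DAG and spreads outputs along its arcs), and a DFA with single-character transitions stands in for the regexp automaton (which uses character ranges). TrieNode, Dfa, ProductMatchSketch and regexMatchOnTrie are hypothetical names; this illustrates the product-traversal idea and is not the class's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the FST: a trie whose final nodes hold a Long value.
// A real Lucene FST is a minimized DAG that spreads outputs along its arcs instead.
final class TrieNode {
    final Map<Character, TrieNode> children = new HashMap<>();
    Long value; // non-null only on final nodes (nodes that end a stored key)

    void put(String key, long v) {
        TrieNode node = this;
        for (char ch : key.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new TrieNode());
        }
        node.value = v;
    }
}

// Hypothetical DFA with single-character transitions (the real automaton uses character ranges).
final class Dfa {
    final Map<Integer, Map<Character, Integer>> transitions = new HashMap<>();
    final Set<Integer> accepting = new HashSet<>();

    Dfa addTransition(int from, char ch, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(ch, to);
        return this;
    }
}

final class ProductMatchSketch {
    // One queue entry: a pair of (automaton state, trie node).
    private static final class StatePair {
        final int state;
        final TrieNode node;

        StatePair(int state, TrieNode node) {
            this.state = state;
            this.node = node;
        }
    }

    static List<Long> regexMatchOnTrie(Dfa dfa, int startState, TrieNode root) {
        List<Long> results = new ArrayList<>();
        Deque<StatePair> queue = new ArrayDeque<>();
        queue.add(new StatePair(startState, root));
        while (!queue.isEmpty()) {
            StatePair p = queue.poll();
            // Step 1: accepting automaton state + final trie node => collect the stored value.
            if (dfa.accepting.contains(p.state) && p.node.value != null) {
                results.add(p.node.value);
            }
            // Step 2: always expand, since a matching key may extend this one (e.g. "car" -> "cart").
            Map<Character, Integer> out = dfa.transitions.getOrDefault(p.state, Collections.emptyMap());
            for (Map.Entry<Character, Integer> t : out.entrySet()) {
                TrieNode child = p.node.children.get(t.getKey());
                if (child != null) { // keep only transitions the trie can also follow
                    queue.add(new StatePair(t.getValue(), child));
                }
            }
        }
        // Terminates: each step moves one level deeper into the finite, acyclic trie.
        return results;
    }
}
```

With the keys cat=1, car=2, cart=3 and dog=4 and a DFA for cart?, the sketch returns the values 2 and 3: after collecting the value for car it still expands to cart. A FIFO queue is used here, so results come out in breadth-first order; callers should not rely on any particular order.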