it.unimi.dsi.mg4j.search
Class AbstractDocumentIterator

java.lang.Object
  extended by it.unimi.dsi.fastutil.ints.AbstractIntIterator
      extended by it.unimi.dsi.mg4j.search.AbstractDocumentIterator
All Implemented Interfaces:
IntIterator, DocumentIterator, Iterable<Interval>, Iterator<Integer>
Direct Known Subclasses:
AbstractCompositeDocumentIterator, AlignDocumentIterator, DifferenceDocumentIterator, DocumentalConcatenatedClusterDocumentIterator, DocumentalMergedClusterDocumentIterator, FalseDocumentIterator, LowPassDocumentIterator, NotDocumentIterator, PayloadPredicateDocumentIterator, TrueDocumentIterator

public abstract class AbstractDocumentIterator
extends AbstractIntIterator
implements DocumentIterator

An abstract iterator on documents that implements hasNext() and nextInt() using nextDocument(), and provides support for the DocumentIterator.weight()/DocumentIterator.weight(double) methods.

As explained elsewhere, since MG4J 1.2 the iteration logic has been made fully lazy, and the standard IntIterator methods are available as a convenience; however, their use in performance-sensitive environments is strongly discouraged. The fully lazy implementation needs some bridging to be accessible through java.util's semi-lazy iterators, and this class provides the necessary code. In MG4J 4.0 the class has been redesigned and is not backward compatible, but this should not be a problem unless you implemented your own document iterators.
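The bridging mentioned above can be illustrated with a minimal sketch that adapts a fully lazy source, which returns -1 once exhausted, to the JDK's semi-lazy java.util.PrimitiveIterator.OfInt protocol. LazyDocumentSource, LazyToIntIterator and ArraySource are hypothetical names introduced for illustration; this is not MG4J's code, and wrapping IOException in UncheckedIOException is an assumption of the sketch, made because hasNext() cannot throw checked exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Hypothetical fully lazy source: nextDocument() returns -1 once exhausted. */
interface LazyDocumentSource {
    int nextDocument() throws IOException;
}

/** Adapts a fully lazy source to java.util's semi-lazy hasNext()/nextInt() protocol. */
final class LazyToIntIterator implements PrimitiveIterator.OfInt {
    private final LazyDocumentSource source;
    private int pending;   // document fetched by hasNext() but not yet returned
    private boolean ahead; // true iff pending holds a document not yet returned

    LazyToIntIterator(LazyDocumentSource source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (ahead) return true; // a previous hasNext() already fetched a document
        try {
            pending = source.nextDocument();
        } catch (IOException e) {
            // hasNext() cannot throw checked exceptions, so wrap (assumption of this sketch)
            throw new UncheckedIOException(e);
        }
        ahead = pending != -1;
        return ahead;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        ahead = false; // the pending document is now consumed
        return pending;
    }
}

/** Hypothetical array-backed source, for demonstration only. */
final class ArraySource implements LazyDocumentSource {
    private final int[] docs;
    private int pos;

    ArraySource(int... docs) {
        this.docs = docs;
    }

    @Override
    public int nextDocument() {
        return pos < docs.length ? docs[pos++] : -1;
    }
}
```

Each element now costs two calls (hasNext() and nextInt()) instead of one, which is why the text discourages these methods in performance-sensitive code.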

Instances of this class expect implementations to keep track of the current document of the iterator. The special value -1 denotes an iterator that has not yet been accessed, and the special value DocumentIterator.END_OF_LIST denotes an iterator that has been exhausted.

This class keeps track of whether it is ahead, that is, whether hasNext() has been called but neither nextInt() nor nextDocument() has been called since: in this case, curr has not yet been returned by nextInt() or nextDocument(). ahead may be true only if curr is neither -1 nor DocumentIterator.END_OF_LIST.
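The states described in the two paragraphs above can be summarized by a small model of the (curr, ahead) pair. This is a hypothetical sketch, not MG4J code: the class name IteratorStateModel is invented, and END_OF_LIST is given the placeholder value Integer.MAX_VALUE, which this page does not specify.

```java
/** Hypothetical model of the (curr, ahead) state; not MG4J code. */
final class IteratorStateModel {
    /** Placeholder for DocumentIterator.END_OF_LIST (actual value assumed). */
    static final int END_OF_LIST = Integer.MAX_VALUE;

    int curr = -1;         // -1: the iterator has not yet been accessed
    boolean ahead = false; // true: curr was fetched by hasNext() but not yet returned

    /** The invariant stated above: ahead implies that curr is a real document. */
    boolean invariantHolds() {
        return !ahead || (curr != -1 && curr != END_OF_LIST);
    }

    /** Human-readable name of the current state. */
    String describe() {
        if (curr == -1) return "not yet accessed";
        if (curr == END_OF_LIST) return "exhausted";
        return ahead ? "ahead (document " + curr + " not yet returned)"
                     : "on document " + curr;
    }
}
```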

Concrete subclasses must implement a nextDocumentInternal() method that contains only the actual iteration logic, ignoring ahead and assuming that curr is not DocumentIterator.END_OF_LIST. Moreover, ahead must always be set to false in DocumentIterator.skipTo(int).
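A minimal sketch of this contract follows, with a hypothetical base class MiniDocumentIterator standing in for AbstractDocumentIterator and an array-backed subclass. It assumes that nextDocumentInternal() advances curr and returns it (storing END_OF_LIST at the end), and that END_OF_LIST is Integer.MAX_VALUE; neither detail is confirmed by this page. Unlike the real hasNext(), the sketch's hasNext() simply declares IOException.

```java
import java.io.IOException;

/** Hypothetical, reduced stand-in for AbstractDocumentIterator. */
abstract class MiniDocumentIterator {
    static final int END_OF_LIST = Integer.MAX_VALUE; // assumed sentinel value
    protected int curr = -1;                           // -1: not yet accessed
    protected boolean ahead;

    /** Pure iteration logic: advances curr and returns it; never looks at ahead. */
    protected abstract int nextDocumentInternal() throws IOException;

    /** Returns the next document, or -1 if no more documents are available. */
    public int nextDocument() throws IOException {
        if (ahead) {           // hasNext() already fetched curr: just hand it out
            ahead = false;
            return curr;
        }
        if (curr == END_OF_LIST) return -1; // never run the internal logic past the end
        int d = nextDocumentInternal();
        return d == END_OF_LIST ? -1 : d;
    }

    public boolean hasNext() throws IOException {
        if (!ahead) ahead = nextDocument() != -1;
        return ahead;
    }
}

/** Hypothetical subclass over a strictly increasing array of document pointers. */
final class ArrayDocumentIterator extends MiniDocumentIterator {
    private final int[] docs;
    private int pos;

    ArrayDocumentIterator(int... docs) {
        this.docs = docs;
    }

    @Override
    protected int nextDocumentInternal() {
        return curr = pos < docs.length ? docs[pos++] : END_OF_LIST;
    }

    /** Moves to the first document >= target (END_OF_LIST if none); always clears ahead. */
    public int skipTo(int target) {
        ahead = false;
        if (curr != -1 && curr >= target) return curr;
        while (pos < docs.length && docs[pos] < target) pos++;
        return curr = pos < docs.length ? docs[pos++] : END_OF_LIST;
    }
}
```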

Methods performing actions that depend on the last document returned should throw an IllegalStateException if called when ahead is true, or if curr is -1 or DocumentIterator.END_OF_LIST. To do so, it suffices to call ensureOnADocument().
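The check described above might look like the following sketch. PositionGuard and guardedDocument() are hypothetical names, the exception messages are invented, and END_OF_LIST is assumed to be Integer.MAX_VALUE; the page does not document the body of ensureOnADocument(), so this only mirrors the three conditions listed in the previous paragraph.

```java
/** Hypothetical model of ensureOnADocument(); not MG4J code. */
class PositionGuard {
    static final int END_OF_LIST = Integer.MAX_VALUE; // assumed sentinel value
    protected int curr = -1;
    protected boolean ahead;

    /** Throws unless the iterator is positioned on a document that was already returned. */
    protected final void ensureOnADocument() {
        if (ahead) throw new IllegalStateException("hasNext() was called: the current document has not been returned yet");
        if (curr == -1) throw new IllegalStateException("the iterator has not been accessed yet");
        if (curr == END_OF_LIST) throw new IllegalStateException("the iterator has been exhausted");
    }

    /** Hypothetical method depending on the last returned document: guard first, then act. */
    public int guardedDocument() {
        ensureOnADocument();
        return curr;
    }
}
```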

Finally, toNextDocument(int) will turn the value of curr into a suitable return value for nextDocument() (as DocumentIterator.END_OF_LIST needs to be massaged).
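The two conversions are simple enough to sketch, assuming (as the surrounding text implies) that nextDocument() signals the end with -1 while curr stores DocumentIterator.END_OF_LIST. The class name DocumentValueConversions and the value Integer.MAX_VALUE for END_OF_LIST are assumptions of this sketch.

```java
/** Hypothetical sketch of the conversions between curr and nextDocument() values. */
final class DocumentValueConversions {
    /** Placeholder for DocumentIterator.END_OF_LIST (actual value assumed). */
    static final int END_OF_LIST = Integer.MAX_VALUE;

    private DocumentValueConversions() {}

    /** curr (possibly END_OF_LIST) to a nextDocument() return value (-1 at the end). */
    static int toNextDocument(int curr) {
        return curr == END_OF_LIST ? -1 : curr;
    }

    /** A nextDocument() return value (-1 at the end) to a value suitable for curr. */
    static int fromNextDocument(int d) {
        return d == -1 ? END_OF_LIST : d;
    }
}
```

In this sketch, storing a value larger than any document pointer in curr lets comparison-based code, such as taking the minimum over several iterators, treat exhausted iterators uniformly; this is an observation about the sketch, not a statement of MG4J's rationale.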


Nested Class Summary
protected static class AbstractDocumentIterator.AbstractIntervalIterator
           
 
Field Summary
protected  boolean ahead
          Whether this iterator is ahead.
protected  int curr
          The current document of the iterator.
protected  double weight
          The weight of this iterator.
 
Fields inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator
END_OF_LIST
 
Constructor Summary
AbstractDocumentIterator()
           
 
Method Summary
 int document()
          Returns the current document.
protected  void ensureOnADocument()
           
protected static int fromNextDocument(int d)
          Turns a value returned by nextDocument() into a valid value for curr.
 boolean hasNext()
          Checks whether ahead is true; if not, sets ahead if nextDocument() returns a document.
 IntervalIterator iterator()
          Invokes DocumentIterator.intervalIterator()
 int nextDocument()
          Returns the next document provided by this document iterator, or -1 if no more documents are available.
protected abstract  int nextDocumentInternal()
           
 int nextInt()
          Deprecated. 
protected static int toNextDocument(int curr)
          Turns the value of the argument into a valid return value of nextDocument()
 double weight()
          Returns the weight associated with this iterator.
 DocumentIterator weight(double weight)
          Sets the weight of this index iterator.
 
Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator
next, remove, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator
accept, acceptOnTruePaths, dispose, indices, intervalIterator, intervalIterator, intervalIterators, skipTo
 
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator
skip
 
Methods inherited from interface java.util.Iterator
next, remove
 

Field Detail

curr

protected int curr
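The current document of the iterator. The special value -1 means that the iterator has not yet been accessed, and the special value DocumentIterator.END_OF_LIST means that it has been exhausted.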

See Also:
AbstractDocumentIterator

ahead

protected boolean ahead
Whether this iterator is ahead.

See Also:
AbstractDocumentIterator

weight

protected double weight
The weight of this iterator.

Constructor Detail

AbstractDocumentIterator

public AbstractDocumentIterator()
Method Detail

toNextDocument

protected static int toNextDocument(int curr)
Turns the value of the argument into a valid return value of nextDocument()

Parameters:
curr - a value for curr, including possibly DocumentIterator.END_OF_LIST.
Returns:
the correct return value for nextDocument().

fromNextDocument

protected static int fromNextDocument(int d)
Turns a value returned by nextDocument() into a valid value for curr.

Parameters:
d - a value returned by nextDocument().
Returns:
the correct return value for curr.

weight

public double weight()
Description copied from interface: DocumentIterator
Returns the weight associated with this iterator.

The number returned by this method has no fixed semantics: different scorers might choose different interpretations, or even ignore it.

Specified by:
weight in interface DocumentIterator
Returns:
the weight associated with this iterator.

weight

public DocumentIterator weight(double weight)
Description copied from interface: DocumentIterator
Sets the weight of this index iterator.

Specified by:
weight in interface DocumentIterator
Parameters:
weight - the weight of this index iterator.
Returns:
this document iterator.
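Because the setter returns the iterator itself, a weight can be assigned inline, for instance while passing an iterator as an argument. The following sketch mirrors the weight()/weight(double) pair with a hypothetical class, WeightedDocumentIterator; it is not MG4J code, and its default weight of 0.0 is just Java's field default, since this page does not state one.

```java
/** Hypothetical class mirroring the weight()/weight(double) pair; not MG4J code. */
final class WeightedDocumentIterator {
    private double weight; // Java's default 0.0; the real default is not stated here

    /** Returns the weight; its interpretation is left to the scorer. */
    public double weight() {
        return weight;
    }

    /** Sets the weight and returns this iterator, so calls can be chained. */
    public WeightedDocumentIterator weight(double weight) {
        this.weight = weight;
        return this;
    }
}
```

For example, new WeightedDocumentIterator().weight(2.5).weight() evaluates to 2.5.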

iterator

public IntervalIterator iterator()
Invokes DocumentIterator.intervalIterator()

Specified by:
iterator in interface DocumentIterator
Specified by:
iterator in interface Iterable<Interval>
Returns:
DocumentIterator.intervalIterator().

hasNext

public boolean hasNext()
Checks whether ahead is true; if not, sets ahead if nextDocument() returns a document.

Specified by:
hasNext in interface Iterator<Integer>
Returns:
true if ahead is true or nextDocument() returns a document.

nextInt

@Deprecated
public int nextInt()
Deprecated. 

Checks whether there is an element to be returned; if so, sets ahead to false and returns curr.

Specified by:
nextInt in interface IntIterator
Specified by:
nextInt in interface DocumentIterator
Overrides:
nextInt in class AbstractIntIterator
Returns:
curr.
See Also:
DocumentIterator.nextDocument()

nextDocumentInternal

protected abstract int nextDocumentInternal()
                                     throws IOException
Throws:
IOException

nextDocument

public int nextDocument()
                 throws IOException
Description copied from interface: DocumentIterator
Returns the next document provided by this document iterator, or -1 if no more documents are available.

Warning: the specification of this method changed significantly as of MG4J 1.2. The special return value -1 is used to mark the end of iteration (previously a NoSuchElementException would have been thrown in that case, so no harm should be caused by this change). The reason for this change is to provide fully lazy iteration over documents. Fully lazy iteration does not provide a hasNext() method: you have to actually ask for the next element and check the return value. Fully lazy iteration requires half as many method calls and, in most (if not all) MG4J classes, leads to much simpler logic. Moreover, DocumentIterator.nextDocument() can be declared as throwing an IOException, which avoids the pernicious proliferation of try/catch blocks in very short, low-level methods (these blocks had a detectable impact on performance).
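The claim that fully lazy iteration needs about half the method calls can be checked with a small counting sketch: consuming n documents costs n + 1 calls to nextDocument() (the last one returns -1), versus 2n + 1 calls to hasNext()/nextInt(). CallCountingSource is a hypothetical class written for this illustration; it implements both protocols directly over an array.

```java
import java.util.NoSuchElementException;

/** Hypothetical source that counts calls, to compare the two iteration styles. */
final class CallCountingSource {
    private final int[] docs;
    private int pos;
    int calls; // total number of iteration-method calls made so far

    CallCountingSource(int... docs) {
        this.docs = docs;
    }

    /** Fully lazy protocol: returns the next document, or -1 at the end. */
    int nextDocument() {
        calls++;
        return pos < docs.length ? docs[pos++] : -1;
    }

    /** Semi-lazy protocol, as in java.util.Iterator. */
    boolean hasNext() {
        calls++;
        return pos < docs.length;
    }

    int nextInt() {
        calls++;
        if (pos >= docs.length) throw new NoSuchElementException();
        return docs[pos++];
    }

    /** One call per document, plus a final call that returns -1. */
    static int sumLazy(CallCountingSource s) {
        int sum = 0;
        int d;
        while ((d = s.nextDocument()) != -1) sum += d;
        return sum;
    }

    /** Two calls per document, plus a final hasNext() that returns false. */
    static int sumSemiLazy(CallCountingSource s) {
        int sum = 0;
        while (s.hasNext()) sum += s.nextInt();
        return sum;
    }
}
```

For four documents the lazy loop makes 5 calls and the semi-lazy loop makes 9.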

Specified by:
nextDocument in interface DocumentIterator
Returns:
the next document, or -1 if no more documents are available.
Throws:
IOException

ensureOnADocument

protected final void ensureOnADocument()

document

public int document()
Returns the current document.

Specified by:
document in interface DocumentIterator
Returns:
curr.
Throws:
IllegalStateException - if ahead is true.