|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectit.unimi.dsi.fastutil.ints.AbstractIntIterator
it.unimi.dsi.mg4j.search.AbstractDocumentIterator
it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
public abstract class AbstractIntersectionDocumentIterator
An abstract iterator on documents, generating the intersection of the documents returned by a number of document iterators.
To be usable, this class must be subclassed so to provide also an iterator on intervals.
Such iterators must be instantiable using getComposedIntervalIterator(Index).
The latter is an example of a non-static factory method, that is, a factory method which
depends on the enclosing instance. This pattern allows to easily specialise this class to iterators
that do different things, such as AndDocumentIterator and
ConsecutiveDocumentIterator, but that have a similar semantics at the document
level (the semantics may in fact be slightly different: for instance, not all document belonging
to all components will actually appear in a consecutive iterator, as there may be documents filtered
at the interval level).
The important invariant is that only after a call to AbstractDocumentIterator.nextDocument(), a call
to intervalIterator(Index) will return an interval iterator over the document
just returned, and that for at least one index in AbstractCompositeDocumentIterator.indices() the iterator will not be empty
or TRUE.
Since MG4J 1.1, this class implements a new intersection algorithm that should be significantly faster than the previous one. The main idea is that of letting sparser iterator interact as much as possible to obtain a candidate common document, and then trying to align the others. At construction time, the component iterators are sorted so that index iterators are separated, and sorted by frequency. Then, each time we have to align the iterators we align them greedily starting from the index iterators, in frequency order. This has the effect of skipping very quickly (and usually by large jumps, which are handled nicely by indices with skips), as the main interaction happens between low-frequency index iterators.
Moreover, this class treats in a special way
index iterators coming from payload-based indices. Such
iterators are checked at the end of the alignment process,
after all standard index iterators (and general document iterators)
are aligned. At that point, the special method PayloadPredicateDocumentIterator.skipUnconditionallyTo(int)
is used to position unconditionally such iterators and check whether the payload predicate is satisfied.
If this doesn't happen, the current candidate (obtained by alignment of standard iterators) is increased and the
whole process is restarted. This procedure guarantees that we will never search exhaustively in a
payload-based index a document record satisfying the predicate (unless, of course, we have a query
containing just PayloadPredicateDocumentIterators), which is very efficient if the payload-based
index uses skipping.
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
|---|
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator |
| Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
|---|
AbstractDocumentIterator.AbstractIntervalIterator |
| Field Summary | |
|---|---|
protected Reference2ReferenceArrayMap<Index,IntervalIterator> |
currentIterators
A map from indices to the iterators returned for the current document. |
protected Reference2ReferenceArrayMap<Index,IntervalIterator> |
intervalIterators
A map from indices to interval iterators. |
protected Reference2ReferenceMap<Index,IntervalIterator> |
unmodifiableCurrentIterators
An unmodifiable wrapper around currentIterators. |
| Fields inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
|---|
documentIterator, indexIterator, indices, n, soleIndex |
| Fields inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
|---|
ahead, curr, weight |
| Fields inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator |
|---|
END_OF_LIST |
| Constructor Summary | |
|---|---|
protected |
AbstractIntersectionDocumentIterator(DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators. |
protected |
AbstractIntersectionDocumentIterator(Index index,
DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators and a given index. |
| Method Summary | |
|---|---|
protected abstract IntervalIterator |
getComposedIntervalIterator(Index index)
|
IntervalIterator |
intervalIterator(Index index)
Returns the interval iterator of this document iterator for the given index. |
Reference2ReferenceMap<Index,IntervalIterator> |
intervalIterators()
Returns an unmodifiable map from indices to interval iterators. |
protected int |
nextDocumentInternal()
|
int |
skipTo(int n)
Skips all documents smaller than n. |
| Methods inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
|---|
accept, acceptOnTruePaths, dispose, indices, intervalIterator, toString |
| Methods inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
|---|
document, ensureOnADocument, fromNextDocument, hasNext, iterator, nextDocument, nextInt, toNextDocument, weight, weight |
| Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator |
|---|
next, remove, skip |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator |
|---|
document, iterator, nextDocument, nextInt, weight, weight |
| Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator |
|---|
skip |
| Methods inherited from interface java.util.Iterator |
|---|
hasNext, next, remove |
| Field Detail |
|---|
protected final Reference2ReferenceArrayMap<Index,IntervalIterator> intervalIterators
protected final Reference2ReferenceArrayMap<Index,IntervalIterator> currentIterators
intervalIterators
because it could be TRUE (in fact, in that case it may even
happen that intervalIterators does not contain the index).
protected final Reference2ReferenceMap<Index,IntervalIterator> unmodifiableCurrentIterators
currentIterators.
| Constructor Detail |
|---|
protected AbstractIntersectionDocumentIterator(Index index,
DocumentIterator[] documentIterator)
throws IOException
index - an index that will be passed to AbstractCompositeDocumentIterator.AbstractCompositeDocumentIterator(Index, DocumentIterator...).documentIterator - the iterators to be intersected (at least one).
IOException
protected AbstractIntersectionDocumentIterator(DocumentIterator[] documentIterator)
throws IOException
documentIterator - the iterators to be insersected (at least one).
IOException| Method Detail |
|---|
public int skipTo(int n)
throws IOException
DocumentIteratorn. If Iterator.hasNext() has been called returning
true but DocumentIterator.nextDocument() has not been called afterwards, then a call
to DocumentIterator.skipTo(int) will be implicitly preceded by
a call to DocumentIterator.nextDocument() (the only consequence is that skipping to the current
document after a call to Iterator.hasNext() will return the next document).
Define the current document k associated with this document iterator
as follows:
DocumentIterator.nextDocument() and this method have never been called;
DocumentIterator.END_OF_LIST, if a call to this method returned DocumentIterator.END_OF_LIST, or
DocumentIterator.nextDocument() returned -1;
DocumentIterator.nextDocument() or this method, otherwise.
If k is larger than or equal to n, then
this method does nothing and returns k. Otherwise, a
call to this method is equivalent to
while( ( k = nextDocument() ) < n && k != -1 ); return k == -1 ? END_OF_LIST : k;
Thus, when a result k ≠ DocumentIterator.END_OF_LIST
is returned, the state of this iterator
will be exactly the same as after a call to DocumentIterator.nextDocument()
that returned k.
In particular, the first document larger than or equal to n (when returned
by this method) will not be returned by the next call to
DocumentIterator.nextDocument().
n - a document pointer.
n if available, DocumentIterator.END_OF_LIST
otherwise.
IOException
protected int nextDocumentInternal()
throws IOException
nextDocumentInternal in class AbstractDocumentIteratorIOException
public Reference2ReferenceMap<Index,IntervalIterator> intervalIterators()
throws IOException
DocumentIteratorAfter a call to DocumentIterator.nextDocument(), this map
can be used to retrieve the intervals in the current document. An invocation of Map.get(java.lang.Object)
on this map with argument index yields the same result as
intervalIterator(index).
IOExceptionDocumentIterator.intervalIterator(Index)
public IntervalIterator intervalIterator(Index index)
throws IOException
DocumentIteratorAfter a call to DocumentIterator.nextDocument(), this iterator
can be used to retrieve the intervals in the current document (the
one returned by DocumentIterator.nextDocument()) for
the index index.
Note that if all indices have positions, it is guaranteed that at least one index will return an interval. However, for disjunctive queries it cannot be guaranteed that all indices will return an interval.
Indices without positions always return IntervalIterators.TRUE.
Thus, in presence of indices without positions it is possible that no
intervals at all are available.
index - an index (must be one over which the query was built).
index.
IOExceptionprotected abstract IntervalIterator getComposedIntervalIterator(Index index)
|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||