class BufferingLogDeletionIterator extends Iterator[FileStatus]
An iterator that helps select old log files for deletion. It takes the input iterator of log files, starting from the earliest file, and returns the files that should be deleted until the given maxTimestamp or maxVersion to delete is reached. Note that this iterator may stop deleting files earlier than maxTimestamp or maxVersion if it finds files that need to be preserved for adjusting the timestamps of subsequent files. Let's go through an example. Assume the following commit history:
+---------+-----------+--------------------+
| Version | Timestamp | Adjusted Timestamp |
+---------+-----------+--------------------+
|       0 |         0 |                  0 |
|       1 |         5 |                  5 |
|       2 |        10 |                 10 |
|       3 |         7 |                 11 |
|       4 |         8 |                 12 |
|       5 |        14 |                 14 |
+---------+-----------+--------------------+
As you can see from the example, we require timestamps to be monotonically increasing with respect to the version of the commit, and each commit to have a unique timestamp. If a commit doesn't obey one of these two requirements, we adjust its timestamp to be one millisecond greater than the (adjusted) timestamp of the previous commit.
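The adjustment rule can be illustrated with a minimal sketch. The `Commit` case class and the `adjust` function are hypothetical names used only for this illustration; they are not part of the class documented here. The input mirrors the example history, with version 1 taken to have timestamp 5.

```scala
object TimestampAdjustmentSketch {
  // Hypothetical model of a commit: a version and a timestamp in milliseconds.
  final case class Commit(version: Long, timestamp: Long)

  // Walks the commits in ascending version order. A commit whose timestamp is
  // not strictly greater than the previous (already adjusted) timestamp is
  // moved to previous + 1 ms.
  def adjust(commits: Seq[Commit]): Seq[Commit] =
    commits.foldLeft(Vector.empty[Commit]) { (acc, c) =>
      acc.lastOption match {
        case Some(prev) if c.timestamp <= prev.timestamp =>
          acc :+ c.copy(timestamp = prev.timestamp + 1)
        case _ =>
          acc :+ c
      }
    }

  // Example history; version 1 at timestamp 5 is an assumed value.
  val history: Seq[Commit] = Seq(
    Commit(0, 0), Commit(1, 5), Commit(2, 10),
    Commit(3, 7), Commit(4, 8), Commit(5, 14))

  def main(args: Array[String]): Unit =
    println(adjust(history).map(_.timestamp).mkString(", ")) // 0, 5, 10, 11, 12, 14
}
```

Versions 3 and 4 are the only commits whose timestamps change, and both adjustments are anchored on version 2.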
Given the above commit history, the behavior of this iterator will be as follows:
- For maxVersion = 1 and maxTimestamp = 9, we can delete versions 0 and 1.
- Until we receive maxVersion >= 4 and maxTimestamp >= 12, we can't delete versions 2 and 3. This is because version 2 is used to adjust the timestamps of commits up to version 4.
- For maxVersion >= 5 and maxTimestamp >= 14, we can delete everything.
The semantics of time travel guarantee that for a given timestamp, the user will ALWAYS get the same version. Suppose a user asks for the version at timestamp 11. If all files are present, we return version 3 (adjusted timestamp 11) for this query. If we delete versions 0-2, the original timestamp of version 3 (7) no longer has an anchor to adjust against, and if the time travel query is re-executed we return version 4 instead. This is the motivation behind this iterator implementation.
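The effect of losing the anchor can be reproduced with a small sketch. It assumes that a time travel query resolves to the latest commit whose adjusted timestamp is at or before the requested timestamp; `Commit`, `adjust`, and `versionAt` are hypothetical names for this illustration, and version 1 is taken to have timestamp 5.

```scala
object TimeTravelAnchorSketch {
  // Hypothetical model of a commit: a version and a timestamp in milliseconds.
  final case class Commit(version: Long, timestamp: Long)

  // Same adjustment rule as described above: non-increasing timestamps are
  // moved to one millisecond after the previous adjusted timestamp.
  def adjust(commits: Seq[Commit]): Seq[Commit] =
    commits.foldLeft(Vector.empty[Commit]) { (acc, c) =>
      acc.lastOption match {
        case Some(prev) if c.timestamp <= prev.timestamp =>
          acc :+ c.copy(timestamp = prev.timestamp + 1)
        case _ => acc :+ c
      }
    }

  // Assumed lookup rule: the latest version whose adjusted timestamp is <= ts.
  // Adjusted timestamps are strictly increasing, so takeWhile is sufficient.
  def versionAt(commits: Seq[Commit], ts: Long): Option[Long] =
    adjust(commits).takeWhile(_.timestamp <= ts).lastOption.map(_.version)

  val fullHistory: Seq[Commit] = Seq(
    Commit(0, 0), Commit(1, 5), Commit(2, 10),
    Commit(3, 7), Commit(4, 8), Commit(5, 14))

  // The same history after versions 0-2 have been deleted.
  val truncated: Seq[Commit] = fullHistory.drop(3)

  def main(args: Array[String]): Unit = {
    println(versionAt(fullHistory, 11)) // Some(3)
    println(versionAt(truncated, 11))   // Some(4)
  }
}
```

With versions 0-2 gone, versions 3 and 4 keep their raw timestamps 7 and 8, so the same query at timestamp 11 resolves to a different version.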
The implementation maintains an internal "maybeDelete" buffer of files that we are unsure about deleting because they may be needed to adjust the timestamps of future files. For each file we get from the underlying iterator, we check whether it needs time adjustment. If it does, we cannot immediately decide whether it is safe to delete that file, so we put it in the buffer. We then iteratively peek ahead at the future files and decide accordingly whether to delete all the buffered files or retain them.
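The buffering idea can be sketched over plain (version, timestamp) pairs. This is a simplified illustration under stated assumptions, not the actual implementation: `LogFile` and `selectDeletable` are hypothetical names, the function returns a whole sequence instead of implementing `Iterator[FileStatus]`, and version 1 in the example is taken to have timestamp 5.

```scala
object BufferedDeletionSketch {
  // Hypothetical stand-in for a log file: only version and timestamp matter here.
  final case class LogFile(version: Long, timestamp: Long)

  // Returns the files that are safe to delete, carrying adjusted timestamps.
  // `files` must be in ascending version order; both limits are inclusive.
  def selectDeletable(files: Seq[LogFile], maxTimestamp: Long, maxVersion: Long): Seq[LogFile] = {
    val toDelete = Vector.newBuilder[LogFile]
    var buffer = Vector.empty[LogFile] // the "maybeDelete" buffer
    var stopped = false

    def deletable(f: LogFile): Boolean =
      f.timestamp <= maxTimestamp && f.version <= maxVersion

    // The whole buffer is deleted only if its last file is deletable.
    // Returns false when the buffered files have to be retained.
    def flush(): Boolean = {
      val ok = buffer.lastOption.forall(deletable)
      if (ok) toDelete ++= buffer
      buffer = Vector.empty
      ok
    }

    val it = files.iterator
    while (!stopped && it.hasNext) {
      val f = it.next()
      buffer.lastOption match {
        case Some(prev) if f.timestamp <= prev.timestamp =>
          // Needs adjustment: it depends on the buffered files, so buffer it too.
          buffer = buffer :+ f.copy(timestamp = prev.timestamp + 1)
        case _ =>
          // No adjustment needed: decide the fate of the buffered run, then
          // start a new run with this file, or stop at the first retained run.
          if (flush()) buffer = Vector(f) else stopped = true
      }
    }
    if (!stopped) flush()
    toDelete.result()
  }

  val history: Seq[LogFile] = Seq(
    LogFile(0, 0), LogFile(1, 5), LogFile(2, 10),
    LogFile(3, 7), LogFile(4, 8), LogFile(5, 14))
}
```

Applied to the example history, `selectDeletable(history, 9, 1)` selects versions 0 and 1, limits of 11 and 3 still select only versions 0 and 1, limits of 12 and 4 select versions 0 through 4, and limits of 14 and 5 select everything.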
Instance Constructors
- new BufferingLogDeletionIterator(underlying: Iterator[FileStatus], maxTimestamp: Long, maxVersion: Long, versionGetter: (Path) => Long)
- underlying
The iterator which gives the list of files in ascending version order
- maxTimestamp
The timestamp until which we can delete (inclusive).
- maxVersion
The version until which we can delete (inclusive).
- versionGetter
A method to get the commit version from the file path.
Type Members
- class GroupedIterator[B >: A] extends AbstractIterator[Seq[B]]
- Definition Classes
- Iterator
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ++[B >: FileStatus](xs: => IterableOnce[B]): Iterator[B]
- Definition Classes
- Iterator
- Annotations
- @inline()
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def addString(b: StringBuilder): b.type
- Definition Classes
- IterableOnceOps
- Annotations
- @inline()
- final def addString(b: StringBuilder, sep: String): b.type
- Definition Classes
- IterableOnceOps
- Annotations
- @inline()
- def addString(b: StringBuilder, start: String, sep: String, end: String): b.type
- Definition Classes
- IterableOnceOps
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def buffered: BufferedIterator[FileStatus]
- Definition Classes
- Iterator
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def collect[B](pf: PartialFunction[FileStatus, B]): Iterator[B]
- Definition Classes
- Iterator → IterableOnceOps
- def collectFirst[B](pf: PartialFunction[FileStatus, B]): Option[B]
- Definition Classes
- IterableOnceOps
- def concat[B >: FileStatus](xs: => IterableOnce[B]): Iterator[B]
- Definition Classes
- Iterator
- def contains(elem: Any): Boolean
- Definition Classes
- Iterator
- def copyToArray[B >: FileStatus](xs: Array[B], start: Int, len: Int): Int
- Definition Classes
- IterableOnceOps
- def copyToArray[B >: FileStatus](xs: Array[B], start: Int): Int
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecatedOverriding()
- def copyToArray[B >: FileStatus](xs: Array[B]): Int
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecatedOverriding()
- def corresponds[B](that: IterableOnce[B])(p: (FileStatus, B) => Boolean): Boolean
- Definition Classes
- IterableOnceOps
- def count(p: (FileStatus) => Boolean): Int
- Definition Classes
- IterableOnceOps
- def distinct: Iterator[FileStatus]
- Definition Classes
- Iterator
- def distinctBy[B](f: (FileStatus) => B): Iterator[FileStatus]
- Definition Classes
- Iterator
- def drop(n: Int): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def dropWhile(p: (FileStatus) => Boolean): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def duplicate: (Iterator[FileStatus], Iterator[FileStatus])
- Definition Classes
- Iterator
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def exists(p: (FileStatus) => Boolean): Boolean
- Definition Classes
- IterableOnceOps
- def filter(p: (FileStatus) => Boolean): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def filterNot(p: (FileStatus) => Boolean): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def find(p: (FileStatus) => Boolean): Option[FileStatus]
- Definition Classes
- IterableOnceOps
- def flatMap[B](f: (FileStatus) => IterableOnce[B]): Iterator[B]
- Definition Classes
- Iterator → IterableOnceOps
- def flatten[B](implicit ev: (FileStatus) => IterableOnce[B]): Iterator[B]
- Definition Classes
- Iterator → IterableOnceOps
- def fold[A1 >: FileStatus](z: A1)(op: (A1, A1) => A1): A1
- Definition Classes
- IterableOnceOps
- def foldLeft[B](z: B)(op: (B, FileStatus) => B): B
- Definition Classes
- IterableOnceOps
- def foldRight[B](z: B)(op: (FileStatus, B) => B): B
- Definition Classes
- IterableOnceOps
- def forall(p: (FileStatus) => Boolean): Boolean
- Definition Classes
- IterableOnceOps
- def foreach[U](f: (FileStatus) => U): Unit
- Definition Classes
- IterableOnceOps
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def grouped[B >: FileStatus](size: Int): GroupedIterator[B]
- Definition Classes
- Iterator
- def hasNext: Boolean
- Definition Classes
- BufferingLogDeletionIterator → Iterator
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def indexOf[B >: FileStatus](elem: B, from: Int): Int
- Definition Classes
- Iterator
- def indexOf[B >: FileStatus](elem: B): Int
- Definition Classes
- Iterator
- def indexWhere(p: (FileStatus) => Boolean, from: Int): Int
- Definition Classes
- Iterator
- def isEmpty: Boolean
- Definition Classes
- Iterator → IterableOnceOps
- Annotations
- @deprecatedOverriding()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraversableAgain: Boolean
- Definition Classes
- IterableOnceOps
- final def iterator: Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnce
- Annotations
- @inline()
- def knownSize: Int
- Definition Classes
- IterableOnce
- final def length: Int
- Definition Classes
- Iterator
- Annotations
- @inline()
- def map[B](f: (FileStatus) => B): Iterator[B]
- Definition Classes
- Iterator → IterableOnceOps
- def max[B >: FileStatus](implicit ord: Ordering[B]): FileStatus
- Definition Classes
- IterableOnceOps
- def maxBy[B](f: (FileStatus) => B)(implicit ord: Ordering[B]): FileStatus
- Definition Classes
- IterableOnceOps
- def maxByOption[B](f: (FileStatus) => B)(implicit ord: Ordering[B]): Option[FileStatus]
- Definition Classes
- IterableOnceOps
- def maxOption[B >: FileStatus](implicit ord: Ordering[B]): Option[FileStatus]
- Definition Classes
- IterableOnceOps
- def min[B >: FileStatus](implicit ord: Ordering[B]): FileStatus
- Definition Classes
- IterableOnceOps
- def minBy[B](f: (FileStatus) => B)(implicit ord: Ordering[B]): FileStatus
- Definition Classes
- IterableOnceOps
- def minByOption[B](f: (FileStatus) => B)(implicit ord: Ordering[B]): Option[FileStatus]
- Definition Classes
- IterableOnceOps
- def minOption[B >: FileStatus](implicit ord: Ordering[B]): Option[FileStatus]
- Definition Classes
- IterableOnceOps
- final def mkString: String
- Definition Classes
- IterableOnceOps
- Annotations
- @inline()
- final def mkString(sep: String): String
- Definition Classes
- IterableOnceOps
- Annotations
- @inline()
- final def mkString(start: String, sep: String, end: String): String
- Definition Classes
- IterableOnceOps
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def next(): FileStatus
- Definition Classes
- BufferingLogDeletionIterator → Iterator
- def nextOption(): Option[FileStatus]
- Definition Classes
- Iterator
- def nonEmpty: Boolean
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecatedOverriding()
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def padTo[B >: FileStatus](len: Int, elem: B): Iterator[B]
- Definition Classes
- Iterator
- def partition(p: (FileStatus) => Boolean): (Iterator[FileStatus], Iterator[FileStatus])
- Definition Classes
- Iterator
- def patch[B >: FileStatus](from: Int, patchElems: Iterator[B], replaced: Int): Iterator[B]
- Definition Classes
- Iterator
- def product[B >: FileStatus](implicit num: Numeric[B]): B
- Definition Classes
- IterableOnceOps
- def reduce[B >: FileStatus](op: (B, B) => B): B
- Definition Classes
- IterableOnceOps
- def reduceLeft[B >: FileStatus](op: (B, FileStatus) => B): B
- Definition Classes
- IterableOnceOps
- def reduceLeftOption[B >: FileStatus](op: (B, FileStatus) => B): Option[B]
- Definition Classes
- IterableOnceOps
- def reduceOption[B >: FileStatus](op: (B, B) => B): Option[B]
- Definition Classes
- IterableOnceOps
- def reduceRight[B >: FileStatus](op: (FileStatus, B) => B): B
- Definition Classes
- IterableOnceOps
- def reduceRightOption[B >: FileStatus](op: (FileStatus, B) => B): Option[B]
- Definition Classes
- IterableOnceOps
- def reversed: Iterable[FileStatus]
- Attributes
- protected
- Definition Classes
- IterableOnceOps
- def sameElements[B >: FileStatus](that: IterableOnce[B]): Boolean
- Definition Classes
- Iterator
- def scanLeft[B](z: B)(op: (B, FileStatus) => B): Iterator[B]
- Definition Classes
- Iterator → IterableOnceOps
- def size: Int
- Definition Classes
- IterableOnceOps
- def slice(from: Int, until: Int): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def sliceIterator(from: Int, until: Int): Iterator[FileStatus]
- Attributes
- protected
- Definition Classes
- Iterator
- def sliding[B >: FileStatus](size: Int, step: Int): GroupedIterator[B]
- Definition Classes
- Iterator
- def span(p: (FileStatus) => Boolean): (Iterator[FileStatus], Iterator[FileStatus])
- Definition Classes
- Iterator → IterableOnceOps
- def splitAt(n: Int): (Iterator[FileStatus], Iterator[FileStatus])
- Definition Classes
- IterableOnceOps
- def stepper[S <: Stepper[_]](implicit shape: StepperShape[FileStatus, S]): S
- Definition Classes
- IterableOnce
- def sum[B >: FileStatus](implicit num: Numeric[B]): B
- Definition Classes
- IterableOnceOps
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def take(n: Int): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def takeWhile(p: (FileStatus) => Boolean): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def tapEach[U](f: (FileStatus) => U): Iterator[FileStatus]
- Definition Classes
- Iterator → IterableOnceOps
- def to[C1](factory: Factory[FileStatus, C1]): C1
- Definition Classes
- IterableOnceOps
- def toArray[B >: FileStatus](implicit arg0: ClassTag[B]): Array[B]
- Definition Classes
- IterableOnceOps
- final def toBuffer[B >: FileStatus]: Buffer[B]
- Definition Classes
- IterableOnceOps
- Annotations
- @inline()
- def toIndexedSeq: IndexedSeq[FileStatus]
- Definition Classes
- IterableOnceOps
- def toList: List[FileStatus]
- Definition Classes
- IterableOnceOps
- def toMap[K, V](implicit ev: <:<[FileStatus, (K, V)]): Map[K, V]
- Definition Classes
- IterableOnceOps
- def toSeq: Seq[FileStatus]
- Definition Classes
- IterableOnceOps
- def toSet[B >: FileStatus]: Set[B]
- Definition Classes
- IterableOnceOps
- def toString(): String
- Definition Classes
- Iterator → AnyRef → Any
- def toVector: Vector[FileStatus]
- Definition Classes
- IterableOnceOps
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withFilter(p: (FileStatus) => Boolean): Iterator[FileStatus]
- Definition Classes
- Iterator
- def zip[B](that: IterableOnce[B]): Iterator[(FileStatus, B)]
- Definition Classes
- Iterator
- def zipAll[A1 >: FileStatus, B](that: IterableOnce[B], thisElem: A1, thatElem: B): Iterator[(A1, B)]
- Definition Classes
- Iterator
- def zipWithIndex: Iterator[(FileStatus, Int)]
- Definition Classes
- Iterator → IterableOnceOps
Deprecated Value Members
- final def /:[B](z: B)(op: (B, FileStatus) => B): B
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) Use foldLeft instead of /:
- final def :\[B](z: B)(op: (FileStatus, B) => B): B
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) Use foldRight instead of :\
- def aggregate[B](z: => B)(seqop: (B, FileStatus) => B, combop: (B, B) => B): B
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated
- Deprecated
(Since version 2.13.0) For sequential collections, prefer foldLeft(z)(seqop). For parallel collections, use ParIterableLike#aggregate.
- final def copyToBuffer[B >: FileStatus](dest: Buffer[B]): Unit
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) Use dest ++= coll instead
- final def hasDefiniteSize: Boolean
- Definition Classes
- Iterator → IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) hasDefiniteSize on Iterator is the same as isEmpty
- def scanRight[B](z: B)(op: (FileStatus, B) => B): Iterator[B]
- Definition Classes
- Iterator
- Annotations
- @deprecated
- Deprecated
(Since version 2.13.0) Call scanRight on an Iterable instead.
- def seq: BufferingLogDeletionIterator.this.type
- Definition Classes
- Iterator
- Annotations
- @deprecated
- Deprecated
(Since version 2.13.0) Iterator.seq always returns the iterator itself
- final def toIterator: Iterator[FileStatus]
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) Use .iterator instead of .toIterator
- final def toStream: Stream[FileStatus]
- Definition Classes
- IterableOnceOps
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.13.0) Use .to(LazyList) instead of .toStream