class DeltaWriterBucket[IN] extends AnyRef
Internal implementation that writes the actual events to the underlying files in the correct buckets (partitions).
One of the main components of Flink's org.apache.flink.api.connector.sink.Sink topology
is org.apache.flink.api.connector.sink.SinkWriter,
which DeltaSink implements as DeltaWriter. However, to support
DeltaLake's partitioned tables, a new component was added in the form
of DeltaWriterBucket; each instance handles writes to exactly one
bucket (partition). These bucket writers are managed by DeltaWriter,
which acts as a proxy between the framework's higher-level commands (write, prepareCommit, etc.)
and the actual write implementation in DeltaWriterBucket. As a result,
events that one DeltaWriter operator receives during a particular checkpoint interval
are always grouped and flushed to the currently open in-progress file.
The implementation is derived from org.apache.flink.connector.file.sink.FileSink,
which uses the same concept and implements
org.apache.flink.connector.file.sink.writer.FileWriter together with its FileWriterBucket
implementation.
All differences between DeltaSink's and FileSink's writer buckets are explained in the
documentation of the individual methods below.
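The proxy relationship described above can be illustrated with a minimal sketch. It is not the connector's actual code: SketchBucket and SketchRoutingWriter are hypothetical stand-ins for DeltaWriterBucket and DeltaWriter, events are plain strings, and the partition is passed in directly rather than derived from the event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DeltaWriterBucket: collects the events of one partition.
class SketchBucket {
    final String partition;
    // Stands in for the currently open in-progress file of this bucket.
    final List<String> inProgress = new ArrayList<>();

    SketchBucket(String partition) {
        this.partition = partition;
    }

    void write(String event) {
        inProgress.add(event);
    }
}

// Hypothetical stand-in for DeltaWriter: forwards each write to the matching bucket.
class SketchRoutingWriter {
    final Map<String, SketchBucket> buckets = new LinkedHashMap<>();

    void write(String partition, String event) {
        // Create the bucket on the first event for this partition, then delegate.
        buckets.computeIfAbsent(partition, SketchBucket::new).write(event);
    }
}
```

In this sketch a non-partitioned table would simply use a single key (for example an empty string) so that all events land in one bucket.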
The lifecycle of instances of this class is as follows:
- Every instance is created via the DeltaWriter#write method whenever the writer receives the first event that belongs to the bucket represented by the given DeltaWriterBucket instance. For non-partitioned tables, it is created whenever the writer receives the very first event, because in that case there is only one DeltaWriterBucket, representing the root path of the table.
- A DeltaWriter instance can create zero, one, or multiple instances of DeltaWriterBucket during one checkpoint interval. It creates none if it has not received any events (and thus did not have to create buckets for them). It creates one when it has received events belonging to only one bucket (the same applies if the table is not partitioned). Finally, it creates multiple when it has received events belonging to more than one bucket.
- The life span of one DeltaWriterBucket may extend over one or more checkpoint intervals. It remains "active" as long as it receives data. If, for example, an instance of DeltaWriter has not received any events belonging to a given bucket during a given checkpoint interval, then the DeltaWriterBucket representing this bucket is de-listed from the writer's internal bucket iterator. If that DeltaWriter receives more events for the bucket in a future checkpoint interval, it creates a new DeltaWriterBucket instance representing this bucket.
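The lifecycle above can be sketched as follows. This is a simplified illustration under stated assumptions, not the connector's implementation: LifecycleBucket and LifecycleWriter are hypothetical names, "active" is modeled as "has received events since the last checkpoint", and the bucketsCreated counter exists only to make instance creation observable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DeltaWriterBucket.
class LifecycleBucket {
    private final List<String> pending = new ArrayList<>();

    void write(String event) {
        pending.add(event);
    }

    // Assumption: a bucket counts as active while it holds unflushed events.
    boolean isActive() {
        return !pending.isEmpty();
    }

    // Hands over everything written since the last checkpoint and resets the bucket.
    List<String> prepareCommit() {
        List<String> flushed = new ArrayList<>(pending);
        pending.clear();
        return flushed;
    }
}

// Hypothetical stand-in for DeltaWriter.
class LifecycleWriter {
    final Map<String, LifecycleBucket> activeBuckets = new HashMap<>();
    int bucketsCreated = 0; // illustration only

    void write(String partition, String event) {
        LifecycleBucket bucket = activeBuckets.get(partition);
        if (bucket == null) {
            // First event for this bucket (or first after it was de-listed).
            bucket = new LifecycleBucket();
            activeBuckets.put(partition, bucket);
            bucketsCreated++;
        }
        bucket.write(event);
    }

    // Called once per checkpoint interval.
    List<String> prepareCommit() {
        List<String> committables = new ArrayList<>();
        Iterator<Map.Entry<String, LifecycleBucket>> it = activeBuckets.entrySet().iterator();
        while (it.hasNext()) {
            LifecycleBucket bucket = it.next().getValue();
            if (!bucket.isActive()) {
                it.remove(); // no data in this interval: de-list the bucket
            } else {
                committables.addAll(bucket.prepareCommit());
            }
        }
        return committables;
    }
}
```

With this sketch, a bucket that receives an event in one interval, nothing in the next, and another event in a third interval is created twice: the idle interval de-lists it, and the later event triggers a fresh instance.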
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def isActive(): Boolean
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()