trait Builder[T, W, Self] extends AnyRef
Builds an instance of ParquetPartitioningFlow
- T
Type of message that flow accepts
- W
Schema of Parquet file that flow writes
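Every setter on this trait returns Self, so a concrete builder keeps its own type across a chain of calls. The following is a minimal sketch of that pattern only. SketchBuilder and DemoBuilder are hypothetical names, not library types, and only two of the setters are modelled.

```scala
import scala.concurrent.duration._

object BuilderSketch {
  // Hypothetical, simplified stand-in for the trait documented here.
  trait SketchBuilder[Self] {
    def maxCount(maxCount: Long): Self
    def maxDuration(maxDuration: FiniteDuration): Self
  }

  // Hypothetical immutable implementation: each setter returns an updated copy.
  final case class DemoBuilder(count: Long, duration: FiniteDuration)
      extends SketchBuilder[DemoBuilder] {
    def maxCount(maxCount: Long): DemoBuilder = copy(count = maxCount)
    def maxDuration(maxDuration: FiniteDuration): DemoBuilder = copy(duration = maxDuration)
  }

  // Because each call returns DemoBuilder, the calls can be chained.
  val configured: DemoBuilder =
    DemoBuilder(count = 1000L, duration = 1.minute)
      .maxCount(500000L)
      .maxDuration(30.seconds)
}
```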
Abstract Value Members
- abstract def maxCount(maxCount: Long): Self
- maxCount
max number of records to be written before file rotation
- abstract def maxDuration(maxDuration: FiniteDuration): Self
- maxDuration
max time after which partition file is rotated
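Taken together, maxCount and maxDuration describe a file rotation policy. The sketch below assumes that a partition file is rotated as soon as either limit is reached; this document does not state how the two limits interact or what the exact boundary conditions are. RotationSketch and shouldRotate are hypothetical names and not part of the library.

```scala
import scala.concurrent.duration._

object RotationSketch {
  // Assumption: rotation happens when EITHER limit has been reached.
  def shouldRotate(
      writtenCount: Long,          // records written to the current file
      openFor: FiniteDuration,     // how long the current file has been open
      maxCount: Long,
      maxDuration: FiniteDuration
  ): Boolean =
    writtenCount >= maxCount || openFor >= maxDuration
}
```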
- abstract def options(options: Options): Self
- options
writer options used by the flow
- abstract def partitionBy(partitionBy: ColumnPath*): Self
Sets the partition paths that the flow partitions data by. Can be empty. A partition path can be a simple string column (e.g. "color") or a path pointing to a nested string field (e.g. "user.address.postcode"). The partition path is used to extract data from the entity and to create a tree of subdirectories for the partitioned files. Partitioning by the two example paths above produces a directory tree such as:
../color=blue
      /user.address.postcode=XY1234/
      /user.address.postcode=AB4321/
  /color=green
      /user.address.postcode=XY1234/
      /user.address.postcode=CV3344/
      /user.address.postcode=GH6732/
Take note:
- partitionBy must point to a string field.
- Partitioning removes the partition fields from the schema. Their values are stored in the subdirectory names instead of in the Parquet file.
- Partitioning must not leave the schema empty. If all fields of the message are removed, you will get an error.
- Partitioned directories can be filtered efficiently during reading.
- partitionBy
ColumnPaths to partition by
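To make the directory layout concrete, here is a small sketch of how partition values could be turned into a relative path and removed from the record. It models a record as a flat Map keyed by dotted column path, which is a simplification of nested entities. PartitionSketch and partition are hypothetical names, and the library's actual implementation is not shown in this document.

```scala
object PartitionSketch {
  // Simplifying assumption: a record is a map of dotted column path -> string value.
  type Record = Map[String, String]

  // Returns the relative directory and the record without its partition fields,
  // or an error message if partitioning is not possible.
  def partition(record: Record, partitionBy: Seq[String]): Either[String, (String, Record)] = {
    val missing   = partitionBy.filterNot(record.contains)
    val remaining = record -- partitionBy
    if (missing.nonEmpty)
      Left(s"Missing partition fields: ${missing.mkString(", ")}")
    else if (partitionBy.nonEmpty && remaining.isEmpty)
      Left("Partitioning would leave an empty schema")
    else {
      // One "name=value" directory level per partition path, in the given order.
      val dir = partitionBy.map(p => s"$p=${record(p)}").mkString("/")
      Right((dir, remaining))
    }
  }
}
```

For a record with color "blue", postcode "XY1234" and one further field, partitioning by "color" and "user.address.postcode" yields the directory color=blue/user.address.postcode=XY1234 and a record that contains only the further field.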
- abstract def postWriteHandler(handler: (PostWriteState[T]) => Unit): Self
Adds a handler after record writes, exposing some of the internal state of the flow. Intended for lower level monitoring and control.
Please note that the handler is invoked after each input element is processed, not after each write. This is because a single input element may result in multiple records being written.
- handler
a function called after writing a record, receiving a snapshot of the internal state of the flow as a parameter.
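The distinction between "once per input element" and "once per write" can be illustrated with a small stand-alone sketch. SketchState, run and expand are hypothetical names; the fields of the real PostWriteState are not listed in this document, so the snapshot type below is an assumption made purely for illustration.

```scala
object HandlerSketch {
  // Hypothetical snapshot type standing in for PostWriteState[T].
  final case class SketchState[T](lastProcessed: T, recordsWritten: Long)

  // Processes the inputs in order. `expand` turns one input element into zero
  // or more records; the handler is called once per input element.
  // Returns the total number of records "written".
  def run[T, R](inputs: Seq[T])(expand: T => Seq[R])(handler: SketchState[T] => Unit): Long =
    inputs.foldLeft(0L) { (written, input) =>
      val total = written + expand(input).size // count every record for this input
      handler(SketchState(input, total))       // one handler call per input element
      total
    }
}
```

With the inputs "a b" and "c" and an expand function that splits on spaces, three records are counted but the handler is called only twice.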
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()