package util
Type Members
- trait AnalysisHelper extends AnyRef
-
class
BinPackingIterator[T <: Sizing] extends Iterator[Seq[T]]
Iterator that packs objects in inputIter to create bins that have a total size of 'targetSize'. Each T object may contain multiple inputs that are always packed into a single bin. T instances must inherit from Sizing and define their own size.
-
class
DatasetRefCache[T] extends AnyRef
A Dataset reference cache to automatically create new Dataset objects when the active SparkSession changes. This is useful when sharing objects holding Dataset references across multiple sessions. Without this, using a Dataset that holds a stale session may change the active session and cause multiple issues (e.g., if we switch to a stale session coming from a notebook that has been detached, we may not be able to use built-in functions because those are cleaned up).
The creator function will be called to create a new Dataset object when the old one has a different session than the current active session. Note that one MUST use SparkSession.active in creator() if creator() needs to use the Spark session.
Unlike StateCache, this class only caches the Dataset reference and doesn't cache the underlying RDD.
WARNING: If there are many concurrent Spark sessions and each session calls 'get' multiple times, the cost of creator becomes more noticeable: every time the active session switches, the older session needs to call creator again when it becomes active.
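The re-creation behavior and the cost noted in the warning can be illustrated with a minimal sketch that does not depend on Spark. SessionSketch, DatasetSketch and RefCacheSketch are hypothetical stand-ins invented for this example; the real class works with SparkSession and Dataset, and its internals may differ.

```scala
// Hypothetical stand-ins for illustration; these are not Spark or Delta types.
final case class SessionSketch(id: Int)
final case class DatasetSketch[T](session: SessionSketch, payload: T)

/** Re-creates the cached reference whenever the active session differs from the cached one. */
final class RefCacheSketch[T](
    activeSession: () => SessionSketch,
    creator: () => DatasetSketch[T]) {

  private var cached: Option[DatasetSketch[T]] = None

  def get: DatasetSketch[T] = synchronized {
    val active = activeSession()
    cached match {
      case Some(ds) if ds.session == active => ds // same session, reuse the reference
      case _ =>
        val fresh = creator() // creator is expected to read the active session itself
        cached = Some(fresh)
        fresh
    }
  }
}

object RefCacheSketchDemo {
  def main(args: Array[String]): Unit = {
    var active = SessionSketch(1)
    var creatorCalls = 0
    val cache = new RefCacheSketch[String](
      () => active,
      () => { creatorCalls += 1; DatasetSketch(active, s"built for session ${active.id}") })

    cache.get
    cache.get                 // same session: creator ran only once
    active = SessionSketch(2)
    cache.get                 // session changed: creator runs again
    active = SessionSketch(1)
    cache.get                 // switching back re-creates as well
    println(creatorCalls)     // prints 3
  }
}
```

Because only one reference is kept in this sketch, alternating between sessions invokes the creator on every switch, which is the cost the warning describes.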
-
sealed
trait
DateFormatter extends Serializable
Forked from org.apache.spark.sql.catalyst.util.DateFormatter
-
trait
DateTimeFormatterHelper extends AnyRef
Forked from org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper
-
case class
DeltaCommitFileProvider(logPath: String, maxVersionInclusive: Long, uuids: Map[Long, String]) extends Product with Serializable
Provides access to resolve Delta commit file names based on the commit version.
This class is part of the changes introduced to accommodate the adoption of coordinated-commits in Delta Lake. Previously, certain code paths assumed the existence of delta files for a specific version at a predictable path, _delta_log/$version.json. With coordinated-commits, delta files may alternatively be located at _delta_log/_commits/$version.$uuid.json. DeltaCommitFileProvider attempts to locate the correct delta files from the Snapshot's LogSegment.
- logPath
The path to the Delta table log directory.
- maxVersionInclusive
The maximum version of the Delta table (inclusive).
- uuids
A map of version numbers to their corresponding UUIDs.
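The lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: CommitFileResolverSketch and its deltaFile method are hypothetical names, paths are plain strings, and file names follow the unpadded $version.json notation used in the description.

```scala
/** Hypothetical sketch; not the actual DeltaCommitFileProvider implementation. */
final case class CommitFileResolverSketch(
    logPath: String,
    maxVersionInclusive: Long,
    uuids: Map[Long, String]) {

  /** Returns the path of the delta file for `version`, preferring the UUID-based location. */
  def deltaFile(version: Long): String = {
    // Assumption: versions beyond the known maximum cannot be resolved.
    require(version <= maxVersionInclusive,
      s"version $version is beyond the known maximum $maxVersionInclusive")
    uuids.get(version) match {
      case Some(uuid) => s"$logPath/_commits/$version.$uuid.json" // coordinated commit
      case None => s"$logPath/$version.json"                      // predictable path
    }
  }
}

object CommitFileResolverSketchDemo {
  def main(args: Array[String]): Unit = {
    val resolver = CommitFileResolverSketch("/table/_delta_log", 12L, Map(12L -> "abc"))
    println(resolver.deltaFile(11L)) // prints /table/_delta_log/11.json
    println(resolver.deltaFile(12L)) // prints /table/_delta_log/_commits/12.abc.json
  }
}
```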
-
class
DeltaLogGroupingIterator extends Iterator[(Long, ArrayBuffer[FileStatus])]
An iterator that groups same types of files by version. Note that this class can handle only Checkpoints and Delta files. For example, for an input iterator:
- 11.checkpoint.0.1.parquet
- 11.checkpoint.1.1.parquet
- 11.json
- 12.checkpoint.parquet
- 12.json
- 13.json
- 14.json
- 15.checkpoint.0.1.parquet
- 15.checkpoint.1.1.parquet
- 15.checkpoint.<uuid>.parquet
- 15.json
This will return:
- (11, Seq(11.checkpoint.0.1.parquet, 11.checkpoint.1.1.parquet, 11.json))
- (12, Seq(12.checkpoint.parquet, 12.json))
- (13, Seq(13.json))
- (14, Seq(14.json))
- (15, Seq(15.checkpoint.0.1.parquet, 15.checkpoint.1.1.parquet, 15.checkpoint.<uuid>.parquet, 15.json))
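The grouping shown in this example can be approximated with a small iterator over plain file names. This is a sketch only: GroupByVersionSketch is a hypothetical name, it works on strings rather than FileStatus, it assumes the input is already sorted by version, and it reads the version from the leading digits of each name.

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical sketch: groups consecutive file names that share the same leading version. */
final class GroupByVersionSketch(input: Iterator[String])
    extends Iterator[(Long, ArrayBuffer[String])] {

  private val in = input.buffered

  // Assumption: every name starts with its numeric version, e.g. "11.json".
  private def versionOf(name: String): Long = name.takeWhile(_.isDigit).toLong

  override def hasNext: Boolean = in.hasNext

  override def next(): (Long, ArrayBuffer[String]) = {
    val version = versionOf(in.head) // throws NoSuchElementException when exhausted
    val group = ArrayBuffer.empty[String]
    while (in.hasNext && versionOf(in.head) == version) {
      group += in.next()
    }
    (version, group)
  }
}

object GroupByVersionSketchDemo {
  def main(args: Array[String]): Unit = {
    val names = Iterator("11.checkpoint.0.1.parquet", "11.checkpoint.1.1.parquet",
      "11.json", "12.checkpoint.parquet", "12.json", "13.json")
    new GroupByVersionSketch(names).foreach { case (version, files) =>
      println(s"($version, ${files.mkString("Seq(", ", ", ")")})")
    }
  }
}
```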
- trait DeltaProgressReporter extends LoggingShims
- trait DeltaSparkPlanUtils extends AnyRef
-
class
FractionTimestampFormatter extends Iso8601TimestampFormatter
The formatter parses/formats timestamps according to the pattern yyyy-MM-dd HH:mm:ss.[..fff..] where [..fff..] is a fraction of second up to microsecond resolution. The formatter does not output trailing zeros in the fraction. For example, the timestamp 2019-03-05 15:00:01.123400 is formatted as the string 2019-03-05 15:00:01.1234.
- class Iso8601DateFormatter extends DateFormatter with DateTimeFormatterHelper
- class Iso8601TimestampFormatter extends TimestampFormatter with DateTimeFormatterHelper
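The trailing-zero behavior described for FractionTimestampFormatter above can be reproduced with the standard java.time API. This is an illustrative sketch, not the formatter's actual implementation; FractionFormatSketch is a hypothetical name, and time zone handling is left out by using LocalDateTime.

```scala
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField
import java.util.Locale

object FractionFormatSketch {
  // Seconds-resolution pattern followed by a fraction of 0 to 6 digits.
  // With a minimum width of 0, appendFraction omits trailing zeros in the fraction.
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("yyyy-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
    .toFormatter(Locale.ROOT)

  def format(ts: LocalDateTime): String = formatter.format(ts)

  def main(args: Array[String]): Unit = {
    // 123400 microseconds expressed as nanoseconds.
    val ts = LocalDateTime.of(2019, 3, 5, 15, 0, 1, 123400000)
    println(format(ts)) // prints 2019-03-05 15:00:01.1234
  }
}
```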
-
class
PartitionPath extends AnyRef
Holds a directory in a partitioned collection of files as well as the partition values in the form of a Row. Before scanning, the files at path need to be enumerated.
- case class PartitionSpec(partitionColumns: StructType, partitions: Seq[PartitionPath]) extends Product with Serializable
-
case class
PathWithFileSystem extends Product with Serializable
Bundling the Path with the FileSystem instance ensures that we never pass the wrong file system with the path to a function at compile time.
-
class
SetAccumulator[T] extends AccumulatorV2[T, Set[T]]
Accumulator to collect distinct elements as a set.
-
trait
StateCache extends DeltaLogging
Machinery that caches the reconstructed state of a Delta table using the RDD cache. The cache is designed so that the first access will materialize the results. However, once uncache is called, all data will be flushed and will not be cached again.
-
sealed
trait
TimestampFormatter extends Serializable
Forked from org.apache.spark.sql.catalyst.util.TimestampFormatter
Value Members
- object AnalysisHelper
- object BinPackingUtils
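The packing behavior described for BinPackingIterator under Type Members can be sketched with a greedy, order-preserving iterator. This is an assumption-laden illustration: SizingSketch, FileGroupSketch and BinPackingSketch are hypothetical names, and the policy shown (close the bin when the next item would exceed targetSize, and give an oversized item its own bin) is one plausible reading of the description rather than the confirmed implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the Sizing trait; only a `size` member is assumed.
trait SizingSketch { def size: Long }
final case class FileGroupSketch(name: String, size: Long) extends SizingSketch

/** Greedy packing: close the current bin when the next item would push it past targetSize. */
final class BinPackingSketch[T <: SizingSketch](input: Iterator[T], targetSize: Long)
    extends Iterator[Seq[T]] {

  private val in = input.buffered

  override def hasNext: Boolean = in.hasNext

  override def next(): Seq[T] = {
    if (!in.hasNext) throw new NoSuchElementException("no more bins")
    val bin = ArrayBuffer.empty[T]
    var binSize = 0L
    // Always take at least one item so that an oversized item still gets its own bin.
    while (in.hasNext && (bin.isEmpty || binSize + in.head.size <= targetSize)) {
      val item = in.next()
      bin += item
      binSize += item.size
    }
    bin.toVector
  }
}

object BinPackingSketchDemo {
  def main(args: Array[String]): Unit = {
    val items = Iterator(
      FileGroupSketch("a", 4), FileGroupSketch("b", 4), FileGroupSketch("c", 4),
      FileGroupSketch("d", 10), FileGroupSketch("e", 1))
    // With targetSize = 8 the bins are: (a, b), (c), (d), (e)
    new BinPackingSketch(items, 8L).foreach(bin => println(bin.map(_.name).mkString(", ")))
  }
}
```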
-
object
Codec
Additional codecs not supported by Apache Commons Codecs.
- object DateFormatter extends Serializable
-
object
DateTimeUtils
Helper functions for converting between internal and external date and time representations. Dates are exposed externally as java.sql.Date and are represented internally as the number of days since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision.
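The internal representations described here (days since the epoch for dates, a microsecond-precision long for timestamps) can be illustrated with java.time. This is a minimal sketch with hypothetical names (EpochConversionsSketch and its methods are not part of DateTimeUtils), and it uses LocalDate and Instant instead of java.sql.Date and java.sql.Timestamp to sidestep time zone handling.

```scala
import java.time.{Instant, LocalDate}

object EpochConversionsSketch {
  private val MicrosPerSecond = 1000000L

  /** Date to the number of days since 1970-01-01. */
  def localDateToDays(date: LocalDate): Int = Math.toIntExact(date.toEpochDay)

  /** Number of days since 1970-01-01 back to a date. */
  def daysToLocalDate(days: Int): LocalDate = LocalDate.ofEpochDay(days.toLong)

  /** Instant to microseconds since the epoch; sub-microsecond digits are dropped. */
  def instantToMicros(instant: Instant): Long =
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond),
      (instant.getNano / 1000).toLong)

  /** Microseconds since the epoch back to an Instant; floor arithmetic handles negatives. */
  def microsToInstant(micros: Long): Instant = {
    val seconds = Math.floorDiv(micros, MicrosPerSecond)
    val microsOfSecond = Math.floorMod(micros, MicrosPerSecond)
    Instant.ofEpochSecond(seconds, microsOfSecond * 1000L)
  }

  def main(args: Array[String]): Unit = {
    println(daysToLocalDate(0))                              // prints 1970-01-01
    println(instantToMicros(Instant.ofEpochSecond(1L, 500))) // prints 1000000
  }
}
```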
- object DeltaCommitFileProvider extends Serializable
-
object
DeltaFileOperations extends DeltaLogging
Some utility methods on files, directories, and paths.
- object DeltaSparkPlanUtils
-
object
FileNames
Helper for creating file names for specific commits / checkpoints.
-
object
JsonUtils
Useful json functions used around the Delta codebase.
-
object
PartitionPath
This file is forked from org.apache.spark.sql.execution.datasources.PartitioningUtils.
- object PartitionSpec extends Serializable
- object PathWithFileSystem extends Serializable
- object TimestampFormatter extends Serializable
-
object
Utils
Various utility methods used by Delta.