Packages

package util

Type Members

  1. trait AnalysisHelper extends AnyRef
  2. class BinPackingIterator[T <: Sizing] extends Iterator[Seq[T]]

    Iterator that packs objects in inputIter to create bins that have a total size of 'targetSize'. Each T object may contain multiple inputs that are always packed into a single bin. T instances must inherit from Sizing and define their own size.
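
    The packing described above can be sketched as a greedy, order-preserving iterator. This is only an illustration: Sized, Item and BinPackSketch are hypothetical names standing in for Sizing and its implementors, and the rule for closing a bin (stop before the next item would push the total past targetSize, but always accept at least one item) is an assumption, not necessarily the policy BinPackingIterator uses.

```scala
// Hypothetical stand-in for Sizing: anything that can report its own size.
trait Sized { def size: Long }

// Hypothetical item type used only for this illustration.
final case class Item(name: String, size: Long) extends Sized

object BinPackSketch {
  // Greedily packs consecutive items into bins of roughly `targetSize`.
  def pack[T <: Sized](input: Iterator[T], targetSize: Long): Iterator[Seq[T]] =
    new Iterator[Seq[T]] {
      private val in = input.buffered

      def hasNext: Boolean = in.hasNext

      def next(): Seq[T] = {
        // Always take the first item, so an oversized item still forms a bin.
        val first = in.next()
        val bin = Vector.newBuilder[T]
        bin += first
        var total = first.size
        // Keep adding items while the bin stays within the target size.
        while (in.hasNext && total + in.head.size <= targetSize) {
          val item = in.next()
          bin += item
          total += item.size
        }
        bin.result()
      }
    }
}
```

    With a target size of 10, items of sizes 4, 4, 5 and 20 end up in three bins under this rule: the first two together, then the third alone, then the oversized item alone.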

  3. class DatasetRefCache[T] extends AnyRef

    A Dataset reference cache to automatically create new Dataset objects when the active SparkSession changes. This is useful when sharing objects holding Dataset references across multiple sessions. Without this, using a Dataset that holds a stale session may change the active session and cause multiple issues (e.g., if we switch to a stale session coming from a notebook that has been detached, we may not be able to use built-in functions because those are cleaned up).

    The creator function will be called to create a new Dataset object when the old one has a different session than the current active session. Note that one MUST use SparkSession.active in the creator() if creator() needs to use the Spark session.

    Unlike StateCache, this class only caches the Dataset reference and doesn't cache the underlying RDD.

    WARNING: If there are many concurrent Spark sessions and each session calls 'get' multiple times, the cost of the creator becomes more noticeable: every time the active session switches, the previously active session has to call the creator again when it becomes active.
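
    The caching behaviour and the cost mentioned in the warning can be illustrated without Spark. In the sketch below, RefCacheSketch is a hypothetical class, activeSession is a hypothetical function standing in for SparkSession.active, and the cached value stands in for the Dataset reference. It is not the DatasetRefCache implementation.

```scala
// Hypothetical, Spark-free illustration of a session-keyed reference cache.
// `activeSession` stands in for SparkSession.active; `creator` builds the value.
final class RefCacheSketch[S, T](activeSession: () => S, creator: () => T) {
  // Holds the last created value together with the session it was created under.
  private var cached: Option[(S, T)] = None

  def get: T = synchronized {
    val current = activeSession()
    cached match {
      // Reuse the reference only if it belongs to the currently active session.
      case Some((session, value)) if session == current => value
      // Otherwise rebuild it under the active session and remember it.
      case _ =>
        val value = creator()
        cached = Some((current, value))
        value
    }
  }
}
```

    Because only one reference is kept, alternating between two sessions calls the creator on every switch, which is the cost the warning describes.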

  4. sealed trait DateFormatter extends Serializable

    Forked from org.apache.spark.sql.catalyst.util.DateFormatter

  5. trait DateTimeFormatterHelper extends AnyRef

    Forked from org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper

  6. case class DeltaCommitFileProvider(logPath: String, maxVersionInclusive: Long, uuids: Map[Long, String]) extends Product with Serializable

    Provides access to resolve Delta commit file names based on the commit version.

    This class is part of the changes introduced to accommodate the adoption of coordinated-commits in Delta Lake. Previously, certain code paths assumed the existence of delta files for a specific version at a predictable path _delta_log/$version.json. With coordinated-commits, delta files may alternatively be located at _delta_log/_commits/$version.$uuid.json. DeltaCommitFileProvider attempts to locate the correct delta files from the Snapshot's LogSegment.

    logPath

    The path to the Delta table log directory.

    maxVersionInclusive

    The maximum version of the Delta table (inclusive).

    uuids

    A map of version numbers to their corresponding UUIDs.
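
    The lookup described above can be sketched as follows. CommitPathSketch and commitPath are hypothetical names, not the DeltaCommitFileProvider API, and the sketch follows the two path shapes quoted in the description literally; the real file names may format the version differently, and the handling of maxVersionInclusive is left out.

```scala
// Hypothetical illustration of choosing between the two commit-file locations.
object CommitPathSketch {
  // `logPath` is the table's log directory; `uuids` maps versions to commit UUIDs.
  def commitPath(logPath: String, version: Long, uuids: Map[Long, String]): String =
    uuids.get(version) match {
      // A UUID is known for this version: the file lives under _commits.
      case Some(uuid) => s"$logPath/_commits/$version.$uuid.json"
      // No UUID is known: fall back to the predictable path.
      case None => s"$logPath/$version.json"
    }
}
```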

  7. class DeltaLogGroupingIterator extends Iterator[(Long, ArrayBuffer[FileStatus])]

    An iterator that groups the same types of files by version. Note that this class handles only checkpoint and delta files. For example, for an input iterator over:

    • 11.checkpoint.0.1.parquet
    • 11.checkpoint.1.1.parquet
    • 11.json
    • 12.checkpoint.parquet
    • 12.json
    • 13.json
    • 14.json
    • 15.checkpoint.0.1.parquet
    • 15.checkpoint.1.1.parquet
    • 15.checkpoint.<uuid>.parquet
    • 15.json

    This will return:

    • (11, Seq(11.checkpoint.0.1.parquet, 11.checkpoint.1.1.parquet, 11.json))
    • (12, Seq(12.checkpoint.parquet, 12.json))
    • (13, Seq(13.json))
    • (14, Seq(14.json))
    • (15, Seq(15.checkpoint.0.1.parquet, 15.checkpoint.1.1.parquet, 15.checkpoint.<uuid>.parquet, 15.json))
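
    A minimal sketch of this grouping, using plain file-name strings instead of FileStatus objects. GroupByVersionSketch and versionOf are hypothetical names, and the sketch assumes the input is already sorted by version and that every name starts with its version number, as in the example above.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration: groups consecutive file names by leading version.
object GroupByVersionSketch {
  // Extracts the numeric prefix, e.g. "11.checkpoint.0.1.parquet" -> 11.
  def versionOf(name: String): Long = name.takeWhile(_.isDigit).toLong

  def group(names: Iterator[String]): Iterator[(Long, ArrayBuffer[String])] =
    new Iterator[(Long, ArrayBuffer[String])] {
      private val in = names.buffered

      def hasNext: Boolean = in.hasNext

      def next(): (Long, ArrayBuffer[String]) = {
        val version = versionOf(in.head)
        val files = ArrayBuffer.empty[String]
        // Consume every consecutive file that shares the current version.
        while (in.hasNext && versionOf(in.head) == version) files += in.next()
        (version, files)
      }
    }
}
```
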
  8. trait DeltaProgressReporter extends LoggingShims
  9. trait DeltaSparkPlanUtils extends AnyRef
  10. class FractionTimestampFormatter extends Iso8601TimestampFormatter

    The formatter parses/formats timestamps according to the pattern yyyy-MM-dd HH:mm:ss.[..fff..] where [..fff..] is a fraction of second up to microsecond resolution. The formatter does not output trailing zeros in the fraction. For example, the timestamp 2019-03-05 15:00:01.123400 is formatted as the string 2019-03-05 15:00:01.1234.
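
    The trailing-zero behaviour can be reproduced with plain java.time, as a rough illustration. FractionFormatSketch is a hypothetical name and this is not how FractionTimestampFormatter is implemented; it relies on DateTimeFormatterBuilder.appendFraction with a minimum width of 0 and a maximum width of 6, which prints only as many fraction digits as are needed, up to microseconds.

```scala
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField

// Hypothetical illustration of formatting without trailing fraction zeros.
object FractionFormatSketch {
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("yyyy-MM-dd HH:mm:ss")
    // Width 0 to 6 with a decimal point: trailing zeros are not printed.
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
    .toFormatter()

  def format(ts: LocalDateTime): String = formatter.format(ts)
}
```

    Formatting LocalDateTime.of(2019, 3, 5, 15, 0, 1, 123400000) with this sketch yields 2019-03-05 15:00:01.1234, matching the example above.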

  11. class Iso8601DateFormatter extends DateFormatter with DateTimeFormatterHelper
  12. class Iso8601TimestampFormatter extends TimestampFormatter with DateTimeFormatterHelper
  13. class PartitionPath extends AnyRef

    Holds a directory in a partitioned collection of files as well as the partition values in the form of a Row. Before scanning, the files at path need to be enumerated.

  14. case class PartitionSpec(partitionColumns: StructType, partitions: Seq[PartitionPath]) extends Product with Serializable
  15. case class PathWithFileSystem extends Product with Serializable

    Bundling the Path with the FileSystem instance ensures that we never pass the wrong file system with the path to a function at compile time.

  16. class SetAccumulator[T] extends AccumulatorV2[T, Set[T]]

    Accumulator to collect distinct elements as a set.

  17. trait StateCache extends DeltaLogging

    Machinery that caches the reconstructed state of a Delta table using the RDD cache. The cache is designed so that the first access will materialize the results. However, once uncache is called, all data will be flushed and will not be cached again.
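
    The lifecycle described above (materialize on first access, never cache again after uncache) can be sketched without Spark. CachedOnceSketch is a hypothetical class; recomputing the value on every access after uncache is an assumption about what "not cached again" implies, and the real trait caches RDDs rather than plain values.

```scala
// Hypothetical, Spark-free illustration of a cache that cannot be re-enabled.
final class CachedOnceSketch[T](compute: () => T) {
  private var cached: Option[T] = None
  private var uncached = false

  def get: T = synchronized {
    if (uncached) {
      // After uncache, every access recomputes and nothing is stored.
      compute()
    } else {
      // The first access materializes the result; later accesses reuse it.
      cached.getOrElse {
        val value = compute()
        cached = Some(value)
        value
      }
    }
  }

  // Flushes the cached value and permanently disables caching.
  def uncache(): Unit = synchronized {
    uncached = true
    cached = None
  }
}
```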

  18. sealed trait TimestampFormatter extends Serializable

    Forked from org.apache.spark.sql.catalyst.util.TimestampFormatter

Value Members

  1. object AnalysisHelper
  2. object BinPackingUtils
  3. object Codec

    Additional codecs not supported by Apache Commons Codecs.

  4. object DateFormatter extends Serializable
  5. object DateTimeUtils

    Helper functions for converting between internal and external date and time representations. Dates are exposed externally as java.sql.Date and are represented internally as the number of days since the Unix epoch (1970-01-01). Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision.
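
    The two internal representations can be illustrated with standard JDK calls. EpochSketch and its method names are hypothetical and are not the DateTimeUtils API; the sketch also assumes the long holds microseconds since the Unix epoch, and it does no time zone handling beyond what java.sql.Date.toLocalDate and java.sql.Timestamp.toInstant do themselves.

```scala
import java.sql.{Date, Timestamp}

// Hypothetical illustration of the internal date and timestamp encodings.
object EpochSketch {
  // Days since 1970-01-01 for the calendar date held by `date`.
  def toEpochDays(date: Date): Int =
    Math.toIntExact(date.toLocalDate.toEpochDay)

  // Microseconds since the Unix epoch; sub-microsecond digits are dropped.
  def toEpochMicros(ts: Timestamp): Long = {
    val instant = ts.toInstant
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, 1000000L),
      instant.getNano / 1000L)
  }
}
```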

  6. object DeltaCommitFileProvider extends Serializable
  7. object DeltaFileOperations extends DeltaLogging

    Some utility methods on files, directories, and paths.

  8. object DeltaSparkPlanUtils
  9. object FileNames

    Helper for creating file names for specific commits / checkpoints.

  10. object JsonUtils

    Useful JSON functions used around the Delta codebase.

  11. object PartitionPath

    This file is forked from org.apache.spark.sql.execution.datasources.PartitioningUtils.

  12. object PartitionSpec extends Serializable
  13. object PathWithFileSystem extends Serializable
  14. object TimestampFormatter extends Serializable
  15. object Utils

    Various utility methods used by Delta.
