object VacuumCommand extends VacuumCommandImpl

Vacuums the table by clearing all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.

Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. VacuumCommand
  2. VacuumCommandImpl
  3. DeltaCommand
  4. DeltaLogging
  5. DatabricksLogging
  6. DeltaProgressReporter
  7. Logging
  8. AnyRef
  9. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def allValidFiles(file: String, isBloomFiltered: Boolean): Seq[String]

    This is used to create the list of files we want to retain during GC.

    This is used to create the list of files we want to retain during GC.

    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def buildBaseRelation(spark: SparkSession, txn: OptimisticTransaction, actionType: String, rootPath: Path, inputLeafFiles: Seq[String], nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation

    Build a base relation of files that need to be rewritten as part of an update/delete/merge operation.

    Build a base relation of files that need to be rewritten as part of an update/delete/merge operation.

    Attributes
    protected
    Definition Classes
    DeltaCommand
  7. def checkRetentionPeriodSafety(spark: SparkSession, retentionMs: Option[Long], configuredRetention: Long): Unit

    Additional check on retention duration to prevent people from shooting themselves in the foot.

    Additional check on retention duration to prevent people from shooting themselves in the foot.

    Attributes
    protected
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  9. def delete(diff: Dataset[String], fs: FileSystem): Long

    Attempts to delete the list of candidate files.

    Attempts to delete the list of candidate files. Returns the number of files deleted.

    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def gc(spark: SparkSession, deltaLog: DeltaLog, dryRun: Boolean = true, retentionHours: Option[Double] = None, clock: Clock = new SystemClock): DataFrame

    Clears all untracked files and folders within this table.

    Clears all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.

    dryRun

    If set to true, no files will be deleted. Instead, we will list all files and directories that will be cleared.

    retentionHours

    An optional parameter to override the default Delta tombstone retention period

    returns

    A Dataset containing the paths of the files/folders to delete in dryRun mode. Otherwise returns the base path of the table.

  14. def generateCandidateFileMap(basePath: Path, candidateFiles: Seq[AddFile]): Map[String, AddFile]

    Generates a map of file names to add file entries for operations where we will need to rewrite files such as delete, merge, update.

    Generates a map of file names to add file entries for operations where we will need to rewrite files such as delete, merge, update. We expect file names to be unique, because each file contains a UUID.

    Attributes
    protected
    Definition Classes
    DeltaCommand
  15. def getAllSubdirs(base: String, file: String, fs: FileSystem): Iterator[String]

    Wrapper function for DeltaFileOperations.getAllSubDirectories returns all subdirectories that file has with respect to base.

    Wrapper function for DeltaFileOperations.getAllSubDirectories returns all subdirectories that file has with respect to base.

    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  16. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def getTouchedFile(basePath: Path, filePath: String, nameToAddFileMap: Map[String, AddFile]): AddFile

    Find the AddFile record corresponding to the file that was read as part of a delete/update/merge operation.

    Find the AddFile record corresponding to the file that was read as part of a delete/update/merge operation.

    filePath

    The path to a file. Can be either absolute or relative

    nameToAddFileMap

    Map generated through generateCandidateFileMap()

    Attributes
    protected
    Definition Classes
    DeltaCommand
  18. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  19. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  20. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def isCatalogTable(analyzer: Analyzer, tableIdent: TableIdentifier): Boolean

    Use the analyzer to see whether the provided TableIdentifier is for a path based table or not

    Use the analyzer to see whether the provided TableIdentifier is for a path based table or not

    analyzer

    The session state analyzer to call

    tableIdent

    Table Identifier to determine whether is path based or not

    returns

    Boolean where true means that the table is a table in a metastore and false means the table is a path based table

    Definition Classes
    DeltaCommand
  22. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  23. def isPathIdentifier(tableIdent: TableIdentifier): Boolean

    Checks if the given identifier can be for a delta table's path

    Checks if the given identifier can be for a delta table's path

    tableIdent

    Table Identifier for which to check

    Attributes
    protected
    Definition Classes
    DeltaCommand
  24. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  25. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  26. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  27. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  34. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  39. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  40. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  41. def parsePartitionPredicates(spark: SparkSession, predicate: String): Seq[Expression]

    Converts string predicates into Expressions relative to a transaction.

    Converts string predicates into Expressions relative to a transaction.

    Attributes
    protected
    Definition Classes
    DeltaCommand
    Exceptions thrown

    AnalysisException if a non-partition column is referenced.

  42. def pathToString(path: Path): String
    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  43. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null): Unit

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  44. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation.

    Used to report the duration as well as the success or failure of an operation.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  45. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  46. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  47. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  48. def relativize(path: Path, fs: FileSystem, reservoirBase: Path, isDir: Boolean): String

    Attempts to relativize the path with respect to the reservoirBase and converts the path to a string.

    Attempts to relativize the path with respect to the reservoirBase and converts the path to a string.

    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  49. def removeFilesFromPaths(deltaLog: DeltaLog, nameToAddFileMap: Map[String, AddFile], filesToRewrite: Seq[String], operationTimestamp: Long): Seq[RemoveFile]

    This method provides the RemoveFile actions that are necessary for files that are touched and need to be rewritten in methods like Delete, Update, and Merge.

    This method provides the RemoveFile actions that are necessary for files that are touched and need to be rewritten in methods like Delete, Update, and Merge.

    deltaLog

    The DeltaLog of the table that is being operated on

    nameToAddFileMap

    A map generated using generateCandidateFileMap.

    filesToRewrite

    Absolute paths of the files that were touched. We will search for these in candidateFiles. Obtained as the output of the input_file_name function.

    operationTimestamp

    The timestamp of the operation

    Attributes
    protected
    Definition Classes
    DeltaCommand
  50. def resolveIdentifier(analyzer: Analyzer, identifier: TableIdentifier): LogicalPlan

    Use the analyzer to resolve the identifier provided

    Use the analyzer to resolve the identifier provided

    analyzer

    The session state analyzer to call

    identifier

    Table Identifier to determine whether is path based or not

    Attributes
    protected
    Definition Classes
    DeltaCommand
  51. def stringToPath(path: String): Path
    Attributes
    protected
    Definition Classes
    VacuumCommandImpl
  52. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  53. def toString(): String
    Definition Classes
    AnyRef → Any
  54. def verifyPartitionPredicates(spark: SparkSession, partitionColumns: Seq[String], predicates: Seq[Expression]): Unit
    Attributes
    protected
    Definition Classes
    DeltaCommand
  55. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  56. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  57. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  58. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Report a log to indicate some command is running.

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter

Inherited from VacuumCommandImpl

Inherited from DeltaCommand

Inherited from DeltaLogging

Inherited from DatabricksLogging

Inherited from DeltaProgressReporter

Inherited from Logging

Inherited from AnyRef

Inherited from Any

Ungrouped