object DeltaFileOperations extends DeltaLogging
Some utility methods on files, directories, and paths.
Linear Supertypes
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- LoggingShims
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def absolutePath(basePath: String, child: String): Path
Create an absolute path from child using the basePath if the child is a relative path. Return child if it is an absolute path.
- basePath
Base path to prepend to child if child is a relative path. Note: it is assumed that the basePath does not have any escaped characters and is directly readable by Hadoop APIs.
- child
Child path to append to basePath if child is a relative path. Note: it is assumed that the child is escaped, that is, all special characters that need escaping by URI standards are already escaped.
- returns
Absolute path without escaped chars that is directly readable by Hadoop APIs.
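The contract above can be sketched in plain Scala. This is an illustrative stand-in, not the Delta implementation: it handles only POSIX-style string paths (no Hadoop Path or scheme-qualified URIs), and it uses URLDecoder for unescaping, which also decodes "+" where strict URI unescaping would not.

```scala
// Sketch of absolutePath's documented contract: an already-absolute child is
// returned as-is; a relative child is resolved against basePath, with
// URI-style escapes in the child decoded first.
object AbsolutePathSketch {
  def absolutePath(basePath: String, child: String): String = {
    val decoded = java.net.URLDecoder.decode(child, "UTF-8") // undo URI escaping
    if (decoded.startsWith("/")) decoded                     // already absolute
    else s"${basePath.stripSuffix("/")}/$decoded"            // resolve against base
  }
}
```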
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def defaultHiddenFileFilter(fileName: String): Boolean
The default filter for hidden files. File names beginning with _ or . are considered hidden.
- returns
true if the file is hidden
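The documented rule is simple enough to state directly in code. A sketch of that rule (note it applies to the file name, not the full path):

```scala
// Sketch of the documented hidden-file rule: names starting with "_" or "."
// are considered hidden.
object HiddenFilterSketch {
  def defaultHiddenFileFilter(fileName: String): Boolean =
    fileName.startsWith("_") || fileName.startsWith(".")
}
```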
- def deltaAssert(check: => Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
Helper method to check invariants in Delta code. Fails when running in tests; otherwise, records a delta assertion event and logs a warning.
- Attributes
- protected
- Definition Classes
- DeltaLogging
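The fail-in-tests, log-in-production pattern behind deltaAssert can be sketched as follows. Both the test-mode flag and the warning sink here are hypothetical stand-ins for Delta's actual test detection and logging machinery:

```scala
// Hedged sketch of the deltaAssert pattern: throw when running under tests,
// record a warning otherwise. `isTestMode` and `lastWarning` are illustrative.
object DeltaAssertSketch {
  var isTestMode: Boolean = false            // assumption: how tests are detected
  var lastWarning: Option[String] = None     // stand-in for a real logger/event sink
  def deltaAssert(check: => Boolean, name: String, msg: String): Unit = {
    if (!check) {
      if (isTestMode) throw new AssertionError(s"$name: $msg")
      else lastWarning = Some(s"Delta assertion [$name] failed: $msg")
    }
  }
}
```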
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def getAllSubDirectories(base: String, path: String): (Iterator[String], String)
Returns all the levels of subdirectories that path has with respect to base. For example: getAllSubDirectories("/base", "/base/a/b/c") => (Iterator("/base/a", "/base/a/b"), "/base/a/b/c")
- def getAllTopComponents(listDir: Path, topDir: Path): List[String]
Get all parent directory paths from listDir until topDir (exclusive). For example, if topDir is "/folder/" and listDir is "/folder/a/b/c", we would return "/folder/a/b/c", "/folder/a/b" and "/folder/a".
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
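The two directory-walk helpers getAllSubDirectories and getAllTopComponents above can be sketched over plain strings instead of Hadoop Paths. This mirrors only the documented examples and is an assumption about the walk order, not the real implementation:

```scala
// Hedged sketch of the directory-walk helpers, using "/"-separated strings.
object DirWalkSketch {
  // All intermediate directories strictly between base and path, plus path itself.
  def getAllSubDirectories(base: String, path: String): (Iterator[String], String) = {
    val rel = path.stripPrefix(base).stripPrefix("/").split("/")
    val levels = rel.inits.toList.reverse                 // Array() up to full path
      .map(parts => (base +: parts).mkString("/"))
    (levels.drop(1).dropRight(1).iterator, path)          // exclude base and path itself
  }
  // Parents of listDir down to (but excluding) topDir, starting from listDir.
  def getAllTopComponents(listDir: String, topDir: String): List[String] = {
    var cur = listDir
    val out = scala.collection.mutable.ListBuffer[String]()
    while (cur != topDir && cur.length > topDir.length) {
      out += cur
      cur = cur.substring(0, cur.lastIndexOf('/'))
    }
    out.toList
  }
}
```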
- def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
- def getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def localListDirs(hadoopConf: Configuration, dirs: Seq[String], recursive: Boolean = true, dirFilter: (String) => Boolean = defaultHiddenFileFilter, fileFilter: (String) => Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]
Lists the directory locally using LogStore without launching a Spark job. Returns an iterator from LogStore.
- def localListFrom(hadoopConf: Configuration, listFilename: String, topDir: String, recursive: Boolean = true, dirFilter: (String) => Boolean = defaultHiddenFileFilter, fileFilter: (String) => Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]
Incrementally lists files with filenames after listFilename in alphabetical order. Helpful if you only want to list new files instead of the entire directory. Listed locally using LogStore without launching a Spark job. Returns an iterator from LogStore.
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def makePathsAbsolute(qualifiedTablePath: String, files: Dataset[AddFile]): Dataset[AddFile]
Returns a Dataset[AddFile], where all the AddFile actions have absolute paths. The files may already have absolute paths, in which case they are left unchanged. Otherwise, they are prepended with the qualifiedTablePath.
- qualifiedTablePath
Fully qualified path of the Delta table root
- files
Dataset of AddFile instances
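The absolute-vs-relative decision can be sketched over plain path strings. This is an illustrative simplification: the real method operates on a Dataset[AddFile] with Hadoop Path semantics, whereas this sketch treats a path as absolute when it has a URI scheme or a leading "/":

```scala
// Hedged sketch of makePathsAbsolute over Seq[String] instead of Dataset[AddFile].
// Assumes well-formed URIs (java.net.URI would reject e.g. unescaped spaces).
object MakeAbsoluteSketch {
  def makePathsAbsolute(qualifiedTablePath: String, paths: Seq[String]): Seq[String] =
    paths.map { p =>
      if (new java.net.URI(p).isAbsolute || p.startsWith("/")) p // already absolute
      else s"${qualifiedTablePath.stripSuffix("/")}/$p"          // prepend table root
    }
}
```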
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def readParquetFootersInParallel(conf: Configuration, partFiles: Seq[FileStatus], ignoreCorruptFiles: Boolean): Seq[Footer]
Reads Parquet footers in a multi-threaded manner. If the config "spark.sql.files.ignoreCorruptFiles" is set to true, we will ignore the corrupted files when reading footers.
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when
deltaLogis null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
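The duration-and-outcome reporting that recordDeltaOperation performs can be sketched as a generic wrapper. The event type and sink below are hypothetical stand-ins for Delta's usage-logging machinery; only the time-then-record-then-rethrow shape comes from the description above:

```scala
// Hedged sketch of the record-operation pattern: time a thunk, record
// success or failure, and rethrow on failure.
object RecordOpSketch {
  final case class OpEvent(opType: String, durationNs: Long, success: Boolean)
  var events: List[OpEvent] = Nil   // stand-in for the real metrics sink
  def recordDeltaOperation[A](opType: String)(thunk: => A): A = {
    val start = System.nanoTime()
    try {
      val result = thunk
      events ::= OpEvent(opType, System.nanoTime() - start, success = true)
      result
    } catch {
      case e: Throwable =>
        events ::= OpEvent(opType, System.nanoTime() - start, success = false)
        throw e
    }
  }
}
```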
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a
tahoePath.Used to report the duration as well as the success or failure of an operation on a
tahoePath.- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recursiveListDirs(spark: SparkSession, subDirs: Seq[String], hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) => Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) => Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None, listAsDirectories: Boolean = true): Dataset[SerializableFileStatus]
Recursively lists all the files and directories for the given subDirs in a scalable manner.
- spark
The SparkSession
- subDirs
Absolute path of the subdirectories to list
- hadoopConf
The Hadoop Configuration to get a FileSystem instance
- hiddenDirNameFilter
A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
- hiddenFileNameFilter
A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
- listAsDirectories
Whether to treat the paths in subDirs as directories, where all files that are children to the path will be listed. If false, the paths are treated as filenames, and files under the same folder with filenames after the path will be listed instead.
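The filtering semantics of the hidden-name filters can be shown with a single-threaded java.io walk. This deliberately omits everything that makes the real method scalable (Spark jobs, broadcast Hadoop conf, parallelism) and is only a sketch of how the directory and file filters prune the tree:

```scala
// Hedged, single-threaded sketch of recursive listing with hidden-name
// filters applied separately to directories and files.
object RecursiveListSketch {
  import java.io.File
  def defaultHidden(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")
  def recursiveListDirs(
      dir: File,
      dirFilter: String => Boolean = defaultHidden,
      fileFilter: String => Boolean = defaultHidden): Seq[File] = {
    val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    val (dirs, files) = children.partition(_.isDirectory)
    files.filterNot(f => fileFilter(f.getName)) ++
      dirs.filterNot(d => dirFilter(d.getName))            // prune hidden dirs whole
          .flatMap(d => d +: recursiveListDirs(d, dirFilter, fileFilter))
  }
}
```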
- def recursiveListFrom(spark: SparkSession, listFilename: String, topDir: String, hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) => Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) => Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None): Dataset[SerializableFileStatus]
Recursively and incrementally lists files with filenames after listFilename in alphabetical order. Helpful if you only want to list new files instead of the entire directory. Files located within topDir with filenames lexically after listFilename will be included, even if they are located in parent/sibling folders of listFilename.
- spark
The SparkSession
- listFilename
Absolute path to a filename from which new files are listed (exclusive)
- topDir
Absolute path to the original starting directory
- hadoopConf
The Hadoop Configuration to get a FileSystem instance
- hiddenDirNameFilter
A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
- hiddenFileNameFilter
A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
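The incremental-listing contract (keep only paths that sort lexically after the exclusive start point) reduces to a string comparison. A minimal sketch over an in-memory listing, assuming full paths compare lexicographically the way the directory walk would order them:

```scala
// Hedged sketch of the "list from" contract: keep files whose full path
// sorts strictly after listFilename, wherever they sit under topDir.
object ListFromSketch {
  def listFrom(allFiles: Seq[String], listFilename: String): Seq[String] =
    allFiles.filter(_ > listFilename).sorted
}
```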
- def registerTempFileDeletionTaskFailureListener(conf: Configuration, tempPath: Path): Unit
Register a task failure listener that deletes a temp file on a best-effort basis.
- def runInNewThread[T](threadName: String, isDaemon: Boolean = true)(body: => T): T
Expose org.apache.spark.util.ThreadUtils.runInNewThread for use in Delta code.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def tryDeleteNonRecursive(fs: FileSystem, path: Path, tries: Int = 3): Boolean
Tries deleting a file or directory non-recursively. If the file/folder doesn't exist, that's fine; a separate operation may be deleting files/folders. If a directory is non-empty, we shouldn't delete it. FileSystem implementations throw an IOException in those cases, which we report as a failed delete. Listing on S3 is not consistent after deletes, therefore if delete returns false because the file didn't exist, we still return true. Retries on S3 rate limits up to 3 times.
- def tryRelativizePath(fs: FileSystem, basePath: Path, child: Path, ignoreError: Boolean = false): Path
Given a path child:
1. Returns child if the path is already relative
2. Tries relativizing child with respect to basePath:
a) If the child doesn't live within the same base path, returns child as is
b) If child lives in a different FileSystem, throws an exception
Note that child may physically be pointing to a path within basePath, but may logically belong to a different FileSystem, e.g. DBFS mount points and direct S3 paths.
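Rules 1 and 2a can be sketched with java.net.URI. This sketch works on path strings and omits rule 2b entirely (the real method compares Hadoop FileSystems and may throw; that requires live FileSystem instances):

```scala
// Hedged sketch of the relativization rules: an already-relative child is
// returned as-is; a child under basePath becomes relative to it; a child
// outside basePath is returned unchanged. FileSystem checks are omitted.
object RelativizeSketch {
  import java.net.URI
  def tryRelativizePath(basePath: String, child: String): String = {
    val childUri = new URI(child)
    if (!childUri.isAbsolute && !child.startsWith("/")) child    // rule 1: already relative
    else {
      val rel = new URI(basePath.stripSuffix("/") + "/").relativize(childUri)
      rel.toString match {
        case s if s.isEmpty => child                             // child == basePath
        case s if new URI(s).isAbsolute || s.startsWith("/") => child // rule 2a: outside base
        case s => s
      }
    }
  }
}
```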
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter