object DeltaFileOperations extends DeltaLogging
Some utility methods on files, directories, and paths.
Linear Supertypes
- DeltaFileOperations
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- LoggingShims
- Logging
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
  - Definition Classes: LoggingShims
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def absolutePath(basePath: String, child: String): Path
  Creates an absolute path from `child` using the `basePath` if the child is a relative path. Returns `child` if it is already an absolute path.
  - basePath
    Base path to prepend to `child` if child is a relative path. Note: it is assumed that the basePath does not have any escaped characters and is directly readable by Hadoop APIs.
  - child
    Child path to append to `basePath` if child is a relative path. Note: it is assumed that the child is escaped, that is, all special characters that need escaping by URI standards are already escaped.
  - returns
    Absolute path without escaped characters that is directly readable by Hadoop APIs.
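The resolution rule can be sketched in Python. This is a hypothetical analogue using `posixpath` for plain POSIX-style paths, not the Scala implementation (which operates on Hadoop `Path` objects and handles URIs):

```python
import posixpath

def absolute_path(base_path: str, child: str) -> str:
    # If the child is already absolute, return it unchanged;
    # otherwise resolve it against the base path.
    if posixpath.isabs(child):
        return child
    return posixpath.join(base_path, child)
```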
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def defaultHiddenFileFilter(fileName: String): Boolean
  The default filter for hidden files. File names beginning with _ or . are considered hidden.
  - returns
    true if the file is hidden
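The filter's behavior, sketched as a Python analogue of the documented rule:

```python
def default_hidden_file_filter(file_name: str) -> bool:
    # Names beginning with "_" or "." are considered hidden.
    return file_name.startswith("_") or file_name.startswith(".")
```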
- def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit
  Helper method to check invariants in Delta code. Fails when running in tests; otherwise records a delta assertion event and logs a warning.
  - Attributes: protected
  - Definition Classes: DeltaLogging
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- def getAllSubDirectories(base: String, path: String): (Iterator[String], String)
  Returns all the levels of subdirectories that `path` has with respect to `base`. For example: getAllSubDirectories("/base", "/base/a/b/c") => (Iterator("/base/a", "/base/a/b"), "/base/a/b/c")
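The example above can be reproduced with a small Python sketch (a hypothetical string-based analogue; the Scala version returns a Scala `Iterator`):

```python
def get_all_sub_directories(base: str, path: str):
    # Yield every intermediate directory level strictly between
    # `base` and `path`, then return `path` itself unchanged.
    relative = path[len(base):].strip("/")
    parts = relative.split("/")
    levels = []
    current = base.rstrip("/")
    for part in parts[:-1]:
        current = current + "/" + part
        levels.append(current)
    return iter(levels), path
```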
def
getAllTopComponents(listDir: Path, topDir: Path): List[String]
Get all parent directory paths from
listDiruntiltopDir(exclusive).Get all parent directory paths from
listDiruntiltopDir(exclusive). For example, iftopDiris "/folder/" andcurrDiris "/folder/a/b/c", we would return "/folder/a/b/c", "/folder/a/b" and "/folder/a". -
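The upward walk can be sketched in Python. This is an illustrative analogue that assumes `list_dir` actually lives under `top_dir` (the real method takes Hadoop `Path` objects):

```python
def get_all_top_components(list_dir: str, top_dir: str):
    # Walk upward from list_dir, collecting each parent path
    # until (but excluding) top_dir.
    result = []
    current = list_dir.rstrip("/")
    top = top_dir.rstrip("/")
    while current != top:
        result.append(current)
        current = current.rsplit("/", 1)[0]
    return result
```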
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
- Definition Classes
- DeltaLogging
-
def
getErrorData(e: Throwable): Map[String, Any]
- Definition Classes
- DeltaLogging
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def localListDirs(hadoopConf: Configuration, dirs: Seq[String], recursive: Boolean = true, dirFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileFilter: (String) ⇒ Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]
  Lists the directory locally using LogStore, without launching a Spark job. Returns an iterator from LogStore.
- def localListFrom(hadoopConf: Configuration, listFilename: String, topDir: String, recursive: Boolean = true, dirFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileFilter: (String) ⇒ Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]
  Incrementally lists files with filenames after `listFilename` in alphabetical order. Helpful if you only want to list new files instead of the entire directory. Listed locally using LogStore without launching a Spark job. Returns an iterator from LogStore.
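The incremental, lexically ordered listing idea can be sketched in Python (a hypothetical in-memory analogue; the real method lists through LogStore):

```python
def list_from(all_files, list_filename: str):
    # Keep only files whose full paths sort strictly after
    # `list_filename`, returned in lexicographic order -- i.e. list
    # "new" files instead of the entire directory.
    return sorted(f for f in all_files if f > list_filename)
```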
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
-
def
logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- LoggingShims
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
makePathsAbsolute(qualifiedTablePath: String, files: Dataset[AddFile]): Dataset[AddFile]
Returns a
Dataset[AddFile], where all theAddFileactions have absolute paths.Returns a
Dataset[AddFile], where all theAddFileactions have absolute paths. The files may have already had absolute paths, in which case they are left unchanged. Else, they are prepended with thequalifiedSourcePath.- qualifiedTablePath
Fully qualified path of Delta table root
- files
List of
AddFileinstances
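The path-normalization rule can be sketched in Python. This hypothetical analogue operates on plain path strings rather than a Spark `Dataset[AddFile]`:

```python
import posixpath

def make_paths_absolute(qualified_table_path: str, add_file_paths):
    # Leave already-absolute paths untouched; resolve relative ones
    # against the table root.
    return [
        p if posixpath.isabs(p) else posixpath.join(qualified_table_path, p)
        for p in add_file_paths
    ]
```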
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def readParquetFootersInParallel(conf: Configuration, partFiles: Seq[FileStatus], ignoreCorruptFiles: Boolean): Seq[Footer]
  Reads Parquet footers in a multi-threaded manner. If the config "spark.sql.files.ignoreCorruptFiles" is set to true, corrupted files are ignored when reading footers.
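The parallel-read-with-optional-skip pattern can be sketched in Python. `read_footer` is a hypothetical per-file reader passed in for illustration; the real method reads actual Parquet footers:

```python
from concurrent.futures import ThreadPoolExecutor

def read_footers_in_parallel(paths, read_footer, ignore_corrupt=False):
    # Apply `read_footer` to every path on a thread pool; when
    # ignore_corrupt is set, per-file IO errors are swallowed and the
    # file is dropped from the result instead of failing the whole read.
    def safe(path):
        try:
            return read_footer(path)
        except IOError:
            if ignore_corrupt:
                return None
            raise
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(safe, paths))
    return [r for r in results if r is not None]
```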
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
  Used to record the occurrence of a single event or report detailed, operation-specific statistics.
  - path
    Used to log the path of the delta table when `deltaLog` is null.
  - Attributes: protected
  - Definition Classes: DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration as well as the success or failure of an operation on a `deltaLog`.
  - Attributes: protected
  - Definition Classes: DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A
  Used to report the duration as well as the success or failure of an operation on a `tahoePath`.
  - Attributes: protected
  - Definition Classes: DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
  - Definition Classes: DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
  - Attributes: protected
  - Definition Classes: DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
  - Definition Classes: DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
  - Definition Classes: DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
  - Definition Classes: DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
  - Definition Classes: DatabricksLogging
- def recursiveListDirs(spark: SparkSession, subDirs: Seq[String], hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None, listAsDirectories: Boolean = true): Dataset[SerializableFileStatus]
  Recursively lists all the files and directories for the given `subDirs` in a scalable manner.
  - spark
    The SparkSession
  - subDirs
    Absolute paths of the subdirectories to list
  - hadoopConf
    The Hadoop Configuration to get a FileSystem instance
  - hiddenDirNameFilter
    A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
  - hiddenFileNameFilter
    A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
  - listAsDirectories
    Whether to treat the paths in subDirs as directories, where all files that are children of the path will be listed. If false, the paths are treated as filenames, and files under the same folder with filenames after the path will be listed instead.
- def recursiveListFrom(spark: SparkSession, listFilename: String, topDir: String, hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None): Dataset[SerializableFileStatus]
  Recursively and incrementally lists files with filenames after `listFilename` in alphabetical order. Helpful if you only want to list new files instead of the entire directory. Files located within `topDir` with filenames lexically after `listFilename` will be included, even if they are located in parent or sibling folders of `listFilename`.
  - spark
    The SparkSession
  - listFilename
    Absolute path to a filename from which new files are listed (exclusive)
  - topDir
    Absolute path to the original starting directory
  - hadoopConf
    The Hadoop Configuration to get a FileSystem instance
  - hiddenDirNameFilter
    A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
  - hiddenFileNameFilter
    A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".
- def registerTempFileDeletionTaskFailureListener(conf: Configuration, tempPath: Path): Unit
  Registers a task failure listener that deletes a temp file on a best-effort basis.
- def runInNewThread[T](threadName: String, isDaemon: Boolean = true)(body: ⇒ T): T
  Exposes `org.apache.spark.util.ThreadUtils.runInNewThread` for use in Delta code.
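The run-on-a-fresh-thread-and-propagate pattern behind `ThreadUtils.runInNewThread` can be sketched in Python (an illustrative analogue, not the Spark utility itself):

```python
import threading

def run_in_new_thread(thread_name, body, daemon=True):
    # Run `body` on a freshly created thread and hand its result
    # (or exception) back to the caller.
    box = {}
    def target():
        try:
            box["result"] = body()
        except BaseException as e:
            box["error"] = e  # re-raised on the calling thread below
    t = threading.Thread(target=target, name=thread_name, daemon=daemon)
    t.start()
    t.join()
    if "error" in box:
        raise box["error"]
    return box["result"]
```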
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
- def tryDeleteNonRecursive(fs: FileSystem, path: Path, tries: Int = 3): Boolean
  Tries deleting a file or directory non-recursively. If the file/folder doesn't exist, that's fine: a separate operation may be deleting files/folders. If a directory is non-empty, we shouldn't delete it; FileSystem implementations throw an `IOException` in those cases, which we report as a failed delete. Listing on S3 is not consistent after deletes, therefore if `delete` returns `false` because the file didn't exist, we still return `true`. Retries on S3 rate limits up to 3 times.
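The retry-and-classify logic can be sketched in Python. `delete` and `RateLimited` are hypothetical stand-ins for `FileSystem.delete(path, recursive = false)` and an S3 throttling error:

```python
class RateLimited(Exception):
    """Stand-in for an S3 throttling error (hypothetical)."""

def try_delete_non_recursive(delete, path, tries=3):
    # `delete` mimics FileSystem.delete(path, recursive=False): it
    # returns False when the path is already gone (still treated as
    # success here, since a concurrent cleaner may have removed it)
    # and raises IOError for e.g. a non-empty directory.
    for _ in range(tries):
        try:
            delete(path)      # return value ignored: a missing file is fine
            return True
        except RateLimited:
            continue          # retry on throttling, up to `tries` attempts
        except IOError:
            return False      # e.g. directory not empty
    return False
```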
def
tryRelativizePath(fs: FileSystem, basePath: Path, child: Path, ignoreError: Boolean = false): Path
Given a path
child:Given a path
child:- Returns
childif the path is already relative 2. Tries relativizingchildwith respect tobasePatha) If thechilddoesn't live within the same base path, returnschildas is b) Ifchildlives in a different FileSystem, throws an exception Note thatchildmay physically be pointing to a path withinbasePath, but may logically belong to a different FileSystem, e.g. DBFS mount points and direct S3 paths.
- Returns
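Cases 1 and 2a can be sketched in Python. This hypothetical analogue works on plain path strings and omits the FileSystem/scheme check of case 2b:

```python
def try_relativize_path(base_path: str, child: str) -> str:
    # 1. An already-relative child is returned as is.
    if not child.startswith("/"):
        return child
    # 2a. A child under base_path is rewritten relative to it;
    #     any other absolute child is returned unchanged.
    base = base_path.rstrip("/") + "/"
    if child.startswith(base):
        return child[len(base):]
    return child
```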
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T
  Reports a log to indicate some command is running.
  - Definition Classes: DeltaProgressReporter