org.apache.spark.sql.delta.util

DeltaFileOperations

object DeltaFileOperations extends DeltaLogging

Some utility methods on files, directories, and paths.

Linear Supertypes

DeltaLogging, DatabricksLogging, DeltaProgressReporter, LoggingShims, Logging, AnyRef, Any

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    LoggingShims

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def absolutePath(basePath: String, child: String): Path

    Create an absolute path from child using the basePath if the child is a relative path.

    Create an absolute path from child using the basePath if the child is a relative path. Return child if it is an absolute path.

    basePath

    Base path to prepend to child if child is a relative path. Note: it is assumed that the basePath does not have any escaped characters and is directly readable by Hadoop APIs.

    child

    Child path to append to basePath if child is a relative path. Note: it is assumed that the child is escaped, that is, all special characters that require escaping by URI standards are already escaped.

    returns

    Absolute path without escaped chars that is directly readable by Hadoop APIs.
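The resolution rule above can be sketched in plain Scala. This is an illustrative re-implementation under simplified assumptions (strings in place of Hadoop's Path, and `absolutePathSketch` is a hypothetical name), not the actual method body:

```scala
// Illustrative sketch of absolutePath's documented behavior, using plain
// strings in place of Hadoop's Path.
def absolutePathSketch(basePath: String, child: String): String = {
  val childUri = new java.net.URI(child)
  // Treat the child as absolute if it has a scheme (e.g. "s3://...")
  // or starts with "/"; otherwise resolve it against basePath.
  if (childUri.isAbsolute || child.startsWith("/")) child
  else s"${basePath.stripSuffix("/")}/$child"
}
```

For example, `absolutePathSketch("/table", "part-0.parquet")` resolves against the base path, while an already-absolute `s3://bucket/part-0.parquet` is returned unchanged.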

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  7. def defaultHiddenFileFilter(fileName: String): Boolean

    The default filter for hidden files.

    The default filter for hidden files. File names beginning with _ or . are considered hidden.

    returns

    true if the file is hidden
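The rule described above is simple enough to state directly; a sketch consistent with the documentation (the hypothetical name `isHiddenSketch` is used to avoid implying this is the actual implementation):

```scala
// Sketch of the documented rule: a file is hidden iff its name
// starts with '_' or '.'.
def isHiddenSketch(fileName: String): Boolean =
  fileName.startsWith("_") || fileName.startsWith(".")
```

Under this rule, names such as `_delta_log` or `.part-0.crc` are hidden, while `part-0.parquet` is not.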

  8. def deltaAssert(check: ⇒ Boolean, name: String, msg: String, deltaLog: DeltaLog = null, data: AnyRef = null, path: Option[Path] = None): Unit

    Helper method to check invariants in Delta code.

    Helper method to check invariants in Delta code. Fails when running in tests, records a delta assertion event and logs a warning otherwise.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. def getAllSubDirectories(base: String, path: String): (Iterator[String], String)

    Returns all the levels of sub directories that path has with respect to base.

    Returns all the levels of sub directories that path has with respect to base. For example: getAllSubDirectories("/base", "/base/a/b/c") => (Iterator("/base/a", "/base/a/b"), "/base/a/b/c")
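The documented example can be reproduced with a small string-based sketch (hypothetical name; the real method may differ in edge-case handling):

```scala
// Sketch: enumerate the intermediate directories strictly between
// `base` and the leaf directory `path`.
def getAllSubDirectoriesSketch(base: String, path: String): (Iterator[String], String) = {
  val relative = path.stripPrefix(base).stripPrefix("/")
  val intermediate = relative.split("/").dropRight(1)
  // scanLeft builds each prefix; drop(1) skips `base` itself.
  val prefixes = intermediate.scanLeft(base)((acc, c) => s"$acc/$c").drop(1)
  (prefixes.iterator, path)
}
```

With `("/base", "/base/a/b/c")` this yields the iterator `"/base/a", "/base/a/b"` and the leaf `"/base/a/b/c"`, matching the example above.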

  13. def getAllTopComponents(listDir: Path, topDir: Path): List[String]

    Get all parent directory paths from listDir until topDir (exclusive).

    Get all parent directory paths from listDir until topDir (exclusive). For example, if topDir is "/folder" and listDir is "/folder/a/b/c", we would return "/folder/a/b/c", "/folder/a/b" and "/folder/a".
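A string-based sketch of this walk (the real method operates on Hadoop Paths; `getAllTopComponentsSketch` is a hypothetical name):

```scala
// Walk from listDir up through its parents, stopping before topDir.
def getAllTopComponentsSketch(listDir: String, topDir: String): List[String] =
  Iterator.iterate(listDir)(p => p.take(p.lastIndexOf('/')))
    .takeWhile(p => p != topDir && p.nonEmpty)
    .toList
```

For `("/folder/a/b/c", "/folder")` this returns `List("/folder/a/b/c", "/folder/a/b", "/folder/a")`, as in the example above.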

  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def getCommonTags(deltaLog: DeltaLog, tahoeId: String): Map[TagDefinition, String]
    Definition Classes
    DeltaLogging
  16. def getErrorData(e: Throwable): Map[String, Any]
    Definition Classes
    DeltaLogging
  17. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  18. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  19. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  20. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  21. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  22. def localListDirs(hadoopConf: Configuration, dirs: Seq[String], recursive: Boolean = true, dirFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileFilter: (String) ⇒ Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]

    Lists the directory locally using LogStore without launching a Spark job.

    Lists the directory locally using LogStore without launching a Spark job. Returns an iterator from LogStore.

  23. def localListFrom(hadoopConf: Configuration, listFilename: String, topDir: String, recursive: Boolean = true, dirFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileFilter: (String) ⇒ Boolean = defaultHiddenFileFilter): Iterator[SerializableFileStatus]

    Incrementally lists files with filenames after listFilename, in alphabetical order.

    Incrementally lists files with filenames after listFilename, in alphabetical order. Helpful if you only want to list new files instead of the entire directory. Listed locally using LogStore without launching a Spark job. Returns an iterator from LogStore.

  24. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  25. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  26. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  27. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  28. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  31. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  32. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  35. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  36. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  39. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  40. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  41. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  44. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    LoggingShims
  45. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  47. def makePathsAbsolute(qualifiedTablePath: String, files: Dataset[AddFile]): Dataset[AddFile]

    Returns a Dataset[AddFile], where all the AddFile actions have absolute paths.

    Returns a Dataset[AddFile], where all the AddFile actions have absolute paths. Files that already have absolute paths are left unchanged; otherwise the path is prefixed with qualifiedTablePath.

    qualifiedTablePath

    Fully qualified path of Delta table root

    files

    List of AddFile instances

  48. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  49. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  50. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  51. def readParquetFootersInParallel(conf: Configuration, partFiles: Seq[FileStatus], ignoreCorruptFiles: Boolean): Seq[Footer]

    Reads Parquet footers in a multi-threaded manner.

    Reads Parquet footers in a multi-threaded manner. If the config "spark.sql.files.ignoreCorruptFiles" is set to true, corrupted files are ignored when reading footers.

  52. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or to report detailed, operation-specific statistics.

    Used to record the occurrence of a single event or to report detailed, operation-specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  53. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  54. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: ⇒ A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  55. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  56. def recordFrameProfile[T](group: String, name: String)(thunk: ⇒ T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  57. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = METRIC_OPERATION_DURATION, silent: Boolean = true)(thunk: ⇒ S): S
    Definition Classes
    DatabricksLogging
  58. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  59. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  60. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  61. def recursiveListDirs(spark: SparkSession, subDirs: Seq[String], hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None, listAsDirectories: Boolean = true): Dataset[SerializableFileStatus]

    Recursively lists all the files and directories for the given subDirs in a scalable manner.

    Recursively lists all the files and directories for the given subDirs in a scalable manner.

    spark

    The SparkSession

    subDirs

    Absolute path of the subdirectories to list

    hadoopConf

    The Hadoop Configuration to get a FileSystem instance

    hiddenDirNameFilter

    A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".

    hiddenFileNameFilter

    A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".

    listAsDirectories

    Whether to treat the paths in subDirs as directories, where all files that are children to the path will be listed. If false, the paths are treated as filenames, and files under the same folder with filenames after the path will be listed instead.

  62. def recursiveListFrom(spark: SparkSession, listFilename: String, topDir: String, hadoopConf: Broadcast[SerializableConfiguration], hiddenDirNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, hiddenFileNameFilter: (String) ⇒ Boolean = defaultHiddenFileFilter, fileListingParallelism: Option[Int] = None): Dataset[SerializableFileStatus]

    Recursively and incrementally lists files with filenames after listFilename, in alphabetical order.

    Recursively and incrementally lists files with filenames after listFilename, in alphabetical order. Helpful if you only want to list new files instead of the entire directory.

    Files located within topDir whose filenames sort lexically after listFilename will be included, even if they are located in parent or sibling folders of listFilename.

    spark

    The SparkSession

    listFilename

    Absolute path to a filename from which new files are listed (exclusive)

    topDir

    Absolute path to the original starting directory

    hadoopConf

    The Hadoop Configuration to get a FileSystem instance

    hiddenDirNameFilter

    A function that returns true when the directory should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".

    hiddenFileNameFilter

    A function that returns true when the file should be considered hidden and excluded from results. Defaults to checking for prefixes of "." or "_".

  63. def registerTempFileDeletionTaskFailureListener(conf: Configuration, tempPath: Path): Unit

    Register a task failure listener to delete a temp file on a best-effort basis.

  64. def runInNewThread[T](threadName: String, isDaemon: Boolean = true)(body: ⇒ T): T

    Expose org.apache.spark.util.ThreadUtils.runInNewThread to use in Delta code.

  65. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  66. def toString(): String
    Definition Classes
    AnyRef → Any
  67. def tryDeleteNonRecursive(fs: FileSystem, path: Path, tries: Int = 3): Boolean

    Tries deleting a file or directory non-recursively.

    Tries deleting a file or directory non-recursively. If the file/folder doesn't exist, that's fine: a separate operation may be deleting it concurrently. A non-empty directory should not be deleted; FileSystem implementations throw an IOException in that case, which we report as a failed delete.

    Listing on S3 is not consistent after deletes, so if the delete returns false because the file didn't exist, we still return true. Retries up to 3 times on S3 rate limits.
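The retry behavior can be sketched generically. Here `attempt` stands in for the `fs.delete(path, false)` call, and any exception simply counts as a failed try (the real method is more selective about which failures it retries); the helper name is hypothetical:

```scala
import scala.annotation.tailrec
import scala.util.Try

// Generic sketch of "retry up to `tries` times" around a delete attempt.
@tailrec
def withRetriesSketch(tries: Int)(attempt: () => Boolean): Boolean = {
  if (Try(attempt()).getOrElse(false)) true       // succeeded
  else if (tries <= 1) false                      // retries exhausted
  else withRetriesSketch(tries - 1)(attempt)      // try again
}
```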

  68. def tryRelativizePath(fs: FileSystem, basePath: Path, child: Path, ignoreError: Boolean = false): Path

    Given a path child:

    Given a path child:

    1. Returns child if the path is already relative.
    2. Tries relativizing child with respect to basePath:
       a) If child doesn't live within the same base path, returns child as is.
       b) If child lives in a different FileSystem, throws an exception.

    Note that child may physically point to a path within basePath but logically belong to a different FileSystem, e.g. DBFS mount points and direct S3 paths.
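A string-based sketch of rules 1 and 2a (the real method operates on Hadoop Path/FileSystem objects; the cross-FileSystem check of rule 2b needs real FileSystem instances and is omitted here):

```scala
// Sketch of tryRelativizePath's rules 1 and 2a on plain strings.
def tryRelativizeSketch(basePath: String, child: String): String = {
  val base = basePath.stripSuffix("/") + "/"
  if (!child.startsWith("/") && !child.contains("://")) child // rule 1: already relative
  else if (child.startsWith(base)) child.stripPrefix(base)    // rule 2: relativize
  else child                                                  // rule 2a: different base
}
```

For example, `tryRelativizeSketch("/table", "/table/part-0.parquet")` yields the relative `part-0.parquet`, while a child outside the base path is returned as-is.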
  69. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  70. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  71. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  72. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: ⇒ T): T

    Logs a message to indicate that some command is running.

    Logs a message to indicate that some command is running.

    Definition Classes
    DeltaProgressReporter
