Class S3SingleDriverLogStore


public class S3SingleDriverLogStore extends HadoopFileSystemLogStore
Single Spark-driver/JVM LogStore implementation for S3.

We assume the following from S3's FileSystem implementations:

  • File writing on S3 is all-or-nothing, whether overwrite or not.
  • List-after-write can be inconsistent.

Regarding file creation, this implementation:

  • Opens a stream to write to S3 (regardless of the overwrite option).
  • Failures during stream write may leak resources, but never result in partial writes.

Regarding directory listing, this implementation:

  • Returns a list by merging the files listed from S3 with recently written files from the cache.
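
The merged listing described above can be pictured with a small JDK-only sketch. It is not the class's actual code: `MergedListingSketch`, its `listFrom` helper, and the file names are hypothetical, and plain strings stand in for `FileStatus` entries. It assumes the cache only contributes names that the store listing may not show yet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of io.delta.storage.
final class MergedListingSketch {

    /**
     * Unions the names listed from the store with the names of recently
     * written files, keeps those greater than or equal to startName, and
     * returns them sorted by name.
     *
     * String ordering in Java compares UTF-16 code units, which matches
     * UTF-8 byte order for ASCII names such as the ones used here.
     */
    static List<String> listFrom(Collection<String> listedFromStore,
                                 Collection<String> recentlyWritten,
                                 String startName) {
        TreeSet<String> merged = new TreeSet<>(listedFromStore);
        merged.addAll(recentlyWritten); // duplicates collapse into one entry
        return new ArrayList<>(merged.tailSet(startName, true));
    }

    public static void main(String[] args) {
        // The store listing lags behind: it does not show 0002.json yet.
        List<String> fromStore = Arrays.asList("0000.json", "0001.json");
        List<String> fromCache = Arrays.asList("0001.json", "0002.json");
        // Prints [0001.json, 0002.json]
        System.out.println(listFrom(fromStore, fromCache, "0001.json"));
    }
}
```

The sorted set gives both de-duplication and the "greater than or equal to" cut in one step, which is why the sketch uses it rather than concatenating and sorting two lists.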
  • Constructor Summary

    Constructors
    Constructor
    Description
    S3SingleDriverLogStore(org.apache.hadoop.conf.Configuration hadoopConf)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    Boolean
    isPartialWriteVisible(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration hadoopConf)
    :: DeveloperApi :: Whether a partial write is visible for the underlying file system of `path`.
    Iterator<org.apache.hadoop.fs.FileStatus>
    listFrom(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration hadoopConf)
    :: DeveloperApi :: List the paths in the same directory that are lexicographically greater than or equal to (UTF-8 sorting) the given `path`.
    void
    write(org.apache.hadoop.fs.Path path, Iterator<String> actions, Boolean overwrite, org.apache.hadoop.conf.Configuration hadoopConf)
    :: DeveloperApi :: Write the given `actions` to the given `path` with or without overwrite as indicated.

    Methods inherited from class io.delta.storage.HadoopFileSystemLogStore

    read, resolvePathOnPhysicalStorage

    Methods inherited from class io.delta.storage.LogStore

    initHadoopConf

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • S3SingleDriverLogStore

      public S3SingleDriverLogStore(org.apache.hadoop.conf.Configuration hadoopConf)
  • Method Details

    • write

      public void write(org.apache.hadoop.fs.Path path, Iterator<String> actions, Boolean overwrite, org.apache.hadoop.conf.Configuration hadoopConf) throws IOException
      Description copied from class: LogStore
      :: DeveloperApi :: Write the given `actions` to the given `path` with or without overwrite as indicated. The implementation must throw FileAlreadyExistsException if the file already exists and overwrite = false. Furthermore, if isPartialWriteVisible returns false, the implementation must ensure that the entire file is made visible atomically; that is, it must not generate partial files.
      Specified by:
      write in class LogStore
      Throws:
      IOException - if there's an issue resolving the FileSystem
      FileAlreadyExistsException - if the file already exists and overwrite is false
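
      The contract above can be modelled with a toy in-memory store. This is only a sketch under stated assumptions, not the S3 implementation: `WriteContractSketch` and its `read` helper are hypothetical, string keys stand in for `Path`, and `java.nio.file.FileAlreadyExistsException` stands in for the Hadoop exception of the same simple name.

```java
import java.nio.file.FileAlreadyExistsException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of the write contract; not part of io.delta.storage.
final class WriteContractSketch {

    private final ConcurrentMap<String, List<String>> files = new ConcurrentHashMap<>();

    /**
     * Buffers all actions first, then publishes the complete content in a
     * single map operation, so a reader sees either the whole file or nothing.
     */
    void write(String path, Iterator<String> actions, boolean overwrite)
            throws FileAlreadyExistsException {
        List<String> lines = new ArrayList<>();
        actions.forEachRemaining(lines::add);
        List<String> content = Collections.unmodifiableList(lines);

        if (overwrite) {
            files.put(path, content);
        } else if (files.putIfAbsent(path, content) != null) {
            // An entry already existed and overwrite is false.
            throw new FileAlreadyExistsException(path);
        }
    }

    /** Returns the stored lines, or null if nothing was written to path. */
    List<String> read(String path) {
        return files.get(path);
    }
}
```

      Using `putIfAbsent` makes the existence check and the publish a single step in this model, so two concurrent non-overwriting writers cannot both succeed.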
    • listFrom

      public Iterator<org.apache.hadoop.fs.FileStatus> listFrom(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration hadoopConf) throws IOException
      Description copied from class: LogStore
      :: DeveloperApi :: List the paths in the same directory that are lexicographically greater than or equal to (UTF-8 sorting) the given `path`. The result should also be sorted by the file name.
      Overrides:
      listFrom in class HadoopFileSystemLogStore
      Throws:
      IOException - if there's an issue resolving the FileSystem
      FileNotFoundException - if the `path` directory can't be found
    • isPartialWriteVisible

      public Boolean isPartialWriteVisible(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration hadoopConf)
      Description copied from class: LogStore
      :: DeveloperApi :: Whether a partial write is visible for the underlying file system of `path`.
      Specified by:
      isPartialWriteVisible in class LogStore