class S3SingleDriverLogStore extends HadoopFileSystemLogStore
Single Spark-driver/JVM LogStore implementation for S3.
We assume the following about S3's FileSystem implementations:
- File writing on S3 is all-or-nothing, whether overwrite or not.
- List-after-write can be inconsistent.

Regarding file creation, this implementation:
- Opens a stream to write to S3 (regardless of the overwrite option).
- Failures during stream write may leak resources, but may never result in partial writes.

Regarding directory listing, this implementation:
- Returns a list by merging the files listed from S3 and recently-written files from the cache.
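This class is not normally instantiated directly; Delta selects a `LogStore` implementation through Spark configuration. A minimal sketch, assuming Delta Lake and Hadoop S3 support are on the classpath (the config key `spark.delta.logStore.class` selects the implementation):

```scala
// Sketch: routing Delta's transaction-log I/O through this LogStore.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-s3")
  .config("spark.delta.logStore.class",
          "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
  .getOrCreate()
```

Because the recently-written-files cache lives in a single driver JVM, this store is only safe when all writes to a given Delta table go through one Spark driver.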
- By Inheritance
- S3SingleDriverLogStore
- HadoopFileSystemLogStore
- LogStore
- AnyRef
- Any
Instance Constructors
- new S3SingleDriverLogStore(sparkConf: SparkConf, hadoopConf: Configuration)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def createTempPath(path: Path): Path
- Attributes
- protected
- Definition Classes
- HadoopFileSystemLogStore
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getHadoopConfiguration: Configuration
- Attributes
- protected
- Definition Classes
- HadoopFileSystemLogStore
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def invalidateCache(): Unit
Invalidate any caching that the implementation may be using.
- Definition Classes
- S3SingleDriverLogStore → HadoopFileSystemLogStore → LogStore
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isPartialWriteVisible(path: Path, hadoopConf: Configuration): Boolean
Whether a partial write is visible when writing to `path`.
As this depends on the underlying file system implementation, we require the input of `path` here in order to identify the underlying file system, even though in most cases a log store only deals with one file system.
The default value is only provided here for legacy reasons and will be removed. Any LogStore implementation should override this instead of relying on the default.
Note: The default implementation ignores the `hadoopConf` parameter for backward compatibility. Subclasses should override this method and use `hadoopConf` properly to support passing Hadoop file system configurations through DataFrame options.
- Definition Classes
- S3SingleDriverLogStore → LogStore
- def isPartialWriteVisible(path: Path): Boolean
Whether a partial write is visible when writing to `path`.
As this depends on the underlying file system implementation, we require the input of `path` here in order to identify the underlying file system, even though in most cases a log store only deals with one file system.
The default value is only provided here for legacy reasons and will be removed. Any LogStore implementation should override this instead of relying on the default.
- Definition Classes
- S3SingleDriverLogStore → LogStore
- def listFrom(path: Path, hadoopConf: Configuration): Iterator[FileStatus]
List files starting from `resolvedPath` (inclusive) in the same directory.
- Definition Classes
- S3SingleDriverLogStore → HadoopFileSystemLogStore → LogStore
- def listFrom(path: Path): Iterator[FileStatus]
List the paths in the same directory that are lexicographically greater than or equal to (by UTF-8 sorting) the given `path`. The result should also be sorted by the file name.
- Definition Classes
- S3SingleDriverLogStore → HadoopFileSystemLogStore → LogStore
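The `listFrom` contract (names in the same directory, lexicographically at or after the given path, sorted by file name) can be sketched in plain Scala, independent of any file system. Here `files` and `start` are hypothetical stand-ins for directory contents and the query path; in the real implementation, entries come from the S3 listing merged with the driver-local cache of recently written files:

```scala
// Sketch of the listFrom ordering contract over plain name strings.
val files = Seq("00000000000000000002.json",
                "00000000000000000000.json",
                "00000000000000000001.json")
val start = "00000000000000000001.json"

// Keep names lexicographically >= start, then sort by name.
val listed = files.filter(_ >= start).sorted
// listed == Seq("00000000000000000001.json", "00000000000000000002.json")
```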
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def read(path: Path, hadoopConf: Configuration): Seq[String]
Load the given file and return a `Seq` of lines. The line break will be removed from each line. This method will load the entire file into memory. Call `readAsIterator` if possible, as its implementation may be more efficient.
Note: The default implementation ignores the `hadoopConf` parameter for backward compatibility. Subclasses should override this method and use `hadoopConf` properly to support passing Hadoop file system configurations through DataFrame options.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
- def read(path: Path): Seq[String]
Load the given file and return a `Seq` of lines. The line break will be removed from each line. This method will load the entire file into memory. Call `readAsIterator` if possible, as its implementation may be more efficient.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
- final def read(fileStatus: FileStatus, hadoopConf: Configuration): Seq[String]
Load the given file represented by `fileStatus` and return a `Seq` of lines. The line break will be removed from each line.
Note: Using a stale `FileStatus` may produce an incorrect result.
- Definition Classes
- LogStore
- def readAsIterator(path: Path, hadoopConf: Configuration): ClosableIterator[String]
Load the given file and return an iterator of lines. The line break will be removed from each line. The default implementation calls `read` to load the entire file into memory. An implementation should provide a more efficient approach if possible; for example, the file content can be loaded on demand.
Note: The returned `ClosableIterator` should be closed when it is no longer used, to avoid a resource leak.
Note: The default implementation ignores the `hadoopConf` parameter for backward compatibility. Subclasses should override this method and use `hadoopConf` properly to support passing Hadoop file system configurations through DataFrame options.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
- def readAsIterator(path: Path): ClosableIterator[String]
Load the given file and return an iterator of lines. The line break will be removed from each line. The default implementation calls `read` to load the entire file into memory. An implementation should provide a more efficient approach if possible; for example, the file content can be loaded on demand.
Note: The returned `ClosableIterator` should be closed when it is no longer used, to avoid a resource leak.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
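Because the returned `ClosableIterator` holds an open stream, callers should close it deterministically even if iteration fails part-way. A usage sketch, assuming a `logStore` instance and a valid `path` (both names are illustrative, not defined here):

```scala
// Sketch: consume readAsIterator while guaranteeing the stream is closed.
val iter = logStore.readAsIterator(path)
try {
  while (iter.hasNext) {
    println(iter.next())
  }
} finally {
  iter.close() // avoid leaking the underlying input stream
}
```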
- def readAsIterator(fileStatus: FileStatus, hadoopConf: Configuration): ClosableIterator[String]
Load the file represented by the given `fileStatus` and return an iterator of lines. The line break will be removed from each line.
Note 1: The returned `ClosableIterator` should be closed when it is no longer used, to avoid a resource leak.
Note 2: Using a stale `FileStatus` may produce an incorrect result.
- Definition Classes
- LogStore
- def resolvePathOnPhysicalStorage(path: Path, hadoopConf: Configuration): Path
Resolve the fully qualified path for the given `path`.
Note: The default implementation ignores the `hadoopConf` parameter for backward compatibility. Subclasses should override this method and use `hadoopConf` properly to support passing Hadoop file system configurations through DataFrame options.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
- def resolvePathOnPhysicalStorage(path: Path): Path
Resolve the fully qualified path for the given `path`.
- Definition Classes
- HadoopFileSystemLogStore → LogStore
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def write(path: Path, actions: Iterator[String], overwrite: Boolean, hadoopConf: Configuration): Unit
Write the given `actions` to the given `path`, with or without overwrite as indicated. The implementation must throw a `java.nio.file.FileAlreadyExistsException` if the file already exists and `overwrite = false`. Furthermore, the implementation must ensure that the entire file is made visible atomically; that is, it should not generate partial files.
Note: The default implementation ignores the `hadoopConf` parameter for backward compatibility. Subclasses should override this method and use `hadoopConf` properly to support passing Hadoop file system configurations through DataFrame options.
- Definition Classes
- S3SingleDriverLogStore → LogStore
- def write(path: Path, actions: Iterator[String], overwrite: Boolean = false): Unit
Write the given `actions` to the given `path`, with or without overwrite as indicated. The implementation must throw a `java.nio.file.FileAlreadyExistsException` if the file already exists and `overwrite = false`. Furthermore, the implementation must ensure that the entire file is made visible atomically; that is, it should not generate partial files.
- Definition Classes
- S3SingleDriverLogStore → LogStore
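Since `write` with `overwrite = false` must fail atomically when the file already exists, callers typically treat `FileAlreadyExistsException` as a concurrent-commit signal rather than an error in the data. A sketch, assuming a `logStore` instance plus illustrative `deltaFilePath` and `actions` values (none defined here):

```scala
// Sketch: attempt to commit a new log file without overwriting.
import java.nio.file.FileAlreadyExistsException

try {
  logStore.write(deltaFilePath, actions.iterator, overwrite = false)
} catch {
  case _: FileAlreadyExistsException =>
    // Another writer committed this version first. Because the write is
    // all-or-nothing, the existing file is intact and no partial file exists;
    // the caller can retry against the next version number.
    ()
}
```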
- def writeWithRename(path: Path, actions: Iterator[String], overwrite: Boolean, hadoopConf: Configuration): Unit
An internal write implementation that uses `FileSystem.rename()`.
This implementation should only be used for underlying file systems that support atomic renames, e.g., Azure is OK but HDFS is not.
- Attributes
- protected
- Definition Classes
- HadoopFileSystemLogStore
Deprecated Value Members
- final def listFrom(path: String): Iterator[FileStatus]
List the paths in the same directory that are lexicographically greater than or equal to (by UTF-8 sorting) the given `path`. The result should also be sorted by the file name.
- Definition Classes
- LogStore
- Annotations
- @deprecated
- Deprecated
call the method that asks for a Hadoop Configuration object instead
- final def read(path: String): Seq[String]
Load the given file and return a `Seq` of lines. The line break will be removed from each line. This method will load the entire file into memory. Call `readAsIterator` if possible, as its implementation may be more efficient.
- Definition Classes
- LogStore
- Annotations
- @deprecated
- Deprecated
call the method that asks for a Hadoop Configuration object instead
- final def readAsIterator(path: String): ClosableIterator[String]
Load the given file and return an iterator of lines. The line break will be removed from each line. The default implementation calls `read` to load the entire file into memory. An implementation should provide a more efficient approach if possible; for example, the file content can be loaded on demand.
- Definition Classes
- LogStore
- Annotations
- @deprecated
- Deprecated
call the method that asks for a Hadoop Configuration object instead
- final def write(path: String, actions: Iterator[String]): Unit
Write the given `actions` to the given `path` without overwriting any existing file. The implementation must throw a `java.nio.file.FileAlreadyExistsException` if the file already exists. Furthermore, the implementation must ensure that the entire file is made visible atomically; that is, it should not generate partial files.
- Definition Classes
- LogStore
- Annotations
- @deprecated
- Deprecated
call the method that asks for a Hadoop Configuration object instead
- def writeWithRename(path: Path, actions: Iterator[String], overwrite: Boolean = false): Unit
An internal write implementation that uses `FileSystem.rename()`.
This implementation should only be used for underlying file systems that support atomic renames, e.g., Azure is OK but HDFS is not.
- Attributes
- protected
- Definition Classes
- HadoopFileSystemLogStore
- Annotations
- @deprecated
- Deprecated
call the method that asks for a Hadoop Configuration object instead