public abstract class GoogleHadoopFileSystemBase extends org.apache.hadoop.fs.FileSystem implements FileSystemDescriptor
It is implemented as a thin abstraction layer on top of GCS. The layer hides any specific characteristics of the underlying store and exposes the FileSystem interface understood by the Hadoop engine.

Users interact with files in the storage using fully qualified URIs. The file system exposed by this class is identified by the 'gs' scheme; for example, gs://dir1/dir2/file1.txt.

This implementation translates paths between Hadoop Path and GCS URI with the convention that the Hadoop root directly corresponds to the GCS "root", i.e. gs:/. This is convenient for many reasons, such as data portability and close equivalence to gsutil paths, but it imposes certain inherited constraints: files are not allowed in the root (only 'directories' can be placed there), and directory names inside the root are limited to a smaller set of allowed characters.

One of the main goals of this implementation is to maintain compatibility with the behavior of the HDFS implementation when accessed through the FileSystem interface. The HDFS implementation is not entirely consistent about when it throws versus when its methods return false. We run GHFS tests and HDFS tests against the same test data and use the results as a guide to decide whether to throw or to return false.
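The root-to-root path convention described above can be sketched in plain Java. This is an illustration of the mapping only, not the connector's actual implementation; the helper names are hypothetical.

```java
import java.net.URI;

public class GcsPathConventionDemo {

    // Maps an absolute Hadoop-style path to a GCS URI under the
    // root-to-root convention: the Hadoop root "/" corresponds to the
    // GCS root "gs:/", and the first component under it names a bucket.
    static URI toGcsUri(String absoluteHadoopPath) {
        if (!absoluteHadoopPath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path");
        }
        String rest = absoluteHadoopPath.substring(1);
        // "/" -> "gs:/"; "/bucket/obj" -> "gs://bucket/obj"
        return URI.create(rest.isEmpty() ? "gs:/" : "gs://" + rest);
    }

    // Inverse mapping: "gs://bucket/obj" -> "/bucket/obj", "gs:/" -> "/".
    static String toHadoopPath(URI gcsUri) {
        if (!"gs".equals(gcsUri.getScheme())) {
            throw new IllegalArgumentException("expected gs scheme");
        }
        String bucket = gcsUri.getAuthority();   // the bucket, or null for gs:/
        String objectPath = gcsUri.getPath();    // "/obj" or ""
        return bucket == null ? "/" : "/" + bucket + objectPath;
    }

    public static void main(String[] args) {
        System.out.println(toGcsUri("/mybucket/dir/file.txt"));
        System.out.println(toHadoopPath(URI.create("gs://mybucket/dir/file.txt")));
    }
}
```

Note how the bucket plays the role of a top-level directory, which is why only 'directories' (buckets) may live directly under the root.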
| Modifier and Type | Class and Description |
|---|---|
| static class | GoogleHadoopFileSystemBase.GcsFileChecksumType - Available GCS checksum types for use with GoogleHadoopFileSystemConfiguration.GCS_FILE_CHECKSUM_TYPE. |
| static class | GoogleHadoopFileSystemBase.GlobAlgorithm - Available GCS glob algorithms for use with GoogleHadoopFileSystemConfiguration.GCS_GLOB_ALGORITHM. |
| static interface | GoogleHadoopFileSystemBase.InvocationRaisingIOE&lt;R&gt; |
| static class | GoogleHadoopFileSystemBase.OutputStreamType - Available types for use with GoogleHadoopFileSystemConfiguration.GCS_OUTPUT_STREAM_TYPE. |
| Modifier and Type | Field and Description |
|---|---|
| static org.apache.hadoop.fs.PathFilter | DEFAULT_FILTER - Default PathFilter that accepts all paths. |
| protected long | defaultBlockSize - Default block size. |
| protected GcsDelegationTokens | delegationTokens - Delegation token support. |
| static String | GHFS_ID - Identifies this version of the GoogleHadoopFileSystemBase library. |
| protected URI | initUri - The URI passed to this FileSystem in initialize. |
| static String | PROPERTIES_FILE - A resource file containing GCS-related build properties. |
| static short | REPLICATION_FACTOR_DEFAULT - Default value of replication factor. |
| static String | UNKNOWN_VERSION - The version returned when one cannot be found in properties. |
| static String | VERSION - Current version. |
| static String | VERSION_PROPERTY - The key in the PROPERTIES_FILE that contains the version built. |
| Constructor and Description |
|---|
| GoogleHadoopFileSystemBase() - Constructs an instance of GoogleHadoopFileSystemBase; the internal GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called. |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.fs.FSDataOutputStream | append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress) - Appends to an existing file (optional operation). |
| protected void | checkPath(org.apache.hadoop.fs.Path path) |
| void | close() |
| void | completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) |
| void | concat(org.apache.hadoop.fs.Path tgt, org.apache.hadoop.fs.Path[] srcs) - Concatenates existing files into one file. |
| protected abstract void | configureBuckets(GoogleCloudStorageFileSystem gcsFs) - Validates and possibly creates buckets needed by the subclass. |
| void | copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) |
| void | copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) |
| void | copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) |
| org.apache.hadoop.fs.FSDataOutputStream | create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) - Opens the given file for writing. |
| org.apache.hadoop.fs.FSDataOutputStream | createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, EnumSet&lt;org.apache.hadoop.fs.CreateFlag&gt; flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) |
| boolean | delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive) - Deletes the given file or directory. |
| boolean | deleteOnExit(org.apache.hadoop.fs.Path f) |
| String | getCanonicalServiceName() |
| org.apache.hadoop.fs.ContentSummary | getContentSummary(org.apache.hadoop.fs.Path f) |
| long | getDefaultBlockSize() |
| protected int | getDefaultPort() - The default port is listed as -1 as an indication that ports are not used. |
| short | getDefaultReplication() - Gets the default replication factor. |
| abstract org.apache.hadoop.fs.Path | getDefaultWorkingDirectory() - Gets the default value of the working directory. |
| org.apache.hadoop.security.token.Token&lt;?&gt; | getDelegationToken(String renewer) |
| org.apache.hadoop.fs.FileChecksum | getFileChecksum(org.apache.hadoop.fs.Path hadoopPath) |
| org.apache.hadoop.fs.FileStatus | getFileStatus(org.apache.hadoop.fs.Path hadoopPath) - Gets status of the given path item. |
| abstract org.apache.hadoop.fs.Path | getFileSystemRoot() - Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor. |
| GoogleCloudStorageFileSystem | getGcsFs() - Gets the GCS FS instance. |
| abstract URI | getGcsPath(org.apache.hadoop.fs.Path hadoopPath) - Gets the GCS path corresponding to the given Hadoop path, which can be relative or absolute, and can have either gs://&lt;path&gt; or gs:/&lt;path&gt; form. |
| abstract org.apache.hadoop.fs.Path | getHadoopPath(URI gcsPath) - Gets the Hadoop path corresponding to the given GCS path. |
| org.apache.hadoop.fs.Path | getHomeDirectory() - Returns the home directory of the current user. |
| protected abstract String | getHomeDirectorySubpath() - Returns an unqualified path without any leading slash, relative to the filesystem root, which serves as the home directory of the current user; see getHomeDirectory for a description of what the home directory means. |
| abstract String | getScheme() - Returns the URI scheme for the Hadoop FileSystem associated with this FileSystemDescriptor. |
| GhfsStorageStatistics | getStorageStatistics() - Gets the storage statistics of this filesystem. |
| URI | getUri() - Returns a URI of the root of this FileSystem. |
| long | getUsed() |
| org.apache.hadoop.fs.Path | getWorkingDirectory() - Gets the current working directory. |
| byte[] | getXAttr(org.apache.hadoop.fs.Path path, String name) |
| Map&lt;String,byte[]&gt; | getXAttrs(org.apache.hadoop.fs.Path path) |
| Map&lt;String,byte[]&gt; | getXAttrs(org.apache.hadoop.fs.Path path, List&lt;String&gt; names) |
| org.apache.hadoop.fs.FileStatus[] | globStatus(org.apache.hadoop.fs.Path pathPattern) - Returns an array of FileStatus objects whose path names match pathPattern. |
| org.apache.hadoop.fs.FileStatus[] | globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter) - Returns an array of FileStatus objects whose path names match pathPattern and are accepted by the user-supplied path filter. |
| boolean | hasPathCapability(org.apache.hadoop.fs.Path path, String capability) |
| void | initialize(URI path, org.apache.hadoop.conf.Configuration config) - Initializes this file system instance. |
| org.apache.hadoop.fs.FileStatus[] | listStatus(org.apache.hadoop.fs.Path hadoopPath) - Lists file status. |
| List&lt;String&gt; | listXAttrs(org.apache.hadoop.fs.Path path) |
| org.apache.hadoop.fs.Path | makeQualified(org.apache.hadoop.fs.Path path) - Overridden to make root its own parent. |
| boolean | mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission) - Makes the given path and all non-existent parent directories. |
| org.apache.hadoop.fs.FSDataInputStream | open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize) - Opens the given file for reading. |
| protected void | processDeleteOnExit() |
| void | removeXAttr(org.apache.hadoop.fs.Path path, String name) |
| boolean | rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) - Renames src to dst. |
| void | setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) |
| void | setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) |
| void | setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) |
| void | setVerifyChecksum(boolean verifyChecksum) |
| void | setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath) - Sets the current working directory to the given path. |
| void | setXAttr(org.apache.hadoop.fs.Path path, String name, byte[] value, EnumSet&lt;org.apache.hadoop.fs.XAttrSetFlag&gt; flags) |
| org.apache.hadoop.fs.Path | startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) |
Methods inherited from class org.apache.hadoop.fs.FileSystem: access, addDelegationTokens, append, append, appendFile, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, clearStatistics, closeAll, closeAllForUGI, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createFile, createNewFile, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, delete, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getAllStoragePolicies, getBlockSize, getCanonicalUri, getChildFileSystems, getDefaultBlockSize, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getQuotaUsage, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getStoragePolicy, getTrashRoot, getTrashRoots, getUsed, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusBatch, listStatusIterator, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, msync, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, removeAcl, removeAclEntries, removeDefaultAcl, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setReplication, setStoragePolicy, setWriteChecksum, setXAttr, supportsSymlinks, truncate, unsetStoragePolicy

public static final short REPLICATION_FACTOR_DEFAULT
public static final org.apache.hadoop.fs.PathFilter DEFAULT_FILTER
public static final String PROPERTIES_FILE
public static final String VERSION_PROPERTY
public static final String UNKNOWN_VERSION
public static final String VERSION
public static final String GHFS_ID
protected URI initUri
protected GcsDelegationTokens delegationTokens
protected long defaultBlockSize
public GoogleHadoopFileSystemBase()
Constructs an instance of GoogleHadoopFileSystemBase; the internal GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called.

protected abstract String getHomeDirectorySubpath()
Returns an unqualified path without any leading slash, relative to the filesystem root, which serves as the home directory of the current user; see getHomeDirectory for a description of what the home directory means.

public abstract org.apache.hadoop.fs.Path getHadoopPath(URI gcsPath)
Gets the Hadoop path corresponding to the given GCS path.
Parameters:
gcsPath - Fully-qualified GCS path, of the form gs://bucket/object-path.

public abstract URI getGcsPath(org.apache.hadoop.fs.Path hadoopPath)
Gets the GCS path corresponding to the given Hadoop path, which can be relative or absolute, and can have either gs://&lt;path&gt; or gs:/&lt;path&gt; form.
Parameters:
hadoopPath - Hadoop path.

public abstract org.apache.hadoop.fs.Path getDefaultWorkingDirectory()
Gets the default value of the working directory.

public abstract org.apache.hadoop.fs.Path getFileSystemRoot()
Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor.
Specified by:
getFileSystemRoot in interface FileSystemDescriptor

public abstract String getScheme()
Returns the URI scheme for the Hadoop FileSystem associated with this FileSystemDescriptor.
Specified by:
getScheme in interface FileSystemDescriptor
Overrides:
getScheme in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.Path makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent.
Overrides:
makeQualified in class org.apache.hadoop.fs.FileSystem

protected void checkPath(org.apache.hadoop.fs.Path path)
Overrides:
checkPath in class org.apache.hadoop.fs.FileSystem

public void initialize(URI path, org.apache.hadoop.conf.Configuration config) throws IOException
Initializes this file system instance.
Note: The path passed to this method may be the path of any file or directory. It does not matter, because the only thing checked is whether it uses the 'gs' scheme. The rest is ignored.
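The scheme-only check described in the note can be sketched in plain Java (illustrative only, not the connector's actual code):

```java
import java.net.URI;

public class SchemeCheckDemo {
    // Mirrors the documented behavior: only the URI's scheme is examined
    // during initialization; the bucket/object portion is irrelevant.
    static boolean usesGsScheme(URI path) {
        return "gs".equals(path.getScheme());
    }

    public static void main(String[] args) {
        // Any file or directory under the scheme passes the same check.
        System.out.println(usesGsScheme(URI.create("gs://bucket/dir/file.txt")));
        System.out.println(usesGsScheme(URI.create("hdfs://namenode/dir")));
    }
}
```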
Overrides:
initialize in class org.apache.hadoop.fs.FileSystem
Parameters:
path - URI of a file/directory within this file system.
config - Hadoop configuration.
Throws:
IOException

public URI getUri()
Returns a URI of the root of this FileSystem.
Overrides:
getUri in class org.apache.hadoop.fs.FileSystem

protected int getDefaultPort()
The default port is listed as -1 as an indication that ports are not used.
Overrides:
getDefaultPort in class org.apache.hadoop.fs.FileSystem

public boolean hasPathCapability(org.apache.hadoop.fs.Path path, String capability) throws IOException
Throws:
IOException

public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize) throws IOException
Opens the given file for reading.
Overrides:
open in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - File to open.
bufferSize - Size of buffer to use for IO.
Throws:
FileNotFoundException - if the given path does not exist.
IOException - if an error occurs.

public org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
Opens the given file for writing.
Note: This function overrides the given bufferSize value with a higher number unless further overridden using configuration parameter fs.gs.outputstream.buffer.size.
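The note above suggests roughly the following selection logic (a sketch under assumptions: the exact default and precedence are defined by the connector's configuration handling, and the 8 MiB figure here is only a placeholder, not the real default):

```java
public class BufferSizeDemo {
    // Hypothetical connector default for illustration; the real value comes
    // from the fs.gs.outputstream.buffer.size configuration parameter.
    static final int ASSUMED_CONNECTOR_DEFAULT = 8 * 1024 * 1024;

    // If fs.gs.outputstream.buffer.size was explicitly configured, it wins;
    // otherwise the caller-supplied bufferSize is raised to the connector
    // default when it is smaller.
    static int effectiveBufferSize(Integer configuredOrNull, int callerBufferSize) {
        if (configuredOrNull != null) {
            return configuredOrNull;
        }
        return Math.max(callerBufferSize, ASSUMED_CONNECTOR_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBufferSize(null, 4096));   // raised to default
        System.out.println(effectiveBufferSize(65536, 4096));  // explicit config wins
    }
}
```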
Overrides:
create in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The file to open.
permission - Permissions to set on the new file. Ignored.
overwrite - If a file with this name already exists: if true, the file will be overwritten; if false, an error will be thrown.
bufferSize - The size of the buffer to use.
replication - Required block replication for the file. Ignored.
blockSize - The block size to be used for the new file. Ignored.
progress - Progress is reported through this. Ignored.
Throws:
IOException - if an error occurs.
See Also:
setPermission(Path, FsPermission)

public org.apache.hadoop.fs.FSDataOutputStream createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, EnumSet&lt;org.apache.hadoop.fs.CreateFlag&gt; flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
Overrides:
createNonRecursive in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public org.apache.hadoop.fs.FSDataOutputStream append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress) throws IOException
Appends to an existing file (optional operation).
Overrides:
append in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The existing file to be appended.
bufferSize - The size of the buffer to be used.
progress - For reporting progress if it is not null.
Throws:
IOException - if an error occurs.

public void concat(org.apache.hadoop.fs.Path tgt, org.apache.hadoop.fs.Path[] srcs) throws IOException
Concatenates existing files into one file.
Overrides:
concat in class org.apache.hadoop.fs.FileSystem
Parameters:
tgt - the path to the target destination.
srcs - the paths to the sources to use for the concatenation.
Throws:
IOException - IO failure

public boolean rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
Renames src to dst.
Overrides:
rename in class org.apache.hadoop.fs.FileSystem
Parameters:
src - Source path.
dst - Destination path.
Throws:
IOException - if an error occurs.

public boolean delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive) throws IOException
Deletes the given file or directory.
Overrides:
delete in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The path to delete.
recursive - If path is a directory and set to true, the directory is deleted; otherwise an exception is thrown. In case of a file, the recursive parameter is ignored.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.FileStatus[] listStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
Lists file status.
Overrides:
listStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - Given path.
Throws:
IOException - if an error occurs.

public void setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
Sets the current working directory to the given path.
Overrides:
setWorkingDirectory in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - New working directory.

public org.apache.hadoop.fs.Path getWorkingDirectory()
Gets the current working directory.
Overrides:
getWorkingDirectory in class org.apache.hadoop.fs.FileSystem

public boolean mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
Makes the given path and all non-existent parent directories.
Overrides:
mkdirs in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - Given path.
permission - Permissions to set on the given directory.
Throws:
IOException - if an error occurs.

public short getDefaultReplication()
Gets the default replication factor.
Overrides:
getDefaultReplication in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
Gets status of the given path item.
Overrides:
getFileStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The path we want information about.
Throws:
FileNotFoundException - when the path does not exist.
IOException - on other errors.

public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern) throws IOException
Returns an array of FileStatus objects whose path names match pathPattern.
Return null if pathPattern has no glob and the path does not exist. Return an empty array if pathPattern has a glob and no path matches it.
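This null-versus-empty contract can be sketched as a small decision helper (illustrative; the glob character set below is an assumption, see Hadoop's own glob handling for the authoritative set):

```java
public class GlobContractDemo {
    // Assumed set of glob metacharacters for illustration.
    static boolean hasGlob(String pathPattern) {
        for (char c : pathPattern.toCharArray()) {
            if ("*?[]{}".indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Encodes the documented return contract: null for a non-glob pattern
    // whose path is missing, an empty array for a glob with no matches.
    static String[] globResult(String pathPattern, boolean anyMatch) {
        if (!hasGlob(pathPattern)) {
            return anyMatch ? new String[] {pathPattern} : null;
        }
        return anyMatch ? new String[] {pathPattern} : new String[0];
    }
}
```

Callers therefore need to distinguish a null result (path absent) from a zero-length result (glob matched nothing) before iterating.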
Overrides:
globStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
pathPattern - A regular expression specifying the path pattern.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter) throws IOException
Returns an array of FileStatus objects whose path names match pathPattern and are accepted by the user-supplied path filter.
Return null if pathPattern has no glob and the path does not exist. Return an empty array if pathPattern has a glob and no path matches it.
Overrides:
globStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
pathPattern - A regular expression specifying the path pattern.
filter - A user-supplied path filter.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.Path getHomeDirectory()
Returns the home directory of the current user.
Note: This directory is only used for Hadoop purposes. It is not the same as a user's OS home directory.
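Per the getHomeDirectorySubpath contract, the home directory is the filesystem root with the unqualified subpath appended. A sketch of that composition (the "user/alice" subpath is a hypothetical example, not the connector's value):

```java
public class HomeDirectoryDemo {
    // The subpath must be unqualified, with no leading slash, and is
    // resolved against the filesystem root.
    static String homeDirectory(String fileSystemRoot, String homeDirectorySubpath) {
        if (homeDirectorySubpath.startsWith("/")) {
            throw new IllegalArgumentException("subpath must not have a leading slash");
        }
        return fileSystemRoot.endsWith("/")
                ? fileSystemRoot + homeDirectorySubpath
                : fileSystemRoot + "/" + homeDirectorySubpath;
    }

    public static void main(String[] args) {
        System.out.println(homeDirectory("gs://my-bucket", "user/alice"));
    }
}
```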
Overrides:
getHomeDirectory in class org.apache.hadoop.fs.FileSystem

public String getCanonicalServiceName()
Returns the service name if delegation tokens are configured; otherwise, null.
Overrides:
getCanonicalServiceName in class org.apache.hadoop.fs.FileSystem

public GoogleCloudStorageFileSystem getGcsFs()
Gets the GCS FS instance.

protected abstract void configureBuckets(GoogleCloudStorageFileSystem gcsFs) throws IOException
Validates and possibly creates buckets needed by the subclass.
Parameters:
gcsFs - GoogleCloudStorageFileSystem to configure buckets
Throws:
IOException - if bucket name is invalid or cannot be found.

public boolean deleteOnExit(org.apache.hadoop.fs.Path f) throws IOException
Overrides:
deleteOnExit in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

protected void processDeleteOnExit()
Overrides:
processDeleteOnExit in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path f) throws IOException
Overrides:
getContentSummary in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public org.apache.hadoop.security.token.Token&lt;?&gt; getDelegationToken(String renewer) throws IOException
Overrides:
getDelegationToken in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) throws IOException
Overrides:
copyFromLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
Overrides:
copyFromLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
Overrides:
copyToLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public org.apache.hadoop.fs.Path startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
Overrides:
startLocalOutput in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
Overrides:
completeLocalOutput in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void close() throws IOException
Specified by:
close in interface Closeable
close in interface AutoCloseable
Overrides:
close in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public long getUsed() throws IOException
Overrides:
getUsed in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public long getDefaultBlockSize()
Overrides:
getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.FileChecksum getFileChecksum(org.apache.hadoop.fs.Path hadoopPath) throws IOException
Overrides:
getFileChecksum in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setVerifyChecksum(boolean verifyChecksum)
Overrides:
setVerifyChecksum in class org.apache.hadoop.fs.FileSystem

public void setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
Overrides:
setPermission in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) throws IOException
Overrides:
setOwner in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) throws IOException
Overrides:
setTimes in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public byte[] getXAttr(org.apache.hadoop.fs.Path path, String name) throws IOException
Overrides:
getXAttr in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public Map&lt;String,byte[]&gt; getXAttrs(org.apache.hadoop.fs.Path path) throws IOException
Overrides:
getXAttrs in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public Map&lt;String,byte[]&gt; getXAttrs(org.apache.hadoop.fs.Path path, List&lt;String&gt; names) throws IOException
Overrides:
getXAttrs in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public List&lt;String&gt; listXAttrs(org.apache.hadoop.fs.Path path) throws IOException
Overrides:
listXAttrs in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setXAttr(org.apache.hadoop.fs.Path path, String name, byte[] value, EnumSet&lt;org.apache.hadoop.fs.XAttrSetFlag&gt; flags) throws IOException
Overrides:
setXAttr in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void removeXAttr(org.apache.hadoop.fs.Path path, String name) throws IOException
Overrides:
removeXAttr in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public GhfsStorageStatistics getStorageStatistics()
Overrides:
getStorageStatistics in class org.apache.hadoop.fs.FileSystem

Copyright © 2024. All rights reserved.