public abstract class GoogleHadoopFileSystemBase extends org.apache.hadoop.fs.FileSystem implements FileSystemDescriptor
Users interact with the files in the storage using fully qualified URIs.
The file system exposed by this class is identified using the 'gs' scheme.
For example, gs://dir1/dir2/file1.txt.
This implementation translates paths between hadoop Path and GCS URI with the convention that the Hadoop root directly corresponds to the GCS "root", e.g. gs:/. This is convenient for many reasons, such as data portability and close equivalence to gsutil paths, but imposes certain inherited constraints, such as files not being allowed in root (only 'directories' can be placed in root), and directory names inside root have a more limited set of allowed characters.
One of the main goals of this implementation is to maintain compatibility with the behavior of the HDFS implementation when accessed through the FileSystem interface. The HDFS implementation is not very consistent about when it throws and when its methods return false. We run GHFS tests and HDFS tests against the same test data and use the results as a guide to decide whether to throw or to return false.
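To make the path convention described above more concrete, here is a minimal sketch that uses only `java.net.URI` (not the Hadoop or GCS connector classes) to split a `gs` URI into its parts. The class `GsUriParts` and its methods are hypothetical names introduced for illustration. Reading the first component under the root as the URI authority is an interpretation of the convention, not something this page states explicitly.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of GoogleHadoopFileSystemBase.
final class GsUriParts {
    private GsUriParts() {}

    /** Returns the top-level "directory" under the root (the URI authority), e.g. "dir1". */
    static String topLevelDirectory(String uri) {
        URI parsed = URI.create(uri);
        requireGsScheme(parsed);
        return parsed.getAuthority();
    }

    /** Returns the path below the top-level directory, e.g. "/dir2/file1.txt". */
    static String pathWithinTopLevel(String uri) {
        URI parsed = URI.create(uri);
        requireGsScheme(parsed);
        return parsed.getPath();
    }

    // Rejects URIs that do not use the 'gs' scheme named in the class description.
    private static void requireGsScheme(URI parsed) {
        if (!"gs".equals(parsed.getScheme())) {
            throw new IllegalArgumentException("Expected a gs:// URI but got: " + parsed);
        }
    }

    public static void main(String[] args) {
        String example = "gs://dir1/dir2/file1.txt";
        System.out.println(topLevelDirectory(example));
        System.out.println(pathWithinTopLevel(example));
    }
}
```

Under this reading, a URI always names a top-level directory before any file, which is one way to see why files cannot sit directly in the root.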
| Modifier and Type | Class and Description |
|---|---|
| static class | GoogleHadoopFileSystemBase.Counter<br>Defines names of counters we track for each operation. |
| protected static class | GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior<br>Behavior of listStatus when a path is not found. |
| static class | GoogleHadoopFileSystemBase.ParentTimestampUpdateIncludePredicate<br>A predicate that processes individual directory paths and evaluates the conditions set in fs.gs.parent.timestamp.update.enable, fs.gs.parent.timestamp.update.substrings.include and fs.gs.parent.timestamp.update.substrings.exclude to determine if a path should be ignored when running directory timestamp updates. |
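The kind of check that ParentTimestampUpdateIncludePredicate describes can be sketched in plain Java as follows. This is not the connector's implementation: the class `TimestampUpdatePredicateSketch`, its constructor, and `shouldUpdate` are hypothetical, and the precedence shown here (include substrings are checked before exclude substrings, and unmatched paths are updated) is an assumption that this page does not confirm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; the real predicate's logic may differ.
final class TimestampUpdatePredicateSketch {
    private final boolean enabled;                 // mirrors fs.gs.parent.timestamp.update.enable
    private final List<String> includeSubstrings;  // mirrors ...update.substrings.include
    private final List<String> excludeSubstrings;  // mirrors ...update.substrings.exclude

    TimestampUpdatePredicateSketch(
            boolean enabled, List<String> includeSubstrings, List<String> excludeSubstrings) {
        this.enabled = enabled;
        this.includeSubstrings = includeSubstrings;
        this.excludeSubstrings = excludeSubstrings;
    }

    /** Returns true if the directory path should receive a timestamp update. */
    boolean shouldUpdate(String directoryPath) {
        if (!enabled) {
            return false; // updates are switched off entirely
        }
        // ASSUMPTION: an include match wins even if an exclude substring also matches.
        for (String include : includeSubstrings) {
            if (directoryPath.contains(include)) {
                return true;
            }
        }
        for (String exclude : excludeSubstrings) {
            if (directoryPath.contains(exclude)) {
                return false;
            }
        }
        // ASSUMPTION: paths matching neither list are updated.
        return true;
    }

    public static void main(String[] args) {
        TimestampUpdatePredicateSketch predicate = new TimestampUpdatePredicateSketch(
                true, Arrays.asList("/history/"), Arrays.asList("/tmp/"));
        System.out.println(predicate.shouldUpdate("gs://bucket/tmp/job1/"));
        System.out.println(predicate.shouldUpdate("gs://bucket/history/done/"));
    }
}
```

Checking substrings rather than full patterns keeps the per-path cost low, which matters if the predicate runs for every parent directory touched by a write.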
| Constructor and Description |
|---|
| GoogleHadoopFileSystemBase()<br>Constructs an instance of GoogleHadoopFileSystemBase; the internal GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called. |
| GoogleHadoopFileSystemBase(GoogleCloudStorageFileSystem gcsfs)<br>Constructs an instance of GoogleHadoopFileSystemBase using the provided GoogleCloudStorageFileSystem; initialize() will not re-initialize it. |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.fs.FSDataOutputStream | append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress)<br>Appends to an existing file (optional operation). |
| protected void | checkPath(org.apache.hadoop.fs.Path path) |
| void | close() |
| void | completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) |
| void | configureBuckets(String systemBucketName, boolean createSystemBucket)<br>Validates and possibly creates the system bucket. |
| void | copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) |
| void | copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) |
| void | copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) |
| org.apache.hadoop.fs.FSDataOutputStream | create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress)<br>Opens the given file for writing. |
| protected com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> | createCounterMap() |
| boolean | delete(org.apache.hadoop.fs.Path f)<br>Deprecated. Use delete(Path, boolean) instead. |
| boolean | delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive)<br>Deletes the given file or directory. |
| boolean | deleteOnExit(org.apache.hadoop.fs.Path f) |
| String | getCanonicalServiceName()<br>Returns null, because GHFS does not use security tokens. |
| org.apache.hadoop.fs.ContentSummary | getContentSummary(org.apache.hadoop.fs.Path f) |
| long | getDefaultBlockSize() |
| protected int | getDefaultPort()<br>The default port is listed as -1 as an indication that ports are not used. |
| short | getDefaultReplication()<br>Gets the default replication factor. |
| abstract org.apache.hadoop.fs.Path | getDefaultWorkingDirectory()<br>Gets the default value of working directory. |
| org.apache.hadoop.security.token.Token<?> | getDelegationToken(String renewer) |
| org.apache.hadoop.fs.FileChecksum | getFileChecksum(org.apache.hadoop.fs.Path f) |
| org.apache.hadoop.fs.FileStatus | getFileStatus(org.apache.hadoop.fs.Path hadoopPath)<br>Gets status of the given path item. |
| abstract org.apache.hadoop.fs.Path | getFileSystemRoot()<br>Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor. |
| abstract URI | getGcsPath(org.apache.hadoop.fs.Path hadoopPath)<br>Gets GCS path corresponding to the given Hadoop path, which can be relative or absolute, and can have either gs:// |
| abstract org.apache.hadoop.fs.Path | getHadoopPath(URI gcsPath)<br>Gets Hadoop path corresponding to the given GCS path. |
| String | getHadoopScheme()<br>Deprecated. |
| org.apache.hadoop.fs.Path | getHomeDirectory()<br>Returns home directory of the current user. |
| protected abstract String | getHomeDirectorySubpath()<br>Returns an unqualified path without any leading slash, relative to the filesystem root, which serves as the home directory of the current user; see getHomeDirectory for a description of what the home directory means. |
| abstract String | getScheme()<br>Returns the URI scheme for the Hadoop FileSystem associated with this FileSystemDescriptor. |
| URI | getUri()<br>Returns a URI of the root of this FileSystem. |
| long | getUsed() |
| org.apache.hadoop.fs.Path | getWorkingDirectory()<br>Gets the current working directory. |
| org.apache.hadoop.fs.FileStatus[] | globStatus(org.apache.hadoop.fs.Path pathPattern)<br>Returns an array of FileStatus objects whose path names match pathPattern. |
| org.apache.hadoop.fs.FileStatus[] | globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter)<br>Returns an array of FileStatus objects whose path names match pathPattern and are accepted by the user-supplied path filter. |
| void | initialize(URI path, org.apache.hadoop.conf.Configuration config)<br>See initialize(URI, Configuration, boolean) for details; calls with third arg defaulting to 'true' for initializing the superclass. |
| void | initialize(URI path, org.apache.hadoop.conf.Configuration config, boolean initSuperclass)<br>Initializes this file system instance. |
| org.apache.hadoop.fs.FileStatus[] | listStatus(org.apache.hadoop.fs.Path hadoopPath)<br>Lists file status. |
| org.apache.hadoop.fs.Path | makeQualified(org.apache.hadoop.fs.Path path)<br>Overridden to make root its own parent. |
| boolean | mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission)<br>Creates the given path and all non-existent parent directories. |
| org.apache.hadoop.fs.FSDataInputStream | open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize)<br>Opens the given file for reading. |
| protected void | processDeleteOnExit() |
| boolean | rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst)<br>Renames src to dst. |
| protected void | setListStatusFileNotFoundBehavior(GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior behavior) |
| void | setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) |
| void | setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) |
| void | setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) |
| void | setVerifyChecksum(boolean verifyChecksum) |
| void | setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)<br>Sets the current working directory to the given path. |
| org.apache.hadoop.fs.Path | startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) |
Methods inherited from class org.apache.hadoop.fs.FileSystem:
addDelegationTokens, append, append, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createNewFile, createNonRecursive, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getBlockSize, getCanonicalUri, getChildFileSystems, getDefaultBlockSize, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileBlockLocations, getFileLinkStatus, getFileSystemClass, getFSofPath, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, removeAcl, removeAclEntries, removeDefaultAcl, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setReplication, setWriteChecksum, supportsSymlinks

public static final LogUtil log
public static final short REPLICATION_FACTOR_DEFAULT
public static final String BUFFERSIZE_KEY
public static final int BUFFERSIZE_DEFAULT
public static final String WRITE_BUFFERSIZE_KEY
public static final int WRITE_BUFFERSIZE_DEFAULT
public static final String BLOCK_SIZE_KEY
public static final int BLOCK_SIZE_DEFAULT
public static final String AUTHENTICATION_PREFIX
public static final String ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY
public static final String SERVICE_ACCOUNT_AUTH_EMAIL_KEY
public static final String SERVICE_ACCOUNT_AUTH_KEYFILE_KEY
public static final String GCS_PROJECT_ID_KEY
public static final String GCS_CLIENT_ID_KEY
public static final String GCS_CLIENT_SECRET_KEY
public static final String GCS_SYSTEM_BUCKET_KEY
public static final String GCS_CREATE_SYSTEM_BUCKET_KEY
public static final boolean GCS_CREATE_SYSTEM_BUCKET_DEFAULT
public static final String GCS_WORKING_DIRECTORY_KEY
public static final String GCS_FILE_SIZE_LIMIT_250GB
public static final boolean GCS_FILE_SIZE_LIMIT_250GB_DEFAULT
public static final String GCS_ENABLE_METADATA_CACHE_KEY
public static final boolean GCS_ENABLE_METADATA_CACHE_DEFAULT
public static final String GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY
public static final boolean GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_DEFAULT
public static final String GCS_METADATA_CACHE_TYPE_KEY
public static final String GCS_METADATA_CACHE_TYPE_DEFAULT
public static final String GCS_METADATA_CACHE_DIRECTORY_KEY
public static final String GCS_METADATA_CACHE_DIRECTORY_DEFAULT
public static final String GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_DEFAULT
public static final String MR_JOB_HISTORY_INTERMEDIATE_DONE_DIR_KEY
public static final String MR_JOB_HISTORY_DONE_DIR_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY
public static final String GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_DEFAULT
public static final String GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY
public static final boolean GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_DEFAULT
public static final String GCS_ENABLE_FLAT_GLOB_KEY
public static final boolean GCS_ENABLE_FLAT_GLOB_DEFAULT
public static final String GCS_ENABLE_MARKER_FILE_CREATION_KEY
public static final boolean GCS_ENABLE_MARKER_FILE_CREATION_DEFAULT
public static final org.apache.hadoop.fs.PathFilter DEFAULT_FILTER
public static final String PROPERTIES_FILE
public static final String VERSION_PROPERTY
public static final String UNKNOWN_VERSION
public static final String VERSION
public static final String GHFS_ID
protected URI initUri
@Deprecated protected String systemBucket
protected GoogleCloudStorageFileSystem gcsfs
protected long defaultBlockSize
protected final com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> counters
protected GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior listStatusFileNotFoundBehavior
public GoogleHadoopFileSystemBase()
public GoogleHadoopFileSystemBase(GoogleCloudStorageFileSystem gcsfs)
protected com.google.common.collect.ImmutableMap<GoogleHadoopFileSystemBase.Counter,AtomicLong> createCounterMap()
protected void setListStatusFileNotFoundBehavior(GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior behavior)
protected abstract String getHomeDirectorySubpath()
Returns an unqualified path without any leading slash, relative to the filesystem root, which serves as the home directory of the current user; see getHomeDirectory for a description of what the home directory means.

public abstract org.apache.hadoop.fs.Path getHadoopPath(URI gcsPath)
Parameters:
gcsPath - Fully-qualified GCS path, of the form gs://

public abstract URI getGcsPath(org.apache.hadoop.fs.Path hadoopPath)
Parameters:
hadoopPath - Hadoop path.

public abstract org.apache.hadoop.fs.Path getDefaultWorkingDirectory()

public abstract org.apache.hadoop.fs.Path getFileSystemRoot()
Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor.
getFileSystemRoot in interface FileSystemDescriptor

public abstract String getScheme()
Returns the URI scheme for the Hadoop FileSystem associated with this FileSystemDescriptor.
getScheme in interface FileSystemDescriptor
getScheme in class org.apache.hadoop.fs.FileSystem

@Deprecated public String getHadoopScheme()
getHadoopScheme in interface FileSystemDescriptor

public org.apache.hadoop.fs.Path makeQualified(org.apache.hadoop.fs.Path path)
Overridden to make root its own parent. This is POSIX compliant, but more importantly guards against poor directory accounting in the PathData class of Hadoop 2's FsShell.
makeQualified in class org.apache.hadoop.fs.FileSystem

protected void checkPath(org.apache.hadoop.fs.Path path)
checkPath in class org.apache.hadoop.fs.FileSystem

public void initialize(URI path, org.apache.hadoop.conf.Configuration config) throws IOException
See initialize(URI, Configuration, boolean) for details; calls with third arg defaulting to 'true' for initializing the superclass.
initialize in class org.apache.hadoop.fs.FileSystem
Parameters:
path - URI of a file/directory within this file system.
config - Hadoop configuration.
Throws:
IOException

public void initialize(URI path, org.apache.hadoop.conf.Configuration config, boolean initSuperclass) throws IOException
Parameters:
path - URI of a file/directory within this file system.
config - Hadoop configuration.
initSuperclass - if false, doesn't call super.initialize(path, config); avoids registering a global Statistics object for this instance.
Throws:
IOException

public URI getUri()
getUri in class org.apache.hadoop.fs.FileSystem

protected int getDefaultPort()
getDefaultPort in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path hadoopPath, int bufferSize) throws IOException
open in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - File to open.
bufferSize - Size of buffer to use for IO.
Throws:
FileNotFoundException - if the given path does not exist.
IOException - if an error occurs.

public org.apache.hadoop.fs.FSDataOutputStream create(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws IOException
create in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The file to open.
permission - Permissions to set on the new file. Ignored.
overwrite - If a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - The size of the buffer to use.
replication - Required block replication for the file. Ignored.
blockSize - The block-size to be used for the new file. Ignored.
progress - Progress is reported through this. Ignored.
Throws:
IOException - if an error occurs.
See Also:
setPermission(Path, FsPermission)

public org.apache.hadoop.fs.FSDataOutputStream append(org.apache.hadoop.fs.Path hadoopPath, int bufferSize, org.apache.hadoop.util.Progressable progress) throws IOException
append in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The existing file to be appended.
bufferSize - The size of the buffer to be used.
progress - For reporting progress if it is not null.
Throws:
IOException - if an error occurs.

public boolean rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
rename in class org.apache.hadoop.fs.FileSystem
Parameters:
src - Source path.
dst - Destination path.
Throws:
FileNotFoundException - if src does not exist.
IOException - if an error occurs.

@Deprecated public boolean delete(org.apache.hadoop.fs.Path f) throws IOException
Deprecated. Use delete(Path, boolean) instead.
delete in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public boolean delete(org.apache.hadoop.fs.Path hadoopPath, boolean recursive) throws IOException
delete in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The path to delete.
recursive - If path is a directory and set to true, the directory is deleted, else throws an exception. In case of a file, the recursive parameter is ignored.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.FileStatus[] listStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
listStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - Given path.
Throws:
IOException - if an error occurs.

public void setWorkingDirectory(org.apache.hadoop.fs.Path hadoopPath)
setWorkingDirectory in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - New working directory.

public org.apache.hadoop.fs.Path getWorkingDirectory()
getWorkingDirectory in class org.apache.hadoop.fs.FileSystem

public boolean mkdirs(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
mkdirs in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - Given path.
permission - Permissions to set on the given directory.
Throws:
IOException - if an error occurs.

public short getDefaultReplication()
getDefaultReplication in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.Path hadoopPath) throws IOException
getFileStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
hadoopPath - The path we want information about.
Throws:
FileNotFoundException - when the path does not exist.
IOException - on other errors.

public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern) throws IOException
globStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
pathPattern - A regular expression specifying the path pattern.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.FileStatus[] globStatus(org.apache.hadoop.fs.Path pathPattern, org.apache.hadoop.fs.PathFilter filter) throws IOException
globStatus in class org.apache.hadoop.fs.FileSystem
Parameters:
pathPattern - A regular expression specifying the path pattern.
filter - A user-supplied path filter.
Throws:
IOException - if an error occurs.

public org.apache.hadoop.fs.Path getHomeDirectory()
getHomeDirectory in class org.apache.hadoop.fs.FileSystem

public String getCanonicalServiceName()
getCanonicalServiceName in class org.apache.hadoop.fs.FileSystem

public void configureBuckets(String systemBucketName, boolean createSystemBucket) throws IOException
Parameters:
systemBucketName - Name of system bucket.
createSystemBucket - Whether or not to create systemBucketName if it does not exist.
Throws:
IOException - if systemBucketName is invalid, or cannot be found and createSystemBucket is false.

public boolean deleteOnExit(org.apache.hadoop.fs.Path f) throws IOException
deleteOnExit in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

protected void processDeleteOnExit()
processDeleteOnExit in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path f) throws IOException
getContentSummary in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public org.apache.hadoop.security.token.Token<?> getDelegationToken(String renewer) throws IOException
getDelegationToken in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path[] srcs, org.apache.hadoop.fs.Path dst) throws IOException
copyFromLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyFromLocalFile(boolean delSrc, boolean overwrite, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
copyFromLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void copyToLocalFile(boolean delSrc, org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst) throws IOException
copyToLocalFile in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public org.apache.hadoop.fs.Path startLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
startLocalOutput in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void completeLocalOutput(org.apache.hadoop.fs.Path fsOutputFile, org.apache.hadoop.fs.Path tmpLocalFile) throws IOException
completeLocalOutput in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void close() throws IOException
close in interface Closeable
close in interface AutoCloseable
close in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public long getUsed() throws IOException
getUsed in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public long getDefaultBlockSize()
getDefaultBlockSize in class org.apache.hadoop.fs.FileSystem

public org.apache.hadoop.fs.FileChecksum getFileChecksum(org.apache.hadoop.fs.Path f) throws IOException
getFileChecksum in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setVerifyChecksum(boolean verifyChecksum)
setVerifyChecksum in class org.apache.hadoop.fs.FileSystem

public void setPermission(org.apache.hadoop.fs.Path p, org.apache.hadoop.fs.permission.FsPermission permission) throws IOException
setPermission in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setOwner(org.apache.hadoop.fs.Path p, String username, String groupname) throws IOException
setOwner in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

public void setTimes(org.apache.hadoop.fs.Path p, long mtime, long atime) throws IOException
setTimes in class org.apache.hadoop.fs.FileSystem
Throws:
IOException

Copyright © 2015. All rights reserved.