@InterfaceAudience.Private @InterfaceStability.Evolving public final class S3AUtils extends Object
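Many of the utility methods below operate on raw S3 key strings. As a minimal, self-contained sketch of the semantics this page documents for `maybeAddTrailingSlash` and `objectRepresentsDirectory` (an illustrative reimplementation based on the summaries below, not the Hadoop source):

```java
// Illustrative sketch of the documented S3 key-string semantics; not the Hadoop implementation.
public class S3KeySketch {

    // Add a trailing "/" unless the key is empty (the root) or already ends with one.
    public static String maybeAddTrailingSlash(String key) {
        if (!key.isEmpty() && !key.endsWith("/")) {
            return key + "/";
        }
        return key;
    }

    // A non-empty key ending in "/" is the usual convention for a directory marker object.
    public static boolean objectRepresentsDirectory(String name) {
        return name != null && !name.isEmpty() && name.endsWith("/");
    }

    public static void main(String[] args) {
        System.out.println(maybeAddTrailingSlash("data/logs"));      // data/logs/
        System.out.println(objectRepresentsDirectory("data/logs/")); // true
    }
}
```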
| Modifier and Type | Class and Description |
|---|---|
| static interface | S3AUtils.CallOnLocatedFileStatus<br>An interface for use in lambda-expressions working with directory tree listings. |
| static interface | S3AUtils.LocatedFileStatusMap&lt;T&gt;<br>An interface for use in lambda-expressions working with directory tree listings. |
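These callback interfaces are consumed by the listing helpers further down (applyLocatedFiles, mapLocatedFiles). The following sketch shows the pattern with simplified stand-ins: a plain java.util.Iterator replaces Hadoop's RemoteIterator, and the interface and helper names (CallOnEntry, applyToEntries) are illustrative, not the Hadoop types.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the S3A listing-callback pattern; not the Hadoop types.
public class ListingCallbacks {

    // Analog of S3AUtils.CallOnLocatedFileStatus: one callback per listing entry.
    @FunctionalInterface
    interface CallOnEntry {
        void call(String entry) throws Exception;
    }

    // Analog of applyLocatedFiles: apply the callback to every entry, returning the count.
    static long applyToEntries(Iterator<String> it, CallOnEntry eval) throws Exception {
        long count = 0;
        while (it.hasNext()) {
            eval.call(it.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        List<String> listing = Arrays.asList("a.csv", "b.csv");
        long n = applyToEntries(listing.iterator(), s -> System.out.println("saw " + s));
        System.out.println(n); // 2
    }
}
```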
| Modifier and Type | Field and Description |
|---|---|
| static org.apache.hadoop.fs.PathFilter | ACCEPT_ALL<br>A Path filter which accepts all filenames. |
| static String | E_FORBIDDEN_AWS_PROVIDER<br>Error message when the AWS provider list built up contains a forbidden entry. |
| static String | EOF_MESSAGE_IN_XML_PARSER |
| static String | EOF_READ_DIFFERENT_LENGTH |
| static org.apache.hadoop.fs.PathFilter | HIDDEN_FILE_FILTER<br>Path filter which ignores any file which starts with . |
| static String | SSE_C_NO_KEY_ERROR<br>Encryption SSE-C used but the config lacks an encryption key. |
| static String | SSE_S3_WITH_KEY_ERROR<br>Encryption SSE-S3 is used but the caller also set an encryption key. |
| static List&lt;Class&lt;?&gt;&gt; | STANDARD_AWS_PROVIDERS<br>The standard AWS provider list for AWS connections. |
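The two PathFilter constants can be mimicked on plain path strings. This is a hedged sketch of the documented behavior only (ACCEPT_ALL accepts everything; HIDDEN_FILE_FILTER rejects any filename starting with "."); the predicates below are illustrative, not the Hadoop implementations.

```java
import java.util.function.Predicate;

// Illustrative string-based analogs of ACCEPT_ALL and HIDDEN_FILE_FILTER.
public class PathFilterSketch {

    static final Predicate<String> ACCEPT_ALL = path -> true;

    // Reject a path whose final component starts with '.'.
    static final Predicate<String> HIDDEN_FILE_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith(".");
    };

    public static void main(String[] args) {
        System.out.println(HIDDEN_FILE_FILTER.test("logs/2022/part-0000")); // true
        System.out.println(HIDDEN_FILE_FILTER.test("logs/2022/.tmp"));      // false
    }
}
```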
| Modifier and Type | Method and Description |
|---|---|
| static long | applyLocatedFiles(org.apache.hadoop.fs.RemoteIterator&lt;? extends org.apache.hadoop.fs.LocatedFileStatus&gt; iterator, S3AUtils.CallOnLocatedFileStatus eval)<br>Apply an operation to every LocatedFileStatus in a remote iterator. |
| static AWSCredentialProviderList | buildAWSProviderList(URI binding, org.apache.hadoop.conf.Configuration conf, String key, List&lt;Class&lt;?&gt;&gt; defaultValues, Set&lt;Class&lt;?&gt;&gt; forbidden)<br>Load list of AWS credential provider/credential provider factory classes; support a forbidden list to prevent loops, mandate full secrets, etc. |
| static EncryptionSecrets | buildEncryptionSecrets(String bucket, org.apache.hadoop.conf.Configuration conf)<br>Get the server-side or client-side encryption algorithm. |
| static void | clearBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey)<br>Clear a bucket-specific property. |
| static void | closeAll(org.slf4j.Logger log, Closeable... closeables)<br>Deprecated. |
| static void | closeAutocloseables(org.slf4j.Logger log, AutoCloseable... closeables)<br>Close the Closeable objects and ignore any Exception or null pointers. |
| static com.amazonaws.ClientConfiguration | createAwsConf(org.apache.hadoop.conf.Configuration conf, String bucket)<br>Deprecated. |
| static com.amazonaws.ClientConfiguration | createAwsConf(org.apache.hadoop.conf.Configuration conf, String bucket, String awsServiceIdentifier)<br>Create a new AWS ClientConfiguration. |
| static AWSCredentialProviderList | createAWSCredentialProviderSet(URI binding, org.apache.hadoop.conf.Configuration conf)<br>Create the AWS credentials from the providers, the URI and the key Constants.AWS_CREDENTIALS_PROVIDER in the configuration. |
| static S3AFileStatus | createFileStatus(org.apache.hadoop.fs.Path keyPath, com.amazonaws.services.s3.model.S3ObjectSummary summary, long blockSize, String owner, String eTag, String versionId, boolean isCSEEnabled)<br>Create a file status instance from a listing. |
| static S3AFileStatus | createUploadFileStatus(org.apache.hadoop.fs.Path keyPath, boolean isDir, long size, long blockSize, String owner, String eTag, String versionId)<br>Create a file status for an object we just uploaded. |
| static long | dateToLong(Date date)<br>Date to long conversion. |
| static void | deleteQuietly(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean recursive)<br>Delete a path quietly: failures are logged at DEBUG. |
| static void | deleteWithWarning(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean recursive)<br>Delete a path: failures are logged at WARN. |
| static int | ensureOutputParameterInRange(String name, long size)<br>Ensure that the long value is in the range of an integer. |
| static IOException | extractException(String operation, String path, ExecutionException ee)<br>Extract an exception from a failed future, and convert to an IOE. |
| static &lt;T&gt; List&lt;T&gt; | flatmapLocatedFiles(org.apache.hadoop.fs.RemoteIterator&lt;org.apache.hadoop.fs.LocatedFileStatus&gt; iterator, S3AUtils.LocatedFileStatusMap&lt;Optional&lt;T&gt;&gt; eval)<br>Map an operation to every LocatedFileStatus in a remote iterator, returning a list of all results which were not empty. |
| static S3xLoginHelper.Login | getAWSAccessKeys(URI name, org.apache.hadoop.conf.Configuration conf)<br>Return the access key and secret for S3 API use. |
| static String | getBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey)<br>Get a bucket-specific property. |
| static S3AEncryptionMethods | getEncryptionAlgorithm(String bucket, org.apache.hadoop.conf.Configuration conf)<br>Get the server-side or client-side encryption algorithm. |
| static long | getMultipartSizeProperty(org.apache.hadoop.conf.Configuration conf, String property, long defVal)<br>Get a size property from the configuration: this property must be at least equal to Constants.MULTIPART_MIN_SIZE. |
| static String | getS3EncryptionKey(String bucket, org.apache.hadoop.conf.Configuration conf)<br>Get any S3 encryption key, without propagating exceptions from JCEKS files. |
| static String | getS3EncryptionKey(String bucket, org.apache.hadoop.conf.Configuration conf, boolean propagateExceptions)<br>Get any SSE/CSE key from a configuration/credential provider. |
| static void | initConnectionSettings(org.apache.hadoop.conf.Configuration conf, com.amazonaws.ClientConfiguration awsConf)<br>Initializes all AWS SDK settings related to connection management. |
| static void | initProxySupport(org.apache.hadoop.conf.Configuration conf, String bucket, com.amazonaws.ClientConfiguration awsConf)<br>Initializes AWS SDK proxy support in the AWS client configuration if the S3A settings enable it. |
| static int | intOption(org.apache.hadoop.conf.Configuration conf, String key, int defVal, int min)<br>Get an integer option &gt;= the minimum allowed value. |
| static boolean | isMessageTranslatableToEOF(com.amazonaws.SdkBaseException ex)<br>Cue that an AWS exception is likely to be an EOF Exception based on the message coming back from the client. |
| static boolean | isThrottleException(Exception ex)<br>Is the exception an instance of a throttling exception? |
| static S3AFileStatus[] | iteratorToStatuses(org.apache.hadoop.fs.RemoteIterator&lt;S3AFileStatus&gt; iterator, Set&lt;org.apache.hadoop.fs.Path&gt; tombstones)<br>Convert the data of an iterator of S3AFileStatus to an array. |
| static List&lt;org.apache.hadoop.fs.LocatedFileStatus&gt; | listAndFilter(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.Path path, boolean recursive, org.apache.hadoop.fs.PathFilter filter)<br>List located files and filter them as a classic listFiles(path, filter) would do. |
| static List&lt;Class&lt;?&gt;&gt; | loadAWSProviderClasses(org.apache.hadoop.conf.Configuration conf, String key, Class&lt;?&gt;... defaultValue)<br>Load list of AWS credential provider/credential provider factory classes. |
| static long | longBytesOption(org.apache.hadoop.conf.Configuration conf, String key, long defVal, long min)<br>Get a long option &gt;= the minimum allowed value, supporting memory prefixes K, M, G, T, P. |
| static long | longOption(org.apache.hadoop.conf.Configuration conf, String key, long defVal, long min)<br>Get a long option &gt;= the minimum allowed value. |
| static String | lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey)<br>Get a password from a configuration, including JCEKS files, handling both the absolute key and bucket override. |
| static String | lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey, String overrideVal)<br>Deprecated. |
| static String | lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey, String overrideVal, String defVal)<br>Get a password from a configuration, including JCEKS files, handling both the absolute key and bucket override. |
| static &lt;T&gt; List&lt;T&gt; | mapLocatedFiles(org.apache.hadoop.fs.RemoteIterator&lt;? extends org.apache.hadoop.fs.LocatedFileStatus&gt; iterator, S3AUtils.LocatedFileStatusMap&lt;T&gt; eval)<br>Map an operation to every LocatedFileStatus in a remote iterator, returning a list of the results. |
| static &lt;T&gt; Optional&lt;T&gt; | maybe(boolean include, T value)<br>Convert a value into a non-empty Optional instance if the value of include is true. |
| static String | maybeAddTrailingSlash(String key)<br>Turns a path (relative or otherwise) into an S3 key, adding a trailing "/" if the path is not the root and does not already have a "/" at the end. |
| static boolean | objectRepresentsDirectory(String name)<br>Predicate: does the object represent a directory? |
| static org.apache.hadoop.conf.Configuration | propagateBucketOptions(org.apache.hadoop.conf.Configuration source, String bucket)<br>Propagates bucket-specific settings into generic S3A configuration keys. |
| static void | setBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey, String value)<br>Set a bucket-specific property to a particular value. |
| static boolean | setIfDefined(org.apache.hadoop.conf.Configuration config, String key, String val, String origin)<br>Set a key if the value is non-empty. |
| static String | stringify(com.amazonaws.services.s3.model.AmazonS3Exception e)<br>Get low level details of an amazon exception for logging; multi-line. |
| static String | stringify(com.amazonaws.AmazonServiceException e)<br>Get low level details of an amazon exception for logging; multi-line. |
| static String | stringify(com.amazonaws.services.s3.model.S3ObjectSummary summary)<br>String information about a summary entry for debug messages. |
| static IOException | translateDynamoDBException(String path, String message, com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException ddbException)<br>Translate a DynamoDB exception into an IOException. |
| static IOException | translateException(String operation, org.apache.hadoop.fs.Path path, com.amazonaws.AmazonClientException exception)<br>Translate an exception raised in an operation into an IOException. |
| static IOException | translateException(String operation, String path, com.amazonaws.SdkBaseException exception)<br>Translate an exception raised in an operation into an IOException. |
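The bucket-option methods (setBucketOption, getBucketOption, propagateBucketOptions) follow one key-mapping rule: a value of fs.s3a.bucket.${bucket}.key overrides fs.s3a.key, except for fs.s3a.impl and any key under the fs.s3a.bucket prefix. The following sketches that rule over a plain Map standing in for Hadoop's Configuration; it is an illustration of the documented mapping, not the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fs.s3a.bucket.${bucket}.* propagation rule over a plain Map.
public class BucketOptionSketch {

    static Map<String, String> propagateBucketOptions(Map<String, String> source, String bucket) {
        String bucketPrefix = "fs.s3a.bucket." + bucket + ".";
        // Work on a clone so the original values are not updated.
        Map<String, String> result = new HashMap<>(source);
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (!e.getKey().startsWith(bucketPrefix)) {
                continue;
            }
            String generic = "fs.s3a." + e.getKey().substring(bucketPrefix.length());
            // fs.s3a.impl and fs.s3a.bucket.* targets are unmodifiable.
            if (generic.equals("fs.s3a.impl") || generic.startsWith("fs.s3a.bucket.")) {
                continue;
            }
            result.put(generic, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.endpoint", "s3.amazonaws.com");
        conf.put("fs.s3a.bucket.eu-data.endpoint", "s3.eu-west-1.amazonaws.com");
        Map<String, String> patched = propagateBucketOptions(conf, "eu-data");
        System.out.println(patched.get("fs.s3a.endpoint")); // s3.eu-west-1.amazonaws.com
    }
}
```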
public static final String SSE_C_NO_KEY_ERROR
public static final String SSE_S3_WITH_KEY_ERROR
public static final String EOF_MESSAGE_IN_XML_PARSER
public static final String EOF_READ_DIFFERENT_LENGTH
public static final String E_FORBIDDEN_AWS_PROVIDER
public static final List<Class<?>> STANDARD_AWS_PROVIDERS
public static final org.apache.hadoop.fs.PathFilter HIDDEN_FILE_FILTER
public static final org.apache.hadoop.fs.PathFilter ACCEPT_ALL
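longBytesOption reads sizes with K, M, G, T, P suffixes and enforces a minimum value. The following is a self-contained sketch of that kind of parsing; the helper names and the Map-based lookup are illustrative only (Hadoop's Configuration has comparable size parsing built in), not the S3AUtils implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of longBytesOption-style parsing: a long with an optional K/M/G/T/P suffix,
// checked against a minimum value. Illustrative only; not the Hadoop implementation.
public class ByteOptionSketch {

    static long parseBytes(String value) {
        String v = value.trim().toUpperCase();
        long multiplier = 1;
        char last = v.charAt(v.length() - 1);
        int index = "KMGTP".indexOf(last);
        if (index >= 0) {
            multiplier = 1L << (10 * (index + 1)); // K=2^10, M=2^20, ...
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v) * multiplier;
    }

    static long longBytesOption(Map<String, String> conf, String key, long defVal, long min) {
        String raw = conf.get(key);
        long result = (raw == null) ? defVal : parseBytes(raw);
        if (result < min) {
            throw new IllegalArgumentException(
                "Value of " + key + " is below the minimum " + min + ": " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.multipart.size", "64M");
        System.out.println(longBytesOption(conf, "fs.s3a.multipart.size", 1, 1)); // 67108864
    }
}
```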
public static IOException translateException(String operation, org.apache.hadoop.fs.Path path, com.amazonaws.AmazonClientException exception)

Translate an exception raised in an operation into an IOException. The type of IOException returned depends on the class of the AmazonClientException passed in, and any status codes included in the operation. That is: HTTP error codes are examined and can be used to build a more specific response.

Parameters:
operation - operation
path - path operated on (must not be null)
exception - amazon exception raised

public static IOException translateException(@Nullable String operation, String path, com.amazonaws.SdkBaseException exception)

Translate an exception raised in an operation into an IOException. The type of IOException returned depends on the class of the AmazonClientException passed in, and any status codes included in the operation. That is: HTTP error codes are examined and can be used to build a more specific response.

Parameters:
operation - operation
path - path operated on (may be null)
exception - amazon exception raised

public static IOException extractException(String operation, String path, ExecutionException ee)

Extract an exception from a failed future, and convert to an IOE.

Parameters:
operation - operation which failed
path - path operated on (may be null)
ee - execution exception

public static boolean isThrottleException(Exception ex)

Is the exception an instance of a throttling exception?
That is: an AWSServiceThrottledException, or anything which the AWS SDK's RetryUtils considers to be a throttling exception.

Parameters:
ex - exception to examine

public static boolean isMessageTranslatableToEOF(com.amazonaws.SdkBaseException ex)

Cue that an AWS exception is likely to be an EOF Exception based on the message coming back from the client.

Parameters:
ex - exception

public static IOException translateDynamoDBException(String path, String message, com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException ddbException)

Translate a DynamoDB exception into an IOException.

Parameters:
path - path in the DDB
message - preformatted message for the exception
ddbException - exception

public static String stringify(com.amazonaws.AmazonServiceException e)

Get low level details of an amazon exception for logging; multi-line.

Parameters:
e - exception

public static String stringify(com.amazonaws.services.s3.model.AmazonS3Exception e)

Get low level details of an amazon exception for logging; multi-line.

Parameters:
e - exception

public static S3AFileStatus createFileStatus(org.apache.hadoop.fs.Path keyPath, com.amazonaws.services.s3.model.S3ObjectSummary summary, long blockSize, String owner, String eTag, String versionId, boolean isCSEEnabled)

Create a file status instance from a listing.

Parameters:
keyPath - path to entry
summary - summary from AWS
blockSize - block size to declare
owner - owner of the file
eTag - S3 object eTag or null if unavailable
versionId - S3 object versionId or null if unavailable
isCSEEnabled - is client side encryption enabled?

public static S3AFileStatus createUploadFileStatus(org.apache.hadoop.fs.Path keyPath, boolean isDir, long size, long blockSize, String owner, String eTag, String versionId)

Create a file status for an object we just uploaded.

Parameters:
keyPath - path for created object
isDir - true iff directory
size - file length
blockSize - block size for file status
owner - Hadoop username
eTag - S3 object eTag or null if unavailable
versionId - S3 object versionId or null if unavailable

public static boolean objectRepresentsDirectory(String name)

Predicate: does the object represent a directory?

Parameters:
name - object name

public static long dateToLong(Date date)

Date to long conversion.
Parameters:
date - date from AWS query

public static AWSCredentialProviderList createAWSCredentialProviderSet(@Nullable URI binding, org.apache.hadoop.conf.Configuration conf) throws IOException

Create the AWS credentials from the providers, the URI and the key Constants.AWS_CREDENTIALS_PROVIDER in the configuration.

Parameters:
binding - Binding URI; may be null
conf - filesystem configuration

Throws:
IOException - Problems loading the providers (including reading secrets from credential files).

public static List<Class<?>> loadAWSProviderClasses(org.apache.hadoop.conf.Configuration conf, String key, Class<?>... defaultValue) throws IOException

Load list of AWS credential provider/credential provider factory classes.

Parameters:
conf - configuration
key - key
defaultValue - list of default values

Throws:
IOException - on a failure to load the list.

public static AWSCredentialProviderList buildAWSProviderList(@Nullable URI binding, org.apache.hadoop.conf.Configuration conf, String key, List<Class<?>> defaultValues, Set<Class<?>> forbidden) throws IOException

Load list of AWS credential provider/credential provider factory classes; support a forbidden list to prevent loops, mandate full secrets, etc.

Parameters:
binding - Binding URI; may be null
conf - configuration
key - key
defaultValues - list of default providers.
forbidden - a possibly empty set of forbidden classes.

Throws:
IOException - on a failure to load the list.

public static boolean setIfDefined(org.apache.hadoop.conf.Configuration config, String key, String val, String origin)

Set a key if the value is non-empty.

Parameters:
config - config to patch
key - key to set
val - value to probe and set
origin - origin

public static S3xLoginHelper.Login getAWSAccessKeys(URI name, org.apache.hadoop.conf.Configuration conf) throws IOException

Return the access key and secret for S3 API use.

Parameters:
name - the URI for which we need the access keys; may be null
conf - the Configuration object to interrogate for keys.

Throws:
IOException - problems retrieving passwords from KMS.

@Deprecated public static String lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey, String overrideVal) throws IOException

Deprecated.

Get a password from a configuration, including JCEKS files, handling both the absolute key and bucket override.

Parameters:
bucket - bucket or "" if none known
conf - configuration
baseKey - base key to look up, e.g. "fs.s3a.secret.key"
overrideVal - override value: if non-empty this is used instead of querying the configuration.

Throws:
IOException - on any IO problem
IllegalArgumentException - bad arguments

public static String lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey) throws IOException

Get a password from a configuration, including JCEKS files, handling both the absolute key and bucket override.

Parameters:
bucket - bucket or "" if none known
conf - configuration
baseKey - base key to look up, e.g. "fs.s3a.secret.key"

Throws:
IOException - on any IO problem
IllegalArgumentException - bad arguments

public static String lookupPassword(String bucket, org.apache.hadoop.conf.Configuration conf, String baseKey, String overrideVal, String defVal) throws IOException

Get a password from a configuration, including JCEKS files, handling both the absolute key and bucket override.

Parameters:
bucket - bucket or "" if none known
conf - configuration
baseKey - base key to look up, e.g. "fs.s3a.secret.key"
overrideVal - override value: if non-empty this is used instead of querying the configuration.
defVal - value to return if there is no password

Throws:
IOException - on any IO problem
IllegalArgumentException - bad arguments

public static String stringify(com.amazonaws.services.s3.model.S3ObjectSummary summary)

String information about a summary entry for debug messages.

Parameters:
summary - summary object

public static int intOption(org.apache.hadoop.conf.Configuration conf, String key, int defVal, int min)

Get an integer option >= the minimum allowed value.
Parameters:
conf - configuration
key - key to look up
defVal - default value
min - minimum value

Throws:
IllegalArgumentException - if the value is below the minimum

public static long longOption(org.apache.hadoop.conf.Configuration conf, String key, long defVal, long min)

Get a long option >= the minimum allowed value.

Parameters:
conf - configuration
key - key to look up
defVal - default value
min - minimum value

Throws:
IllegalArgumentException - if the value is below the minimum

public static long longBytesOption(org.apache.hadoop.conf.Configuration conf, String key, long defVal, long min)

Get a long option >= the minimum allowed value, supporting memory prefixes K, M, G, T, P.

Parameters:
conf - configuration
key - key to look up
defVal - default value
min - minimum value

Throws:
IllegalArgumentException - if the value is below the minimum

public static long getMultipartSizeProperty(org.apache.hadoop.conf.Configuration conf, String property, long defVal)

Get a size property from the configuration: this property must be at least equal to Constants.MULTIPART_MIN_SIZE. If it is too small, it is rounded up to that minimum, and a warning printed.

Parameters:
conf - configuration
property - property name
defVal - default value

public static int ensureOutputParameterInRange(String name, long size)

Ensure that the long value is in the range of an integer.
Parameters:
name - property name for error messages
size - original size

public static org.apache.hadoop.conf.Configuration propagateBucketOptions(org.apache.hadoop.conf.Configuration source, String bucket)

Propagates bucket-specific settings into generic S3A configuration keys. This is done by propagating the values of the form fs.s3a.bucket.${bucket}.key to fs.s3a.key, for all values of "key" other than a small set of unmodifiable values.

The source of the updated property is set to the key name of the bucket property, to aid in diagnostics of where things came from.

Returns a new configuration. Why the clone? You can use the same conf for different filesystems, and the original values are not updated.

The fs.s3a.impl property cannot be set, nor can any with the prefix fs.s3a.bucket.

This method does not propagate security provider path information from the S3A property into the Hadoop common provider: callers must call patchSecurityCredentialProviders(Configuration) explicitly.

Parameters:
source - Source Configuration object.
bucket - bucket name. Must not be empty.

public static void deleteQuietly(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean recursive)

Delete a path quietly: failures are logged at DEBUG.

Parameters:
fs - filesystem
path - path
recursive - recursive?

public static void deleteWithWarning(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, boolean recursive)

Delete a path: failures are logged at WARN.

Parameters:
fs - filesystem
path - path
recursive - recursive?

@Deprecated public static com.amazonaws.ClientConfiguration createAwsConf(org.apache.hadoop.conf.Configuration conf, String bucket) throws IOException
Deprecated. Use createAwsConf(Configuration, String, String) instead.

Create a new ClientConfiguration. All clients to AWS services MUST use this for consistent setup of connectivity, UA, proxy settings.

Parameters:
conf - The Hadoop configuration
bucket - Optional bucket to use to look up per-bucket proxy secrets

Throws:
IOException - problem creating AWS client configuration

public static com.amazonaws.ClientConfiguration createAwsConf(org.apache.hadoop.conf.Configuration conf, String bucket, String awsServiceIdentifier) throws IOException

Create a new ClientConfiguration. All clients to AWS services MUST use this or the equivalents for the specific service for consistent setup of connectivity, UA, proxy settings.

Parameters:
conf - The Hadoop configuration
bucket - Optional bucket to use to look up per-bucket proxy secrets
awsServiceIdentifier - a string representing the AWS service (S3, DDB, etc.) for which the ClientConfiguration is being created.

Throws:
IOException - problem creating AWS client configuration

public static void initConnectionSettings(org.apache.hadoop.conf.Configuration conf, com.amazonaws.ClientConfiguration awsConf) throws IOException

Initializes all AWS SDK settings related to connection management.

Parameters:
conf - Hadoop configuration
awsConf - AWS SDK configuration

Throws:
IOException - if there was an error initializing the protocol settings

public static void initProxySupport(org.apache.hadoop.conf.Configuration conf, String bucket, com.amazonaws.ClientConfiguration awsConf) throws IllegalArgumentException, IOException

Initializes AWS SDK proxy support in the AWS client configuration if the S3A settings enable it.

Parameters:
conf - Hadoop configuration
bucket - Optional bucket to use to look up per-bucket proxy secrets
awsConf - AWS SDK configuration to update

Throws:
IllegalArgumentException - if misconfigured
IOException - problem getting username/secret from password source.

public static S3AFileStatus[] iteratorToStatuses(org.apache.hadoop.fs.RemoteIterator<S3AFileStatus> iterator, Set<org.apache.hadoop.fs.Path> tombstones) throws IOException
Convert the data of an iterator of S3AFileStatus to an array. Given tombstones are filtered out. If the iterator does not return any item, an empty array is returned.

Parameters:
iterator - a non-null iterator
tombstones - possibly empty set of tombstones

Throws:
IOException - failure

public static long applyLocatedFiles(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator, S3AUtils.CallOnLocatedFileStatus eval) throws IOException

Apply an operation to every LocatedFileStatus in a remote iterator.

Parameters:
iterator - iterator from a list
eval - closure to evaluate

Throws:
IOException - anything in the closure, or iteration logic.

public static <T> List<T> mapLocatedFiles(org.apache.hadoop.fs.RemoteIterator<? extends org.apache.hadoop.fs.LocatedFileStatus> iterator, S3AUtils.LocatedFileStatusMap<T> eval) throws IOException

Map an operation to every LocatedFileStatus in a remote iterator, returning a list of the results.

Type Parameters:
T - return type of map

Parameters:
iterator - iterator from a list
eval - closure to evaluate

Throws:
IOException - anything in the closure, or iteration logic.

public static <T> List<T> flatmapLocatedFiles(org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> iterator, S3AUtils.LocatedFileStatusMap<Optional<T>> eval) throws IOException

Map an operation to every LocatedFileStatus in a remote iterator, returning a list of all results which were not empty.

Type Parameters:
T - return type of map

Parameters:
iterator - iterator from a list
eval - closure to evaluate

Throws:
IOException - anything in the closure, or iteration logic.

public static List<org.apache.hadoop.fs.LocatedFileStatus> listAndFilter(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.Path path, boolean recursive, org.apache.hadoop.fs.PathFilter filter) throws IOException

List located files and filter them as a classic listFiles(path, filter) would do.

Parameters:
fileSystem - filesystem
path - path to list
recursive - recursive listing?
filter - filter for the filename

Throws:
IOException - IO failure.

public static <T> Optional<T> maybe(boolean include, T value)
Convert a value into a non-empty Optional instance if the value of include is true.

Type Parameters:
T - type of option.

Parameters:
include - flag to indicate the value is to be included.
value - value to return

public static String getS3EncryptionKey(String bucket, org.apache.hadoop.conf.Configuration conf)

Get any S3 encryption key, without propagating exceptions from JCEKS files.

Parameters:
bucket - bucket to query for
conf - configuration to examine

Throws:
IllegalArgumentException - bad arguments.

public static String getS3EncryptionKey(String bucket, org.apache.hadoop.conf.Configuration conf, boolean propagateExceptions) throws IOException

Get any SSE/CSE key from a configuration/credential provider, including the option SERVER_SIDE_ENCRYPTION_KEY. Unless propagateExceptions is true, IOExceptions raised during retrieval are swallowed.

Parameters:
bucket - bucket to query for
conf - configuration to examine
propagateExceptions - should IO exceptions be rethrown?

Throws:
IllegalArgumentException - bad arguments.
IOException - if propagateExceptions==true and reading a JCEKS file raised an IOE

public static S3AEncryptionMethods getEncryptionAlgorithm(String bucket, org.apache.hadoop.conf.Configuration conf) throws IOException

Get the server-side or client-side encryption algorithm.

Parameters:
bucket - bucket to query for
conf - configuration to scan

Returns:
the encryption algorithm; NONE unless one is set.

Throws:
IOException - on JCEKS lookup or invalid method/key configuration.

public static EncryptionSecrets buildEncryptionSecrets(String bucket, org.apache.hadoop.conf.Configuration conf) throws IOException

Get the server-side or client-side encryption algorithm and secrets.

Parameters:
bucket - bucket to query for
conf - configuration to scan

Returns:
the encryption algorithm and secrets; NONE unless one is set.

Throws:
IOException - on JCEKS lookup or invalid method/key configuration.

@Deprecated public static void closeAll(org.slf4j.Logger log, Closeable... closeables)

Deprecated. Use IOUtils.cleanupWithLogger(Logger, Closeable...) instead.

Parameters:
log - the log to log at debug level. Can be null.
closeables - the objects to close

public static void closeAutocloseables(org.slf4j.Logger log, AutoCloseable... closeables)

Close the Closeable objects and ignore any Exception or null pointers.
Parameters:
log - the log to log at debug level. Can be null.
closeables - the objects to close

public static void setBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey, String value)

Set a bucket-specific property to a particular value. If the generic key passed in has the fs.s3a. prefix, that's stripped off, so that when the bucket properties are propagated down to the generic values, that value gets copied down.

Parameters:
conf - configuration to set
bucket - bucket name
genericKey - key; can start with "fs.s3a."
value - value to set

public static void clearBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey)

Clear a bucket-specific property. If the generic key passed in has the fs.s3a. prefix, that's stripped off, so that when the bucket properties are propagated down to the generic values, that value gets copied down.

Parameters:
conf - configuration to set
bucket - bucket name
genericKey - key; can start with "fs.s3a."

public static String getBucketOption(org.apache.hadoop.conf.Configuration conf, String bucket, String genericKey)

Get a bucket-specific property. If the generic key passed in has the fs.s3a. prefix, that's stripped off.

Parameters:
conf - configuration to set
bucket - bucket name
genericKey - key; can start with "fs.s3a."

public static String maybeAddTrailingSlash(String key)

Turns a path (relative or otherwise) into an S3 key, adding a trailing "/" if the path is not the root and does not already have a "/" at the end.

Parameters:
key - s3 key or ""

Copyright © 2008–2022 Apache Software Foundation. All rights reserved.