@InterfaceAudience.Private @InterfaceStability.Evolving public class DynamoDBMetadataStore extends Object implements MetadataStore
MetadataStore that persists
file system metadata to DynamoDB.
The current implementation uses a schema consisting of a single table. The
name of the table can be configured by config key
Constants.S3GUARD_DDB_TABLE_NAME_KEY.
By default, it matches the name of the S3 bucket. Each item in the table
represents a single directory or file. Its path is split into separate table
attributes:
    s3a://bucket/dir1
    |-- dir2
    |   |-- file1
    |   `-- file2
    `-- dir3
        |-- dir4
        |   `-- file3
        |-- dir5
        |   `-- file4
        `-- dir6
This is persisted to a single DynamoDB table as:
| parent | child | is_dir | mod_time | len | ... |
|---|---|---|---|---|---|
| /bucket | dir1 | true | | | |
| /bucket/dir1 | dir2 | true | | | |
| /bucket/dir1 | dir3 | true | | | |
| /bucket/dir1/dir2 | file1 | | 100 | 111 | |
| /bucket/dir1/dir2 | file2 | | 200 | 222 | |
| /bucket/dir1/dir3 | dir4 | true | | | |
| /bucket/dir1/dir3 | dir5 | true | | | |
| /bucket/dir1/dir3/dir4 | file3 | | 300 | 333 | |
| /bucket/dir1/dir3/dir5 | file4 | | 400 | 444 | |
| /bucket/dir1/dir3 | dir6 | true | | | |

This choice of schema is efficient for read access patterns.
get(Path) can be served from a single item lookup.
listChildren(Path) can be served from a query against all rows
matching the parent (the partition key) and the returned list is guaranteed
to be sorted by child (the range key). Tracking whether or not a path is a
directory helps prevent unnecessary queries during traversal of an entire
sub-tree.
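
The read paths described above can be illustrated with a small in-memory model. This is only a sketch: the class `SchemaSketch` and its methods are hypothetical names, a nested `TreeMap` stands in for the DynamoDB table, and nothing here uses the AWS SDK or the real `DynamoDBMetadataStore` code. It shows how a path splits into the `parent` (partition key) and `child` (range key) attributes, why `get` is a single keyed lookup, and why `listChildren` returns children in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the parent/child schema; not the DynamoDB API. */
public class SchemaSketch {
  // Partition key (parent) -> items ordered by range key (child).
  // The stored value is the is_dir attribute.
  private final Map<String, TreeMap<String, Boolean>> table = new TreeMap<>();

  /** Split "/bucket/dir1/dir2" into {"/bucket/dir1", "dir2"}. */
  public static String[] splitKey(String path) {
    int slash = path.lastIndexOf('/');
    return new String[] {path.substring(0, slash), path.substring(slash + 1)};
  }

  public void put(String path, boolean isDir) {
    String[] key = splitKey(path);
    table.computeIfAbsent(key[0], p -> new TreeMap<>()).put(key[1], isDir);
  }

  /** Single-item lookup: both key attributes are known. Returns null if absent. */
  public Boolean get(String path) {
    String[] key = splitKey(path);
    TreeMap<String, Boolean> partition = table.get(key[0]);
    return partition == null ? null : partition.get(key[1]);
  }

  /** Query one partition; children come back sorted by the range key. */
  public List<String> listChildren(String path) {
    TreeMap<String, Boolean> partition = table.get(path);
    return partition == null ? new ArrayList<>() : new ArrayList<>(partition.keySet());
  }

  public static void main(String[] args) {
    SchemaSketch store = new SchemaSketch();
    store.put("/bucket/dir1/dir3", true);
    store.put("/bucket/dir1/dir2", true);
    store.put("/bucket/dir1/dir2/file1", false);
    System.out.println(store.get("/bucket/dir1/dir2/file1")); // prints "false" (not a directory)
    System.out.println(store.listChildren("/bucket/dir1"));   // prints "[dir2, dir3]"
  }
}
```

In the real table each item also carries attributes such as mod_time and len; the model keeps only is_dir to stay short.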
Some mutating operations, notably deleteSubtree(Path) and
move(Collection, Collection), are less efficient with this schema.
They require mutating multiple items in the DynamoDB table.
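
To see why a sub-tree operation touches many items, the following sketch enumerates everything stored beneath a directory. It is an illustrative model only: `SubtreeSketch` and `subtreeItems` are hypothetical names, a nested `TreeMap` stands in for the table, and the traversal order is an arbitrary choice rather than a description of how `deleteSubtree(Path)` is implemented. Each directory visited costs one partition query, and the is_dir attribute tells the traversal which children need a further query.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of sub-tree enumeration over the parent/child schema. */
public class SubtreeSketch {
  // parent path -> (child name -> is_dir), mirroring the table layout.
  private final Map<String, TreeMap<String, Boolean>> table = new TreeMap<>();

  public void put(String parent, String child, boolean isDir) {
    table.computeIfAbsent(parent, p -> new TreeMap<>()).put(child, isDir);
  }

  /** Collect every item stored beneath root (the root's own item is not included). */
  public List<String> subtreeItems(String root) {
    List<String> items = new ArrayList<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.push(root);
    while (!pending.isEmpty()) {
      String dir = pending.pop();
      TreeMap<String, Boolean> children = table.get(dir); // one query per directory
      if (children == null) {
        continue; // empty directory: nothing stored under this partition key
      }
      for (Map.Entry<String, Boolean> e : children.entrySet()) {
        String path = dir + "/" + e.getKey();
        items.add(path);
        if (e.getValue()) {
          pending.push(path); // only directories need a further query
        }
      }
    }
    return items;
  }

  public static void main(String[] args) {
    SubtreeSketch store = new SubtreeSketch();
    store.put("/bucket/dir1/dir3", "dir4", true);
    store.put("/bucket/dir1/dir3", "dir5", true);
    store.put("/bucket/dir1/dir3", "dir6", true);
    store.put("/bucket/dir1/dir3/dir4", "file3", false);
    store.put("/bucket/dir1/dir3/dir5", "file4", false);
    // Every path listed here is a separate item that a sub-tree delete must mutate.
    store.subtreeItems("/bucket/dir1/dir3").forEach(System.out::println);
  }
}
```

For the example tree, removing dir3 means mutating the five items found beneath it plus the item for dir3 itself.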
By default, DynamoDB access is performed within the same AWS region as
the S3 bucket that hosts the S3A instance. During initialization, it checks
the location of the S3 bucket and creates a DynamoDB client connected to the
same region. The region may also be set explicitly by setting the config
parameter fs.s3a.s3guard.ddb.region to the corresponding region.

| Modifier and Type | Field and Description |
|---|---|
| static String | E_INCOMPATIBLE_VERSION - Error: version mismatch. |
| static String | E_NO_VERSION_MARKER - Error: version marker not found in table. |
| static org.slf4j.Logger | LOG |
| static int | VERSION - Current version number. |
| static String | VERSION_MARKER - parent/child name to use in the version marker. |

| Constructor and Description |
|---|
| DynamoDBMetadataStore() |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | delete(org.apache.hadoop.fs.Path path) - Deletes exactly one path, leaving a tombstone to prevent lingering, inconsistent copies of it from being listed. |
| void | deleteSubtree(org.apache.hadoop.fs.Path path) - Deletes the entire sub-tree rooted at the given path, leaving tombstones to prevent lingering, inconsistent copies of it from being listed. |
| void | destroy() - Destroy all resources associated with the metadata store. |
| static com.amazonaws.SdkBaseException | extractInnerException(IllegalArgumentException ex) - Take an IllegalArgumentException raised by a DDB operation and, if it contains an inner SDK exception, unwrap it. |
| void | forgetMetadata(org.apache.hadoop.fs.Path path) - Removes the record of exactly one path. |
| DDBPathMetadata | get(org.apache.hadoop.fs.Path path) - Gets metadata for a path. |
| DDBPathMetadata | get(org.apache.hadoop.fs.Path path, boolean wantEmptyDirectoryFlag) - Gets metadata for a path. |
| com.amazonaws.services.dynamodbv2.AmazonDynamoDB | getAmazonDynamoDB() |
| long | getBatchWriteCapacityExceededCount() |
| Map<String,String> | getDiagnostics() - Get any diagnostics information from a store, as a list of (key, value) tuples for display. |
| Invoker | getInvoker() |
| long | getReadThrottleEventCount() - Get the count of read throttle events. |
| String | getTableName() |
| long | getWriteThrottleEventCount() - Get the count of write throttle events. |
| void | initialize(org.apache.hadoop.conf.Configuration config) - Performs one-time initialization of the metadata store via configuration. |
| void | initialize(org.apache.hadoop.fs.FileSystem fs) - Performs one-time initialization of the metadata store. |
| DirListingMetadata | listChildren(org.apache.hadoop.fs.Path path) - Lists metadata for all direct children of a path. |
| void | move(Collection<org.apache.hadoop.fs.Path> pathsToDelete, Collection<PathMetadata> pathsToCreate) - Record the effects of a FileSystem.rename(Path, Path) in the MetadataStore. |
| void | prune(long modTime) - Clear any metadata older than a specified time from the repository. |
| void | prune(long modTime, String keyPrefix) - Prune files, in batches. |
| void | put(Collection<PathMetadata> metas) - Saves metadata for any number of paths. |
| void | put(DirListingMetadata meta) - Save directory listing metadata. |
| void | put(PathMetadata meta) - Saves metadata for exactly one path. |
| void | tagTable() - Add tags from configuration to the existing DynamoDB table. |
| String | toString() |
| void | updateParameters(Map<String,String> parameters) - Tune/update parameters for an existing table. |

public static final org.slf4j.Logger LOG
public static final String VERSION_MARKER
public static final int VERSION
public static final String E_NO_VERSION_MARKER
public static final String E_INCOMPATIBLE_VERSION
@Retries.OnceRaw public void initialize(org.apache.hadoop.fs.FileSystem fs) throws IOException
Performs one-time initialization of the metadata store.
The credentials are requested from the filesystem via
S3AFileSystem.shareCredentials(String); this will
increment the reference counter of these credentials.
Specified by: initialize in interface MetadataStore
Parameters: fs - S3AFileSystem associated with the MetadataStore
Throws: IOException - on a failure

@Retries.OnceRaw public void initialize(org.apache.hadoop.conf.Configuration config) throws IOException
Performs one-time initialization of the metadata store via configuration.
Generally, callers should use initialize(FileSystem)
with an initialized S3AFileSystem instance.
Without a filesystem to act as a reference point, the configuration itself
must declare the table name and region in the
Constants.S3GUARD_DDB_TABLE_NAME_KEY and
Constants.S3GUARD_DDB_REGION_KEY keys respectively.
It also creates a new credential provider list from the configuration,
using the base fs.s3a.* options, as there is no bucket to infer per-bucket
settings from.
Specified by: initialize in interface MetadataStore
Parameters: config - Configuration.
Throws:
IOException - if there is an error
IllegalArgumentException - if the configuration is incomplete
See Also: initialize(FileSystem)

@Retries.RetryTranslated public void delete(org.apache.hadoop.fs.Path path) throws IOException
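Description copied from interface: MetadataStore
Deletes exactly one path, leaving a tombstone to prevent lingering,
inconsistent copies of it from being listed.
Specified by: delete in interface MetadataStore
Parameters: path - the path to delete
Throws: IOException - if there is an error

@Retries.RetryTranslated public void forgetMetadata(org.apache.hadoop.fs.Path path) throws IOException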
Description copied from interface: MetadataStore
Removes the record of exactly one path. Does not leave a tombstone (see
MetadataStore.delete(Path)). It is currently intended for testing
only, and a need to use it as part of normal FileSystem usage is not
anticipated.
Specified by: forgetMetadata in interface MetadataStore
Parameters: path - the path to delete
Throws: IOException - if there is an error

@Retries.RetryTranslated public void deleteSubtree(org.apache.hadoop.fs.Path path) throws IOException
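Description copied from interface: MetadataStore
Deletes the entire sub-tree rooted at the given path, leaving tombstones
to prevent lingering, inconsistent copies of it from being listed.
In addition to affecting future calls to MetadataStore.get(Path),
implementations must also update any stored DirListingMetadata
objects which track the parent of this file.
Specified by: deleteSubtree in interface MetadataStore
Parameters: path - the root of the sub-tree to delete
Throws: IOException - if there is an error

@Retries.RetryTranslated public DDBPathMetadata get(org.apache.hadoop.fs.Path path) throws IOException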
Description copied from interface: MetadataStore
Gets metadata for a path.
Specified by: get in interface MetadataStore
Parameters: path - the path to get
Returns: metadata for path, null if not found
Throws: IOException - if there is an error

@Retries.RetryTranslated public DDBPathMetadata get(org.apache.hadoop.fs.Path path, boolean wantEmptyDirectoryFlag) throws IOException
Description copied from interface: MetadataStore
Gets metadata for a path. Alternate method that includes a hint
whether or not the MetadataStore should do work to compute the value for
PathMetadata.isEmptyDirectory(). Since determining emptiness
may be an expensive operation, this can save wasted work.
Specified by: get in interface MetadataStore
Parameters:
path - the path to get
wantEmptyDirectoryFlag - Set to true to give a hint to the MetadataStore that it should try to compute the empty directory flag.
Returns: metadata for path, null if not found
Throws: IOException - if there is an error

@Retries.RetryTranslated public DirListingMetadata listChildren(org.apache.hadoop.fs.Path path) throws IOException
Description copied from interface: MetadataStore
Lists metadata for all direct children of a path.
Specified by: listChildren in interface MetadataStore
Parameters: path - the path to list
Returns: metadata for all direct children of path which are being
tracked by the MetadataStore, or null if the path was not found
in the MetadataStore.
Throws: IOException - if there is an error

@Retries.RetryTranslated public void move(Collection<org.apache.hadoop.fs.Path> pathsToDelete, Collection<PathMetadata> pathsToCreate) throws IOException
Description copied from interface: MetadataStore
Record the effects of a FileSystem.rename(Path, Path) in the
MetadataStore. Clients provide an explicit enumeration of the affected
paths (recursively), before and after the rename.
This operation is not atomic, unless specific implementations claim
otherwise.
On the need to provide an enumeration of directory trees instead of just
source and destination paths:
Since a MetadataStore does not have to track all metadata for the
underlying storage system, and a new MetadataStore may be created on an
existing underlying filesystem, this move() may be the first time the
MetadataStore sees the affected paths. Therefore, simply providing source
and destination paths may not be enough to record the deletions (under
the source path) and creations (at the destination) that happen during the
rename().
Specified by: move in interface MetadataStore
Parameters:
pathsToDelete - Collection of all paths that were removed from the source directory tree of the move.
pathsToCreate - Collection of all PathMetadata for the new paths that were created at the destination of the rename().
Throws: IOException - if there is an error

@Retries.RetryTranslated public void put(PathMetadata meta) throws IOException
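Description copied from interface: MetadataStore
Saves metadata for exactly one path.
Implementations must update any DirListingMetadata objects which
track the immediate parent of this file.
Specified by: put in interface MetadataStore
Parameters: meta - the metadata to save
Throws: IOException - if there is an error

@Retries.RetryTranslated public void put(Collection<PathMetadata> metas) throws IOException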
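Description copied from interface: MetadataStore
Saves metadata for any number of paths.
Specified by: put in interface MetadataStore
Parameters: metas - the metadata to save
Throws: IOException - if there is an error

@Retries.RetryTranslated public void put(DirListingMetadata meta) throws IOException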
Description copied from interface: MetadataStore
Save directory listing metadata. Callers may save a partial directory
listing for a given path, or may store a complete and authoritative copy
of the directory listing. MetadataStore implementations may
subsequently keep track of all modifications to the directory contents at
this path, and return authoritative results from subsequent calls to
MetadataStore.listChildren(Path). See DirListingMetadata.
Any authoritative results returned are only authoritative for the scope
of the MetadataStore: a per-process MetadataStore, for
example, would only show results visible to that process, potentially
missing metadata updates (create, delete) made to the same path by
another process.
There is retry around building the list of paths to update, but
the call to processBatchWriteRequest(PrimaryKey[], Item[])
is only tried once.
Specified by: put in interface MetadataStore
Parameters: meta - Directory listing metadata.
Throws: IOException - IO problem

public void close()
Specified by:
close in interface Closeable
close in interface AutoCloseable

@Retries.RetryTranslated public void destroy() throws IOException
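Description copied from interface: MetadataStore
Destroy all resources associated with the metadata store.
Specified by: destroy in interface MetadataStore
Throws: IOException - if there is an error

@Retries.RetryTranslated public void prune(long modTime) throws IOException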
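Description copied from interface: MetadataStore
Clear any metadata older than a specified time from the repository.
Specified by: prune in interface MetadataStore
Parameters: modTime - Oldest modification time to allow
Throws: IOException - if there is an error

@Retries.RetryTranslated public void prune(long modTime, String keyPrefix) throws IOException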
Prune files, in batches.
Specified by: prune in interface MetadataStore
Parameters:
modTime - Oldest modification time to allow
keyPrefix - The prefix for the keys that should be removed
Throws:
IOException - Any IO/DDB failure.
InterruptedIOException - if the prune was interrupted

@Retries.OnceRaw public void tagTable()
public com.amazonaws.services.dynamodbv2.AmazonDynamoDB getAmazonDynamoDB()
public String getTableName()
@Retries.OnceRaw public Map<String,String> getDiagnostics() throws IOException
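Description copied from interface: MetadataStore
Get any diagnostics information from a store, as a list of (key, value)
tuples for display.
Specified by: getDiagnostics in interface MetadataStore
Throws: IOException - if there is an error

@Retries.OnceRaw public void updateParameters(Map<String,String> parameters) throws IOException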
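Description copied from interface: MetadataStore
Tune/update parameters for an existing table.
Specified by: updateParameters in interface MetadataStore
Parameters: parameters - map of params to change.
Throws: IOException - if there is an error

public long getReadThrottleEventCount()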
public long getWriteThrottleEventCount()
public long getBatchWriteCapacityExceededCount()
public Invoker getInvoker()
public static com.amazonaws.SdkBaseException extractInnerException(IllegalArgumentException ex)
Take an IllegalArgumentException raised by a DDB operation
and, if it contains an inner SDK exception, unwrap it.
Parameters: ex - exception.

Copyright © 2008–2019 Apache Software Foundation. All rights reserved.