public class StagingCommitter extends AbstractS3ACommitter
Nested classes/interfaces inherited from class AbstractS3ACommitter: AbstractS3ACommitter.ActiveCommit, AbstractS3ACommitter.JobUUIDSource

| Modifier and Type | Field and Description |
|---|---|
| static String | NAME Name: "staging". |

Fields inherited from class AbstractS3ACommitter: E_SELF_GENERATED_JOB_UUID, THREAD_PREFIX

| Constructor and Description |
|---|
| StagingCommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) Committer for a single task attempt. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | abortJobInternal(CommitContext commitContext, boolean suppressExceptions) The internal job abort operation; can be overridden in tests. |
| void | abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) Abort the task. |
| protected void | cleanup(CommitContext commitContext, boolean suppressExceptions) Staging committer cleanup includes calling the wrapped committer's cleanup method and removing the staging uploads path and all destination paths in the final filesystem. |
| void | cleanupStagingDirs() Clean up any staging directories. |
| void | commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| protected int | commitTaskInternal(org.apache.hadoop.mapreduce.TaskAttemptContext context, List<? extends org.apache.hadoop.fs.FileStatus> taskOutput, CommitContext commitContext) Commit the task by uploading all created files and then writing a pending entry for them. |
| protected org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter | createWrappedCommitter(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration conf) Create the wrapped committer. |
| protected void | deleteDestinationPaths(org.apache.hadoop.mapreduce.JobContext context) Delete the working paths of a job. |
| protected void | deleteStagingUploadsParentDirectory(org.apache.hadoop.mapreduce.JobContext context) Delete the multipart upload staging directory. |
| protected void | deleteTaskWorkingPathQuietly(org.apache.hadoop.mapreduce.JobContext context) Delete the working path of a task; no-op if there is none, that is, when this is a job. |
| protected org.apache.hadoop.fs.PathExistsException | failDestinationExists(org.apache.hadoop.fs.Path path, String description) Generate a PathExistsException because the destination exists. |
| org.apache.hadoop.fs.Path | getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Return the local work path as the destination for writing work. |
| protected org.apache.hadoop.fs.Path | getCommittedTaskPath(int appAttemptId, org.apache.hadoop.mapreduce.TaskAttemptContext context) Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt. |
| org.apache.hadoop.fs.Path | getCommittedTaskPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Compute the path where the output of a committed task is stored until the entire job is committed. |
| static String | getConfictModeOption(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf, String defVal) Get the conflict mode option string. |
| ConflictResolution | getConflictResolutionMode(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf) Returns the ConflictResolution mode for this commit. |
| protected String | getFinalKey(String relative, org.apache.hadoop.mapreduce.JobContext context) Returns the final S3 key for a relative path. |
| protected org.apache.hadoop.fs.Path | getFinalPath(String relative, org.apache.hadoop.mapreduce.JobContext context) Returns the final S3 location for a relative path as a Hadoop Path. |
| org.apache.hadoop.fs.FileSystem | getJobAttemptFileSystem(org.apache.hadoop.mapreduce.JobContext context) Get the filesystem for the job attempt. |
| protected org.apache.hadoop.fs.Path | getJobAttemptPath(int appAttemptId) Compute the path where the output of a given job attempt will be placed. |
| org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context) For a job attempt path, the staging committer returns that of the wrapped committer. |
| static org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path out) Compute the path where the output of a given job attempt will be placed. |
| protected org.apache.hadoop.fs.Path | getJobPath() Compute the path under which all job attempts will be placed. |
| String | getName() Get the name of this committer. |
| static org.apache.hadoop.fs.Path | getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.fs.Path out) Compute the path where the output of a task attempt is stored until that task is committed. |
| protected List<org.apache.hadoop.fs.LocatedFileStatus> | getTaskOutput(org.apache.hadoop.mapreduce.TaskAttemptContext context) Lists the output of a task under the task attempt path. |
| org.apache.hadoop.fs.Path | getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Get a temporary directory for data. |
| protected void | initFileOutputCommitterOptions(org.apache.hadoop.mapreduce.JobContext context) Init the context config with everything needed for the file output committer. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploads(CommitContext commitContext, boolean suppressExceptions) Get the list of pending uploads for this job attempt. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploadsToAbort(CommitContext commitContext) Get the list of pending uploads for this job attempt, swallowing exceptions. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploadsToCommit(CommitContext commitContext) Get the list of pending uploads for this job attempt. |
| boolean | needsTaskCommit(org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| void | preCommitJob(CommitContext commitContext, AbstractS3ACommitter.ActiveCommit pending) Pre-commit actions for a job. |
| void | setupJob(org.apache.hadoop.mapreduce.JobContext context) Set up the job, including calling the same method on the wrapped committer. |
| void | setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) Task setup. |
| String | toString() |
| Boolean | useUniqueFilenames() Is this committer using unique filenames? |
Methods inherited from class AbstractS3ACommitter: abortJob, abortPendingUploads, abortPendingUploads, abortPendingUploadsInCleanup, buildJobUUID, cleanupJob, commitJob, commitJobInternal, commitPendingUploads, deleteTaskAttemptPathQuietly, getAuditSpanSource, getCommitOperations, getConf, getDestFS, getDestinationFS, getDestS3AFS, getIOStatistics, getJobContext, getOutputPath, getRole, getTaskAttemptFilesystem, getTaskAttemptPath, getUUID, getUUIDSource, getWorkPath, initiateJobOperation, initiateTaskOperation, initOutput, jobCompleted, maybeCreateSuccessMarker, maybeCreateSuccessMarkerFromCommits, maybeIgnore, maybeIgnore, precommitCheckPendingFiles, recoverTask, requiresDelayedCommitOutputInFileSystem, setConf, setDestFS, setOutputPath, setWorkPath, startOperation, updateCommonContext, warnOnActiveUploads

Methods inherited from class PathOutputCommitter: hasOutputPath

public static final String NAME
Name: "staging".

public StagingCommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Committer for a single task attempt.
outputPath - final output path
context - task context
IOException - on a failure

public String getName()
Description copied from class: AbstractS3ACommitter
Get the name of this committer.
getName in class AbstractS3ACommitter

protected org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter createWrappedCommitter(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration conf) throws IOException
Create the wrapped committer.
context - job/task context.
conf - config
IOException - on a failure

protected void initFileOutputCommitterOptions(org.apache.hadoop.mapreduce.JobContext context)
Init the context config with everything needed for the file output committer.
context - context to configure.

public String toString()
toString in class AbstractS3ACommitter

public Boolean useUniqueFilenames()
Is this committer using unique filenames?

public org.apache.hadoop.fs.FileSystem getJobAttemptFileSystem(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Get the filesystem for the job attempt.
context - the context of the job. This is used to get the application attempt ID.
IOException - failure to create the FS.

public static org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path out)
Compute the path where the output of a given job attempt will be placed.
context - the context of the job. This is used to get the application attempt ID.
out - the output path to place these in.

protected org.apache.hadoop.fs.Path getJobAttemptPath(int appAttemptId)
Description copied from class: AbstractS3ACommitter
Compute the path where the output of a given job attempt will be placed.
getJobAttemptPath in class AbstractS3ACommitter
appAttemptId - the ID of the application attempt for this job.

protected org.apache.hadoop.fs.Path getJobPath()
Compute the path under which all job attempts will be placed.
getJobPath in class AbstractS3ACommitter

public static org.apache.hadoop.fs.Path getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.fs.Path out)
Compute the path where the output of a task attempt is stored until that task is committed.
context - the context of the task attempt.
out - The output path to put things in.

public org.apache.hadoop.fs.Path getCommittedTaskPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed.
context - the context of the task attempt

protected org.apache.hadoop.fs.Path getCommittedTaskPath(int appAttemptId, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt.
appAttemptId - the ID of the application attempt to use
context - the context of any task.

public org.apache.hadoop.fs.Path getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Description copied from class: AbstractS3ACommitter
Get a temporary directory for data.
getTempTaskAttemptPath in class AbstractS3ACommitter
context - task context

protected List<org.apache.hadoop.fs.LocatedFileStatus> getTaskOutput(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Lists the output of a task under the task attempt path.
This implementation lists the files that are direct children of the output path and filters hidden files (file names starting with '.' or '_').
The task attempt path is provided by AbstractS3ACommitter.getTaskAttemptPath(TaskAttemptContext).
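
The hidden-file rule described above can be illustrated with a minimal sketch. It is not the Hadoop implementation: the class `HiddenFileFilterSketch` and its methods are hypothetical names, and plain file-name strings stand in for the `LocatedFileStatus` entries that the real method returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Hadoop) that mimics the hidden-file rule. */
final class HiddenFileFilterSketch {

  private HiddenFileFilterSketch() {
  }

  /** A file name counts as hidden when it starts with '.' or '_'. */
  static boolean isHidden(String fileName) {
    return fileName.startsWith(".") || fileName.startsWith("_");
  }

  /** Keep only the visible names, preserving the order of the listing. */
  static List<String> visibleFiles(List<String> fileNames) {
    return fileNames.stream()
        .filter(name -> !isHidden(name))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed example listing of the direct children of a task attempt path.
    List<String> listing =
        Arrays.asList("part-00000", "_SUCCESS", ".part-00000.crc", "part-00001");
    System.out.println(visibleFiles(listing)); // prints [part-00000, part-00001]
  }
}
```

Only names are inspected here; the sketch does not list a directory or distinguish files from subdirectories.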
context - this task's TaskAttemptContext
IOException - on a failure

protected String getFinalKey(String relative, org.apache.hadoop.mapreduce.JobContext context)
Returns the final S3 key for a relative path.
This implementation concatenates the relative path with the key prefix
from the output path.
If CommitConstants.FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES is set, then the task UUID is also included in the calculation.
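
The key calculation described above might look roughly like the following sketch. It is illustrative only: `FinalKeySketch`, `addUuid` and `finalKey` are hypothetical names, and the placement of the UUID (inserted before the first `.` of the file name, or appended when there is no extension) is an assumption, since this page does not say where the UUID goes.

```java
/** Hypothetical sketch of building a final key; not the Hadoop implementation. */
final class FinalKeySketch {

  private FinalKeySketch() {
  }

  /**
   * Assumed UUID placement: insert "-uuid" before the first '.' of the
   * file name, or append it when the file name has no extension.
   */
  static String addUuid(String relative, String uuid) {
    int lastSlash = relative.lastIndexOf('/');
    // Search for the dot only within the file name, not in parent directories.
    int dot = relative.indexOf('.', lastSlash + 1);
    if (dot >= 0) {
      return relative.substring(0, dot) + "-" + uuid + relative.substring(dot);
    }
    return relative + "-" + uuid;
  }

  /** Concatenate the key prefix of the output path with the relative path. */
  static String finalKey(String keyPrefix, String relative, String uuid,
      boolean uniqueFilenames) {
    String name = uniqueFilenames ? addUuid(relative, uuid) : relative;
    return keyPrefix + "/" + name;
  }

  public static void main(String[] args) {
    String relative = "year=2024/part-00000.parquet";
    // prints output/table/year=2024/part-00000.parquet
    System.out.println(finalKey("output/table", relative, "abc123", false));
    // prints output/table/year=2024/part-00000-abc123.parquet
    System.out.println(finalKey("output/table", relative, "abc123", true));
  }
}
```

Including the UUID in the name keeps files from different jobs that write to the same destination from overwriting each other, at the cost of less predictable file names.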
relative - the path of a file relative to the task attempt path
context - the JobContext or TaskAttemptContext for this job

protected final org.apache.hadoop.fs.Path getFinalPath(String relative, org.apache.hadoop.mapreduce.JobContext context) throws IOException
Returns the final S3 location for a relative path as a Hadoop Path.
This is a final method that calls getFinalKey(String, JobContext) to determine the final location.
relative - the path of a file relative to the task attempt path
context - the JobContext or TaskAttemptContext for this job
IOException - IO problem

public org.apache.hadoop.fs.Path getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Return the local work path as the destination for writing work.
getBaseTaskAttemptPath in class AbstractS3ACommitter
context - the context of the task attempt.

public org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context)
For a job attempt path, the staging committer returns that of the wrapped committer.
getJobAttemptPath in class AbstractS3ACommitter
context - the context of the job.

public void setupJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Set up the job, including calling the same method on the wrapped committer.
setupJob in class AbstractS3ACommitter
context - job context
IOException - IO failure.

protected AbstractS3ACommitter.ActiveCommit listPendingUploadsToCommit(CommitContext commitContext) throws IOException
Get the list of pending uploads for this job attempt.
listPendingUploadsToCommit in class AbstractS3ACommitter
commitContext - job context
IOException - Any IO failure

protected AbstractS3ACommitter.ActiveCommit listPendingUploadsToAbort(CommitContext commitContext) throws IOException
Get the list of pending uploads for this job attempt, swallowing exceptions.
commitContext - commit context
IOException - shouldn't be raised, but retained for the compiler

protected AbstractS3ACommitter.ActiveCommit listPendingUploads(CommitContext commitContext, boolean suppressExceptions) throws IOException
Get the list of pending uploads for this job attempt.
commitContext - commit context
suppressExceptions - should exceptions be swallowed?
IOException - Any IO failure which wasn't swallowed.

public void cleanupStagingDirs()
Description copied from class: AbstractS3ACommitter
Clean up any staging directories.
cleanupStagingDirs in class AbstractS3ACommitter

protected void cleanup(CommitContext commitContext, boolean suppressExceptions) throws IOException
Staging committer cleanup includes calling the wrapped committer's cleanup method and removing the staging uploads path and all destination paths in the final filesystem.
cleanup in class AbstractS3ACommitter
commitContext - commit context
suppressExceptions - should exceptions be suppressed?
IOException - IO failures if exceptions are not suppressed.

protected void abortJobInternal(CommitContext commitContext, boolean suppressExceptions) throws IOException
Description copied from class: AbstractS3ACommitter
The internal job abort operation; can be overridden in tests.
[...] AbstractS3ACommitter.abortJob(JobContext, JobStatus.State) call.
The base implementation calls AbstractS3ACommitter.cleanup(CommitContext, boolean), so it cleans up the filesystems and destroys the thread pool.
Subclasses must always invoke this superclass method after their own operations.
Creates and closes its own commit context.
abortJobInternal in class AbstractS3ACommitter
commitContext - commit context
suppressExceptions - should exceptions be suppressed?
IOException - any IO problem raised when suppressExceptions is false.

protected void deleteStagingUploadsParentDirectory(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Delete the multipart upload staging directory.
context - job context
IOException - IO failure

protected void deleteDestinationPaths(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Delete the working paths of a job.
$dest/__temporary
context - job context
IOException - IO failure

public void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Description copied from class: AbstractS3ACommitter
Task setup.
setupTask in class AbstractS3ACommitter
IOException

public boolean needsTaskCommit(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
needsTaskCommit in class org.apache.hadoop.mapreduce.OutputCommitter
IOException

public void commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
commitTask in class org.apache.hadoop.mapreduce.OutputCommitter
IOException

protected int commitTaskInternal(org.apache.hadoop.mapreduce.TaskAttemptContext context, List<? extends org.apache.hadoop.fs.FileStatus> taskOutput, CommitContext commitContext) throws IOException
Commit the task by uploading all created files and then writing a pending entry for them.
context - task context
taskOutput - list of files from the output
commitContext - commit context
IOException - IO Failures.

public void abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Abort the task.
abortTask in class org.apache.hadoop.mapreduce.OutputCommitter
context - task context
IOException - any failure

protected void deleteTaskWorkingPathQuietly(org.apache.hadoop.mapreduce.JobContext context)
Delete the working path of a task; no-op if there is none, that is, when this is a job.
context - job/task context

public final ConflictResolution getConflictResolutionMode(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf)
Returns the ConflictResolution mode for this commit.
context - the JobContext for this commit
fsConf - filesystem config

protected org.apache.hadoop.fs.PathExistsException failDestinationExists(org.apache.hadoop.fs.Path path, String description)
Generate a PathExistsException because the destination exists.
Lists some of the child entries first, to help diagnose the problem.
path - path which exists
description - description (usually task/job ID)

public static String getConfictModeOption(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf, String defVal)
Get the conflict mode option string.
context - context with the config
fsConf - filesystem config
defVal - default value.

public void preCommitJob(CommitContext commitContext, AbstractS3ACommitter.ActiveCommit pending) throws IOException
Pre-commit actions for a job.
preCommitJob in class AbstractS3ACommitter
commitContext - commit context
pending - pending commits
IOException - any failure

Copyright © 2008–2024 Apache Software Foundation. All rights reserved.