public class StagingCommitter extends AbstractS3ACommitter
Nested classes inherited from class AbstractS3ACommitter: AbstractS3ACommitter.ActiveCommit, AbstractS3ACommitter.JobUUIDSource

| Modifier and Type | Field | Description |
|---|---|---|
| static String | NAME | Name: "staging". |

Fields inherited from class AbstractS3ACommitter: E_SELF_GENERATED_JOB_UUID, THREAD_PREFIX

| Constructor | Description |
|---|---|
| StagingCommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) | Committer for a single task attempt. |

| Modifier and Type | Method | Description |
|---|---|---|
| protected void | abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) | The internal job abort operation; can be overridden in tests. |
| void | abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Abort the task. |
| protected void | cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) | Cleanup the job context, including aborting anything pending and destroying the thread pool. |
| void | cleanupStagingDirs() | Clean up any staging directories. |
| void | commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) | |
| protected int | commitTaskInternal(org.apache.hadoop.mapreduce.TaskAttemptContext context, List<? extends org.apache.hadoop.fs.FileStatus> taskOutput) | Commit the task by uploading all created files and then writing a pending entry for them. |
| protected org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter | createWrappedCommitter(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration conf) | Create the wrapped committer. |
| protected void | deleteDestinationPaths(org.apache.hadoop.mapreduce.JobContext context) | Delete the working paths of a job. |
| protected void | deleteTaskWorkingPathQuietly(org.apache.hadoop.mapreduce.JobContext context) | Delete the working path of a task; no-op if there is none, that is: this is a job. |
| protected org.apache.hadoop.fs.PathExistsException | failDestinationExists(org.apache.hadoop.fs.Path path, String description) | Generate a PathExistsException because the destination exists. |
| org.apache.hadoop.fs.Path | getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Return the local work path as the destination for writing work. |
| protected org.apache.hadoop.fs.Path | getCommittedTaskPath(int appAttemptId, org.apache.hadoop.mapreduce.TaskAttemptContext context) | Compute the path where the output of a committed task is stored until the entire job is committed for a specific application attempt. |
| org.apache.hadoop.fs.Path | getCommittedTaskPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Compute the path where the output of a committed task is stored until the entire job is committed. |
| static String | getConfictModeOption(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf, String defVal) | Get the conflict mode option string. |
| ConflictResolution | getConflictResolutionMode(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf) | Returns the ConflictResolution mode for this commit. |
| protected String | getFinalKey(String relative, org.apache.hadoop.mapreduce.JobContext context) | Returns the final S3 key for a relative path. |
| protected org.apache.hadoop.fs.Path | getFinalPath(String relative, org.apache.hadoop.mapreduce.JobContext context) | Returns the final S3 location for a relative path as a Hadoop Path. |
| org.apache.hadoop.fs.FileSystem | getJobAttemptFileSystem(org.apache.hadoop.mapreduce.JobContext context) | Get the filesystem for the job attempt. |
| protected org.apache.hadoop.fs.Path | getJobAttemptPath(int appAttemptId) | Compute the path where the output of a given job attempt will be placed. |
| org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context) | For a job attempt path, the staging committer returns that of the wrapped committer. |
| static org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path out) | Compute the path where the output of a given job attempt will be placed. |
| String | getName() | Get the name of this committer. |
| static org.apache.hadoop.fs.Path | getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.fs.Path out) | Compute the path where the output of a task attempt is stored until that task is committed. |
| protected List<org.apache.hadoop.fs.LocatedFileStatus> | getTaskOutput(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Lists the output of a task under the task attempt path. |
| org.apache.hadoop.fs.Path | getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Get a temporary directory for data. |
| protected void | initFileOutputCommitterOptions(org.apache.hadoop.mapreduce.JobContext context) | Init the context config with everything needed for the file output committer. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploads(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) | Get the list of pending uploads for this job attempt. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploadsToAbort(org.apache.hadoop.mapreduce.JobContext context) | Get the list of pending uploads for this job attempt, swallowing exceptions. |
| protected AbstractS3ACommitter.ActiveCommit | listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context) | Get the list of pending uploads for this job attempt. |
| boolean | needsTaskCommit(org.apache.hadoop.mapreduce.TaskAttemptContext context) | |
| void | preCommitJob(org.apache.hadoop.mapreduce.JobContext context, AbstractS3ACommitter.ActiveCommit pending) | Pre-commit actions for a job. |
| void | setupJob(org.apache.hadoop.mapreduce.JobContext context) | Set up the job, including calling the same method on the wrapped committer. |
| void | setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) | Task setup. |
| String | toString() | |
| Boolean | useUniqueFilenames() | Is this committer using unique filenames? |

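The two conflict-mode entries in the table, getConfictModeOption and getConflictResolutionMode, can be pictured with a small standalone sketch. It is not the Hadoop implementation. It assumes that the option string is looked up in the job configuration first, then in the filesystem configuration, then falls back to defVal, and that the result is mapped case-insensitively onto an enum. Plain Map objects stand in for Configuration, and the class name ConflictModeSketch, the key in CONFLICT_MODE_KEY, the Mode constants and the "append" fallback are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

public class ConflictModeSketch {
    /** Stand-in for the ConflictResolution type; the constants are assumed. */
    public enum Mode { FAIL, APPEND, REPLACE }

    // Assumed option key; this page does not state it.
    static final String CONFLICT_MODE_KEY = "fs.s3a.committer.staging.conflict-mode";

    /** Job config wins over filesystem config, which wins over defVal (assumed order). */
    public static String conflictModeOption(Map<String, String> jobConf,
                                            Map<String, String> fsConf,
                                            String defVal) {
        String fromFs = fsConf.getOrDefault(CONFLICT_MODE_KEY, defVal);
        return jobConf.getOrDefault(CONFLICT_MODE_KEY, fromFs)
                .trim()
                .toLowerCase(Locale.ENGLISH);
    }

    /** Map the option string onto the enum; unknown values raise IllegalArgumentException. */
    public static Mode conflictResolutionMode(Map<String, String> jobConf,
                                              Map<String, String> fsConf) {
        // "append" is a hypothetical default chosen for this sketch.
        String option = conflictModeOption(jobConf, fsConf, "append");
        return Mode.valueOf(option.toUpperCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        Map<String, String> fsConf = Map.of(CONFLICT_MODE_KEY, "fail");
        Map<String, String> jobConf = Map.of(CONFLICT_MODE_KEY, " Replace ");
        System.out.println(conflictResolutionMode(jobConf, fsConf));    // REPLACE
        System.out.println(conflictResolutionMode(Map.of(), fsConf));   // FAIL
        System.out.println(conflictResolutionMode(Map.of(), Map.of())); // APPEND
    }
}
```

Keeping the string lookup separate from the enum conversion mirrors the split between the two methods in the table: the static helper returns the raw option, and the instance-level method turns it into a typed mode.
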
Methods inherited from class AbstractS3ACommitter: abortJob, abortPendingUploads, abortPendingUploads, abortPendingUploadsInCleanup, buildJobUUID, buildSubmitter, cleanupJob, commitJob, commitJobInternal, commitPendingUploads, deleteTaskAttemptPathQuietly, destroyThreadPool, getAuditSpanSource, getCommitOperations, getConf, getDestFS, getDestinationFS, getDestS3AFS, getIOStatistics, getJobContext, getOutputPath, getRole, getTaskAttemptFilesystem, getTaskAttemptPath, getUUID, getUUIDSource, getWorkPath, hasThreadPool, initiateCommitOperation, initOutput, jobCompleted, maybeCreateSuccessMarker, maybeCreateSuccessMarkerFromCommits, maybeIgnore, maybeIgnore, precommitCheckPendingFiles, recoverTask, requiresDelayedCommitOutputInFileSystem, resetCommonContext, setConf, setDestFS, setOutputPath, setWorkPath, singleThreadSubmitter, startOperation, updateCommonContext, warnOnActiveUploads

Methods inherited from other superclasses: hasOutputPath

public static final String NAME

public StagingCommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Parameters:
outputPath - final output path
context - task context
Throws:
IOException - on a failure

public String getName()
Description copied from class: AbstractS3ACommitter
Get the name of this committer.
getName in class AbstractS3ACommitter

protected org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter createWrappedCommitter(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
context - job/task context.
conf - config
Throws:
IOException - on a failure

protected void initFileOutputCommitterOptions(org.apache.hadoop.mapreduce.JobContext context)
Parameters:
context - context to configure.

public String toString()
toString in class AbstractS3ACommitter

public Boolean useUniqueFilenames()

public org.apache.hadoop.fs.FileSystem getJobAttemptFileSystem(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Parameters:
context - the context of the job. This is used to get the application attempt ID.
Throws:
IOException - failure to create the FS.

public static org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path out)
Parameters:
context - the context of the job. This is used to get the application attempt ID.
out - the output path to place these in.

protected org.apache.hadoop.fs.Path getJobAttemptPath(int appAttemptId)
Description copied from class: AbstractS3ACommitter
Compute the path where the output of a given job attempt will be placed.
getJobAttemptPath in class AbstractS3ACommitter
Parameters:
appAttemptId - the ID of the application attempt for this job.

public static org.apache.hadoop.fs.Path getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.fs.Path out)
Parameters:
context - the context of the task attempt.
out - The output path to put things in.

public org.apache.hadoop.fs.Path getCommittedTaskPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Parameters:
context - the context of the task attempt

protected org.apache.hadoop.fs.Path getCommittedTaskPath(int appAttemptId, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Parameters:
appAttemptId - the ID of the application attempt to use
context - the context of any task.

public org.apache.hadoop.fs.Path getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Description copied from class: AbstractS3ACommitter
Get a temporary directory for data.
getTempTaskAttemptPath in class AbstractS3ACommitter
Parameters:
context - task context

protected List<org.apache.hadoop.fs.LocatedFileStatus> getTaskOutput(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
This implementation lists the files that are direct children of the output path and filters hidden files (file names starting with '.' or '_').
The task attempt path is provided by AbstractS3ACommitter.getTaskAttemptPath(TaskAttemptContext).
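
The listing rule described for getTaskOutput (direct children only, hidden names skipped) can be illustrated with a standalone sketch. It uses java.nio.file on the local filesystem as a stand-in for the Hadoop FileSystem API, so it is an approximation of the behaviour described here and not the committer's code. The class TaskOutputListingSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskOutputListingSketch {
    /** Hidden means the file name starts with '.' or '_'. */
    public static boolean isHidden(String name) {
        return name.startsWith(".") || name.startsWith("_");
    }

    /** Names of the visible regular files directly under dir (no recursion). */
    public static List<String> visibleChildren(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> !isHidden(name))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("task-attempt");
        Files.createFile(dir.resolve("part-0000"));
        Files.createFile(dir.resolve("_SUCCESS"));
        Files.createFile(dir.resolve(".part-0000.crc"));
        System.out.println(visibleChildren(dir)); // [part-0000]
    }
}
```
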
Parameters:
context - this task's TaskAttemptContext
Throws:
IOException - on a failure

protected String getFinalKey(String relative, org.apache.hadoop.mapreduce.JobContext context)
This implementation concatenates the relative path with the key prefix from the output path.
If CommitConstants.FS_S3A_COMMITTER_STAGING_UNIQUE_FILENAMES is set, then the task UUID is also included in the calculation.
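
A minimal sketch of the key calculation described for getFinalKey follows. It only illustrates the two stated rules: prefix plus relative path, with a UUID mixed in when unique filenames are enabled. Exactly where the UUID is placed is not stated here, so the sketch assumes it goes before the file extension (or at the end of the name when there is none). The class FinalKeySketch, the helper addUuid and all sample values are hypothetical.

```java
public class FinalKeySketch {
    /** Assumed placement: "name.ext" becomes "name-uuid.ext"; no extension appends "-uuid". */
    public static String addUuid(String relative, String uuid) {
        int slash = relative.lastIndexOf('/');
        int dot = relative.lastIndexOf('.');
        // Only treat the dot as an extension separator if it is inside the
        // last path element and is not that element's first character.
        if (dot > slash + 1) {
            return relative.substring(0, dot) + "-" + uuid + relative.substring(dot);
        }
        return relative + "-" + uuid;
    }

    /** Concatenate the key prefix with the (optionally UUID-tagged) relative path. */
    public static String finalKey(String keyPrefix, String relative,
                                  String uuid, boolean uniqueFilenames) {
        String name = uniqueFilenames ? addUuid(relative, uuid) : relative;
        return keyPrefix + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(finalKey("output/table", "year=2022/part-0000.parquet", "job-1234", false));
        // output/table/year=2022/part-0000.parquet
        System.out.println(finalKey("output/table", "year=2022/part-0000.parquet", "job-1234", true));
        // output/table/year=2022/part-0000-job-1234.parquet
    }
}
```
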
Parameters:
relative - the path of a file relative to the task attempt path
context - the JobContext or TaskAttemptContext for this job

protected final org.apache.hadoop.fs.Path getFinalPath(String relative, org.apache.hadoop.mapreduce.JobContext context) throws IOException
Returns the final S3 location for a relative path as a Hadoop Path.
This is a final method that calls getFinalKey(String, JobContext) to determine the final location.
Parameters:
relative - the path of a file relative to the task attempt path
context - the JobContext or TaskAttemptContext for this job
Throws:
IOException - IO problem

public org.apache.hadoop.fs.Path getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
getBaseTaskAttemptPath in class AbstractS3ACommitter
Parameters:
context - the context of the task attempt.

public org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context)
getJobAttemptPath in class AbstractS3ACommitter
Parameters:
context - the context of the job.

public void setupJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
setupJob in class AbstractS3ACommitter
Parameters:
context - job context
Throws:
IOException - IO failure.

protected AbstractS3ACommitter.ActiveCommit listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context) throws IOException
listPendingUploadsToCommit in class AbstractS3ACommitter
Parameters:
context - job context
Throws:
IOException - Any IO failure

protected AbstractS3ACommitter.ActiveCommit listPendingUploadsToAbort(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Parameters:
context - job context
Throws:
IOException - shouldn't be raised, but retained for the compiler

protected AbstractS3ACommitter.ActiveCommit listPendingUploads(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
Parameters:
context - job context
suppressExceptions - should exceptions be swallowed?
Throws:
IOException - Any IO failure which wasn't swallowed.

public void cleanupStagingDirs()
Description copied from class: AbstractS3ACommitter
Clean up any staging directories.
cleanupStagingDirs in class AbstractS3ACommitter

protected void cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
Description copied from class: AbstractS3ACommitter
Cleanup the job context, including aborting anything pending and destroying the thread pool.
cleanup in class AbstractS3ACommitter
Parameters:
context - job context
suppressExceptions - should exceptions be suppressed?
Throws:
IOException - any failure if exceptions were not suppressed.

protected void abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
Description copied from class: AbstractS3ACommitter
The internal job abort operation; can be overridden in tests. [...] AbstractS3ACommitter.abortJob(JobContext, JobStatus.State) call.
The base implementation calls AbstractS3ACommitter.cleanup(JobContext, boolean) so cleans up the filesystems and destroys the thread pool.
Subclasses must always invoke this superclass method after their own operations.
abortJobInternal in class AbstractS3ACommitter
Parameters:
context - job context
suppressExceptions - should exceptions be suppressed?
Throws:
IOException - any IO problem raised when suppressExceptions is false.

protected void deleteDestinationPaths(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Delete the working paths of a job. [...] $dest/__temporary
Parameters:
context - job context
Throws:
IOException - IO failure

public void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Description copied from class: AbstractS3ACommitter
Task setup.
setupTask in class AbstractS3ACommitter
Throws:
IOException

public boolean needsTaskCommit(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
needsTaskCommit in class org.apache.hadoop.mapreduce.OutputCommitter
Throws:
IOException

public void commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
commitTask in class org.apache.hadoop.mapreduce.OutputCommitter
Throws:
IOException

protected int commitTaskInternal(org.apache.hadoop.mapreduce.TaskAttemptContext context, List<? extends org.apache.hadoop.fs.FileStatus> taskOutput) throws IOException
Parameters:
context - task context
taskOutput - list of files from the output
Throws:
IOException - IO Failures.

public void abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
abortTask in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters:
context - task context
Throws:
IOException - any failure

protected void deleteTaskWorkingPathQuietly(org.apache.hadoop.mapreduce.JobContext context)
Parameters:
context - job/task context

public final ConflictResolution getConflictResolutionMode(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf)
Returns the ConflictResolution mode for this commit.
Parameters:
context - the JobContext for this commit
fsConf - filesystem config

protected org.apache.hadoop.fs.PathExistsException failDestinationExists(org.apache.hadoop.fs.Path path, String description)
Generate a PathExistsException because the destination exists.
Lists some of the child entries first, to help diagnose the problem.
Parameters:
path - path which exists
description - description (usually task/job ID)

public static String getConfictModeOption(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.conf.Configuration fsConf, String defVal)
Parameters:
context - context with the config
fsConf - filesystem config
defVal - default value.

public void preCommitJob(org.apache.hadoop.mapreduce.JobContext context, AbstractS3ACommitter.ActiveCommit pending) throws IOException
preCommitJob in class AbstractS3ACommitter
Parameters:
context - job context
pending - pending commits
Throws:
IOException - any failure

Copyright © 2008–2022 Apache Software Foundation. All rights reserved.