Package org.apache.druid.indexer
Class JobHelper
- java.lang.Object
-
- org.apache.druid.indexer.JobHelper
-
public class JobHelper extends Object
-
-
Nested Class Summary
Nested Classes
static interface JobHelper.DataPusher
Simple interface for retry operations
-
Constructor Summary
Constructors
JobHelper()
-
Method Summary
static void authenticate()
Authenticates against a secured Hadoop cluster. In case of any bug fix, make sure to fix the code at HdfsStorageAuthentication#authenticate as well.
static long copyFileToZipStream(File file, ZipOutputStream zipOutputStream, org.apache.hadoop.util.Progressable progressable)
static org.apache.hadoop.fs.Path distributedClassPath(String path)
static org.apache.hadoop.fs.Path distributedClassPath(org.apache.hadoop.fs.Path base)
static void ensurePaths(HadoopDruidIndexerConfig config)
static String getJobTrackerAddress(org.apache.hadoop.conf.Configuration config)
static URI getURIFromSegment(org.apache.druid.timeline.DataSegment dataSegment)
static void injectDruidProperties(org.apache.hadoop.conf.Configuration configuration, HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
static org.apache.hadoop.conf.Configuration injectSystemProperties(org.apache.hadoop.conf.Configuration conf, HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
static org.apache.hadoop.fs.Path makeFileNamePath(org.apache.hadoop.fs.Path basePath, org.apache.hadoop.fs.FileSystem fs, org.apache.druid.timeline.DataSegment segmentTemplate, String baseFileName, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
static org.apache.hadoop.fs.Path makeTmpPath(org.apache.hadoop.fs.Path basePath, org.apache.hadoop.fs.FileSystem fs, org.apache.druid.timeline.DataSegment segmentTemplate, org.apache.hadoop.mapreduce.TaskAttemptID taskAttemptID, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
static void maybeDeleteIntermediatePath(boolean jobSucceeded, HadoopIngestionSpec indexerSchema)
static org.apache.hadoop.fs.Path prependFSIfNullScheme(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
static void renameIndexFilesForSegments(HadoopIngestionSpec indexerSchema, List<DataSegmentAndIndexZipFilePath> segmentAndIndexZipFilePaths)
Renames the index files for the segments.
static boolean runJobs(List<org.apache.druid.indexer.Jobby> jobs)
static boolean runSingleJob(org.apache.druid.indexer.Jobby job)
static DataSegmentAndIndexZipFilePath serializeOutIndex(org.apache.druid.timeline.DataSegment segmentTemplate, org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.util.Progressable progressable, File mergedBase, org.apache.hadoop.fs.Path finalIndexZipFilePath, org.apache.hadoop.fs.Path tmpPath, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
static void setupClasspath(org.apache.hadoop.fs.Path distributedClassPath, org.apache.hadoop.fs.Path intermediateClassPath, org.apache.hadoop.mapreduce.Job job)
Uploads jar files to HDFS and configures the classpath.
static com.google.common.base.Predicate<Throwable> shouldRetryPredicate()
static long unzipNoGuava(org.apache.hadoop.fs.Path zip, org.apache.hadoop.conf.Configuration configuration, File outDir, org.apache.hadoop.util.Progressable progressable, org.apache.hadoop.io.retry.RetryPolicy retryPolicy)
static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJobId)
static void writeSegmentDescriptor(org.apache.hadoop.fs.FileSystem outputFS, DataSegmentAndIndexZipFilePath segmentAndPath, org.apache.hadoop.fs.Path descriptorPath, org.apache.hadoop.util.Progressable progressable)
static long zipAndCopyDir(File baseDir, OutputStream baseOutputStream, org.apache.hadoop.util.Progressable progressable)
-
-
-
Field Detail
-
INDEX_ZIP
public static final String INDEX_ZIP
- See Also:
- Constant Field Values
-
-
Method Detail
-
distributedClassPath
public static org.apache.hadoop.fs.Path distributedClassPath(String path)
-
distributedClassPath
public static org.apache.hadoop.fs.Path distributedClassPath(org.apache.hadoop.fs.Path base)
-
authenticate
public static void authenticate()
Authenticates against a secured Hadoop cluster. In case of any bug fix, make sure to fix the code at HdfsStorageAuthentication#authenticate as well.
-
setupClasspath
public static void setupClasspath(org.apache.hadoop.fs.Path distributedClassPath, org.apache.hadoop.fs.Path intermediateClassPath, org.apache.hadoop.mapreduce.Job job) throws IOException
Uploads jar files to HDFS and configures the classpath. Snapshot jar files are uploaded to intermediateClassPath and are not shared across multiple jobs. Non-snapshot jar files are uploaded to distributedClassPath and shared across multiple jobs.
- Parameters:
distributedClassPath - classpath shared across multiple jobs
intermediateClassPath - classpath exclusive to this job; used to upload SNAPSHOT jar files
job - job to run
- Throws:
IOException
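The snapshot/non-snapshot routing rule described above can be sketched as a small helper. This is illustrative only: ClasspathRouter, its method, and the destination strings are assumptions for the sketch, not part of the JobHelper API, and the real method uploads files to HDFS rather than returning a map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the routing rule setupClasspath applies:
// SNAPSHOT jars go to the per-job intermediate classpath, release
// jars to the shared distributed classpath. All names here are
// hypothetical, not part of the Druid API.
public class ClasspathRouter {
    public static Map<String, String> route(List<String> jarNames,
                                            String distributedClassPath,
                                            String intermediateClassPath) {
        Map<String, String> destinations = new HashMap<>();
        for (String jar : jarNames) {
            // Snapshot builds can change between jobs, so they must not
            // be shared across jobs via the distributed classpath.
            boolean snapshot = jar.contains("SNAPSHOT");
            destinations.put(jar, snapshot ? intermediateClassPath : distributedClassPath);
        }
        return destinations;
    }

    public static void main(String[] args) {
        System.out.println(route(
            List.of("druid-core-0.22.1.jar", "my-ext-1.0-SNAPSHOT.jar"),
            "/druid/classpath", "/tmp/job-123/classpath"));
    }
}
```

The split matters because the distributed classpath is reused by later jobs; a stale SNAPSHOT jar cached there could silently shadow a newer build.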
-
shouldRetryPredicate
public static com.google.common.base.Predicate<Throwable> shouldRetryPredicate()
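A predicate like the one returned here is typically consumed by a retry loop that re-attempts an operation only while the thrown exception is considered retryable. The sketch below shows that general pattern with the JDK's `java.util.function.Predicate` standing in for Guava's; RetryRunner and its parameters are hypothetical, not part of JobHelper.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Generic sketch of retry-with-predicate: retry while the exception
// satisfies shouldRetry, fail fast otherwise. Hypothetical helper,
// not the Druid implementation.
public class RetryRunner {
    public static <T> T runWithRetries(Callable<T> task,
                                       Predicate<Throwable> shouldRetry,
                                       int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (!shouldRetry.test(e)) {
                    throw e; // non-retryable: propagate immediately
                }
            }
        }
        throw last; // attempts exhausted
    }
}
```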
-
injectDruidProperties
public static void injectDruidProperties(org.apache.hadoop.conf.Configuration configuration, HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
-
injectSystemProperties
public static org.apache.hadoop.conf.Configuration injectSystemProperties(org.apache.hadoop.conf.Configuration conf, HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
-
ensurePaths
public static void ensurePaths(HadoopDruidIndexerConfig config)
-
writeJobIdToFile
public static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJobId)
-
runSingleJob
public static boolean runSingleJob(org.apache.druid.indexer.Jobby job)
-
runJobs
public static boolean runJobs(List<org.apache.druid.indexer.Jobby> jobs)
-
maybeDeleteIntermediatePath
public static void maybeDeleteIntermediatePath(boolean jobSucceeded, HadoopIngestionSpec indexerSchema)
-
serializeOutIndex
public static DataSegmentAndIndexZipFilePath serializeOutIndex(org.apache.druid.timeline.DataSegment segmentTemplate, org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.util.Progressable progressable, File mergedBase, org.apache.hadoop.fs.Path finalIndexZipFilePath, org.apache.hadoop.fs.Path tmpPath, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher) throws IOException
- Throws:
IOException
-
writeSegmentDescriptor
public static void writeSegmentDescriptor(org.apache.hadoop.fs.FileSystem outputFS, DataSegmentAndIndexZipFilePath segmentAndPath, org.apache.hadoop.fs.Path descriptorPath, org.apache.hadoop.util.Progressable progressable) throws IOException
- Throws:
IOException
-
zipAndCopyDir
public static long zipAndCopyDir(File baseDir, OutputStream baseOutputStream, org.apache.hadoop.util.Progressable progressable) throws IOException
- Throws:
IOException
-
copyFileToZipStream
public static long copyFileToZipStream(File file, ZipOutputStream zipOutputStream, org.apache.hadoop.util.Progressable progressable) throws IOException
- Throws:
IOException
-
makeFileNamePath
public static org.apache.hadoop.fs.Path makeFileNamePath(org.apache.hadoop.fs.Path basePath, org.apache.hadoop.fs.FileSystem fs, org.apache.druid.timeline.DataSegment segmentTemplate, String baseFileName, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
-
makeTmpPath
public static org.apache.hadoop.fs.Path makeTmpPath(org.apache.hadoop.fs.Path basePath, org.apache.hadoop.fs.FileSystem fs, org.apache.druid.timeline.DataSegment segmentTemplate, org.apache.hadoop.mapreduce.TaskAttemptID taskAttemptID, org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
-
renameIndexFilesForSegments
public static void renameIndexFilesForSegments(HadoopIngestionSpec indexerSchema, List<DataSegmentAndIndexZipFilePath> segmentAndIndexZipFilePaths) throws IOException
Renames the index files for the segments. This works around some limitations of both FileContext (no s3n support) and NativeS3FileSystem.rename, which will not overwrite. Note: segments should be renamed in the index task, not in a Hadoop job, as race conditions between job retries can cause the final segment index file path to get clobbered.
- Parameters:
indexerSchema - the Hadoop ingestion spec
segmentAndIndexZipFilePaths - the list of segments with their currently stored tmp path and the final path that they should be renamed to
- Throws:
IOException
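The overwrite limitation described above (NativeS3FileSystem.rename will not replace an existing destination) can be worked around with an explicitly overwriting move. A minimal local-filesystem sketch, with `java.nio.file` standing in for the Hadoop FileSystem API; this is not the Druid implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem sketch of an overwriting rename: move a segment's
// tmp index file onto its final path, clobbering any stale copy left
// by an earlier job attempt. Hypothetical helper for illustration.
public class OverwritingRename {
    public static void rename(Path tmpPath, Path finalPath) throws IOException {
        // REPLACE_EXISTING provides the overwrite semantics that a
        // plain rename on some filesystems does not.
        Files.move(tmpPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }
}
```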
-
prependFSIfNullScheme
public static org.apache.hadoop.fs.Path prependFSIfNullScheme(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
-
unzipNoGuava
public static long unzipNoGuava(org.apache.hadoop.fs.Path zip, org.apache.hadoop.conf.Configuration configuration, File outDir, org.apache.hadoop.util.Progressable progressable, @Nullable org.apache.hadoop.io.retry.RetryPolicy retryPolicy) throws IOException
- Throws:
IOException
-
getURIFromSegment
public static URI getURIFromSegment(org.apache.druid.timeline.DataSegment dataSegment)
-
getJobTrackerAddress
public static String getJobTrackerAddress(org.apache.hadoop.conf.Configuration config)
-
-