Class JobHelper


  • public class JobHelper
    extends Object
    • Constructor Detail

      • JobHelper

        public JobHelper()
    • Method Detail

      • distributedClassPath

        public static org.apache.hadoop.fs.Path distributedClassPath​(String path)
      • distributedClassPath

        public static org.apache.hadoop.fs.Path distributedClassPath​(org.apache.hadoop.fs.Path base)
      • authenticate

        public static void authenticate()
        Authenticates against a secured Hadoop cluster. If a bug is fixed here, make sure to apply the same fix to HdfsStorageAuthentication#authenticate as well.
      • setupClasspath

        public static void setupClasspath​(org.apache.hadoop.fs.Path distributedClassPath,
                                          org.apache.hadoop.fs.Path intermediateClassPath,
                                          org.apache.hadoop.mapreduce.Job job)
                                   throws IOException
        Uploads jar files to HDFS and configures the job classpath. SNAPSHOT jar files are uploaded to the intermediate classpath and are not shared across jobs; non-SNAPSHOT jar files are uploaded to the distributed classpath and shared across multiple jobs.
        Parameters:
        distributedClassPath - classpath shared across multiple jobs
        intermediateClassPath - classpath exclusive to this job; used to upload SNAPSHOT jar files
        job - job to run
        Throws:
        IOException
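        The SNAPSHOT/non-SNAPSHOT routing described above can be sketched as a simple filename check. This is a hypothetical standalone helper for illustration, not part of JobHelper's API:

```java
public class ClasspathSketch {
    // Hypothetical helper mirroring the routing described above:
    // SNAPSHOT jars go to the per-job intermediate classpath,
    // release jars go to the shared distributed classpath.
    static String targetClasspath(String jarFileName,
                                  String distributedClassPath,
                                  String intermediateClassPath) {
        return jarFileName.contains("SNAPSHOT")
                ? intermediateClassPath
                : distributedClassPath;
    }

    public static void main(String[] args) {
        System.out.println(targetClasspath("druid-indexing-1.0-SNAPSHOT.jar",
                "/shared/classpath", "/job/intermediate"));  // /job/intermediate
        System.out.println(targetClasspath("guava-16.0.1.jar",
                "/shared/classpath", "/job/intermediate"));  // /shared/classpath
    }
}
```

        Sharing release jars across jobs avoids re-uploading stable dependencies, while per-job intermediate paths keep mutable SNAPSHOT builds from leaking between jobs.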
      • shouldRetryPredicate

        public static com.google.common.base.Predicate<Throwable> shouldRetryPredicate()
      • injectDruidProperties

        public static void injectDruidProperties​(org.apache.hadoop.conf.Configuration configuration,
                                                 HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
      • injectSystemProperties

        public static org.apache.hadoop.conf.Configuration injectSystemProperties​(org.apache.hadoop.conf.Configuration conf,
                                                                                  HadoopDruidIndexerConfig hadoopDruidIndexerConfig)
      • writeJobIdToFile

        public static void writeJobIdToFile​(String hadoopJobIdFileName,
                                            String hadoopJobId)
      • runSingleJob

        public static boolean runSingleJob​(org.apache.druid.indexer.Jobby job)
      • runJobs

        public static boolean runJobs​(List<org.apache.druid.indexer.Jobby> jobs)
      • maybeDeleteIntermediatePath

        public static void maybeDeleteIntermediatePath​(boolean jobSucceeded,
                                                       HadoopIngestionSpec indexerSchema)
      • serializeOutIndex

        public static DataSegmentAndIndexZipFilePath serializeOutIndex​(org.apache.druid.timeline.DataSegment segmentTemplate,
                                                                       org.apache.hadoop.conf.Configuration configuration,
                                                                       org.apache.hadoop.util.Progressable progressable,
                                                                       File mergedBase,
                                                                       org.apache.hadoop.fs.Path finalIndexZipFilePath,
                                                                       org.apache.hadoop.fs.Path tmpPath,
                                                                       org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
                                                                throws IOException
        Throws:
        IOException
      • writeSegmentDescriptor

        public static void writeSegmentDescriptor​(org.apache.hadoop.fs.FileSystem outputFS,
                                                  DataSegmentAndIndexZipFilePath segmentAndPath,
                                                  org.apache.hadoop.fs.Path descriptorPath,
                                                  org.apache.hadoop.util.Progressable progressable)
                                           throws IOException
        Throws:
        IOException
      • zipAndCopyDir

        public static long zipAndCopyDir​(File baseDir,
                                         OutputStream baseOutputStream,
                                         org.apache.hadoop.util.Progressable progressable)
                                  throws IOException
        Throws:
        IOException
      • makeFileNamePath

        public static org.apache.hadoop.fs.Path makeFileNamePath​(org.apache.hadoop.fs.Path basePath,
                                                                 org.apache.hadoop.fs.FileSystem fs,
                                                                 org.apache.druid.timeline.DataSegment segmentTemplate,
                                                                 String baseFileName,
                                                                 org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
      • makeTmpPath

        public static org.apache.hadoop.fs.Path makeTmpPath​(org.apache.hadoop.fs.Path basePath,
                                                            org.apache.hadoop.fs.FileSystem fs,
                                                            org.apache.druid.timeline.DataSegment segmentTemplate,
                                                            org.apache.hadoop.mapreduce.TaskAttemptID taskAttemptID,
                                                            org.apache.druid.segment.loading.DataSegmentPusher dataSegmentPusher)
      • renameIndexFilesForSegments

        public static void renameIndexFilesForSegments​(HadoopIngestionSpec indexerSchema,
                                                       List<DataSegmentAndIndexZipFilePath> segmentAndIndexZipFilePaths)
                                                throws IOException
        Renames the index files for the segments. This works around limitations of both FileContext (no s3n support) and NativeS3FileSystem.rename, which will not overwrite an existing file. Note: segments should be renamed in the index task, not in a Hadoop job, because race conditions between job retries can cause the final segment index file path to get clobbered.
        Parameters:
        indexerSchema - the hadoop ingestion spec
        segmentAndIndexZipFilePaths - the list of segments, each with its current tmp path and the final path it should be renamed to
        Throws:
        IOException
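        The overwrite problem this method works around can be illustrated on the local filesystem with java.nio.file, where replacing an existing target must be requested explicitly. This is a self-contained sketch of the general rename-with-overwrite idea, not the HDFS code path used by JobHelper:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameSketch {
    // Sketch only: a local-filesystem analogue of the overwrite issue.
    // Without REPLACE_EXISTING, Files.move fails when the target exists,
    // much like NativeS3FileSystem.rename refuses to overwrite.
    static void renameOverwriting(Path tmpIndexZip, Path finalIndexZip) throws IOException {
        Files.move(tmpIndexZip, finalIndexZip, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path tmp = Files.write(dir.resolve("index.zip.tmp"), new byte[]{1});
        Path fin = Files.write(dir.resolve("index.zip"), new byte[]{2});  // pre-existing target
        renameOverwriting(tmp, fin);
        System.out.println(Files.readAllBytes(fin)[0]);  // 1: tmp content replaced the target
    }
}
```

        Because retried jobs may race to produce the same final path, doing this rename once in the index task keeps a late retry from clobbering an already-published index file.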
      • prependFSIfNullScheme

        public static org.apache.hadoop.fs.Path prependFSIfNullScheme​(org.apache.hadoop.fs.FileSystem fs,
                                                                      org.apache.hadoop.fs.Path path)
      • unzipNoGuava

        public static long unzipNoGuava​(org.apache.hadoop.fs.Path zip,
                                        org.apache.hadoop.conf.Configuration configuration,
                                        File outDir,
                                        org.apache.hadoop.util.Progressable progressable,
                                        @Nullable
                                        org.apache.hadoop.io.retry.RetryPolicy retryPolicy)
                                 throws IOException
        Throws:
        IOException
      • getURIFromSegment

        public static URI getURIFromSegment​(org.apache.druid.timeline.DataSegment dataSegment)
      • getJobTrackerAddress

        public static String getJobTrackerAddress​(org.apache.hadoop.conf.Configuration config)