com.ebiznext.comet.workflow

IngestionWorkflow

class IngestionWorkflow extends StrictLogging

The whole workflow works as follows:

  • loadLanding : zipped files are uncompressed, or raw files are extracted, from the local filesystem.
  • loadPending : files recognized by their filename patterns are stored in the ingesting area and submitted for ingestion; files with unrecognized filename patterns are stored in the unresolved area.
  • ingest : files are finally ingested and saved as parquet/orc/... files and hive tables.
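
The movement of a file between areas in the first two stages can be pictured as a small state model. This is only an illustrative sketch: `WorkflowSketch`, `Area` and the two transition functions are hypothetical names, not part of the library, and the model ignores details such as ack files.

```scala
object WorkflowSketch {
  // Areas named in the workflow description (hypothetical model, not the library's types).
  sealed trait Area
  case object Landing extends Area
  case object Pending extends Area
  case object Ingesting extends Area
  case object Unresolved extends Area

  // loadLanding: files leave the landing area for the pending area.
  def afterLoadLanding(area: Area): Area = area match {
    case Landing => Pending
    case other   => other
  }

  // loadPending: a pending file goes to the ingesting area when its filename
  // pattern is recognized, and to the unresolved area otherwise.
  def afterLoadPending(area: Area, patternRecognized: Boolean): Area = area match {
    case Pending => if (patternRecognized) Ingesting else Unresolved
    case other   => other
  }
}
```

Files that reach the ingesting area are the ones the ingest stage then saves as parquet/orc/... files and hive tables.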
Linear Supertypes
StrictLogging, AnyRef, Any

Instance Constructors

  1. new IngestionWorkflow(storageHandler: StorageHandler, schemaHandler: SchemaHandler, launchHandler: LaunchHandler)(implicit settings: Settings)

    storageHandler

    : Minimum set of features required for the underlying filesystem

    schemaHandler

    : Schema interface

    launchHandler

    : Cron Manager interface

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def atlas(config: AtlasConfig): Boolean
  6. def autoJob(config: TransformConfig): Boolean

    Successively run each task of a job

    config

    : job name as defined in the YML file and sql parameters to pass to SQL statements.

  7. def bqload(config: BigQueryLoadConfig, maybeSchema: Option[Schema] = None): Try[JobResult]
  8. def buildTasks(jobName: String, jobOptions: Map[String, String]): Seq[AutoTaskJob]
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  10. val domains: List[Domain]
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  13. def esLoad(config: ESLoadConfig): Try[JobResult]
  14. def esload(action: AutoTaskJob): Boolean
  15. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  18. def infer(config: InferSchemaConfig): Try[Unit]
  19. def ingest(domain: Domain, schema: Schema, ingestingPath: List[Path], options: Map[String, String]): Try[JobResult]
  20. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  21. def jdbcload(config: ConnectionLoadConfig): Try[JobResult]
  22. def load(config: LoadConfig): Boolean

    Ingest the file (called by the cron manager at ingestion time for a specific dataset).

  23. def loadLanding(): Unit

    Move the files from the landing area to the pending area.

    Files are loaded one domain at a time. Each domain has its own directory, specified in the "directory" key of the domain YML file. Compressed files are uncompressed if a corresponding ack file exists; they are recognized by their extension, which should be one of .tgz, .zip, .gz. Raw files should also have a corresponding ack file. Before the files are moved to the pending area, the ack files are deleted. To import files without an ack file, specify an empty "ack" key (i.e. ack:"") in the domain YML file. "ack" is the default ack extension searched for, but you may specify a different one in the domain YML file.
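
    The ack rule described above can be sketched in plain Scala. This is a minimal illustration, not the library's implementation: `LandingSketch`, `readyFiles` and `baseName` are hypothetical names, and the sketch assumes that an ack file shares the data file's base name (for example sales.zip and sales.ack), which the documentation does not spell out.

```scala
object LandingSketch {
  // Extensions the documentation lists for compressed files.
  val compressedExtensions: Seq[String] = Seq(".tgz", ".zip", ".gz")

  def isCompressed(fileName: String): Boolean =
    compressedExtensions.exists(ext => fileName.endsWith(ext))

  // Strip the last extension, e.g. "sales.zip" becomes "sales".
  def baseName(fileName: String): String = {
    val dot = fileName.lastIndexOf('.')
    if (dot > 0) fileName.substring(0, dot) else fileName
  }

  // Hypothetical helper: names of the files that may leave the landing area.
  // ackExt is the ack extension without the dot; an empty value disables the check.
  // Assumption: the ack file is named <base name>.<ackExt>.
  def readyFiles(landing: Set[String], ackExt: String): Set[String] =
    if (ackExt.isEmpty) landing
    else {
      val suffix = "." + ackExt
      val (acks, data) = landing.partition(_.endsWith(suffix))
      val ackedNames = acks.map(_.stripSuffix(suffix))
      data.filter(f => ackedNames.contains(baseName(f)))
    }
}
```

    With a landing directory holding sales.zip, sales.ack, orders.csv, orders.ack and clients.csv, `LandingSketch.readyFiles(files, "ack")` keeps sales.zip and orders.csv and leaves clients.csv behind because it has no ack file.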

  24. def loadPending(config: WatchConfig = WatchConfig()): Boolean

    Split files into resolved and unresolved datasets.

    A file is unresolved if no corresponding schema is found. Schema matching is based on the dataset filename pattern.

    config

    : includes : load pending datasets of these domains only
      excludes : do not load datasets of these domains
      If both lists are empty, all domains are included.
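
    The domain filter and the pattern-based split can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `PendingSketch`, `SchemaPattern`, `isDomainSelected` and `resolve` are hypothetical names, and giving `includes` precedence when both lists are non-empty is an assumption the documentation does not confirm.

```scala
import java.util.regex.Pattern

object PendingSketch {
  // Hypothetical stand-in for a schema: a name plus the filename pattern it accepts.
  final case class SchemaPattern(name: String, pattern: Pattern)

  // Both lists empty: every domain is selected.
  // Assumption: when includes is non-empty it takes precedence over excludes.
  def isDomainSelected(domain: String, includes: Seq[String], excludes: Seq[String]): Boolean =
    if (includes.nonEmpty) includes.contains(domain)
    else !excludes.contains(domain)

  // Split file names into (resolved, unresolved). A file is resolved when the
  // whole name matches the pattern of some schema; the first match wins.
  def resolve(
      files: Seq[String],
      schemas: Seq[SchemaPattern]
  ): (Seq[(String, String)], Seq[String]) = {
    val matched = files.map { f =>
      f -> schemas.find(_.pattern.matcher(f).matches()).map(_.name)
    }
    val resolved = matched.collect { case (f, Some(schema)) => (f, schema) }
    val unresolved = matched.collect { case (f, None) => f }
    (resolved, unresolved)
  }
}
```

    For example, with a single schema named sales whose pattern is `sales-.*\.csv`, the file sales-2020.csv is resolved to that schema while unknown.txt ends up in the unresolved list.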

  25. val logger: Logger
    Attributes
    protected
    Definition Classes
    StrictLogging
  26. def metric(cliConfig: MetricsConfig): Try[JobResult]

    Runs the metrics job

    cliConfig

    : Client's configuration for metrics computing

  27. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  28. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  30. def setNullableStateOfColumn(df: DataFrame, nullable: Boolean): DataFrame

    Set nullable property of column.

    df

    : source DataFrame

    nullable

    : flag to set, making the column either nullable or not

  31. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  32. def toString(): String
    Definition Classes
    AnyRef → Any
  33. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
