class IngestionWorkflow extends StrictLogging
The whole workflow works as follows:
- loadLanding : Zipped files are uncompressed, or raw files are extracted, from the local filesystem.
- loadPending : Files whose names match a known filename pattern are stored in the ingesting area and submitted for ingestion. Files with unrecognized filename patterns are stored in the unresolved area.
- ingest : Files are finally ingested and saved as parquet/orc/... files and Hive tables.
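The routing step of loadPending described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `SchemaPattern`, `Route`, `Ingesting`, `Unresolved` and `PendingRouter` are hypothetical names, and matching the whole filename against a regular expression is an assumption about how "recognized with filename patterns" is decided.

```scala
import java.util.regex.Pattern

// Hypothetical stand-in for a schema and its filename pattern
// (not the library's actual Domain/Schema types).
final case class SchemaPattern(schemaName: String, pattern: Pattern)

// Where a pending file ends up after pattern matching.
sealed trait Route
final case class Ingesting(fileName: String, schemaName: String) extends Route
final case class Unresolved(fileName: String) extends Route

object PendingRouter {
  // Pick the first schema whose pattern matches the whole filename;
  // a file that matches no pattern is routed to the unresolved area.
  def route(fileName: String, schemas: Seq[SchemaPattern]): Route =
    schemas
      .find(s => s.pattern.matcher(fileName).matches())
      .map(s => Ingesting(fileName, s.schemaName))
      .getOrElse(Unresolved(fileName))
}
```

With a single hypothetical pattern such as `orders-.*\.csv`, a file named `orders-2020.csv` would be routed to the ingesting area, while `unknown.txt` would land in the unresolved area.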
Instance Constructors
- new IngestionWorkflow(storageHandler: StorageHandler, schemaHandler: SchemaHandler, launchHandler: LaunchHandler)(implicit settings: Settings)
- storageHandler
: Minimum set of features required for the underlying filesystem
- schemaHandler
: Schema interface
- launchHandler
: Cron Manager interface
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def atlas(config: AtlasConfig): Boolean
- def autoJob(config: TransformConfig): Boolean
Successively run each task of a job.
- config
: job name as defined in the YML file and sql parameters to pass to SQL statements.
- def bqload(config: BigQueryLoadConfig, maybeSchema: Option[Schema] = None): Try[JobResult]
- def buildTasks(jobName: String, jobOptions: Map[String, String]): Seq[AutoTaskJob]
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- val domains: List[Domain]
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def esLoad(config: ESLoadConfig): Try[JobResult]
- def esload(action: AutoTaskJob): Boolean
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def infer(config: InferSchemaConfig): Try[Unit]
- def ingest(domain: Domain, schema: Schema, ingestingPath: List[Path], options: Map[String, String]): Try[JobResult]
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def jdbcload(config: ConnectionLoadConfig): Try[JobResult]
- def load(config: LoadConfig): Boolean
Ingest the file (called by the cron manager at ingestion time for a specific dataset).
- def loadLanding(): Unit
Move the files from the landing area to the pending area. Files are loaded one domain at a time. Each domain has its own directory, specified in the "directory" key of the domain YML file. Compressed files are uncompressed if a corresponding ack file exists. Compressed files are recognized by their extension, which should be one of .tgz, .zip, .gz. Raw files should also have a corresponding ack file. Before the files are moved to the pending area, the ack files are deleted. To import files without an ack file, specify an empty "ack" key (i.e. ack:"") in the domain YML file. "ack" is the default ack extension searched for, but you may specify a different one in the domain YML file.
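The ack-file rule above can be illustrated with a small, self-contained sketch. It is not the library's code: `LandingSketch` and its functions are hypothetical, and it assumes that the ack file for a data file is named after the data file with its last extension replaced by the ack extension (for example `sales.zip` and `sales.ack`), which the documentation does not state explicitly.

```scala
object LandingSketch {
  // Extensions the documentation lists for compressed files.
  val compressedExtensions: Seq[String] = Seq(".tgz", ".zip", ".gz")

  def isCompressed(fileName: String): Boolean =
    compressedExtensions.exists(ext => fileName.endsWith(ext))

  // Drop the last extension: "sales.zip" becomes "sales".
  def baseName(fileName: String): String = {
    val dot = fileName.lastIndexOf('.')
    if (dot > 0) fileName.substring(0, dot) else fileName
  }

  // Files that may be moved to the pending area.
  // With an empty ack extension every file qualifies; otherwise only
  // data files that have a matching ack file in the same listing do.
  def readyFiles(listing: Set[String], ackExt: String): Set[String] =
    if (ackExt.isEmpty) listing
    else {
      val suffix = "." + ackExt
      val (acks, data) = listing.partition(_.endsWith(suffix))
      val ackedBases = acks.map(_.stripSuffix(suffix))
      data.filter(f => ackedBases.contains(baseName(f)))
    }
}
```

Under these assumptions, a landing directory containing `sales.zip`, `sales.ack`, `clients.csv`, `clients.ack` and `orders.csv` would yield `sales.zip` and `clients.csv` as ready, while `orders.csv` would stay behind until its ack file arrives.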
- def loadPending(config: WatchConfig = WatchConfig()): Boolean
Split files into resolved and unresolved datasets. A file is unresolved if a corresponding schema is not found. Schema matching is based on the dataset filename pattern.
- config
: includes: load pending datasets of these domains only. excludes: do not load datasets of these domains. If both lists are empty, all domains are included.
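The includes/excludes semantics of the config parameter can be sketched as a plain filter over domain names. `DomainFilterSketch.selectDomains` is a hypothetical helper, not part of the library, and the behaviour when both lists are non-empty (apply includes first, then excludes) is an assumption, since the documentation only covers the case where both are empty.

```scala
object DomainFilterSketch {
  // Keep only the included domains (or all of them when `includes` is empty),
  // then drop any domain listed in `excludes`.
  def selectDomains(
      all: Seq[String],
      includes: Seq[String],
      excludes: Seq[String]
  ): Seq[String] = {
    val included =
      if (includes.isEmpty) all
      else all.filter(d => includes.contains(d))
    included.filterNot(d => excludes.contains(d))
  }
}
```

When both lists are empty the function returns every domain unchanged, which matches the documented default.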
- val logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
- def metric(cliConfig: MetricsConfig): Try[JobResult]
Runs the metrics job.
- cliConfig
: Client's configuration for metrics computing
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def setNullableStateOfColumn(df: DataFrame, nullable: Boolean): DataFrame
Set nullable property of column.
- df
: source DataFrame
- nullable
: the flag to set, such that the column is either nullable or not
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()