class XmlIngestionJob extends IngestionJob
Main class for ingesting XML files. If your JSON contains only single-level simple attributes (that is, DSV-like data in JSON format), use SIMPLE_JSON instead; it is much faster.
Linear Supertypes: IngestionJob, SparkJob, JobBase, StrictLogging, AnyRef, Any
Instance Constructors
-
new
XmlIngestionJob(domain: Domain, schema: Schema, types: List[Type], path: List[Path], storageHandler: StorageHandler, schemaHandler: SchemaHandler, options: Map[String, String])(implicit settings: Settings)
- domain
: Input Dataset Domain
- schema
: Input Dataset Schema
- types
: List of globally defined types
- path
: Input dataset path
- storageHandler
: Storage Handler
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
analyze(fullTableName: String): Any
- Attributes
- protected
- Definition Classes
- SparkJob
-
def
applyIgnore(dfIn: DataFrame): Dataset[Row]
- Attributes
- protected
- Definition Classes
- IngestionJob
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
createSparkViews(views: Views, sqlParameters: Map[String, String]): Unit
- Attributes
- protected
- Definition Classes
- SparkJob
-
val
domain: Domain
- Definition Classes
- XmlIngestionJob → IngestionJob
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
lazy val
extension: String
- Definition Classes
- IngestionJob
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
val
flatRowValidator: GenericRowValidator
- Attributes
- protected
- Definition Classes
- IngestionJob
-
lazy val
format: String
- Definition Classes
- IngestionJob
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getWriteMode(): WriteMode
- Definition Classes
- IngestionJob
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
ingest(dataset: DataFrame): (RDD[_], RDD[_])
Where the magic happens
- dataset
input dataset as a DataFrame
- Attributes
- protected
- Definition Classes
- XmlIngestionJob → IngestionJob
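The element types of the pair returned by ingest are not documented here. Given the saveRejected and saveAccepted members of this class, one plausible reading is a split between rejected and accepted records. The following is a minimal sketch of such a split on plain Scala collections rather than RDDs; the object name, the validation rule, and the ordering of the pair are assumptions made for illustration, not the library's implementation.

```scala
import scala.util.Try

object IngestSplitSketch {
  // Hypothetical validation rule: a record is accepted when it parses as an integer.
  def validate(record: String): Either[String, Int] =
    Try(record.trim.toInt).toOption.toRight(s"not an integer: '$record'")

  // Returns (rejected, accepted); this ordering is an assumption of the sketch.
  def split(records: List[String]): (List[String], List[Int]) = {
    val validated = records.map(validate)
    val rejected  = validated.collect { case Left(error) => error }
    val accepted  = validated.collect { case Right(value) => value }
    (rejected, accepted)
  }
}
```

Keeping the rejected side as strings mirrors the RDD[String] parameter of saveRejected, while the accepted side carries typed values.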
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
loadDataSet(): Try[DataFrame]
Load the XML input as a DataFrame
- returns
Spark DataFrame loaded using metadata options
- Attributes
- protected
- Definition Classes
- XmlIngestionJob → IngestionJob
-
val
logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
-
lazy val
metadata: Metadata
Merged metadata
- Definition Classes
- IngestionJob
-
def
name: String
- Definition Classes
- XmlIngestionJob → JobBase
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
now: Timestamp
- Definition Classes
- IngestionJob
-
val
options: Map[String, String]
- Definition Classes
- XmlIngestionJob → IngestionJob
-
def
parseViewDefinition(valueWithEnv: String): (SinkType, Option[JdbcConfigName], String)
- valueWithEnv
in the form [SinkType:[configName:]]viewName
- returns
(SinkType, configName, viewName)
- Attributes
- protected
- Definition Classes
- JobBase
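The form [SinkType:[configName:]]viewName can be read as up to two optional colon-separated prefixes in front of the view name. The sketch below illustrates that reading with plain strings; it is not the library's parseViewDefinition. The object name is hypothetical, and representing a missing sink type as None is an assumption, since the default sink type is not stated here.

```scala
object ViewDefinitionSketch {
  // Simplified stand-in: sink type and config name are plain strings,
  // not the library's SinkType and JdbcConfigName.
  def parse(value: String): (Option[String], Option[String], String) =
    value.split(":", 3) match {
      case Array(sink, config, view) => (Some(sink), Some(config), view)
      case Array(sink, view)         => (Some(sink), None, view)
      case _                         => (None, None, value)
    }
}
```

Limiting the split to three parts means any further colons stay inside the view name rather than being dropped.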
-
def
partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame
- Attributes
- protected
- Definition Classes
- SparkJob
-
def
partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]
Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:
- comet_date
- comet_year
- comet_month
- comet_day
- comet_hour
- comet_minute
These columns are renamed to "date", "year", "month", "day", "hour", "minute" in the dataset and their values are set to the current date/time.
- dataset
: Input dataset
- partition
: list of columns to use for partitioning.
- returns
A DataFrameWriter[Row] for the dataset, partitioned by the given columns
- Attributes
- protected
- Definition Classes
- SparkJob
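The renaming of the reserved ingestion-time columns can be sketched as a simple lookup. This is an illustration on plain column names only: the object and helper names are hypothetical, and the sketch does not model how the date/time values themselves are computed or written.

```scala
object PartitionColumnsSketch {
  // Reserved ingestion-time column names from the description above,
  // mapped to the names they take in the written dataset.
  val reserved: Map[String, String] = Map(
    "comet_date"   -> "date",
    "comet_year"   -> "year",
    "comet_month"  -> "month",
    "comet_day"    -> "day",
    "comet_hour"   -> "hour",
    "comet_minute" -> "minute"
  )

  // Hypothetical helper: resolve the output name of each partition column.
  // Columns that are not reserved keep their original name.
  def outputColumns(partition: List[String]): List[String] =
    partition.map(column => reserved.getOrElse(column, column))
}
```

For example, a partition list of comet_year, comet_month and a regular dataset column resolves to year, month and that column unchanged.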
-
val
path: List[Path]
- Definition Classes
- XmlIngestionJob → IngestionJob
-
def
registerUdf(udf: String): Unit
- Attributes
- protected
- Definition Classes
- SparkJob
-
def
reorderAttributes(dataFrame: DataFrame): List[Attribute]
- Definition Classes
- IngestionJob
-
def
run(): Try[JobResult]
Main entry point as required by the Spark Job interface
- returns
: Job result wrapped in a Try
- Definition Classes
- IngestionJob → JobBase
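Because run() returns a Try, a caller has to handle both the success and the failure case explicitly. The sketch below shows one way to do that with pattern matching; JobResultSketch is a hypothetical stand-in for the library's JobResult, whose fields are not described here.

```scala
import scala.util.{Failure, Success, Try}

object RunResultSketch {
  // Hypothetical stand-in for the library's JobResult type.
  final case class JobResultSketch(description: String)

  // Turn the outcome of a run into a one-line status message.
  def describe(result: Try[JobResultSketch]): String = result match {
    case Success(jobResult) => s"ingestion succeeded: ${jobResult.description}"
    case Failure(error)     => s"ingestion failed: ${error.getMessage}"
  }
}
```

Matching on Success and Failure keeps the error path visible at the call site instead of relying on an exception propagating out of the job.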
-
def
saveAccepted(dataframe: DataFrame): (DataFrame, Path)
Merge the new and existing datasets if required, then save using overwrite or append mode.
- Attributes
- protected
- Definition Classes
- IngestionJob
-
def
saveRejected(rejectedRDD: RDD[String]): Try[Path]
- Attributes
- protected
- Definition Classes
- IngestionJob
-
val
schema: Schema
- Definition Classes
- XmlIngestionJob → IngestionJob
-
val
schemaHandler: SchemaHandler
- Definition Classes
- XmlIngestionJob → IngestionJob
-
lazy val
schemaSparkType: StructType
-
lazy val
session: SparkSession
- Definition Classes
- SparkJob
-
implicit
val
settings: Settings
- Definition Classes
- XmlIngestionJob → JobBase
-
lazy val
sparkEnv: SparkEnv
- Definition Classes
- SparkJob
-
val
storageHandler: StorageHandler
- Definition Classes
- XmlIngestionJob → IngestionJob
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
val
treeRowValidator: GenericRowValidator
- Attributes
- protected
- Definition Classes
- IngestionJob
-
val
types: List[Type]
- Definition Classes
- XmlIngestionJob → IngestionJob
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()