
class Parquet2CSV extends SparkJob

Convert Parquet files to CSV. The folder hierarchy should be in the form /input_folder/domain/schema/part*.parquet. Once converted, the CSV files are written to /output_folder/domain/schema.csv. When the specified number of partitions is 1, /output_folder/domain/schema.csv is a single file containing the data; otherwise, it is a folder containing the part*.csv files. When output_folder is not specified, input_folder is used as the base output folder.
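The path mapping described above can be sketched as follows. This is a minimal illustration, assuming the input is addressed by its schema folder (/input_folder/domain/schema); the object Parquet2CsvPaths and its methods csvTarget and isSingleFile are hypothetical helpers, not part of the Parquet2CSV API.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helpers illustrating the documented folder layout; not part of Parquet2CSV.
object Parquet2CsvPaths {

  /** Maps a schema folder /input_folder/domain/schema to /output_folder/domain/schema.csv.
    * When outputFolder is None, inputFolder is used as the base output folder.
    */
  def csvTarget(inputFolder: Path, schemaFolder: Path, outputFolder: Option[Path]): Path = {
    // "domain/schema", relative to the input folder
    val relative = inputFolder.relativize(schemaFolder)
    require(relative.getNameCount == 2, s"expected <input_folder>/domain/schema, got $schemaFolder")
    val schema = relative.getFileName.toString
    // <base>/domain/schema -> <base>/domain/schema.csv
    outputFolder.getOrElse(inputFolder).resolve(relative).resolveSibling(s"$schema.csv")
  }

  /** With a single partition the target is one CSV file; otherwise it is a folder of part*.csv files. */
  def isSingleFile(partitions: Int): Boolean = partitions == 1
}
```

For example, with input_folder /data/in and no output_folder, the schema folder /data/in/sales/orders maps to /data/in/sales/orders.csv.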

Linear Supertypes
SparkJob, JobBase, StrictLogging, AnyRef, Any

Instance Constructors

  1. new Parquet2CSV(config: Parquet2CSVConfig, storageHandler: StorageHandler)(implicit settings: Settings)

Type Members

  1. type JdbcConfigName = String
    Definition Classes
    JobBase

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def analyze(fullTableName: String): Any
    Attributes
    protected
    Definition Classes
    SparkJob
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  7. def createSparkViews(views: Views, sqlParameters: Map[String, String]): Unit
    Attributes
    protected
    Definition Classes
    SparkJob
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. val logger: Logger
    Attributes
    protected
    Definition Classes
    StrictLogging
  15. def name: String
    Definition Classes
    Parquet2CSV → JobBase
  16. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  17. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. def parseViewDefinition(valueWithEnv: String): (SinkType, Option[JdbcConfigName], String)

    valueWithEnv

    in the form [SinkType:[configName:]]viewName

    returns

    (SinkType, configName, viewName)

    Attributes
    protected
    Definition Classes
    JobBase
  20. def partitionDataset(dataset: DataFrame, partition: List[String]): DataFrame
    Attributes
    protected
    Definition Classes
    SparkJob
  21. def partitionedDatasetWriter(dataset: DataFrame, partition: List[String]): DataFrameWriter[Row]

    Partition a dataset using dataset columns. To partition the dataset using the ingestion time, use the reserved column names:

    • comet_date
    • comet_year
    • comet_month
    • comet_day
    • comet_hour
    • comet_minute

    These columns are renamed to "date", "year", "month", "day", "hour" and "minute" in the dataset, and their values are set to the current date/time.

    dataset

    : Input dataset

    partition

    : list of columns to use for partitioning.

    returns

    A DataFrameWriter for the dataset, partitioned using the given columns

    Attributes
    protected
    Definition Classes
    SparkJob
  22. def registerUdf(udf: String): Unit
    Attributes
    protected
    Definition Classes
    SparkJob
  23. def run(): Try[JobResult]

    Forces every job to implement its entry point in the "run" method.

    returns

    : a Spark DataFrame for Spark jobs, None otherwise

    Definition Classes
    Parquet2CSV → JobBase
  24. lazy val session: SparkSession
    Definition Classes
    SparkJob
  25. implicit val settings: Settings
    Definition Classes
    Parquet2CSV → JobBase
  26. lazy val sparkEnv: SparkEnv
    Definition Classes
    SparkJob
  27. val storageHandler: StorageHandler
  28. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  29. def toString(): String
    Definition Classes
    AnyRef → Any
  30. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
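The view-definition format documented for parseViewDefinition, [SinkType:[configName:]]viewName, can be illustrated with a small parser. This is a sketch under stated assumptions, not the JobBase implementation: SinkType is modelled as a plain String, and a missing sink type is returned as None because the default the library applies is not documented here. The object name ViewDefinitionSketch and the sample values "JDBC" and "pg" are illustrative only.

```scala
// Hypothetical sketch of the documented format [SinkType:[configName:]]viewName;
// not the JobBase implementation. SinkType is modelled as a plain String here.
object ViewDefinitionSketch {

  /** Returns (sinkType, configName, viewName); an absent sink type is reported as None. */
  def parse(valueWithEnv: String): (Option[String], Option[String], String) =
    // Split into at most 3 parts so that only the first two ':' act as separators.
    valueWithEnv.split(":", 3).toList match {
      case sinkType :: configName :: viewName :: Nil => (Some(sinkType), Some(configName), viewName)
      case sinkType :: viewName :: Nil               => (Some(sinkType), None, viewName)
      case viewName :: Nil                           => (None, None, viewName)
      case _ => throw new IllegalArgumentException(s"invalid view definition: $valueWithEnv")
    }
}
```

Splitting with a limit of 3 keeps the parser total: any text after the second separator is treated as part of the view name.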

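The reserved ingestion-time columns accepted by partitionedDatasetWriter can be illustrated without Spark. The sketch below models only the renaming rule and the values taken from the current date/time; the object name CometPartitionSketch and the value formatting (ISO-8601 yyyy-MM-dd for "date", unpadded integers otherwise) are assumptions, not the library's documented behavior.

```scala
import java.time.LocalDateTime

// Hypothetical sketch of the comet_* renaming rule; the real method operates on a Spark DataFrame.
object CometPartitionSketch {

  // Column name used in the dataset, and how its value is derived from the ingestion time.
  private final case class Reserved(column: String, valueAt: LocalDateTime => String)

  // Assumption: "date" is formatted as ISO-8601 (yyyy-MM-dd), the other values as plain integers.
  private val reserved: Map[String, Reserved] = Map(
    "comet_date"   -> Reserved("date", t => t.toLocalDate.toString),
    "comet_year"   -> Reserved("year", t => t.getYear.toString),
    "comet_month"  -> Reserved("month", t => t.getMonthValue.toString),
    "comet_day"    -> Reserved("day", t => t.getDayOfMonth.toString),
    "comet_hour"   -> Reserved("hour", t => t.getHour.toString),
    "comet_minute" -> Reserved("minute", t => t.getMinute.toString)
  )

  /** Renamed column and value for every reserved name in `partition`; regular columns are skipped. */
  def ingestionColumns(partition: List[String], now: LocalDateTime): List[(String, String)] =
    partition
      .flatMap(name => reserved.get(name).toList)
      .map(r => r.column -> r.valueAt(now))
}
```

Regular dataset columns in the partition list are left to the normal partitioning logic, which is why the sketch simply skips them.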