class FolderCompaction extends Logging

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new FolderCompaction()

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. def compact(inputModel: RawModel, outputModel: RawModel, partitions: Map[String, List[String]], numPartitions: Int, spark: SparkSession): Unit

    See compact(conf:Config,spark:SparkSession)

    inputModel

    the input model to read

    outputModel

    the output model to write

    partitions

    the partition columns to compact and the values that are part of them

    numPartitions

    number of output partitions

    spark

    the Spark session
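
    A minimal usage sketch of this overload (the model instances, partition values, and SparkSession are hypothetical and assumed to already exist):

```scala
// Hypothetical sketch: compact selected partitions of a raw dataset.
// rawInputModel, rawOutputModel and spark are assumed to be in scope.
val compaction = new FolderCompaction()

// Partition columns to compact and, for each, the values to include.
val partitions = Map(
  "date"    -> List("2021-01-01", "2021-01-02"),
  "country" -> List("IT", "US")
)

compaction.compact(
  inputModel    = rawInputModel,   // RawModel describing the partitioned input
  outputModel   = rawOutputModel,  // RawModel describing the compacted output
  partitions    = partitions,
  numPartitions = 4,               // number of output partitions per combination
  spark         = spark
)
```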

  7. def compact(conf: Config, spark: SparkSession): Unit

    Receives the following conf:

    {
      "inputModel" : "name of the input model" | ModelConf,
      "outputModel" : "name of the output model" | ModelConf,
      "partitions" : {
        "String1" : ["value1", "value2"],
        "String2" : ["value1", "value2"]
      },
      "numPartitions" : "integer"
    }

    ModelConf has the following structure:

    {
      "name": "a name you like",
      "uri": "URI of the dataset; basePath if the dataset is partitioned",
      "schema": "Spark JSON representation of the schema",
      "timed": true/false,
      "options": {
        "saveMode": "Spark save mode",
        "format": "Spark data format",
        "extraOptions": {
          // extra options passed to the Spark reader/writer
          "key": "value"
        },
        "partitionBy": [
          // partition columns
          "partitionColumn1",
          "partitionColumn2",
          ...,
          "partitionColumnN"
        ]
      }
    }

    The function retrieves or builds the two indicated models, checks that they are file-based, and checks that the indicated columns are partition columns. If these requirements are met, it generates every combination of the given column values and, for each combination, reads the matching data, repartitions it to "numPartitions" partitions, writes it to the outputModel, and deletes the files that were read.

    conf

    the Config

    spark

    the Spark session
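
    As a hypothetical example, a conf of the shape above, referencing the models by name (so they are retrieved rather than built from a ModelConf), might look like the following; all model names and values are illustrative:

```
{
  "inputModel"  : "telemetry_raw",
  "outputModel" : "telemetry_compacted",
  "partitions"  : {
    "date"    : ["2021-01-01", "2021-01-02"],
    "country" : ["IT", "US"]
  },
  "numPartitions" : "4"
}
```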

  8. def delete(spark: SparkSession, rootDir: String, dfs: List[DataFrame]): Unit

    Deletes all the files that have been read and, afterwards, any folders left empty

    spark

    the Spark session

    rootDir

    the root dir all dataframes originated from

    dfs

    Dataframes that have been read and filtered

  9. def deleteEmptyPartitionFolders(fs: FileSystem, dataframeRoot: Path, deletedFiles: List[Path]): List[Path]

    fs

    the filesystem to be used for deletion

    dataframeRoot

    the root path of the dataset

    deletedFiles

    the files that have been just deleted from the fs

    returns

    the paths that have been deleted

  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  12. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  13. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. final def isParent(child: Path, parentToFind: Path): Boolean
    Annotations
    @tailrec()
  16. val logger: WaspLogger
    Attributes
    protected
    Definition Classes
    Logging
  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  20. def read(spark: SparkSession, inputModel: RawModel, partitions: Map[String, List[String]], whereConditions: List[WhereCondition]): (List[DataFrame], List[Path])

    spark

    the Spark session

    inputModel

    the input model to read

    partitions

    the partition columns to compact and the values that are part of them

    whereConditions

    the where conditions to filter the Dataframe read from the inputModel

    returns

    the list of DataFrames read, the list of files read from the inputModel

  21. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  22. def toString(): String
    Definition Classes
    AnyRef → Any
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  25. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. def write(writer: RawSparkBatchWriter, dataframes: List[DataFrame]): Unit

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
    Deprecated

    (Since version ) see corresponding Javadoc for more information.
