case class Metadata(mode: Option[Mode] = None, format: Option[Format] = None, encoding: Option[String] = None, multiline: Option[Boolean] = None, array: Option[Boolean] = None, withHeader: Option[Boolean] = None, separator: Option[String] = None, quote: Option[String] = None, escape: Option[String] = None, write: Option[WriteMode] = None, partition: Option[Partition] = None, sink: Option[Sink] = None, ignore: Option[String] = None, clustering: Option[Seq[String]] = None, xml: Option[Map[String, String]] = None) extends Product with Serializable

Specifies schema properties. These properties may be set at the schema level or at the domain level. Any property not set at the schema level is taken from the domain level; if it is not set there either, the default value is returned.

mode

: FILE mode by default. FILE and STREAM are the two accepted values. FILE is currently the only supported mode.

format

: DSV by default. Supported file formats are:

  • DSV : Delimiter-separated values file. The delimiter is specified in the "separator" field.
  • POSITION : Fixed-width file where each value is located at an exact position in each line.
  • SIMPLE_JSON : JSON files with top-level fields only. For optimisation purposes, JSON with top-level fields is handled separately from JSON with nested fields.
  • JSON : Deep JSON file. Use only when your JSON documents contain subdocuments; otherwise prefer SIMPLE_JSON, since it is much faster.
  • XML : XML files.

encoding

: UTF-8 if not specified.

multiline

: Whether JSON objects span a single line or multiple lines. false (single line) by default; single-line input is also faster to parse.

array

: Whether the JSON is stored as a single array of objects. false by default, which means that by default there is one JSON document per line.

withHeader

: Whether the dataset has a header. true by default.

separator

: The value delimiter, ';' by default. The value may be a multi-character string starting from Spark 3.

quote

: The string quote character, '"' by default.

escape

: The escape character, '\' by default.

write

: Write mode, APPEND by default

partition

: Partition columns, no partitioning by default

sink

: Whether the dataset should be indexed in Elasticsearch after ingestion.

ignore

: Pattern matching the lines to ignore, or UDF to apply to decide which lines to ignore.

clustering

: List of attributes to use for clustering

xml

: com.databricks.spark.xml options to use (e.g. rowTag)
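
As a rough illustration of how the optional fields and their documented defaults fit together, the sketch below uses a hypothetical stand-in class, MetadataSketch, that keeps only a few of the String, Boolean and Map fields listed above. The getter bodies are an assumption based on the defaults described in this section; the real Metadata class may resolve them differently, and the rowTag value is made up.

```scala
// Hypothetical, trimmed-down stand-in for Metadata (not the real class).
// Only fields with simple types are kept so the sketch is self-contained.
final case class MetadataSketch(
  encoding: Option[String] = None,
  withHeader: Option[Boolean] = None,
  separator: Option[String] = None,
  quote: Option[String] = None,
  escape: Option[String] = None,
  xml: Option[Map[String, String]] = None
) {
  // Each getter falls back to the default documented above (assumed behaviour).
  def getEncoding(): String   = encoding.getOrElse("UTF-8")
  def isWithHeader(): Boolean = withHeader.getOrElse(true)
  def getSeparator(): String  = separator.getOrElse(";")
  def getQuote(): String      = quote.getOrElse("\"")
  def getEscape(): String     = escape.getOrElse("\\")
}

object MetadataSketchExample {
  // Pipe-delimited file without a header line; every other field is left unset.
  val pipeDelimited: MetadataSketch =
    MetadataSketch(separator = Some("|"), withHeader = Some(false))

  // XML input: reader options are passed through as a plain map.
  val xmlInput: MetadataSketch =
    MetadataSketch(xml = Some(Map("rowTag" -> "record")))

  def main(args: Array[String]): Unit = {
    println(pipeDelimited.getSeparator())
    println(pipeDelimited.getEncoding())
    println(xmlInput.xml)
  }
}
```

Named arguments keep each declaration limited to the properties that differ from the defaults, which is the usual reason for modelling every field as an Option with a None default.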

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new Metadata(mode: Option[Mode] = None, format: Option[Format] = None, encoding: Option[String] = None, multiline: Option[Boolean] = None, array: Option[Boolean] = None, withHeader: Option[Boolean] = None, separator: Option[String] = None, quote: Option[String] = None, escape: Option[String] = None, write: Option[WriteMode] = None, partition: Option[Partition] = None, sink: Option[Sink] = None, ignore: Option[String] = None, clustering: Option[Seq[String]] = None, xml: Option[Map[String, String]] = None)


Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val array: Option[Boolean]
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def checkValidity(schemaHandler: SchemaHandler): Either[List[String], Boolean]
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. val clustering: Option[Seq[String]]
  9. val encoding: Option[String]
  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. val escape: Option[String]
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. val format: Option[Format]
  14. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def getEncoding(): String
  16. def getEscape(): String
  17. def getFormat(): Format
  18. def getMode(): Mode
  19. def getMultiline(): Boolean
  20. def getPartitionAttributes(): List[String]
    Annotations
    @JsonIgnore()
  21. def getQuote(): String
  22. def getSamplingStrategy(): Double
    Annotations
    @JsonIgnore()
  23. def getSeparator(): String
  24. def getSink(): Option[Sink]
  25. def getWrite(): WriteMode
  26. val ignore: Option[String]
  27. def import(child: Metadata): Metadata

    Merge this metadata with its child. Any property defined at the child level overrides the one defined at this level. This allows a schema to override the domain metadata attributes. Applied to a domain-level metadata.

    child

    : Schema level metadata

    returns

    The metadata resulting from merging the schema and the domain metadata.

  28. def isArray(): Boolean
  29. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  30. def isWithHeader(): Boolean
  31. def merge[T](parent: Option[T], child: Option[T]): Option[T]

    Merge a single attribute.

    parent

    : Domain level metadata attribute

    child

    : Schema level metadata attribute

    returns

    The schema-level attribute if it is defined, the domain-level attribute otherwise.

    Attributes
    protected
  32. val mode: Option[Mode]
  33. val multiline: Option[Boolean]
  34. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  35. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  36. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  37. val partition: Option[Partition]
  38. val quote: Option[String]
  39. val separator: Option[String]
  40. val sink: Option[Sink]
  41. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  42. def toString(): String
    Definition Classes
    Metadata → AnyRef → Any
  43. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  46. val withHeader: Option[Boolean]
  47. val write: Option[WriteMode]
  48. val xml: Option[Map[String, String]]
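
The merge and import members listed above describe a child-overrides-parent rule. A minimal sketch of that rule follows, using a hypothetical two-field stand-in (MiniMetadata) instead of the real class. The method bodies are assumptions drawn from the member descriptions, not the library's source.

```scala
// Hypothetical two-field stand-in for Metadata (not the real class).
final case class MiniMetadata(
  separator: Option[String] = None,
  withHeader: Option[Boolean] = None
) {
  // Merge a single attribute: the child (schema) value wins when it is defined,
  // otherwise the parent (domain) value is kept.
  protected def merge[T](parent: Option[T], child: Option[T]): Option[T] =
    child.orElse(parent)

  // `import` is a reserved word in Scala, so the method name is back-quoted.
  // Called on the domain-level metadata, with the schema-level metadata as child.
  def `import`(child: MiniMetadata): MiniMetadata =
    MiniMetadata(
      separator = merge(this.separator, child.separator),
      withHeader = merge(this.withHeader, child.withHeader)
    )
}

object MiniMetadataExample {
  // Domain-level settings shared by every schema of the domain.
  val domain: MiniMetadata =
    MiniMetadata(separator = Some(";"), withHeader = Some(true))

  // Schema-level settings: only the separator is overridden.
  val schema: MiniMetadata = MiniMetadata(separator = Some("|"))

  val merged: MiniMetadata = domain.`import`(schema)

  def main(args: Array[String]): Unit =
    println(merged)
}
```

In this sketch, merged keeps the separator set by the schema and inherits withHeader from the domain, since the schema leaves it unset.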
