com.whylogs.spark

WhyProfileSession

case class WhyProfileSession(dataFrame: DataFrame, name: String, timeColumn: String = null, groupByColumns: Seq[String] = List(), modelProfile: ModelProfileSession = null) extends Product with Serializable

A class that enables easy access to the profiling API.

dataFrame

the dataframe to profile

name

the name of the dataset

timeColumn

the time column, if the data is to be broken down by time

groupByColumns

the columns to group by

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new WhyProfileSession(dataFrame: DataFrame, name: String, timeColumn: String = null, groupByColumns: Seq[String] = List(), modelProfile: ModelProfileSession = null)

    dataFrame

    the dataframe to profile

    name

    the name of the dataset

    timeColumn

    the time column, if the data is to be broken down by time

    groupByColumns

    the columns to group by
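
The constructor signature above can be exercised with a small sketch like the following. It assumes Spark and the whylogs Spark package are on the classpath and runs against a local Spark session; the object name, the toy data, and the column names (`region`, `amount`) are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import com.whylogs.spark.WhyProfileSession

object CreateSessionSketch {
  // Hypothetical helper: a session for a dataset named "payments",
  // grouped by the (assumed) "region" column via the named parameter.
  def groupedSession(df: DataFrame): WhyProfileSession =
    WhyProfileSession(df, "payments", groupByColumns = Seq("region"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("whylogs-session-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data, invented for this example.
    val df = Seq(("us", 1.5), ("eu", 2.0)).toDF("region", "amount")

    // Only dataFrame and name are required; the other parameters have defaults.
    val plain = WhyProfileSession(df, "payments")
    val grouped = groupedSession(df)

    // Case class parameters are readable as fields.
    println(plain.name)
    println(grouped.groupByColumns)

    spark.stop()
  }
}
```

Because `WhyProfileSession` is a case class, it can be built without `new`, and the optional parameters can be supplied by name in any order.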

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val PROFILE_FIELD: String
  5. def aggProfiles(timestamp: Instant = Instant.now()): DataFrame

    Run aggregation and build profile based on the specification of this session

    timestamp

    the session timestamp for the whole run (often the current time, or the start of the batch run)

    returns

    a DataFrame with aggregated profiles under the 'why_profile' column

  6. def aggProfiles(timestamp: Long): DataFrame

    Run aggregation and build profile based on the specification of this session

    timestamp

    the session timestamp for the whole run

    returns

    a DataFrame with aggregated profiles under the 'why_profile' column

  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  11. def groupBy(columns: List[String]): WhyProfileSession

    A Java-friendly API. This is used by the Py4J gateway to pass data into the JVM.

    columns

    list of columns for grouping

    returns

    a new WhyProfileSession object

  12. def groupBy(col1: String, cols: String*): WhyProfileSession
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. def log(orgId: String, modelId: String, apiKey: String): Unit
  15. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  17. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  18. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. def withModelProfile(predictionField: String, targetField: String, scoreField: String): WhyProfileSession
  23. def withTimeColumn(timeColumn: String): WhyProfileSession

    Set the column for grouping by time. This column must be of Timestamp type in Spark SQL.

    Note that WhyLogs uses this column to group data together, so make sure you truncate the data to the appropriate level of precision (e.g. daily, hourly) before calling this. Only a column name is accepted at the moment. You can alias a raw Column to a column name with withColumn(colName: String, col: Column).

    timeColumn

    the column that contains the timestamp.
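
The members listed above (`withTimeColumn`, `groupBy`, `aggProfiles`) can be combined roughly as in the sketch below. It is an illustration under stated assumptions, not the library's canonical usage: it assumes Spark and the whylogs Spark package are on the classpath, uses a local Spark session, and invents the data and the column names `event_time`, `event_day`, `region`, and `amount`. The timestamp column is truncated to day precision with Spark's `date_trunc` before being passed to `withTimeColumn`, as the note on that method asks.

```scala
import java.time.Instant
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_trunc, to_timestamp}
import com.whylogs.spark.WhyProfileSession

object DailyProfilesSketch {
  // Hypothetical helper: add a Timestamp column truncated to day precision.
  def withDayColumn(df: DataFrame, source: String, target: String): DataFrame =
    df.withColumn(target, date_trunc("day", col(source)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("whylogs-daily-profiles-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data, invented for this example; event_time is parsed to Timestamp.
    val raw = Seq(
      ("2021-03-01 09:15:00", "us", 1.5),
      ("2021-03-01 17:40:00", "eu", 2.0),
      ("2021-03-02 08:05:00", "us", 3.25)
    ).toDF("event_time", "region", "amount")
      .withColumn("event_time", to_timestamp(col("event_time")))

    val daily = withDayColumn(raw, "event_time", "event_day")

    // Each builder-style call returns a new WhyProfileSession.
    val session = WhyProfileSession(daily, "payments")
      .withTimeColumn("event_day")
      .groupBy("region")

    // Use the start of the batch run as the session timestamp.
    val batchStart = Instant.now()
    val profiles = session.aggProfiles(batchStart)

    // Inspect the result; the profiles are documented to be under 'why_profile'.
    profiles.printSchema()

    spark.stop()
  }
}
```

Passing an explicit `Instant` keeps the choice between the two `aggProfiles` overloads unambiguous and makes the session timestamp the same for every group in the run.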

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
    Deprecated

    see corresponding Javadoc for more information.
