org.apache.spark.sql.catalyst.optimizer

StarSchemaDetection

object StarSchemaDetection extends PredicateHelper with SQLConfHelper

Encapsulates star-schema detection logic.

Linear Supertypes
SQLConfHelper, PredicateHelper, Logging, AliasHelper, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def buildBalancedPredicate(expressions: Seq[Expression], op: (Expression, Expression) => Expression): Expression

    Builds a balanced output predicate bottom up, applying the binary operator op pair by pair to the input predicates in expressions, recursively, until a single predicate remains. Examples:
    - expressions = [a, b, c, d], op = And returns (a And b) And (c And d)
    - expressions = [a, b, c, d, e, f], op = And returns ((a And b) And (c And d)) And (e And f)

    Attributes
    protected
    Definition Classes
    PredicateHelper
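    The pair-by-pair construction can be sketched in Scala as follows. This is a minimal illustration, not Spark's implementation: Expr, Leaf and And are a hypothetical toy ADT standing in for Catalyst's Expression, and buildBalanced and render are illustrative names. Each round combines neighbours with op and carries an odd trailing predicate over unchanged, which reproduces the shapes in the examples above (render adds one outer pair of parentheses).

```scala
// Hypothetical toy predicate tree standing in for Catalyst's Expression.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

// Combines predicates pair by pair, bottom up, until one expression remains.
// Each round halves the list; an odd trailing predicate is carried over unchanged.
def buildBalanced(exprs: Seq[Expr], op: (Expr, Expr) => Expr): Expr = {
  require(exprs.nonEmpty, "at least one predicate is required")
  if (exprs.size == 1) exprs.head
  else buildBalanced(exprs.grouped(2).map(_.reduce(op)).toList, op)
}

// Renders the tree with explicit parentheses so its shape is visible.
def render(e: Expr): String = e match {
  case Leaf(n)   => n
  case And(l, r) => s"(${render(l)} And ${render(r)})"
}

val andOp: (Expr, Expr) => Expr = (l, r) => And(l, r)

println(render(buildBalanced(Seq("a", "b", "c", "d").map(Leaf(_)), andOp)))
// ((a And b) And (c And d))
println(render(buildBalanced(Seq("a", "b", "c", "d", "e", "f").map(Leaf(_)), andOp)))
// (((a And b) And (c And d)) And (e And f))
```

    Compared with a left-deep fold such as expressions.reduce(op), the balanced tree has depth ceil(log2 n) instead of n - 1.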
  6. def canEvaluate(expr: Expression, plan: LogicalPlan): Boolean

    Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.

    For example, consider a join between two relations R(a, b) and S(c, d).

    - canEvaluate(EqualTo(a,b), R) returns true
    - canEvaluate(EqualTo(a,c), R) returns false
    - canEvaluate(Literal(1), R) returns true, as literals CAN be evaluated on any plan

    Attributes
    protected
    Definition Classes
    PredicateHelper
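    A minimal sketch of the rule in the examples above, assuming the check reduces to "every attribute referenced by expr appears in the output of plan". Expr, Col, Literal, EqualTo and Relation are hypothetical toy types, not Catalyst classes.

```scala
// Hypothetical toy model: an expression is reduced to the column names it references,
// and a plan to the column names it outputs.
sealed trait Expr { def references: Set[String] }
final case class Col(name: String) extends Expr { def references: Set[String] = Set(name) }
final case class Literal(value: Int) extends Expr { def references: Set[String] = Set.empty }
final case class EqualTo(left: Expr, right: Expr) extends Expr {
  def references: Set[String] = left.references ++ right.references
}
final case class Relation(name: String, output: Set[String])

// Evaluable when every referenced column is produced by the plan.
// A literal references nothing, and the empty set is a subset of any output.
def canEvaluate(expr: Expr, plan: Relation): Boolean =
  expr.references.subsetOf(plan.output)

val r = Relation("R", Set("a", "b"))
val a = Col("a")
val b = Col("b")
val c = Col("c")

println(canEvaluate(EqualTo(a, b), r)) // true
println(canEvaluate(EqualTo(a, c), r)) // false
println(canEvaluate(Literal(1), r))    // true
```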
  7. def canEvaluateWithinJoin(expr: Expression): Boolean

    Returns true iff expr could be evaluated as a condition within a join.

    Attributes
    protected
    Definition Classes
    PredicateHelper
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  9. def conf: SQLConf

    The active config object within the current scope. See SQLConf.get for more information.

    Definition Classes
    SQLConfHelper
  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  12. def extractPredicatesWithinOutputSet(condition: Expression, outputSet: AttributeSet): Option[Expression]

    Returns a filter whose references are a subset of outputSet and that retains as many of the constraints in condition as possible. This is used for predicate pushdown. When there is no such filter, None is returned.

    Attributes
    protected
    Definition Classes
    PredicateHelper
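    One way to compute such a filter is to recurse over And/Or. The Scala sketch below is an illustration under stated assumptions, not Spark's code: Pred, Atom, And and Or are hypothetical toy types, and extractWithin is an illustrative name. The toy ADT has no Not node; any predicate other than And/Or is treated as opaque and kept only when all of its references are available.

```scala
// Hypothetical toy predicate tree: each atom references one column.
sealed trait Pred { def references: Set[String] }
final case class Atom(col: String) extends Pred { def references: Set[String] = Set(col) }
final case class And(l: Pred, r: Pred) extends Pred {
  def references: Set[String] = l.references ++ r.references
}
final case class Or(l: Pred, r: Pred) extends Pred {
  def references: Set[String] = l.references ++ r.references
}

// Keeps as many constraints as possible while referencing only outputSet:
//  - And: each side is implied by the whole, so keep whichever sides survive.
//  - Or: the weaker disjunction is implied only if BOTH sides survive.
//  - Anything else: keep it only if all of its references are available.
def extractWithin(cond: Pred, outputSet: Set[String]): Option[Pred] = cond match {
  case And(l, r) =>
    (extractWithin(l, outputSet), extractWithin(r, outputSet)) match {
      case (Some(x), Some(y)) => Some(And(x, y))
      case (Some(x), None)    => Some(x)
      case (None, Some(y))    => Some(y)
      case (None, None)       => None
    }
  case Or(l, r) =>
    for {
      x <- extractWithin(l, outputSet)
      y <- extractWithin(r, outputSet)
    } yield Or(x, y)
  case other =>
    if (other.references.subsetOf(outputSet)) Some(other) else None
}

println(extractWithin(Or(And(Atom("a"), Atom("b")), And(Atom("c"), Atom("d"))), Set("a", "c")))
// Some(Or(Atom(a),Atom(c)))
```

    With outputSet = {a, c}, the condition (a And b) Or (c And d) yields a Or c, which the original condition implies; a Or x yields None because neither disjunct can be dropped.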
  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  14. def findExpressionAndTrackLineageDown(exp: Expression, plan: LogicalPlan): Option[(Expression, LogicalPlan)]

    Finds where the input references of expression exp were scanned in the tree of plan, and whether they all originate from a single leaf node. Returns an optional tuple of the expression, with any projections and aliasing applied along the way from plan to the origin undone, and the origin LeafNode plan from which all of exp's references originate.

    Definition Classes
    PredicateHelper
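    The lineage walk can be sketched over a hypothetical toy plan tree; Scan, Project, Join, replaceAlias and trackDown are illustrative stand-ins, not Catalyst operators or Spark's code. The sketch assumes a Project maps each output name to its defining expression and that other operators only pass columns through. It descends into a child only if that child produces every referenced column, and returns None when the references are split across children.

```scala
// Hypothetical toy expressions and plans.
sealed trait Expr { def refs: Set[String] }
final case class Col(name: String) extends Expr { def refs: Set[String] = Set(name) }
final case class Add(l: Expr, r: Expr) extends Expr { def refs: Set[String] = l.refs ++ r.refs }

sealed trait Plan { def output: Set[String]; def children: Seq[Plan] }
final case class Scan(table: String, output: Set[String]) extends Plan {
  def children: Seq[Plan] = Nil
}
// Each output name maps to the expression that defines it (an alias or a pass-through).
final case class Project(projectList: Map[String, Expr], child: Plan) extends Plan {
  def output: Set[String] = projectList.keySet
  def children: Seq[Plan] = Seq(child)
}
final case class Join(left: Plan, right: Plan) extends Plan {
  def output: Set[String] = left.output ++ right.output
  def children: Seq[Plan] = Seq(left, right)
}

// Substitutes aliased columns with their defining expressions.
def replaceAlias(e: Expr, aliases: Map[String, Expr]): Expr = e match {
  case Col(n)    => aliases.getOrElse(n, e)
  case Add(l, r) => Add(replaceAlias(l, aliases), replaceAlias(r, aliases))
}

// Walks down the plan, undoing projections, until all references land in one leaf.
def trackDown(e: Expr, plan: Plan): Option[(Expr, Plan)] =
  if (e.refs.isEmpty) None
  else plan match {
    case Project(list, child) => trackDown(replaceAlias(e, list), child)
    case s: Scan if e.refs.subsetOf(s.output) => Some((e, s))
    case other =>
      // Only descend into a child that produces every referenced column.
      other.children
        .filter(c => e.refs.subsetOf(c.output))
        .flatMap(c => trackDown(e, c))
        .headOption
  }

val fact = Scan("sales", Set("s_amt", "s_item"))
val dim  = Scan("item", Set("i_id", "i_price"))
val plan = Project(Map("amount" -> Col("s_amt"), "price" -> Col("i_price")), Join(fact, dim))

println(trackDown(Col("amount"), plan))                  // traced to s_amt in the sales scan
println(trackDown(Add(Col("amount"), Col("price")), plan)) // None: references span two leaves
```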
  15. def findStarJoins(input: Seq[LogicalPlan], conditions: Seq[Expression]): Seq[LogicalPlan]

    Star schema consists of one or more fact tables referencing a number of dimension tables. In general, star-schema joins are detected using the following conditions:

    1. Informational RI constraints (reliable detection)
       + Dimension contains a primary key that is being joined to the fact table.
       + Fact table contains foreign keys referencing multiple dimension tables.
    2. Cardinality based heuristics
       + Usually, the table with the highest cardinality is the fact table.
       + The table joined with the largest number of tables is the fact table.

    To detect star joins, the algorithm uses a combination of the above two conditions. The fact table is chosen based on the cardinality heuristics, and the dimension tables are chosen based on the RI constraints. A star join will consist of the largest fact table joined with the dimension tables on their primary keys. To detect that a column is a primary key, the algorithm uses table and column statistics.

    The algorithm currently returns only the star join with the largest fact table. Choosing the largest fact table on the driving arm to avoid large inners is in general a good heuristic. This restriction will be lifted to observe multiple star joins.

    The highlights of the algorithm are the following:

    Given a set of joined tables/plans, the algorithm first verifies whether they are eligible for star join detection. An eligible plan is a base table access with valid statistics. A base table access consists of Project or Filter operators above a LeafNode. Conservatively, the algorithm only considers base table accesses as part of a star join, since they provide reliable statistics. This restriction can be lifted once CBO is enabled by default.

    If some of the plans are not base table accesses, or statistics are not available, the algorithm returns an empty star join plan since, in the absence of statistics, it cannot make good planning decisions. Otherwise, the algorithm finds the table with the largest cardinality (number of rows), which is assumed to be the fact table.

    Next, it computes the set of dimension tables for the current fact table. A dimension table is assumed to be in an RI relationship with the fact table. To infer column uniqueness, the algorithm compares the number of distinct values with the total number of rows in the table. If their relative difference is within a set limit (ndvMaxError * 2, a bound adjusted based on 1TB TPC-DS data), the column is assumed to be unique.
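    The fact-table choice and the uniqueness test can be sketched in Scala under simplifying assumptions. TableStats, EquiJoin, isLikelyUnique and findStarJoin are hypothetical names, not Spark APIs; the relative difference is assumed to be |ndv / rowCount - 1|; ndvMaxError = 0.05 is only an illustrative value; and the eligibility checks on base table accesses are omitted.

```scala
// Hypothetical per-table statistics (not Spark's Statistics/ColumnStat classes).
final case class TableStats(name: String, rowCount: Long, distinctCounts: Map[String, Long])
// Hypothetical equi-join condition: leftTable.leftCol = rightTable.rightCol.
final case class EquiJoin(leftTable: String, leftCol: String, rightTable: String, rightCol: String)

// Assumed uniqueness test: |ndv / rows - 1| <= ndvMaxError * 2.
def isLikelyUnique(t: TableStats, col: String, ndvMaxError: Double): Boolean =
  t.distinctCounts.get(col) match {
    case Some(ndv) if t.rowCount > 0 =>
      math.abs(ndv.toDouble / t.rowCount.toDouble - 1.0) <= ndvMaxError * 2
    case _ => false
  }

// Picks the largest table as the fact table, then keeps the tables it joins with on a
// likely-unique column. Returns the fact table followed by its dimensions, or Nil.
def findStarJoin(tables: Seq[TableStats], joins: Seq[EquiJoin], ndvMaxError: Double): Seq[String] =
  if (tables.size < 2) Nil
  else {
    val fact = tables.maxBy(_.rowCount)
    val byName = tables.map(t => t.name -> t).toMap
    // For each join touching the fact table, the other side is a dimension candidate.
    val candidates = joins.flatMap { j =>
      if (j.leftTable == fact.name) Some((j.rightTable, j.rightCol))
      else if (j.rightTable == fact.name) Some((j.leftTable, j.leftCol))
      else None
    }
    val dims = candidates.collect {
      case (dim, col) if byName.get(dim).exists(isLikelyUnique(_, col, ndvMaxError)) => dim
    }.distinct
    if (dims.isEmpty) Nil else fact.name +: dims
  }

val tables = Seq(
  TableStats("sales", 1000000L, Map.empty),
  TableStats("date", 365L, Map("d_id" -> 365L)),
  TableStats("item", 1000L, Map("i_id" -> 990L)),
  TableStats("customer", 5000L, Map("c_id" -> 4000L))
)
val joins = Seq(
  EquiJoin("sales", "s_date", "date", "d_id"),
  EquiJoin("item", "i_id", "sales", "s_item"),
  EquiJoin("sales", "s_cust", "customer", "c_id")
)

// sales is the fact table; date and item qualify as dimensions, customer does not.
println(findStarJoin(tables, joins, ndvMaxError = 0.05))
```

    With these numbers, customer is rejected because 4000 distinct keys over 5000 rows gives a relative difference of 0.2, above the 0.1 limit, while item (990 over 1000, difference 0.01) passes.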

  16. def getAliasMap(exprs: Seq[NamedExpression]): AttributeMap[Alias]
    Attributes
    protected
    Definition Classes
    AliasHelper
  17. def getAliasMap(plan: Aggregate): AttributeMap[Alias]
    Attributes
    protected
    Definition Classes
    AliasHelper
  18. def getAliasMap(plan: Project): AttributeMap[Alias]
    Attributes
    protected
    Definition Classes
    AliasHelper
  19. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  22. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. def isLikelySelective(e: Expression): Boolean

    Returns whether an expression is likely to be selective.

    Definition Classes
    PredicateHelper
  25. def isNullIntolerant(expr: Expression): Boolean
    Attributes
    protected
    Definition Classes
    PredicateHelper
  26. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  27. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  28. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  35. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  40. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  41. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  42. def outputWithNullability(output: Seq[Attribute], nonNullAttrExprIds: Seq[ExprId]): Seq[Attribute]
    Attributes
    protected
    Definition Classes
    PredicateHelper
  43. def reorderStarJoins(input: Seq[(LogicalPlan, InnerLike)], conditions: Seq[Expression]): Seq[(LogicalPlan, InnerLike)]

    Reorders a star join based on heuristics. It is called from ReorderJoin if CBO is disabled:
    1) Finds the star join with the largest fact table.
    2) Places the fact table on the driving arm of the left-deep tree. This plan avoids large table access on the inner side, and thus favors hash joins.
    3) Applies the most selective dimensions early in the plan to reduce the amount of data flowing through it.
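    Steps 2 and 3 can be sketched as follows, assuming each member of the star join carries a hypothetical selectivity estimate (the fraction of its rows surviving local predicates, smaller meaning more selective). StarMember, reorderStar and leftDeep are illustrative names, not Spark APIs, and the sketch starts from an already detected star join.

```scala
// Hypothetical star-join member with an estimated selectivity (smaller = more selective).
final case class StarMember(name: String, isFact: Boolean, selectivity: Double)

// Puts the fact table on the driving (leftmost) arm, then orders dimensions
// from most to least selective.
def reorderStar(members: Seq[StarMember]): Seq[String] = {
  val (facts, dims) = members.partition(_.isFact)
  require(facts.size == 1, "this sketch assumes exactly one fact table")
  facts.head.name +: dims.sortBy(_.selectivity).map(_.name)
}

// Renders an order as a left-deep tree: ((fact JOIN d1) JOIN d2) ...
def leftDeep(order: Seq[String]): String =
  order.tail.foldLeft(order.head)((acc, dim) => s"($acc JOIN $dim)")

val star = Seq(
  StarMember("item", isFact = false, selectivity = 0.5),
  StarMember("sales", isFact = true, selectivity = 1.0),
  StarMember("date", isFact = false, selectivity = 0.01)
)

println(leftDeep(reorderStar(star))) // ((sales JOIN date) JOIN item)
```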

  44. def replaceAlias(expr: Expression, aliasMap: AttributeMap[Alias]): Expression

    Replaces all attributes that reference an alias with the aliased expression.

    Attributes
    protected
    Definition Classes
    AliasHelper
  45. def replaceAliasButKeepName(expr: NamedExpression, aliasMap: AttributeMap[Alias]): NamedExpression

    Replaces all attributes that reference an alias with the aliased expression, but keeps the name of the outermost attribute.

    Attributes
    protected
    Definition Classes
    AliasHelper
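    A toy version of the two alias helpers above, assuming aliases are looked up by attribute name (Catalyst resolves them by expression ID through an AttributeMap). Expr, Attr, Lit, Add and Alias here are hypothetical stand-ins for the Catalyst classes, not the real ones.

```scala
// Hypothetical toy expression tree.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Add(l: Expr, r: Expr) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

// Replaces every attribute that names an alias with the aliased expression.
def replaceAlias(e: Expr, aliases: Map[String, Alias]): Expr = e match {
  case Attr(n)     => aliases.get(n).map(_.child).getOrElse(e)
  case Add(l, r)   => Add(replaceAlias(l, aliases), replaceAlias(r, aliases))
  case Alias(c, n) => Alias(replaceAlias(c, aliases), n)
  case lit: Lit    => lit
}

// Same, but a top-level attribute keeps its own name by wrapping the replacement in an Alias.
def replaceAliasButKeepName(e: Expr, aliases: Map[String, Alias]): Expr = e match {
  case a @ Attr(n) => aliases.get(n).map(al => Alias(al.child, n)).getOrElse(a)
  case other       => replaceAlias(other, aliases)
}

val aliases = Map("total" -> Alias(Add(Attr("price"), Attr("tax")), "total"))

println(replaceAlias(Add(Attr("total"), Lit(1)), aliases))
println(replaceAliasButKeepName(Attr("total"), aliases))
```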
  46. def splitConjunctivePredicates(condition: Expression): Seq[Expression]
    Attributes
    protected
    Definition Classes
    PredicateHelper
  47. def splitDisjunctivePredicates(condition: Expression): Seq[Expression]
    Attributes
    protected
    Definition Classes
    PredicateHelper
  48. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  49. def toString(): String
    Definition Classes
    AnyRef → Any
  50. def trimAliases(e: Expression): Expression
    Attributes
    protected
    Definition Classes
    AliasHelper
  51. def trimNonTopLevelAliases[T <: Expression](e: T): T
    Attributes
    protected
    Definition Classes
    AliasHelper
  52. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  53. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  54. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
