org.apache.spark.sql.execution.streaming.state
SymmetricHashJoinStateManager
Companion object SymmetricHashJoinStateManager
class SymmetricHashJoinStateManager extends Logging
Helper class to manage the state required by a single side of org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinExec. The interface of this class is basically that of a multi-map:
- Get: returns an iterator of multiple values for a given key
- Append: appends a new value to the given key
- Remove data by predicate: drops any state using a predicate condition on keys or values
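The multi-map contract above can be illustrated with a small, immutable sketch. `SimpleMultiMap` and its method names are hypothetical. Spark's class works on `UnsafeRow` keys and values and mutates state stores in place instead of returning new maps. The sketch only shows the Get / Append / Remove-by-predicate semantics.

```scala
// Hypothetical, simplified model of the multi-map contract: plain Scala keys
// and values stand in for UnsafeRow, and an immutable Map replaces the state stores.
final case class SimpleMultiMap[K, V](data: Map[K, Vector[V]] = Map.empty[K, Vector[V]]) {
  // Get: every value stored under the key (empty if the key is absent)
  def get(key: K): Iterator[V] = data.getOrElse(key, Vector.empty[V]).iterator

  // Append: add one more value under the key, keeping earlier values
  def append(key: K, value: V): SimpleMultiMap[K, V] =
    SimpleMultiMap(data.updated(key, data.getOrElse(key, Vector.empty[V]) :+ value))

  // Remove by predicate on keys: drop matching keys together with all their values
  def removeByKey(cond: K => Boolean): SimpleMultiMap[K, V] =
    SimpleMultiMap(data.filter { case (k, _) => !cond(k) })

  // Remove by predicate on values: drop matching values, then drop keys left empty
  def removeByValue(cond: V => Boolean): SimpleMultiMap[K, V] = {
    val kept = data.map { case (k, vs) => k -> vs.filterNot(cond) }
    SimpleMultiMap(kept.filter { case (_, vs) => vs.nonEmpty })
  }
}

val m = SimpleMultiMap[String, Int]().append("a", 1).append("a", 2).append("b", 3)
```

Returning a new map from every operation keeps the example pure. The real manager instead updates its underlying stores as a side effect.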
Instance Constructors
- new SymmetricHashJoinStateManager(joinSide: JoinSide, inputValueAttributes: Seq[Attribute], joinKeys: Seq[Expression], stateInfo: Option[StatefulOperatorStateInfo], storeConf: StateStoreConf, hadoopConf: Configuration, partitionId: Int, stateFormatVersion: Int)
- joinSide
Defines the join side
- inputValueAttributes
Attributes of the input row which will be stored as value
- joinKeys
Expressions to generate rows that will be used to key the value rows
- stateInfo
Information about how to retrieve the correct version of state
- storeConf
Configuration for the state store.
- hadoopConf
Hadoop configuration for reading state data from storage
- partitionId
The partition ID of the source RDD.
- stateFormatVersion
The version of the state format.
Internally, the key -> multiple values mapping is stored in two StateStores:
- Store 1 (KeyToNumValuesStore) maintains the mapping key -> number of values
- Store 2 (KeyWithIndexToValueStore) maintains a mapping that depends on the state format version:
  - version 1: [(key, index) -> value]
  - version 2: [(key, index) -> (value, matched)]
The stores are used as follows:
- Put: update the count in KeyToNumValuesStore, and insert the new (key, count) -> value in KeyWithIndexToValueStore
- Get: read the count n from KeyToNumValuesStore, then read each of the n values in KeyWithIndexToValueStore
- Remove state by predicate on keys: scan all keys in KeyToNumValuesStore to find keys that match the predicate, delete each such key from KeyToNumValuesStore, and delete its values from KeyWithIndexToValueStore
- Remove state by condition on values: scan all elements in KeyWithIndexToValueStore to find values that match the predicate; delete each corresponding (key, indexToDelete) from KeyWithIndexToValueStore by overwriting it with the value of (key, maxIndex) and removing (key, maxIndex); then decrement the corresponding number of values in KeyToNumValuesStore
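A minimal in-memory sketch of the two-store layout, covering only Put and Get and assuming state format version 2 (value plus matched flag). The class name `TwoStoreModel` is hypothetical. Using `mutable.Map` in place of StateStore and `String`/`Int` in place of `UnsafeRow` are simplifying assumptions, not Spark's API.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of the two-store layout (state format version 2).
// Plain mutable maps stand in for the real StateStores.
class TwoStoreModel {
  // Store 1 (KeyToNumValuesStore): key -> number of values
  val keyToNumValues = mutable.Map.empty[String, Long]
  // Store 2 (KeyWithIndexToValueStore): (key, index) -> (value, matched)
  val keyWithIndexToValue = mutable.Map.empty[(String, Long), (Int, Boolean)]

  // Put: the current count is the next free index; store the value there,
  // then bump the count.
  def put(key: String, value: Int, matched: Boolean = false): Unit = {
    val count = keyToNumValues.getOrElse(key, 0L)
    keyWithIndexToValue((key, count)) = (value, matched)
    keyToNumValues(key) = count + 1
  }

  // Get: read the count n, then look up indices 0 until n in store 2.
  def get(key: String): Iterator[Int] = {
    val count = keyToNumValues.getOrElse(key, 0L)
    (0L until count).iterator.map(i => keyWithIndexToValue((key, i))._1)
  }
}

val model = new TwoStoreModel
model.put("k", 10)
model.put("k", 20)
model.put("j", 30)
```

Because Get only walks indices `0 until n`, the indices for a key must stay dense. That is why removal by value fills the hole at `indexToDelete` with the entry at `maxIndex` rather than leaving a gap.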
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def abortIfNeeded(): Unit
Abort any changes to the state stores if needed
- def append(key: UnsafeRow, value: UnsafeRow, matched: Boolean): Unit
Append a new value to the key
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def commit(): Unit
Commit all the changes to all the state stores
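`commit` and `abortIfNeeded` act on all of the manager's stores together. The sketch below models that pairing with a hypothetical `FakeStore` and `TwoStoreLifecycle`. Two things here are our assumptions rather than something stated on this page: that "if needed" means "abort only stores that have not committed yet", and that callers use a try/finally pattern.

```scala
// Hypothetical stand-in for a state store: buffers writes until commit.
class FakeStore(val name: String) {
  private var pending = Map.empty[String, Int]
  var committedData = Map.empty[String, Int]
  var hasCommitted = false
  var aborted = false

  def put(k: String, v: Int): Unit = pending += (k -> v)
  def commit(): Unit = { committedData ++= pending; pending = Map.empty; hasCommitted = true }
  def abort(): Unit = { pending = Map.empty; aborted = true }
}

// Hypothetical manager that owns two stores and commits or aborts them together.
class TwoStoreLifecycle {
  val keyToNumValues = new FakeStore("keyToNumValues")
  val keyWithIndexToValue = new FakeStore("keyWithIndexToValue")
  private def stores = Seq(keyToNumValues, keyWithIndexToValue)

  def commit(): Unit = stores.foreach(_.commit())
  // Assumed semantics: roll back only the stores that have not committed.
  def abortIfNeeded(): Unit = stores.filterNot(_.hasCommitted).foreach(_.abort())
}

// Assumed calling pattern: commit on success, abortIfNeeded in finally.
val mgr = new TwoStoreLifecycle
try {
  mgr.keyToNumValues.put("k", 1)
  mgr.keyWithIndexToValue.put("k#0", 42)
  mgr.commit()
} finally {
  mgr.abortIfNeeded() // no-op here, because both stores committed
}
```

If a task fails before `commit()`, the same `finally` block discards the buffered writes in both stores, so the two stores never disagree about what was committed.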
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def get(key: UnsafeRow): Iterator[UnsafeRow]
Get all the values of a key
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getInternalRowOfKeyWithIndex(currentKey: UnsafeRow): InternalRow
Projects the key of the given UnsafeRow to an InternalRow, for use in printable log messages.
- def getJoinedRows(key: UnsafeRow, generateJoinedRow: (InternalRow) => JoinedRow, predicate: (JoinedRow) => Boolean, excludeRowsAlreadyMatched: Boolean = false): Iterator[JoinedRow]
Get all the values that satisfy the given join condition, marking each of them as matched. This method is designed to mark joined rows properly without exposing the internal index of each row.
- excludeRowsAlreadyMatched
Do not join with rows that were already matched previously. This is used only for the right side of a left semi join in StreamingSymmetricHashJoinExec.
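The following hedged sketch shows how joined rows can be produced and marked as matched without the caller ever seeing a row's index. `MatchTrackingState` is a hypothetical class, and `Int` values stand in for `UnsafeRow`. `generateJoinedRow`, `predicate`, and `excludeRowsAlreadyMatched` mirror the parameters above, but the body is an illustration, not Spark's code.

```scala
import scala.collection.mutable

// Hypothetical model of one join side's buffered rows. For each key, the
// position in the buffer plays the role of the hidden index, and each slot
// holds (value, matched).
class MatchTrackingState {
  private val rows = mutable.Map.empty[String, mutable.ArrayBuffer[(Int, Boolean)]]

  def append(key: String, value: Int, matched: Boolean = false): Unit =
    rows.getOrElseUpdate(key, mutable.ArrayBuffer.empty[(Int, Boolean)]) += ((value, matched))

  def matchedFlags(key: String): List[Boolean] =
    rows.get(key).map(_.map(_._2).toList).getOrElse(Nil)

  // Lazily joins each stored value with the probing row, keeps only pairs that
  // satisfy the predicate, and flips the stored matched flag for those rows.
  def getJoinedRows[J](
      key: String,
      generateJoinedRow: Int => J,
      predicate: J => Boolean,
      excludeRowsAlreadyMatched: Boolean = false): Iterator[J] = {
    val buf = rows.getOrElse(key, mutable.ArrayBuffer.empty[(Int, Boolean)])
    // Optionally skip rows that were already matched by an earlier call.
    val candidates = buf.indices.iterator.filterNot(i => excludeRowsAlreadyMatched && buf(i)._2)
    candidates.flatMap { i =>
      val (value, matched) = buf(i)
      val joined = generateJoinedRow(value)
      if (predicate(joined)) {
        if (!matched) buf(i) = (value, true) // the index never leaves this class
        Some(joined)
      } else None
    }
  }
}

val st = new MatchTrackingState
Seq(1, 2, 3).foreach(v => st.append("k", v))
// Probe with a row whose value is 2; the condition keeps stored values >= 2.
val firstPass = st.getJoinedRows[(Int, Int)]("k", v => (2, v), p => p._2 >= p._1).toList
// A second probe that matches everything, but skips rows already matched.
val secondPass = st.getJoinedRows[(Int, Int)]("k", v => (0, v), _ => true, excludeRowsAlreadyMatched = true).toList
```

Because the iterator is lazy, matched flags are only updated for elements that are actually consumed. Here `toList` drains both passes.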
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val joinSide: JoinSide
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def metrics: StateStoreMetrics
Get the combined metrics of all the state stores
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def removeByKeyCondition(removalCondition: (UnsafeRow) => Boolean): Iterator[KeyToValuePair]
Remove using a predicate on keys.
This produces an iterator over the (key, value, matched) tuples satisfying removalCondition(key), where the underlying store is updated as a side effect of producing each next element.
This implies that the iterator must be fully consumed, with no other operations on this manager or the underlying store interleaved.
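To show why the iterator has to be drained, here is a minimal sketch of lazy, side-effecting removal by key over a two-map model. `KeyRemovalModel` is hypothetical, and `KeyToValuePair` here is a simplified stand-in with `String`/`Int` fields rather than Spark's `UnsafeRow`-based class. The exact moment each deletion happens is our illustrative choice.

```scala
import scala.collection.mutable

// Simplified stand-in for the (key, value, matched) triples returned by the real method.
final case class KeyToValuePair(key: String, value: Int, matched: Boolean)

// Hypothetical two-map model of the stores (format version 2).
class KeyRemovalModel {
  val keyToNumValues = mutable.LinkedHashMap.empty[String, Long]
  val keyWithIndexToValue = mutable.Map.empty[(String, Long), (Int, Boolean)]

  def append(key: String, value: Int, matched: Boolean = false): Unit = {
    val n = keyToNumValues.getOrElse(key, 0L)
    keyWithIndexToValue((key, n)) = (value, matched)
    keyToNumValues(key) = n + 1
  }

  // Lazily removes every key that satisfies removalCondition. Nothing is deleted
  // until elements are pulled: a key's count is dropped when the iterator first
  // reaches that key, and each (key, index) entry is deleted as its value is returned.
  def removeByKeyCondition(removalCondition: String => Boolean): Iterator[KeyToValuePair] = {
    val matchingKeys = keyToNumValues.keys.toList.iterator.filter(removalCondition)
    matchingKeys.flatMap { key =>
      val n = keyToNumValues.remove(key).getOrElse(0L)
      (0L until n).iterator.map { i =>
        val (value, matched) = keyWithIndexToValue.remove((key, i)).get
        KeyToValuePair(key, value, matched)
      }
    }
  }
}

val ks = new KeyRemovalModel
ks.append("old", 1); ks.append("old", 2); ks.append("new", 3)
val drained = ks.removeByKeyCondition(_ == "old").toList

// Stopping early leaves orphaned values behind in the index store.
val partial = new KeyRemovalModel
partial.append("a", 1); partial.append("a", 2)
val it = partial.removeByKeyCondition(_ => true)
it.next()
```

In the `partial` case, the count for `"a"` is already gone but `("a", 1)` is still in the index store. That is exactly the inconsistent state the "consume fully" requirement guards against.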
- def removeByValueCondition(removalCondition: (UnsafeRow) => Boolean): Iterator[KeyToValuePair]
Remove using a predicate on values.
At a high level, this produces an iterator over the (key, value, matched) tuples whose value satisfies removalCondition. Producing an element removes that value from the state store, and producing all elements for a given key updates that key's number of values accordingly.
This implies that the iterator must be fully consumed, with no other operations on this manager or the underlying store interleaved.
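The removal-by-value scheme from the class description (overwrite the deleted slot with the value at `maxIndex`, then shrink the count) can be sketched as a lazy iterator. `ValueRemovalModel` and this simplified `KeyToValuePair` are hypothetical, and maps stand in for the state stores. The key detail the sketch tries to get right is that, after a swap, the same index must be checked again, because it now holds a value that has not been inspected yet.

```scala
import scala.collection.mutable

// Simplified stand-in for the (key, value, matched) triples returned by the real method.
final case class KeyToValuePair(key: String, value: Int, matched: Boolean)

// Hypothetical two-map model of the stores (format version 2).
class ValueRemovalModel {
  val keyToNumValues = mutable.LinkedHashMap.empty[String, Long]
  val keyWithIndexToValue = mutable.Map.empty[(String, Long), (Int, Boolean)]

  def append(key: String, value: Int, matched: Boolean = false): Unit = {
    val n = keyToNumValues.getOrElse(key, 0L)
    keyWithIndexToValue((key, n)) = (value, matched)
    keyToNumValues(key) = n + 1
  }

  def removeByValueCondition(removalCondition: Int => Boolean): Iterator[KeyToValuePair] =
    keyToNumValues.keys.toList.iterator.flatMap(key => removeFromKey(key, removalCondition))

  private def removeFromKey(key: String, cond: Int => Boolean): Iterator[KeyToValuePair] =
    new Iterator[KeyToValuePair] {
      private var n = keyToNumValues(key) // live number of values for this key
      private var i = 0L                  // next index to inspect
      private var pending: Option[KeyToValuePair] = None

      // Find the next matching value, removing it by moving the last value into its slot.
      private def advance(): Unit = {
        while (pending.isEmpty && i < n) {
          val (value, matched) = keyWithIndexToValue((key, i))
          if (cond(value)) {
            val last = n - 1
            if (i != last) keyWithIndexToValue((key, i)) = keyWithIndexToValue((key, last))
            keyWithIndexToValue.remove((key, last))
            n = last // keep i: the value just moved into slot i still has to be checked
            pending = Some(KeyToValuePair(key, value, matched))
          } else {
            i += 1
          }
        }
        // Key exhausted: write the new count back, or drop the key entirely.
        if (pending.isEmpty) {
          if (n == 0) keyToNumValues.remove(key) else keyToNumValues(key) = n
        }
      }

      def hasNext: Boolean = { advance(); pending.isDefined }

      def next(): KeyToValuePair = {
        if (!hasNext) throw new NoSuchElementException("no more removed values")
        val result = pending.get
        pending = None
        result
      }
    }
}

val vs = new ValueRemovalModel
Seq(1, 2, 3, 4).foreach(v => vs.append("k", v))
vs.append("j", 6)
val removedValues = vs.removeByValueCondition(_ % 2 == 0).toList
```

Swapping in the last value makes each deletion O(1) and keeps indices dense, at the cost of not preserving insertion order among the remaining values of a key.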
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()