org.apache.spark.sql.execution.streaming.state
SymmetricHashJoinStateManager
Companion object SymmetricHashJoinStateManager
class SymmetricHashJoinStateManager extends Logging
Helper class to manage state required by a single side of org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinExec. The interface of this class is basically that of a multi-map:
- Get: Returns an iterator of multiple values for a given key
- Append: Appends a new value to the given key
- Remove data by predicate: Drops any state using a predicate condition on keys or values
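The multi-map contract described above can be sketched with plain Scala collections. This is only an illustrative in-memory stand-in: the class name `SimpleMultiMap` and its generic signature are assumptions made for this example and are not part of Spark, which operates on `UnsafeRow` keys and values backed by state stores.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the multi-map contract; not Spark code.
final class SimpleMultiMap[K, V] {
  private val data = mutable.LinkedHashMap.empty[K, Vector[V]]

  // Get: iterator over every value stored under `key` (empty if the key is absent).
  def get(key: K): Iterator[V] =
    data.getOrElse(key, Vector.empty[V]).iterator

  // Append: add one more value under `key`, keeping insertion order.
  def append(key: K, value: V): Unit =
    data.update(key, data.getOrElse(key, Vector.empty[V]) :+ value)

  // Remove by predicate on keys: drop every key that satisfies `p`.
  def removeByKeyCondition(p: K => Boolean): Unit = {
    val doomed = data.keys.filter(p).toList // snapshot before mutating
    doomed.foreach(k => data.remove(k))
  }

  // Remove by predicate on values: drop matching values, and drop keys left empty.
  def removeByValueCondition(p: V => Boolean): Unit = {
    val keys = data.keys.toList // snapshot before mutating
    keys.foreach { k =>
      val kept = data(k).filterNot(p)
      if (kept.isEmpty) data.remove(k) else data.update(k, kept)
    }
  }
}
```

Unlike this sketch, the removal methods of the real class return iterators and apply their changes lazily, as described under removeByKeyCondition and removeByValueCondition.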
Instance Constructors
-
new
SymmetricHashJoinStateManager(joinSide: JoinSide, inputValueAttributes: Seq[Attribute], joinKeys: Seq[Expression], stateInfo: Option[StatefulOperatorStateInfo], storeConf: StateStoreConf, hadoopConf: Configuration, partitionId: Int, stateFormatVersion: Int, skippedNullValueCount: Option[SQLMetric] = None)
- joinSide
Defines the join side
- inputValueAttributes
Attributes of the input row which will be stored as value
- joinKeys
Expressions to generate rows that will be used to key the value rows
- stateInfo
Information about how to retrieve the correct version of state
- storeConf
Configuration for the state store.
- hadoopConf
Hadoop configuration for reading state data from storage
- partitionId
The partition ID of the source RDD.
- stateFormatVersion
The version of the state format.
Internally, the key -> multiple values mapping is stored in two StateStores.
- Store 1 (KeyToNumValuesStore) maintains the mapping key -> number of values
- Store 2 (KeyWithIndexToValueStore) maintains a mapping that depends on the state format version:
- version 1: [(key, index) -> value]
- version 2: [(key, index) -> (value, matched)]
- Put: update the count in KeyToNumValuesStore, insert the new (key, count) -> value in KeyWithIndexToValueStore
- Get: read the count from KeyToNumValuesStore, read each of the n values from KeyWithIndexToValueStore
- Remove state by predicate on keys: scan all keys in KeyToNumValuesStore to find the keys that match the predicate, delete each such key from KeyToNumValuesStore, and delete its values from KeyWithIndexToValueStore
- Remove state by condition on values: scan all elements in KeyWithIndexToValueStore to find the values that match the predicate, delete the corresponding (key, indexToDelete) from KeyWithIndexToValueStore by overwriting it with the value of (key, maxIndex) and removing (key, maxIndex), then decrement the corresponding number of values in KeyToNumValuesStore
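The two-store layout and the overwrite-with-last deletion can be modeled with two ordinary maps. The sketch below is a simplified illustration under stated assumptions: `TwoStoreMultiMap`, `put`, and `removeAt` are hypothetical names, plain mutable maps stand in for the StateStores, and the format version 2 value shape `(value, matched)` is used. It also assumes the new entry is written at the index equal to the count before the update.

```scala
import scala.collection.mutable

// Hypothetical model of the two-store layout; plain maps replace the real StateStores.
final class TwoStoreMultiMap[K, V] {
  // Store 1: key -> number of values
  val keyToNumValues = mutable.Map.empty[K, Long]
  // Store 2 (format version 2): (key, index) -> (value, matched)
  val keyWithIndexToValue = mutable.Map.empty[(K, Long), (V, Boolean)]

  // Put: write the value at the next free index, then bump the count.
  def put(key: K, value: V, matched: Boolean): Unit = {
    val n = keyToNumValues.getOrElse(key, 0L)
    keyWithIndexToValue.update((key, n), (value, matched))
    keyToNumValues.update(key, n + 1)
  }

  // Get: read the count, then read each of the n indexed values.
  def get(key: K): Iterator[V] = {
    val n = keyToNumValues.getOrElse(key, 0L)
    (0L until n).iterator.map(i => keyWithIndexToValue((key, i))._1)
  }

  // Remove one value: overwrite it with the entry at maxIndex, drop maxIndex,
  // and decrement the count so that indices stay dense (0 until count).
  def removeAt(key: K, index: Long): Unit = {
    val n = keyToNumValues(key)
    require(index >= 0 && index < n, s"index $index out of range for count $n")
    val maxIndex = n - 1
    if (index != maxIndex) {
      keyWithIndexToValue.update((key, index), keyWithIndexToValue((key, maxIndex)))
    }
    keyWithIndexToValue.remove((key, maxIndex))
    if (maxIndex == 0) keyToNumValues.remove(key) else keyToNumValues.update(key, maxIndex)
  }
}
```

Overwriting with the last entry avoids shifting every later index, at the cost of not preserving the insertion order of the remaining values.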
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
abortIfNeeded(): Unit
Abort any changes to the state stores if needed
-
def
append(key: UnsafeRow, value: UnsafeRow, matched: Boolean): Unit
Append a new value to the key
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
commit(): Unit
Commit all the changes to all the state stores
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
get(key: UnsafeRow): Iterator[UnsafeRow]
Get all the values of a key
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getInternalRowOfKeyWithIndex(currentKey: UnsafeRow): InternalRow
Projects the key of an UnsafeRow to an InternalRow so that it can be printed in log messages.
-
def
getJoinedRows(key: UnsafeRow, generateJoinedRow: (InternalRow) ⇒ JoinedRow, predicate: (JoinedRow) ⇒ Boolean, excludeRowsAlreadyMatched: Boolean = false): Iterator[JoinedRow]
Get all the matched values for the given join condition, marking them as matched. This method is designed to mark joined rows properly without exposing the internal index of the row.
- excludeRowsAlreadyMatched
Do not join with rows that were already matched previously. This is used only for the right side of a left semi join in StreamingSymmetricHashJoinExec.
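The interplay between the join predicate, the matched flag, and excludeRowsAlreadyMatched can be illustrated with a small model. Everything here is a hypothetical simplification: `StoredRow` and `JoinedRowsSketch` are invented for the example, generic types replace `UnsafeRow` and `JoinedRow`, and the order of the filtering steps is an assumption rather than a statement about the actual implementation.

```scala
import scala.collection.mutable

// Hypothetical stored row: a value plus the "matched" flag kept by state format version 2.
final case class StoredRow[V](value: V, var matched: Boolean)

object JoinedRowsSketch {
  // Returns the joined rows that pass `predicate`, flagging the source row as
  // matched when its joined row is produced. The caller never sees row indices.
  def getJoinedRows[V, J](
      rows: mutable.ArrayBuffer[StoredRow[V]],
      generateJoinedRow: V => J,
      predicate: J => Boolean,
      excludeRowsAlreadyMatched: Boolean = false): Iterator[J] =
    rows.iterator
      // Optionally skip rows that an earlier call already matched.
      .filterNot(r => excludeRowsAlreadyMatched && r.matched)
      .map(r => (r, generateJoinedRow(r.value)))
      .filter { case (_, joined) => predicate(joined) }
      .map { case (r, joined) =>
        r.matched = true // side effect: happens only when the element is consumed
        joined
      }
}
```

In this model a second call with excludeRowsAlreadyMatched = true yields only rows that no earlier call has matched, which mirrors the semi-join requirement that each row is emitted at most once.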
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val joinSide: JoinSide
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
metrics: StateStoreMetrics
Get the combined metrics of all the state stores
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
removeByKeyCondition(removalCondition: (UnsafeRow) ⇒ Boolean): Iterator[KeyToValuePair]
Remove using a predicate on keys.
This produces an iterator over the (key, value, matched) tuples satisfying condition(key), where the underlying store is updated as a side-effect of producing next.
This implies the iterator must be consumed fully without any other operations on this manager or the underlying store being interleaved.
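The requirement to consume the iterator fully follows from the removal being a side effect of each `next` call. The sketch below shows that behavior on an ordinary mutable map; `LazyRemovalSketch` and its generic signature are hypothetical and only model the contract, not the real store interaction.

```scala
import scala.collection.mutable

object LazyRemovalSketch {
  // Returns an iterator whose next() removes the entry it returns from `store`.
  def removeByKeyCondition[K, V](store: mutable.Map[K, V])(
      removalCondition: K => Boolean): Iterator[(K, V)] = {
    // Snapshot the candidate keys so that mutating the map while iterating is safe.
    val candidates = store.keys.filter(removalCondition).toList
    candidates.iterator.map { k =>
      val v = store(k)
      store.remove(k) // the store changes only when this element is produced
      (k, v)
    }
  }
}
```

A short usage example, with illustrative data:

```scala
val store = scala.collection.mutable.Map(1 -> "a", 2 -> "b", 3 -> "c")
val removed = LazyRemovalSketch.removeByKeyCondition(store)(_ <= 2)
// Nothing has been removed yet; abandoning `removed` here would leave the store untouched,
// and stopping halfway would leave it partially cleaned.
removed.foreach(_ => ()) // draining the iterator applies every removal
```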
-
def
removeByValueCondition(removalCondition: (UnsafeRow) ⇒ Boolean): Iterator[KeyToValuePair]
Remove using a predicate on values.
At a high level, this produces an iterator over the (key, value, matched) tuples such that value satisfies the predicate, where producing an element removes the value from the state store and producing all elements with a given key updates it accordingly.
This implies the iterator must be consumed fully without any other operations on this manager or the underlying store being interleaved.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()