object OrcFilters extends OrcFiltersBase
Helper object for building ORC SearchArguments, which are used for ORC predicate push-down.
Due to limitations of the ORC SearchArgument builder, we had to implement separate checking and
conversion passes over the Filter to make sure we only convert predicates that are known
to be convertible.
An ORC SearchArgument must be built in one pass using a single builder. For example, you can't
build a = 1 and b = 2 first, and then combine them into a = 1 AND b = 2. This is quite
different from the cases in Spark SQL or Parquet, where complex filters can be easily built using
existing simpler ones.
The annoying part is that SearchArgument builder methods like startAnd(), startOr(), and
startNot() mutate the internal state of the builder instance. This forces us to translate all
convertible filters with a single builder instance. However, if we try to translate a filter
before checking whether it can be converted, we may end up with a builder whose internal
state is inconsistent in the case of an inconvertible filter.
For example, to convert an And filter with builder b, we call b.startAnd() first, and then
try to convert its children. Say we convert the left child successfully, but find that the right
child is inconvertible. Alas, the b.startAnd() call can't be rolled back, and b is now
inconsistent.
The workaround employed here is to trim the Spark filters before trying to convert them. This way, we only perform the actual conversion on the part of the Filter that is known to be convertible.
P.S.: Hive seems to use SearchArgument only together with ExprNodeGenericFuncDesc. Usage of the
builder methods mentioned above can only be found in test code, where all tested filters are
known to be convertible.
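The one-pass constraint above can be illustrated with Hive's SearchArgument builder. This is a minimal sketch, assuming a Hive/ORC version whose Builder.equals takes an explicit PredicateLeaf.Type; exact builder signatures vary across versions:

```scala
import org.apache.hadoop.hive.ql.io.sarg.{PredicateLeaf, SearchArgumentFactory}

// Build `a = 1 AND b = 2` in a single pass. startAnd() mutates the
// builder, so both children must be emitted before the matching end();
// the two equality leaves cannot be built separately and combined
// afterwards, and a startAnd() call cannot be rolled back.
val sarg = SearchArgumentFactory.newBuilder()
  .startAnd()
    .equals("a", PredicateLeaf.Type.LONG, java.lang.Long.valueOf(1L))
    .equals("b", PredicateLeaf.Type.LONG, java.lang.Long.valueOf(2L))
  .end()
  .build()
```

If the second equals call turned out to be inconvertible midway through, the builder above would already be inside an unfinished startAnd() scope, which is exactly the inconsistency the trimming pass avoids.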
Linear Supertypes: OrcFiltersBase, AnyRef, Any
Type Members
- case class OrcPrimitiveField(fieldName: String, fieldType: DataType) extends Product with Serializable
  - Definition Classes: OrcFiltersBase
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )
- def convertibleFilters(dataTypeMap: Map[String, OrcPrimitiveField], filters: Seq[Filter]): Seq[Filter]
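The trimming pass described above is what convertibleFilters performs. A hypothetical sketch (the column names, the map contents, and the expected trimming are assumptions for illustration):

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter}
import org.apache.spark.sql.types.IntegerType

// Only column "a" is searchable; "b" is absent from the map, so any
// predicate referencing it is inconvertible.
val dataTypeMap = Map("a" -> OrcPrimitiveField("a", IntegerType))

val filters: Seq[Filter] = Seq(And(EqualTo("a", 1), EqualTo("b", 2)))

// Since a top-level And can be partially pushed down, only the
// convertible left child (EqualTo("a", 1)) is expected to survive
// the trimming pass.
val trimmed = OrcFilters.convertibleFilters(dataTypeMap, filters)
```

Because the output contains only known-convertible predicates, the subsequent conversion can safely drive a single mutable builder without risk of an unrollbackable startAnd()/startOr() call.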
- def createFilter(schema: StructType, filters: Seq[Filter]): Option[SearchArgument]
  Create an ORC filter as a SearchArgument instance.
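A usage sketch for createFilter; the schema and filters are hypothetical:

```scala
import org.apache.spark.sql.sources.{EqualTo, GreaterThan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("a", IntegerType),
  StructField("b", StringType)))

// Convertible filters are combined with AND into a single
// SearchArgument; None is returned if nothing can be converted.
val sarg = OrcFilters.createFilter(
  schema, Seq(EqualTo("a", 1), GreaterThan("b", "x")))
```

The resulting SearchArgument, if any, can then be handed to the ORC reader for predicate push-down.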
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getPredicateLeafType(dataType: DataType): Type
  Get the PredicateLeaf.Type corresponding to the given DataType.
- def getSearchableTypeMap(schema: StructType, caseSensitive: Boolean): Map[String, OrcPrimitiveField]
  This method returns a map from ORC field name to data type. Each key represents a column; dots are used as separators for nested columns. If any part of a name contains dots, it is quoted to avoid confusion. See org.apache.spark.sql.connector.catalog.quoted for implementation details. BinaryType, UserDefinedType, ArrayType and MapType are ignored.
  - Attributes: protected[org.apache.spark.sql]
  - Definition Classes: OrcFiltersBase
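A sketch of the key naming described above; the schema and the expected keys are illustrative assumptions (the method itself is protected, so this only depicts the shape of its result):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("person", StructType(Seq(StructField("age", IntegerType)))),
  StructField("dotted.name", StringType)))

// The nested column is keyed with a dot separator, while the flat
// column whose own name contains a dot is quoted to avoid confusion
// with nesting, yielding keys resembling:
//   "person.age"
//   "`dotted.name`"
```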
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )