UDF (User Defined Function) and UDTF (User Defined Table-generating Function).

| Interface | Description |
|---|---|
| BaseStorageHandler | The interface on top of which both Hive-compatible and ODPS storage handlers are built. For internal use *ONLY* within the ODPS framework. |
| BridgeStorageHandler | Hive-compatible APIs for building a storage handler. TODO: see if we can remove the bridged versions of InputFormat and OutputFormat; then BridgeHiveStorageHandler could implement *both* BaseStorageHandler and Hive's HiveStorageHandler interfaces. |
| ContextFunction | A kind of user-defined function that holds an ExecutionContext. |
| DataCollector | |
| InputFormat | TODO: see if we can remove this. |
| OutputFormat | TODO: see if we can remove this. OutputFormat describes the output specification. |
| RecordReader | RecordReader takes the byte-oriented view of the input, provided by the InputSplit, and presents a record-oriented Writable (which will usually be deserialized into a Record by an implementation of RecordSerDe). |
| RecordWriter | RecordWriter writes the output Writable record (usually serialized by an implementation of RecordSerDe) to the output. |
| TableRecordReader | |
| UDTFCollector | |
| UDTFPuller | |

| Class | Description |
|---|---|
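| Aggregator | Extend Aggregator to implement a UDAF. A UDAF (User Defined Aggregation Function) has a many-to-one relationship between input and output: it aggregates multiple input records into a single output value, and it can be used together with the GROUP BY clause in SQL. A Java UDAF class must extend the Aggregator class. The Aggregator flow has four main parts, each corresponding to one of four main methods: Aggregator.newBuffer() creates and initializes the buffer that holds the intermediate aggregation value. Aggregator.iterate(Writable, Writable[]) computes over the input data and aggregates it into the intermediate buffer; the first argument is the result produced by newBuffer(), and the second argument is the data source. Aggregator.merge(Writable, Writable) merges two intermediate values; the first argument is the result produced by newBuffer(), and the second argument is the intermediate result produced once the iterate operations have finished. Aggregator.terminate(Writable) converts the intermediate result produced once the merge operations have finished into an ODPS SQL basic type. Initialization happens in the call to Aggregator.setup(ExecutionContext); users can override this method to perform one-time initialization, such as reading shared resources. The intermediate buffer class used during aggregation extends Writable; besides the built-in types, users can extend Writable to implement custom classes. The buffer size should not grow with the amount of data and should preferably stay under 2 MB, otherwise memory usage becomes excessive. |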
| DataAttributes | Provides interfaces to access different attributes of the underlying data, including the attributes provided by the user, as well as different (system) properties that govern the underlying data, such as the record columns, resources used, etc. |
| ExecutionContext | Execution context information available at runtime. |
| Extractor | Base extractor class; user-defined extractors shall extend this class. |
| OdpsStorageHandler | Recommended class (over HiveStorageHandler) to extend for a custom storage handler. It provides interfaces to reason about the Extractor/Outputer implemented by the user, for converting a raw byte stream into records and vice versa. |
| Outputer | Base outputer class; custom outputers shall extend this class. |
| RecordSerDe | SerDe interface for converting an ODPS record to and from Writable. |
| StandaloneUDTF | A UDTF that can pull data: it can actively call getNextRow() to fetch one record. It can only be used in LOT and has the following restrictions: 1. |
| TableOutputerAttributes | Provides an interface to access outputer-only attributes. |
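| UDF | UDF base class. A UDF (User Defined Scalar Function) is a user-defined function with a one-to-one relationship between input and output: it reads one row of data and writes one output value. |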
| UDJ | UDJ (User Defined Join). |
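| UDTF | UDTF is short for User Defined Table-generating Function. It handles the case where a single function call outputs multiple rows, and it is the only kind of user-defined function that can return multiple fields. |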
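
The four-step Aggregator flow described in the class table above (newBuffer, iterate, merge, terminate) can be sketched with plain Java stand-ins. This is only an illustration under stated assumptions: `AvgBuffer`, `AvgAggregator`, and `average` are hypothetical names, the buffer is a plain class rather than a Writable, and the two-slice driver merely imitates how partial results might be combined. None of it is the ODPS SDK itself.

```java
import java.util.Arrays;
import java.util.List;

public class AggregatorFlowSketch {

    /** Hypothetical intermediate buffer; a real buffer would extend Writable. */
    static final class AvgBuffer {
        double sum = 0.0;
        long count = 0;
    }

    /** Hypothetical mirror of the four steps; not the ODPS Aggregator class. */
    static final class AvgAggregator {
        // Step 1: create and initialize an intermediate buffer.
        AvgBuffer newBuffer() {
            return new AvgBuffer();
        }

        // Step 2: fold one input row into the buffer; null inputs are skipped.
        void iterate(AvgBuffer buffer, Double[] args) {
            if (args[0] != null) {
                buffer.sum += args[0];
                buffer.count += 1;
            }
        }

        // Step 3: merge a partial result into the target buffer.
        void merge(AvgBuffer buffer, AvgBuffer partial) {
            buffer.sum += partial.sum;
            buffer.count += partial.count;
        }

        // Step 4: convert the merged buffer into the final value (null if no rows).
        Double terminate(AvgBuffer buffer) {
            return buffer.count == 0 ? null : buffer.sum / buffer.count;
        }
    }

    /** Imitates two workers that each iterate over a slice, followed by a final merge. */
    public static Double average(List<Double> sliceA, List<Double> sliceB) {
        AvgAggregator agg = new AvgAggregator();

        AvgBuffer partialA = agg.newBuffer();
        for (Double v : sliceA) {
            agg.iterate(partialA, new Double[] {v});
        }

        AvgBuffer partialB = agg.newBuffer();
        for (Double v : sliceB) {
            agg.iterate(partialB, new Double[] {v});
        }

        AvgBuffer merged = agg.newBuffer();
        agg.merge(merged, partialA);
        agg.merge(merged, partialB);
        return agg.terminate(merged);
    }

    public static void main(String[] args) {
        // (1 + 2 + 3 + 6) / 4
        System.out.println(average(Arrays.asList(1.0, 2.0), Arrays.asList(3.0, 6.0))); // prints 3.0
    }
}
```

The buffer holds only a sum and a count, so its size stays constant no matter how many rows are processed, in line with the guidance that the buffer should not grow with the data volume.
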

| Enum | Description |
|---|---|
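| OdpsType | Maps to ODPS data types, including STRING: string; BIGINT: long integer; DOUBLE: double-precision floating-point number; BOOLEAN: boolean; IGNORE: ignore type mapping. Not recommended for direct use. |

| Exception | Description |
|---|---|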
| InvalidInvocationException | |
| UDFException | |

Copyright © 2021 Alibaba Cloud Computing. All rights reserved.