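public interface HoodieData<T> extends Serializable

Type Parameters: T - type of object

An abstraction over a collection of objects of type T that allows common transformations to be performed on it.

This abstraction provides a common API implemented by in-memory implementations (HoodieListData, HoodieListPairData), where all objects are held in memory by the executing process, and by RDD-based implementations (HoodieJavaRDD, etc.). It exposes the usual collection transformations (map, filter, etc.).

| Modifier and Type | Method and Description |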
|---|---|
| List<T> | collectAsList(): Collects the results of the underlying collection into a List. This is a terminal operation. |
| long | count(): Returns the number of objects held in the collection. |
| HoodieData<T> | distinct(): Returns a new HoodieData collection holding only the distinct objects of the original one. This is a stateful intermediate operation. |
| HoodieData<T> | distinct(int parallelism): Returns a new HoodieData collection holding only the distinct objects of the original one. This is a stateful intermediate operation. |
| default <O> HoodieData<T> | distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism) |
| HoodieData<T> | filter(SerializableFunction<T,Boolean> filterFunc): Returns a new HoodieData collection containing only the elements matching the provided filterFunc (i.e. those for which it returns true). |
| <O> HoodieData<O> | flatMap(SerializableFunction<T,Iterator<O>> func): Maps every element in the collection into a collection of new elements (provided by an Iterator) using the provided mapping func, then flattens the result (by concatenating) into a single collection. This is an intermediate operation. |
| int | getNumPartitions() |
| boolean | isEmpty(): Returns whether the collection is empty. |
| <O> HoodieData<O> | map(SerializableFunction<T,O> func): Maps every element in the collection using the provided mapping func. |
| <O> HoodieData<O> | mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning): Maps the elements of each partition of the collection (if applicable) by applying the provided mapping func to every partition. This is an intermediate operation. |
| <K,V> HoodiePairData<K,V> | mapToPair(SerializablePairFunction<T,K,V> func): Maps every element in the collection using the provided mapping func into a Pair of elements K and V. |
| void | persist(String level): Persists the data with the provided level (if applicable). |
| HoodieData<T> | repartition(int parallelism): Re-partitions the underlying collection (if applicable), making sure the new HoodieData has exactly parallelism partitions. |
| HoodieData<T> | union(HoodieData<T> other): Unions this HoodieData with another instance of HoodieData. |
| void | unpersist(): Un-persists the data (if previously persisted). |
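
The summary above labels methods such as map, flatMap, filter and distinct as intermediate operations and collectAsList as a terminal one. As a rough analogy only, the sketch below uses java.util.stream rather than Hudi to show the same split. The class and method names (OperationKindsSketch, distinctEvenSquares) are hypothetical, and HoodieData's evaluation behavior may differ from that of a Stream.

```java
import java.util.List;
import java.util.stream.Collectors;

class OperationKindsSketch {
    // Analogy only: java.util.stream stands in for HoodieData here.
    static List<Integer> distinctEvenSquares(List<Integer> input) {
        return input.stream()
            .map(x -> x * x)                // intermediate, comparable to map(func)
            .filter(x -> x % 2 == 0)        // intermediate, comparable to filter(filterFunc)
            .distinct()                     // stateful intermediate, comparable to distinct()
            .collect(Collectors.toList());  // terminal, comparable to collectAsList()
    }

    public static void main(String[] args) {
        // Squares are 1, 4, 4, 9, 16; the even, de-duplicated ones are 4 and 16.
        System.out.println(distinctEvenSquares(List.of(1, 2, -2, 3, 4))); // prints [4, 16]
    }
}
```

Only the final call produces a concrete List; the calls before it each return another pipeline to chain on.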
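void persist(String level)
Persists the data with the provided level (if applicable).

void unpersist()
Un-persists the data (if previously persisted).

boolean isEmpty()
Returns whether the collection is empty.

long count()
Returns the number of objects held in the collection.
NOTE: This is a terminal operation.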
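
One plausible way to combine persist, isEmpty, count and unpersist is to persist once, evaluate the collection, and release it in a finally block. The sketch below is self-contained and does not use Hudi: the Data interface is a hypothetical stand-in declaring only the four methods documented above, ListData is a hypothetical in-memory stub that just logs calls, and the level string "MEMORY_AND_DISK_SER" is an assumed example value, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;

class PersistSketch {
    // Hypothetical stand-in declaring only the four methods documented above.
    interface Data {
        void persist(String level);
        void unpersist();
        boolean isEmpty();
        long count();
    }

    // In-memory stub: persisting has no real effect here, but every call is logged.
    static class ListData implements Data {
        final List<String> items;
        final List<String> log = new ArrayList<>();

        ListData(List<String> items) { this.items = items; }

        public void persist(String level) { log.add("persist:" + level); }
        public void unpersist() { log.add("unpersist"); }
        public boolean isEmpty() { log.add("isEmpty"); return items.isEmpty(); }
        public long count() { log.add("count"); return items.size(); }
    }

    // Persist once, evaluate the collection, and always release it afterwards.
    static long countOrZero(Data data, String level) {
        data.persist(level);
        try {
            return data.isEmpty() ? 0L : data.count();
        } finally {
            data.unpersist();
        }
    }

    public static void main(String[] args) {
        ListData data = new ListData(List.of("a", "b", "c"));
        System.out.println(countOrZero(data, "MEMORY_AND_DISK_SER")); // prints 3
        System.out.println(data.log);
    }
}
```

The finally block ensures unpersist() runs even if isEmpty() or count() throws.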
int getNumPartitions()

<O> HoodieData<O> map(SerializableFunction<T,O> func)
Maps every element in the collection using the provided mapping func.
This is an intermediate operation.
Type Parameters: O - output object type
Parameters: func - serializable map function
Returns: HoodieData holding mapped elements

<O> HoodieData<O> mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning)
Maps the elements of each partition of the collection (if applicable) by applying the provided mapping func to every partition.
This is an intermediate operation.
Type Parameters: O - output object type
Parameters: func - serializable map function accepting an Iterator over a single partition's elements and returning a new Iterator that maps every element of the partition into a new one
preservesPartitioning - whether to preserve partitioning in the resulting collection
Returns: HoodieData holding mapped elements

<O> HoodieData<O> flatMap(SerializableFunction<T,Iterator<O>> func)
Maps every element in the collection into a collection of new elements (provided by an Iterator) using the provided mapping func, then flattens the result (by concatenating) into a single collection.
This is an intermediate operation.
Type Parameters: O - output object type
Parameters: func - serializable function mapping every element T into Iterator<O>
Returns: HoodieData holding mapped elements

<K,V> HoodiePairData<K,V> mapToPair(SerializablePairFunction<T,K,V> func)
Maps every element in the collection using the provided mapping func into a Pair of elements K and V.
This is an intermediate operation.
Type Parameters: K - key type of the pair
V - value type of the pair
Parameters: func - serializable map function
Returns: HoodiePairData holding mapped elements

HoodieData<T> distinct()
Returns a new HoodieData collection holding only the distinct objects of the original one.
This is a stateful intermediate operation.

HoodieData<T> distinct(int parallelism)
Returns a new HoodieData collection holding only the distinct objects of the original one.
This is a stateful intermediate operation.

HoodieData<T> filter(SerializableFunction<T,Boolean> filterFunc)
Returns a new HoodieData collection containing only the elements matching the provided filterFunc (i.e. those for which it returns true).
Parameters: filterFunc - filtering function that either accepts or rejects each element
Returns: HoodieData holding filtered elements

HoodieData<T> union(HoodieData<T> other)
Unions this HoodieData with another instance of HoodieData.
Note that it can only union instances backed by the same underlying collection implementation.
This is a stateful intermediate operation.
Parameters: other - HoodieData collection
Returns: HoodieData holding the superset of elements of this and the other collection

List<T> collectAsList()
Collects the results of the underlying collection into a List.
This is a terminal operation.

HoodieData<T> repartition(int parallelism)
Re-partitions the underlying collection (if applicable), making sure the new HoodieData has exactly parallelism partitions.
Parameters: parallelism - target number of partitions in the underlying collection
Returns: HoodieData holding re-partitioned collection

default <O> HoodieData<T> distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism)
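
To make the documented semantics of map, flatMap, filter, distinct, union and collectAsList concrete, here is a minimal list-backed sketch. It is not Hudi code: ListDataSketch, SerFn and demo are hypothetical names introduced for illustration, SerFn is a stand-in for SerializableFunction, and this toy evaluates each step eagerly, which is an assumption of the sketch rather than a statement about any real implementation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

class ListDataSketch<T> {
    // Hypothetical stand-in for SerializableFunction<I,O>.
    interface SerFn<I, O> extends Serializable {
        O apply(I in);
    }

    private final List<T> items;

    ListDataSketch(List<T> items) { this.items = new ArrayList<>(items); }

    // One output element per input element.
    <O> ListDataSketch<O> map(SerFn<T, O> func) {
        List<O> out = new ArrayList<>();
        for (T t : items) out.add(func.apply(t));
        return new ListDataSketch<>(out);
    }

    // Each element yields an Iterator; the results are concatenated.
    <O> ListDataSketch<O> flatMap(SerFn<T, Iterator<O>> func) {
        List<O> out = new ArrayList<>();
        for (T t : items) func.apply(t).forEachRemaining(out::add);
        return new ListDataSketch<>(out);
    }

    // Keeps only the elements for which filterFunc returns true.
    ListDataSketch<T> filter(SerFn<T, Boolean> filterFunc) {
        List<T> out = new ArrayList<>();
        for (T t : items) if (filterFunc.apply(t)) out.add(t);
        return new ListDataSketch<>(out);
    }

    // Stateful: must remember every element seen so far.
    ListDataSketch<T> distinct() {
        return new ListDataSketch<>(new ArrayList<>(new LinkedHashSet<>(items)));
    }

    // Only accepts the same implementation, echoing the note on union above.
    ListDataSketch<T> union(ListDataSketch<T> other) {
        List<T> out = new ArrayList<>(items);
        out.addAll(other.items);
        return new ListDataSketch<>(out);
    }

    // Terminal: hands back a plain List.
    List<T> collectAsList() { return new ArrayList<>(items); }

    static List<String> demo() {
        ListDataSketch<String> lines = new ListDataSketch<>(List.of("a b", "b c"));
        ListDataSketch<String> extra = new ListDataSketch<>(List.of("d", ""));
        return lines
            .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
            .union(extra)
            .filter(w -> !w.isEmpty())
            .distinct()
            .map(String::toUpperCase)
            .collectAsList();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A, B, C, D]
    }
}
```

In demo(), flatMap turns two lines into the words a, b, b, c; union appends d and an empty string; filter drops the empty string; distinct removes the repeated b; map upper-cases what remains.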
Copyright © 2022 The Apache Software Foundation. All rights reserved.