Type Parameters: T - type of object.

`public class HoodieListData<T> extends HoodieBaseListData<T> implements HoodieData<T>`
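
In-memory implementation of HoodieData holding internally a Stream of objects.

HoodieListData can have either of the 2 execution semantics:
1. Eager: every operation is executed right away.
2. Lazy: every operation is "stacked up", with its execution postponed until a terminal operation is invoked.

NOTE: This is an in-memory counterpart of HoodieJavaRDD, and it strives to provide
semantics similar to the RDD container: all intermediate operations (non-terminal ones,
i.e., those that do not de-reference the stream the way "collect", "groupBy", etc. do) are executed *lazily*.
This keeps compute and memory churn minimal, since only the necessary
computations are ultimately performed.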
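
The lazy execution of intermediate operations described above can be observed with plain JDK streams, which is what HoodieListData is stated to hold internally. The following is a minimal sketch, not Hudi code: it uses no Hudi classes, the names `LazyIntermediateOps`, `CALLS` and `doubled` are hypothetical, and it assumes that a `java.util.stream.Stream` pipeline is a fair stand-in for the lazy semantic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Hudi.
class LazyIntermediateOps {
    // Counts how many times the mapping function has actually run.
    static final AtomicInteger CALLS = new AtomicInteger();

    // Stacks up an intermediate map operation; nothing is computed here.
    static Stream<Integer> doubled(List<Integer> input) {
        return input.stream().map(x -> {
            CALLS.incrementAndGet();
            return x * 2;
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pending = doubled(Arrays.asList(1, 2, 3));
        System.out.println(CALLS.get()); // prints 0: the mapping has not run yet

        // The terminal operation triggers the stacked-up computation.
        List<Integer> result = pending.collect(Collectors.toList());
        System.out.println(result + " after " + CALLS.get() + " calls"); // prints [2, 4, 6] after 3 calls
    }
}
```

The counter stays at zero until `collect` runs, which mirrors the "stacked up until a terminal operation" behaviour that the class description attributes to the lazy semantic.
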

Please note, however, that while the RDD container allows the same collection to be
de-referenced more than once (i.e., a terminal operation invoked more than once),
HoodieListData allows that only when instantiated with an eager execution semantic.

Fields inherited from class HoodieBaseListData: data, lazy

| Modifier and Type | Method and Description |
|---|---|
| `List<T>` | `collectAsList()`: Collects results of the underlying collection into a List. This is a terminal operation. |
| `long` | `count()`: Returns the number of objects held in the collection. |
| `HoodieData<T>` | `distinct()`: Returns a new HoodieData collection holding only distinct objects of the original one. This is a stateful intermediate operation. |
| `HoodieData<T>` | `distinct(int parallelism)`: Returns a new HoodieData collection holding only distinct objects of the original one. This is a stateful intermediate operation. |
| `<O> HoodieData<T>` | `distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism)` |
| `static <T> HoodieListData<T>` | `eager(List<T> listData)`: Creates an instance of HoodieListData bearing *eager* execution semantic. |
| `HoodieData<T>` | `filter(SerializableFunction<T,Boolean> filterFunc)`: Returns a new instance of HoodieData collection containing only the elements matching the provided filterFunc (i.e., the ones it returns true on). |
| `<O> HoodieData<O>` | `flatMap(SerializableFunction<T,Iterator<O>> func)`: Maps every element in the collection into a collection of new elements (provided by Iterator) using the provided mapping func, subsequently flattening the result (by concatenating) into a single collection. This is an intermediate operation. |
| `int` | `getNumPartitions()` |
| `boolean` | `isEmpty()`: Returns whether the collection is empty. |
| `static <T> HoodieListData<T>` | `lazy(List<T> listData)`: Creates an instance of HoodieListData bearing *lazy* execution semantic. |
| `<O> HoodieData<O>` | `map(SerializableFunction<T,O> func)`: Maps every element in the collection using the provided mapping func. |
| `<O> HoodieData<O>` | `mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning)`: Maps every element in the collection's partition (if applicable) by applying the provided mapping func to every partition of the collection. This is an intermediate operation. |
| `<K,V> HoodiePairData<K,V>` | `mapToPair(SerializablePairFunction<T,K,V> func)`: Maps every element in the collection using the provided mapping func into a Pair of elements K and V. |
| `void` | `persist(String level)`: Persists the data with the provided level (if applicable). |
| `HoodieData<T>` | `repartition(int parallelism)`: Re-partitions the underlying collection (if applicable), making sure the new HoodieData has exactly parallelism partitions. |
| `HoodieData<T>` | `union(HoodieData<T> other)`: Unions HoodieData with another instance of HoodieData. |
| `void` | `unpersist()`: Un-persists the data (if previously persisted). |
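
The class description notes that the collection can be de-referenced more than once only under the eager semantic. The following minimal sketch shows the same distinction with plain JDK types, on the assumption that the restriction stems from a `java.util.stream.Stream` being consumable only once. It does not use Hudi classes, and the names `SingleUseStreamSketch`, `secondTerminalOpFails` and `materialize` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Hudi.
class SingleUseStreamSketch {
    // Lazy-style holder: a single Stream supports only one terminal operation.
    static boolean secondTerminalOpFails(List<Integer> input) {
        Stream<Integer> lazy = input.stream().map(x -> x + 1);
        lazy.count(); // first terminal operation consumes the stream
        try {
            lazy.collect(Collectors.toList()); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // the JDK rejects re-use of an already consumed stream
        }
    }

    // Eager-style holder: results are materialized into a List, which can be re-read freely.
    static List<Integer> materialize(List<Integer> input) {
        return input.stream().map(x -> x + 1).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(secondTerminalOpFails(input)); // prints true

        List<Integer> eager = materialize(input);
        System.out.println(eager.size() + " " + eager); // prints 3 [2, 3, 4]
    }
}
```

Materializing into a List trades extra memory for the ability to run several terminal operations, which is the trade-off the two factory methods `eager` and `lazy` expose.
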
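
Methods inherited from class HoodieBaseListData: asStream

`public static <T> HoodieListData<T> eager(List<T> listData)`
Creates an instance of HoodieListData bearing *eager* execution semantic.
Type Parameters: T - type of object
Parameters: listData - a List of objects in type T
Returns: a new instance containing the List reference

`public static <T> HoodieListData<T> lazy(List<T> listData)`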
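Creates an instance of HoodieListData bearing *lazy* execution semantic.
Type Parameters: T - type of object
Parameters: listData - a List of objects in type T
Returns: a new instance containing the List reference

`public void persist(String level)`
Description copied from interface: HoodieData
Persists the data with the provided level (if applicable).
Specified by: `persist` in interface `HoodieData<T>`

`public void unpersist()`
Description copied from interface: HoodieData
Un-persists the data (if previously persisted).
Specified by: `unpersist` in interface `HoodieData<T>`

`public <O> HoodieData<O> map(SerializableFunction<T,O> func)`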
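Description copied from interface: HoodieData
Maps every element in the collection using the provided mapping func.
This is an intermediate operation.
Specified by: `map` in interface `HoodieData<T>`
Type Parameters: O - output object type
Parameters: func - serializable map function
Returns: HoodieData holding mapped elements

`public <O> HoodieData<O> mapPartitions(SerializableFunction<Iterator<T>,Iterator<O>> func, boolean preservesPartitioning)`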
Description copied from interface: HoodieData
Maps every element in the collection's partition (if applicable) by applying the provided mapping func to every partition of the collection.
This is an intermediate operation.
Specified by: `mapPartitions` in interface `HoodieData<T>`
Type Parameters: O - output object type
Parameters:
func - serializable map function accepting an Iterator of a single partition's elements and returning a new Iterator mapping every element of the partition into a new one
preservesPartitioning - whether to preserve partitioning in the resulting collection
Returns: HoodieData holding mapped elements

`public <O> HoodieData<O> flatMap(SerializableFunction<T,Iterator<O>> func)`
Description copied from interface: HoodieData
Maps every element in the collection into a collection of new elements (provided by Iterator) using the provided mapping func, subsequently flattening the result (by concatenating) into a single collection.
This is an intermediate operation.
Specified by: `flatMap` in interface `HoodieData<T>`
Type Parameters: O - output object type
Parameters: func - serializable function mapping every element T into `Iterator<O>`
Returns: HoodieData holding mapped elements

`public <K,V> HoodiePairData<K,V> mapToPair(SerializablePairFunction<T,K,V> func)`
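Description copied from interface: HoodieData
Maps every element in the collection using the provided mapping func into a Pair of elements K and V.
This is an intermediate operation.
Specified by: `mapToPair` in interface `HoodieData<T>`
Type Parameters:
K - key type of the pair
V - value type of the pair
Parameters: func - serializable map function
Returns: HoodiePairData holding mapped elements

`public HoodieData<T> distinct()`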
Description copied from interface: HoodieData
Returns a new HoodieData collection holding only distinct objects of the original one.
This is a stateful intermediate operation.
Specified by: `distinct` in interface `HoodieData<T>`

`public HoodieData<T> distinct(int parallelism)`
Description copied from interface: HoodieData
Returns a new HoodieData collection holding only distinct objects of the original one.
This is a stateful intermediate operation.
Specified by: `distinct` in interface `HoodieData<T>`

`public <O> HoodieData<T> distinctWithKey(SerializableFunction<T,O> keyGetter, int parallelism)`
Specified by: `distinctWithKey` in interface `HoodieData<T>`

`public HoodieData<T> filter(SerializableFunction<T,Boolean> filterFunc)`
Description copied from interface: HoodieData
Returns a new instance of HoodieData collection containing only the elements matching the provided filterFunc (i.e., the ones it returns true on).
Specified by: `filter` in interface `HoodieData<T>`
Parameters: filterFunc - filtering func either accepting or rejecting the elements
Returns: HoodieData holding filtered elements

`public HoodieData<T> union(HoodieData<T> other)`
Description copied from interface: HoodieData
Unions HoodieData with another instance of HoodieData.
Note that it is only able to union collections backed by the same underlying implementation.
This is a stateful intermediate operation.
Specified by: `union` in interface `HoodieData<T>`
Parameters: other - HoodieData collection
Returns: HoodieData holding superset of elements of this and other collections

`public HoodieData<T> repartition(int parallelism)`
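Description copied from interface: HoodieData
Re-partitions the underlying collection (if applicable), making sure the new HoodieData has exactly parallelism partitions.
Specified by: `repartition` in interface `HoodieData<T>`
Parameters: parallelism - target number of partitions in the underlying collection
Returns: HoodieData holding re-partitioned collection

`public boolean isEmpty()`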
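Description copied from interface: HoodieData
Returns whether the collection is empty.
Specified by: `isEmpty` in interface `HoodieData<T>`
Overrides: `isEmpty` in class `HoodieBaseListData<T>`

`public long count()`
Description copied from interface: HoodieData
Returns the number of objects held in the collection.
NOTE: This is a terminal operation.
Specified by: `count` in interface `HoodieData<T>`
Overrides: `count` in class `HoodieBaseListData<T>`

`public int getNumPartitions()`
Specified by: `getNumPartitions` in interface `HoodieData<T>`

`public List<T> collectAsList()`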
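Description copied from interface: HoodieData
Collects results of the underlying collection into a List.
This is a terminal operation.
Specified by: `collectAsList` in interface `HoodieData<T>`
Overrides: `collectAsList` in class `HoodieBaseListData<T>`

Copyright © 2022 The Apache Software Foundation. All rights reserved.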