public abstract class PrefetchSqlFirehoseFactory<T> extends Object implements FiniteFirehoseFactory<InputRowParser<Map<String,Object>>,T>
- Caching: on the first call of connect(InputRowParser, File), it caches objects on local disk
up to maxCacheCapacityBytes. These caches are NOT deleted until the process terminates, so they can be reused for
later reads.
- Fetching: once all cached data has been read, it fetches the remaining objects to local disk and reads data from
them. For performance, prefetching is used: when the size of the remaining fetched data falls
below FetchConfig.prefetchTriggerBytes, a background prefetch thread automatically starts fetching the remaining
objects.
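The trigger condition in the Fetching item can be illustrated with a minimal, standalone sketch. The class and method names (PrefetchTriggerSketch, shouldStartPrefetch) and the byte values are hypothetical and are not part of the Druid API; only the comparison against prefetchTriggerBytes comes from the description above.

```java
// Standalone illustration only; this is NOT the Druid implementation.
class PrefetchTriggerSketch {

    /**
     * Returns true when a background fetch should start: some objects are
     * still unfetched and the unread, already-fetched bytes have dropped
     * below the trigger threshold.
     */
    static boolean shouldStartPrefetch(long remainingFetchedBytes,
                                       long prefetchTriggerBytes,
                                       boolean objectsRemaining) {
        return objectsRemaining && remainingFetchedBytes < prefetchTriggerBytes;
    }

    public static void main(String[] args) {
        long trigger = 512L * 1024 * 1024; // hypothetical threshold for the demo

        // Plenty of fetched data left to read: no prefetch yet.
        System.out.println(shouldStartPrefetch(800L * 1024 * 1024, trigger, true));
        // Fetched data is running low and objects remain: start prefetching.
        System.out.println(shouldStartPrefetch(100L * 1024 * 1024, trigger, true));
        // Nothing left to fetch: never start a prefetch.
        System.out.println(shouldStartPrefetch(0L, trigger, false));
    }
}
```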
This implementation aims to avoid maintaining a persistent connection to the database by prefetching the result set to disk.
Prefetching can be turned on or off by setting maxFetchCapacityBytes. Depending on whether prefetching is enabled or
disabled, the firehose behaves as follows.
1. If prefetch is enabled, this firehose can fetch input objects in the background.
2. When next() is called, it first checks whether there are already fetched files in local storage.
2.1 If there are, it simply chooses a fetched file and returns a LineIterator reading that file.
2.2 If there are no fetched files in local storage but some objects remain to be read, the firehose
fetches one of the input objects in the background immediately. Finally, the firehose returns a JsonIterator
for deserializing the saved result set.
3. If prefetch is disabled, the firehose saves the result set to a file and returns a JsonIterator
that directly reads the stream opened by openObjectStream(T, java.io.File). If an IOException occurs, it is thrown
and the read fails.
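The branching described in the list above can be summarized as a small decision function. This is a sketch under stated assumptions, not Druid code: NextFlowSketch, the Action enum, and decide are hypothetical names, and the NOTHING_LEFT case (no files and no objects) is added only to make the function total.

```java
// Standalone illustration of the decision flow; NOT the Druid implementation.
class NextFlowSketch {

    /** Hypothetical labels for the branches in the numbered list. */
    enum Action {
        READ_FETCHED_FILE,     // step 2.1: a fetched file already exists locally
        FETCH_THEN_READ,       // step 2.2: nothing fetched yet, objects remain
        SAVE_AND_READ_STREAM,  // step 3: prefetch disabled, read the opened stream
        NOTHING_LEFT           // assumption: no fetched files and no objects left
    }

    static Action decide(boolean prefetchEnabled, int fetchedFiles, int remainingObjects) {
        if (prefetchEnabled) {
            if (fetchedFiles > 0) {
                return Action.READ_FETCHED_FILE;
            }
            if (remainingObjects > 0) {
                return Action.FETCH_THEN_READ;
            }
            return Action.NOTHING_LEFT;
        }
        // Prefetch disabled: each remaining object is saved and read directly.
        return remainingObjects > 0 ? Action.SAVE_AND_READ_STREAM : Action.NOTHING_LEFT;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 2, 5));   // READ_FETCHED_FILE
        System.out.println(decide(true, 0, 5));   // FETCH_THEN_READ
        System.out.println(decide(false, 0, 5));  // SAVE_AND_READ_STREAM
        System.out.println(decide(true, 0, 0));   // NOTHING_LEFT
    }
}
```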
| Constructor and Description |
|---|
| PrefetchSqlFirehoseFactory(Long maxCacheCapacityBytes, Long maxFetchCapacityBytes, Long prefetchTriggerBytes, Long fetchTimeout, com.fasterxml.jackson.databind.ObjectMapper objectMapper) |
| Modifier and Type | Method and Description |
|---|---|
| Firehose | connect(InputRowParser<Map<String,Object>> firehoseParser, File temporaryDirectory) |
| long | getFetchTimeout() |
| long | getMaxCacheCapacityBytes() |
| long | getMaxFetchCapacityBytes() |
| int | getNumSplits(SplitHintSpec splitHintSpec) |
| List<T> | getObjects() |
| long | getPrefetchTriggerBytes() |
| Stream<InputSplit<T>> | getSplits(SplitHintSpec splitHintSpec) |
| protected void | initializeObjectsIfNeeded() |
| protected abstract Collection<T> | initObjects() |
| protected abstract InputStream | openObjectStream(T object, File filename) Open an input stream from the given object. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface FiniteFirehoseFactory: isSplittable, withSplit
Methods inherited from interface FirehoseFactory: connect, connectForSampler

public long getMaxCacheCapacityBytes()

public long getMaxFetchCapacityBytes()

public long getPrefetchTriggerBytes()

public long getFetchTimeout()

public Firehose connect(InputRowParser<Map<String,Object>> firehoseParser, @Nullable File temporaryDirectory)
Specified by: connect in interface FirehoseFactory<InputRowParser<Map<String,Object>>>

protected void initializeObjectsIfNeeded()

public Stream<InputSplit<T>> getSplits(@Nullable SplitHintSpec splitHintSpec)
Specified by: getSplits in interface FiniteFirehoseFactory<InputRowParser<Map<String,Object>>,T>

public int getNumSplits(@Nullable SplitHintSpec splitHintSpec)
Specified by: getNumSplits in interface FiniteFirehoseFactory<InputRowParser<Map<String,Object>>,T>

protected abstract InputStream openObjectStream(T object, File filename) throws IOException
Parameters: object - an object to be read
filename - file into which the object is fetched
Throws: IOException

protected abstract Collection<T> initObjects()
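The two abstract hooks, initObjects() and openObjectStream(T, File), can be illustrated with a simplified standalone analog. Everything here is an assumption made for illustration: ObjectSource is a stand-in and not the Druid class, StringSource is a hypothetical subclass, the lazy, call-once behavior of initializeObjectsIfNeeded() is inferred from its name, and writing the object to the file before reading it back is only one plausible reading of "file into which the object is fetched".

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Standalone illustration of the hook pattern; NOT the Druid classes.
class HookPatternSketch {

    /** Simplified stand-in exposing the same two abstract hooks. */
    abstract static class ObjectSource<T> {
        private List<T> objects; // populated lazily

        protected abstract Collection<T> initObjects();

        protected abstract InputStream openObjectStream(T object, File filename) throws IOException;

        // Assumed behavior: initObjects() runs only on first use.
        protected void initializeObjectsIfNeeded() {
            if (objects == null) {
                objects = new ArrayList<>(initObjects());
            }
        }

        List<T> getObjects() {
            initializeObjectsIfNeeded();
            return objects;
        }
    }

    /** Hypothetical subclass: each object is a string standing in for a query result. */
    static class StringSource extends ObjectSource<String> {
        int initCalls = 0; // counts how often initObjects() was invoked

        @Override
        protected Collection<String> initObjects() {
            initCalls++;
            return Arrays.asList("row-a", "row-b");
        }

        @Override
        protected InputStream openObjectStream(String object, File filename) throws IOException {
            // Save the object to the given file, then stream it back from disk.
            Files.write(filename.toPath(), object.getBytes(StandardCharsets.UTF_8));
            return Files.newInputStream(filename.toPath());
        }
    }

    /** Fetches the first object into a temp file and returns its content. */
    static String readFirst() {
        StringSource source = new StringSource();
        try {
            File tmp = File.createTempFile("fetched", ".tmp");
            tmp.deleteOnExit();
            try (InputStream in = source.openObjectStream(source.getObjects().get(0), tmp)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirst());
    }
}
```

In this sketch, calling getObjects() repeatedly invokes initObjects() only once, which is the kind of guard the name initializeObjectsIfNeeded() suggests.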
Copyright © 2011–2020 The Apache Software Foundation. All rights reserved.