public class BatchAppenderator extends Object implements Appenderator
This class resulted from splitting AppenderatorImpl. For historical reasons, segment creation for both batch and stream ingestion was handled by the same code path in that class. The code was correct, but memory-inefficient for batch ingestion: if the input file being processed by batch ingestion produced enough sinks and hydrants, the task could run out of memory either in the hydrant-creation (append) phase of this class or in the hydrant-merge phase. Therefore this class, BatchAppenderator, was created to specialize in batch ingestion, and the old class used for stream ingestion was renamed to StreamAppenderator.

Nested classes inherited from interface Appenderator
Appenderator.AppenderatorAddResult

| Modifier and Type | Field and Description |
|---|---|
| static int | ROUGH_OVERHEAD_PER_HYDRANT |
| static int | ROUGH_OVERHEAD_PER_SINK |
| Modifier and Type | Method and Description |
|---|---|
| Appenderator.AppenderatorAddResult | add(SegmentIdWithShardSpec identifier, InputRow row, com.google.common.base.Supplier<Committer> committerSupplier, boolean allowIncrementalPersists): Add a row. |
| void | clear(): Drop all in-memory and on-disk data, and forget any previously-remembered commit metadata. |
| void | close(): Stop any currently-running processing and clean up after ourselves. |
| void | closeNow(): Stop all processing, abandoning current pushes. A currently running persist may be allowed to finish if it persists critical metadata; otherwise, shut down immediately. |
| com.google.common.util.concurrent.ListenableFuture<?> | drop(SegmentIdWithShardSpec identifier): Schedule dropping all data associated with a particular pending segment. |
| long | getBytesCurrentlyInMemory() |
| long | getBytesInMemory(SegmentIdWithShardSpec identifier) |
| String | getDataSource(): Return the name of the dataSource associated with this Appenderator. |
| String | getId(): Return the identifier of this Appenderator; useful for log messages and such. |
| List<SegmentIdWithShardSpec> | getInMemorySegments() |
| List<File> | getPersistedidentifierPaths() |
| <T> QueryRunner<T> | getQueryRunnerForIntervals(Query<T> query, Iterable<org.joda.time.Interval> intervals) |
| <T> QueryRunner<T> | getQueryRunnerForSegments(Query<T> query, Iterable<SegmentDescriptor> specs) |
| int | getRowCount(SegmentIdWithShardSpec identifier): Returns the number of rows in a particular pending segment. |
| int | getRowsInMemory() |
| List<SegmentIdWithShardSpec> | getSegments(): Returns all active segments, regardless of whether they are in memory or persisted. |
| int | getTotalRowCount(): Returns the total number of rows in this appenderator across all segments pending push. |
| com.google.common.util.concurrent.ListenableFuture<Object> | persistAll(Committer committer): Persist any in-memory indexed data to durable storage. |
| com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> | push(Collection<SegmentIdWithShardSpec> identifiers, Committer committer, boolean useUniquePath): Merge and push particular segments to deep storage. |
| Object | startJob(): Perform any initial setup. |
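Taken together, the methods above imply a typical batch lifecycle: startJob, repeated add calls, then push and close. The following pseudocode-style sketch in Java illustrates that ordering only; the `appenderator`, `segmentId`, and `rows` variables are hypothetical, and error handling and tuning are omitted:

```
appenderator.startJob();                        // perform any initial setup

for (InputRow row : rows) {
  // Batch ingestion typically allows incremental persists so memory stays bounded.
  // A null committerSupplier means no commit metadata is persisted.
  appenderator.add(segmentId, row, null, true);
}

// Merge hydrants and push all pending segments to deep storage.
ListenableFuture<SegmentsAndCommitMetadata> pushFuture =
    appenderator.push(appenderator.getSegments(), null, false);
pushFuture.get();                               // block until the push completes

appenderator.close();                           // stop processing and clean up
```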
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Appenderator
add

Field Detail

public static final int ROUGH_OVERHEAD_PER_SINK

public static final int ROUGH_OVERHEAD_PER_HYDRANT
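These rough-overhead constants exist because each sink and hydrant carries bookkeeping cost beyond its index data. As an illustration only (the constant values and counts below are made up for the example, not Druid's actual numbers), a per-task memory estimate might combine them like this:

```java
// Illustrative sketch: combine actual index bytes with rough per-sink and
// per-hydrant overhead to estimate total bytes currently in memory.
public class MemoryEstimate {
    static long estimate(long indexBytes, int sinkCount, int hydrantCount,
                         long overheadPerSink, long overheadPerHydrant) {
        // Overhead grows linearly with the number of sinks and hydrants,
        // which is why a large batch input can exhaust memory.
        return indexBytes
                + (long) sinkCount * overheadPerSink
                + (long) hydrantCount * overheadPerHydrant;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1 MB of index data, 2 sinks, 10 hydrants.
        System.out.println(estimate(1_000_000, 2, 10, 5_000, 1_000)); // prints 1020000
    }
}
```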
Method Detail

public String getId()
Return the identifier of this Appenderator; useful for log messages and such.
Specified by: getId in interface Appenderator

public String getDataSource()
Return the name of the dataSource associated with this Appenderator.
Specified by: getDataSource in interface Appenderator

public Object startJob()
Perform any initial setup.
Specified by: startJob in interface Appenderator

public Appenderator.AppenderatorAddResult add(SegmentIdWithShardSpec identifier, InputRow row, @Nullable com.google.common.base.Supplier<Committer> committerSupplier, boolean allowIncrementalPersists) throws IndexSizeExceededException, SegmentNotWritableException
Add a row. If no pending segment exists for the provided identifier, a new one will be created.
This method may trigger an Appenderator.persistAll(Committer) using the supplied Committer. If it does, the Committer is guaranteed to be *created* synchronously with the call to add, but will actually be used asynchronously.
If committer is not provided, no metadata is persisted.
Specified by: add in interface Appenderator
Parameters:
identifier - the segment into which this row should be added
row - the row to add
committerSupplier - supplier of a committer associated with all data that has been added, including this row. If allowIncrementalPersists is set to false, this will not be used, since no persist will be done automatically.
allowIncrementalPersists - indicates whether an automatic persist should be performed if required. If this flag is set to false, the return value will have Appenderator.AppenderatorAddResult.isPersistRequired set to true when a persist was skipped because of this flag, and the responsibility of calling Appenderator.persistAll(Committer) falls on the caller.
Returns: Appenderator.AppenderatorAddResult
Throws:
IndexSizeExceededException - if this row cannot be added because it is too large
SegmentNotWritableException - if the requested segment is known, but has been closed

public List<SegmentIdWithShardSpec> getSegments()
Returns all active segments, regardless of whether they are in memory or persisted.
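The allowIncrementalPersists contract for add can be honored on the caller's side roughly as follows. This is a pseudocode-style sketch in Java, not Druid's own code; the `appenderator`, `segmentId`, `row`, and `committerSupplier` variables are hypothetical:

```
// When automatic persists are disabled, the caller owns the persist decision.
Appenderator.AppenderatorAddResult result =
    appenderator.add(segmentId, row, committerSupplier, false);

if (result.isPersistRequired()) {
  // add() skipped a needed persist because allowIncrementalPersists was false;
  // the caller must now trigger the persist explicitly.
  appenderator.persistAll(committerSupplier.get());
}
```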
Specified by: getSegments in interface Appenderator

public List<SegmentIdWithShardSpec> getInMemorySegments()

public int getRowCount(SegmentIdWithShardSpec identifier)
Returns the number of rows in a particular pending segment.
Specified by: getRowCount in interface Appenderator
Parameters: identifier - segment to examine

public int getTotalRowCount()
Returns the total number of rows in this appenderator across all segments pending push.
Specified by: getTotalRowCount in interface Appenderator

public int getRowsInMemory()

public long getBytesCurrentlyInMemory()

public long getBytesInMemory(SegmentIdWithShardSpec identifier)

public <T> QueryRunner<T> getQueryRunnerForIntervals(Query<T> query, Iterable<org.joda.time.Interval> intervals)
Specified by: getQueryRunnerForIntervals in interface QuerySegmentWalker

public <T> QueryRunner<T> getQueryRunnerForSegments(Query<T> query, Iterable<SegmentDescriptor> specs)
Specified by: getQueryRunnerForSegments in interface QuerySegmentWalker

public void clear()
Drop all in-memory and on-disk data, and forget any previously-remembered commit metadata.
Specified by: clear in interface Appenderator

public com.google.common.util.concurrent.ListenableFuture<?> drop(SegmentIdWithShardSpec identifier)
Schedule dropping all data associated with a particular pending segment. Unlike Appenderator.clear(), any on-disk commit metadata will remain unchanged. If there is no pending segment with this identifier, then this method will do nothing.
You should not write to the dropped segment after calling "drop". If you need to drop all your data and re-write it, consider Appenderator.clear() instead.
This method might be called concurrently from a thread different from the "main data appending / indexing thread", from which all other methods in this class (except those inherited from QuerySegmentWalker) are called. This typically happens when drop() is invoked in an async future callback. drop() itself is cheap and relays the heavy dropping work to an internal executor of this Appenderator.
Specified by: drop in interface Appenderator
Parameters: identifier - the pending segment to drop

public com.google.common.util.concurrent.ListenableFuture<Object> persistAll(@Nullable Committer committer)
Persist any in-memory indexed data to durable storage. If committer is not provided, no metadata is persisted.
Specified by: persistAll in interface Appenderator
Parameters: committer - a committer associated with all data that has been added so far

public com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> push(Collection<SegmentIdWithShardSpec> identifiers, @Nullable Committer committer, boolean useUniquePath)
Merge and push particular segments to deep storage. This will trigger an implicit Appenderator.persistAll(Committer) using the provided Committer.
After this method is called, you cannot add new data to any segments that were previously under construction.
If committer is not provided, no metadata is persisted.
Specified by: push in interface Appenderator
Parameters:
identifiers - list of segments to push
committer - a committer associated with all data that has been added so far
useUniquePath - true if the segment should be written to a path with a unique identifier

public void close()
Stop any currently-running processing and clean up after ourselves.
Specified by: close in interface Appenderator

public void closeNow()
Stop all processing, abandoning current pushes. A currently running persist may be allowed to finish if it persists critical metadata; otherwise, shut down immediately.
Specified by: closeNow in interface Appenderator

Copyright © 2011–2021 The Apache Software Foundation. All rights reserved.