protected static class CloudBigtableIO.SourceWithKeys extends BoundedSource<Result>
BoundedSource for a Cloud Bigtable Table with a start/stop key range, along
with a potential filter via a Scan.

Nested classes inherited from class BoundedSource: BoundedSource.BoundedReader<T>
Nested classes inherited from class Source: Source.Reader<T>

| Modifier and Type | Field and Description |
|---|---|
| protected static long | SIZED_BASED_MAX_SPLIT_COUNT |
| protected static org.slf4j.Logger | SOURCE_LOG |

| Modifier | Constructor and Description |
|---|---|
| protected | SourceWithKeys(CloudBigtableScanConfiguration configuration, long estimatedSize) |

| Modifier and Type | Method and Description |
|---|---|
| protected long | calculateEstimatedSizeBytes(PipelineOptions options) |
| BoundedSource.BoundedReader<Result> | createReader(PipelineOptions options) Creates a reader that will scan the entire table based on the Scan in the configuration. |
| protected CloudBigtableScanConfiguration | getConfiguration() |
| long | getEstimatedSize() |
| long | getEstimatedSizeBytes(PipelineOptions options) Gets an estimated size based on data returned from getSampleRowKeys(). |
| Coder<Result> | getOutputCoder() |
| List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> | getSampleRowKeys() Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached. |
| protected List<CloudBigtableIO.SourceWithKeys> | getSplits(long desiredBundleSizeBytes) |
| protected static boolean | isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey) Checks if the range of the region is within the range of the scan. |
| void | populateDisplayData(DisplayData.Builder builder) |
| protected List<CloudBigtableIO.SourceWithKeys> | split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) Splits the region based on the start and stop key. |
| List<? extends BoundedSource<Result>> | split(long desiredBundleSizeBytes, PipelineOptions options) Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey. |
| String | toString() |
| void | validate() Validates the existence of the table in the configuration. |

Methods inherited from class Source: getDefaultOutputCoder

protected static final org.slf4j.Logger SOURCE_LOG

protected static final long SIZED_BASED_MAX_SPLIT_COUNT

protected SourceWithKeys(CloudBigtableScanConfiguration configuration, long estimatedSize)

protected long calculateEstimatedSizeBytes(PipelineOptions options) throws IOException
Throws: IOException

public long getEstimatedSize()

public List<? extends BoundedSource<Result>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception
Splits the bundle based on the assumption that the data is distributed evenly between startKey and stopKey. This method is called internally by Beam. Do not call it directly.
Specified by: split in class BoundedSource<Result>
Parameters:
desiredBundleSizeBytes - The desired size for each bundle, in bytes.
options - The pipeline options.
Throws: Exception

protected List<CloudBigtableIO.SourceWithKeys> getSplits(long desiredBundleSizeBytes) throws Exception
Throws: Exception

protected static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
Checks if the range of the region is within the range of the scan.
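
The range check described for isWithinRange can be illustrated with a minimal sketch. This is not the library's implementation. It assumes that "within" means the region's key range overlaps the scan's key range, that an empty byte array stands for an unbounded start or end, and that keys compare as unsigned bytes in lexicographic order. The class name KeyRangeSketch and the method name overlapsScan are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of CloudBigtableIO.
final class KeyRangeSketch {

  // Returns true if the region [startKey, endKey) overlaps the scan
  // [scanStartKey, scanEndKey). Assumption: an empty array means "unbounded".
  static boolean overlapsScan(
      byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey) {
    // The region must end after the scan starts (end keys are exclusive).
    boolean regionEndsAfterScanStart =
        scanStartKey.length == 0
            || endKey.length == 0
            || Arrays.compareUnsigned(scanStartKey, endKey) < 0;
    // The region must start before the scan ends.
    boolean regionStartsBeforeScanEnd =
        scanEndKey.length == 0 || Arrays.compareUnsigned(scanEndKey, startKey) > 0;
    return regionEndsAfterScanStart && regionStartsBeforeScanEnd;
  }
}
```

Under these assumptions, a scan over ["b", "m") overlaps a region ["a", "c") but not a region ["m", "z"), because the scan's end key is exclusive. Arrays.compareUnsigned requires Java 9 or later.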

@InternalApi(value="For internal usage only") public List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> getSampleRowKeys() throws IOException
Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
For internal use only - public for technical reasons.
Throws: IOException

public void validate()
Validates the existence of the table in the configuration.
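
To show how sample row keys can yield size estimates, as noted for getSampleRowKeys(), here is a minimal sketch. It assumes each sample carries a cumulative byte offset (the approximate storage used by all rows preceding the sampled key), so the data between two consecutive samples is roughly the difference of their offsets. The class SampleOffsetsSketch and its methods are hypothetical, and plain long values stand in for the KeyOffset objects.

```java
// Hypothetical sketch, not part of CloudBigtableIO.
final class SampleOffsetsSketch {

  // Given cumulative offsets in sample order, returns the approximate number
  // of bytes between each sample and the one before it.
  static long[] sizesBetweenSamples(long[] offsetBytes) {
    long[] sizes = new long[offsetBytes.length];
    long previous = 0;
    for (int i = 0; i < offsetBytes.length; i++) {
      sizes[i] = offsetBytes[i] - previous;
      previous = offsetBytes[i];
    }
    return sizes;
  }

  // The last cumulative offset approximates the size of everything sampled.
  static long totalEstimate(long[] offsetBytes) {
    return offsetBytes.length == 0 ? 0 : offsetBytes[offsetBytes.length - 1];
  }
}
```

For example, cumulative offsets of 100, 250, and 600 bytes would give segment sizes of 100, 150, and 350 bytes and a total estimate of 600 bytes.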

public long getEstimatedSizeBytes(PipelineOptions options) throws IOException
Gets an estimated size based on data returned from getSampleRowKeys(). The estimate will be high if a Scan is set on the CloudBigtableScanConfiguration: the estimate does not take the Scan into account, so it is larger than what the CloudBigtableIO.Reader will actually read.
Specified by: getEstimatedSizeBytes in class BoundedSource<Result>
Parameters:
options - The pipeline options.
Throws: IOException

protected List<CloudBigtableIO.SourceWithKeys> split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) throws IOException
Splits the region based on the start and stop key. Uses Bytes.split(byte[], byte[], int) under the covers.
Throws: IOException

public BoundedSource.BoundedReader<Result> createReader(PipelineOptions options)
Creates a reader that will scan the entire table based on the Scan in the configuration.
Specified by: createReader in class BoundedSource<Result>

protected CloudBigtableScanConfiguration getConfiguration()
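
The even-distribution assumption behind split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) can be sketched as follows. This is an illustration only, not the library's code and not a reproduction of Bytes.split. It assumes the number of pieces is the region size divided by the desired bundle size, rounded up, and that boundary keys are spaced evenly when the keys are read as unsigned big-endian integers. It also assumes both keys are non-empty and startKey sorts before stopKey. The class EvenSplitSketch and its methods are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of CloudBigtableIO.
final class EvenSplitSketch {

  // Number of pieces needed so each piece is at most desiredBundleSizeBytes.
  static int splitCount(long regionSize, long desiredBundleSizeBytes) {
    long count = (regionSize + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
    return (int) Math.max(1, count);
  }

  // Returns parts + 1 boundary keys, from startKey to stopKey inclusive,
  // evenly spaced over the numeric value of the zero-padded keys.
  static List<byte[]> boundaries(byte[] startKey, byte[] stopKey, int parts) {
    int width = Math.max(startKey.length, stopKey.length);
    BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, width));
    BigInteger hi = new BigInteger(1, Arrays.copyOf(stopKey, width));
    BigInteger span = hi.subtract(lo);
    List<byte[]> keys = new ArrayList<>();
    for (int i = 0; i <= parts; i++) {
      BigInteger offset = span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts));
      keys.add(toFixedWidth(lo.add(offset), width));
    }
    return keys;
  }

  // Converts a non-negative value to exactly `width` big-endian bytes.
  static byte[] toFixedWidth(BigInteger value, int width) {
    byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
    byte[] out = new byte[width];
    int copy = Math.min(raw.length, width);
    System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
    return out;
  }
}
```

With a region of 1000 bytes and a desired bundle size of 300 bytes, splitCount returns 4. Splitting the one-byte range 0x00 to 0x40 into 4 parts gives the boundaries 0x00, 0x10, 0x20, 0x30, and 0x40.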

public void populateDisplayData(DisplayData.Builder builder)
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class Source<Result>