@InternalExtensionOnly public static class CloudBigtableIO.Source extends BoundedSource<Result>
Inherited nested classes: BoundedSource.BoundedReader<T>, Source.Reader<T>

| Modifier and Type | Field and Description |
|---|---|
| protected static long | SIZED_BASED_MAX_SPLIT_COUNT |
| protected static org.slf4j.Logger | SOURCE_LOG |

| Modifier and Type | Method | Description |
|---|---|---|
| protected long | calculateEstimatedSizeBytes(PipelineOptions options) | |
| BoundedSource.BoundedReader<Result> | createReader(PipelineOptions options) | Creates a reader that will scan the entire table based on the Scan in the configuration. |
| protected CloudBigtableScanConfiguration | getConfiguration() | |
| long | getEstimatedSizeBytes(PipelineOptions options) | Gets an estimated size based on data returned from getSampleRowKeys(). |
| Coder<Result> | getOutputCoder() | |
| List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> | getSampleRowKeys() | Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached. |
| protected List<CloudBigtableIO.SourceWithKeys> | getSplits(long desiredBundleSizeBytes) | |
| protected static boolean | isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey) | Checks if the range of the region is within the range of the scan. |
| void | populateDisplayData(DisplayData.Builder builder) | |
| protected List<CloudBigtableIO.SourceWithKeys> | split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) | Splits the region based on the start and stop key. |
| List<? extends BoundedSource<Result>> | split(long desiredBundleSizeBytes, PipelineOptions options) | Splits the table based on keys that belong to tablets, known as "regions" in the HBase API. |
| void | validate() | Validates the existence of the table in the configuration. |

Inherited methods: getDefaultOutputCoder

protected static final org.slf4j.Logger SOURCE_LOG
protected static final long SIZED_BASED_MAX_SPLIT_COUNT

public List<? extends BoundedSource<Result>> split(long desiredBundleSizeBytes, PipelineOptions options) throws Exception
Splits the table based on keys that belong to tablets, known as "regions" in the HBase API. The split relies on the HBase RegionLocator interface, which calls CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) under the covers.
A CloudBigtableIO.SourceWithKeys may correspond to a single region or to a portion of a region.
If a split is smaller than a single region, it is calculated on the assumption that the data is distributed evenly between the region's startKey and stopKey. That assumption may not hold for a specific start/stop key combination.
This method is called internally by Beam. Do not call it directly.
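
The even-distribution assumption described above can be illustrated with a minimal sketch. It is not the library's implementation: the class name `EvenSplitSketch`, the helper names, and the cap value `MAX_SPLIT_COUNT` are all hypothetical, and the sketch assumes both keys are non-empty with startKey sorting before stopKey. It picks a split count from the region size and the desired bundle size, then spaces boundary keys evenly by treating the keys as unsigned integers.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EvenSplitSketch {
    // Hypothetical cap on splits per region; not the library's actual value.
    static final long MAX_SPLIT_COUNT = 4_000L;

    /** Number of pieces needed so that each piece is roughly desiredBundleSizeBytes. */
    static int splitCount(long regionSize, long desiredBundleSizeBytes) {
        // Ceiling division, clamped to the range [1, MAX_SPLIT_COUNT].
        long count = (regionSize + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
        return (int) Math.max(1L, Math.min(MAX_SPLIT_COUNT, count));
    }

    /** Returns count + 1 boundary keys spaced evenly from startKey to stopKey. */
    static List<byte[]> boundaries(byte[] startKey, byte[] stopKey, int count) {
        // Pad both keys with trailing zeros to a common width, then read them
        // as unsigned big-endian integers so they can be interpolated.
        int width = Math.max(startKey.length, stopKey.length);
        BigInteger start = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger stop = new BigInteger(1, Arrays.copyOf(stopKey, width));
        BigInteger span = stop.subtract(start);
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i <= count; i++) {
            BigInteger offset =
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(count));
            keys.add(toFixedWidth(start.add(offset), width));
        }
        return keys;
    }

    /** Converts a non-negative value back to a key of exactly width bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry an extra leading sign byte
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    public static void main(String[] args) {
        int count = splitCount(1_000L, 300L);
        for (byte[] key : boundaries(new byte[] {0x00}, new byte[] {0x40}, count)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

If the real row keys cluster near one end of the range, evenly spaced boundaries like these produce bundles of very different sizes, which is the caveat the description raises.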
split in class BoundedSource<Result>
Parameters:
desiredBundleSizeBytes - The desired size for each bundle, in bytes.
options - The pipeline options.
Throws: Exception

protected List<CloudBigtableIO.SourceWithKeys> getSplits(long desiredBundleSizeBytes) throws Exception
Throws: Exception

protected static boolean isWithinRange(byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey)
Checks if the range of the region is within the range of the scan.
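
One plausible reading of the isWithinRange check is an overlap test between a region and a scan range. The sketch below is an assumption, not the library's code: the class name `RangeOverlapSketch` is hypothetical, an empty key is taken to mean "unbounded" (the usual HBase convention), end keys are treated as exclusive, and keys are compared as unsigned bytes with `Arrays.compareUnsigned` (Java 9 or later).

```java
import java.util.Arrays;

class RangeOverlapSketch {
    /**
     * Returns true if the region [startKey, endKey) overlaps the scan range
     * [scanStartKey, scanEndKey). An empty array means "no bound" on that side.
     */
    static boolean isWithinRange(
            byte[] scanStartKey, byte[] scanEndKey, byte[] startKey, byte[] endKey) {
        // The scan must begin before the region ends ...
        boolean scanStartsBeforeRegionEnds =
            scanStartKey.length == 0
                || endKey.length == 0
                || Arrays.compareUnsigned(scanStartKey, endKey) < 0;
        // ... and the scan must end after the region begins.
        boolean scanEndsAfterRegionStarts =
            scanEndKey.length == 0
                || Arrays.compareUnsigned(scanEndKey, startKey) > 0;
        return scanStartsBeforeRegionEnds && scanEndsAfterRegionStarts;
    }

    public static void main(String[] args) {
        byte[] a = {1}, b = {2}, c = {3}, d = {4};
        // Scan [b, d) against region [a, c): the ranges share [b, c).
        System.out.println(isWithinRange(b, d, a, c));
    }
}
```

Under these assumptions, a region that ends exactly where the scan starts is not considered in range, because both end keys are exclusive.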

@InternalApi(value="For internal usage only")
public List<com.google.bigtable.repackaged.com.google.cloud.bigtable.data.v2.models.KeyOffset> getSampleRowKeys() throws IOException
Performs a call to get sample row keys from CloudBigtableServiceImpl.getSampleRowKeys(CloudBigtableTableConfiguration) if they are not yet cached. The sample row keys give information about tablet key boundaries and estimated sizes.
For internal use only; public for technical reasons.
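
To show how boundaries and sizes can be read from sample row keys, here is a minimal sketch. It assumes each sample carries a cumulative byte offset, meaning the approximate storage used by all rows before that key, so a tablet's size is the difference between consecutive offsets. The `Sample` class is a hypothetical stand-in for KeyOffset, and `SampleRowKeySketch` and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

class SampleRowKeySketch {
    /** Hypothetical stand-in for KeyOffset: a boundary key and a cumulative offset. */
    static final class Sample {
        final String key;
        final long offsetBytes;

        Sample(String key, long offsetBytes) {
            this.key = key;
            this.offsetBytes = offsetBytes;
        }
    }

    /** Size of each tablet: the gap between one cumulative offset and the previous one. */
    static long[] regionSizes(List<Sample> samples) {
        long[] sizes = new long[samples.size()];
        long previous = 0L;
        for (int i = 0; i < samples.size(); i++) {
            long offset = samples.get(i).offsetBytes;
            sizes[i] = offset - previous;
            previous = offset;
        }
        return sizes;
    }

    /** Whole-table estimate: the last cumulative offset. Ignores any Scan restriction. */
    static long estimatedTotalBytes(List<Sample> samples) {
        return samples.isEmpty() ? 0L : samples.get(samples.size() - 1).offsetBytes;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample("row-100", 100L),
            new Sample("row-200", 250L),
            new Sample("row-300", 600L));
        System.out.println(Arrays.toString(regionSizes(samples)));
        System.out.println(estimatedTotalBytes(samples));
    }
}
```

Because the total is derived from offsets for the whole table, an estimate built this way cannot shrink when a scan covers only part of the key space.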
Throws: IOException

public void validate()
Validates the existence of the table in the configuration.

public long getEstimatedSizeBytes(PipelineOptions options) throws IOException
Gets an estimated size based on data returned from getSampleRowKeys(). The estimate will be high if a Scan is set on the CloudBigtableScanConfiguration: the estimate does not take the Scan into account, so it is larger than what the CloudBigtableIO.Reader will actually read.
getEstimatedSizeBytes in class BoundedSource<Result>
Parameters:
options - The pipeline options.
Throws: IOException

protected long calculateEstimatedSizeBytes(PipelineOptions options) throws IOException
Throws: IOException

protected List<CloudBigtableIO.SourceWithKeys> split(long regionSize, long desiredBundleSizeBytes, byte[] startKey, byte[] stopKey) throws IOException
Splits the region based on the start and stop key. Uses Bytes.split(byte[], byte[], int) under the covers.
Throws: IOException

public BoundedSource.BoundedReader<Result> createReader(PipelineOptions options)
Creates a reader that will scan the entire table based on the Scan in the configuration.
createReader in class BoundedSource<Result>

protected CloudBigtableScanConfiguration getConfiguration()

public void populateDisplayData(DisplayData.Builder builder)
populateDisplayData in interface HasDisplayData
populateDisplayData in class Source<Result>