@Experimental public class CloudBigtableIO extends Object
PTransforms for reading and writing Google Cloud Bigtable entities in a Beam pipeline.
Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years--it's the database driving major applications such as Google Analytics and Gmail.
To use CloudBigtableIO, users must use gcloud to get a credential for Cloud Bigtable:
$ gcloud auth login
To read a PCollection from a table, with an optional Scan, use read(CloudBigtableScanConfiguration):
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline p = Pipeline.create(options);
PCollection<Result> results = p.apply(
Read.from(CloudBigtableIO.read(
new CloudBigtableScanConfiguration.Builder()
.withProjectId("project-id")
.withInstanceId("instance-id")
.withTableId("table-id")
.build())));
To write a PCollection to a table, use writeToTable(CloudBigtableTableConfiguration):
PipelineOptions options =
PipelineOptionsFactory.fromArgs(args).create();
Pipeline p = Pipeline.create(options);
PCollection<Mutation> mutationCollection = ...;
mutationCollection.apply(
CloudBigtableIO.writeToTable(
new CloudBigtableTableConfiguration.Builder()
.withProjectId("project-id")
.withInstanceId("instance-id")
.withTableId("table-id")
.build()));
| Modifier and Type | Class and Description |
|---|---|
| static class | CloudBigtableIO.CloudBigtableMultiTableWriteFn: A DoFn that can write either a bounded or unbounded PCollection of KV of (String tableName, List of Mutations) to the specified table. |
| static class | CloudBigtableIO.CloudBigtableSingleTableBufferedWriteFn: A DoFn that can write either a bounded or unbounded PCollection of Mutations to a table specified via a CloudBigtableTableConfiguration using the BufferedMutator. |
| static class | CloudBigtableIO.Source |
| protected static class | CloudBigtableIO.SourceWithKeys: A BoundedSource for a Cloud Bigtable Table with a start/stop key range, along with a potential filter via a Scan. |
| Constructor and Description |
|---|
| CloudBigtableIO() |
| Modifier and Type | Method and Description |
|---|---|
| static BoundedSource<Result> | read(CloudBigtableScanConfiguration config) |
| static PTransform<PCollection<KV<String,Iterable<Mutation>>>,PDone> | writeToMultipleTables(CloudBigtableConfiguration config): Creates a PTransform that can write either a bounded or unbounded PCollection of KV of (String tableName, List of Mutations) to the specified table. |
| static PTransform<PCollection<Mutation>,PDone> | writeToTable(CloudBigtableTableConfiguration config): Creates a PTransform that can write either a bounded or unbounded PCollection of Mutations to a table specified via a CloudBigtableTableConfiguration. |
public static PTransform<PCollection<Mutation>,PDone> writeToTable(CloudBigtableTableConfiguration config)
Creates a PTransform that can write either a bounded or unbounded PCollection
of Mutations to a table specified via a CloudBigtableTableConfiguration.
NOTE: This PTransform will write Puts and Deletes, not Appends and Increments.
This limitation exists because if the batch fails partway through, Appends/Increments might be
re-run, causing the Mutation to be executed twice, which is never the user's intent.
Re-running a Delete changes nothing. Re-running a Put isn't normally a problem,
but might cause problems in some cases when the number of versions supported by the column
family is greater than one. In a case where multiple versions could be a problem, it's best to
add a timestamp to the Put.
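The retry reasoning in the note above can be illustrated with a small toy model. This is only a sketch: RetryModel and all of its methods are hypothetical names invented for illustration, it uses plain Java collections, and it does not call the Cloud Bigtable or HBase client. It models one cell as a map from timestamp to value and replays each kind of mutation once, as a failed batch might.

```java
import java.util.TreeMap;

/** Toy in-memory model of one cell; NOT the Cloud Bigtable or HBase client API. */
final class RetryModel {
    // timestamp -> value, modelling a column family that keeps more than one version
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    // Stand-in for a server-assigned timestamp: always newer than anything stored.
    private long nextServerTimestamp() {
        return versions.isEmpty() ? 1L : versions.lastKey() + 1L;
    }

    /** Put without a timestamp: every call, including a replay, adds a new version. */
    void putWithServerTimestamp(long value) {
        versions.put(nextServerTimestamp(), value);
    }

    /** Put with a caller-supplied timestamp: a replay overwrites the same version. */
    void putWithTimestamp(long timestamp, long value) {
        versions.put(timestamp, value);
    }

    /** Increment is read-modify-write, so a replay adds the delta a second time. */
    void increment(long delta) {
        versions.put(nextServerTimestamp(), latest() + delta);
    }

    /** Delete removes every version; deleting an already empty cell changes nothing. */
    void delete() {
        versions.clear();
    }

    int versionCount() {
        return versions.size();
    }

    long latest() {
        return versions.isEmpty() ? 0L : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        RetryModel stamped = new RetryModel();
        stamped.putWithTimestamp(1000L, 7L);
        stamped.putWithTimestamp(1000L, 7L); // simulated retry
        System.out.println("Put with timestamp, versions: " + stamped.versionCount()); // 1

        RetryModel unstamped = new RetryModel();
        unstamped.putWithServerTimestamp(7L);
        unstamped.putWithServerTimestamp(7L); // simulated retry
        System.out.println("Put without timestamp, versions: " + unstamped.versionCount()); // 2

        RetryModel counter = new RetryModel();
        counter.increment(5L);
        counter.increment(5L); // simulated retry
        System.out.println("Increment by 5, latest: " + counter.latest()); // 10
    }
}
```

In this model only the timestamped Put and the Delete leave the cell unchanged when replayed, which matches the advice to add a timestamp to the Put when multiple versions could be a problem.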
public static PTransform<PCollection<KV<String,Iterable<Mutation>>>,PDone> writeToMultipleTables(CloudBigtableConfiguration config)
Creates a PTransform that can write either a bounded or unbounded PCollection
of KV of (String tableName, List of Mutations) to the specified table.
NOTE: This PTransform will write Puts and Deletes, not Appends and Increments.
This limitation exists because if the batch fails partway through, Appends/Increments might be
re-run, causing the Mutation to be executed twice, which is never the user's intent.
Re-running a Delete changes nothing. Re-running a Put isn't normally a problem,
but might cause problems in some cases when the number of versions supported by the column
family is greater than one. In a case where multiple versions could be a problem, it's best to
add a timestamp to the Put.
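The input shape that writeToMultipleTables expects, a table name paired with the mutations destined for that table, can be sketched with plain Java collections. TableGrouping, Routed, and groupByTable are hypothetical names used only for illustration, and the mutations are represented as strings rather than HBase Mutation objects. In an actual pipeline the grouping would more likely be done with a Beam transform such as GroupByKey; this sketch only shows the resulting structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for (tableName, mutations) pairs; not Beam or HBase types. */
final class TableGrouping {
    /** Hypothetical record: a destination table plus an opaque mutation description. */
    static final class Routed {
        final String tableName;
        final String mutation;

        Routed(String tableName, String mutation) {
            this.tableName = tableName;
            this.mutation = mutation;
        }
    }

    /** Collects mutations per destination table, preserving first-seen table order. */
    static Map<String, List<String>> groupByTable(List<Routed> routed) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (Routed r : routed) {
            byTable.computeIfAbsent(r.tableName, k -> new ArrayList<>()).add(r.mutation);
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<Routed> routed = Arrays.asList(
            new Routed("users", "put row1"),
            new Routed("events", "put row9"),
            new Routed("users", "delete row2"));

        // Each entry mirrors one KV of (String tableName, list of mutations).
        for (Map.Entry<String, List<String>> e : groupByTable(routed).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```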
public static BoundedSource<Result> read(CloudBigtableScanConfiguration config)