trait StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging the creation of
a table before committing the table's metadata along with its contents in CREATE TABLE AS
SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS
SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE
TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first
drop the table via TableCatalog#dropTable(Identifier), then create the table via
TableCatalog#createTable(Identifier, StructType, Transform[], Map), and then perform
the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo).
However, if the write operation fails, the catalog will have already dropped the table, and the
planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the
creation and the replacement of a table. After the table's
BatchWrite#commit(WriterCommitMessage[]) is called,
StagedTable#commitStagedChanges() is called, at which point the staged table can
complete both the data write and the metadata swap operation atomically.
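The difference between the two paths can be illustrated with a small in-memory model. This is only a sketch: `ToyCatalog`, `replaceNonAtomic`, and `replaceStaged` are hypothetical names invented for illustration and are not part of the Spark API. The model assumes a table is just a named vector of rows and that a write either returns the new rows or throws.

```scala
import scala.collection.mutable

// Toy model only: none of these names exist in Spark.
object StagedReplaceSketch {
  // Catalog state: table name -> rows.
  final class ToyCatalog {
    val tables: mutable.Map[String, Vector[Int]] = mutable.Map.empty
  }

  // Non-atomic path: drop, create, then write.
  // If the write fails, the original table is already gone.
  def replaceNonAtomic(cat: ToyCatalog, name: String, write: () => Vector[Int]): Boolean = {
    cat.tables.remove(name)          // drop: cannot be rolled back
    cat.tables(name) = Vector.empty  // create the replacement table
    try {
      cat.tables(name) = write()     // perform the write
      true
    } catch {
      case _: RuntimeException =>
        cat.tables.remove(name)      // cleanup removes only the new, empty table
        false
    }
  }

  // Staged path: write into a staging area, then swap in one step.
  // If the write fails, the live table is never touched.
  def replaceStaged(cat: ToyCatalog, name: String, write: () => Vector[Int]): Boolean = {
    try {
      val staged = write()           // data lands in the staging area first
      cat.tables(name) = staged      // "commit": contents and metadata swap together
      true
    } catch {
      case _: RuntimeException => false // "abort": nothing to undo
    }
  }
}
```

In this model a failing write leaves the catalog without the table on the non-atomic path, while the staged path keeps the original rows.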
- Annotations
- @Evolving()
- Since
3.0.0
Abstract Value Members
- abstract def alterTable(ident: Identifier, changes: TableChange*): Table
Apply a set of changes to a table in the catalog.
Implementations may reject the requested changes. If any change is rejected, none of the changes should be applied to the table.
The requested changes must be applied in the order given.
If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- ident
a table identifier
- changes
changes to apply to the table
- returns
updated metadata for the table
- Definition Classes
- TableCatalog
- Exceptions thrown
IllegalArgumentException: If any change is rejected by the implementation.
NoSuchTableException: If the table doesn't exist or is a view.
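The all-or-nothing, in-order contract of alterTable can be sketched with a toy model. The `Change`, `SetProp`, `RemoveProp`, and `alter` names below are hypothetical stand-ins, not Spark's `TableChange` API, and the rejection rule (an empty property key) is an assumption chosen only to have something to reject.

```scala
// Toy model only: these types are stand-ins, not Spark's TableChange.
object AlterTableSketch {
  sealed trait Change
  final case class SetProp(key: String, value: String) extends Change
  final case class RemoveProp(key: String) extends Change

  // Applies the changes in the order given to an immutable copy of the
  // properties. If any change is rejected, the exception escapes before a
  // result is produced, so the caller's original map is left as it was.
  def alter(props: Map[String, String], changes: Change*): Map[String, String] =
    changes.foldLeft(props) {
      case (_, SetProp(k, _)) if k.isEmpty =>
        throw new IllegalArgumentException("empty property key") // assumed rejection rule
      case (acc, SetProp(k, v)) => acc.updated(k, v)
      case (acc, RemoveProp(k)) => acc - k
    }
}
```

Because the fold works on an immutable map, a real catalog following this shape would only publish the new metadata after every change has been accepted.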
- abstract def createTable(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): Table
Create a table in the catalog.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceException: If the identifier namespace does not exist (optional).
TableAlreadyExistsException: If a table or view already exists for the identifier.
UnsupportedOperationException: If a requested partition transform is not supported.
- abstract def dropTable(ident: Identifier): Boolean
Drop a table in the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
- ident
a table identifier
- returns
true if a table was deleted, false if no table exists for the identifier
- Definition Classes
- TableCatalog
- abstract def initialize(name: String, options: CaseInsensitiveStringMap): Unit
Called to initialize configuration.
This method is called once, just after the provider is instantiated.
- name
the name used to identify and load this catalog
- options
a case-insensitive string map of configuration
- Definition Classes
- CatalogPlugin
- abstract def listTables(namespace: Array[String]): Array[Identifier]
List the tables in a namespace from the catalog.
If the catalog supports views, this must return identifiers for only tables and not views.
- namespace
a multi-part namespace
- returns
an array of Identifiers for tables
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceException: If the namespace does not exist (optional).
- abstract def loadTable(ident: Identifier): Table
Load table metadata by identifier from the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- ident
a table identifier
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableException: If the table doesn't exist or is a view.
- abstract def name(): String
Called to get this catalog's name.
This method is only called after initialize(String, CaseInsensitiveStringMap) is called to pass the catalog's name.
- Definition Classes
- CatalogPlugin
- abstract def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit
Renames a table in the catalog.
If the catalog supports views and contains a view for the old identifier and not a table, this throws NoSuchTableException. Additionally, if the new identifier is a table or a view, this throws TableAlreadyExistsException.
If the catalog does not support table renames between namespaces, it throws UnsupportedOperationException.
- oldIdent
the table identifier of the existing table to rename
- newIdent
the new table identifier of the table
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableException: If the table to rename doesn't exist or is a view.
TableAlreadyExistsException: If the new table name already exists or is a view.
UnsupportedOperationException: If the namespaces of old and new identifiers do not match (optional).
- abstract def stageCreate(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation of a table, preparing it to be committed into the metastore.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by StagedTable#commitStagedChanges().
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException: If the identifier namespace does not exist (optional).
TableAlreadyExistsException: If a table or view already exists for the identifier.
UnsupportedOperationException: If a requested partition transform is not supported.
- abstract def stageCreateOrReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.
If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of stageReplace(Identifier, StructType, Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException: If the identifier namespace does not exist (optional).
UnsupportedOperationException: If a requested partition transform is not supported.
- abstract def stageReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.
If the table does not exist, committing the staged changes should fail with NoSuchTableException. This differs from the semantics of stageCreateOrReplace(Identifier, StructType, Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException: If the identifier namespace does not exist (optional).
NoSuchTableException: If the table does not exist.
UnsupportedOperationException: If a requested partition transform is not supported.
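The three staging methods differ mainly in what should happen at commit time depending on whether the table exists. The sketch below models only that commit-time decision. `ToyMetastore`, `Mode`, and `commit` are hypothetical names, not Spark types, and errors are returned as `Left` values rather than thrown to keep the example short.

```scala
import scala.collection.mutable

// Toy model only: a stand-in for the commit-time check, not Spark's StagedTable.
object StageSemanticsSketch {
  sealed trait Mode
  case object Create extends Mode          // models stageCreate
  case object Replace extends Mode         // models stageReplace
  case object CreateOrReplace extends Mode // models stageCreateOrReplace

  final class ToyMetastore {
    // Table name -> contents.
    val tables: mutable.Map[String, String] = mutable.Map.empty

    // Decide at commit time, based on whether the table exists right now.
    def commit(mode: Mode, name: String, contents: String): Either[String, Unit] =
      (mode, tables.contains(name)) match {
        case (Create, true)   => Left("table already exists") // e.g. created concurrently
        case (Replace, false) => Left("no such table")        // nothing to replace
        case _ =>
          tables(name) = contents // create or replace in a single step
          Right(())
      }
  }
}
```

Only the create-or-replace mode succeeds in both states. The other two each reject exactly one of them.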
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def defaultNamespace(): Array[String]
Return a default namespace for the catalog.
When this catalog is set as the current catalog, the namespace returned by this method will be set as the current namespace.
The namespace returned by this method is not required to exist.
- returns
a multi-part namespace
- Definition Classes
- CatalogPlugin
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def invalidateTable(ident: Identifier): Unit
Invalidate cached table metadata for an identifier.
If the table is already loaded or cached, drop cached data. If the table does not exist or is not cached, do nothing. Calling this method should not query remote services.
- ident
a table identifier
- Definition Classes
- TableCatalog
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def purgeTable(ident: Identifier): Boolean
Drop a table in the catalog and completely remove its data, skipping the trash even if one is supported.
If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
If the catalog supports purging a table, this method should be overridden. The default implementation throws UnsupportedOperationException.
- ident
a table identifier
- returns
true if a table was deleted, false if no table exists for the identifier
- Definition Classes
- TableCatalog
- Since
3.1.0
- Exceptions thrown
UnsupportedOperationException: If table purging is not supported.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tableExists(ident: Identifier): Boolean
Test whether a table exists using an identifier from the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must return false.
- ident
a table identifier
- returns
true if the table exists, false otherwise
- Definition Classes
- TableCatalog
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()