Class SnowflakeSinkServiceV2

  • All Implemented Interfaces:
    SnowflakeSinkService

    public class SnowflakeSinkServiceV2
    extends Object
    implements SnowflakeSinkService
    This is the per-task service implementation. A task can be assigned multiple partitions. The major methods are startTask, insert, getOffset and close.

    StartTask: Called when partitions are assigned. Responsible for generating the POJOs.

    Insert and getOffset are called when SnowflakeSinkTask.put(Collection) and SnowflakeSinkTask.preCommit(Map) APIs are called.

    This implementation of SinkService uses Snowpipe Streaming (streaming ingestion).

    Hence this service initializes, opens, and closes the channel. The StreamingIngestChannel resides inside a TopicPartitionChannel, which is per partition.
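A minimal sketch of the per-task lifecycle described above: one channel per assigned partition, cached in a partitionsToChannel map, with getOffset returning -1 for an empty partition and close wiping the cache on rebalance. All class and field names here are illustrative stand-ins, not the connector's real internals.

```java
import java.util.HashMap;
import java.util.Map;

public class LifecycleSketch {
    // Stand-in for TopicPartitionChannel holding a streaming channel's state.
    static class Channel {
        long committedOffset = -1; // -1 means nothing loaded yet
        void insertRow(long offset) { committedOffset = offset; }
    }

    private final Map<String, Channel> partitionsToChannel = new HashMap<>();

    // startPartition: open a channel for a newly assigned partition
    void startPartition(String topic, int partition) {
        partitionsToChannel.computeIfAbsent(topic + "_" + partition, k -> new Channel());
    }

    // insert: route a record to its partition's channel
    void insert(String topic, int partition, long offset) {
        partitionsToChannel.get(topic + "_" + partition).insertRow(offset);
    }

    // getOffset: last committed offset, or -1 if empty
    long getOffset(String topic, int partition) {
        Channel c = partitionsToChannel.get(topic + "_" + partition);
        return c == null ? -1 : c.committedOffset;
    }

    // close: wipe the cache during rebalance; channels are reopened in open()
    void close() { partitionsToChannel.clear(); }

    public static void main(String[] args) {
        LifecycleSketch svc = new LifecycleSketch();
        svc.startPartition("orders", 0);
        svc.insert("orders", 0, 41);
        System.out.println(svc.getOffset("orders", 0)); // 41
        svc.close();
        System.out.println(svc.getOffset("orders", 0)); // -1 after rebalance
    }
}
```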

    • Method Detail

      • startPartition

        public void startPartition​(String tableName,
                                   org.apache.kafka.common.TopicPartition topicPartition)
        Creates the table in Snowflake if it doesn't exist.

        Initializes the channel and puts a new instance of TopicPartitionChannel into the partitionsToChannel map.

        Specified by:
        startPartition in interface SnowflakeSinkService
        Parameters:
        tableName - destination table name
        topicPartition - TopicPartition passed from Kafka
      • startPartitions

        public void startPartitions​(Collection<org.apache.kafka.common.TopicPartition> partitions,
                                    Map<String,​String> topic2Table)
        Initializes multiple channels and populates the partitionsToChannel map with new instances of TopicPartitionChannel.
        Specified by:
        startPartitions in interface SnowflakeSinkService
        Parameters:
        partitions - collection of topic partitions
        topic2Table - map of topic to table name
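A hypothetical sketch of what startPartitions does: resolve each partition's destination table from the topic2Table map (falling back to the topic name when no mapping exists) and start that partition. The fallback behavior and printed trace are illustrative assumptions, not the connector's exact logic.

```java
import java.util.List;
import java.util.Map;

public class StartPartitionsSketch {
    public static void main(String[] args) {
        // topic -> table mapping, as passed to startPartitions
        Map<String, String> topic2Table = Map.of("orders", "ORDERS_TABLE");
        List<String[]> partitions = List.of(
            new String[]{"orders", "0"},
            new String[]{"clicks", "1"});
        for (String[] tp : partitions) {
            // assumed fallback: use the topic name when no table is mapped
            String table = topic2Table.getOrDefault(tp[0], tp[0]);
            System.out.println("startPartition(" + table + ", " + tp[0] + "-" + tp[1] + ")");
        }
    }
}
```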
      • insert

        public void insert​(Collection<org.apache.kafka.connect.sink.SinkRecord> records)
        Inserts the given records into the buffer and eventually calls the insertRows API once the buffer threshold has been reached.

        TODO: SNOW-473896 - Please note we will do away with the buffering logic in future commits.

        Specified by:
        insert in interface SnowflakeSinkService
        Parameters:
        records - records coming from Kafka. Note that they are not necessarily from a single topic and partition; the Kafka Connect worker node can consume from multiple topics and multiple partitions.
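The threshold-based buffering described above can be sketched as follows. The record-count threshold and the flush call are illustrative stand-ins for the connector's internal buffer fields, which this Javadoc does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferSketch {
    // assumed record-count threshold; the real connector's thresholds
    // (count, bytes, time) are configurable
    static final int RECORD_THRESHOLD = 3;
    static final List<Long> buffer = new ArrayList<>();

    static void insert(long offset) {
        buffer.add(offset);
        if (buffer.size() >= RECORD_THRESHOLD) {
            // stand-in for calling the channel's insertRows API
            System.out.println("insertRows called with " + buffer.size() + " records");
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        for (long offset = 0; offset < 7; offset++) insert(offset);
        System.out.println("records still buffered: " + buffer.size());
    }
}
```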
      • insert

        public void insert​(org.apache.kafka.connect.sink.SinkRecord record)
        Inserts an individual record into the buffer. It fetches the TopicPartitionChannel from the map, and each partition (streaming channel) calls its respective insertRows API.
        Specified by:
        insert in interface SnowflakeSinkService
        Parameters:
        record - record content
      • getOffset

        public long getOffset​(org.apache.kafka.common.TopicPartition topicPartition)
        Description copied from interface: SnowflakeSinkService
        retrieve offset of last loaded record for given pipe name
        Specified by:
        getOffset in interface SnowflakeSinkService
        Parameters:
        topicPartition - topic and partition
        Returns:
        offset, or -1 for empty
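A sketch of how a caller such as preCommit might consume getOffset: for each assigned partition, fetch the last loaded offset and commit it back to Kafka, skipping partitions that report -1 (nothing loaded yet). The map-based stand-in for the service is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PreCommitSketch {
    // stand-in for the service's per-partition committed offsets
    static final Map<String, Long> loaded = Map.of("orders-0", 42L, "orders-1", -1L);

    static long getOffset(String topicPartition) {
        return loaded.getOrDefault(topicPartition, -1L);
    }

    public static void main(String[] args) {
        Map<String, Long> toCommit = new HashMap<>();
        for (String tp : new String[]{"orders-0", "orders-1"}) {
            long offset = getOffset(tp);
            if (offset >= 0) toCommit.put(tp, offset); // -1 means skip: nothing loaded
        }
        System.out.println(toCommit);
    }
}
```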
      • close

        public void close​(Collection<org.apache.kafka.common.TopicPartition> partitions)
        This function is called during rebalance.

        All the channels are closed. The client is still active. Upon rebalance (inside SnowflakeSinkTask.open(Collection)), we will reopen the channels.

        We will wipe the partitionsToChannel cache so that SnowflakeSinkTask.open(Collection) re-instantiates the channels and fetches the offsetToken.

        Specified by:
        close in interface SnowflakeSinkService
        Parameters:
        partitions - a list of topic partition
      • setRecordNumber

        public void setRecordNumber​(long num)
        Description copied from interface: SnowflakeSinkService
        change the maximum number of records cached in the buffer to control the flush rate; 0 for unlimited
        Specified by:
        setRecordNumber in interface SnowflakeSinkService
        Parameters:
        num - a non-negative long representing the record count limit
      • setFileSize

        public void setFileSize​(long size)
        Assume this is the buffer size in bytes, since this is streaming ingestion.
        Specified by:
        setFileSize in interface SnowflakeSinkService
        Parameters:
        size - in bytes; a non-negative long representing the size of the internal buffer to flush
      • getFileSize

        public long getFileSize()
        This is more accurately the size in bytes of buffered records. It doesn't necessarily translate to the files created by Streaming Ingest, since those are compressed, so there is no 1:1 mapping.
        Specified by:
        getFileSize in interface SnowflakeSinkService
        Returns:
        current file size limitation
      • partitionChannelKey

        public static String partitionChannelKey​(String connectorName,
                                                 String topic,
                                                 int partition)
        Gets a unique identifier consisting of connector name, topic name and partition number.
        Parameters:
        connectorName - connector name. Connector names are always unique. (Two connectors with the same name are not allowed by the Connect framework.)

        Note: customers can run a connector with the same name in different connector runtimes (like DEV or PROD).

        topic - topic name
        partition - partition number
        Returns:
        combination of connector name, topic and partition
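A sketch of such a key function. The exact delimiter and ordering are implementation details the Javadoc does not state; underscore-joining the three components is assumed here purely for illustration.

```java
public class ChannelKeySketch {
    // assumed format: connectorName_topic_partition
    static String partitionChannelKey(String connectorName, String topic, int partition) {
        return connectorName + "_" + topic + "_" + partition;
    }

    public static void main(String[] args) {
        System.out.println(partitionChannelKey("my_connector", "orders", 0));
    }
}
```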
      • getTopicPartitionChannelFromCacheKey

        protected Optional<TopicPartitionChannel> getTopicPartitionChannelFromCacheKey​(String topicPartitionChannelKey)
        Used for testing only.
        Parameters:
        topicPartitionChannelKey - see partitionChannelKey(String, String, int) for the key format
        Returns:
        an Optional containing the TopicPartitionChannel if present in the partitionsToChannel map, else an empty Optional