Class TopicPartitionChannel


  • public class TopicPartitionChannel
    extends Object
    This is a wrapper on top of a Streaming Ingest Channel, responsible for ingesting rows into Snowflake.

    There is a one-to-one relation between a partition and a channel.

    The number of TopicPartitionChannel objects scales in proportion to the number of partitions of a topic.

    Whenever a new instance is created, the cache (Map) in SnowflakeSinkService is also replaced; we reload the offsets from Snowflake and reset the consumer offset in Kafka.

    During a rebalance we lose this state, hence the need to invoke getLatestOffsetToken from Snowflake.
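As an illustration of this one-to-one mapping (a minimal sketch; the class name, map type, and channel-naming scheme below are assumptions, not the connector's internals), a sink task can keep a map from partition number to channel, where re-putting an entry after a rebalance replaces the stale channel before offsets are reloaded from Snowflake:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionChannelMapSketch {
    // Hypothetical deterministic channel name for a topic/partition pair; the
    // real naming scheme lives in the connector, not here.
    static String channelNameFor(String topicName, int partition) {
        return topicName + "_" + partition;
    }

    public static void main(String[] args) {
        // One-to-one map from partition to channel; re-putting an entry after a
        // rebalance replaces the stale channel, after which offsets are reloaded
        // from Snowflake and the consumer offset is reset in Kafka.
        Map<Integer, String> partitionToChannel = new HashMap<>();
        for (int p : new int[] {0, 1, 2}) {
            partitionToChannel.put(p, channelNameFor("orders", p));
        }
        System.out.println(partitionToChannel.get(2)); // orders_2
    }
}
```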

    • Field Detail

      • NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE

        public static final long NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE
        See Also:
        Constant Field Values
    • Constructor Detail

      • TopicPartitionChannel

        public TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient,
                                     org.apache.kafka.common.TopicPartition topicPartition,
                                     String channelName,
                                     String tableName,
                                     BufferThreshold streamingBufferThreshold,
                                     Map<String,String> sfConnectorConfig,
                                     KafkaRecordErrorReporter kafkaRecordErrorReporter,
                                     org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext,
                                     SnowflakeTelemetryService telemetryService)
        Testing only: initializes TopicPartitionChannel without the connection service.
      • TopicPartitionChannel

        public TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient,
                                     org.apache.kafka.common.TopicPartition topicPartition,
                                     String channelName,
                                     String tableName,
                                     boolean hasSchemaEvolutionPermission,
                                     BufferThreshold streamingBufferThreshold,
                                     Map<String,String> sfConnectorConfig,
                                     KafkaRecordErrorReporter kafkaRecordErrorReporter,
                                     org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext,
                                     SnowflakeConnectionService conn,
                                     RecordService recordService,
                                     SnowflakeTelemetryService telemetryService,
                                     boolean enableCustomJMXMonitoring,
                                     MetricsJmxReporter metricsJmxReporter)
        Parameters:
        streamingIngestClient - client created specifically for this task
        topicPartition - topic partition corresponding to this Streaming Channel (TopicPartitionChannel)
        channelName - channel name, which is deterministic for the topic and partition
        tableName - table to ingest into in Snowflake
        hasSchemaEvolutionPermission - whether the role has permission to perform schema evolution on the table
        streamingBufferThreshold - byte, record-count, and flush-time thresholds
        sfConnectorConfig - configuration set for the Snowflake connector
        kafkaRecordErrorReporter - Kafka error reporter for sending records to the DLQ
        sinkTaskContext - context on Kafka Connect's runtime
        conn - the Snowflake connection service
        recordService - record service for processing incoming offsets from Kafka
        telemetryService - Telemetry Service, which includes the Telemetry Client and sends JSON data to Snowflake
    • Method Detail

      • insertRecordToBuffer

        public void insertRecordToBuffer(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord)
        Inserts the record into the buffer.

        Step 1: Initializes this channel by fetching the offsetToken from Snowflake the first time this channel/partition receives an offset after a start/restart.

        Step 2: Decides whether the given offset from Kafka needs to be processed and whether it qualifies for being added to the buffer.

        Parameters:
        kafkaSinkRecord - input record from Kafka
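A minimal sketch of the Step 2 decision (the helper name and the exact comparison are assumptions mirroring the description above, not the connector source):

```java
public class OffsetFilterSketch {
    // Sentinel used when Snowflake has no offset token registered yet; the
    // actual value of NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE is assumed here.
    static final long NO_OFFSET = -1L;

    // Hypothetical helper: an incoming Kafka offset qualifies for buffering only
    // if it is past the offset already persisted in Snowflake.
    static boolean shouldBuffer(long incomingOffset, long offsetPersistedInSnowflake) {
        return incomingOffset > offsetPersistedInSnowflake;
    }

    public static void main(String[] args) {
        System.out.println(shouldBuffer(10, 9));        // true: new data
        System.out.println(shouldBuffer(9, 9));         // false: already persisted
        System.out.println(shouldBuffer(0, NO_OFFSET)); // true: fresh channel
    }
}
```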
      • insertBufferedRecordsIfFlushTimeThresholdReached

        protected void insertBufferedRecordsIfFlushTimeThresholdReached()
        If the difference between the current time and the previous flush time exceeds the threshold, insert the buffered rows.

        Note: We acquire the buffer lock since we copy the buffer.

        The threshold is the config parameter SnowflakeSinkConnectorConfig.BUFFER_FLUSH_TIME_SEC.

        Previous flush time here means the last time we called the insertRows API with the rows present in the buffer.
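The threshold check can be sketched as follows (hypothetical helper name; the real check lives in BufferThreshold):

```java
public class FlushTimeSketch {
    // Hypothetical mirror of the flush-time check: flush when the time elapsed
    // since the previous successful insertRows call reaches the configured
    // buffer flush time (seconds).
    static boolean flushTimeThresholdReached(long nowMs, long previousFlushTimeMs, long flushTimeSec) {
        return (nowMs - previousFlushTimeMs) >= flushTimeSec * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(flushTimeThresholdReached(130_000, 0, 120)); // true
        System.out.println(flushTimeThresholdReached(60_000, 0, 120));  // false
    }
}
```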

      • getOffsetSafeToCommitToKafka

        public long getOffsetSafeToCommitToKafka()
        Get the committed offset from Snowflake. It makes an HTTP call internally to find out the last offset inserted.

        If the committedOffset fetched from Snowflake is null, we return -1 (the default value of committedOffset) to the original caller; -1 causes an empty map of partitions and offsets to be returned to Kafka.

        Otherwise, we convert this offset and return the offset that is safe to commit in Kafka (+1 of the fetched value).

        Check SnowflakeSinkTask.preCommit(Map)

        Note:

        If we cannot fetch the offsetToken from Snowflake even after retries and reopening the channel, we will throw an exception.

        Returns:
        (offsetToken present in Snowflake + 1), else -1
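The return contract above can be sketched as follows (hypothetical helper; the null handling and +1 mirror the description, the constant's value is assumed):

```java
public class CommitOffsetSketch {
    static final long NO_OFFSET = -1L; // assumed default value of committedOffset

    // Hypothetical mirror of the contract: a null offset token from Snowflake
    // maps to -1; otherwise the offset safe to commit in Kafka is the
    // persisted offset + 1.
    static long offsetSafeToCommit(Long offsetTokenFromSnowflake) {
        return offsetTokenFromSnowflake == null ? NO_OFFSET : offsetTokenFromSnowflake + 1;
    }

    public static void main(String[] args) {
        System.out.println(offsetSafeToCommit(null)); // -1
        System.out.println(offsetSafeToCommit(41L));  // 42
    }
}
```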
      • fetchOffsetTokenWithRetry

        protected long fetchOffsetTokenWithRetry()
        Fetches the offset token from Snowflake.

        It uses the Failsafe library, which implements retries, fallbacks and circuit breakers.

        Here is how Failsafe is used.

        Fetches the offsetToken from Snowflake (Streaming API)

        If it returns a valid offset number, that number is returned to the caller.

        If an SFException is thrown, we will retry at most 3 times (including the original try).

        Upon reaching the limit of maxRetries, we fall back to opening the channel and fetching the offsetToken again.

        Please note, upon executing the fallback we might throw an exception too; in that case, however, we will not retry.

        Returns:
        long offset token present in snowflake for this channel/partition.
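The policy above can be sketched with a hand-rolled retry loop (the connector itself uses the Failsafe library, not this code; RuntimeException stands in for SFException):

```java
import java.util.function.Supplier;

public class RetryFallbackSketch {
    static final int MAX_ATTEMPTS = 3; // "at most 3 times (including the original try)"

    // Hand-rolled stand-in for the retry-then-fallback policy described above.
    static long fetchWithRetry(Supplier<Long> fetchOffsetToken, Supplier<Long> reopenChannelFallback) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return fetchOffsetToken.get();
            } catch (RuntimeException e) {
                // swallow and retry until the attempt limit is reached
            }
        }
        // Fallback: reopen the channel and fetch again. If this throws, we do not retry.
        return reopenChannelFallback.get();
    }

    public static void main(String[] args) {
        System.out.println(fetchWithRetry(() -> 42L, () -> -1L)); // 42
    }
}
```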
      • closeChannel

        public void closeChannel()
        Close the channel associated with this partition. We do not rethrow the connect exception because the connector will stop; the channel will eventually be reopened.
      • isChannelClosed

        public boolean isChannelClosed()
      • getPreviousFlushTimeStampMs

        public long getPreviousFlushTimeStampMs()
      • getChannelName

        public String getChannelName()
      • getOffsetPersistedInSnowflake

        protected long getOffsetPersistedInSnowflake()
      • getProcessedOffset

        protected long getProcessedOffset()
      • getLatestConsumerOffset

        protected long getLatestConsumerOffset()
      • isPartitionBufferEmpty

        protected boolean isPartitionBufferEmpty()
      • getChannel

        protected net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel getChannel()
      • setLatestConsumerOffset

        protected void setLatestConsumerOffset(long consumerOffset)
      • getApproxSizeOfRecordInBytes

        protected long getApproxSizeOfRecordInBytes(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord)
        Gets the approximate size of the SinkRecord we get from Kafka. This is useful for finding out how much data (records) we have buffered per channel/partition.

        This is an approximate size since there is no API available to find out the size of a record.

        We first serialize the incoming Kafka record into JSON format and estimate its size.

        Please note, the size we calculate here is not accurate and doesn't match the actual size of the Kafka record we buffer in memory. (A Kafka SinkRecord carries a lot of other metadata that is discarded when we calculate the size of the JSON record.)

        We also do the same processing just before calling the insertRows API for the buffered rows.

        The downside of this calculation is that we might try to buffer more records while the JVM is close to running out of memory.

        Parameters:
        kafkaSinkRecord - sink record received as-is from Kafka (with the connector-specific converter already invoked)
        Returns:
        Approximate size of the record in bytes; 0 if the record is broken
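The approximation can be sketched as follows (hypothetical helper; the connector's actual JSON serialization is not shown, so the inputs here are already-serialized key and value strings):

```java
import java.nio.charset.StandardCharsets;

public class RecordSizeSketch {
    // Hypothetical stand-in: once a record's key and value have been serialized
    // to JSON strings, the approximate size is their UTF-8 byte length.
    // SinkRecord metadata is deliberately ignored, so this undercounts the
    // actual in-memory size of the record.
    static long approxSizeBytes(String keyJson, String valueJson) {
        long size = 0;
        if (keyJson != null) size += keyJson.getBytes(StandardCharsets.UTF_8).length;
        if (valueJson != null) size += valueJson.getBytes(StandardCharsets.UTF_8).length;
        return size; // 0 if the record is broken (nothing serializable)
    }

    public static void main(String[] args) {
        System.out.println(approxSizeBytes("{\"id\":1}", "{\"v\":2}")); // 15
        System.out.println(approxSizeBytes(null, null));               // 0
    }
}
```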