Class TopicPartitionChannel
- java.lang.Object
-
- com.snowflake.kafka.connector.internal.streaming.TopicPartitionChannel
-
public class TopicPartitionChannel extends Object
This is a wrapper on top of the Streaming Ingest Channel which is responsible for ingesting rows into Snowflake. There is a one-to-one relation between a partition and a channel.
The number of TopicPartitionChannel objects scales in proportion to the number of partitions of a topic.
Whenever a new instance is created, the cache (Map) in SnowflakeSinkService is replaced as well, and we reload the offsets from Snowflake and reset the consumer offset in Kafka.
During a rebalance this state is lost, hence the need to invoke getLatestOffsetToken against Snowflake.
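The one-to-one partition-to-channel mapping and the cache replacement described above can be sketched as follows. This is an illustrative stand-in, not the connector's actual code; the class and field names here are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch (hypothetical names, not the connector's API) of the
 * one-to-one mapping between Kafka partitions and streaming channels, and of
 * the cache replacement that forces offsets to be reloaded after a rebalance.
 */
public class ChannelCacheSketch {

    /** Stand-in for a TopicPartitionChannel; holds the offset reloaded from Snowflake. */
    static class ChannelStub {
        final String name;
        long offsetPersistedInSnowflake = -1L; // NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE

        ChannelStub(String name) { this.name = name; }
    }

    // Cache keyed by "topic-partition"; one entry per partition.
    private final Map<String, ChannelStub> cache = new ConcurrentHashMap<>();

    /** On (re)open, replace any cached channel so offsets are re-fetched from Snowflake. */
    public ChannelStub openChannel(String topic, int partition) {
        String key = topic + "-" + partition;
        ChannelStub fresh = new ChannelStub(key);
        cache.put(key, fresh); // replaces a stale entry, e.g. after a rebalance
        return fresh;
    }

    public int size() { return cache.size(); }
}
```

Reopening a channel for the same partition replaces the cache entry rather than growing the map, which mirrors the one-to-one relation above.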
-
-
Nested Class Summary
Nested Classes
- protected class TopicPartitionChannel.StreamingBuffer: a buffer which holds the rows before calling the insertRows API.
-
Field Summary
Fields
- static long NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE
-
Constructor Summary
Constructors
- TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, org.apache.kafka.common.TopicPartition topicPartition, String channelNameFormatV1, String tableName, boolean hasSchemaEvolutionPermission, BufferThreshold streamingBufferThreshold, Map<String,String> sfConnectorConfig, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, SnowflakeConnectionService conn, RecordService recordService, SnowflakeTelemetryService telemetryService, boolean enableCustomJMXMonitoring, MetricsJmxReporter metricsJmxReporter)
- TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, org.apache.kafka.common.TopicPartition topicPartition, String channelNameFormatV1, String tableName, BufferThreshold streamingBufferThreshold, Map<String,String> sfConnectorConfig, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, SnowflakeConnectionService conn, SnowflakeTelemetryService telemetryService): testing only; initializes TopicPartitionChannel without the connection service
-
Method Summary
All Methods | Static Methods | Instance Methods | Concrete Methods
- void closeChannel(): close the channel associated with this partition. Does not rethrow the connect exception because the connector will stop.
- protected long fetchOffsetTokenWithRetry(): fetches the offset token from Snowflake.
- static String generateChannelNameFormatV2(String channelNameFormatV1, String connectorName): the new channel name format that was created.
- protected long getApproxSizeOfRecordInBytes(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord): get the approximate size of a sink record received from Kafka.
- protected net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel getChannel()
- String getChannelNameFormatV1()
- protected long getLatestConsumerOffset()
- protected long getOffsetPersistedInSnowflake()
- long getOffsetSafeToCommitToKafka(): get the committed offset from Snowflake.
- long getPreviousFlushTimeStampMs()
- protected long getProcessedOffset()
- protected SnowflakeTelemetryChannelStatus getSnowflakeTelemetryChannelStatus()
- TopicPartitionChannel.StreamingBuffer getStreamingBuffer()
- protected SnowflakeTelemetryService getTelemetryServiceV2()
- protected void insertBufferedRecordsIfFlushTimeThresholdReached(): if the difference between the current time and the previous flush time exceeds the threshold, insert the buffered rows.
- void insertRecordToBuffer(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord): inserts the record into the buffer.
- boolean isChannelClosed()
- protected boolean isPartitionBufferEmpty()
- protected void setLatestConsumerOffset(long consumerOffset)
- String toString()
-
-
-
Field Detail
-
NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE
public static final long NO_OFFSET_TOKEN_REGISTERED_IN_SNOWFLAKE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
TopicPartitionChannel
public TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, org.apache.kafka.common.TopicPartition topicPartition, String channelNameFormatV1, String tableName, BufferThreshold streamingBufferThreshold, Map<String,String> sfConnectorConfig, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, SnowflakeConnectionService conn, SnowflakeTelemetryService telemetryService)
Testing only; initializes TopicPartitionChannel without the connection service.
-
TopicPartitionChannel
public TopicPartitionChannel(net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, org.apache.kafka.common.TopicPartition topicPartition, String channelNameFormatV1, String tableName, boolean hasSchemaEvolutionPermission, BufferThreshold streamingBufferThreshold, Map<String,String> sfConnectorConfig, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, SnowflakeConnectionService conn, RecordService recordService, SnowflakeTelemetryService telemetryService, boolean enableCustomJMXMonitoring, MetricsJmxReporter metricsJmxReporter)
- Parameters:
- streamingIngestClient: client created specifically for this task
- topicPartition: topic partition corresponding to this streaming channel (TopicPartitionChannel)
- channelNameFormatV1: channel name, which is deterministic for a topic and partition
- tableName: table to ingest into in Snowflake
- hasSchemaEvolutionPermission: whether the role has permission to perform schema evolution on the table
- streamingBufferThreshold: byte, record-count, and flush-time thresholds
- sfConnectorConfig: configuration set for the Snowflake connector
- kafkaRecordErrorReporter: Kafka error reporter for sending records to the DLQ
- sinkTaskContext: context on Kafka Connect's runtime
- conn: the Snowflake connection service
- recordService: record service for processing incoming offsets from Kafka
- telemetryService: telemetry service, which includes the telemetry client and sends JSON data to Snowflake
-
-
Method Detail
-
generateChannelNameFormatV2
public static String generateChannelNameFormatV2(String channelNameFormatV1, String connectorName)
This is the new channel name format that was created. The new channel name prefixes the connector name onto the old format. Please note, we will not open a channel with the new format; we run a migration function from this new channel format to the old channel format and then drop the new format.
- Parameters:
- channelNameFormatV1: original format used
- connectorName: connector name used in the Snowflake config JSON
- Returns:
- new channel name introduced as part of this change (released in version 2.1.0)
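The description says only that the V2 name prefixes the connector name onto the V1 name. A minimal sketch of that construction follows; the `"_"` separator is an assumption for illustration, not confirmed by this page.

```java
public class ChannelNameSketch {
    /**
     * Sketch of the V2 name described above: connector name prefixed onto the
     * V1 name. The "_" separator is an assumed detail, not documented here.
     */
    public static String v2Name(String channelNameFormatV1, String connectorName) {
        return connectorName + "_" + channelNameFormatV1;
    }
}
```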
-
insertRecordToBuffer
public void insertRecordToBuffer(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord)
Inserts the record into the buffer.
Step 1: Initializes this channel by fetching the offsetToken from Snowflake the first time this channel/partition receives an offset after a start/restart.
Step 2: Decides whether the given offset from Kafka needs to be processed and whether it qualifies for being added to the buffer.
- Parameters:
- kafkaSinkRecord: input record from Kafka
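The two steps above (lazy initialization from Snowflake, then an offset-eligibility check) can be sketched in plain Java. This is a simplified stand-in with hypothetical names, not the connector's implementation.

```java
/**
 * Sketch of insertRecordToBuffer's two steps (hypothetical, simplified):
 * Step 1 lazily fetches the offset token from Snowflake on the first record;
 * Step 2 only buffers offsets beyond what Snowflake has already persisted.
 */
public class InsertToBufferSketch {
    private long offsetPersistedInSnowflake = -1L; // not yet fetched
    private boolean channelInitialized = false;
    private long rowsBuffered = 0;

    /** Stand-in for fetching the offset token from Snowflake on first use. */
    private long fetchOffsetToken() { return 41L; } // pretend offset 41 is persisted

    /** Returns true if the record was added to the buffer. */
    public boolean insert(long kafkaOffset) {
        if (!channelInitialized) {                       // Step 1: first record after start/restart
            offsetPersistedInSnowflake = fetchOffsetToken();
            channelInitialized = true;
        }
        if (kafkaOffset <= offsetPersistedInSnowflake) { // Step 2: skip already-ingested offsets
            return false;
        }
        rowsBuffered++;
        return true;
    }

    public long rowsBuffered() { return rowsBuffered; }
}
```

With offset 41 already persisted, records at or below 41 are skipped and only later offsets are buffered.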
-
insertBufferedRecordsIfFlushTimeThresholdReached
protected void insertBufferedRecordsIfFlushTimeThresholdReached()
If the difference between the current time and the previous flush time exceeds the threshold, insert the buffered rows. Note: We acquire the buffer lock since we copy the buffer.
The threshold is the config parameter SnowflakeSinkConnectorConfig.BUFFER_FLUSH_TIME_SEC.
Previous flush time here means the last time we called the insertRows API with the rows present in the buffer.
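The time-threshold check described above amounts to comparing elapsed time against the configured flush interval. A minimal sketch, with hypothetical names and the actual insertRows call elided:

```java
/**
 * Sketch (hypothetical, simplified) of the flush-time check: flush when the
 * time since the previous flush reaches the configured threshold.
 */
public class FlushTimeSketch {
    private long previousFlushTimeStampMs;
    private final long flushTimeThresholdMs;

    public FlushTimeSketch(long flushTimeThresholdMs, long nowMs) {
        this.flushTimeThresholdMs = flushTimeThresholdMs;
        this.previousFlushTimeStampMs = nowMs;
    }

    /** Returns true (and resets the timestamp) when the threshold is reached. */
    public boolean maybeFlush(long nowMs) {
        if (nowMs - previousFlushTimeStampMs >= flushTimeThresholdMs) {
            previousFlushTimeStampMs = nowMs; // the buffered rows would be inserted here
            return true;
        }
        return false;
    }
}
```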
-
getOffsetSafeToCommitToKafka
public long getOffsetSafeToCommitToKafka()
Get the committed offset from Snowflake. It makes an HTTP call internally to find out the last offset inserted.
If the committedOffset fetched from Snowflake is null, we return -1 (the default value of committedOffset) to the original caller; -1 results in an empty map of partition and offset being returned to Kafka.
Otherwise, we convert this offset and return the offset which is safe to commit inside Kafka (+1 of the returned value).
See SnowflakeSinkTask.preCommit(Map).
Note: if we cannot fetch the offsetToken from Snowflake even after retries and reopening the channel, we throw an appropriate exception.
- Returns:
- (offsetToken present in Snowflake + 1), else -1
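The null-to--1 and +1 conversion described above is small enough to state exactly. A sketch, with hypothetical names:

```java
/**
 * Sketch (hypothetical names) of the offset conversion described above:
 * a null offset token from Snowflake maps to -1; otherwise the offset safe
 * to commit in Kafka is the persisted offset token plus one.
 */
public class SafeOffsetSketch {
    public static final long NO_OFFSET = -1L;

    public static long offsetSafeToCommit(Long committedOffsetFromSnowflake) {
        if (committedOffsetFromSnowflake == null) {
            return NO_OFFSET; // caller returns an empty partition/offset map to Kafka
        }
        return committedOffsetFromSnowflake + 1;
    }
}
```

The +1 reflects Kafka Connect's preCommit convention: the committed offset is the next offset to consume, not the last one ingested.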
-
fetchOffsetTokenWithRetry
protected long fetchOffsetTokenWithRetry()
Fetches the offset token from Snowflake. It uses the Failsafe library, which implements retries, fallbacks, and circuit breakers.
Here is how Failsafe is used:
- Fetch the offsetToken from Snowflake (Streaming API).
- If it returns a valid offset number, that number is returned to the caller.
- If SFException is thrown, we retry at most 3 times (including the original try).
- Upon reaching the maxRetries limit, we fall back to opening a channel and fetching the offsetToken again. Please note, executing the fallback may itself throw an exception; in that case we will not retry.
- Returns:
- long offset token present in Snowflake for this channel/partition
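The retry-then-fallback policy above can be mimicked in plain Java without the Failsafe dependency. This sketch is illustrative only (the real code uses Failsafe's policies); `RuntimeException` stands in for `SFException`.

```java
import java.util.function.LongSupplier;

/**
 * Plain-Java sketch of the Failsafe policy described above: attempt the fetch
 * up to maxAttempts times (original try included); if every attempt fails,
 * run the fallback exactly once, and do NOT retry the fallback.
 */
public class RetryWithFallbackSketch {

    public static long fetchWithRetry(LongSupplier fetch, LongSupplier fallback, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.getAsLong(); // e.g. offset token from the streaming API
            } catch (RuntimeException sfExceptionStandIn) {
                // swallow and retry until attempts are exhausted
            }
        }
        // Fallback: reopen the channel and fetch again; exceptions here propagate.
        return fallback.getAsLong();
    }
}
```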
-
closeChannel
public void closeChannel()
Close the channel associated with this partition. Not rethrowing the connect exception because the connector will stop; the channel will eventually be reopened.
-
isChannelClosed
public boolean isChannelClosed()
-
getStreamingBuffer
public TopicPartitionChannel.StreamingBuffer getStreamingBuffer()
-
getPreviousFlushTimeStampMs
public long getPreviousFlushTimeStampMs()
-
getChannelNameFormatV1
public String getChannelNameFormatV1()
-
getOffsetPersistedInSnowflake
protected long getOffsetPersistedInSnowflake()
-
getProcessedOffset
protected long getProcessedOffset()
-
getLatestConsumerOffset
protected long getLatestConsumerOffset()
-
isPartitionBufferEmpty
protected boolean isPartitionBufferEmpty()
-
getChannel
protected net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel getChannel()
-
getTelemetryServiceV2
protected SnowflakeTelemetryService getTelemetryServiceV2()
-
getSnowflakeTelemetryChannelStatus
protected SnowflakeTelemetryChannelStatus getSnowflakeTelemetryChannelStatus()
-
setLatestConsumerOffset
protected void setLatestConsumerOffset(long consumerOffset)
-
getApproxSizeOfRecordInBytes
protected long getApproxSizeOfRecordInBytes(org.apache.kafka.connect.sink.SinkRecord kafkaSinkRecord)
Get the approximate size of the sink record we get from Kafka. This is useful to find out how much data (records) we have buffered per channel/partition. It is an approximate size since there is no API available to find out the size of a record.
We first serialize the incoming Kafka record into a JSON format and estimate that size.
Please note, the size we calculate here is not accurate and does not match the actual size of the Kafka record we buffer in memory. (A Kafka sink record carries a lot of other metadata that is discarded when we calculate the size of the JSON record.)
We also do the same processing just before calling the insertRows API for the buffered rows.
The downside of this calculation is that we might try to buffer more records while JVM memory is close to full.
- Parameters:
- kafkaSinkRecord: sink record received as-is from Kafka (with the connector-specific converter having been invoked)
- Returns:
- approximate size of the record in bytes, or 0 if the record is broken
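The serialize-then-measure approach above can be sketched without the connector's record service. This is a hypothetical simplification: it builds a JSON-like string from the record's content and measures its UTF-8 byte length, ignoring Kafka metadata just as the description says.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Sketch (hypothetical, simplified) of the approximation described above:
 * serialize record content to a JSON-like string and measure its UTF-8 byte
 * length. Broken (null) records count as 0. Kafka metadata is deliberately
 * ignored, so this underestimates the in-memory footprint.
 */
public class RecordSizeSketch {

    public static long approxSizeInBytes(Map<String, ?> recordContent) {
        if (recordContent == null) {
            return 0L; // "0 if the record is broken"
        }
        StringBuilder json = new StringBuilder("{");
        recordContent.forEach((k, v) ->
            json.append('"').append(k).append("\":\"").append(v).append("\","));
        if (json.length() > 1) {
            json.setLength(json.length() - 1); // drop trailing comma
        }
        json.append('}');
        return json.toString().getBytes(StandardCharsets.UTF_8).length;
    }
}
```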
-
-