Class SnowflakeSinkServiceV2
- java.lang.Object
-
- com.snowflake.kafka.connector.internal.streaming.SnowflakeSinkServiceV2
-
- All Implemented Interfaces:
SnowflakeSinkService
public class SnowflakeSinkServiceV2 extends Object implements SnowflakeSinkService
This is a per-task configuration. A task can be assigned multiple partitions. The major methods are startTask, insert, getOffset, and close.
startTask: called when partitions are assigned; responsible for generating the POJOs.
insert and getOffset are called when the
SnowflakeSinkTask.put(Collection) and SnowflakeSinkTask.preCommit(Map) APIs are invoked. This implementation of SinkService uses Snowpipe Streaming (streaming ingestion).
Hence it initializes, opens, and closes the channel. The StreamingIngestChannel resides inside
TopicPartitionChannel, which is per partition.
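The channel-per-partition bookkeeping described above can be pictured with a minimal, self-contained sketch. A plain String stands in for the real streaming channel object so the code compiles on its own; the key separator and all names here are illustrative assumptions, not the connector's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of channel-per-partition bookkeeping. A String stands in
// for the real net.snowflake.ingest.streaming channel; illustrative only.
class ChannelCacheSketch {
    private final Map<String, String> partitionsToChannel = new HashMap<>();

    // Build a cache key from topic and partition; the "_" separator is an
    // assumption for illustration.
    static String channelKey(String topic, int partition) {
        return topic + "_" + partition;
    }

    // Mirrors startPartition: register one channel per topic-partition.
    void startPartition(String topic, int partition) {
        partitionsToChannel.putIfAbsent(
            channelKey(topic, partition), "channel-" + topic + "-" + partition);
    }

    // Mirrors getPartitionCount.
    int getPartitionCount() {
        return partitionsToChannel.size();
    }

    // Mirrors close(partitions): the entry is dropped from the cache so a
    // later open() re-instantiates the channel.
    void close(String topic, int partition) {
        partitionsToChannel.remove(channelKey(topic, partition));
    }
}
```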
-
-
Constructor Summary
Constructors:
- SnowflakeSinkServiceV2(long flushTimeSeconds, long fileSizeBytes, long recordNum, SnowflakeConnectionService conn, RecordService recordService, SnowflakeTelemetryService telemetryService, Map<String,String> topicToTableMap, SnowflakeSinkConnectorConfig.BehaviorOnNullValues behaviorOnNullValues, boolean enableCustomJMXMonitoring, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, Map<String,String> connectorConfig, boolean enableSchematization, Map<String,TopicPartitionChannel> partitionsToChannel)
- SnowflakeSinkServiceV2(SnowflakeConnectionService conn, Map<String,String> connectorConfig)
-
Method Summary
- void callAllGetOffset() - used for testing only
- void close(Collection<org.apache.kafka.common.TopicPartition> partitions) - called during rebalance
- void closeAll() - terminate all tasks and close this service instance
- SnowflakeSinkConnectorConfig.BehaviorOnNullValues getBehaviorOnNullValuesConfig()
- long getFileSize() - effectively the size in bytes of buffered records
- long getFlushTime()
- Optional<com.codahale.metrics.MetricRegistry> getMetricRegistry(String partitionChannelKey)
- long getOffset(org.apache.kafka.common.TopicPartition topicPartition) - retrieve the offset of the last loaded record for the given pipe name
- int getPartitionCount() - get the number of partitions assigned to this sink service
- long getRecordNumber()
- net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient getStreamingIngestClient()
- protected Optional<TopicPartitionChannel> getTopicPartitionChannelFromCacheKey(String topicPartitionChannelKey) - used for testing only
- void insert(Collection<org.apache.kafka.connect.sink.SinkRecord> records) - inserts the given records into the buffer and eventually calls the insertRows API once the buffer threshold is reached
- void insert(org.apache.kafka.connect.sink.SinkRecord record) - inserts an individual record into the buffer
- boolean isClosed() - retrieve the sink service status
- static String partitionChannelKey(String topic, int partition) - gets a unique identifier consisting of the topic name and partition number
- void setBehaviorOnNullValuesConfig(SnowflakeSinkConnectorConfig.BehaviorOnNullValues behavior)
- void setCustomJMXMetrics(boolean enableJMX)
- void setErrorReporter(KafkaRecordErrorReporter kafkaRecordErrorReporter)
- void setFileSize(long size) - assumed to be the buffer size in bytes, since this is streaming ingestion
- void setFlushTime(long time) - change the flush rate of the sink service; the minimum flush time is controlled by SnowflakeSinkConnectorConfig.BUFFER_FLUSH_TIME_SEC_MIN
- void setIsStoppedToTrue() - close all cleaner threads, with no effect on the sink service context
- void setMetadataConfig(SnowflakeMetadataConfig configMap) - set the metadata config to let the user control what metadata is collected into the Snowflake DB
- void setRecordNumber(long num) - change the maximum number of records cached in the buffer to control the flush rate; 0 for unlimited
- void setSinkTaskContext(org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext)
- void setTopic2TableMap(Map<String,String> topicToTableMap) - pass the topic-to-table map to the sink service
- void startPartition(String tableName, org.apache.kafka.common.TopicPartition topicPartition) - creates a table in Snowflake if it does not exist
- void startPartitions(Collection<org.apache.kafka.common.TopicPartition> partitions, Map<String,String> topic2Table) - initializes multiple channels and the partitionsToChannel map with new instances of TopicPartitionChannel
-
-
-
Constructor Detail
-
SnowflakeSinkServiceV2
public SnowflakeSinkServiceV2(SnowflakeConnectionService conn, Map<String,String> connectorConfig)
-
SnowflakeSinkServiceV2
public SnowflakeSinkServiceV2(long flushTimeSeconds, long fileSizeBytes, long recordNum, SnowflakeConnectionService conn, RecordService recordService, SnowflakeTelemetryService telemetryService, Map<String,String> topicToTableMap, SnowflakeSinkConnectorConfig.BehaviorOnNullValues behaviorOnNullValues, boolean enableCustomJMXMonitoring, KafkaRecordErrorReporter kafkaRecordErrorReporter, org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext, net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient streamingIngestClient, Map<String,String> connectorConfig, boolean enableSchematization, Map<String,TopicPartitionChannel> partitionsToChannel)
-
-
Method Detail
-
startPartition
public void startPartition(String tableName, org.apache.kafka.common.TopicPartition topicPartition)
Creates the table in Snowflake if it does not exist, then initializes the channel and the partitionsToChannel map with a new instance of TopicPartitionChannel.
- Specified by: startPartition in interface SnowflakeSinkService
- Parameters:
  tableName - destination table name
  topicPartition - TopicPartition passed from Kafka
-
startPartitions
public void startPartitions(Collection<org.apache.kafka.common.TopicPartition> partitions, Map<String,String> topic2Table)
Initializes multiple channels and the partitionsToChannel map with new instances of TopicPartitionChannel.
- Specified by: startPartitions in interface SnowflakeSinkService
- Parameters:
  partitions - collection of topic partitions
  topic2Table - map of topic to table name
-
insert
public void insert(Collection<org.apache.kafka.connect.sink.SinkRecord> records)
Inserts the given records into the buffer and eventually calls the insertRows API once the buffer threshold is reached. TODO: SNOW-473896 - note that the buffering logic will be removed in future commits.
- Specified by: insert in interface SnowflakeSinkService
- Parameters:
  records - records coming from Kafka. Note that they are not necessarily from a single topic and partition; a Kafka Connect worker node can consume from multiple topics and multiple partitions.
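Because one put() batch can span several topic-partitions, the service must route each record to its own partition's buffer before flushing. The self-contained sketch below illustrates that grouping step; SimpleRecord is a hypothetical stand-in for org.apache.kafka.connect.sink.SinkRecord, and the key separator is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: records from one put() call are grouped by
// topic-partition, mimicking how insert(Collection) hands each group to its
// TopicPartitionChannel. SimpleRecord is a hypothetical stand-in for SinkRecord.
class RoutingSketch {
    record SimpleRecord(String topic, int partition, String value) {}

    static Map<String, List<String>> route(Collection<SimpleRecord> records) {
        Map<String, List<String>> buffers = new HashMap<>();
        for (SimpleRecord r : records) {
            String key = r.topic() + "_" + r.partition(); // separator is an assumption
            buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(r.value());
        }
        return buffers;
    }
}
```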
-
insert
public void insert(org.apache.kafka.connect.sink.SinkRecord record)
Inserts an individual record into the buffer. It fetches the TopicPartitionChannel from the map, and each partition (streaming channel) calls its respective insertRows API.
- Specified by: insert in interface SnowflakeSinkService
- Parameters:
  record - record content
-
getOffset
public long getOffset(org.apache.kafka.common.TopicPartition topicPartition)
Description copied from interface: SnowflakeSinkService - retrieve the offset of the last loaded record for the given pipe name.
- Specified by: getOffset in interface SnowflakeSinkService
- Parameters:
  topicPartition - topic and partition
- Returns: offset, or -1 if empty
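A caller such as preCommit should treat the -1 "empty" sentinel as "nothing to commit" for that partition. The self-contained sketch below illustrates that filtering; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: drop partitions whose getOffset returned the
// -1 "empty" sentinel, keeping only offsets that are safe to commit.
class OffsetFilterSketch {
    static Map<String, Long> committableOffsets(Map<String, Long> fetched) {
        Map<String, Long> committable = new HashMap<>();
        fetched.forEach((topicPartition, offset) -> {
            if (offset >= 0) { // skip partitions that reported the -1 sentinel
                committable.put(topicPartition, offset);
            }
        });
        return committable;
    }
}
```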
-
getPartitionCount
public int getPartitionCount()
Description copied from interface: SnowflakeSinkService - get the number of partitions assigned to this sink service.
- Specified by: getPartitionCount in interface SnowflakeSinkService
- Returns: number of partitions
-
callAllGetOffset
public void callAllGetOffset()
Description copied from interface: SnowflakeSinkService - used for testing only.
- Specified by: callAllGetOffset in interface SnowflakeSinkService
-
closeAll
public void closeAll()
Description copied from interface: SnowflakeSinkService - terminate all tasks and close this service instance.
- Specified by: closeAll in interface SnowflakeSinkService
-
close
public void close(Collection<org.apache.kafka.common.TopicPartition> partitions)
This function is called during rebalance. All the channels are closed, but the client remains active. Upon rebalance (inside
SnowflakeSinkTask.open(Collection)) we reopen the channels. We also wipe the partitionsToChannel cache so that in
SnowflakeSinkTask.open(Collection) we re-instantiate the channels and fetch the offsetToken.
- Specified by: close in interface SnowflakeSinkService
- Parameters:
  partitions - a list of topic partitions
-
setIsStoppedToTrue
public void setIsStoppedToTrue()
Description copied from interface: SnowflakeSinkService - close all cleaner threads, with no effect on the sink service context.
- Specified by: setIsStoppedToTrue in interface SnowflakeSinkService
-
isClosed
public boolean isClosed()
Description copied from interface: SnowflakeSinkService - retrieve the sink service status.
- Specified by: isClosed in interface SnowflakeSinkService
- Returns: true if closed
-
setRecordNumber
public void setRecordNumber(long num)
Description copied from interface: SnowflakeSinkService - change the maximum number of records cached in the buffer to control the flush rate; 0 for unlimited.
- Specified by: setRecordNumber in interface SnowflakeSinkService
- Parameters:
  num - a non-negative long representing the record-count limit
-
setFileSize
public void setFileSize(long size)
Assumed to be the buffer size in bytes, since this is streaming ingestion.
- Specified by: setFileSize in interface SnowflakeSinkService
- Parameters:
  size - in bytes; a non-negative long representing the size of the internal buffer for flushing
-
setTopic2TableMap
public void setTopic2TableMap(Map<String,String> topicToTableMap)
Description copied from interface: SnowflakeSinkService - pass the topic-to-table map to the sink service.
- Specified by: setTopic2TableMap in interface SnowflakeSinkService
- Parameters:
  topicToTableMap - a String-to-String map representing the topic-to-table mapping
-
setFlushTime
public void setFlushTime(long time)
Description copied from interface: SnowflakeSinkService - change the flush rate of the sink service; the minimum flush time is controlled by SnowflakeSinkConnectorConfig.BUFFER_FLUSH_TIME_SEC_MIN.
- Specified by: setFlushTime in interface SnowflakeSinkService
- Parameters:
  time - a non-negative long representing the service flush time in seconds
-
setMetadataConfig
public void setMetadataConfig(SnowflakeMetadataConfig configMap)
Description copied from interface: SnowflakeSinkService - set the metadata config to let the user control what metadata is collected into the Snowflake DB.
- Specified by: setMetadataConfig in interface SnowflakeSinkService
- Parameters:
  configMap - the metadata configuration
-
getRecordNumber
public long getRecordNumber()
- Specified by: getRecordNumber in interface SnowflakeSinkService
- Returns: current record-count limit
-
getFlushTime
public long getFlushTime()
- Specified by: getFlushTime in interface SnowflakeSinkService
- Returns: current flush time in seconds
-
getFileSize
public long getFileSize()
This is effectively the size in bytes of buffered records. It does not necessarily translate to the files created by Streaming Ingest, since those are compressed, so there is no 1:1 mapping.
- Specified by: getFileSize in interface SnowflakeSinkService
- Returns: current file size limit
-
setBehaviorOnNullValuesConfig
public void setBehaviorOnNullValuesConfig(SnowflakeSinkConnectorConfig.BehaviorOnNullValues behavior)
- Specified by: setBehaviorOnNullValuesConfig in interface SnowflakeSinkService
-
setCustomJMXMetrics
public void setCustomJMXMetrics(boolean enableJMX)
- Specified by: setCustomJMXMetrics in interface SnowflakeSinkService
-
getBehaviorOnNullValuesConfig
public SnowflakeSinkConnectorConfig.BehaviorOnNullValues getBehaviorOnNullValuesConfig()
- Specified by: getBehaviorOnNullValuesConfig in interface SnowflakeSinkService
-
setErrorReporter
public void setErrorReporter(KafkaRecordErrorReporter kafkaRecordErrorReporter)
- Specified by: setErrorReporter in interface SnowflakeSinkService
-
setSinkTaskContext
public void setSinkTaskContext(org.apache.kafka.connect.sink.SinkTaskContext sinkTaskContext)
- Specified by: setSinkTaskContext in interface SnowflakeSinkService
-
getMetricRegistry
public Optional<com.codahale.metrics.MetricRegistry> getMetricRegistry(String partitionChannelKey)
- Specified by: getMetricRegistry in interface SnowflakeSinkService
-
partitionChannelKey
public static String partitionChannelKey(String topic, int partition)
Gets a unique identifier consisting of the topic name and partition number.
- Parameters:
  topic - topic name
  partition - partition number
- Returns: combination of topic and partition
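As an illustration, the key could be built as below. The exact separator used by the connector is an assumption here, not documented above.

```java
// Hypothetical sketch of a topic+partition cache key; the real connector
// may use a different separator.
class PartitionKeySketch {
    static String partitionChannelKey(String topic, int partition) {
        return topic + "_" + partition;
    }
}
```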
-
getStreamingIngestClient
public net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient getStreamingIngestClient()
-
getTopicPartitionChannelFromCacheKey
protected Optional<TopicPartitionChannel> getTopicPartitionChannelFromCacheKey(String topicPartitionChannelKey)
Used for testing only.
- Parameters:
  topicPartitionChannelKey - see partitionChannelKey(String, int) for the key format
- Returns: the TopicPartitionChannel if present in the partitionsToChannel map, else an empty Optional
-
-