public class HdfsSystemDescriptor extends org.apache.samza.system.descriptors.SystemDescriptor<HdfsSystemDescriptor>
HdfsSystemDescriptor can be used for specifying Samza and HDFS-specific properties of an HDFS
input/output system. It can also be used for obtaining HdfsInputDescriptors and
HdfsOutputDescriptors, which can be used for specifying Samza and system-specific properties of
HDFS input/output streams.
System properties provided in configuration override corresponding properties specified using a descriptor.
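The precedence rule above can be pictured as a simple map overlay. The following is a minimal sketch, not Samza's actual merge logic. The class ConfigPrecedenceSketch, its effectiveConfig helper, and the property keys are hypothetical. It assumes both sources are flat string-to-string maps, like the one returned by toConfig().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: illustrates "configuration overrides descriptor" precedence.
// Not Samza's implementation; the property keys are made up for illustration.
class ConfigPrecedenceSketch {

  /** Overlays user configuration on top of descriptor-generated properties. */
  static Map<String, String> effectiveConfig(Map<String, String> descriptorConfig,
                                             Map<String, String> userConfig) {
    // Start from a copy so neither input map is modified.
    Map<String, String> merged = new HashMap<>(descriptorConfig);
    // putAll replaces existing entries, so configuration wins on any shared key.
    merged.putAll(userConfig);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> fromDescriptor = new HashMap<>();
    fromDescriptor.put("example.hdfs.reader.type", "avro");      // hypothetical key
    fromDescriptor.put("example.hdfs.buffer.capacity", "10000"); // hypothetical key

    Map<String, String> fromConfig = new HashMap<>();
    fromConfig.put("example.hdfs.buffer.capacity", "500");       // user override

    Map<String, String> effective = effectiveConfig(fromDescriptor, fromConfig);
    System.out.println(effective.get("example.hdfs.reader.type"));     // avro
    System.out.println(effective.get("example.hdfs.buffer.capacity")); // 500
  }
}
```

Because putAll replaces existing entries, any key present in both maps ends up with the configuration value, while keys set only through the descriptor survive unchanged.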
| Constructor and Description |
|---|
| HdfsSystemDescriptor(java.lang.String systemName) |
| Modifier and Type | Method and Description |
|---|---|
| HdfsInputDescriptor | getInputDescriptor(java.lang.String streamId): Gets an HdfsInputDescriptor for the input stream of this system. |
| HdfsOutputDescriptor | getOutputDescriptor(java.lang.String streamId): Gets an HdfsOutputDescriptor for the output stream of this system. |
| java.util.Map<java.lang.String,java.lang.String> | toConfig() |
| HdfsSystemDescriptor | withConsumerBlackList(java.lang.String blackList): Black list used by the directory partitioner to filter out unwanted files in an HDFS directory. |
| HdfsSystemDescriptor | withConsumerBufferCapacity(long bufferCapacity): The capacity of the HDFS consumer buffer, the blocking queue used for storing messages. |
| HdfsSystemDescriptor | withConsumerGroupPattern(java.lang.String groupPattern): Group pattern used by the directory partitioner for advanced partitioning. |
| HdfsSystemDescriptor | withConsumerNumMaxRetries(long maxRetries): Maximum number of retries for the HDFS consumer readers per partition. |
| HdfsSystemDescriptor | withConsumerWhiteList(java.lang.String whiteList): White list used by the directory partitioner to filter out unwanted files in an HDFS directory. |
| HdfsSystemDescriptor | withDatePathFormat(java.lang.String datePathFormat): In an HdfsWriter implementation that performs time-based output bucketing, the user may configure a date format (suitable for inclusion in a file path) using SimpleDateFormat formatting that the Bucketer implementation will use to generate HDFS paths and filenames. |
| HdfsSystemDescriptor | withOutputBaseDir(java.lang.String outputBaseDir): The base output directory into which all HDFS output for this job will be written. |
| HdfsSystemDescriptor | withReaderType(java.lang.String readerType): The type of the file reader for the consumer (avro, plain, etc.). |
| HdfsSystemDescriptor | withStagingDirectory(java.lang.String stagingDirectory): Staging directory for storing partition descriptions. |
| HdfsSystemDescriptor | withWriteBatchSizeBytes(long writeBatchSizeBytes): Split output files from all writer tasks based on the number of bytes written, to optimize MapReduce utilization for Hadoop jobs that will process the data later. |
| HdfsSystemDescriptor | withWriteBatchSizeRecords(long writeBatchSizeRecords): Split output files from all writer tasks based on the number of records written, to optimize MapReduce utilization for Hadoop jobs that will process the data later. |
| HdfsSystemDescriptor | withWriteCompressionType(java.lang.String writeCompressionType): Simple, human-readable label for various compression options. |
| HdfsSystemDescriptor | withWriterClassName(java.lang.String writerClassName): The fully-qualified class name of the HdfsWriter subclass that will write for this system. |
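The consumer white list and black list both narrow down which files in an HDFS directory the directory partitioner considers. As a hedged illustration only, the sketch below assumes that both lists are Java regular expressions matched against whole file names. Under that assumption, a file is kept when it matches the white list and does not match the black list. The class FileFilterSketch and its filterFiles method are hypothetical and do not reproduce the partitioner's real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of white list / black list file filtering.
// Assumes both lists are Java regexes matched against the whole file name;
// this is not the directory partitioner's actual implementation.
class FileFilterSketch {

  static List<String> filterFiles(List<String> fileNames, String whiteList, String blackList) {
    Pattern white = Pattern.compile(whiteList);
    Pattern black = Pattern.compile(blackList);
    List<String> kept = new ArrayList<>();
    for (String name : fileNames) {
      // Keep a file only if it matches the white list and does not match the black list.
      if (white.matcher(name).matches() && !black.matcher(name).matches()) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("part-0.avro", "part-1.avro", "part-1.avro.tmp", "_SUCCESS");
    // White-list anything containing ".avro", then black-list temporary files.
    System.out.println(filterFiles(files, ".*\\.avro.*", ".*\\.tmp"));
    // prints [part-0.avro, part-1.avro]
  }
}
```

Under this assumption, an empty black list ("") excludes nothing, because Matcher.matches() accepts an empty pattern only for an empty file name.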
public HdfsInputDescriptor getInputDescriptor(java.lang.String streamId)
Gets an HdfsInputDescriptor for the input stream of this system.
The message in the stream has no key, and the value type is determined by the reader type.
Parameters: streamId - id of the input stream
Returns: HdfsInputDescriptor for the HDFS input stream

public HdfsOutputDescriptor getOutputDescriptor(java.lang.String streamId)
Gets an HdfsOutputDescriptor for the output stream of this system.
The message in the stream has no key, and the value type is determined by the writer class.
Parameters: streamId - id of the output stream
Returns: HdfsOutputDescriptor for the HDFS output stream

public HdfsSystemDescriptor withDatePathFormat(java.lang.String datePathFormat)
In an HdfsWriter implementation that performs time-based output bucketing, the user may configure a date format (suitable for inclusion in a file path) using SimpleDateFormat formatting that the Bucketer implementation will use to generate HDFS paths and filenames. The more granular this date format, the more often a bucketing HdfsWriter will begin a new date-path bucket when creating the next output file.
Parameters: datePathFormat - date path format

public HdfsSystemDescriptor withOutputBaseDir(java.lang.String outputBaseDir)
Parameters: outputBaseDir - output base directory

public HdfsSystemDescriptor withWriteBatchSizeBytes(long writeBatchSizeBytes)
Parameters: writeBatchSizeBytes - write batch size in bytes

public HdfsSystemDescriptor withWriteBatchSizeRecords(long writeBatchSizeRecords)
Parameters: writeBatchSizeRecords - write batch size in records

public HdfsSystemDescriptor withWriteCompressionType(java.lang.String writeCompressionType)
Parameters: writeCompressionType - compression type for the writer

public HdfsSystemDescriptor withWriterClassName(java.lang.String writerClassName)
Parameters: writerClassName - writer class name

public HdfsSystemDescriptor withConsumerBufferCapacity(long bufferCapacity)
Parameters: bufferCapacity - the buffer capacity for the HDFS consumer

public HdfsSystemDescriptor withConsumerNumMaxRetries(long maxRetries)
Parameters: maxRetries - maximum number of retries for the HDFS consumer

public HdfsSystemDescriptor withConsumerWhiteList(java.lang.String whiteList)
Parameters: whiteList - white list for HDFS consumer inputs

public HdfsSystemDescriptor withConsumerBlackList(java.lang.String blackList)
Parameters: blackList - black list for HDFS consumer inputs

public HdfsSystemDescriptor withConsumerGroupPattern(java.lang.String groupPattern)
Parameters: groupPattern - group pattern for HDFS consumer inputs

public HdfsSystemDescriptor withReaderType(java.lang.String readerType)
Parameters: readerType - reader type for HDFS consumer inputs

public HdfsSystemDescriptor withStagingDirectory(java.lang.String stagingDirectory)
Parameters: stagingDirectory - staging directory for HDFS consumer inputs

public java.util.Map<java.lang.String,java.lang.String> toConfig()
Overrides: toConfig in class org.apache.samza.system.descriptors.SystemDescriptor<HdfsSystemDescriptor>
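withDatePathFormat, withWriteBatchSizeBytes and withWriteBatchSizeRecords all influence when a writer starts a new output file. The sketch below models them as three independent rollover triggers:

- a change in the date path generated by SimpleDateFormat,
- a byte count that has reached its limit,
- a record count that has reached its limit.

It is an assumption-laden illustration, not the behavior of Samza's HdfsWriter or Bucketer. The class OutputRollingSketch, its write method, and the use of the UTC time zone are hypothetical choices.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch of output-file rollover for a bucketing writer.
// Models the date path format, byte batch size and record batch size as
// independent triggers; not Samza's HdfsWriter or Bucketer implementation.
class OutputRollingSketch {
  private final SimpleDateFormat dateFormat;
  private final long maxBytes;
  private final long maxRecords;

  private String currentBucket; // date path of the file being written, null before the first write
  private long bytesInFile;
  private long recordsInFile;

  OutputRollingSketch(String datePathFormat, long maxBytes, long maxRecords) {
    this.dateFormat = new SimpleDateFormat(datePathFormat);
    this.dateFormat.setTimeZone(TimeZone.getTimeZone("UTC")); // UTC keeps bucket names deterministic
    this.maxBytes = maxBytes;
    this.maxRecords = maxRecords;
  }

  /** Accounts for one record and returns true if a new output file was started for it. */
  boolean write(long timestampMillis, long recordBytes) {
    String bucket = dateFormat.format(new Date(timestampMillis));
    boolean roll = currentBucket == null  // first record: no file open yet
        || !bucket.equals(currentBucket)  // date path changed: new bucket
        || bytesInFile >= maxBytes        // current file reached the byte limit
        || recordsInFile >= maxRecords;   // current file reached the record limit
    if (roll) {
      currentBucket = bucket;
      bytesInFile = 0;
      recordsInFile = 0;
    }
    bytesInFile += recordBytes;
    recordsInFile++;
    return roll;
  }

  String currentBucket() {
    return currentBucket;
  }

  public static void main(String[] args) {
    long oneHour = 3_600_000L;
    long maxBytes = 128L * 1024 * 1024;
    OutputRollingSketch hourly = new OutputRollingSketch("yyyy/MM/dd/HH", maxBytes, 1_000_000L);
    OutputRollingSketch daily = new OutputRollingSketch("yyyy/MM/dd", maxBytes, 1_000_000L);
    // Three records at 00:00, 00:30 and 01:00 UTC on 1970-01-01.
    for (long t : new long[] {0L, oneHour / 2, oneHour}) {
      System.out.println("hourly rolled=" + hourly.write(t, 100) + ", daily rolled=" + daily.write(t, 100));
    }
  }
}
```

Running main shows the granularity effect described for withDatePathFormat. With the hourly pattern yyyy/MM/dd/HH, the third record (at 01:00 UTC) opens a new file. With the daily pattern yyyy/MM/dd, the writer keeps using the same bucket.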