public class MultiFileHdfsReader
extends java.lang.Object

A wrapper on top of SingleFileHdfsReader
to manage the situation of multiple files per partition.

The offset for MultiFileHdfsReader, which is also the offset that gets
committed and used by Samza, consists of two parts: the file index and the
actual offset within that file. For example, 3:127 (file index 3, offset 127
within that file).

The format of the offset within a file is defined by the implementation of
SingleFileHdfsReader itself.

| Constructor and Description |
|---|
| MultiFileHdfsReader(HdfsReaderFactory.ReaderType readerType, org.apache.samza.system.SystemStreamPartition systemStreamPartition, java.util.List<java.lang.String> partitionDescriptors, java.lang.String offset) |
| MultiFileHdfsReader(HdfsReaderFactory.ReaderType readerType, org.apache.samza.system.SystemStreamPartition systemStreamPartition, java.util.List<java.lang.String> partitionDescriptors, java.lang.String offset, int numMaxRetries) |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| static java.lang.String | generateOffset(int fileIndex, java.lang.String singleFileOffset) - Generate the offset based on file index and offset within single file. |
| static int | getCurFileIndex(java.lang.String offset) - Get the current file index from the offset string. |
| static java.lang.String | getCurSingleFileOffset(java.lang.String offset) - Get the offset within file from the offset string. |
| org.apache.samza.system.SystemStreamPartition | getSystemStreamPartition() |
| boolean | hasNext() |
| org.apache.samza.system.IncomingMessageEnvelope | readNext() |
| void | reconnect() - Reconnect to the file systems in case of failure. |
| void | reconnect(java.lang.String offset) - Reconnect to the file systems in case of failure. |
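
The three static offset helpers can be read as one small encode/decode scheme. The following is a minimal sketch of that scheme, not the Samza source: the class name MultiFileOffsetSketch is hypothetical, and the ":" delimiter is inferred from the 3:127 example in the class description. Splitting on the first delimiter only is also an assumption, chosen so that a single-file offset which itself contains ":" survives a round trip.

```java
// Illustrative sketch only: mirrors the "fileIndex:offsetWithinFile" layout
// from the class description. The class name and delimiter handling are
// assumptions, not the Samza implementation.
final class MultiFileOffsetSketch {
    private static final String DELIMITER = ":";

    private MultiFileOffsetSketch() {}

    // Combine the file index and the offset within that file into one string.
    static String generateOffset(int fileIndex, String singleFileOffset) {
        return fileIndex + DELIMITER + singleFileOffset;
    }

    // Extract the file index (the part before the first delimiter).
    static int getCurFileIndex(String offset) {
        return Integer.parseInt(splitOffset(offset)[0]);
    }

    // Extract the offset within the file (everything after the first delimiter).
    static String getCurSingleFileOffset(String offset) {
        return splitOffset(offset)[1];
    }

    private static String[] splitOffset(String offset) {
        // Limit of 2: split on the first delimiter only.
        String[] parts = offset.split(DELIMITER, 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed offset: " + offset);
        }
        return parts;
    }
}
```

With this sketch, generateOffset(3, "127") yields "3:127", and the two getters recover 3 and "127" from that string.
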
public MultiFileHdfsReader(HdfsReaderFactory.ReaderType readerType, org.apache.samza.system.SystemStreamPartition systemStreamPartition, java.util.List<java.lang.String> partitionDescriptors, java.lang.String offset)

public MultiFileHdfsReader(HdfsReaderFactory.ReaderType readerType, org.apache.samza.system.SystemStreamPartition systemStreamPartition, java.util.List<java.lang.String> partitionDescriptors, java.lang.String offset, int numMaxRetries)

public static int getCurFileIndex(java.lang.String offset)
offset - offset string that contains both the file index and the offset within the file

public static java.lang.String getCurSingleFileOffset(java.lang.String offset)
offset - offset string that contains both the file index and the offset within the file

public static java.lang.String generateOffset(int fileIndex, java.lang.String singleFileOffset)
fileIndex - index of the file
singleFileOffset - offset within a single file

public boolean hasNext()

public org.apache.samza.system.IncomingMessageEnvelope readNext()

public void reconnect()
Throws SamzaException if the maximum number of retries is reached.

public void reconnect(java.lang.String offset)
offset - reset offset to the specified offset
Throws SamzaException if the maximum number of retries is reached.

public void close()

public org.apache.samza.system.SystemStreamPartition getSystemStreamPartition()
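
The methods above suggest a read loop in which the caller checks hasNext(), calls readNext(), and falls back to reconnect() when a read fails. The sketch below shows that pattern under stated assumptions. RecordSource is a hypothetical stand-in interface, not a Samza type, and it returns String rather than IncomingMessageEnvelope to stay self-contained. The sketch also assumes that failures surface as RuntimeException and that reconnect() itself throws once the retry limit is reached, which is what ends the loop on a persistent failure.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a caller-side read loop with reconnect on failure.
final class ReadLoopSketch {
    /** Hypothetical stand-in for the reader's public methods; not a Samza type. */
    interface RecordSource {
        boolean hasNext();
        String readNext();
        void reconnect(); // assumed to throw once the retry limit is reached
        void close();
    }

    private ReadLoopSketch() {}

    // Read every remaining record, reconnecting after each failed read.
    static List<String> drain(RecordSource source) {
        List<String> records = new ArrayList<>();
        try {
            while (true) {
                try {
                    if (!source.hasNext()) {
                        break;
                    }
                    records.add(source.readNext());
                } catch (RuntimeException e) {
                    // An exception thrown by reconnect() propagates to the caller.
                    source.reconnect();
                }
            }
        } finally {
            // Always release the underlying resources.
            source.close();
        }
        return records;
    }
}
```

Whether the real reader resets its retry count after a successful read is not stated in this page, so the sketch leaves that policy entirely to the reconnect() implementation.
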