Class RegexSequenceRecordReader
- java.lang.Object
  - org.datavec.api.records.reader.BaseRecordReader
    - org.datavec.api.records.reader.impl.FileRecordReader
      - org.datavec.api.records.reader.impl.regex.RegexSequenceRecordReader
All Implemented Interfaces:
Closeable, Serializable, AutoCloseable, Configurable, RecordReader, SequenceRecordReader
public class RegexSequenceRecordReader extends FileRecordReader implements SequenceRecordReader
See Also: Serialized Form
Nested Class Summary
- static class RegexSequenceRecordReader.LineErrorHandling: Error handling mode that controls how invalid lines (i.e., lines that don't match the provided regex) are handled:
  - FailOnInvalid: Throw an IllegalStateException when an invalid line is found
  - SkipInvalid: Skip invalid lines quietly, with no warning
  - SkipInvalidWithWarning: Skip invalid lines, but log a warning
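The three modes can be illustrated with a minimal, JDK-only sketch. It does not use DataVec: the Mode enum and filterLines method are hypothetical stand-ins that only mirror the documented constant names, java.util.logging replaces the class's SLF4J LOG field, and treating "invalid" as "fails a full-line Matcher.matches()" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical sketch: Mode mirrors the documented LineErrorHandling constant
// names but is not the DataVec type.
class LineErrorHandlingSketch {

    enum Mode { FailOnInvalid, SkipInvalid, SkipInvalidWithWarning }

    // java.util.logging stands in for the class's org.slf4j.Logger LOG field.
    private static final Logger LOG = Logger.getLogger(LineErrorHandlingSketch.class.getName());

    /** Keeps lines that fully match the pattern; non-matching lines are handled per mode. */
    static List<String> filterLines(List<String> lines, Pattern pattern, Mode mode) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (pattern.matcher(line).matches()) {
                kept.add(line);
                continue;
            }
            switch (mode) {
                case FailOnInvalid:
                    throw new IllegalStateException("Line " + (i + 1) + " does not match regex: " + line);
                case SkipInvalid:
                    break; // drop the line quietly
                case SkipInvalidWithWarning:
                    LOG.warning("Skipping line " + (i + 1) + " (no regex match): " + line);
                    break;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(\\d+),(\\w+)");
        List<String> lines = List.of("1,a", "garbage", "2,b");
        System.out.println(filterLines(lines, pattern, Mode.SkipInvalid)); // [1,a, 2,b]
        System.out.println(filterLines(lines, pattern, Mode.SkipInvalidWithWarning)); // same lines, plus a logged warning
        try {
            filterLines(lines, pattern, Mode.FailOnInvalid);
        } catch (IllegalStateException e) {
            System.out.println("FailOnInvalid: " + e.getMessage());
        }
    }
}
```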
Field Summary
- static Charset DEFAULT_CHARSET
- static RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
- static org.slf4j.Logger LOG
- static String SKIP_NUM_LINES
Fields inherited from class org.datavec.api.records.reader.impl.FileRecordReader
appendLabel, conf, currentUri, labels, locationsIterator
Fields inherited from class org.datavec.api.records.reader.BaseRecordReader
inputSplit, listeners, streamCreatorFn
Fields inherited from interface org.datavec.api.records.reader.RecordReader
APPEND_LABEL, LABELS, NAME_SPACE
Constructor Summary
- RegexSequenceRecordReader(String regex, int skipNumLines)
- RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
Method Summary
- void initialize(Configuration conf, InputSplit split): Called once at initialization.
- List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas): Load multiple sequence records from the given list of RecordMetaData instances.
- SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData): Load a single sequence record from the given RecordMetaData instance. Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
- SequenceRecord nextSequence(): Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data.
- void reset(): Reset the record reader iterator.
- List<List<Writable>> sequenceRecord(): Returns a sequence record.
- List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream): Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
Methods inherited from class org.datavec.api.records.reader.impl.FileRecordReader
close, doInitialize, getConf, getCurrentLabel, getLabel, getLabels, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setConf, setLabels
Methods inherited from class org.datavec.api.records.reader.BaseRecordReader
batchesSupported, getListeners, invokeListeners, setListeners, setListeners
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.datavec.api.conf.Configurable
getConf, setConf
Methods inherited from interface org.datavec.api.records.reader.RecordReader
batchesSupported, getLabels, getListeners, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setListeners, setListeners
Field Detail
SKIP_NUM_LINES
public static final String SKIP_NUM_LINES
DEFAULT_CHARSET
public static final Charset DEFAULT_CHARSET
DEFAULT_ERROR_HANDLING
public static final RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
RegexSequenceRecordReader
public RegexSequenceRecordReader(String regex, int skipNumLines)
RegexSequenceRecordReader
public RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)
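A minimal sketch of what the regex and skipNumLines parameters could mean for one input file, under stated assumptions that this page does not spell out: each file is treated as one sequence, the first skipNumLines lines are skipped, every remaining line is one time step, and capture groups 1..n of the regex become that step's values (group 0, the whole line, is not included). Values are plain Strings here instead of DataVec Writables, non-matching lines are treated as in FailOnInvalid, and RegexSequenceSketch.parse is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turns one file's text into a sequence of time steps.
class RegexSequenceSketch {

    static List<List<String>> parse(String contents, String regex, int skipNumLines) {
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> sequence = new ArrayList<>();
        String[] lines = contents.split("\r?\n"); // handles both \n and \r\n line endings
        for (int i = skipNumLines; i < lines.length; i++) {
            Matcher m = pattern.matcher(lines[i]);
            if (!m.matches()) {
                // Behaves like FailOnInvalid; SkipInvalid would simply continue here.
                throw new IllegalStateException("Line " + (i + 1) + " does not match regex: " + lines[i]);
            }
            List<String> step = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) { // group 0 is the whole line, so start at 1
                step.add(m.group(g));
            }
            sequence.add(step);
        }
        return sequence;
    }

    public static void main(String[] args) {
        String log = "timestamp level message\n"
                + "2024-01-01 INFO started\n"
                + "2024-01-02 WARN disk low\n";
        // Skip the header line, then split each line into three captured fields.
        System.out.println(parse(log, "(\\S+) (\\S+) (.*)", 1));
        // [[2024-01-01, INFO, started], [2024-01-02, WARN, disk low]]
    }
}
```

Skipping the header matters in this example because "timestamp level message" would otherwise also match the regex and become a bogus first time step.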
Method Detail
initialize
public void initialize(Configuration conf, InputSplit split) throws IOException, InterruptedException
Description copied from interface: RecordReader
Called once at initialization.
Specified by: initialize in interface RecordReader
Overrides: initialize in class FileRecordReader
Parameters:
  conf - a configuration for initialization
  split - the split that defines the range of records to read
Throws:
  IOException
  InterruptedException
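A sketch of one plausible role for conf, under the assumption (not stated on this page) that initialize(conf, split) lets a value stored in the configuration under SKIP_NUM_LINES override the skipNumLines passed to the constructor. java.util.Properties stands in for DataVec's Configuration, and the key string below is hypothetical; it is not the real value of SKIP_NUM_LINES.

```java
import java.util.Properties;

// Hypothetical sketch: the constructor argument is the default, and a
// configuration entry (if present) overrides it during initialize(...).
class ConfigOverrideSketch {

    static final String SKIP_NUM_LINES_KEY = "example.skipnumlines"; // hypothetical key

    private int skipNumLines;

    ConfigOverrideSketch(int skipNumLines) {
        this.skipNumLines = skipNumLines; // value passed to the constructor
    }

    void initialize(Properties conf) {
        String override = conf.getProperty(SKIP_NUM_LINES_KEY);
        if (override != null) {
            this.skipNumLines = Integer.parseInt(override.trim());
        }
    }

    int getSkipNumLines() {
        return skipNumLines;
    }

    public static void main(String[] args) {
        ConfigOverrideSketch reader = new ConfigOverrideSketch(0);
        Properties conf = new Properties();
        conf.setProperty(SKIP_NUM_LINES_KEY, "2");
        reader.initialize(conf);
        System.out.println(reader.getSkipNumLines()); // 2
    }
}
```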
sequenceRecord
public List<List<Writable>> sequenceRecord()
Description copied from interface: SequenceRecordReader
Returns a sequence record.
Specified by: sequenceRecord in interface SequenceRecordReader
Returns:
  a sequence of records
sequenceRecord
public List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException
Description copied from interface: SequenceRecordReader
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
Specified by: sequenceRecord in interface SequenceRecordReader
Throws:
  IOException - if an error occurs while reading from the input stream
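The contract above (read from the supplied stream, leave the reader's iteration state alone, and do not close the stream) can be sketched with the JDK alone. StreamSequenceSketch is a hypothetical class: it decodes the bytes with an explicit Charset, mirroring the encoding constructor parameter, treats non-matching lines as errors, and needs Java 9+ for InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of sequenceRecord(URI, DataInputStream): stateless, and
// the caller keeps ownership of the stream.
class StreamSequenceSketch {

    private final Pattern pattern;
    private final Charset charset;

    StreamSequenceSketch(String regex, Charset charset) {
        this.pattern = Pattern.compile(regex);
        this.charset = charset;
    }

    List<List<String>> sequenceRecord(URI uri, DataInputStream in) throws IOException {
        // readAllBytes consumes the stream but does not close it.
        String contents = new String(in.readAllBytes(), charset);
        List<List<String>> sequence = new ArrayList<>();
        for (String line : contents.split("\r?\n")) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                throw new IllegalStateException("No regex match in " + uri + ": " + line);
            }
            List<String> step = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                step.add(m.group(g));
            }
            sequence.add(step);
        }
        return sequence; // `in` is intentionally left open
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a=1\nb=2".getBytes(StandardCharsets.UTF_8);
        StreamSequenceSketch reader = new StreamSequenceSketch("(\\w+)=(\\d+)", StandardCharsets.UTF_8);
        // The caller owns the stream, so the caller closes it (here via try-with-resources).
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            System.out.println(reader.sequenceRecord(URI.create("memory:example"), in)); // [[a, 1], [b, 2]]
        }
    }
}
```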
reset
public void reset()
Description copied from interface: RecordReader
Reset the record reader iterator.
Specified by: reset in interface RecordReader
Overrides: reset in class FileRecordReader
nextSequence
public SequenceRecord nextSequence()
Description copied from interface: SequenceRecordReader
Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data.
Specified by: nextSequence in interface SequenceRecordReader
Returns:
  the next sequence record
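A sketch of how nextSequence() and reset() relate, assuming each source (one file) yields one sequence that is returned together with the URI it came from. SeqRecord stands in for SequenceRecord and its metadata, an in-memory map of URIs to file contents replaces the InputSplit, and splitting lines on commas replaces regex parsing to keep the example short; all of these names and the file paths are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: one sequence per source, plus reset() to start over.
class SequenceIteratorSketch {

    /** Stand-in for a SequenceRecord: the sequence plus the URI it was read from. */
    static final class SeqRecord {
        final List<List<String>> sequence;
        final URI source;

        SeqRecord(List<List<String>> sequence, URI source) {
            this.sequence = sequence;
            this.source = source;
        }
    }

    private final List<Map.Entry<URI, String>> sources;
    private int position = 0;

    SequenceIteratorSketch(Map<URI, String> sources) {
        this.sources = new ArrayList<>(sources.entrySet()); // keeps the map's iteration order
    }

    boolean hasNext() {
        return position < sources.size();
    }

    SeqRecord nextSequence() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more sequences");
        }
        Map.Entry<URI, String> src = sources.get(position++);
        List<List<String>> sequence = new ArrayList<>();
        for (String line : src.getValue().split("\r?\n")) {
            sequence.add(List.of(line.split(","))); // one time step per line
        }
        return new SeqRecord(sequence, src.getKey());
    }

    void reset() {
        position = 0; // start again from the first source
    }

    public static void main(String[] args) {
        Map<URI, String> files = new LinkedHashMap<>();
        files.put(URI.create("file:///data/a.txt"), "1,x\n2,y");
        files.put(URI.create("file:///data/b.txt"), "3,z");
        SequenceIteratorSketch reader = new SequenceIteratorSketch(files);
        while (reader.hasNext()) {
            SeqRecord r = reader.nextSequence();
            System.out.println(r.source + " -> " + r.sequence);
        }
        reader.reset();
        System.out.println(reader.hasNext()); // true
    }
}
```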
loadSequenceFromMetaData
public SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData) throws IOException
Description copied from interface: SequenceRecordReader
Load a single sequence record from the given RecordMetaData instance.
Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters:
  recordMetaData - metadata for the sequence record to load
Returns:
  a single sequence record for the given RecordMetaData instance
Throws:
  IOException - if an I/O error occurs during loading
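Reloading from metadata can be sketched as re-reading the source the metadata points to and parsing it again. Meta is a hypothetical stand-in for RecordMetaData that only carries a URI, which is enough under the assumption that each file holds exactly one sequence; non-matching lines are skipped here (as with SkipInvalid) purely for brevity, and UTF-8 is assumed as the encoding.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: load one sequence again from the metadata that identifies it.
class MetaDataLoadSketch {

    /** Hypothetical metadata: only records where the sequence came from. */
    static final class Meta {
        final URI uri;

        Meta(URI uri) {
            this.uri = uri;
        }
    }

    static List<List<String>> loadSequenceFromMetaData(Meta meta, Pattern pattern) throws IOException {
        // Re-read the whole source file named by the metadata.
        String contents = new String(Files.readAllBytes(Paths.get(meta.uri)), StandardCharsets.UTF_8);
        List<List<String>> sequence = new ArrayList<>();
        for (String line : contents.split("\r?\n")) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                continue; // SkipInvalid-style handling, chosen for brevity
            }
            List<String> step = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                step.add(m.group(g));
            }
            sequence.add(step);
        }
        return sequence;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sequence", ".txt");
        Files.write(file, "t=1\nt=2\n".getBytes(StandardCharsets.UTF_8));
        Meta meta = new Meta(file.toUri());
        System.out.println(loadSequenceFromMetaData(meta, Pattern.compile("(\\w)=(\\d)"))); // [[t, 1], [t, 2]]
        Files.delete(file);
    }
}
```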
loadSequenceFromMetaData
public List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException
Description copied from interface: SequenceRecordReader
Load multiple sequence records from the given list of RecordMetaData instances.
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters:
  recordMetaDatas - metadata for the records to load
Returns:
  multiple sequence records for the given RecordMetaData instances
Throws:
  IOException - if an I/O error occurs during loading
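The efficiency note on the single-record overload can be illustrated with a hypothetical batch loader: if several requested records live in the same non-splittable source, scanning that source once and picking out each record is cheaper than one full scan per record. This sketches the general idea behind the interface note only; it is not a description of how RegexSequenceRecordReader implements either overload, and Meta, scanSource, and the scan counter are illustrative names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch contrasting per-record loading with batched loading.
class BatchLoadSketch {

    /** Hypothetical metadata: source URI plus the record's position inside that source. */
    static final class Meta {
        final URI uri;
        final int index;

        Meta(URI uri, int index) {
            this.uri = uri;
            this.index = index;
        }
    }

    private final Function<URI, List<String>> scanSource; // expensive: reads and splits a whole source
    int sourceScans = 0; // how many full scans have been performed

    BatchLoadSketch(Function<URI, List<String>> scanSource) {
        this.scanSource = scanSource;
    }

    /** Single-record load: scans the whole source just to pick out one record. */
    String load(Meta meta) {
        sourceScans++;
        return scanSource.apply(meta.uri).get(meta.index);
    }

    /** Batch load: scans each distinct source once, then picks out every requested record. */
    List<String> loadAll(List<Meta> metas) {
        Map<URI, List<String>> scanned = new HashMap<>();
        List<String> out = new ArrayList<>(metas.size());
        for (Meta meta : metas) {
            List<String> records = scanned.computeIfAbsent(meta.uri, u -> {
                sourceScans++;
                return scanSource.apply(u);
            });
            out.add(records.get(meta.index)); // output order follows the request order
        }
        return out;
    }

    public static void main(String[] args) {
        URI uri = URI.create("memory:log");
        BatchLoadSketch loader = new BatchLoadSketch(u -> List.of("seq0", "seq1", "seq2"));
        List<Meta> wanted = List.of(new Meta(uri, 2), new Meta(uri, 0), new Meta(uri, 1));
        System.out.println(loader.loadAll(wanted)); // [seq2, seq0, seq1]
        System.out.println(loader.sourceScans);     // 1
    }
}
```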