public static class ArffLoader.ArffReader extends java.lang.Object implements RevisionHandler
Typical code for batch usage:
BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader);
Instances data = arff.getData();
data.setClassIndex(data.numAttributes() - 1);
Typical code for incremental usage:
BufferedReader reader =
    new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();
data.setClassIndex(data.numAttributes() - 1);
Instance inst;
while ((inst = arff.readInstance(data)) != null) {
    data.add(inst);
}
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | m_batchMode |
| protected Instances | m_Data: The actual data. |
| protected java.util.List<java.lang.String> | m_enclosures: List of (single character) enclosures to use instead of the defaults. |
| protected java.lang.String | m_fieldSeparator: Field separator (single character string) to use instead of the defaults. |
| protected int[] | m_IndicesBuffer: Buffer of indices for sparse instance. |
| protected int | m_Lines: The number of lines read so far. |
| protected boolean | m_retainStringValues: Whether the values for string attributes will accumulate in the header when reading incrementally. |
| protected java.util.List<java.lang.Integer> | m_stringAttIndices |
| protected java.io.StreamTokenizer | m_Tokenizer: The tokenizer for reading the stream. |
| protected double[] | m_ValueBuffer: Buffer of values for sparse instance. |
| Constructor and Description |
|---|
| ArffReader(java.io.Reader reader): Reads the data completely from the reader. |
| ArffReader(java.io.Reader reader, Instances template, int lines, int capacity, boolean batch, java.lang.String... fieldSepAndEnclosures): Initializes the reader without reading the header according to the specified template. |
| ArffReader(java.io.Reader reader, Instances template, int lines, int capacity, java.lang.String... fieldSepAndEnclosures): Initializes the reader without reading the header according to the specified template. |
| ArffReader(java.io.Reader reader, Instances template, int lines, java.lang.String... fieldSepAndEnclosures): Reads the data without header according to the specified template. |
| ArffReader(java.io.Reader reader, int capacity) |
| ArffReader(java.io.Reader reader, int capacity, boolean batch): Reads only the header and reserves the specified space for instances. |
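The template-based constructors take an optional fieldSepAndEnclosures array in which the first entry is the single-character field separator and any remaining entries are enclosure characters. The following is a minimal sketch of how such an array could be split, using only the standard library. The class FieldSepAndEnclosuresSketch, the Settings holder, and the single-character validation are assumptions made for illustration and are not taken from the Weka sources.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldSepAndEnclosuresSketch {

    /** Hypothetical holder for the parsed settings; not part of Weka. */
    static final class Settings {
        final String fieldSeparator;    // null means "use the default"
        final List<String> enclosures;  // empty means "use the defaults"

        Settings(String fieldSeparator, List<String> enclosures) {
            this.fieldSeparator = fieldSeparator;
            this.enclosures = enclosures;
        }
    }

    /** First entry is the separator; any remaining entries are enclosures. */
    static Settings parse(String... fieldSepAndEnclosures) {
        String separator = null;
        List<String> enclosures = new ArrayList<>();
        if (fieldSepAndEnclosures != null) {
            for (int i = 0; i < fieldSepAndEnclosures.length; i++) {
                String s = fieldSepAndEnclosures[i];
                // Assumption: every entry must be exactly one character long.
                if (s == null || s.length() != 1) {
                    throw new IllegalArgumentException(
                        "Expected a single character, got: " + s);
                }
                if (i == 0) {
                    separator = s;
                } else {
                    enclosures.add(s);
                }
            }
        }
        return new Settings(separator, enclosures);
    }

    public static void main(String[] args) {
        // Semicolon as separator, double and single quotes as enclosures.
        Settings s = parse(";", "\"", "'");
        System.out.println("separator=" + s.fieldSeparator
            + " enclosures=" + s.enclosures);
    }
}
```

Calling parse() with no arguments leaves the separator as null and the enclosure list empty, which the sketch treats as "use the defaults".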
| Modifier and Type | Method and Description |
|---|---|
| protected void | compactify(): Compactifies the data. |
| protected void | errorMessage(java.lang.String msg): Throws error message with line number and last token read. |
| protected double | getAttributeWeight(): Gets the value of an attribute's weight (if one exists). |
| Instances | getData(): Returns the data that was read. |
| protected void | getFirstToken(): Gets next token, skipping empty lines. |
| protected void | getIndex(): Gets index, checking for a premature end of line. |
| protected Instance | getInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected Instance | getInstanceFull(boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected Instance | getInstanceSparse(boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected double | getInstanceWeight(): Gets the value of an instance's weight (if one exists). |
| protected void | getLastToken(boolean endOfFileOk): Gets token and checks if it is the end of line. |
| int | getLineNo(): Returns the current line number. |
| protected void | getNextToken(): Gets next token, checking for a premature end of line. |
| boolean | getRetainStringValues(): Get whether to retain the values of string attributes in memory (in the header) when reading incrementally. |
| java.lang.String | getRevision(): Returns the revision string. |
| Instances | getStructure(): Returns the header format. |
| protected void | initBuffers(): Initializes the buffers for sparse instances to be read. |
| protected void | initTokenizer(): Initializes the StreamTokenizer used for reading the ARFF file. |
| protected java.util.ArrayList<Attribute> | parseAttribute(java.util.ArrayList<Attribute> attributes): Parses the attribute declaration. |
| protected void | readHeader(int capacity): Reads and stores header of an ARFF file. |
| Instance | readInstance(Instances structure): Reads a single instance using the tokenizer and returns it. |
| Instance | readInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected void | readTillEOL(): Reads and skips all tokens before next end of line token. |
| void | setRetainStringValues(boolean retain): Set whether to retain the values of string attributes in memory (in the header) when reading incrementally. |
protected java.io.StreamTokenizer m_Tokenizer
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
protected java.util.List<java.lang.Integer> m_stringAttIndices
protected Instances m_Data
protected int m_Lines
protected boolean m_batchMode
protected boolean m_retainStringValues
protected java.lang.String m_fieldSeparator
protected java.util.List<java.lang.String> m_enclosures
public ArffReader(java.io.Reader reader)
throws java.io.IOException
Reads the data completely from the reader. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
Throws:
java.io.IOException - if something goes wrong
See Also:
getData()

public ArffReader(java.io.Reader reader,
int capacity)
throws java.io.IOException
Throws:
java.io.IOException

public ArffReader(java.io.Reader reader,
int capacity,
boolean batch)
throws java.io.IOException
Reads only the header and reserves the specified space for instances. Further instances can be read via readInstance().
Parameters:
reader - the reader to use
capacity - the capacity of the new dataset
batch - true if reading in batch mode
Throws:
java.io.IOException - if something goes wrong
java.io.IOException - if a problem occurs
See Also:
getStructure(), readInstance(Instances)

public ArffReader(java.io.Reader reader,
Instances template,
int lines,
java.lang.String... fieldSepAndEnclosures)
throws java.io.IOException
Reads the data without header according to the specified template. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
Throws:
java.io.IOException - if something goes wrong
See Also:
getData()

public ArffReader(java.io.Reader reader,
Instances template,
int lines,
int capacity,
java.lang.String... fieldSepAndEnclosures)
throws java.io.IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
Throws:
java.io.IOException - if something goes wrong
See Also:
getData()

public ArffReader(java.io.Reader reader,
Instances template,
int lines,
int capacity,
boolean batch,
java.lang.String... fieldSepAndEnclosures)
throws java.io.IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
batch - true if the data is going to be read in batch mode
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
Throws:
java.io.IOException - if something goes wrong
See Also:
getData()

protected void initBuffers()
Initializes the buffers for sparse instances to be read.
See Also:
m_ValueBuffer, m_IndicesBuffer

protected void compactify()
Compactifies the data.
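initBuffers() prepares m_ValueBuffer and m_IndicesBuffer, which hold the values and attribute indices of a sparse instance while it is being read. The sketch below illustrates the general idea of such paired buffers with standard-library code only. It is not the Weka implementation: the class name, the restriction to numeric values, and the simple string splitting are all assumptions made for illustration.

```java
import java.util.Arrays;

public class SparseBufferSketch {

    // Reusable buffers sized to the number of attributes (hypothetical
    // stand-ins for m_ValueBuffer and m_IndicesBuffer).
    private final double[] valueBuffer;
    private final int[] indicesBuffer;
    private int used; // number of buffer slots filled by the last parse

    SparseBufferSketch(int numAttributes) {
        valueBuffer = new double[numAttributes];
        indicesBuffer = new int[numAttributes];
    }

    /** Parses a numeric-only sparse row such as "{1 2.5, 3 7}". */
    void parse(String row) {
        used = 0;
        String body = row.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("Sparse row must be wrapped in braces");
        }
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return; // every value takes its default
        }
        for (String pair : body.split(",")) {
            String[] parts = pair.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected '<index> <value>': " + pair);
            }
            int index = Integer.parseInt(parts[0]);
            if (index < 0 || index >= indicesBuffer.length) {
                throw new IllegalArgumentException("Index out of range: " + index);
            }
            // Assumption: indices must appear in strictly ascending order.
            if (used > 0 && index <= indicesBuffer[used - 1]) {
                throw new IllegalArgumentException("Indices must be ascending");
            }
            indicesBuffer[used] = index;
            valueBuffer[used] = Double.parseDouble(parts[1]);
            used++;
        }
    }

    /** Copies of the filled part of each buffer, sized to the entries read. */
    int[] indices() { return Arrays.copyOf(indicesBuffer, used); }
    double[] values() { return Arrays.copyOf(valueBuffer, used); }

    public static void main(String[] args) {
        SparseBufferSketch buffers = new SparseBufferSketch(5);
        buffers.parse("{1 2.5, 3 7}");
        System.out.println(Arrays.toString(buffers.indices()));
        System.out.println(Arrays.toString(buffers.values()));
    }
}
```

Reusing two fixed-size arrays avoids allocating fresh arrays for every row; only the filled prefix is copied out once the row is complete.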
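protected void errorMessage(java.lang.String msg)
throws java.io.IOException
Throws error message with line number and last token read.
Parameters:
msg - the error message to be thrown
Throws:
java.io.IOException - containing the error message

public int getLineNo()
Returns the current line number.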
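As a rough illustration of an error that carries the line number and the last token read, the following sketch builds such a message from a plain java.io.StreamTokenizer. The class ErrorMessageSketch, the helper lastToken, and the exact message wording are assumptions for this example; the format ArffReader actually produces may differ.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class ErrorMessageSketch {

    /** Describes the most recently read token in a readable form. */
    static String lastToken(StreamTokenizer t) {
        switch (t.ttype) {
            case StreamTokenizer.TT_EOF:
                return "EOF";
            case StreamTokenizer.TT_EOL:
                return "EOL";
            case StreamTokenizer.TT_NUMBER:
                return String.valueOf(t.nval);
            case StreamTokenizer.TT_WORD:
                return t.sval;
            default:
                return String.valueOf((char) t.ttype);
        }
    }

    /** Always throws; the message format here is an assumption. */
    static void errorMessage(StreamTokenizer t, String msg) throws IOException {
        throw new IOException(
            msg + ", read '" + lastToken(t) + "' at line " + t.lineno());
    }

    /** Reads some tokens, then reports an error and returns its message. */
    static String demo(String input, int tokensToRead) {
        StreamTokenizer t = new StreamTokenizer(new StringReader(input));
        try {
            for (int i = 0; i < tokensToRead; i++) {
                t.nextToken();
            }
            errorMessage(t, "premature end of line");
            return null; // not reached, errorMessage always throws
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("a b\nc", 3));
    }
}
```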
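protected void getFirstToken()
throws java.io.IOException
Gets next token, skipping empty lines.
Throws:
java.io.IOException - if reading the next token fails

protected void getIndex()
throws java.io.IOException
Gets index, checking for a premature end of line.
Throws:
java.io.IOException - if it finds a premature end of line

protected void getLastToken(boolean endOfFileOk)
throws java.io.IOException
Gets token and checks if it is the end of line.
Parameters:
endOfFileOk - whether EOF is OK
Throws:
java.io.IOException - if it doesn't find an end of line

protected double getInstanceWeight()
throws java.io.IOException
Gets the value of an instance's weight (if one exists).
Throws:
java.io.IOException

protected void getNextToken()
throws java.io.IOException
Gets next token, checking for a premature end of line.
Throws:
java.io.IOException - if it finds a premature end of line

protected void initTokenizer()
Initializes the StreamTokenizer used for reading the ARFF file.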
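To make the role of the tokenizer more concrete, here is a hedged sketch of one way a java.io.StreamTokenizer could be configured for ARFF-like input: commas and spaces separate values, '%' starts a comment, quotes enclose strings, braces are single-character tokens, and end of line is reported. This configuration is an assumption for illustration and is not claimed to match initTokenizer() exactly; ArffTokenizerSketch and the "<EOL>" marker are names invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ArffTokenizerSketch {

    /** Builds a tokenizer with an ARFF-like syntax table (assumed settings). */
    static StreamTokenizer configure(Reader reader) {
        StreamTokenizer t = new StreamTokenizer(reader);
        t.resetSyntax();                 // start from an empty syntax table
        t.whitespaceChars(0, ' ');       // control characters and space separate tokens
        t.wordChars(' ' + 1, '\u00FF');  // everything else is a word character by default
        t.whitespaceChars(',', ',');     // commas separate values
        t.commentChar('%');              // '%' starts a comment that runs to end of line
        t.quoteChar('"');                // double-quoted strings
        t.quoteChar('\'');               // single-quoted strings
        t.ordinaryChar('{');             // braces become single-character tokens
        t.ordinaryChar('}');
        t.eolIsSignificant(true);        // report end of line as its own token
        return t;
    }

    /** Collects all tokens of the text; end of line is shown as "<EOL>". */
    static List<String> tokenize(String text) {
        StreamTokenizer t = configure(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            while (t.nextToken() != StreamTokenizer.TT_EOF) {
                if (t.ttype == StreamTokenizer.TT_EOL) {
                    tokens.add("<EOL>");
                } else if (t.sval != null) {
                    tokens.add(t.sval);  // words and quoted strings
                } else {
                    tokens.add(String.valueOf((char) t.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("@attribute color {red, 'dark blue'}\n"));
    }
}
```

Because numbers are not parsed by this configuration, numeric fields arrive as words and would have to be converted by the caller.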
public Instance readInstance(Instances structure)
throws java.io.IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
Throws:
java.io.IOException - if the information is not read successfully

public Instance readInstance(Instances structure, boolean flag)
throws java.io.IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected Instance getInstance(Instances structure, boolean flag)
throws java.io.IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected Instance getInstanceSparse(boolean flag)
throws java.io.IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected Instance getInstanceFull(boolean flag)
throws java.io.IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected void readHeader(int capacity)
throws java.io.IOException
Reads and stores header of an ARFF file.
Parameters:
capacity - the number of instances to reserve in the data structure
Throws:
java.io.IOException - if the information is not read successfully

protected java.util.ArrayList<Attribute> parseAttribute(java.util.ArrayList<Attribute> attributes)
throws java.io.IOException
Parses the attribute declaration.
Parameters:
attributes - the current attributes vector
Throws:
java.io.IOException - if the information is not read successfully

protected void readTillEOL()
throws java.io.IOException
Reads and skips all tokens before next end of line token.
Throws:
java.io.IOException - in case something goes wrong

protected double getAttributeWeight()
throws java.io.IOException
Gets the value of an attribute's weight (if one exists).
Throws:
java.io.IOException

public Instances getStructure()
Returns the header format.

public Instances getData()
Returns the data that was read.
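public void setRetainStringValues(boolean retain)
Set whether to retain the values of string attributes in memory (in the header) when reading incrementally.
Parameters:
retain - true if string values are to be retained in memory when reading incrementally

public boolean getRetainStringValues()
Get whether to retain the values of string attributes in memory (in the header) when reading incrementally.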

public java.lang.String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler