public class CSVLoader extends AbstractFileLoader implements BatchConverter, IncrementalConverter, OptionHandler
-H No header row present in the data.
-N <range> The range of attributes to force type to be NOMINAL. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-L <nominal label spec> Optional specification of legal labels for nominal attributes. May be specified multiple times. Batch mode can determine this automatically (and so can incremental mode if the first in-memory buffer load of instances contains an example of each legal value). The spec contains two parts separated by a ":". The first part can be a range of attribute indexes or a comma-separated list of attribute names; the second part is a comma-separated list of labels. E.g. "1,2,4-6:red,green,blue" or "att1,att2:red,green,blue"
-S <range> The range of attributes to force type to be STRING. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-D <range> The range of attributes to force type to be DATE. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-format <date format> The date formatting string to use to parse date values. (default: "yyyy-MM-dd'T'HH:mm:ss")
-R <range> The range of attributes to force type to be NUMERIC. 'first' and 'last' are accepted as well. Examples: "first-last", "1,4,5-27,50-last" (default: -none-)
-M <str> The string representing a missing value. (default: ?)
-F <separator> The field separator to be used. '\t' can be used as well. (default: ',')
-E <enclosures> The enclosure character(s) to use for strings. Specify as a comma-separated list, e.g. ",' (default: ",')
-B <num> The size of the in-memory buffer (in rows). (default: 100)
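The range syntax shared by -N, -S, -D and -R can be illustrated with a small stand-alone sketch. It does not use Weka's own Range class; the names RangeSketch, bound and expand are hypothetical, and the sketch assumes 1-based indexes with 'first' and 'last' resolving to 1 and the number of attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: expands a 1-based range spec such as
// "1,4,5-27,50-last" into a sorted list of attribute indexes.
final class RangeSketch {

  // Resolves one bound of a range: "first", "last", or a 1-based number.
  static int bound(String token, int numAttributes) {
    String t = token.trim();
    if (t.equalsIgnoreCase("first")) {
      return 1;
    }
    if (t.equalsIgnoreCase("last")) {
      return numAttributes;
    }
    return Integer.parseInt(t);
  }

  // Expands every comma-separated part; a part is a single index or "lo-hi".
  static List<Integer> expand(String spec, int numAttributes) {
    TreeSet<Integer> indexes = new TreeSet<>();
    for (String part : spec.split(",")) {
      String[] ends = part.split("-");
      int lo = bound(ends[0], numAttributes);
      int hi = ends.length > 1 ? bound(ends[1], numAttributes) : lo;
      if (lo < 1 || hi > numAttributes || lo > hi) {
        throw new IllegalArgumentException("Bad range: " + part);
      }
      for (int i = lo; i <= hi; i++) {
        indexes.add(i);
      }
    }
    return new ArrayList<>(indexes);
  }

  public static void main(String[] args) {
    // With 10 attributes, "9-last" covers indexes 9 and 10.
    System.out.println(expand("1,4,5-7,9-last", 10)); // prints [1, 4, 5, 6, 7, 9, 10]
  }
}
```

Whether the real loader rejects out-of-bounds indexes the same way is not stated in this page; the exception here is only this sketch's choice.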
Nested classes inherited from interface Loader: Loader.StructureNotReadyException

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | FILE_EXTENSION: the file extension. |
| protected int | m_bufferSize: The maximum number of rows to hold in memory at any one time |
| protected java.util.ArrayList<java.lang.Object> | m_current |
| protected java.io.PrintWriter | m_dataDumper |
| protected Range | m_dateAttributes: The range of attributes to force to type date |
| protected java.lang.String | m_dateFormat: The formatting string to use to parse dates |
| protected java.lang.String | m_Enclosures: enclosure character(s) to use for strings |
| protected java.lang.String | m_FieldSeparator: the field separator. |
| protected java.lang.String[] | m_fieldSeparatorAndEnclosures: Array holding field separator and enclosures to pass through to the underlying ArffReader |
| protected java.text.SimpleDateFormat | m_formatter: The formatter to use on dates |
| protected ArffLoader.ArffReader | m_incrementalReader: Reader used to process and output data incrementally |
| protected java.lang.String | m_MissingValue: The placeholder for missing values. |
| protected boolean | m_noHeaderRow: whether the csv file contains a header row with att names |
| protected Range | m_NominalAttributes: The range of attributes to force to type nominal. |
| protected java.util.List<java.lang.String> | m_nominalLabelSpecs: The user-supplied legal nominal values - each entry in the list is a spec |
| protected java.util.Map<java.lang.Integer,java.util.LinkedHashSet<java.lang.String>> | m_nominalVals: Lookup for nominal values |
| protected Range | m_numericAttributes: The range of attributes to force to type numeric |
| protected java.util.List<java.lang.String> | m_rowBuffer: The in memory row buffer |
| protected int | m_rowCount |
| protected java.io.BufferedReader | m_sourceReader: The reader for the data. |
| protected java.io.StreamTokenizer | m_st: Tokenizer for the data. |
| protected Range | m_StringAttributes: The range of attributes to force to type string. |
| protected java.io.File | m_tempFile |
| protected weka.core.converters.CSVLoader.TYPE[] | m_types |
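
Fields inherited from class AbstractFileLoader: FILE_EXTENSION_COMPRESSED, m_env, m_File, m_sourceFile, m_structure, m_useRelativePath
Fields inherited from class AbstractLoader: m_retrieval
Fields inherited from interface Loader: BATCH, INCREMENTAL, NONE

| Constructor and Description |
|---|
| CSVLoader(): default constructor. |
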
| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | bufferSizeTipText(): Returns the tip text for this property. |
| java.lang.String | dateAttributesTipText(): Returns the tip text for this property. |
| java.lang.String | dateFormatTipText(): Returns the tip text for this property. |
| protected void | dumpRow(java.lang.String row) |
| java.lang.String | enclosureCharactersTipText(): Returns the tip text for this property. |
| java.lang.String | fieldSeparatorTipText(): Returns the tip text for this property. |
| int | getBufferSize(): Get the buffer size to use - i.e. the number of rows to load and process in memory at any one time |
| Instances | getDataSet(): Return the full data set. |
| java.lang.String | getDateAttributes(): Returns the current attribute range to be forced to type date. |
| java.lang.String | getDateFormat(): Get the format to use for parsing date values. |
| java.lang.String | getEnclosureCharacters(): Get the character(s) to use/recognize as string enclosures |
| java.lang.String | getFieldSeparator(): Returns the character used as column separator. |
| java.lang.String | getFileDescription(): Get a one line description of the type of file |
| java.lang.String | getFileExtension(): Get the file extension used for this type of file |
| java.lang.String[] | getFileExtensions(): Gets all the file extensions used for this type of file |
| java.lang.String | getMissingValue(): Returns the current placeholder for missing values. |
| Instance | getNextInstance(Instances structure): Read the data set incrementally - get the next instance in the data set or returns null if there are no more instances to get. |
| boolean | getNoHeaderRowPresent(): Get whether there is no header row in the data. |
| java.lang.String | getNominalAttributes(): Returns the current attribute range to be forced to type nominal. |
| java.lang.Object[] | getNominalLabelSpecs(): Get label specifications for nominal attributes. |
| java.lang.String | getNumericAttributes(): Gets the attribute range to be forced to type numeric |
| java.lang.String[] | getOptions(): Gets the current option settings for the OptionHandler. |
| java.lang.String | getRevision(): Returns the revision string. |
| java.lang.String | getStringAttributes(): Returns the current attribute range to be forced to type string. |
| Instances | getStructure(): Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances. |
| java.lang.String | globalInfo(): Returns a string describing this loader. |
| java.util.Enumeration<Option> | listOptions(): Returns an enumeration of all the available options. |
| static void | main(java.lang.String[] args): Main method. |
| protected Instance | makeInstance() |
| protected void | makeStructure() |
| java.lang.String | missingValueTipText(): Returns the tip text for this property. |
| java.lang.String | noHeaderRowPresentTipText(): Returns the tip text for this property. |
| java.lang.String | nominalAttributesTipText(): Returns the tip text for this property. |
| java.lang.String | nominalLabelSpecsTipText(): Returns the tip text for this property. |
| java.lang.String | numericAttributesTipText(): Returns the tip text for this property. |
| protected void | openTempFiles() |
| void | reset(): Resets the loader ready to read a new data set |
| void | setBufferSize(int buff): Set the buffer size to use - i.e. the number of rows to load and process in memory at any one time |
| void | setDateAttributes(java.lang.String value): Set the attribute range to be forced to type date. |
| void | setDateFormat(java.lang.String value): Set the format to use for parsing date values. |
| void | setEnclosureCharacters(java.lang.String enclosure): Set the character(s) to use/recognize as string enclosures |
| void | setFieldSeparator(java.lang.String value): Sets the character used as column separator. |
| void | setMissingValue(java.lang.String value): Sets the placeholder for missing values. |
| void | setNoHeaderRowPresent(boolean b): Set whether there is no header row in the data. |
| void | setNominalAttributes(java.lang.String value): Sets the attribute range to be forced to type nominal. |
| void | setNominalLabelSpecs(java.lang.Object[] specs): Set label specifications for nominal attributes. |
| void | setNumericAttributes(java.lang.String value): Sets the attribute range to be forced to type numeric |
| void | setOptions(java.lang.String[] options): Sets the OptionHandler's options using the given list. |
| void | setSource(java.io.File file): Resets the Loader object and sets the source of the data set to be the supplied File object. |
| void | setSource(java.io.InputStream input): Resets the Loader object and sets the source of the data set to be the supplied Stream object. |
| void | setStringAttributes(java.lang.String value): Sets the attribute range to be forced to type string. |
| java.lang.String | stringAttributesTipText(): Returns the tip text for this property. |

Methods inherited from class AbstractFileLoader: getUseRelativePath, makeOptionStr, retrieveFile, runFileLoader, setEnvironment, setFile, setUseRelativePath, useRelativePathTipText
Methods inherited from class AbstractLoader: getRetrieval, setRetrieval

public static java.lang.String FILE_EXTENSION
protected transient java.io.BufferedReader m_sourceReader
protected transient java.io.StreamTokenizer m_st
protected transient java.io.File m_tempFile
protected transient java.io.PrintWriter m_dataDumper
protected java.lang.String m_FieldSeparator
protected java.lang.String m_MissingValue
protected Range m_NominalAttributes
protected java.util.List<java.lang.String> m_nominalLabelSpecs
protected Range m_StringAttributes
protected Range m_dateAttributes
protected Range m_numericAttributes
protected java.lang.String m_dateFormat
protected java.text.SimpleDateFormat m_formatter
protected boolean m_noHeaderRow
protected java.lang.String m_Enclosures
protected java.util.List<java.lang.String> m_rowBuffer
protected int m_bufferSize
protected java.util.Map<java.lang.Integer,java.util.LinkedHashSet<java.lang.String>> m_nominalVals
protected ArffLoader.ArffReader m_incrementalReader
protected transient int m_rowCount
protected java.lang.String[] m_fieldSeparatorAndEnclosures
protected java.util.ArrayList<java.lang.Object> m_current
protected weka.core.converters.CSVLoader.TYPE[] m_types
public static void main(java.lang.String[] args)
Parameters: args - should contain the name of an input file.
public java.lang.String globalInfo()
public java.lang.String getFileExtension()
Specified by: getFileExtension in interface FileSourcedConverter
public java.lang.String[] getFileExtensions()
Specified by: getFileExtensions in interface FileSourcedConverter
public java.lang.String getFileDescription()
Specified by: getFileDescription in interface FileSourcedConverter
public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler
public java.lang.String noHeaderRowPresentTipText()
public boolean getNoHeaderRowPresent()
public void setNoHeaderRowPresent(boolean b)
Parameters: b - true if there is no header row in the data
public java.lang.String getMissingValue()
public void setMissingValue(java.lang.String value)
Parameters: value - the placeholder
public java.lang.String missingValueTipText()
public java.lang.String getStringAttributes()
public void setStringAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String stringAttributesTipText()
public java.lang.String getNominalAttributes()
public void setNominalAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String nominalAttributesTipText()
public java.lang.String getNumericAttributes()
public void setNumericAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String numericAttributesTipText()
public java.lang.String getDateFormat()
public void setDateFormat(java.lang.String value)
Parameters: value - the format to use.
public java.lang.String dateFormatTipText()
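The default date format listed for the -format option, "yyyy-MM-dd'T'HH:mm:ss", is a java.text.SimpleDateFormat pattern (the m_formatter field has that type). The following sketch shows what such a pattern accepts; DateFormatSketch and its parse helper are hypothetical names, and the strict (non-lenient) parsing is this sketch's choice, not something this page states about the loader.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical illustration, not part of Weka: parses a value with the
// documented default pattern of the -format option.
final class DateFormatSketch {

  // Default taken from the option list; the quoted 'T' is a literal character.
  static final String DEFAULT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

  static Date parse(String value, String pattern) throws ParseException {
    SimpleDateFormat formatter = new SimpleDateFormat(pattern);
    // Assumption of this sketch: reject impossible dates instead of rolling over.
    formatter.setLenient(false);
    return formatter.parse(value);
  }

  public static void main(String[] args) throws ParseException {
    Date parsed = parse("2021-06-15T12:00:00", DEFAULT_PATTERN);
    // Formatting with the same pattern gives back the textual form.
    System.out.println(new SimpleDateFormat(DEFAULT_PATTERN).format(parsed));
  }
}
```

A value that does not match the pattern, such as "15/06/2021", makes parse throw a ParseException.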
public java.lang.String getDateAttributes()
public void setDateAttributes(java.lang.String value)
Parameters: value - the range
public java.lang.String dateAttributesTipText()
public java.lang.String enclosureCharactersTipText()
public java.lang.String getEnclosureCharacters()
public void setEnclosureCharacters(java.lang.String enclosure)
Parameters: enclosure - the characters to use as string enclosures
public java.lang.String getFieldSeparator()
public void setFieldSeparator(java.lang.String value)
Parameters: value - the character to use
public java.lang.String fieldSeparatorTipText()
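To make the interplay of the field separator (-F) and the enclosure characters (-E) concrete, here is a minimal stand-alone sketch. It is not how CSVLoader tokenizes input internally; SeparatorSketch, resolveSeparator and split are hypothetical names, and the sketch assumes a single-character separator, no escape sequences inside enclosed text, and that the two-character option value \t stands for a tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Weka: splits one line using a field
// separator and a comma-separated list of enclosure characters.
final class SeparatorSketch {

  // Maps the two-character option value \t to a real tab character.
  static String resolveSeparator(String option) {
    return option.equals("\\t") ? "\t" : option;
  }

  static List<String> split(String line, String separatorOption, String enclosureList) {
    char sep = resolveSeparator(separatorOption).charAt(0);
    // The option value ",' lists two enclosures: double and single quote.
    String enclosures = enclosureList.replace(",", "");
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    char open = 0; // the enclosure currently open, or 0 if none
    for (char c : line.toCharArray()) {
      if (open != 0) {
        if (c == open) {
          open = 0; // closing enclosure, dropped from the field
        } else {
          current.append(c); // separators inside an enclosure are plain text
        }
      } else if (enclosures.indexOf(c) >= 0) {
        open = c;
      } else if (c == sep) {
        fields.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    fields.add(current.toString());
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(split("a,'b,c',\"d\"", ",", "\",'")); // prints [a, b,c, d]
  }
}
```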
public int getBufferSize()
public void setBufferSize(int buff)
Parameters: buff - the buffer size (number of rows)
public java.lang.String bufferSizeTipText()
public java.lang.Object[] getNominalLabelSpecs()
public void setNominalLabelSpecs(java.lang.Object[] specs)
Parameters: specs - an array of label specifications
public java.lang.String nominalLabelSpecsTipText()
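A label specification, as described for the -L option, has an attribute part and a label part separated by ":". The sketch below splits one spec into those two parts. LabelSpecSketch is a hypothetical class, not Weka code; it assumes the first ":" is the divider and leaves the attribute part unparsed, since that part may be either an index range or a list of names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Weka: splits a spec such as
// "1,2,4-6:red,green,blue" into its attribute part and its labels.
final class LabelSpecSketch {

  final String attributes; // index range or attribute names, left unparsed
  final List<String> labels;

  private LabelSpecSketch(String attributes, List<String> labels) {
    this.attributes = attributes;
    this.labels = labels;
  }

  static LabelSpecSketch parse(String spec) {
    // Assumption of this sketch: the first ':' separates the two parts.
    int colon = spec.indexOf(':');
    if (colon <= 0 || colon == spec.length() - 1) {
      throw new IllegalArgumentException("Expected <attributes>:<labels> but got: " + spec);
    }
    List<String> labels = new ArrayList<>();
    for (String label : spec.substring(colon + 1).split(",")) {
      labels.add(label.trim());
    }
    return new LabelSpecSketch(spec.substring(0, colon).trim(), labels);
  }

  public static void main(String[] args) {
    LabelSpecSketch s = parse("1,2,4-6:red,green,blue");
    System.out.println(s.attributes + " -> " + s.labels); // prints 1,2,4-6 -> [red, green, blue]
  }
}
```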
public java.util.Enumeration<Option> listOptions()
Specified by: listOptions in interface OptionHandler
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by: setOptions in interface OptionHandler
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public Instance getNextInstance(Instances structure) throws java.io.IOException
Specified by: getNextInstance in interface Loader
Specified by: getNextInstance in class AbstractLoader
Parameters: structure - the dataset header information, will get updated in case of string or relational attributes
Throws: java.io.IOException - if there is an error during parsing or if getDataSet has been called on this source (either incremental or batch loading can be used, not both).
public Instances getDataSet() throws java.io.IOException
Specified by: getDataSet in interface Loader
Specified by: getDataSet in class AbstractLoader
Throws: java.io.IOException - if there is an error during parsing or if getNextInstance has been called on this source (either incremental or batch loading can be used, not both).
public_normal_behavior requires: model_sourceSupplied == true && (* successful parse *); modifiable: model_structureDetermined; ensures: \result != null && \result.numInstances() >= 0 && model_structureDetermined == true; also public_exceptional_behavior requires: model_sourceSupplied == false || (* unsuccessful parse *); signals: (IOException);
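The rule that getDataSet and getNextInstance cannot be mixed on the same source can be pictured as a small state guard. This is only a sketch of the documented contract, not the loader's actual code: RetrievalGuardSketch and its method names are hypothetical, and the mode names simply mirror the inherited constants NONE, BATCH and INCREMENTAL.

```java
import java.io.IOException;

// Hypothetical illustration, not part of Weka: enforces "either incremental or
// batch loading can be used, not both" for a single source.
final class RetrievalGuardSketch {

  enum Mode { NONE, BATCH, INCREMENTAL }

  private Mode mode = Mode.NONE;

  // Would be called at the start of a batch read (the getDataSet role).
  void beginBatch() throws IOException {
    if (mode == Mode.INCREMENTAL) {
      throw new IOException("Cannot mix getNextInstance and getDataSet on one source");
    }
    mode = Mode.BATCH;
  }

  // Would be called at the start of an incremental read (the getNextInstance role).
  void beginIncremental() throws IOException {
    if (mode == Mode.BATCH) {
      throw new IOException("Cannot mix getDataSet and getNextInstance on one source");
    }
    mode = Mode.INCREMENTAL;
  }

  // Assumption of this sketch: resetting the loader clears the chosen mode.
  void reset() {
    mode = Mode.NONE;
  }

  Mode mode() {
    return mode;
  }

  public static void main(String[] args) {
    RetrievalGuardSketch guard = new RetrievalGuardSketch();
    try {
      guard.beginIncremental();
      guard.beginBatch(); // second style on the same source is refused
    } catch (IOException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```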
public void setSource(java.io.InputStream input) throws java.io.IOException
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractLoader
Parameters: input - the input stream
Throws: java.io.IOException - if an error occurs
public void setSource(java.io.File file) throws java.io.IOException
Specified by: setSource in interface Loader
Overrides: setSource in class AbstractFileLoader
Parameters: file - the source file.
Throws: java.io.IOException - if an error occurs
public Instances getStructure() throws java.io.IOException
Specified by: getStructure in interface Loader
Specified by: getStructure in class AbstractLoader
Throws: java.io.IOException - if there is no source or parsing fails
public_normal_behavior requires: model_sourceSupplied == true && model_structureDetermined == false && (* successful parse *); modifiable: model_structureDetermined; ensures: \result != null && \result.numInstances() == 0 && model_structureDetermined == true; also public_exceptional_behavior requires: model_sourceSupplied == false || (* unsuccessful parse *); signals: (IOException);
protected Instance makeInstance() throws java.io.IOException
Throws: java.io.IOException
protected void makeStructure()
protected void openTempFiles() throws java.io.IOException
Throws: java.io.IOException
protected void dumpRow(java.lang.String row) throws java.io.IOException
Throws: java.io.IOException
public void reset() throws java.io.IOException
Specified by: reset in interface Loader
Overrides: reset in class AbstractFileLoader
Throws: java.io.IOException - if something goes wrong