Package elki.datasource.parser
Class NumberVectorLabelParser<V extends elki.data.NumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
- Type Parameters:
V - the type of NumberVector used
- All Implemented Interfaces:
elki.datasource.bundle.BundleStreamSource, Parser, StreamingParser
- Direct Known Subclasses:
BitVectorLabelParser, CategorialDataAsNumberVectorParser, SparseNumberVectorLabelParser, TermFrequencyParser
public class NumberVectorLabelParser<V extends elki.data.NumberVector> extends AbstractStreamingParser
Parser for a simple CSV-like format, with columns separated by the given pattern (default: whitespace). Several labels may be given per point. A label must not be parseable as a double. Lines starting with "#" are ignored.
An index can be specified to identify an entry to be treated as the class label. This index counts all entries (numeric values and labels alike), starting at 0.
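The format rules above can be illustrated with a small stand-alone sketch. It is not the ELKI implementation: the class `LabelLineSketch`, its `Row` type, and the `labelColumns` set are hypothetical names introduced here, and `Double.parseDouble` is assumed as the test for "parseable as a double" (the real parser may apply a different numeric check).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the parsing rule described above; not ELKI code. */
final class LabelLineSketch {
  /** One parsed line: numeric attributes plus string labels. */
  static final class Row {
    final List<Double> values = new ArrayList<>();
    final List<String> labels = new ArrayList<>();
  }

  /** Default column separator: one or more whitespace characters. */
  static final Pattern COL_SEP = Pattern.compile("\\s+");

  /**
   * Parse one line. Returns null for blank lines and lines starting with "#".
   * Column indexes in labelColumns count all entries, starting at 0.
   */
  static Row parseLine(String line, Pattern colSep, Set<Integer> labelColumns) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return null; // comment or empty line
    }
    Row row = new Row();
    String[] tokens = colSep.split(trimmed);
    for (int col = 0; col < tokens.length; col++) {
      if (labelColumns.contains(col)) {
        row.labels.add(tokens[col]); // forced label column, even if numeric
        continue;
      }
      try {
        row.values.add(Double.parseDouble(tokens[col]));
      } catch (NumberFormatException e) {
        row.labels.add(tokens[col]); // not a number, so treat it as a label
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Row row = parseLine("5.1 3.5 1.4 0.2 Iris-setosa", COL_SEP, Set.of());
    System.out.println(row.values + " " + row.labels);
  }
}
```

Passing a non-empty `labelColumns` set forces the listed columns to be read as labels even when their content looks numeric, which mirrors the role of the label index described above.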
- Since:
- 0.1
- Author:
- Arthur Zimek, Erich Schubert
-
Nested Class Summary
Nested Classes (modifier and type, class - description):
static class NumberVectorLabelParser.Par<V extends elki.data.NumberVector> - Parameterization class.
-
Field Summary
Fields (modifier and type, field - description):
protected elki.utilities.datastructures.arraylike.DoubleArray attributes - Double array storing the numerical attributes during parsing.
protected java.util.List<java.lang.String> columnnames - Column names.
protected elki.data.LabelList curlbl - Current labels.
protected V curvec - Current vector.
protected elki.data.NumberVector.Factory<V> factory - Vector factory class.
protected boolean haslabels - Whether or not the data set has labels.
private long[] labelIndices - Keeps the indices of the attributes to be treated as a string label.
(package private) java.util.ArrayList<java.lang.String> labels - (Reused) store for labels.
private static elki.logging.Logging LOG - Logging class.
protected int maxdim - Dimensionality reported.
protected elki.datasource.bundle.BundleMeta meta - Metadata.
protected int mindim - Dimensionality reported.
(package private) elki.datasource.bundle.BundleStreamSource.Event nextevent - Event to report next.
(package private) it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique - For String unification.
(package private) boolean warnedDim - Emit a dimensionality change warning once.
(package private) boolean warnedPrecision - Emit a double-precision limit warning once.
-
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
Constructor Summary
Constructors (constructor - description):
NumberVectorLabelParser(elki.data.NumberVector.Factory<V> factory) - Constructor with defaults.
NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, elki.data.NumberVector.Factory<V> factory) - Constructor.
NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, elki.data.NumberVector.Factory<V> factory) - Constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods (modifier and type, method - description):
protected void buildMeta() - Update the meta element.
void cleanup() - Perform cleanup operations after parsing.
protected V createVector() - Creates a database object of type V.
java.lang.Object data(int rnum)
protected elki.logging.Logging getLogger() - Get the logger for this class.
elki.datasource.bundle.BundleMeta getMeta()
(package private) elki.data.type.SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim) - Get a prototype object for the given dimensionality.
void initStream(java.io.InputStream in) - Init the streaming parser for the given input stream.
protected boolean isLabelColumn(int col) - Test if the current column is marked as label column.
elki.datasource.bundle.BundleStreamSource.Event nextEvent()
protected boolean parseLineInternal() - Internal method for parsing a single line.
-
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
Field Detail
-
LOG
private static final elki.logging.Logging LOG
Logging class.
-
labelIndices
private long[] labelIndices
Keeps the indices of the attributes to be treated as a string label.
-
factory
protected elki.data.NumberVector.Factory<V> factory
Vector factory class.
-
mindim
protected int mindim
Minimum dimensionality reported.
-
maxdim
protected int maxdim
Maximum dimensionality reported.
-
meta
protected elki.datasource.bundle.BundleMeta meta
Metadata.
-
columnnames
protected java.util.List<java.lang.String> columnnames
Column names.
-
haslabels
protected boolean haslabels
Whether or not the data set has labels.
-
curvec
protected V curvec
Current vector.
-
curlbl
protected elki.data.LabelList curlbl
Current labels.
-
attributes
protected elki.utilities.datastructures.arraylike.DoubleArray attributes
Double array storing the numerical attributes during parsing.
-
labels
final java.util.ArrayList<java.lang.String> labels
(Reused) store for labels.
-
unique
it.unimi.dsi.fastutil.objects.ObjectOpenHashSet<java.lang.String> unique
For String unification.
-
nextevent
elki.datasource.bundle.BundleStreamSource.Event nextevent
Event to report next.
-
warnedPrecision
boolean warnedPrecision
Emit a double-precision limit warning once.
-
warnedDim
boolean warnedDim
Emit a dimensionality change warning once.
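One plausible reading of how mindim, maxdim, and warnedDim interact is sketched below. This is an assumption for illustration, not the ELKI source: the class `DimensionTrackerSketch`, the `observe` method, the `seen` flag, and the `warnings` counter are all hypothetical, and the real parser logs through its own Logging instance rather than a counter.

```java
/** Hypothetical sketch of min/max dimensionality tracking with a one-time warning. */
final class DimensionTrackerSketch {
  int mindim;                // smallest dimensionality seen so far
  int maxdim;                // largest dimensionality seen so far
  boolean warnedDim = false; // set once the change warning has been emitted
  boolean seen = false;      // hypothetical flag: has any row been observed yet?
  int warnings = 0;          // counts emitted warnings (for demonstration only)

  /** Record the dimensionality of one row; returns true if the range changed. */
  boolean observe(int dim) {
    if (!seen) {
      // The first row defines the initial range and never triggers a warning.
      seen = true;
      mindim = dim;
      maxdim = dim;
      return true;
    }
    boolean changed = dim < mindim || dim > maxdim;
    mindim = Math.min(mindim, dim);
    maxdim = Math.max(maxdim, dim);
    if (changed && !warnedDim) {
      warnedDim = true; // suppress every later warning
      warnings++;
    }
    return changed;
  }

  public static void main(String[] args) {
    DimensionTrackerSketch t = new DimensionTrackerSketch();
    for (int dim : new int[] { 3, 3, 4, 2 }) {
      t.observe(dim);
    }
    System.out.println(t.mindim + ".." + t.maxdim + ", warnings=" + t.warnings);
  }
}
```

The point of the boolean flag is that a file with many rows of varying length produces a single warning instead of one per row.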
-
Constructor Detail
-
NumberVectorLabelParser
public NumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, elki.data.NumberVector.Factory<V> factory)
Constructor.
- Parameters:
format - Input format
labelIndices - Column indexes that are not numeric.
factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(elki.data.NumberVector.Factory<V> factory)
Constructor with defaults.
- Parameters:
factory - Vector factory
-
NumberVectorLabelParser
public NumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, elki.data.NumberVector.Factory<V> factory)
Constructor.
- Parameters:
colSep - Column separator
quoteChars - Quote character
comment - Comment pattern
labelIndices - Column indexes that are not numeric.
factory - Vector factory
-
Method Detail
-
isLabelColumn
protected boolean isLabelColumn(int col)
Test if the current column is marked as label column.
- Parameters:
col - Column number
- Returns:
true when a label column.
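A label-column test over a long[] can be sketched as a bitmask lookup. This assumes that labelIndices is a bit set packed into 64-bit words, which the page does not state explicitly; the class `LabelColumnSketch` and the helper `maskOf` are hypothetical and exist only for this illustration.

```java
/** Hypothetical sketch: label columns stored as bits in a long[] (assumption). */
final class LabelColumnSketch {
  /** Build a mask with one bit set per (non-negative) label column index. */
  static long[] maskOf(int... columns) {
    int max = -1;
    for (int c : columns) {
      max = Math.max(max, c);
    }
    long[] mask = new long[(max >> 6) + 1]; // 64 columns per word
    for (int c : columns) {
      mask[c >>> 6] |= 1L << (c & 63);
    }
    return mask;
  }

  /** True when the bit for the given column is set; false if out of range. */
  static boolean isLabelColumn(long[] mask, int col) {
    if (mask == null || col < 0) {
      return false;
    }
    int word = col >>> 6;
    return word < mask.length && (mask[word] & (1L << (col & 63))) != 0;
  }

  public static void main(String[] args) {
    long[] mask = maskOf(0, 4, 70);
    System.out.println(isLabelColumn(mask, 4) + " " + isLabelColumn(mask, 5));
  }
}
```

Treating a null or too-short mask as "no label columns" keeps the check safe for rows that have more columns than the mask covers.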
-
initStream
public void initStream(java.io.InputStream in)
Description copied from interface: StreamingParser
Init the streaming parser for the given input stream.
- Specified by:
initStream in interface StreamingParser
- Overrides:
initStream in class AbstractStreamingParser
- Parameters:
in - the stream to parse objects from
-
getMeta
public elki.datasource.bundle.BundleMeta getMeta()
-
nextEvent
public elki.datasource.bundle.BundleStreamSource.Event nextEvent()
-
cleanup
public void cleanup()
Description copied from interface: Parser
Perform cleanup operations after parsing.
- Specified by:
cleanup in interface Parser
- Overrides:
cleanup in class AbstractStreamingParser
-
buildMeta
protected void buildMeta()
Update the meta element.
-
data
public java.lang.Object data(int rnum)
-
parseLineInternal
protected boolean parseLineInternal()
Internal method for parsing a single line. Used by both line-based and block parsing, which avoids rebuilding the metadata for each line.
- Returns:
true when a valid line was read, false on a label row.
-
createVector
protected V createVector()
Creates a database object of type V.
- Returns:
- a vector of type V containing the given attribute values
-
getTypeInformation
elki.data.type.SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Get a prototype object for the given dimensionality.
- Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality
- Returns:
- Prototype object
-
getLogger
protected elki.logging.Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Specified by:
getLogger in class AbstractStreamingParser
- Returns:
- Logger.
-