Package elki.datasource.parser
Class CategorialDataAsNumberVectorParser<V extends elki.data.NumberVector>
- java.lang.Object
  - elki.datasource.parser.AbstractStreamingParser
    - elki.datasource.parser.NumberVectorLabelParser<V>
      - elki.datasource.parser.CategorialDataAsNumberVectorParser<V>
-
- Type Parameters:
V - the type of NumberVector used
- All Implemented Interfaces:
elki.datasource.bundle.BundleStreamSource, Parser, StreamingParser
@Description("This parser expects data in roughly the same format as the NumberVectorLabelParser,\nexcept that it will enumerate all unique strings to always produce numerical values.\nThis way, it can for example handle files that contain lines like \'y,n,y,y,n,y,n\'.")
public class CategorialDataAsNumberVectorParser<V extends elki.data.NumberVector>
extends NumberVectorLabelParser<V>
A very simple parser for categorial data, which will then be encoded as numbers. This is closely modeled after the number vector parser. TODO: specify handling for numerical values.
- Since:
- 0.6.0
- Author:
- Erich Schubert
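The encoding described above (enumerate each unique string so that every column becomes numeric) can be illustrated with a minimal standalone sketch. It is not the ELKI implementation: the class name CategoricalEncodingSketch, the method encodeLine, the use of java.util.HashMap instead of a fastutil map, the starting code of 1, and the choice to pass parseable numbers through unchanged are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of ELKI.
class CategoricalEncodingSketch {
  /** Maps each distinct non-numeric token to the code assigned on first sight. */
  private final Map<String, Integer> unique = new HashMap<>();

  /** First code to hand out (assumed to be 1 in this sketch). */
  private final int ustart = 1;

  /**
   * Encode one delimited line as numbers.
   * Note: separator is interpreted as a regular expression by String.split.
   */
  double[] encodeLine(String line, String separator) {
    String[] tokens = line.split(separator);
    double[] out = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String token = tokens[i].trim();
      try {
        // Tokens that already parse as numbers are kept as-is (an assumption).
        out[i] = Double.parseDouble(token);
      } catch (NumberFormatException e) {
        // Otherwise look up the token, assigning the next free code if it is new.
        Integer id = unique.get(token);
        if (id == null) {
          id = ustart + unique.size();
          unique.put(token, id);
        }
        out[i] = id;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    CategoricalEncodingSketch enc = new CategoricalEncodingSketch();
    // prints [1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0]
    System.out.println(Arrays.toString(enc.encodeLine("y,n,y,y,n,y,n", ",")));
  }
}
```

Because codes are assigned in order of first appearance, the numeric value of a category depends on the order of the input and carries no ordinal meaning.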
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | CategorialDataAsNumberVectorParser.Par<V extends elki.data.NumberVector> | Parameterization class.
-
Field Summary
Fields
Modifier and Type | Field | Description
private static elki.logging.Logging | LOG | Logging class.
(package private) java.util.regex.Matcher | nanpattern | Pattern for NaN values.
(package private) it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> | unique | For String unification.
(package private) int | ustart | Base for enumerating unique values.
-
Fields inherited from class elki.datasource.parser.NumberVectorLabelParser
attributes, columnnames, curlbl, curvec, factory, haslabels, labels, maxdim, meta, mindim, nextevent, warnedDim, warnedPrecision
-
Fields inherited from class elki.datasource.parser.AbstractStreamingParser
reader, tokenizer
-
-
Constructor Summary
Constructors
Constructor | Description
CategorialDataAsNumberVectorParser(elki.data.NumberVector.Factory<V> factory) | Constructor with defaults.
CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, elki.data.NumberVector.Factory<V> factory) | Constructor.
-
Method Summary
Methods
Modifier and Type | Method | Description
protected elki.logging.Logging | getLogger() | Get the logger for this class.
elki.datasource.bundle.BundleStreamSource.Event | nextEvent() |
protected boolean | parseLineInternal() | Internal method for parsing a single line.
-
Methods inherited from class elki.datasource.parser.NumberVectorLabelParser
buildMeta, cleanup, createVector, data, getMeta, getTypeInformation, initStream, isLabelColumn
-
Methods inherited from class elki.datasource.parser.AbstractStreamingParser
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
-
Field Detail
-
LOG
private static final elki.logging.Logging LOG
Logging class.
-
unique
it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
For String unification.
-
ustart
int ustart
Base for enumerating unique values.
-
nanpattern
java.util.regex.Matcher nanpattern
Pattern for NaN values.
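The nanpattern field is a java.util.regex.Matcher rather than a Pattern, which suggests that one matcher instance is reused across tokens. A minimal sketch of that reuse idiom follows; the class name NaNMatcherSketch, the method isMissing, and the choice of "?" as the missing-value marker are assumptions for illustration, not confirmed ELKI behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of ELKI.
class NaNMatcherSketch {
  /** Reusable matcher; the "?" marker is an assumption made for this sketch. */
  private final Matcher nanpattern = Pattern.compile("\\?").matcher("");

  /** Returns true if the whole token equals the missing-value marker. */
  boolean isMissing(String token) {
    // reset(CharSequence) rebinds the existing matcher to the new input,
    // which avoids allocating a new Matcher for every token.
    return nanpattern.reset(token).matches();
  }

  public static void main(String[] args) {
    NaNMatcherSketch s = new NaNMatcherSketch();
    System.out.println(s.isMissing("?")); // prints true
    System.out.println(s.isMissing("y")); // prints false
  }
}
```

A Matcher holds mutable state, so an instance reused this way should not be shared between threads.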
-
-
Constructor Detail
-
CategorialDataAsNumberVectorParser
public CategorialDataAsNumberVectorParser(elki.data.NumberVector.Factory<V> factory)
Constructor with defaults.
- Parameters:
factory - Vector factory
-
CategorialDataAsNumberVectorParser
public CategorialDataAsNumberVectorParser(CSVReaderFormat format, long[] labelIndices, elki.data.NumberVector.Factory<V> factory)
Constructor.
- Parameters:
format - Input format
labelIndices - Column indexes that are numeric.
factory - Vector factory
-
-
Method Detail
-
nextEvent
public elki.datasource.bundle.BundleStreamSource.Event nextEvent()
- Specified by:
nextEvent in interface elki.datasource.bundle.BundleStreamSource
- Overrides:
nextEvent in class NumberVectorLabelParser<V extends elki.data.NumberVector>
-
parseLineInternal
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Internal method for parsing a single line. Used by both line based parsing as well as block parsing. This saves the building of meta data for each line.
- Overrides:
parseLineInternal in class NumberVectorLabelParser<V extends elki.data.NumberVector>
- Returns:
true when a valid line was read, false on a label row.
-
getLogger
protected elki.logging.Logging getLogger()
Description copied from class: AbstractStreamingParser
Get the logger for this class.
- Overrides:
getLogger in class NumberVectorLabelParser<V extends elki.data.NumberVector>
- Returns:
- Logger.