Class CategorialDataAsNumberVectorParser<V extends elki.data.NumberVector>

  • Type Parameters:
    V - the type of NumberVector used
    All Implemented Interfaces:
    elki.datasource.bundle.BundleStreamSource, Parser, StreamingParser

    @Description("This parser expects data in roughly the same format as the NumberVectorLabelParser,\nexcept that it will enumerate all unique strings to always produce numerical values.\nThis way, it can for example handle files that contain lines like \'y,n,y,y,n,y,n\'.")
    public class CategorialDataAsNumberVectorParser<V extends elki.data.NumberVector>
    extends NumberVectorLabelParser<V>
    A very simple parser for categorial data, which it encodes as numbers. It is closely modeled after NumberVectorLabelParser. TODO: specify handling for numerical values.
    Since:
    0.6.0
    Author:
    Erich Schubert
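The enumeration of unique strings described above can be illustrated with a minimal, self-contained sketch. It does not use ELKI. The class name CategoryEnumerationSketch, the method encodeLine, the comma separator, and the choice to start codes at 0 are all assumptions made for illustration, not the parser's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of enumerating unique strings; not part of ELKI. */
public class CategoryEnumerationSketch {
  /** Maps each distinct string to the integer code it received when first seen. */
  private final Map<String, Integer> codes = new HashMap<>();

  /** Encodes one comma-separated line; each new token gets the next free code. */
  public double[] encodeLine(String line) {
    String[] tokens = line.split(",");
    double[] values = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String token = tokens[i].trim();
      Integer code = codes.get(token);
      if (code == null) {
        code = codes.size(); // next unused code; starting at 0 is an assumption
        codes.put(token, code);
      }
      values[i] = code;
    }
    return values;
  }

  public static void main(String[] args) {
    CategoryEnumerationSketch sketch = new CategoryEnumerationSketch();
    // prints [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
    System.out.println(Arrays.toString(sketch.encodeLine("y,n,y,y,n,y,n")));
  }
}
```

Because the map persists across calls, a string seen on an earlier line keeps the same code on later lines, so the codes stay consistent across the whole file.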
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Logging class.
      • unique

        it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> unique
        For String unification.
      • ustart

        int ustart
        Base for enumerating unique values.
      • nanpattern

        java.util.regex.Matcher nanpattern
        Pattern for NaN values.
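As a rough sketch of how the three fields above might cooperate, the following standalone class mirrors their names. It is an assumption-laden illustration rather than the ELKI source: a java.util.HashMap stands in for the fastutil Object2IntOpenHashMap, the ustart value of 0 and the NaN pattern `\?|NaN` are guesses, and the class name TokenEncoderSketch and method encode are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the three fields working together; not ELKI code. */
public class TokenEncoderSketch {
  // Stand-in for the fastutil Object2IntOpenHashMap<String> field "unique".
  private final Map<String, Integer> unique = new HashMap<>();

  // Base for enumerating unique values; 0 is an assumed value.
  private final int ustart = 0;

  // Reusable matcher for NaN tokens; the pattern itself is an assumption.
  private final Matcher nanpattern = Pattern.compile("\\?|NaN").matcher("");

  /** Returns NaN for tokens matching the NaN pattern, else the token's integer code. */
  public double encode(String token) {
    // Re-targeting one Matcher avoids allocating a new one per token.
    if (nanpattern.reset(token).matches()) {
      return Double.NaN;
    }
    Integer id = unique.get(token);
    if (id == null) {
      id = ustart + unique.size(); // first unseen token receives ustart
      unique.put(token, id);
    }
    return id;
  }
}
```

Keeping a Matcher rather than a Pattern as the field is consistent with the declared type of nanpattern: a single matcher can be reset for each token instead of being recreated.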
    • Constructor Detail

      • CategorialDataAsNumberVectorParser

        public CategorialDataAsNumberVectorParser(elki.data.NumberVector.Factory<V> factory)
        Constructor with defaults.
        Parameters:
        factory - Vector factory
      • CategorialDataAsNumberVectorParser

        public CategorialDataAsNumberVectorParser(CSVReaderFormat format,
                                                  long[] labelIndices,
                                                  elki.data.NumberVector.Factory<V> factory)
        Constructor.
        Parameters:
        format - Input format
        labelIndices - Column indexes to treat as labels rather than as numeric values.
        factory - Vector factory
    • Method Detail

      • nextEvent

        public elki.datasource.bundle.BundleStreamSource.Event nextEvent()
        Specified by:
        nextEvent in interface elki.datasource.bundle.BundleStreamSource
        Overrides:
        nextEvent in class NumberVectorLabelParser<V extends elki.data.NumberVector>
      • parseLineInternal

        protected boolean parseLineInternal()
        Description copied from class: NumberVectorLabelParser
        Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids rebuilding the metadata for each line.
        Overrides:
        parseLineInternal in class NumberVectorLabelParser<V extends elki.data.NumberVector>
        Returns:
        true when a valid line was read, false on a label row.