Class LibSVMFormatParser<V extends elki.data.SparseNumberVector>

  • Type Parameters:
    V - Vector type
    All Implemented Interfaces:
    elki.datasource.bundle.BundleStreamSource, Parser, StreamingParser

    @Title("libSVM Format Parser")
    public class LibSVMFormatParser<V extends elki.data.SparseNumberVector>
    extends SparseNumberVectorLabelParser<V>
    Parser to read libSVM format files.

    The libSVM format is roughly specified in its README as:

     <label> <index1>:<value1> <index2>:<value2> ...
     
    That is, each line begins with a mandatory integer class label, followed by a classic sparse vector representation of the data. Indexes are integers starting at 1. Because ELKI uses 0-based indexing, each index is mapped to index-1, so that dimension 0 is not a constant 0.
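
    The index mapping described above can be illustrated with a minimal standalone sketch. It does not use the ELKI API: the class LibSvmLineSketch, its nested Parsed type, and the parse method are hypothetical names introduced only for this example, and a plain sorted map stands in for a sparse vector. It assumes a well-formed, comment-free line and does not handle empty input.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical standalone sketch; not part of ELKI. */
final class LibSvmLineSketch {
  /** One parsed line: the class label plus the sparse entries, keyed by 0-based dimension. */
  static final class Parsed {
    final int label;
    final Map<Integer, Double> values;

    Parsed(int label, Map<Integer, Double> values) {
      this.label = label;
      this.values = values;
    }
  }

  /** Parses "<label> <index1>:<value1> <index2>:<value2> ..." (assumes no comments). */
  static Parsed parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    int label = Integer.parseInt(tokens[0]); // mandatory integer class label
    Map<Integer, Double> values = new TreeMap<>();
    for (int i = 1; i < tokens.length; i++) {
      int colon = tokens[i].indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Expected index:value, got: " + tokens[i]);
      }
      int index = Integer.parseInt(tokens[i].substring(0, colon));
      double value = Double.parseDouble(tokens[i].substring(colon + 1));
      values.put(index - 1, value); // libSVM index 1 becomes dimension 0
    }
    return new Parsed(label, values);
  }

  public static void main(String[] args) {
    Parsed p = parse("1 1:0.5 3:2.0");
    System.out.println(p.label + " " + p.values); // prints "1 {0=0.5, 2=2.0}"
  }
}
```

    In this sketch the libSVM indexes 1 and 3 end up as dimensions 0 and 2, so no dimension is wasted on a constant 0.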

    The libSVM FAQ states that comments can also be placed in the file, introduced by a hash (#), but that they must not contain colons and are not officially supported.
    ELKI simply stops parsing a line when it encounters a #.
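
    The comment rule can be sketched as a simple truncation at the first hash. This is an illustration of the behavior described above, not the ELKI implementation; the class LibSvmCommentSketch and the method stripComment are hypothetical names.

```java
/** Hypothetical standalone sketch; not part of ELKI. */
final class LibSvmCommentSketch {
  /** Returns the part of the line before the first '#', or the whole line if there is none. */
  static String stripComment(String line) {
    int hash = line.indexOf('#');
    return hash < 0 ? line : line.substring(0, hash);
  }

  public static void main(String[] args) {
    String data = stripComment("1 1:0.5 3:2.0 # a note").trim();
    System.out.println("[" + data + "]"); // prints "[1 1:0.5 3:2.0]"
  }
}
```

    Because everything after the # is discarded before any index:value pairs are read, a colon inside the comment would do no harm under this approach. Other libSVM tools may behave differently, which is presumably why the FAQ advises against colons in comments.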

    Since:
    0.7.0
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • WHITESPACE_PATTERN

        public static final java.util.regex.Pattern WHITESPACE_PATTERN
        LibSVM uses whitespace and colons for separation.
      • COMMENT_PATTERN

        public static final java.util.regex.Pattern COMMENT_PATTERN
        Comment pattern.
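
        The regular expressions behind these two fields are not shown on this page. The sketch below uses assumed expressions that match the descriptions (whitespace and colons as separators, and a line consisting only of a comment); the class name LibSvmPatternSketch, the constants SEPARATOR and COMMENT, and the method tokenize are hypothetical and may differ from the actual field values.

```java
import java.util.regex.Pattern;

/** Hypothetical standalone sketch; the regular expressions are assumptions. */
final class LibSvmPatternSketch {
  // Assumed: any run of whitespace and/or colons acts as a single separator.
  static final Pattern SEPARATOR = Pattern.compile("[\\s:]+");
  // Assumed: a line that contains nothing but optional whitespace and a comment.
  static final Pattern COMMENT = Pattern.compile("^\\s*#.*$");

  /** Splits a data line into label, index, value, index, value, ... tokens. */
  static String[] tokenize(String line) {
    return SEPARATOR.split(line.trim());
  }

  public static void main(String[] args) {
    System.out.println(String.join(",", tokenize("1 1:0.5 3:2.0"))); // prints "1,1,0.5,3,2.0"
    System.out.println(COMMENT.matcher("  # note").matches()); // prints "true"
  }
}
```

        With a separator of this kind, the tokens after the label alternate between index and value, so a tokenizer can read them in pairs without searching for the colon in each token.
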
    • Constructor Detail

      • LibSVMFormatParser

        public LibSVMFormatParser(elki.data.SparseNumberVector.Factory<V> factory)
        Constructor.
        Parameters:
        factory - Vector factory
    • Method Detail

      • parseLineInternal

        protected boolean parseLineInternal()
        Description copied from class: NumberVectorLabelParser
        Internal method for parsing a single line. Used by both line-based parsing and block parsing. This avoids building metadata for each line.
        Overrides:
        parseLineInternal in class SparseNumberVectorLabelParser<V extends elki.data.SparseNumberVector>
        Returns:
        true when a valid line was read, false on a label row.