Class SimpleTransactionParser

  • All Implemented Interfaces:
    elki.datasource.bundle.BundleStreamSource, Parser, StreamingParser

    public class SimpleTransactionParser
    extends AbstractStreamingParser
    Simple parser for transactional data, such as market baskets.

    To keep the input format simple and readable, all tokens are assumed to be text, tokens are separated by whitespace, and each transaction is on a separate line.

    An example file containing two transactions looks like this:

     bread butter milk
     paste tomato basil
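
The input format above can be illustrated with a small, self-contained sketch. It is not the ELKI implementation: the class name `TransactionSketch` and the method `encode` are hypothetical, `java.util.BitSet` stands in for `elki.data.BitVector`, and skipping blank lines is an assumption. It only shows the idea of splitting each line on whitespace, assigning every previously unseen term the next free integer index, and representing a transaction as the set of bits for its terms.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TransactionSketch {
  /**
   * Hypothetical helper: encode each non-empty line as a bit set over term
   * indexes. New terms are appended to termIndex in order of first appearance.
   */
  static List<BitSet> encode(List<String> lines, Map<String, Integer> termIndex) {
    List<BitSet> transactions = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // Assumption: blank lines carry no transaction.
      }
      BitSet bits = new BitSet();
      for (String token : trimmed.split("\\s+")) {
        Integer idx = termIndex.get(token);
        if (idx == null) {
          idx = termIndex.size(); // Next free index for an unseen term.
          termIndex.put(token, idx);
        }
        bits.set(idx);
      }
      transactions.add(bits);
    }
    return transactions;
  }

  public static void main(String[] args) {
    Map<String, Integer> termIndex = new LinkedHashMap<>();
    List<BitSet> transactions =
        encode(List.of("bread butter milk", "paste tomato basil"), termIndex);
    System.out.println(termIndex);    // Term-to-index mapping in insertion order.
    System.out.println(transactions); // One bit set per transaction.
  }
}
```

For the two example transactions, the sketch assigns six distinct indexes and produces two bit sets of three bits each. A term that recurs in a later line reuses its existing index.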
     
    TODO: add a parameter to, e.g., treat the first or last entry as a label instead of a token.
    Since:
    0.7.0
    Author:
    Erich Schubert
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  SimpleTransactionParser.Par
      Parameterization class.
      • Nested classes/interfaces inherited from interface elki.datasource.bundle.BundleStreamSource

        elki.datasource.bundle.BundleStreamSource.Event
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) it.unimi.dsi.fastutil.longs.LongArrayList buf
      Buffer, will be reused.
      (package private) elki.data.BitVector curvec
      Current vector.
      (package private) it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> keymap
      Map from term to its integer index.
      private static elki.logging.Logging LOG
      Class logger.
      protected elki.datasource.bundle.BundleMeta meta
      Metadata.
      (package private) elki.datasource.bundle.BundleStreamSource.Event nextevent
      Event to report next.
      (package private) int numterms
      Number of different terms observed.
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • numterms

        int numterms
        Number of different terms observed.
      • keymap

        it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap<java.lang.String> keymap
        Map from term to its integer index.
      • meta

        protected elki.datasource.bundle.BundleMeta meta
        Metadata.
      • nextevent

        elki.datasource.bundle.BundleStreamSource.Event nextevent
        Event to report next.
      • curvec

        elki.data.BitVector curvec
        Current vector.
      • buf

        it.unimi.dsi.fastutil.longs.LongArrayList buf
        Buffer, will be reused.
    • Constructor Detail

      • SimpleTransactionParser

        public SimpleTransactionParser(CSVReaderFormat format)
        Constructor.
        Parameters:
        format - Input format
    • Method Detail

      • nextEvent

        public elki.datasource.bundle.BundleStreamSource.Event nextEvent()
      • data

        public java.lang.Object data(int rnum)
      • getMeta

        public elki.datasource.bundle.BundleMeta getMeta()