Class InverseDocumentFrequencyNormalization<V extends elki.data.SparseNumberVector>

  • Type Parameters:
    V - Vector type
    All Implemented Interfaces:
    Normalization<V>, elki.datasource.filter.ObjectFilter

    public class InverseDocumentFrequencyNormalization<V extends elki.data.SparseNumberVector>
    extends AbstractVectorConversionFilter<V,V>
    implements Normalization<V>
    Normalization for term frequency (TF) vectors, using the inverse document frequency (IDF). See also: TF-IDF for text analysis.
    Since:
    0.4.0
    Author:
    Erich Schubert
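As a point of reference, TF-IDF weighting is commonly defined as below. Here N is the number of objects (cf. the objcnt field) and df(t) is the number of objects in which dimension t is non-zero. The natural logarithm and this exact IDF variant are assumptions, because this page does not state which variant the class computes.

```latex
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad
w_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}(t)
```

For example, take N = 4 objects and a term that occurs in only one of them. Then idf(t) = ln 4 ≈ 1.386, so a TF of 2 becomes a weight of about 2.773.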
    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap idf
      The IDF storage.
      private static elki.logging.Logging LOG
      Class logger.
      (package private) int objcnt
      The number of objects in the dataset.
    • Method Summary

      Modifier and Type Method Description
      protected elki.data.type.SimpleTypeInformation<? super V> convertedType(elki.data.type.SimpleTypeInformation<V> in)
      Get the output type from the input type after conversion.
      protected V filterSingleObject(V featureVector)
      Normalize a single instance.
      protected elki.data.type.SimpleTypeInformation<? super V> getInputTypeRestriction()
      Get the input type restriction used for negotiating the data query.
      protected elki.logging.Logging getLogger()
      Class logger.
      protected void prepareComplete()
      Complete the initialization phase.
      protected void prepareProcessInstance(V featureVector)
      Process a single object during initialization.
      protected boolean prepareStart(elki.data.type.SimpleTypeInformation<V> in)
      Return "true" when the normalization needs initialization (two-pass filtering!).
      V restore(V featureVector)
      Transforms a feature vector to the original attribute ranges.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface elki.datasource.filter.ObjectFilter

        filter
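The two-pass contract formed by prepareStart, prepareProcessInstance, prepareComplete and filterSingleObject can be pictured with the minimal sketch below. The TwoPassFilter class and its filter(List) driver are hypothetical stand-ins written for illustration. They are not ELKI's AbstractConversionFilter itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical two-pass filter mirroring the hook names above.
 * Illustration only; not ELKI's AbstractConversionFilter.
 */
abstract class TwoPassFilter<T> {
  /** Return true if a preparation pass over all objects is needed first. */
  protected abstract boolean prepareStart();

  /** Called once per object during the preparation pass. */
  protected abstract void prepareProcessInstance(T obj);

  /** Called once after every object has been seen in the preparation pass. */
  protected abstract void prepareComplete();

  /** Called once per object during the second (filtering) pass. */
  protected abstract T filterSingleObject(T obj);

  /** Drive both passes over an in-memory list of objects. */
  final List<T> filter(List<T> objects) {
    if (prepareStart()) {
      // First pass: collect statistics, without modifying anything.
      for (T obj : objects) {
        prepareProcessInstance(obj);
      }
      prepareComplete();
    }
    // Second pass: transform every object using the collected statistics.
    List<T> result = new ArrayList<>(objects.size());
    for (T obj : objects) {
      result.add(filterSingleObject(obj));
    }
    return result;
  }
}
```

Because the driver finishes the whole first pass and calls prepareComplete before the first filterSingleObject call, any statistic that depends on the entire dataset (such as a document frequency) is complete before any object is transformed.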
    • Field Detail

      • LOG

        private static final elki.logging.Logging LOG
        Class logger.
      • idf

        it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap idf
        The IDF storage.
      • objcnt

        int objcnt
        The number of objects in the dataset.
    • Constructor Detail

      • InverseDocumentFrequencyNormalization

        public InverseDocumentFrequencyNormalization()
        Constructor.
    • Method Detail

      • prepareStart

        protected boolean prepareStart(elki.data.type.SimpleTypeInformation<V> in)
        Description copied from class: AbstractConversionFilter
        Return "true" when the normalization needs initialization (two-pass filtering!).
        Overrides:
        prepareStart in class AbstractConversionFilter<V extends elki.data.SparseNumberVector,V extends elki.data.SparseNumberVector>
        Parameters:
        in - Input type information
        Returns:
        true if the normalization needs an initialization pass, false otherwise
      • prepareProcessInstance

        protected void prepareProcessInstance(V featureVector)
        Description copied from class: AbstractConversionFilter
        Process a single object during initialization.
        Overrides:
        prepareProcessInstance in class AbstractConversionFilter<V extends elki.data.SparseNumberVector,V extends elki.data.SparseNumberVector>
        Parameters:
        featureVector - Object to process
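For IDF, the preparation pass amounts to counting how many objects each dimension occurs in, then converting those counts into weights once every object has been seen. The sketch below rests on several assumptions. Sparse vectors are modelled as Map<Integer, Double> (dimension to term frequency), only non-zero entries are counted, and the common definition idf = ln(N / df) is used. The class name IdfCounter is hypothetical. Its fields idf and objcnt only mirror the fields listed above and do not reproduce ELKI's exact code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the IDF preparation pass on sparse vectors,
 * modelled as Map<Integer, Double> (dimension -> term frequency).
 * Assumes idf = ln(N / df).
 */
final class IdfCounter {
  /** Per-dimension document counts, overwritten with IDF values in prepareComplete(). */
  final Map<Integer, Double> idf = new HashMap<>();
  /** Number of objects seen so far. */
  int objcnt = 0;

  /** Count each non-zero dimension of one vector once, and count the object. */
  void prepareProcessInstance(Map<Integer, Double> vector) {
    for (Map.Entry<Integer, Double> e : vector.entrySet()) {
      if (e.getValue() != 0.0) {
        idf.merge(e.getKey(), 1.0, Double::sum);
      }
    }
    objcnt++;
  }

  /** Replace each document frequency df by ln(objcnt / df). */
  void prepareComplete() {
    idf.replaceAll((dim, df) -> Math.log(objcnt / df));
  }
}
```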
      • filterSingleObject

        protected V filterSingleObject(V featureVector)
        Description copied from class: AbstractConversionFilter
        Normalize a single instance. You can implement this by throwing an UnsupportedOperationException if you override both public "normalize" functions.
        Specified by:
        filterSingleObject in class AbstractConversionFilter<V extends elki.data.SparseNumberVector,V extends elki.data.SparseNumberVector>
        Parameters:
        featureVector - Database object to normalize
        Returns:
        Normalized database object
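Once the IDF values are known, normalizing a single vector is a per-dimension multiplication. This hypothetical sketch models sparse vectors as Map<Integer, Double>, which is an assumption. Dimensions missing from the IDF map get weight 0, which resembles the zero default return value of a fastutil Int2DoubleOpenHashMap. This page does not say whether ELKI relies on that default.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: apply precomputed IDF weights to one sparse TF vector. */
final class IdfApply {
  /** Multiply each stored TF value by the IDF of its dimension (0 if unknown). */
  static Map<Integer, Double> filterSingleObject(Map<Integer, Double> tf, Map<Integer, Double> idf) {
    Map<Integer, Double> out = new HashMap<>();
    for (Map.Entry<Integer, Double> e : tf.entrySet()) {
      double weight = idf.getOrDefault(e.getKey(), 0.0);
      out.put(e.getKey(), e.getValue() * weight);
    }
    return out;
  }
}
```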
      • restore

        public V restore(V featureVector)
        Description copied from interface: Normalization
        Transforms a feature vector to the original attribute ranges.
        Specified by:
        restore in interface Normalization<V extends elki.data.SparseNumberVector>
        Parameters:
        featureVector - a feature vector to be transformed into original space
        Returns:
        a feature vector transformed into original space corresponding to the given feature vector
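Restoring reverses the weighting by dividing each value by its IDF. Assuming idf = ln(N / df), there is one caveat. A dimension present in every object has IDF ln(1) = 0, so its original TF cannot be recovered (0 / 0). The hypothetical sketch below throws an ArithmeticException in that case rather than returning NaN. This page does not document how ELKI's restore handles it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: undo IDF weighting on one sparse vector. */
final class IdfRestore {
  /** Divide each value by its dimension's IDF; fail if that IDF is zero or unknown. */
  static Map<Integer, Double> restore(Map<Integer, Double> weighted, Map<Integer, Double> idf) {
    Map<Integer, Double> out = new HashMap<>();
    for (Map.Entry<Integer, Double> e : weighted.entrySet()) {
      double weight = idf.getOrDefault(e.getKey(), 0.0);
      if (weight == 0.0) {
        // TF * 0 lost the original value; dividing would yield NaN or infinity.
        throw new ArithmeticException("IDF is zero for dimension " + e.getKey() + "; TF is not recoverable");
      }
      out.put(e.getKey(), e.getValue() / weight);
    }
    return out;
  }
}
```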
      • convertedType

        protected elki.data.type.SimpleTypeInformation<? super V> convertedType(elki.data.type.SimpleTypeInformation<V> in)
        Description copied from class: AbstractConversionFilter
        Get the output type from the input type after conversion.
        Specified by:
        convertedType in class AbstractConversionFilter<V extends elki.data.SparseNumberVector,V extends elki.data.SparseNumberVector>
        Parameters:
        in - input type restriction
        Returns:
        output type restriction
      • getInputTypeRestriction

        protected elki.data.type.SimpleTypeInformation<? super V> getInputTypeRestriction()
        Description copied from class: AbstractConversionFilter
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in class AbstractConversionFilter<V extends elki.data.SparseNumberVector,V extends elki.data.SparseNumberVector>
        Returns:
        Type restriction