org.apache.hadoop.tools.rumen.anonymization
Class WordListAnonymizerUtility

java.lang.Object
  extended by org.apache.hadoop.tools.rumen.anonymization.WordListAnonymizerUtility

public class WordListAnonymizerUtility
extends Object

Utility class for tasks commonly performed in a DefaultAnonymizableDataType that uses a WordList for anonymization. TODO: No caching is implemented to save memory.


Field Summary
static String[] KNOWN_WORDS
           
 
Constructor Summary
WordListAnonymizerUtility()
           
 
Method Summary
static String[] extractSuffix(String data, String[] suffixes)
          Extracts a known suffix from the given data.
static boolean hasSuffix(String data, String[] suffixes)
          Checks if the given data has a known suffix.
static boolean isKnownData(String data)
          Checks if the given data is known.
static boolean isKnownData(String data, String[] knownWords)
          Checks if the given data is known.
static boolean needsAnonymization(String data)
          Checks if the data needs anonymization.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

KNOWN_WORDS

public static final String[] KNOWN_WORDS

Constructor Detail

WordListAnonymizerUtility

public WordListAnonymizerUtility()

Method Detail

needsAnonymization

public static boolean needsAnonymization(String data)
Checks if the data needs anonymization. Typically, data that is numeric in nature doesn't need anonymization.
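
The rule described above can be sketched in a standalone form. This is not the Hadoop implementation: NeedsAnonymizationSketch is a hypothetical stand-in class, and it assumes that "numeric in nature" means a non-empty string made up only of digits. How the real method treats decimals, signs, or empty input is not stated on this page.

```java
// Hypothetical stand-in, NOT the Hadoop class. It mirrors only the documented
// signature of needsAnonymization(String).
final class NeedsAnonymizationSketch {

  // Assumption: data counts as numeric when it is non-empty and all digits.
  static boolean needsAnonymization(String data) {
    if (data.isEmpty()) {
      return true; // assumption: empty input is treated as non-numeric
    }
    for (int i = 0; i < data.length(); i++) {
      if (!Character.isDigit(data.charAt(i))) {
        return true; // a non-digit character makes the data non-numeric
      }
    }
    return false; // purely numeric data is left as-is
  }

  public static void main(String[] args) {
    System.out.println(needsAnonymization("12345"));   // prints false
    System.out.println(needsAnonymization("user-42")); // prints true
  }
}
```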


hasSuffix

public static boolean hasSuffix(String data,
                                String[] suffixes)
Checks if the given data has a known suffix.


extractSuffix

public static String[] extractSuffix(String data,
                                     String[] suffixes)
Extracts a known suffix from the given data.

Throws:
RuntimeException - if the data doesn't have a suffix. Use hasSuffix(String, String[]) to make sure that the given data has a suffix.
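
The guard recommended above (check with hasSuffix before calling extractSuffix) might look like the following standalone sketch. SuffixSketch is a hypothetical stand-in, not the Hadoop class. It assumes that a suffix matches when the data ends with it, and that the returned array holds the data without the suffix followed by the suffix itself. The layout of the returned array is not documented on this page, so treat that part as a guess.

```java
// Hypothetical stand-in, NOT the Hadoop class. Signatures follow this page;
// the matching rule and the array layout are assumptions.
final class SuffixSketch {

  // Assumption: a suffix is "known" when the data ends with it.
  static boolean hasSuffix(String data, String[] suffixes) {
    for (String suffix : suffixes) {
      if (data.endsWith(suffix)) {
        return true;
      }
    }
    return false;
  }

  // Assumption: returns { data without the suffix, the suffix }.
  static String[] extractSuffix(String data, String[] suffixes) {
    for (String suffix : suffixes) {
      if (data.endsWith(suffix)) {
        String prefix = data.substring(0, data.length() - suffix.length());
        return new String[] { prefix, suffix };
      }
    }
    // Documented behavior: a RuntimeException when no suffix is present.
    throw new RuntimeException("Data [" + data + "] has no known suffix");
  }

  public static void main(String[] args) {
    String[] suffixes = { ".gz", ".bz2" }; // illustrative suffix list
    String data = "part-00000.gz";

    // Guard the call so that extractSuffix cannot throw.
    if (hasSuffix(data, suffixes)) {
      String[] parts = extractSuffix(data, suffixes);
      System.out.println(parts[0] + " | " + parts[1]); // prints "part-00000 | .gz"
    }
  }
}
```

Checking first keeps the exception for genuinely unexpected input, rather than using it for control flow.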

isKnownData

public static boolean isKnownData(String data)
Checks if the given data is known. This method uses KNOWN_WORDS to detect whether the given data is a commonly used (so-called 'known') word.


isKnownData

public static boolean isKnownData(String data,
                                  String[] knownWords)
Checks if the given data is known, using the supplied list of known words.
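
The relationship between the two isKnownData overloads can be illustrated with a standalone sketch. KnownDataSketch is a hypothetical stand-in, not the Hadoop class. Its KNOWN_WORDS values are made up for illustration, since the real contents are not listed on this page, and it assumes an exact, case-sensitive match against each word.

```java
// Hypothetical stand-in, NOT the Hadoop class. Signatures follow this page;
// the word list and the matching rule are assumptions.
final class KnownDataSketch {

  // Illustrative values only; the real KNOWN_WORDS contents are not shown here.
  static final String[] KNOWN_WORDS = { "tmp", "home" };

  // Single-argument form: check against the default word list.
  static boolean isKnownData(String data) {
    return isKnownData(data, KNOWN_WORDS);
  }

  // Two-argument form: check against a caller-supplied word list.
  // Assumption: a word is known only on an exact, case-sensitive match.
  static boolean isKnownData(String data, String[] knownWords) {
    for (String word : knownWords) {
      if (data.equals(word)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isKnownData("tmp"));                                // prints true
    System.out.println(isKnownData("alice"));                              // prints false
    System.out.println(isKnownData("alice", new String[] { "alice" }));    // prints true
  }
}
```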



Copyright © 2012 Apache Software Foundation. All Rights Reserved.