org.apache.hadoop.tools.rumen.anonymization
Class WordListAnonymizerUtility

java.lang.Object
  extended by org.apache.hadoop.tools.rumen.anonymization.WordListAnonymizerUtility

public class WordListAnonymizerUtility
extends Object

Utility class for tasks commonly performed by a DefaultAnonymizableDataType that uses a WordList for anonymization. TODO: There is no caching, to save memory.
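The class description leaves the overall workflow implicit. Below is a hypothetical sketch, not the rumen implementation, of how a data type might chain these checks: leave numeric data alone, leave known words alone, strip a known suffix, map the remaining word to a stable index, and reattach the suffix. A HashMap stands in for WordList. The class name AnonymizerPipelineSketch, the contents of KNOWN_WORDS and SUFFIXES, and the "word" prefix are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the rumen implementation. A HashMap stands in
// for WordList, and the word and suffix lists are illustrative only.
final class AnonymizerPipelineSketch {
    static final String[] KNOWN_WORDS = {"tmp", "user"}; // placeholder list
    static final String[] SUFFIXES = {".jar", ".xml"};   // placeholder list

    // Maps each original word to a stable index (WordList stand-in).
    private final Map<String, Integer> words = new HashMap<>();

    String anonymize(String data) {
        // 1. Data made only of digits (or empty) is returned unchanged.
        if (data.chars().allMatch(Character::isDigit)) {
            return data;
        }
        // 2. Commonly used ("known") words are returned unchanged.
        for (String known : KNOWN_WORDS) {
            if (known.equals(data)) {
                return data;
            }
        }
        // 3. Strip a known suffix so that it survives anonymization.
        String suffix = "";
        for (String s : SUFFIXES) {
            if (data.endsWith(s)) {
                suffix = s;
                data = data.substring(0, data.length() - s.length());
                break;
            }
        }
        // 4. Give each distinct word a stable index, then reattach the suffix.
        Integer index = words.get(data);
        if (index == null) {
            index = words.size();
            words.put(data, index);
        }
        return "word" + index + suffix;
    }

    public static void main(String[] args) {
        AnonymizerPipelineSketch a = new AnonymizerPipelineSketch();
        System.out.println(a.anonymize("12345"));       // 12345
        System.out.println(a.anonymize("tmp"));         // tmp
        System.out.println(a.anonymize("payroll.jar")); // word0.jar
        System.out.println(a.anonymize("payroll.xml")); // word0.xml
        System.out.println(a.anonymize("secret"));      // word1
    }
}
```

Because the stripped word is what gets indexed, "payroll.jar" and "payroll.xml" both map to word0 while keeping their original extensions.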


Constructor Summary
WordListAnonymizerUtility()
           
 
Method Summary
static String[] extractSuffix(String data, String[] suffixes)
          Extracts a known suffix from the given data.
static boolean hasSuffix(String data, String[] suffixes)
          Checks if the given data has a known suffix.
static boolean isKnownData(String data)
          Checks if the given data is known.
static boolean isKnownData(String data, String[] knownWords)
          Checks if the given data is known.
static boolean needsAnonymization(String data)
          Checks if the data needs anonymization.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordListAnonymizerUtility

public WordListAnonymizerUtility()
Method Detail

needsAnonymization

public static boolean needsAnonymization(String data)
Checks if the data needs anonymization. Typically, numeric data types do not need anonymization.
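A minimal sketch of the numeric check, assuming "numeric" means that every character is a digit according to Character.isDigit. Under that assumption, signed or fractional values such as "-12" or "1.5" would still be flagged for anonymization. The class name NeedsAnonymizationSketch is hypothetical.

```java
// Hypothetical sketch of needsAnonymization; the digits-only rule is an assumption.
final class NeedsAnonymizationSketch {
    static boolean needsAnonymization(String data) {
        for (int i = 0; i < data.length(); i++) {
            if (!Character.isDigit(data.charAt(i))) {
                return true; // any non-digit character marks the data for anonymization
            }
        }
        return false; // digits only (or empty): left as-is
    }

    public static void main(String[] args) {
        System.out.println(needsAnonymization("1024"));     // false
        System.out.println(needsAnonymization("job_0001")); // true
        System.out.println(needsAnonymization("-12"));      // true
    }
}
```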


hasSuffix

public static boolean hasSuffix(String data,
                                String[] suffixes)
Checks if the given data ends with one of the given known suffixes.
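A minimal sketch, assuming the check is a plain String.endsWith test against each candidate suffix. The class name HasSuffixSketch and the suffix list in main are hypothetical.

```java
// Hypothetical sketch of hasSuffix using String.endsWith.
final class HasSuffixSketch {
    static boolean hasSuffix(String data, String[] suffixes) {
        for (String suffix : suffixes) {
            if (data.endsWith(suffix)) {
                return true; // first match is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] suffixes = {".jar", ".xml"}; // hypothetical suffix list
        System.out.println(hasSuffix("job.xml", suffixes)); // true
        System.out.println(hasSuffix("job.txt", suffixes)); // false
    }
}
```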


extractSuffix

public static String[] extractSuffix(String data,
                                     String[] suffixes)
Extracts a known suffix from the given data.

Throws:
RuntimeException - if the data does not end with any of the given suffixes. Call hasSuffix(String, String[]) first to make sure that the given data has a known suffix.
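The Throws clause implies a guard-then-extract pattern. Since the signature only says String[], this sketch assumes the result holds the data with the suffix removed, followed by the matched suffix. The exception message and the class name ExtractSuffixSketch are also assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; the {dataWithoutSuffix, suffix} layout is an assumption.
final class ExtractSuffixSketch {
    static boolean hasSuffix(String data, String[] suffixes) {
        for (String suffix : suffixes) {
            if (data.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    static String[] extractSuffix(String data, String[] suffixes) {
        for (String suffix : suffixes) {
            if (data.endsWith(suffix)) {
                // Split into the remaining data and the matched suffix.
                String rest = data.substring(0, data.length() - suffix.length());
                return new String[] {rest, suffix};
            }
        }
        throw new RuntimeException("Data [" + data + "] has none of the suffixes "
            + Arrays.toString(suffixes));
    }

    public static void main(String[] args) {
        String[] suffixes = {".jar", ".xml"}; // hypothetical suffix list
        String data = "wordcount.jar";
        if (hasSuffix(data, suffixes)) { // guard, as the Throws clause advises
            String[] parts = extractSuffix(data, suffixes);
            System.out.println(parts[0] + " | " + parts[1]); // wordcount | .jar
        }
    }
}
```

If more than one suffix matches (for example ".gz" and ".tar.gz"), this sketch returns the first one in array order, so longer suffixes should be listed first.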

isKnownData

public static boolean isKnownData(String data)
Checks if the given data is known. This API uses KNOWN_WORDS to detect if the given data is a commonly used (so-called "known") word.
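The one-argument overload reads as a convenience wrapper. This sketch assumes it simply delegates to the two-argument form with KNOWN_WORDS. The three words used here are placeholders, because this page does not list the actual contents of KNOWN_WORDS, and the class name KnownDataSketch is hypothetical.

```java
// Hypothetical sketch: the one-argument form delegates to the two-argument form.
final class KnownDataSketch {
    // Placeholder contents; the real KNOWN_WORDS list is not documented here.
    static final String[] KNOWN_WORDS = {"tmp", "user", "home"};

    static boolean isKnownData(String data) {
        return isKnownData(data, KNOWN_WORDS);
    }

    static boolean isKnownData(String data, String[] knownWords) {
        for (String word : knownWords) {
            if (word.equals(data)) {
                return true; // exact match against a known word
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isKnownData("tmp"));     // true
        System.out.println(isKnownData("payroll")); // false
    }
}
```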


isKnownData

public static boolean isKnownData(String data,
                                  String[] knownWords)
Checks if the given data is known, that is, whether it appears in knownWords.
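A sketch of the two-argument overload, assuming "known" means an exact, case-sensitive match against an entry of knownWords, with no substring or case-insensitive matching. Arrays.asList(...).contains(...) performs the equals-based lookup. The class name KnownWordsLookupSketch and the example word list are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; exact, case-sensitive matching is an assumption.
final class KnownWordsLookupSketch {
    static boolean isKnownData(String data, String[] knownWords) {
        // List.contains compares with equals(), so only whole-word matches count.
        return Arrays.asList(knownWords).contains(data);
    }

    public static void main(String[] args) {
        String[] knownWords = {"staging", "logs"}; // hypothetical caller-supplied list
        System.out.println(isKnownData("logs", knownWords));     // true
        System.out.println(isKnownData("Logs", knownWords));     // false
        System.out.println(isKnownData("logs-old", knownWords)); // false
    }
}
```

For large word lists, building a HashSet once would avoid the linear scan on every call.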



Copyright © 2013 Apache Software Foundation. All Rights Reserved.