Generate the *almost* official terasort input data set. (See below) The user specifies the number of rows and the output directory and this class runs a
map/reduce program to generate the data. The format of the data is:
- (10 bytes key) (10 bytes rowid) (78 bytes filler) \r \n
- The keys are random characters from the set ' ' .. '~'.
- The rowid is the right justified row id as a int.
- The filler consists of 7 runs of 10 characters from 'A' to 'Z'.
This TeraSort is slightly modified to allow for variable length key sizes and value sizes. The row length isn't variable. To generate a terabyte of data in
the same way TeraSort does use 10000000000 rows and 10/10 byte key length and 78/78 byte value length. Along with the 10 byte row id and \r\n this gives you
100 byte row * 10000000000 rows = 1tb. Min/Max ranges for key and value parameters are inclusive/inclusive respectively.