Class UnicodeCompressor

java.lang.Object
org.graalvm.shadowed.com.ibm.icu.text.UnicodeCompressor

public final class UnicodeCompressor extends Object
A compression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

The SCSU works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within the active window are encoded in the compressed stream as the bytes 0x80 through 0xFF. The SCSU provides transparency for the characters (bytes) in the range U+0000 through U+00FF. The SCSU approximates the storage size of traditional character sets: for example, 1 byte per character for ASCII or Latin-1 text and 2 bytes per character for CJK ideographs.
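To make the window mechanism concrete, here is a toy sketch of SCSU single-byte mode. It assumes only the eight default dynamic window offsets from UTR #6 (0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00) and the SCn "change to window n" tags (byte 0x10 + n). The class ScsuWindowSketch and its encode method are hypothetical illustrations, not the ICU implementation: there is no window redefinition, quoting, or Unicode mode, so any character outside the pass-through ASCII bytes and those default windows is rejected.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical toy encoder for a small subset of SCSU single-byte mode.
 * NOT the ICU implementation: it only switches between the default dynamic
 * windows and throws for anything it cannot express that way.
 */
final class ScsuWindowSketch {
    // Default dynamic window offsets (windows 0..7) as listed in UTR #6.
    private static final int[] DEFAULT_OFFSETS = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };
    // The tag that makes window n active is SC0 + n.
    private static final int SC0 = 0x10;

    // True if c lies in the 128-character window starting at the window's offset.
    private static boolean inWindow(char c, int window) {
        int offset = DEFAULT_OFFSETS[window];
        return c >= offset && c < offset + 0x80;
    }

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int active = 0; // dynamic window 0 (offset 0x0080) is active at the start
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // NUL, TAB, LF, CR and 0x20..0x7F are written unchanged in single-byte mode.
            if (c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D || (c >= 0x20 && c <= 0x7F)) {
                out.write(c);
                continue;
            }
            // Prefer the active window; otherwise take the first default window containing c.
            int window = inWindow(c, active) ? active : -1;
            for (int w = 0; window < 0 && w < DEFAULT_OFFSETS.length; w++) {
                if (inWindow(c, w)) {
                    window = w;
                }
            }
            if (window < 0) {
                throw new IllegalArgumentException(
                    String.format("U+%04X needs SCSU features this sketch omits", (int) c));
            }
            if (window != active) {
                out.write(SC0 + window); // switch windows with a one-byte tag
                active = window;
            }
            // A character in the active window becomes one byte in 0x80..0xFF.
            out.write(0x80 + (c - DEFAULT_OFFSETS[window]));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "café" and the Cyrillic word "Привет", written with escapes.
        for (String s : new String[] {"caf\u00E9", "\u041F\u0440\u0438\u0432\u0435\u0442"}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(s)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            // Prints "4 chars -> 63 61 66 E9", then "6 chars -> 12 9F C0 B8 B2 B5 C2"
            System.out.println(s.length() + " chars -> " + hex.toString().trim());
        }
    }
}
```

In this subset "café" comes out as the same four bytes it occupies in Latin-1, because é (U+00E9) already falls in the initially active window 0, while the six-letter Cyrillic word costs one SC2 tag byte plus one byte per letter.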

USAGE

The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:

 String s = ... ; // get string from somewhere
 byte [] compressed = UnicodeCompressor.compress(s);

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:

 // Compress an array "chars" of length "len" using a buffer of 512 bytes
 // to the OutputStream "out"

 UnicodeCompressor myCompressor         = new UnicodeCompressor();
 final int         BUFSIZE              = 512;
 byte []           byteBuffer           = new byte [ BUFSIZE ];
 int               bytesWritten         = 0;
 int []            unicharsRead         = new int [1];
 int               totalCharsCompressed = 0;
 int               totalBytesWritten    = 0;

 do {
   // do the compression
   bytesWritten = myCompressor.compress(chars, totalCharsCompressed,
                                        len, unicharsRead,
                                        byteBuffer, 0, BUFSIZE);

   // do something with the current set of bytes
   out.write(byteBuffer, 0, bytesWritten);

   // update the no. of characters compressed
   totalCharsCompressed += unicharsRead[0];

   // update the no. of bytes written
   totalBytesWritten += bytesWritten;

 } while(totalCharsCompressed < len);

 myCompressor.reset(); // reuse compressor
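The loop above can be factored into a reusable helper. The sketch below assumes that the instance compress method takes the seven arguments shown in the loop (chars, start, limit, an int[1] for characters read, and the byte buffer with its start and limit) and returns the number of bytes written. ChunkedEncoder, compressTo and latin1Stub are hypothetical names; latin1Stub is a stand-in (one byte per Latin-1 character, not SCSU) that lets the driver run without ICU. Under that assumption, a UnicodeCompressor instance could presumably be passed as myCompressor::compress.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class ChunkedCompressDriver {
    /** Hypothetical interface shaped like the compress(...) call in the usage loop. */
    @FunctionalInterface
    interface ChunkedEncoder {
        int encode(char[] chars, int start, int limit, int[] charsRead,
                   byte[] bytes, int bytesStart, int bytesLimit);
    }

    /** Runs the do/while loop from the usage example and returns the total bytes written. */
    static long compressTo(ChunkedEncoder encoder, char[] chars, int len,
                           OutputStream out, int bufSize) throws IOException {
        byte[] byteBuffer = new byte[bufSize];
        int[] charsRead = new int[1];
        int totalChars = 0;
        long totalBytes = 0;
        do {
            int written = encoder.encode(chars, totalChars, len, charsRead, byteBuffer, 0, bufSize);
            out.write(byteBuffer, 0, written);
            // Guard against an endless loop if the encoder cannot make progress
            // (for example, because the byte buffer is too small for its output).
            if (written == 0 && charsRead[0] == 0 && totalChars < len) {
                throw new IllegalStateException("encoder made no progress; buffer too small?");
            }
            totalChars += charsRead[0];
            totalBytes += written;
        } while (totalChars < len);
        return totalBytes;
    }

    /** Stand-in encoder for demonstration only: copies Latin-1 chars to bytes. NOT SCSU. */
    static int latin1Stub(char[] chars, int start, int limit, int[] charsRead,
                          byte[] bytes, int bytesStart, int bytesLimit) {
        int n = Math.min(limit - start, bytesLimit - bytesStart);
        for (int i = 0; i < n; i++) {
            bytes[bytesStart + i] = (byte) chars[start + i];
        }
        charsRead[0] = n;
        return n;
    }

    public static void main(String[] args) throws IOException {
        char[] chars = "caf\u00E9 au lait".toCharArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A deliberately tiny 4-byte buffer forces several iterations of the loop.
        long total = compressTo(ChunkedCompressDriver::latin1Stub, chars, chars.length, out, 4);
        System.out.println(total + " bytes written");
    }
}
```

The no-progress check is a defensive addition not present in the original loop; without it, an encoder that neither consumes characters nor emits bytes would spin forever.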
See Also: