public final class UnicodeCompressor extends Object
The Standard Compression Scheme for Unicode (SCSU) works by using dynamically positioned windows, each consisting of 128 consecutive characters in Unicode. During compression, characters within a window are encoded in the compressed stream as the bytes 0x80 - 0xFF. The SCSU provides transparency for the characters (bytes) between U+0000 - U+00FF. Its output approximates the storage size of traditional character sets: for example, 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
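The window mapping described above can be sketched in a few lines. This is an illustrative sketch only, not the ICU implementation: the class `ScsuWindowSketch`, its method names, and the window offset 0x0370 are assumptions made for the example. It shows how a character inside a 128-character window could map to a single byte in the range 0x80 - 0xFF.

```java
// Illustrative sketch; names and the chosen offset are hypothetical.
final class ScsuWindowSketch {
    /** Number of consecutive characters covered by one window. */
    static final int WINDOW_SIZE = 128;

    /** Returns true if c lies inside the window starting at windowOffset. */
    static boolean inWindow(char c, int windowOffset) {
        return c >= windowOffset && c < windowOffset + WINDOW_SIZE;
    }

    /** Maps a character inside the window to a byte value in 0x80..0xFF. */
    static int toWindowByte(char c, int windowOffset) {
        if (!inWindow(c, windowOffset)) {
            throw new IllegalArgumentException("character outside window");
        }
        return 0x80 + (c - windowOffset);
    }

    public static void main(String[] args) {
        int windowOffset = 0x0370;   // assumed offset, chosen for illustration
        char alpha = '\u03B1';       // GREEK SMALL LETTER ALPHA
        System.out.printf("U+%04X -> 0x%02X%n",
                (int) alpha, toWindowByte(alpha, windowOffset));
    }
}
```

With the assumed offset, U+03B1 sits 0x41 positions into the window, so it maps to the byte 0xC1. A real encoder must also choose and switch windows, which this sketch leaves out.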
USAGE
The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:
    String s = ... ; // get string from somewhere
    byte [] compressed = UnicodeCompressor.compress(s);
The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:
    // Compress an array "chars" of length "len" using a buffer of 512 bytes
    // to the OutputStream "out"
    UnicodeCompressor myCompressor = new UnicodeCompressor();
    final int BUFSIZE = 512; // local variables cannot be declared static
    byte [] byteBuffer = new byte [ BUFSIZE ];
    int bytesWritten = 0;
    int [] unicharsRead = new int [1];
    int totalCharsCompressed = 0;
    int totalBytesWritten = 0;

    do {
        // do the compression
        bytesWritten = myCompressor.compress(chars, totalCharsCompressed,
                                             len, unicharsRead,
                                             byteBuffer, 0, BUFSIZE);

        // do something with the current set of bytes
        out.write(byteBuffer, 0, bytesWritten);

        // update the no. of characters compressed
        totalCharsCompressed += unicharsRead[0];

        // update the no. of bytes written
        totalBytesWritten += bytesWritten;
    } while(totalCharsCompressed < len);

    myCompressor.reset(); // reuse compressor
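The shape of the iterative loop can be tried without the ICU library on the classpath. The sketch below is a hedged illustration: `ChunkEncoder` is a hypothetical interface that only mirrors the parameter order of the seven-argument `compress` method, and `LOW_BYTE` is a toy encoder that copies the low byte of each character. It does not perform SCSU compression. The point is the bookkeeping: advance the input position by the value reported in the one-element `charsRead` array, and flush the fixed-size byte buffer after every call.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch; ChunkEncoder and LOW_BYTE are hypothetical stand-ins.
final class ChunkedLoopSketch {
    /** Mirrors the argument order of the seven-argument compress method. */
    interface ChunkEncoder {
        int encode(char[] src, int srcStart, int srcLimit, int[] charsRead,
                   byte[] dst, int dstStart, int dstLimit);
    }

    /** Toy encoder: copies the low byte of each char until dst is full. NOT SCSU. */
    static final ChunkEncoder LOW_BYTE =
            (src, srcStart, srcLimit, charsRead, dst, dstStart, dstLimit) -> {
        int n = Math.min(srcLimit - srcStart, dstLimit - dstStart);
        for (int i = 0; i < n; i++) {
            dst[dstStart + i] = (byte) src[srcStart + i];
        }
        if (charsRead != null) {
            charsRead[0] = n; // report how many chars this call consumed
        }
        return n;             // number of bytes written by this call
    };

    /** Encodes chars[0..len) through a buffer of bufSize bytes. */
    static byte[] drain(ChunkEncoder enc, char[] chars, int len, int bufSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        int[] read = new int[1];
        int consumed = 0;
        while (consumed < len) {
            int written = enc.encode(chars, consumed, len, read, buf, 0, bufSize);
            out.write(buf, 0, written);
            if (read[0] == 0) {
                // Guard against an endless loop if the encoder makes no progress.
                throw new IllegalStateException("encoder consumed no input");
            }
            consumed += read[0];
        }
        return out.toByteArray();
    }
}
```

For example, `ChunkedLoopSketch.drain(ChunkedLoopSketch.LOW_BYTE, "hello world".toCharArray(), 11, 4)` processes the input in three passes through a 4-byte buffer. Unlike the `do`/`while` loop in the usage example, this sketch uses a `while` loop, so it makes no call at all for empty input.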
See Also: UnicodeDecompressor
Modifier and Type | Field and Description |
---|---|
static int | ARMENIANINDEX |
static int | COMPRESSIONOFFSET |
static int | GREEKINDEX |
static int | HALFWIDTHKATAKANAINDEX |
static int | HIRAGANAINDEX |
static int | INVALIDCHAR |
static int | INVALIDWINDOW |
static int | IPAEXTENSIONINDEX |
static int | KATAKANAINDEX |
static int | LATININDEX |
static int | MAXINDEX |
static int | NUMSTATICWINDOWS |
static int | NUMWINDOWS |
static int | RESERVEDINDEX |
static int | SCHANGE0 |
static int | SCHANGE1 |
static int | SCHANGE2 |
static int | SCHANGE3 |
static int | SCHANGE4 |
static int | SCHANGE5 |
static int | SCHANGE6 |
static int | SCHANGE7 |
static int | SCHANGEU |
static int | SDEFINE0 |
static int | SDEFINE1 |
static int | SDEFINE2 |
static int | SDEFINE3 |
static int | SDEFINE4 |
static int | SDEFINE5 |
static int | SDEFINE6 |
static int | SDEFINE7 |
static int | SDEFINEX |
static int | SINGLEBYTEMODE |
static int[] | sOffsets: Static compression window offsets |
static int[] | sOffsetTable: For window offset mapping |
static int | SQUOTE0 |
static int | SQUOTE1 |
static int | SQUOTE2 |
static int | SQUOTE3 |
static int | SQUOTE4 |
static int | SQUOTE5 |
static int | SQUOTE6 |
static int | SQUOTE7 |
static int | SQUOTEU |
static int | SRESERVED |
static int | UCHANGE0 |
static int | UCHANGE1 |
static int | UCHANGE2 |
static int | UCHANGE3 |
static int | UCHANGE4 |
static int | UCHANGE5 |
static int | UCHANGE6 |
static int | UCHANGE7 |
static int | UDEFINE0 |
static int | UDEFINE1 |
static int | UDEFINE2 |
static int | UDEFINE3 |
static int | UDEFINE4 |
static int | UDEFINE5 |
static int | UDEFINE6 |
static int | UDEFINE7 |
static int | UDEFINEX |
static int | UNICODEMODE |
static int | UQUOTEU |
static int | URESERVED |
Constructor and Description |
---|
UnicodeCompressor(): Create a UnicodeCompressor. |
Modifier and Type | Method and Description |
---|---|
static byte[] | compress(char[] buffer, int start, int limit): Compress a Unicode character array into a byte array. |
int | compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit): Compress a Unicode character array into a byte array. |
static byte[] | compress(String buffer): Compress a string into a byte array. |
void | reset(): Reset the compressor to its initial state. |
public static final int COMPRESSIONOFFSET
public static final int NUMWINDOWS
public static final int NUMSTATICWINDOWS
public static final int INVALIDWINDOW
public static final int INVALIDCHAR
public static final int SINGLEBYTEMODE
public static final int UNICODEMODE
public static final int MAXINDEX
public static final int RESERVEDINDEX
public static final int LATININDEX
public static final int IPAEXTENSIONINDEX
public static final int GREEKINDEX
public static final int ARMENIANINDEX
public static final int HIRAGANAINDEX
public static final int KATAKANAINDEX
public static final int HALFWIDTHKATAKANAINDEX
public static final int SDEFINEX
public static final int SRESERVED
public static final int SQUOTEU
public static final int SCHANGEU
public static final int SQUOTE0
public static final int SQUOTE1
public static final int SQUOTE2
public static final int SQUOTE3
public static final int SQUOTE4
public static final int SQUOTE5
public static final int SQUOTE6
public static final int SQUOTE7
public static final int SCHANGE0
public static final int SCHANGE1
public static final int SCHANGE2
public static final int SCHANGE3
public static final int SCHANGE4
public static final int SCHANGE5
public static final int SCHANGE6
public static final int SCHANGE7
public static final int SDEFINE0
public static final int SDEFINE1
public static final int SDEFINE2
public static final int SDEFINE3
public static final int SDEFINE4
public static final int SDEFINE5
public static final int SDEFINE6
public static final int SDEFINE7
public static final int UCHANGE0
public static final int UCHANGE1
public static final int UCHANGE2
public static final int UCHANGE3
public static final int UCHANGE4
public static final int UCHANGE5
public static final int UCHANGE6
public static final int UCHANGE7
public static final int UDEFINE0
public static final int UDEFINE1
public static final int UDEFINE2
public static final int UDEFINE3
public static final int UDEFINE4
public static final int UDEFINE5
public static final int UDEFINE6
public static final int UDEFINE7
public static final int UQUOTEU
public static final int UDEFINEX
public static final int URESERVED
public static final int[] sOffsetTable
public static final int[] sOffsets
public UnicodeCompressor()
See Also: reset()
public static byte[] compress(String buffer)
Parameters:
buffer - The string to compress.
See Also: compress(char [], int, int)
public static byte[] compress(char[] buffer, int start, int limit)
Parameters:
buffer - The character buffer to compress.
start - The start of the character run to compress.
limit - The limit of the character run to compress.
See Also: compress(String)
public int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
Parameters:
charBuffer - The character buffer to compress.
charBufferStart - The start of the character run to compress.
charBufferLimit - The limit of the character run to compress.
charsRead - A one-element array. If not null, on return it holds the number of characters read from charBuffer.
byteBuffer - A buffer to receive the compressed data. This buffer must be at least four bytes in size.
byteBufferStart - The starting offset to which to write compressed data.
byteBufferLimit - The limiting offset for writing compressed data.
public void reset()
Copyright © 2016 Unicode, Inc. and others.