public abstract class CharsetICU extends Charset
A subclass of java.nio.charset.Charset that provides implementations of ICU's charset converters.
This API is used to convert codepage or character encoded data to and from UTF-16. You can open a converter with Charset.forName(java.lang.String) and forNameICU(java.lang.String). With that converter, you can get its properties, set options, and convert your data.
Since many software programs recognize different converter names for different types of converters, there are other functions in this API to iterate over the converter aliases.
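As a minimal sketch of the workflow described above, the following uses only the standard java.nio.charset.Charset API: it opens a charset by name, lists its aliases, and converts text to bytes and back. The class name CharsetRoundTripSketch and the helper roundTrip are hypothetical names chosen for illustration. With ICU4J on the classpath, forNameICU(java.lang.String) could presumably be used in place of Charset.forName, but that is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical illustration class; not part of ICU4J or the JDK.
final class CharsetRoundTripSketch {

    /** Encodes text with the named charset, then decodes the bytes back to a String. */
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName); // accepts a canonical name or an alias
        ByteBuffer bytes = cs.encode(text);        // UTF-16 chars -> charset bytes
        CharBuffer chars = cs.decode(bytes);       // charset bytes -> UTF-16 chars
        return chars.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("ISO-8859-1");
        System.out.println("canonical name: " + cs.name());
        // Iterate over the aliases registered for this charset.
        for (String alias : cs.aliases()) {
            System.out.println("alias: " + alias);
        }
        System.out.println(roundTrip("caf\u00e9", "ISO-8859-1"));
    }
}
```

Note that Charset.encode(String) substitutes a replacement byte for characters the charset cannot represent, so a round trip through a small charset can silently change the text.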
Note that Charset.name() cannot always return a unique charset name.
Charset documents that, for charsets listed in the IANA Charset Registry, the Charset.name() must be listed there, and it “must be the MIME-preferred name” if there are multiple names.
However, many if not most charsets have several different implementations: ICU provides multiple variants for some of them, ICU provides variants of some charsets that the java.nio system already supports, and ICU users are free to add more variants. These variants exist so that applications can be compatible with multiple implementations at the same time.
This is in conflict with the Charset.name() requirements. It is not possible to offer variants of an IANA charset and always use the MIME-preferred name and also have those names be unique.
Charset.name() returns the MIME-preferred name, or IANA name, so that it can always be used for the charset field in internet protocols.
Same-name charsets are accessible via Charset.forName(java.lang.String) or forNameICU(java.lang.String) by using unique aliases (e.g., the ICU-canonical names).
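A lookup by alias can fail in two documented ways: the name can be syntactically illegal, or no charset with that name may be installed. The sketch below wraps the standard Charset.forName call so that both cases yield an empty result. The class CharsetLookupSketch and the method lookup are hypothetical helpers. forNameICU declares the same two exceptions, so the same pattern should carry over, though that is an assumption not exercised here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical illustration class; not part of ICU4J or the JDK.
final class CharsetLookupSketch {

    /** Resolves a canonical name or alias, returning empty if it is illegal or unsupported. */
    static Optional<Charset> lookup(String nameOrAlias) {
        try {
            return Optional.of(Charset.forName(nameOrAlias));
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in charset names.
            return Optional.empty();
        } catch (UnsupportedCharsetException e) {
            // The name is well-formed, but no such charset is available in this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"latin1", "no-such-charset-xyz", "bad name!"}) {
            Optional<Charset> cs = lookup(name);
            System.out.println(name + " -> " + (cs.isPresent() ? cs.get().name() : "(not available)"));
        }
    }
}
```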
Charset also documents that “Two charsets are equal if, and only if, they have the same canonical names.” This is not possible when several variants share one canonical name.
Unfortunately, Charset.equals(java.lang.Object) is final, and Charset.availableCharsets() returns “a sorted map from canonical charset names to charset objects”.
Since Charset.name() cannot be unique, Charset.equals(java.lang.Object) cannot work properly in such cases, and Charset.availableCharsets() can only include one variant for a name.
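The two java.nio behaviors quoted above can be observed with the standard library alone. The following sketch (class and method names are hypothetical) checks that two different aliases of the same charset compare equal, and that every key of Charset.availableCharsets() is the canonical name of its value, which is why the map has room for only one charset per name.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical illustration class; not part of ICU4J or the JDK.
final class CanonicalNameSketch {

    /** True if both names resolve to charsets that Charset.equals treats as equal. */
    static boolean sameCharset(String nameA, String nameB) {
        return Charset.forName(nameA).equals(Charset.forName(nameB));
    }

    /** True if every key of availableCharsets() is the canonical name of its value. */
    static boolean keyedByCanonicalName() {
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            if (!e.getKey().equals(e.getValue().name())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("latin1 equals ISO-8859-1: " + sameCharset("latin1", "ISO-8859-1"));
        System.out.println("map keyed by canonical name: " + keyedByCanonicalName());
    }
}
```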
Modifier and Type | Field and Description |
---|---|
static int | ROUNDTRIP_AND_FALLBACK_SET: Deprecated. This API is ICU internal only. |
static int | ROUNDTRIP_SET: Parameter that selects the set of roundtrippable Unicode code points. |
Modifier | Constructor and Description |
---|---|
protected | CharsetICU(String icuCanonicalName, String canonicalName, String[] aliases) |
Modifier and Type | Method and Description |
---|---|
boolean | contains(Charset cs): Ascertains whether a charset is a subset of this charset. Implements the abstract method of the superclass. |
static Charset | forNameICU(String charsetName): Returns a charset object for the named charset. |
void | getUnicodeSet(UnicodeSet setFillIn, int which): Returns the set of Unicode code points that can be converted by an ICU converter. |
boolean | isFixedWidth(): Returns whether or not the charset of the converter has a fixed number of bytes per charset character. |
Methods inherited from class java.nio.charset.Charset: aliases, availableCharsets, canEncode, compareTo, decode, defaultCharset, displayName, displayName, encode, encode, equals, forName, hashCode, isRegistered, isSupported, name, newDecoder, newEncoder, toString
public static final int ROUNDTRIP_SET
@Deprecated public static final int ROUNDTRIP_AND_FALLBACK_SET
public boolean contains(Charset cs)
public static Charset forNameICU(String charsetName) throws IllegalCharsetNameException, UnsupportedCharsetException
charsetName - The name of the requested charset; may be either a canonical name or an alias
IllegalCharsetNameException - If the given charset name is illegal
UnsupportedCharsetException - If no support for the named charset is available in this instance of the Java virtual machine
public void getUnicodeSet(UnicodeSet setFillIn, int which)
The current implementation returns only one kind of set (UCNV_ROUNDTRIP_SET): the set of all Unicode code points that can be roundtrip-converted (converted without any data loss) with the converter. This set will not include code points that have fallback mappings or are only the result of reverse fallback mappings. See UTR #22 "Character Mapping Markup Language" at http://www.unicode.org/reports/tr22/
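To make the idea of roundtrip conversion concrete, here is a rough sketch using only the standard java.nio encoder and decoder classes. It does not call getUnicodeSet; instead it tests a single string by encoding and decoding it with error reporting enabled and comparing the result with the input. The class RoundTripCheckSketch and the method roundTrips are hypothetical names, and this per-string check is a swapped-in technique rather than the ICU implementation. It also cannot distinguish fallback mappings the way the ICU set does.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of ICU4J or the JDK.
final class RoundTripCheckSketch {

    /** True if text survives encode-then-decode with the given charset unchanged. */
    static boolean roundTrips(String text, Charset cs) {
        try {
            // REPORT makes unmappable or malformed input throw instead of being replaced.
            ByteBuffer bytes = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            String back = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
            return back.equals(text);
        } catch (CharacterCodingException e) {
            return false; // at least one character could not be converted
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("abc", StandardCharsets.US_ASCII));
        System.out.println(roundTrips("\u20ac", StandardCharsets.ISO_8859_1));
        System.out.println(roundTrips("\u20ac", StandardCharsets.UTF_8));
    }
}
```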
In the future, there may be more UConverterUnicodeSet choices to select sets with different properties.
This is useful for example for
setFillIn - A valid UnicodeSet. It will be cleared by this function before the converter's specific set is filled in.
which - A selector; currently ROUNDTRIP_SET is the only supported value.
IllegalArgumentException - if the parameters do not match.
public boolean isFixedWidth()
Copyright © 2016 Unicode, Inc. and others.