public final class UCharacter extends Object implements UCharacterEnums.ECharacterCategory, UCharacterEnums.ECharacterDirection
[icu enhancement] ICU's replacement for java.lang.Character. Methods, fields, and other functionality specific to ICU are labeled '[icu]'.
The UCharacter class provides extensions to the Character class.
These extensions provide support for more Unicode properties.
Each ICU release supports the latest version of Unicode available at that time.
For some time before Java 5 added support for supplementary Unicode code points,
the ICU UCharacter class and many other ICU classes already supported them.
Some UCharacter methods and constants were widened slightly differently than
how the Character class methods and constants were widened later.
In particular, Character.MAX_VALUE is still a char with the value U+FFFF, while UCharacter.MAX_VALUE is an int with the value U+10FFFF.
Code points are represented in this API using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
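The difference in width can be seen with the JDK alone. The following is a minimal sketch that uses only java.lang.Character (not ICU4J); the class name CodePointWidthDemo and the helper countCodePoints are hypothetical names chosen for illustration.

```java
public class CodePointWidthDemo {
    /** Counts code points by walking the string one int code point at a time. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // the code point is held in an int
            i += Character.charCount(cp);  // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Character.MAX_VALUE is a char (U+FFFF); the code point maximum is an int.
        System.out.println((int) Character.MAX_VALUE == 0xFFFF);
        System.out.println(Character.MAX_CODE_POINT == 0x10FFFF);

        // U+1D11E MUSICAL SYMBOL G CLEF needs two chars but is one code point.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length() + " chars, " + countCodePoints(clef) + " code point");
    }
}
```

According to the field summary, UCharacter.MAX_VALUE carries the same value as Character.MAX_CODE_POINT, so code written against either class can keep code points in int variables.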
To use this class, please add the jar file icu4j.jar to the
class path, since it contains the data files that supply the information used
by this class.
E.g. in Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Otherwise, another method would be to copy the files uprops.dat and
unames.icu from the icu4j source subdirectory
$ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory
$ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode properties, the main differences between UCharacter and Character are:
Further detail on the differences can be determined using the program com.ibm.icu.dev.test.lang.UCharacterCompare.
In addition to Java compatibility functions, which calculate derived properties, this API provides low-level access to the Unicode Character Database.
Unicode assigns each code point (not just assigned characters) values for many properties. Most of them are simple boolean flags, or constants from a small enumerated list. For some properties, values are strings or other relatively more complex types.
For more information see "About the Unicode Character Database" (http://www.unicode.org/ucd/) and the ICU User Guide chapter on Properties (https://unicode-org.github.io/icu/userguide/strings/properties).
There are also functions that provide easy migration from C/POSIX functions like isblank(). Their use is generally discouraged because the C/POSIX standards do not define their semantics beyond the ASCII range, which means that different implementations exhibit very different behavior. Instead, Unicode properties should be used directly.
There are also only a few, broad C/POSIX character classes, and they tend to be used for conflicting purposes. For example, the "isalpha()" class is sometimes used to determine word boundaries, while a more sophisticated approach would at least distinguish initial letters from continuation characters (the latter including combining marks). (In ICU, BreakIterator is the most sophisticated API for word boundaries.) Another example: There is no "istitle()" class for titlecase characters.
ICU 3.4 and later provides API access for all twelve C/POSIX character classes. ICU implements them according to the Standard Recommendations in Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
API access for C/POSIX character classes is as follows:
- alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
- lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
- upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
- punct: ((1<<getType(c)) & ((1<<DASH_PUNCTUATION)|(1<<START_PUNCTUATION)|(1<<END_PUNCTUATION)|(1<<CONNECTOR_PUNCTUATION)|(1<<OTHER_PUNCTUATION)|(1<<INITIAL_PUNCTUATION)|(1<<FINAL_PUNCTUATION)))!=0
- digit: isDigit(c) or getType(c)==DECIMAL_DIGIT_NUMBER
- xdigit: hasBinaryProperty(c, UProperty.POSIX_XDIGIT)
- alnum: hasBinaryProperty(c, UProperty.POSIX_ALNUM)
- space: isUWhiteSpace(c) or hasBinaryProperty(c, UProperty.WHITE_SPACE)
- blank: hasBinaryProperty(c, UProperty.POSIX_BLANK)
- cntrl: getType(c)==CONTROL
- graph: hasBinaryProperty(c, UProperty.POSIX_GRAPH)
- print: hasBinaryProperty(c, UProperty.POSIX_PRINT)
The C/POSIX character classes are also available in UnicodeSet patterns, using patterns like [:graph:] or \p{graph}.
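The category-based entries in the list above (punct, digit, cntrl) can be sketched with the JDK alone. This is only an illustration under stated assumptions: it uses java.lang.Character.getType rather than UCharacter.getType, so results follow the JDK's Unicode version, and the JDK names the quote categories INITIAL_QUOTE_PUNCTUATION and FINAL_QUOTE_PUNCTUATION. The class name PosixPunctSketch and its method names are hypothetical.

```java
public class PosixPunctSketch {
    // Bit mask over the general categories listed for the "punct" class.
    private static final int PUNCT_MASK =
            (1 << Character.DASH_PUNCTUATION)
          | (1 << Character.START_PUNCTUATION)
          | (1 << Character.END_PUNCTUATION)
          | (1 << Character.CONNECTOR_PUNCTUATION)
          | (1 << Character.OTHER_PUNCTUATION)
          | (1 << Character.INITIAL_QUOTE_PUNCTUATION)
          | (1 << Character.FINAL_QUOTE_PUNCTUATION);

    /** punct: the code point's general category is one of the P* categories. */
    static boolean isPunct(int c) {
        return ((1 << Character.getType(c)) & PUNCT_MASK) != 0;
    }

    /** cntrl: general category Cc. */
    static boolean isCntrl(int c) {
        return Character.getType(c) == Character.CONTROL;
    }

    /** digit: general category Nd. */
    static boolean isDigitClass(int c) {
        return Character.getType(c) == Character.DECIMAL_DIGIT_NUMBER;
    }

    public static void main(String[] args) {
        System.out.println(isPunct('!'));   // U+0021 is Other_Punctuation
        System.out.println(isPunct('+'));   // U+002B is Math_Symbol, not punctuation
        System.out.println(isCntrl('\n'));  // U+000A is a control character
    }
}
```

Testing a single shifted bit against a precomputed mask checks several categories with one comparison, which is why the punct entry is written as a bit expression rather than a chain of equality tests.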
[icu] Note: There are several ICU (and Java) whitespace functions. Comparison:
This class is not subclassable.
See also: UCharacterEnums
Modifier and Type | Class and Description |
---|---|
static interface | UCharacter.BidiPairedBracketType: Bidi Paired Bracket Type constants. |
static interface | UCharacter.DecompositionType: Decomposition Type constants. |
static interface | UCharacter.EastAsianWidth: East Asian Width constants. |
static interface | UCharacter.GraphemeClusterBreak: Grapheme Cluster Break constants. |
static interface | UCharacter.HangulSyllableType: Hangul Syllable Type constants. |
static class | UCharacter.IdentifierStatus: Identifier Status constants. |
static class | UCharacter.IdentifierType: Identifier Type constants. |
static interface | UCharacter.IndicPositionalCategory: Indic Positional Category constants. |
static interface | UCharacter.IndicSyllabicCategory: Indic Syllabic Category constants. |
static interface | UCharacter.JoiningGroup: Joining Group constants. |
static interface | UCharacter.JoiningType: Joining Type constants. |
static interface | UCharacter.LineBreak: Line Break constants. |
static interface | UCharacter.NumericType: Numeric Type constants. |
static interface | UCharacter.SentenceBreak: Sentence Break constants. |
static class | UCharacter.UnicodeBlock: [icu enhancement] ICU's replacement for java.lang.Character.UnicodeBlock. |
static interface | UCharacter.VerticalOrientation: Vertical Orientation constants. |
static interface | UCharacter.WordBreak: Word Break constants. |
Modifier and Type | Field and Description |
---|---|
static int | FOLD_CASE_DEFAULT: [icu] Option value for case folding: use default mappings defined in CaseFolding.txt. |
static int | FOLD_CASE_EXCLUDE_SPECIAL_I: [icu] Option value for case folding: Use the modified set of mappings provided in CaseFolding.txt to handle dotted I and dotless i appropriately for Turkic languages (tr, az). |
static int | MAX_CODE_POINT: Constant U+10FFFF, same as Character.MAX_CODE_POINT. |
static char | MAX_HIGH_SURROGATE: Constant U+DBFF, same as Character.MAX_HIGH_SURROGATE. |
static char | MAX_LOW_SURROGATE: Constant U+DFFF, same as Character.MAX_LOW_SURROGATE. |
static int | MAX_RADIX: Compatibility constant for Java Character's MAX_RADIX. |
static char | MAX_SURROGATE: Constant U+DFFF, same as Character.MAX_SURROGATE. |
static int | MAX_VALUE: The highest Unicode code point value (scalar value), constant U+10FFFF (uses 21 bits). |
static int | MIN_CODE_POINT: Constant U+0000, same as Character.MIN_CODE_POINT. |
static char | MIN_HIGH_SURROGATE: Constant U+D800, same as Character.MIN_HIGH_SURROGATE. |
static char | MIN_LOW_SURROGATE: Constant U+DC00, same as Character.MIN_LOW_SURROGATE. |
static int | MIN_RADIX: Compatibility constant for Java Character's MIN_RADIX. |
static int | MIN_SUPPLEMENTARY_CODE_POINT: Constant U+10000, same as Character.MIN_SUPPLEMENTARY_CODE_POINT. |
static char | MIN_SURROGATE: Constant U+D800, same as Character.MIN_SURROGATE. |
static int | MIN_VALUE: The lowest Unicode code point value, constant 0. |
static double | NO_NUMERIC_VALUE: Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point. |
static int | REPLACEMENT_CHAR: Unicode value used when translating into Unicode encoding form and there is no existing character. |
static int | SUPPLEMENTARY_MIN_VALUE: The minimum value for Supplementary code points, constant U+10000. |
static int | TITLECASE_NO_BREAK_ADJUSTMENT: Do not adjust the titlecasing indexes from BreakIterator::next() indexes; titlecase exactly the characters at breaks from the iterator. |
static int | TITLECASE_NO_LOWERCASE: Do not lowercase non-initial parts of words when titlecasing. |
Fields inherited from interfaces UCharacterEnums.ECharacterCategory and UCharacterEnums.ECharacterDirection:
CHAR_CATEGORY_COUNT, COMBINING_SPACING_MARK, CONNECTOR_PUNCTUATION, CONTROL, CURRENCY_SYMBOL, DASH_PUNCTUATION, DECIMAL_DIGIT_NUMBER, ENCLOSING_MARK, END_PUNCTUATION, FINAL_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, FORMAT, GENERAL_OTHER_TYPES, INITIAL_PUNCTUATION, INITIAL_QUOTE_PUNCTUATION, LETTER_NUMBER, LINE_SEPARATOR, LOWERCASE_LETTER, MATH_SYMBOL, MODIFIER_LETTER, MODIFIER_SYMBOL, NON_SPACING_MARK, OTHER_LETTER, OTHER_NUMBER, OTHER_PUNCTUATION, OTHER_SYMBOL, PARAGRAPH_SEPARATOR, PRIVATE_USE, SPACE_SEPARATOR, START_PUNCTUATION, SURROGATE, TITLECASE_LETTER, UNASSIGNED, UPPERCASE_LETTER
ARABIC_NUMBER, BLOCK_SEPARATOR, BOUNDARY_NEUTRAL, CHAR_DIRECTION_COUNT, COMMON_NUMBER_SEPARATOR, DIR_NON_SPACING_MARK, DIRECTIONALITY_ARABIC_NUMBER, DIRECTIONALITY_BOUNDARY_NEUTRAL, DIRECTIONALITY_COMMON_NUMBER_SEPARATOR, DIRECTIONALITY_EUROPEAN_NUMBER, DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR, DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR, DIRECTIONALITY_LEFT_TO_RIGHT, DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING, DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE, DIRECTIONALITY_NONSPACING_MARK, DIRECTIONALITY_OTHER_NEUTRALS, DIRECTIONALITY_PARAGRAPH_SEPARATOR, DIRECTIONALITY_POP_DIRECTIONAL_FORMAT, DIRECTIONALITY_RIGHT_TO_LEFT, DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC, DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING, DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE, DIRECTIONALITY_SEGMENT_SEPARATOR, DIRECTIONALITY_UNDEFINED, DIRECTIONALITY_WHITESPACE, EUROPEAN_NUMBER, EUROPEAN_NUMBER_SEPARATOR, EUROPEAN_NUMBER_TERMINATOR, FIRST_STRONG_ISOLATE, LEFT_TO_RIGHT, LEFT_TO_RIGHT_EMBEDDING, LEFT_TO_RIGHT_ISOLATE, LEFT_TO_RIGHT_OVERRIDE, OTHER_NEUTRAL, POP_DIRECTIONAL_FORMAT, POP_DIRECTIONAL_ISOLATE, RIGHT_TO_LEFT, RIGHT_TO_LEFT_ARABIC, RIGHT_TO_LEFT_EMBEDDING, RIGHT_TO_LEFT_ISOLATE, RIGHT_TO_LEFT_OVERRIDE, SEGMENT_SEPARATOR, WHITE_SPACE_NEUTRAL
Modifier and Type | Method and Description |
---|---|
static int | charCount(int cp): Same as Character.charCount(int). |
static int | codePointAt(char[] text, int index): Same as Character.codePointAt(char[], int). |
static int | codePointAt(char[] text, int index, int limit) |
static int | codePointAt(CharSequence seq, int index) |
static int | codePointBefore(char[] text, int index) |
static int | codePointBefore(char[] text, int index, int limit) |
static int | codePointBefore(CharSequence seq, int index) |
static int | codePointCount(char[] text, int start, int limit): Equivalent to the Character.codePointCount(char[], int, int) method, for convenience. |
static int | codePointCount(CharSequence text, int start, int limit): Equivalent to the Character.codePointCount(CharSequence, int, int) method, for convenience. |
static int | digit(int ch): Returns the numeric value of a decimal digit code point. |
static int | digit(int ch, int radix): Returns the numeric value of a decimal digit code point. |
static int | foldCase(int ch, boolean defaultmapping): [icu] The given character is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if the character has no case folding equivalent, the character itself is returned. |
static int | foldCase(int ch, int options): [icu] The given character is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if the character has no case folding equivalent, the character itself is returned. |
static String | foldCase(String str, boolean defaultmapping): [icu] The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. |
static String | foldCase(String str, int options): [icu] The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. |
static char | forDigit(int digit, int radix): Provide the java.lang.Character forDigit API, for convenience. |
static VersionInfo | getAge(int ch): [icu] Returns the "age" of the code point. |
static int | getBidiPairedBracket(int c): [icu] Maps the specified character to its paired bracket character. |
static int | getCharFromExtendedName(String name): [icu] Find a Unicode character by its name or extended name and return its code point value. |
static int | getCharFromName(String name): [icu] Finds a Unicode code point by its most current Unicode name and returns its code point value. |
static int | getCharFromName1_0(String name): Deprecated. ICU 49 |
static int | getCharFromNameAlias(String name): [icu] Find a Unicode character by its corrected name alias and return its code point value. |
static int | getCodePoint(char char16): [icu] Returns the code point corresponding to the BMP code point. |
static int | getCodePoint(int lead, int trail): [icu] Returns a code point corresponding to the two surrogate code units. |
static int | getCombiningClass(int ch): [icu] Returns the combining class of the argument codepoint. |
static int | getDirection(int ch): [icu] Returns the Bidirection property of a code point. |
static byte | getDirectionality(int cp): Equivalent to the Character.getDirectionality(char) method, for convenience. |
static String | getExtendedName(int ch): [icu] Returns a name for a valid codepoint. |
static ValueIterator | getExtendedNameIterator(): [icu] Returns an iterator for character names, iterating over codepoints. |
static int | getHanNumericValue(int ch): [icu] Returns the numeric value of a Han character. |
static int | getIdentifierTypes(int c, EnumSet<UCharacter.IdentifierType> types): Writes code point c's Identifier_Type as a set of IdentifierType values and returns the number of types. |
static int | getIntPropertyMaxValue(int type): [icu] Returns the maximum value for an integer/binary Unicode property. |
static int | getIntPropertyMinValue(int type): [icu] Returns the minimum value for an integer/binary Unicode property type. |
static int | getIntPropertyValue(int ch, int type): [icu] Returns the property value for a Unicode property type of a code point. |
static String | getISOComment(int ch): Deprecated. ICU 49 |
static int | getMirror(int ch): [icu] Maps the specified code point to a "mirror-image" code point. |
static String | getName(int ch): [icu] Returns the most current Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name. |
static String | getName(String s, String separator): [icu] Returns the names for each of the characters in a string. |
static String | getName1_0(int ch): Deprecated. ICU 49 |
static ValueIterator | getName1_0Iterator(): Deprecated. ICU 49 |
static String | getNameAlias(int ch): [icu] Returns the corrected name from NameAliases.txt if there is one. |
static ValueIterator | getNameIterator(): [icu] Returns an iterator for character names, iterating over codepoints. |
static int | getNumericValue(int ch): Returns the numeric value of the code point as a nonnegative integer. |
static int | getPropertyEnum(CharSequence propertyAlias): [icu] Return the UProperty selector for a given property name, as specified in the Unicode database file PropertyAliases.txt. |
static String | getPropertyName(int property, int nameChoice): [icu] Return the Unicode name for a given property, as given in the Unicode database file PropertyAliases.txt. |
static int | getPropertyValueEnum(int property, CharSequence valueAlias): [icu] Return the property value integer for a given value name, as specified in the Unicode database file PropertyValueAliases.txt. |
static int | getPropertyValueEnumNoThrow(int property, CharSequence valueAlias): Deprecated. This API is ICU internal only. |
static String | getPropertyValueName(int property, int value, int nameChoice): [icu] Return the Unicode name for a given property value, as given in the Unicode database file PropertyValueAliases.txt. |
static String | getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice): Deprecated. This API is ICU internal only. |
static int | getType(int ch): Returns a value indicating a code point's Unicode category. |
static RangeValueIterator | getTypeIterator(): [icu] Returns an iterator for character types, iterating over codepoints. |
static double | getUnicodeNumericValue(int ch): [icu] Returns the numeric value for a Unicode code point as defined in the Unicode Character Database. |
static VersionInfo | getUnicodeVersion(): [icu] Returns the version of Unicode data used. |
static boolean | hasBinaryProperty(CharSequence s, int property): [icu] Returns true if the property is true for the string. |
static boolean | hasBinaryProperty(int ch, int property): [icu] Check a binary Unicode property for a code point. |
static boolean | hasIdentifierType(int c, UCharacter.IdentifierType type): Does the set of Identifier_Type values for code point c contain the given type? |
static boolean | isBaseForm(int ch): [icu] Determines whether the specified code point is of base form. |
static boolean | isBMP(int ch): [icu] Determines if the code point is in the BMP. |
static boolean | isDefined(int ch): Determines if a code point has a defined meaning in the up-to-date Unicode standard. |
static boolean | isDigit(int ch): Determines if a code point is a Java digit. |
static boolean | isHighSurrogate(int codePoint): Same as Character.isHighSurrogate(char), except that the ICU version accepts int for code points. |
static boolean | isIdentifierIgnorable(int ch): Determines if the specified code point should be regarded as an ignorable character in a Java identifier. |
static boolean | isISOControl(int ch): Determines if the specified code point is an ISO control character. |
static boolean | isJavaIdentifierPart(int cp): Compatibility override of Java method, delegates to java.lang.Character.isJavaIdentifierPart. |
static boolean | isJavaIdentifierStart(int cp): Compatibility override of Java method, delegates to java.lang.Character.isJavaIdentifierStart. |
static boolean | isJavaLetter(int cp): Deprecated. ICU 3.4 (Java) |
static boolean | isJavaLetterOrDigit(int cp): Deprecated. ICU 3.4 (Java) |
static boolean | isLegal(int ch): [icu] A code point is illegal if and only if it is out of bounds (less than 0 or greater than UCharacter.MAX_VALUE), a surrogate value (0xD800 to 0xDFFF), or a not-a-character of the form 0x xxFFFF or 0x xxFFFE. Note: legal does not mean that it is assigned in this version of Unicode. |
static boolean | isLegal(String str): [icu] A string is legal if and only if all its code points are legal. |
static boolean | isLetter(int ch): Determines if the specified code point is a letter. |
static boolean | isLetterOrDigit(int ch): Determines if the specified code point is a letter or digit. |
static boolean | isLowerCase(int ch): Determines if the specified code point is a lowercase character. |
static boolean | isLowSurrogate(int codePoint): Same as Character.isLowSurrogate(char), except that the ICU version accepts int for code points. |
static boolean | isMirrored(int ch): Determines whether the code point has the "mirrored" property. |
static boolean | isPrintable(int ch): [icu] Determines whether the specified code point is a printable character according to the Unicode standard. |
static boolean | isSpace(int ch): Deprecated. ICU 3.4 (Java) |
static boolean | isSpaceChar(int ch): Determines if the specified code point is a Unicode specified space character, i.e. if the code point is in category Zs, Zl, or Zp. |
static boolean | isSupplementary(int ch): [icu] Determines if the code point is a supplementary character. |
static boolean | isSupplementaryCodePoint(int cp) |
static boolean | isSurrogatePair(int high, int low): Same as Character.isSurrogatePair(char, char), except that the ICU version accepts int for code points. |
static boolean | isTitleCase(int ch): Determines if the specified code point is a titlecase character. |
static boolean | isUAlphabetic(int ch): [icu] Check if a code point has the Alphabetic Unicode property. |
static boolean | isULowercase(int ch): [icu] Check if a code point has the Lowercase Unicode property. |
static boolean | isUnicodeIdentifierPart(int ch): Determines if the specified character is permissible as a non-initial character of an identifier according to UAX #31 Unicode Identifier and Pattern Syntax. |
static boolean | isUnicodeIdentifierStart(int ch): Determines if the specified character is permissible as the first character in an identifier according to UAX #31 Unicode Identifier and Pattern Syntax. |
static boolean | isUpperCase(int ch): Determines if the specified code point is an uppercase character. |
static boolean | isUUppercase(int ch): [icu] Check if a code point has the Uppercase Unicode property. |
static boolean | isUWhiteSpace(int ch): [icu] Check if a code point has the White_Space Unicode property. |
static boolean | isValidCodePoint(int cp): Equivalent to Character.isValidCodePoint(int). |
static boolean | isWhitespace(int ch): Determines if the specified code point is a white space character. |
static int | offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset): Equivalent to the Character.offsetByCodePoints(char[], int, int, int, int) method, for convenience. |
static int | offsetByCodePoints(CharSequence text, int index, int codePointOffset): Equivalent to the Character.offsetByCodePoints(CharSequence, int, int) method, for convenience. |
static char[] | toChars(int cp): Same as Character.toChars(int). |
static int | toChars(int cp, char[] dst, int dstIndex): Same as Character.toChars(int, char[], int). |
static int | toCodePoint(int high, int low): Same as Character.toCodePoint(char, char), except that the ICU version accepts int for code points. |
static int | toLowerCase(int ch): The given code point is mapped to its lowercase equivalent; if the code point has no lowercase equivalent, the code point itself is returned. |
static String | toLowerCase(Locale locale, String str): Returns the lowercase version of the argument string. |
static String | toLowerCase(String str): Returns the lowercase version of the argument string. |
static String | toLowerCase(ULocale locale, String str): Returns the lowercase version of the argument string. |
static String | toString(int ch): Converts argument code point and returns a String object representing the code point's value in UTF-16 format. |
static int | toTitleCase(int ch): Converts the code point argument to titlecase. |
static String | toTitleCase(Locale locale, String str, BreakIterator breakiter): Returns the titlecase version of the argument string. |
static String | toTitleCase(Locale locale, String str, BreakIterator titleIter, int options): [icu] Returns the titlecase version of the argument string. |
static String | toTitleCase(String str, BreakIterator breakiter): Returns the titlecase version of the argument string. |
static String | toTitleCase(ULocale locale, String str, BreakIterator titleIter): Returns the titlecase version of the argument string. |
static String | toTitleCase(ULocale locale, String str, BreakIterator titleIter, int options): Returns the titlecase version of the argument string. |
static int | toUpperCase(int ch): Converts the character argument to uppercase. |
static String | toUpperCase(Locale locale, String str): Returns the uppercase version of the argument string. |
static String | toUpperCase(String str): Returns the uppercase version of the argument string. |
static String | toUpperCase(ULocale locale, String str): Returns the uppercase version of the argument string. |
public static final int MIN_VALUE
Same as Character.MIN_CODE_POINT, same integer value as Character.MIN_VALUE.
public static final int MAX_VALUE
Same as Character.MAX_CODE_POINT. Up-to-date Unicode implementation of Character.MAX_VALUE, which is still a char with the value U+FFFF.
public static final int SUPPLEMENTARY_MIN_VALUE
Same as Character.MIN_SUPPLEMENTARY_CODE_POINT.
public static final int REPLACEMENT_CHAR
public static final double NO_NUMERIC_VALUE
See also: getUnicodeNumericValue(int), Constant Field Values
public static final int MIN_RADIX
public static final int MAX_RADIX
public static final int TITLECASE_NO_LOWERCASE
See also: toTitleCase(int), Constant Field Values
public static final int TITLECASE_NO_BREAK_ADJUSTMENT
See also: toTitleCase(int), TITLECASE_NO_LOWERCASE, Constant Field Values
public static final int FOLD_CASE_DEFAULT
public static final int FOLD_CASE_EXCLUDE_SPECIAL_I
Before Unicode 3.2, CaseFolding.txt contained mappings marked with 'I' that were to be included for default mappings and excluded for the Turkic-specific mappings.
Unicode 3.2 CaseFolding.txt instead contains mappings marked with 'T' that are to be excluded for default mappings and included for the Turkic-specific mappings.
public static final char MIN_HIGH_SURROGATE
Same as Character.MIN_HIGH_SURROGATE.
public static final char MAX_HIGH_SURROGATE
Same as Character.MAX_HIGH_SURROGATE.
public static final char MIN_LOW_SURROGATE
Same as Character.MIN_LOW_SURROGATE.
public static final char MAX_LOW_SURROGATE
Same as Character.MAX_LOW_SURROGATE.
public static final char MIN_SURROGATE
Same as Character.MIN_SURROGATE.
public static final char MAX_SURROGATE
Same as Character.MAX_SURROGATE.
public static final int MIN_SUPPLEMENTARY_CODE_POINT
Same as Character.MIN_SUPPLEMENTARY_CODE_POINT.
public static final int MAX_CODE_POINT
Same as Character.MAX_CODE_POINT.
public static final int MIN_CODE_POINT
Same as Character.MIN_CODE_POINT.
public static int digit(int ch, int radix)
This method observes the semantics of java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
ch - the code point to query
radix - the radix
public static int digit(int ch)
Convenience overload of digit(int, int) that provides a decimal radix.
ch - the code point to query
public static int getNumericValue(int ch)
ch - the code point to query
public static double getUnicodeNumericValue(int ch)
A "double" return type is necessary because some numeric values are fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE. Note: This is different from the Unicode Standard which specifies NaN as the default value.
API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This was changed to synchronize with ICU4C. This corresponds to the ICU4C function u_getNumericValue.
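The limits of an int return type can be seen in the JDK's own int-returning method. The sketch below uses java.lang.Character.getNumericValue (not ICU4J), whose documented contract is to return -1 when there is no numeric value and -2 when the value cannot be represented as a nonnegative integer; the class name NumericValueSketch and the method describe are hypothetical.

```java
public class NumericValueSketch {
    /** Describes what an int-returning numeric lookup can report for a code point. */
    static String describe(int cp) {
        int v = Character.getNumericValue(cp);
        if (v == -1) {
            return "no numeric value";
        }
        if (v == -2) {
            return "not representable as a nonnegative int";
        }
        return Integer.toString(v);
    }

    public static void main(String[] args) {
        System.out.println(describe('7'));     // DIGIT SEVEN
        System.out.println(describe(0x216C));  // ROMAN NUMERAL FIFTY
        System.out.println(describe(0x00BD));  // VULGAR FRACTION ONE HALF
        System.out.println(describe('!'));     // EXCLAMATION MARK
    }
}
```

A fraction such as U+00BD collapses to a sentinel in the int form, whereas a double return type has room for the value itself.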
ch - Code point to get the numeric value for.
@Deprecated public static boolean isSpace(int ch)
ch - the code point
public static int getType(int ch)
ch - code point whose type is to be determined
public static boolean isDefined(int ch)
ch - code point to be determined if it is defined in the most current version of Unicode
public static boolean isDigit(int ch)
This method observes the semantics of java.lang.Character.isDigit(). It returns true for decimal digits only.
ch - code point to query
public static boolean isISOControl(int ch)
ch - code point to determine if it is an ISO control character
public static boolean isLetter(int ch)
ch - code point to determine if it is a letter
public static boolean isLetterOrDigit(int ch)
[icu] Note: This method, unlike java.lang.Character, does not regard the ASCII characters 'A' - 'Z' and 'a' - 'z' as digits.
ch - code point to determine if it is a letter or a digit
@Deprecated public static boolean isJavaLetter(int cp)
cp - the code point
@Deprecated public static boolean isJavaLetterOrDigit(int cp)
cp - the code point
public static boolean isJavaIdentifierStart(int cp)
cp - the code point
public static boolean isJavaIdentifierPart(int cp)
cp - the code point
public static boolean isLowerCase(int ch)
ch - code point to determine if it is in lowercase
public static boolean isWhitespace(int ch)
Note: Unicode 4.0.1 changed U+200B ZERO WIDTH SPACE from a Space Separator (Zs) to a Format Control (Cf). Since then, isWhitespace(0x200b) returns false. See http://www.unicode.org/versions/Unicode4.0.1/
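The effect of that reclassification, and the difference between the two Java-style whitespace tests, can be checked with the JDK. This is a minimal sketch using java.lang.Character.isWhitespace and Character.isSpaceChar rather than the UCharacter methods; the class name WhitespaceSketch and the method classify are hypothetical.

```java
public class WhitespaceSketch {
    /** Formats both whitespace tests for one code point. */
    static String classify(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(classify(0x200B)); // ZERO WIDTH SPACE, now a format control (Cf)
        System.out.println(classify(0x0009)); // TAB: whitespace, but not a Z* separator
        System.out.println(classify(0x00A0)); // NO-BREAK SPACE: Zs, excluded from isWhitespace
        System.out.println(classify(0x0020)); // SPACE: true for both tests
    }
}
```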
ch - code point to determine if it is a white space
public static boolean isSpaceChar(int ch)
ch - code point to determine if it is a space
public static boolean isTitleCase(int ch)
ch - code point to determine if it is in title case
public static boolean isUnicodeIdentifierPart(int ch)
Same as Unicode ID_Continue (UProperty.ID_CONTINUE).
Note that this differs from Character.isUnicodeIdentifierPart(char), which implements a different identifier profile.
ch - the code point to be tested
public static boolean isUnicodeIdentifierStart(int ch)
Same as Unicode ID_Start (UProperty.ID_START).
Note that this differs from Character.isUnicodeIdentifierStart(char), which implements a different identifier profile.
ch - the code point to be tested
public static final boolean hasIdentifierType(int c, UCharacter.IdentifierType type)
Used for UTS #39 General Security Profile for Identifiers (https://www.unicode.org/reports/tr39/#General_Security_Profile).
Each code point maps to a set of IdentifierType values.
c - code point
type - Identifier_Type to check
public static final int getIdentifierTypes(int c, EnumSet<UCharacter.IdentifierType> types)
Used for UTS #39 General Security Profile for Identifiers (https://www.unicode.org/reports/tr39/#General_Security_Profile).
Each code point maps to a set of IdentifierType values. There is always at least one type. Only some of the types can be combined with others, and usually only a small number of types occur together. Future versions might add additional types. See UTS #39 and its data files for details.
c - code point
types - output set
public static boolean isIdentifierIgnorable(int ch)
Note that Unicode only recommends ignoring Cf (format controls).
ch - code point to be determined if it can be ignored in a Unicode identifier.
public static boolean isUpperCase(int ch)
ch - code point to determine if it is in uppercase
public static int toLowerCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: https://unicode-org.github.io/icu/userguide/icu/posix#case-mappings
ch - code point whose lowercase equivalent is to be retrieved
public static String toString(int ch)
Up-to-date Unicode implementation of java.lang.Character.toString().
ch - code point
public static int toTitleCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: https://unicode-org.github.io/icu/userguide/icu/posix#case-mappings
ch - code point whose title case is to be retrieved
public static int toUpperCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: https://unicode-org.github.io/icu/userguide/icu/posix#case-mappings
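The gap between simple and full case mapping can be illustrated with the JDK, which draws the same distinction between its int-based and String-based methods. The sketch below uses java.lang.Character.toUpperCase(int) and String.toUpperCase(Locale) as stand-ins for the UCharacter methods; the class name CaseMappingSketch and its helper names are hypothetical.

```java
import java.util.Locale;

public class CaseMappingSketch {
    /** Simple mapping: one code point in, one code point out. */
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    /** Full mapping: works on the whole string and may change its length. */
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF LATIN SMALL LETTER SHARP S has no single-code-point uppercase
        // form in the simple mapping, so it is returned unchanged.
        System.out.println(simpleUpper(0x00DF) == 0x00DF);

        // The full mapping expands it to two letters.
        System.out.println(fullUpper("stra\u00DFe"));
    }
}
```

Because the result can grow, only the String overloads are able to express such mappings, which is the reason the text above recommends them.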
ch
- code point whose uppercase is to be retrieved
public static boolean isSupplementary(int ch)
ch
- code point to test for membership in a supplementary plane
public static boolean isBMP(int ch)
ch
- code point to test for not being a supplementary character
public static boolean isPrintable(int ch)
ch
- code point to test for printability
public static boolean isBaseForm(int ch)
ch
- code point to test for being a base form
public static int getDirection(int ch)
ch
- code point whose direction is to be determined
public static boolean isMirrored(int ch)
ch
- code point to test for being mirrored
public static int getMirror(int ch)
ch
- code point whose mirror is to be retrieved
public static int getBidiPairedBracket(int c)
c
- the code point to be mapped
See also: UProperty.BIDI_PAIRED_BRACKET, UProperty.BIDI_PAIRED_BRACKET_TYPE, getMirror(int)
public static int getCombiningClass(int ch)
ch
- code point whose combining class is to be retrieved
public static boolean isLegal(int ch)
ch
- code point to determine if it is a legal code point by itself
public static boolean isLegal(String str)
str
- string containing the code points to examine
public static VersionInfo getUnicodeVersion()
public static String getName(int ch)
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note: calling any method related to code point names, e.g. getName(), incurs a one-time initialization cost to construct the name tables.
ch
- the code point for which to get the name
public static String getName(String s, String separator)
s
- string to format
separator
- string to go between names
@Deprecated public static String getName1_0(int ch)
ch
- the code point for which to get the name
public static String getExtendedName(int ch)
The names are returned in the following order.
Note: calling any method related to code point names, e.g. getName(), incurs a one-time initialization cost to construct the name tables.
ch
- the code point for which to get the name
public static String getNameAlias(int ch)
Note: calling any method related to code point names, e.g. getName(), incurs a one-time initialization cost to construct the name tables.
ch
- the code point for which to get the name alias
@Deprecated public static String getISOComment(int ch)
ch
- The code point for which to get the ISO comment.
It must be the case that 0 <= ch <= 0x10ffff.
public static int getCharFromName(String name)
Finds a Unicode code point by its most current Unicode name and
returns its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. getName(),
incurs a one-time initialization cost to construct the name tables.
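A name lookup and its inverse can be sketched with the JDK counterparts Character.getName(int) and Character.codePointOf(String) (the latter assumes Java 9 or later). This illustrates the round trip only. The class and method names are hypothetical, and the ICU name tables may cover a newer Unicode version than the JDK in use.

```java
public class NameLookupSketch {
    // Code point to its current Unicode name (null if unassigned).
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    // Unicode name back to its code point; throws IllegalArgumentException
    // if the name is not recognized.
    static int codePointFor(String name) {
        return Character.codePointOf(name);
    }

    public static void main(String[] args) {
        String name = nameOf(0x0041);
        System.out.println(name);                                    // prints "LATIN CAPITAL LETTER A"
        System.out.println(Integer.toHexString(codePointFor(name))); // prints "41"
    }
}
```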
name
- most current Unicode character name whose code point is to
be returned
@Deprecated public static int getCharFromName1_0(String name)
Used to find a Unicode character by its version 1.0 Unicode name and return its code point value.
name
- Unicode 1.0 code point name whose code point is to be
returned
See also: getName1_0(int)
public static int getCharFromExtendedName(String name)
Finds a Unicode character by its name or extended name and returns its code point value. All Unicode names are in uppercase. Extended names are all lowercase except for numbers and are contained within angle brackets. The names are searched in the following order
Note: calling any method related to code point names, e.g. getName(), incurs a one-time initialization cost to construct the name tables.
name
- code point name
public static int getCharFromNameAlias(String name)
Finds a Unicode character by its corrected name alias and returns
its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. getName(),
incurs a one-time initialization cost to construct the name tables.
name
- Unicode name alias whose code point is to be returned
public static String getPropertyName(int property, int nameChoice)
property
- UProperty selector.
nameChoice
- UProperty.NameChoice selector for which name
to get. All properties have a long name. Most have a short
name, but some do not. Unicode allows for additional names; if
present these will be returned by UProperty.NameChoice.LONG + i,
where i=1, 2,...
IllegalArgumentException
- thrown if property or
nameChoice are invalid.
See also: UProperty, UProperty.NameChoice
public static int getPropertyEnum(CharSequence propertyAlias)
propertyAlias
- the property name to be matched. The name
is compared using "loose matching" as described in
PropertyAliases.txt.
IllegalArgumentException
- thrown if propertyAlias
is not recognized.
See also: UProperty
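As a rough illustration of what "loose matching" of property names means, the sketch below assumes the rule amounts to ignoring case, whitespace, underscores and hyphens before comparing. This is a simplified reading, not the ICU implementation, and the class and helper names are hypothetical; the UCD files remain the authoritative description.

```java
public class LooseMatchSketch {
    // Reduce an alias to a comparison key: drop '_', '-' and whitespace,
    // and lowercase everything else (assumed simplification of the rule).
    static String normalize(String alias) {
        StringBuilder key = new StringBuilder(alias.length());
        for (int i = 0; i < alias.length(); i++) {
            char c = alias.charAt(i);
            if (c == '_' || c == '-' || Character.isWhitespace(c)) {
                continue;
            }
            key.append(Character.toLowerCase(c));
        }
        return key.toString();
    }

    // Two aliases match loosely when their keys are identical.
    static boolean looseEquals(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(looseEquals("General_Category", "general category")); // prints "true"
        System.out.println(looseEquals("General_Category", "Script"));           // prints "false"
    }
}
```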
public static String getPropertyValueName(int property, int value, int nameChoice)
property
- UProperty selector constant.
UProperty.INT_START <= property < UProperty.INT_LIMIT or
UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or
UProperty.MASK_START <= property < UProperty.MASK_LIMIT.
If out of range, null is returned.
value
- selector for a value for the given property. In
general, valid values range from 0 up to some maximum. There
are a few exceptions: (1.) UProperty.BLOCK values begin at the
non-zero value BASIC_LATIN.getID(). (2.)
UProperty.CANONICAL_COMBINING_CLASS values are not contiguous
and range from 0..240. (3.) UProperty.GENERAL_CATEGORY_MASK values
are mask values produced by left-shifting 1 by
UCharacter.getType(). This allows grouped categories such as
[:L:] to be represented. Mask values are non-contiguous.
nameChoice
- UProperty.NameChoice selector for which name
to get. All values have a long name. Most have a short name,
but some do not. Unicode allows for additional names; if
present these will be returned by UProperty.NameChoice.LONG + i,
where i=1, 2,...
IllegalArgumentException
- thrown if property, value,
or nameChoice are invalid.
See also: UProperty, UProperty.NameChoice
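The mask values described for UProperty.GENERAL_CATEGORY_MASK, produced by left-shifting 1 by the character type, can be sketched with the JDK's Character.getType(int) and its category constants. This is an illustration only: the class, method and constant names below are hypothetical, and the JDK constants stand in for the ICU category values, which are not guaranteed to be numerically identical.

```java
public class CategoryMaskSketch {
    // Grouped category "L": the union of the five letter categories.
    static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER)
          | (1 << Character.LOWERCASE_LETTER)
          | (1 << Character.TITLECASE_LETTER)
          | (1 << Character.MODIFIER_LETTER)
          | (1 << Character.OTHER_LETTER);

    // Single-bit mask for the general category of one code point.
    static int categoryMask(int codePoint) {
        return 1 << Character.getType(codePoint);
    }

    // A code point belongs to the group when its bit is set in the group mask.
    static boolean isInLetterGroup(int codePoint) {
        return (categoryMask(codePoint) & LETTER_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isInLetterGroup('A'));    // prints "true"
        System.out.println(isInLetterGroup(0x4E2D)); // prints "true" (a CJK ideograph)
        System.out.println(isInLetterGroup('7'));    // prints "false"
    }
}
```

Because each category occupies its own bit, a grouped category is just the bitwise OR of its members, which is also why mask values are non-contiguous.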
public static int getPropertyValueEnum(int property, CharSequence valueAlias)
property
- UProperty selector constant.
UProperty.INT_START <= property < UProperty.INT_LIMIT or
UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or
UProperty.MASK_START <= property < UProperty.MASK_LIMIT.
Only these properties can be enumerated.
valueAlias
- the value name to be matched. The name is
compared using "loose matching" as described in
PropertyValueAliases.txt.
IllegalArgumentException
- if property is not a valid UProperty
selector or valueAlias is not a value of this property
See also: UProperty
@Deprecated public static int getPropertyValueEnumNoThrow(int property, CharSequence valueAlias)
Same as getPropertyValueEnum(int, CharSequence), except that it does not throw an exception. Instead, it returns UProperty.UNDEFINED.
property
- Same as getPropertyValueEnum(int, CharSequence)
valueAlias
- Same as getPropertyValueEnum(int, CharSequence)
public static int getCodePoint(int lead, int trail)
lead
- the lead unit
(In ICU 2.1-69 the type of both parameters was char.)
trail
- the trail unit
IllegalArgumentException
- thrown when the code units do
not form a valid surrogate pair
See also: toCodePoint(int, int)
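The checked composition of a lead and trail unit can be sketched as follows. This is a minimal illustration built on the JDK's Character.isSurrogatePair(char, char) together with the standard UTF-16 arithmetic. The class and method names are hypothetical and the code is not the ICU implementation.

```java
public class SurrogatePairSketch {
    // Combine two UTF-16 code units into one supplementary code point,
    // rejecting anything that is not a lead unit followed by a trail unit.
    static int codePointFromPair(char lead, char trail) {
        if (!Character.isSurrogatePair(lead, trail)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Each unit contributes 10 bits on top of the 0x10000 offset;
        // this equals Character.toCodePoint(lead, trail).
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = codePointFromPair('\uD83D', '\uDE00');
        System.out.println(Integer.toHexString(cp)); // prints "1f600"
    }
}
```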
public static int getCodePoint(char char16)
char16
- the BMP code point
IllegalArgumentException
- thrown when char16 is not a valid
code point
public static String toUpperCase(String str)
str
- source string to be converted
public static String toLowerCase(String str)
str
- source string to be converted
public static String toTitleCase(String str, BreakIterator breakiter)
Returns the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only positions returned by the break iterator are titlecased; characters between those positions are all lowercased.
Casing is context-sensitive and depends on the default locale.
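The interplay between break positions and casing can be sketched with the JDK's java.text.BreakIterator. The code below is a deliberately simplified illustration, not the ICU algorithm: it titlecases the first code point at each boundary with the simple mapping Character.toTitleCase(int) and lowercases the rest of the segment, and it uses only forward iteration. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class TitleCaseSketch {
    // Titlecase the code point at each break position and lowercase
    // everything between that position and the next one.
    static String titleCase(String s, BreakIterator iter) {
        iter.setText(s);
        StringBuilder out = new StringBuilder(s.length());
        int start = iter.first();
        for (int end = iter.next(); end != BreakIterator.DONE; start = end, end = iter.next()) {
            int first = s.codePointAt(start);
            out.appendCodePoint(Character.toTitleCase(first));
            int restStart = start + Character.charCount(first);
            out.append(s.substring(restStart, end).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(titleCase("hello WORLD", words)); // prints "Hello World"
    }
}
```

Swapping in a sentence iterator (BreakIterator.getSentenceInstance) would capitalize only the start of each sentence, which is the kind of customization the break iterator parameter allows.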
str
- source string to be converted
breakiter
- break iterator to determine the positions at which
the characters should be titlecased.
public static String toUpperCase(Locale locale, String str)
locale
- locale in which the string is to be converted
str
- source string to be converted
public static String toUpperCase(ULocale locale, String str)
locale
- locale in which the string is to be converted
str
- source string to be converted
public static String toLowerCase(Locale locale, String str)
locale
- locale in which the string is to be converted
str
- source string to be converted
public static String toLowerCase(ULocale locale, String str)
locale
- locale in which the string is to be converted
str
- source string to be converted
public static String toTitleCase(Locale locale, String str, BreakIterator breakiter)
Returns the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only positions returned by the break iterator are titlecased; characters between those positions are all lowercased.
Casing is context-sensitive and depends on the argument locale.
locale
- locale in which the string is to be converted
str
- source string to be converted
breakiter
- break iterator to determine the positions at which
the characters should be titlecased.
public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter)
Returns the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only positions returned by the break iterator are titlecased; characters between those positions are all lowercased.
Casing is context-sensitive and depends on the argument locale.
locale
- locale in which the string is to be converted
str
- source string to be converted
titleIter
- break iterator to determine the positions at which
the characters should be titlecased.
public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter, int options)
Returns the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only positions returned by the break iterator are titlecased; characters between those positions are all lowercased.
Casing is context-sensitive and depends on the argument locale.
locale
- locale in which the string is to be converted
str
- source string to be converted
titleIter
- break iterator to determine the positions at which
the characters should be titlecased.
options
- bit set to modify the titlecasing operation
See also: TITLECASE_NO_LOWERCASE, TITLECASE_NO_BREAK_ADJUSTMENT
public static String toTitleCase(Locale locale, String str, BreakIterator titleIter, int options)
Returns the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only positions returned by the break iterator are titlecased; characters between those positions are all lowercased.
Casing is context-sensitive and depends on the argument locale.
locale
- locale in which the string is to be converted
str
- source string to be converted
titleIter
- break iterator to determine the positions at which
the characters should be titlecased.
options
- bit set to modify the titlecasing operation
See also: TITLECASE_NO_LOWERCASE, TITLECASE_NO_BREAK_ADJUSTMENT
public static int foldCase(int ch, boolean defaultmapping)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: https://unicode-org.github.io/icu/userguide/icu/posix#case-mappings
ch
- the character to be converted
defaultmapping
- Indicates whether the default mappings defined in
CaseFolding.txt are to be used, otherwise the
mappings for dotted I and dotless i marked with
'T' in CaseFolding.txt are included.
See also: foldCase(String, boolean)
public static String foldCase(String str, boolean defaultmapping)
str
- the String to be converted
defaultmapping
- Indicates whether the default mappings defined in
CaseFolding.txt are to be used, otherwise the
mappings for dotted I and dotless i marked with
'T' in CaseFolding.txt are included.
See also: foldCase(int, boolean)
public static int foldCase(int ch, int options)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: https://unicode-org.github.io/icu/userguide/icu/posix#case-mappings
ch
- the character to be converted
options
- A bit set for special processing. Currently the recognised options
are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
See also: foldCase(String, boolean)
public static final String foldCase(String str, int options)
str
- the String to be converted
options
- A bit set for special processing. Currently the recognised options
are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
See also: foldCase(int, boolean)
public static int getHanNumericValue(int ch)
This returns the value of Han 'numeric' code points, including those for zero, ten, hundred, thousand, ten thousand, and hundred million. This includes both the standard and 'checkwriting' characters, the 'big circle' zero character, and the standard zero character.
Note: The Unicode Standard has numeric values for more
Han characters than are recognized by this method
(see getNumericValue(int)
and the UCD file DerivedNumericValues.txt),
and a NumberFormat can be used with
a Chinese NumberingSystem.
ch
- code point to query
public static RangeValueIterator getTypeIterator()
Returns an iterator for character types, iterating over codepoints.
Example of use:
RangeValueIterator iterator = UCharacter.getTypeIterator();
RangeValueIterator.Element element = new RangeValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" +
                       Integer.toHexString(element.start) +
                       " to codepoint \\u" +
                       Integer.toHexString(element.limit - 1) +
                       " has the character type " + element.value);
}
public static ValueIterator getNameIterator()
Returns an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date Unicode names. For older 1.0 Unicode names use getName1_0Iterator() or for extended names use getExtendedNameIterator().
Example of use:
ValueIterator iterator = UCharacter.getNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" +
                       Integer.toHexString(element.codepoint) +
                       " has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
@Deprecated public static ValueIterator getName1_0Iterator()
Used to return an iterator for the older 1.0 Unicode character names, iterating over codepoints.
See also: getName1_0(int)
public static ValueIterator getExtendedNameIterator()
Returns an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names. For modern, most up-to-date Unicode names use getNameIterator() or for older 1.0 Unicode names use getName1_0Iterator().
Example of use:
ValueIterator iterator = UCharacter.getExtendedNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" +
                       Integer.toHexString(element.codepoint) +
                       " has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from
public static VersionInfo getAge(int ch)
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
ch
- The code point.
public static boolean hasBinaryProperty(int ch, int property)
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are either unavailable or only partly available.
ch
- code point to test.
property
- selector constant from com.ibm.icu.lang.UProperty,
identifies which binary property to check.
See also: UProperty, CharacterProperties.getBinaryPropertySet(int)
public static boolean hasBinaryProperty(CharSequence s, int property)
Same as hasBinaryProperty(int, int) if the string contains exactly one code point.
Most properties apply only to single code points. UTS #51 Unicode Emoji defines several properties of strings.
s
- String to test.
property
- UProperty selector constant, identifies which binary property to check.
Must be BINARY_START <= property < BINARY_LIMIT.
property is out of bounds or if the Unicode version
does not have data for the property at all.
See also: UProperty, CharacterProperties.getBinaryPropertySet(int)
public static boolean isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).
Different from UCharacter.isLetter(ch)!
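The distinction between the Alphabetic property and the letter categories can be illustrated with the JDK counterparts Character.isAlphabetic(int) and Character.isLetter(int). This is a sketch under the assumption that the JDK methods behave like their ICU namesakes for this example; the class and method names are hypothetical.

```java
public class AlphabeticVsLetter {
    // True when a code point has the Alphabetic property
    // without belonging to a letter category.
    static boolean alphabeticButNotLetter(int codePoint) {
        return Character.isAlphabetic(codePoint) && !Character.isLetter(codePoint);
    }

    public static void main(String[] args) {
        int romanNumeralOne = 0x2160; // general category Nl (letter number)
        System.out.println(Character.isAlphabetic(romanNumeralOne)); // prints "true"
        System.out.println(Character.isLetter(romanNumeralOne));     // prints "false"
        System.out.println(alphabeticButNotLetter('A'));             // prints "false"
    }
}
```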
ch
- codepoint to be tested
public static boolean isULowercase(int ch)
Check if a code point has the Lowercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).
This is different from UCharacter.isLowerCase(ch)!
ch
- codepoint to be tested
public static boolean isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).
This is different from UCharacter.isUpperCase(ch)!
ch
- codepoint to be tested
public static boolean isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).
This is different from both UCharacter.isSpace(ch) and UCharacter.isWhitespace(ch)!
ch
- codepoint to be tested
public static int getIntPropertyValue(int ch, int type)
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
boolean b = (ideo == 1);
ch
- code point to test.
type
- UProperty selector constant, identifies which
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT or
UProperty.MASK_START <= type < UProperty.MASK_LIMIT.
(UCharacterEnums.ECharacterCategory, UCharacterEnums.ECharacterDirection, UCharacter.DecompositionType, etc.).
Returns 0 or 1 (for false / true) for binary Unicode properties.
Returns a bit-mask for mask properties.
Returns 0 if 'type' is out of bounds or if the Unicode version
does not have data for the property at all, or not for this code
point.
See also: UProperty, hasBinaryProperty(int, int), getIntPropertyMinValue(int), getIntPropertyMaxValue(int), CharacterProperties.getIntPropertyMap(int), getUnicodeVersion()
@Deprecated public static String getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice)
propertyEnum
- The property enum value.
codepoint
- The codepoint value.
nameChoice
- The choice of the name.
public static int getIntPropertyMinValue(int type)
type
- UProperty selector constant, identifies which binary
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT.
See also: UProperty, hasBinaryProperty(int, int), getUnicodeVersion(), getIntPropertyMaxValue(int), getIntPropertyValue(int, int)
public static int getIntPropertyMaxValue(int type)
type
- UProperty selector constant, identifies which binary
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT.
See also: UProperty, hasBinaryProperty(int, int), getUnicodeVersion(), getIntPropertyMaxValue(int), getIntPropertyValue(int, int)
public static char forDigit(int digit, int radix)
public static final boolean isValidCodePoint(int cp)
Same as Character.isValidCodePoint(int).
cp
- the code point to check
public static final boolean isSupplementaryCodePoint(int cp)
cp
- the code point to check
public static boolean isHighSurrogate(int codePoint)
Same as Character.isHighSurrogate(char), except that the ICU version accepts int for code points.
codePoint
- the code point to check
(In ICU 3.0-69 the type of this parameter was char.)
public static boolean isLowSurrogate(int codePoint)
Same as Character.isLowSurrogate(char), except that the ICU version accepts int for code points.
codePoint
- the code point to check
(In ICU 3.0-69 the type of this parameter was char.)
public static final boolean isSurrogatePair(int high, int low)
Same as Character.isSurrogatePair(char, char), except that the ICU version accepts int for code points.
high
- the high (lead) unit
(In ICU 3.0-69 the type of both parameters was char.)
low
- the low (trail) unit
public static int charCount(int cp)
Same as Character.charCount(int).
Returns the number of chars needed to represent the code point (1 or 2).
This does not check the code point for validity.
cp
- the code point to check
public static final int toCodePoint(int high, int low)
Same as Character.toCodePoint(char, char), except that the ICU version accepts int for code points.
Returns the code point represented by the two surrogate code units.
This does not check the surrogate pair for validity.
high
- the high (lead) surrogate
(In ICU 3.0-69 the type of both parameters was char.)
low
- the low (trail) surrogate
See also: getCodePoint(int, int)
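How charCount, toChars and toCodePoint fit together can be sketched as a round trip through UTF-16. The example relies on the java.lang.Character methods that the text names as equivalents; the class and method names are hypothetical.

```java
public class CodePointRoundTrip {
    // Encode a code point as UTF-16 code units and decode it again.
    static int roundTrip(int codePoint) {
        char[] units = Character.toChars(codePoint); // 1 or 2 chars
        if (units.length != Character.charCount(codePoint)) {
            throw new AssertionError("charCount disagrees with toChars");
        }
        return units.length == 1
                ? units[0]
                : Character.toCodePoint(units[0], units[1]);
    }

    public static void main(String[] args) {
        System.out.println(Character.charCount('A'));              // prints "1"
        System.out.println(Character.charCount(0x1F600));          // prints "2"
        System.out.println(Integer.toHexString(roundTrip(0x1F600))); // prints "1f600"
    }
}
```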
public static final int codePointAt(CharSequence seq, int index)
Same as Character.codePointAt(CharSequence, int).
Returns the code point at index.
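A common use of codePointAt is walking a string one code point at a time, advancing the index by charCount. The sketch below uses the java.lang.Character equivalents named in the text; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointIteration {
    // Collect the code points of a sequence, stepping over surrogate pairs.
    static List<Integer> codePoints(CharSequence seq) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < seq.length(); ) {
            int cp = Character.codePointAt(seq, i);
            result.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return result;
    }

    public static void main(String[] args) {
        // Four chars, but only three code points.
        for (int cp : codePoints("a\uD83D\uDE00b")) {
            System.out.println(Integer.toHexString(cp)); // prints "61", "1f600", "62"
        }
    }
}
```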
This examines only the characters at index and index+1.
seq
- the characters to check
index
- the index of the first or only char forming the code point
public static final int codePointAt(char[] text, int index)
Same as Character.codePointAt(char[], int).
Returns the code point at index.
This examines only the characters at index and index+1.
text
- the characters to check
index
- the index of the first or only char forming the code point
public static final int codePointAt(char[] text, int index, int limit)
Same as Character.codePointAt(char[], int, int).
Returns the code point at index.
This examines only the characters at index and index+1.
text
- the characters to check
index
- the index of the first or only char forming the code point
limit
- the limit of the valid text
public static final int codePointBefore(CharSequence seq, int index)
Same as Character.codePointBefore(CharSequence, int).
Returns the code point before index.
This examines only the characters at index-1 and index-2.
seq
- the characters to check
index
- the index after the last or only char forming the code point
public static final int codePointBefore(char[] text, int index)
Same as Character.codePointBefore(char[], int).
Returns the code point before index.
This examines only the characters at index-1 and index-2.
text
- the characters to check
index
- the index after the last or only char forming the code point
public static final int codePointBefore(char[] text, int index, int limit)
Same as Character.codePointBefore(char[], int, int).
Returns the code point before index.
This examines only the characters at index-1 and index-2.
text
- the characters to check
index
- the index after the last or only char forming the code point
limit
- the start of the valid text
public static final int toChars(int cp, char[] dst, int dstIndex)
Same as Character.toChars(int, char[], int).
Writes the chars representing the
code point into the destination at the given index.
cp
- the code point to convert
dst
- the destination array into which to put the char(s) representing the code point
dstIndex
- the index at which to put the first (or only) char
IllegalArgumentException
- if cp is not a valid code point
public static final char[] toChars(int cp)
Same as Character.toChars(int).
Returns a char array representing the code point.
cp
- the code point to convert
IllegalArgumentException
- if cp is not a valid code point
public static byte getDirectionality(int cp)
Equivalent to the Character.getDirectionality(char) method, for
convenience. Returns a byte representing the directionality of the
character.
[icu] Note: Unlike Character.getDirectionality(char), this returns
DIRECTIONALITY_LEFT_TO_RIGHT for undefined or out-of-bounds characters.
[icu] Note: The return value must be tested using the constants defined in UCharacterDirection
and its interface UCharacterEnums.ECharacterDirection since the values are different from the ones
defined by java.lang.Character.
cp
- the code point to check
See also: getDirection(int)
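Testing a directionality result against named constants rather than numeric literals can be sketched with the JDK method. Note the assumption: this example uses java.lang.Character.getDirectionality(int) and the java.lang.Character constants, whereas results from the ICU method must be compared with the ICU constants instead, as the note above states. The class and method names are hypothetical.

```java
public class DirectionalitySketch {
    // Map a few JDK directionality values to their Unicode bidi class labels.
    static String bidiClass(int codePoint) {
        byte dir = Character.getDirectionality(codePoint);
        switch (dir) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:
                return "L";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:
                return "R";
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:
                return "AL";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
                return "EN";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(bidiClass('A'));    // prints "L"
        System.out.println(bidiClass(0x05D0)); // prints "R"  (HEBREW LETTER ALEF)
        System.out.println(bidiClass(0x0627)); // prints "AL" (ARABIC LETTER ALEF)
        System.out.println(bidiClass('1'));    // prints "EN"
    }
}
```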
public static int codePointCount(CharSequence text, int start, int limit)
Equivalent to the Character.codePointCount(CharSequence, int, int)
method, for convenience. Counts the number of code points in the range
of text.
text
- the characters to check
start
- the start of the range
limit
- the limit of the range
public static int codePointCount(char[] text, int start, int limit)
Equivalent to the Character.codePointCount(char[], int, int)
method, for
convenience. Counts the number of code points in the range of text.
text
- the characters to check
start
- the start of the range
limit
- the limit of the range
public static int offsetByCodePoints(CharSequence text, int index, int codePointOffset)
Equivalent to the Character.offsetByCodePoints(CharSequence, int, int)
method, for convenience. Adjusts the char index by a code point offset.
text
- the characters to check
index
- the index to adjust
codePointOffset
- the number of code points by which to offset the index
public static int offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset)
Equivalent to the Character.offsetByCodePoints(char[], int, int, int, int)
method, for convenience. Adjusts the char index by a code point offset.
text
- the characters to check
start
- the start of the range to check
count
- the length of the range to check
index
- the index to adjust
codePointOffset
- the number of code points by which to offset the index
Copyright © 2016 Unicode, Inc. and others.