public final class UTF16 extends Object
Standalone utility class providing UTF16 character conversions and indexing conversions.
Code that uses strings alone rarely needs modification. By design, UTF-16 does not allow overlap,
so searching for strings is a safe operation. Similarly, concatenation is always safe.
Substringing is safe if the start and end are both on UTF-32 boundaries. In normal code, the
values for start and end are on those boundaries, since they arose from operations like
searching. If not, the nearest UTF-32 boundaries can be determined using bounds().
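As a rough illustration of what "on a UTF-32 boundary" means for substringing, the sketch below moves an arbitrary UTF-16 offset back to the start of the code point that contains it. It deliberately uses only JDK methods (`Character.isLowSurrogate`, `Character.isHighSurrogate`) instead of bounds(), so it runs without ICU4J on the classpath; the class and method names are hypothetical.

```java
public class BoundarySnap {
    /**
     * Hypothetical helper: if offset16 points at the trail half of a
     * surrogate pair, step back one char so the offset lands on the lead.
     */
    public static int snapToCodePointStart(String s, int offset16) {
        if (offset16 > 0 && offset16 < s.length()
                && Character.isLowSurrogate(s.charAt(offset16))
                && Character.isHighSurrogate(s.charAt(offset16 - 1))) {
            return offset16 - 1;
        }
        return offset16;
    }

    public static void main(String[] args) {
        String s = "ab\uD800\uDC00cd"; // U+10000 occupies indices 2 and 3
        int start = snapToCodePointStart(s, 3);
        // Substringing from the snapped offset keeps the pair intact.
        System.out.println(start);
        System.out.println(s.substring(start).codePointAt(0) == 0x10000);
    }
}
```

Offsets that already sit on a boundary are returned unchanged, so the helper can be applied unconditionally before calling substring.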
The following examples illustrate use of some of these methods.
    // iteration forwards: Original
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration forwards: Changes for UTF-32
    int ch;
    for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

    // iteration backwards: Original
    for (int i = s.length() - 1; i >= 0; --i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration backwards: Changes for UTF-32
    // (the condition is i >= 0 so that the code point at index 0 is visited)
    int ch;
    for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

Notes:
Naming: For clarity, High and Low surrogates are called Lead and Trail in the API, which gives a
better sense of their ordering in a string. offset16 and offset32 are used to distinguish offsets
to UTF-16 boundaries vs offsets to UTF-32 boundaries. int char32 is used to contain UTF-32
characters, as opposed to char16, which is a UTF-16 code unit.
Roundtripping Offsets: A UTF-16 offset can be roundtripped to a UTF-32 offset and back if and
only if bounds(string, offset16) != TRAIL.
Exceptions: UCharacter.isLegal() can be used to check for validity if desired.

Modifier and Type | Class and Description |
---|---|
static class | UTF16.StringComparator: UTF16 string comparator class. |
Modifier and Type | Field and Description |
---|---|
static int | CODEPOINT_MAX_VALUE: The highest Unicode code point value (scalar value) according to the Unicode Standard. |
static int | CODEPOINT_MIN_VALUE: The lowest Unicode code point value. |
static int | LEAD_SURROGATE_BOUNDARY: Value returned in bounds(). |
static int | LEAD_SURROGATE_MAX_VALUE: Lead surrogate maximum value. |
static int | LEAD_SURROGATE_MIN_VALUE: Lead surrogate minimum value. |
static int | SINGLE_CHAR_BOUNDARY: Value returned in bounds(). |
static int | SUPPLEMENTARY_MIN_VALUE: The minimum value for supplementary code points. |
static int | SURROGATE_MAX_VALUE: Maximum surrogate value. |
static int | SURROGATE_MIN_VALUE: Surrogate minimum value. |
static int | TRAIL_SURROGATE_BOUNDARY: Value returned in bounds(). |
static int | TRAIL_SURROGATE_MAX_VALUE: Trail surrogate maximum value. |
static int | TRAIL_SURROGATE_MIN_VALUE: Trail surrogate minimum value. |
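The surrogate and supplementary limits listed above are what the lead/trail arithmetic is built on. The sketch below works through that arithmetic with local constants whose values are the ones fixed by the Unicode Standard (0x10000, 0xD800, 0xDC00); the constants are redeclared locally and the class name is hypothetical, so the block runs with the JDK alone and is not taken from the ICU4J implementation.

```java
public class SurrogateMath {
    // Values fixed by the Unicode Standard; declared locally and named
    // after the corresponding fields in the table above.
    static final int SUPPLEMENTARY_MIN_VALUE = 0x10000;
    static final int LEAD_SURROGATE_MIN_VALUE = 0xD800;
    static final int TRAIL_SURROGATE_MIN_VALUE = 0xDC00;

    /** Lead surrogate: the top 10 bits of (char32 - 0x10000). */
    public static char lead(int char32) {
        return (char) (LEAD_SURROGATE_MIN_VALUE
                + ((char32 - SUPPLEMENTARY_MIN_VALUE) >> 10));
    }

    /** Trail surrogate: the bottom 10 bits of (char32 - 0x10000). */
    public static char trail(int char32) {
        return (char) (TRAIL_SURROGATE_MIN_VALUE
                + ((char32 - SUPPLEMENTARY_MIN_VALUE) & 0x3FF));
    }

    /** Inverse operation: rebuild the supplementary code point from a pair. */
    public static int combine(char lead, char trail) {
        return SUPPLEMENTARY_MIN_VALUE
                + ((lead - LEAD_SURROGATE_MIN_VALUE) << 10)
                + (trail - TRAIL_SURROGATE_MIN_VALUE);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        // Cross-check against the JDK's own surrogate helpers.
        System.out.println(lead(cp) == Character.highSurrogate(cp));
        System.out.println(trail(cp) == Character.lowSurrogate(cp));
        System.out.println(combine(lead(cp), trail(cp)) == cp);
    }
}
```

The helpers assume char32 is a supplementary code point; for values below 0x10000 no surrogate pair exists and the arithmetic is not meaningful.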
Modifier and Type | Method and Description |
---|---|
static int | append(char[] target, int limit, int char32): Adds a codepoint to offset16 position of the argument char array. |
static StringBuffer | append(StringBuffer target, int char32): Append a single UTF-32 value to the end of a StringBuffer. |
static StringBuffer | appendCodePoint(StringBuffer target, int cp): Cover JDK 1.5 APIs. |
static int | bounds(char[] source, int start, int limit, int offset16): Returns the type of the boundaries around the char at offset16. |
static int | bounds(StringBuffer source, int offset16): Returns the type of the boundaries around the char at offset16. |
static int | bounds(String source, int offset16): Returns the type of the boundaries around the char at offset16. |
static int | charAt(char[] source, int start, int limit, int offset16): Extract a single UTF-32 value from a substring. |
static int | charAt(CharSequence source, int offset16): Extract a single UTF-32 value from a string. |
static int | charAt(Replaceable source, int offset16): Extract a single UTF-32 value from a string. |
static int | charAt(StringBuffer source, int offset16): Extract a single UTF-32 value from a string. |
static int | charAt(String source, int offset16): Extract a single UTF-32 value from a string. |
static int | compareCodePoint(int codePoint, CharSequence s): Utility for comparing a code point to a string without having to create a new string. |
static int | countCodePoint(char[] source, int start, int limit): Number of codepoints in a UTF16 char array substring. |
static int | countCodePoint(String source): Number of codepoints in a UTF16 String. |
static int | countCodePoint(StringBuffer source): Number of codepoints in a UTF16 String buffer. |
static int | delete(char[] target, int limit, int offset16): Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is non-supplementary, 2 otherwise). |
static StringBuffer | delete(StringBuffer target, int offset16): Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is non-supplementary, 2 otherwise). |
static int | findCodePointOffset(char[] source, int start, int limit, int offset16): Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset. |
static int | findCodePointOffset(StringBuffer source, int offset16): Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset. |
static int | findCodePointOffset(String source, int offset16): Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset. |
static int | findOffsetFromCodePoint(char[] source, int start, int limit, int offset32): Returns the UTF-16 offset that corresponds to a UTF-32 offset. |
static int | findOffsetFromCodePoint(StringBuffer source, int offset32): Returns the UTF-16 offset that corresponds to a UTF-32 offset. |
static int | findOffsetFromCodePoint(String source, int offset32): Returns the UTF-16 offset that corresponds to a UTF-32 offset. |
static int | getCharCount(int char32): Determines how many chars this char32 requires. |
static char | getLeadSurrogate(int char32): Returns the lead surrogate. |
static int | getSingleCodePoint(CharSequence s): Utility for getting a code point from a CharSequence that contains exactly one code point. |
static char | getTrailSurrogate(int char32): Returns the trail surrogate. |
static boolean | hasMoreCodePointsThan(char[] source, int start, int limit, int number): Check if the sub-range of char array, from argument start to limit, contains more Unicode code points than a certain number. |
static boolean | hasMoreCodePointsThan(StringBuffer source, int number): Check if the string buffer contains more Unicode code points than a certain number. |
static boolean | hasMoreCodePointsThan(String source, int number): Check if the string contains more Unicode code points than a certain number. |
static int | indexOf(String source, int char32): Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. |
static int | indexOf(String source, int char32, int fromIndex): Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. |
static int | indexOf(String source, String str): Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str. |
static int | indexOf(String source, String str, int fromIndex): Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str. |
static int | insert(char[] target, int limit, int offset16, int char32): Inserts char32 codepoint into target at the argument offset16. |
static StringBuffer | insert(StringBuffer target, int offset16, int char32): Inserts char32 codepoint into target at the argument offset16. |
static boolean | isLeadSurrogate(int codePoint): Determines whether the code point is a lead surrogate. |
static boolean | isSurrogate(int codePoint): Determines whether the code point is a surrogate. |
static boolean | isTrailSurrogate(int codePoint): Determines whether the code point is a trail surrogate. |
static int | lastIndexOf(String source, int char32): Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint. |
static int | lastIndexOf(String source, int char32, int fromIndex): Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex. |
static int | lastIndexOf(String source, String str): Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str. |
static int | lastIndexOf(String source, String str, int fromIndex): Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex. |
static int | moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32): Shifts offset16 by the argument number of codepoints within a subarray. |
static int | moveCodePointOffset(StringBuffer source, int offset16, int shift32): Shifts offset16 by the argument number of codepoints. |
static int | moveCodePointOffset(String source, int offset16, int shift32): Shifts offset16 by the argument number of codepoints. |
static String | newString(int[] codePoints, int offset, int count): Cover JDK 1.5 API. |
static String | replace(String source, int oldChar32, int newChar32): Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldChar32 in source with newChar32. |
static String | replace(String source, String oldStr, String newStr): Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr in source with newStr. |
static StringBuffer | reverse(StringBuffer source): Reverses a UTF16 format Unicode string and replaces source's content with it. |
static int | setCharAt(char[] target, int limit, int offset16, int char32): Set a code point into a UTF16 position in a char array. |
static void | setCharAt(StringBuffer target, int offset16, int char32): Set a code point into a UTF16 position. |
static String | valueOf(char[] source, int start, int limit, int offset16): Convenience method. |
static String | valueOf(int char32): Convenience method corresponding to String.valueOf(char). |
static String | valueOf(StringBuffer source, int offset16): Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16). |
static String | valueOf(String source, int offset16): Convenience method corresponding to String.valueOf(codepoint at offset16). |
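To make the offset16/offset32 conversions summarized above more concrete, the following sketch performs the same kind of conversion with plain JDK calls (`String.codePointCount` and `String.offsetByCodePoints`). These are stand-ins chosen so the block runs without ICU4J; they are not claimed to match findCodePointOffset and findOffsetFromCodePoint in every edge case, and the class and method names are hypothetical.

```java
public class OffsetConversion {
    /** UTF-16 offset -> code point offset (JDK stand-in for the conversion). */
    public static int toOffset32(String s, int offset16) {
        return s.codePointCount(0, offset16);
    }

    /** Code point offset -> UTF-16 offset (JDK stand-in for the conversion). */
    public static int toOffset16(String s, int offset32) {
        return s.offsetByCodePoints(0, offset32);
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 'a', U+10000 (two chars), 'b'

        // An offset on a code point boundary survives the round trip.
        int there = toOffset32(s, 3);
        System.out.println(toOffset16(s, there) == 3);

        // An offset pointing at the trail surrogate (index 2) does not.
        int viaTrail = toOffset32(s, 2);
        System.out.println(toOffset16(s, viaTrail) == 2);
    }
}
```

The second check comes out false, which is the roundtripping caveat from the class notes: a UTF-16 offset that points at a trail surrogate cannot be recovered from its UTF-32 offset.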
public static final int SINGLE_CHAR_BOUNDARY
Value returned in bounds().
These values are chosen specifically so that they represent the position of the
character: [offset16 - (value >> 2), offset16 + (value & 3)]
public static final int LEAD_SURROGATE_BOUNDARY
Value returned in bounds().
These values are chosen specifically so that they represent the position of the
character: [offset16 - (value >> 2), offset16 + (value & 3)]
public static final int TRAIL_SURROGATE_BOUNDARY
Value returned in bounds().
These values are chosen specifically so that they represent the position of the
character: [offset16 - (value >> 2), offset16 + (value & 3)]
public static final int CODEPOINT_MIN_VALUE
public static final int CODEPOINT_MAX_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
public static final int LEAD_SURROGATE_MIN_VALUE
public static final int TRAIL_SURROGATE_MIN_VALUE
public static final int LEAD_SURROGATE_MAX_VALUE
public static final int TRAIL_SURROGATE_MAX_VALUE
public static final int SURROGATE_MIN_VALUE
public static final int SURROGATE_MAX_VALUE
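
The position formula given for the three boundary constants, [offset16 - (value >> 2), offset16 + (value & 3)], can be tried out directly. The numeric values used below (1, 2 and 5) are an assumption: they are not stated on this page, so check them against the constants in the ICU4J version in use. The class and method names are hypothetical.

```java
public class BoundsDecode {
    // ASSUMED values for the boundary constants; not stated in the text above.
    static final int SINGLE_CHAR_BOUNDARY = 1;
    static final int LEAD_SURROGATE_BOUNDARY = 2;
    static final int TRAIL_SURROGATE_BOUNDARY = 5;

    /** Applies the documented formula and returns {start, limit} of the character. */
    public static int[] span(int offset16, int value) {
        return new int[] { offset16 - (value >> 2), offset16 + (value & 3) };
    }

    public static void main(String[] args) {
        // For "a\uD800\uDC00b": index 1 is the lead, index 2 is the trail.
        int[] fromLead = span(1, LEAD_SURROGATE_BOUNDARY);
        int[] fromTrail = span(2, TRAIL_SURROGATE_BOUNDARY);
        int[] single = span(0, SINGLE_CHAR_BOUNDARY);
        System.out.println(fromLead[0] + ".." + fromLead[1]);
        System.out.println(fromTrail[0] + ".." + fromTrail[1]);
        System.out.println(single[0] + ".." + single[1]);
    }
}
```

Under these assumed values, the lead and trail offsets of the same pair decode to the same [start, limit) range, which is what makes the encoding convenient for snapping to a boundary.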
public static int charAt(String source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with
UTF16.getCharCount()), as well as random access. If a validity check is required, use
UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its
supplementary character will be returned. If a complete supplementary character is not found, the
incomplete character will be returned.
Parameters:
source - UTF-16 string
offset16 - UTF-16 offset to the start of the character.
Returns:
The UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - Thrown if offset16 is out of bounds.
public static int charAt(CharSequence source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with
UTF16.getCharCount()), as well as random access. If a validity check is required, use
UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its
supplementary character will be returned. If a complete supplementary character is not found, the
incomplete character will be returned.
Parameters:
source - UTF-16 char sequence
offset16 - UTF-16 offset to the start of the character.
Returns:
The UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - Thrown if offset16 is out of bounds.
public static int charAt(StringBuffer source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with
UTF16.getCharCount()), as well as random access. If a validity check is required, use
UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its
supplementary character will be returned. If a complete supplementary character is not found, the
incomplete character will be returned.
Parameters:
source - UTF-16 chars string buffer
offset16 - UTF-16 offset to the start of the character.
Returns:
The UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - Thrown if offset16 is out of bounds.
public static int charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with
UTF16.getCharCount()), as well as random access. If a validity check is required, use
UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its
supplementary character will be returned. If a complete supplementary character is not found, the
incomplete character will be returned.
Parameters:
source - Array of UTF-16 chars
start - Start offset of the substring in the source array to analyze
limit - Limit offset of the substring in the source array to analyze
offset16 - UTF-16 offset relative to start
Returns:
The UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - Thrown if offset16 is not within the range of start and limit.
public static int charAt(Replaceable source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with
UTF16.getCharCount()), as well as random access. If a validity check is required, use
UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its
supplementary character will be returned. If a complete supplementary character is not found, the
incomplete character will be returned.
Parameters:
source - UTF-16 Replaceable text
offset16 - UTF-16 offset to the start of the character.
Returns:
The UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
Throws:
IndexOutOfBoundsException - Thrown if offset16 is out of bounds.
public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is required, use isLegal()
on char32 before calling.
Parameters:
char32 - The input codepoint.
public static int bounds(String source, int offset16)
Returns the type of the boundaries around the char at offset16.
Parameters:
source - Text to analyse
offset16 - UTF-16 offset
Throws:
IndexOutOfBoundsException - If offset16 is out of bounds.
public static int bounds(StringBuffer source, int offset16)
Returns the type of the boundaries around the char at offset16.
Parameters:
source - String buffer to analyse
offset16 - UTF-16 offset
Throws:
IndexOutOfBoundsException - If offset16 is out of bounds.
public static int bounds(char[] source, int start, int limit, int offset16)
Returns the type of the boundaries around the char at offset16.
Parameters:
source - Char array to analyse
start - Start offset of the substring in the source array to analyze
limit - Limit offset of the substring in the source array to analyze
offset16 - UTF-16 offset relative to start
Throws:
IndexOutOfBoundsException - If offset16 is not within the range of start and limit.
public static boolean isSurrogate(int codePoint)
Determines whether the code point is a surrogate.
Parameters:
codePoint - The input character. (In ICU 2.1-69 the type of this parameter was char.)
public static boolean isTrailSurrogate(int codePoint)
Determines whether the code point is a trail surrogate.
Parameters:
codePoint - The input character. (In ICU 2.1-69 the type of this parameter was char.)
public static boolean isLeadSurrogate(int codePoint)
Determines whether the code point is a lead surrogate.
Parameters:
codePoint - The input character. (In ICU 2.1-69 the type of this parameter was char.)
public static char getLeadSurrogate(int char32)
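Returns the lead surrogate. If a validity check is required, use isLegal() on char32
before calling.
Parameters:
char32 - The input character.
public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use isLegal() on char32
before calling.
Parameters:
char32 - The input character.
public static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char). If a validity check is required, use
UCharacter.isLegal(int) on char32 before calling.
Parameters:
char32 - The input character.
Throws:
IllegalArgumentException - Thrown if char32 is an invalid codepoint.
public static String valueOf(String source, int offset16)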
Convenience method corresponding to String.valueOf(codepoint at offset16). If a validity check is
required, use UCharacter.isLegal(int) on the codepoint at offset16 before calling. The result
returned will be a newly created String obtained by calling source.substring(..) with the
appropriate indexes.
Parameters:
source - The input string.
offset16 - The UTF-16 index to the codepoint in source
public static String valueOf(StringBuffer source, int offset16)
Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16). If a validity
check is required, use UCharacter.isLegal(int) on the codepoint at offset16 before calling. The
result returned will be a newly created String obtained by calling source.substring(..) with the
appropriate indexes.
Parameters:
source - The input string buffer.
offset16 - The UTF-16 index to the codepoint in source
public static String valueOf(char[] source, int start, int limit, int offset16)
Convenience method. If a validity check is required, use UCharacter.isLegal(int) on the codepoint
at offset16 before calling. The result returned will be a newly created String containing the
relevant characters.
Parameters:
source - The input char array.
start - Start index of the subarray
limit - End index of the subarray
offset16 - The UTF-16 index to the codepoint in source relative to start
public static int findOffsetFromCodePoint(String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. See the class description for
notes on roundtripping.
Parameters:
source - The UTF-16 string
offset32 - UTF-32 offset
Throws:
IndexOutOfBoundsException - If offset32 is out of bounds.
public static int findOffsetFromCodePoint(StringBuffer source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. See the class description for
notes on roundtripping.
Parameters:
source - The UTF-16 string buffer
offset32 - UTF-32 offset
Throws:
IndexOutOfBoundsException - If offset32 is out of bounds.
public static int findOffsetFromCodePoint(char[] source, int start, int limit, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. See the class description for
notes on roundtripping.
Parameters:
source - The UTF-16 char array whose substring is to be analysed
start - Start offset of the substring to be analysed
limit - Limit offset of the substring to be analysed
offset32 - UTF-32 offset relative to start
Throws:
IndexOutOfBoundsException - If offset32 is out of bounds.
public static int findCodePointOffset(String source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given
UTF-16 offset. See the class description for notes on roundtripping.
To find the UTF-32 length of a string, use:
    len32 = countCodePoint(source);
Parameters:
source - Text to analyse
offset16 - UTF-16 offset < source text length.
Throws:
IndexOutOfBoundsException - If offset16 is out of bounds.
public static int findCodePointOffset(StringBuffer source, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16
offset. See the class description for notes on roundtripping.
To find the UTF-32 length of a string buffer, use:
    len32 = countCodePoint(source);
Parameters:
source - Text to analyse
offset16 - UTF-16 offset < source text length.
Throws:
IndexOutOfBoundsException - If offset16 is out of bounds.
public static int findCodePointOffset(char[] source, int start, int limit, int offset16)
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16
offset. See the class description for notes on roundtripping.
To find the UTF-32 length of a substring, use:
    len32 = countCodePoint(source, start, limit);
Parameters:
source - Text to analyse
start - Start offset of the substring
limit - Limit offset of the substring
offset16 - UTF-16 offset relative to start
Throws:
IndexOutOfBoundsException - If offset16 is not within the range of start and limit.
public static StringBuffer append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use
UCharacter.isLegal(int) on char32 before calling.
Parameters:
target - The buffer to append to
char32 - Value to append.
Throws:
IllegalArgumentException - Thrown when char32 does not lie within the range of the Unicode codepoints
public static StringBuffer appendCodePoint(StringBuffer target, int cp)
Cover JDK 1.5 APIs.
Parameters:
target - The buffer to append to
cp - The code point to append
Throws:
IllegalArgumentException - If cp is not a valid code point
public static int append(char[] target, int limit, int char32)
Adds a codepoint to offset16 position of the argument char array.
Parameters:
target - Char array to which the new code point is appended
limit - UTF-16 offset at which the codepoint will be appended.
char32 - Code point to be appended
Throws:
IllegalArgumentException - Thrown if there is not enough space for the append, or when char32 does not
lie within the range of the Unicode codepoints.
public static int countCodePoint(String source)
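Number of codepoints in a UTF16 String.
Parameters:
source - UTF-16 string
public static int countCodePoint(StringBuffer source)
Number of codepoints in a UTF16 String buffer.
Parameters:
source - UTF-16 string buffer
public static int countCodePoint(char[] source, int start, int limit)
Number of codepoints in a UTF16 char array substring.
Parameters:
source - UTF-16 char array
start - Start offset of the substring
limit - Limit offset of the substring
Throws:
IndexOutOfBoundsException - If start and limit are not valid.
public static void setCharAt(StringBuffer target, int offset16, int char32)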
Set a code point into a UTF16 position.
Parameters:
target - String buffer
offset16 - UTF-16 position to insert into
char32 - Code point
public static int setCharAt(char[] target, int limit, int offset16, int char32)
Set a code point into a UTF16 position in a char array.
Parameters:
target - Char array
limit - Number of valid chars in target, which may differ from target.length. limit counts the
number of chars in target that represent a string, not the size of the array target.
offset16 - UTF-16 position to insert into
char32 - Code point
Throws:
IndexOutOfBoundsException - If offset16 is out of range
public static int moveCodePointOffset(String source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints.
Parameters:
source - String
offset16 - UTF-16 position to shift
shift32 - Number of codepoints to shift
Throws:
IndexOutOfBoundsException - If the new offset16 is out of bounds.
public static int moveCodePointOffset(StringBuffer source, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints.
Parameters:
source - String buffer
offset16 - UTF-16 position to shift
shift32 - Number of codepoints to shift
Throws:
IndexOutOfBoundsException - If the new offset16 is out of bounds.
public static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.
Parameters:
source - Char array
start - Start position of the subarray to operate on
limit - Limit position of the subarray to operate on
offset16 - UTF-16 position to shift relative to start
shift32 - Number of codepoints to shift
Throws:
IndexOutOfBoundsException - If the new offset16 is out of bounds with respect to the subarray or the
subarray bounds are out of range.
public static StringBuffer insert(StringBuffer target, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.
The offset argument must be greater than or equal to 0, and less than or equal to the length of target.
Parameters:
target - String buffer to insert into
offset16 - Offset at which char32 will be inserted
char32 - Codepoint to be inserted
Throws:
IndexOutOfBoundsException - Thrown if offset16 is invalid.
public static int insert(char[] target, int limit, int offset16, int char32)
Inserts char32 codepoint into target at the argument offset16.
The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.
The offset argument must be greater than or equal to 0, and less than or equal to the limit.
Parameters:
target - Char array to insert into
limit - End index of the char array, limit <= target.length
offset16 - Offset at which char32 will be inserted
char32 - Codepoint to be inserted
Throws:
IndexOutOfBoundsException - Thrown if offset16 is invalid.
public static StringBuffer delete(StringBuffer target, int offset16)
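Removes the codepoint at the specified position in this target (shortening target by 1
character if the codepoint is non-supplementary, 2 otherwise).
Parameters:
target - String buffer to remove the codepoint from
offset16 - Offset at which the codepoint will be removed
Throws:
IndexOutOfBoundsException - Thrown if offset16 is invalid.
public static int delete(char[] target, int limit, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1
character if the codepoint is non-supplementary, 2 otherwise).
Parameters:
target - Char array to remove the codepoint from
limit - End index of the char array, limit <= target.length
offset16 - Offset at which the codepoint will be removed
Throws:
IndexOutOfBoundsException - Thrown if offset16 is invalid.
public static int indexOf(String source, int char32)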
Returns the index within the argument UTF16 format Unicode string of the first occurrence of
the argument codepoint, that is, the smallest index i such that
UTF16.charAt(source, i) == char32 is true.
If no such character occurs in this string, then -1 is returned.
Examples:
UTF16.indexOf("abc", 'a') returns 0
UTF16.indexOf("abc𐀀", 0x10000) returns 3
UTF16.indexOf("abc𐀀", 0xd800) returns -1
Parameters:
source - UTF16 format Unicode string that will be searched
char32 - Codepoint to search for
public static int indexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of
the argument string str.
If no such string str occurs in this source, then -1 is returned.
Examples:
UTF16.indexOf("abc", "ab") returns 0
UTF16.indexOf("abc𐀀", "𐀀") returns 3
UTF16.indexOf("abc𐀀", "\ud800") returns -1
Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for
public static int indexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of
the argument codepoint.
If no such character occurs in this string, then -1 is returned.
Examples:
UTF16.indexOf("abc", 'a', 1) returns -1
UTF16.indexOf("abc𐀀", 0x10000, 1) returns 3
UTF16.indexOf("abc𐀀", 0xd800, 1) returns -1
Parameters:
source - UTF16 format Unicode string that will be searched
char32 - Codepoint to search for
fromIndex - The index to start the search from.
public static int indexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the first occurrence of
the argument string str.
If no such string str occurs in this source, then -1 is returned.
Examples:
UTF16.indexOf("abc", "ab", 0) returns 0
UTF16.indexOf("abc𐀀", "𐀀", 0) returns 3
UTF16.indexOf("abc𐀀", "𐀀", 2) returns 3
UTF16.indexOf("abc𐀀", "\ud800", 0) returns -1
Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for
fromIndex - The index to start the search from.
public static int lastIndexOf(String source, int char32)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of
the argument codepoint.
Examples:
UTF16.lastIndexOf("abc", 'a') returns 0
UTF16.lastIndexOf("abc𐀀", 0x10000) returns 3
UTF16.lastIndexOf("abc𐀀", 0xd800) returns -1
source is searched backwards starting at the last character.
Note this method is provided as support for JDK 1.3, which does not fully support supplementary characters.
Parameters:
source - UTF16 format Unicode string that will be searched
char32 - Codepoint to search for
public static int lastIndexOf(String source, String str)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of
the argument string str.
Examples:
UTF16.lastIndexOf("abc", "a") returns 0
UTF16.lastIndexOf("abc𐀀", "𐀀") returns 3
UTF16.lastIndexOf("abc𐀀", "\ud800") returns -1
source is searched backwards starting at the last character.
Note this method is provided as support for JDK 1.3, which does not fully support supplementary characters.
Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for
public static int lastIndexOf(String source, int char32, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex.
This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character.
source is searched backwards starting at the specified index.
Examples:
UTF16.lastIndexOf("abc", 'c', 2) returns 2
UTF16.lastIndexOf("abc", 'c', 1) returns -1
UTF16.lastIndexOf("abc𐀀", 0x10000, 5) returns 3
UTF16.lastIndexOf("abc𐀀", 0x10000, 3) returns 3
UTF16.lastIndexOf("abc𐀀", 0xd800) returns -1
Parameters:
source - UTF16 format Unicode string that will be searched
char32 - Codepoint to search for
fromIndex - The index to start the search from. There is no restriction on the value of
fromIndex. If it is greater than or equal to the length of this string, it has the
same effect as if it were equal to one less than the length of this string: this
entire string may be searched. If it is negative, it has the same effect as if it
were -1: -1 is returned.
public static int lastIndexOf(String source, String str, int fromIndex)
Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex.
This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. The same applies, vice versa, when str ends with a lead surrogate.
See example below.
Examples:
UTF16.lastIndexOf("abc", "c", 2) returns 2
UTF16.lastIndexOf("abc", "c", 1) returns -1
UTF16.lastIndexOf("abc𐀀", "𐀀", 5) returns 3
UTF16.lastIndexOf("abc𐀀", "𐀀", 3) returns 3
UTF16.lastIndexOf("abc𐀀", "\ud800", 4) returns -1
source is searched backwards starting at the last character.
Note this method is provided as support for JDK 1.3, which does not fully support supplementary characters.
Parameters:
source - UTF16 format Unicode string that will be searched
str - UTF16 format Unicode string to search for
fromIndex - The index to start the search from. There is no restriction on the value of
fromIndex. If it is greater than or equal to the length of this string, it has the
same effect as if it were equal to one less than the length of this string: this
entire string may be searched. If it is negative, it has the same effect as if it
were -1: -1 is returned.
public static String replace(String source, int oldChar32, int newChar32)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of
oldChar32 in source with newChar32.
Examples:
UTF16.replace("mesquite in your cellar", 'e', 'o');
returns "mosquito in your collar"
UTF16.replace("JonL", 'q', 'x');
returns "JonL" (no change)
UTF16.replace("Supplementary character 𐀀", 0x10000, '!');
returns "Supplementary character !"
UTF16.replace("Supplementary character 𐀀", 0xd800, '!');
returns "Supplementary character 𐀀"
Parameters:
source - UTF16 format Unicode string which the codepoint replacements will be based on.
oldChar32 - Non-zero old codepoint to be replaced.
newChar32 - The new codepoint to replace oldChar32
public static String replace(String source, String oldStr, String newStr)
Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr
in source with newStr.
Examples:
UTF16.replace("mesquite in your cellar", "e", "o");
returns "mosquito in your collar"
UTF16.replace("mesquite in your cellar", "mesquite", "cat");
returns "cat in your cellar"
UTF16.replace("JonL", "q", "x");
returns "JonL" (no change)
UTF16.replace("Supplementary character 𐀀", "𐀀", "!");
returns "Supplementary character !"
UTF16.replace("Supplementary character 𐀀", "\ud800", "!");
returns "Supplementary character 𐀀"
Parameters:
source - UTF16 format Unicode string which the replacements will be based on.
oldStr - Non-zero-length string to be replaced.
newStr - The new string to replace oldStr
public static StringBuffer reverse(StringBuffer source)
Reverses a UTF16 format Unicode string and replaces source's content with it.
Examples:
UTF16.reverse(new StringBuffer("Supplementary characters 𐀀𐐁"))
returns "𐐁𐀀 sretcarahc yratnemelppuS".
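
The reversal shown above keeps each surrogate pair in lead/trail order. A minimal sketch of how such a code-point-aware reversal can be written with JDK methods only is given below; it is an illustration, not the ICU4J implementation, and the class and method names are hypothetical.

```java
public class CodePointReverse {
    /** Walks the string backwards one code point at a time, appending as it goes. */
    public static String reverse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);   // full code point ending at index i
            out.appendCodePoint(cp);         // re-emits lead then trail for pairs
            i -= Character.charCount(cp);    // step back 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "ab\uD800\uDC00";
        String r = reverse(s);
        // The pair stays intact, so the first code point of the result is U+10000.
        System.out.println(r.codePointAt(0) == 0x10000);
        // The JDK's StringBuilder.reverse() also treats surrogate pairs as units.
        System.out.println(r.equals(new StringBuilder(s).reverse().toString()));
    }
}
```

A naive char-by-char reversal would emit the trail before the lead and corrupt every supplementary character, which is why the step size depends on `Character.charCount`.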
Parameters:
source - The source StringBuffer that contains the UTF16 format Unicode string to be reversed
public static boolean hasMoreCodePointsThan(String source, int number)
Check if the string contains more Unicode code points than a certain number.
Parameters:
source - The input string.
number - The number of code points in the string is compared against the 'number'
parameter.
public static boolean hasMoreCodePointsThan(char[] source, int start, int limit, int number)
Check if the sub-range of char array, from argument start to limit, contains more Unicode
code points than a certain number.
Parameters:
source - Array of UTF-16 chars
start - Start offset of the substring in the source array to analyze
limit - Limit offset of the substring in the source array to analyze
number - The number of code points in the string is compared against the 'number'
parameter.
Throws:
IndexOutOfBoundsException - Thrown when limit < start
public static boolean hasMoreCodePointsThan(StringBuffer source, int number)
Check if the string buffer contains more Unicode code points than a certain number.
Parameters:
source - The input string buffer.
number - The number of code points in the string buffer is compared against the 'number'
parameter.
public static String newString(int[] codePoints, int offset, int count)
Cover JDK 1.5 API.
Parameters:
codePoints - The code point array
offset - The start of the text in the code point array
count - The number of code points
Throws:
IllegalArgumentException - If an invalid code point is encountered
IndexOutOfBoundsException - If the offset or count are out of bounds.
public static int getSingleCodePoint(CharSequence s)
Utility for getting a code point from a CharSequence that contains exactly one code point.
Parameters:
s - to test
public static int compareCodePoint(int codePoint, CharSequence s)
Utility for comparing a code point to a string without having to create a new string. More specifically, if
    sc = new StringComparator(true,false,0);
    fast = UTF16.compareCodePoint(codePoint, charSequence)
    slower = sc.compare(UTF16.valueOf(codePoint), charSequence == null ? "" : charSequence.toString())
then
    Integer.signum(fast) == Integer.signum(slower)
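
One way a comparison like this can avoid building a temporary string is sketched below. It is not the ICU4J implementation: it assumes code point order (which is what the comparator in the equivalence above is taken to use), relies only on JDK methods, and the class name is hypothetical.

```java
public class CodePointCompare {
    /**
     * Compares a single code point against a char sequence in code point
     * order, as if the code point were a one-code-point string.
     * A null or empty sequence sorts before any code point.
     */
    public static int compareCodePoint(int codePoint, CharSequence s) {
        if (s == null || s.length() == 0) {
            return 1;
        }
        int first = Character.codePointAt(s, 0);
        int diff = codePoint - first;
        if (diff != 0) {
            return diff;
        }
        // Same first code point: equal only if s has nothing after it.
        return s.length() == Character.charCount(codePoint) ? 0 : -1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.signum(compareCodePoint('a', "ab")));
        System.out.println(Integer.signum(compareCodePoint(0x10000, "\uD800\uDC00")));
        // U+10000 sorts after U+FFFF in code point order, although its lead
        // surrogate 0xD800 is numerically smaller than 0xFFFF.
        System.out.println(Integer.signum(compareCodePoint(0x10000, "\uFFFF")));
    }
}
```

Only the sign of the result is meaningful, which mirrors the Integer.signum equivalence stated above.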
Parameters:
codePoint - to test
s - to test
Copyright © 2016 Unicode, Inc. and others.