public interface UForwardCharacterIterator
Characters can be accessed in two ways: as code units or as
code points.
Unicode code points are 21-bit integers and are the scalar values
of Unicode characters. ICU uses the type int
for them.
Unicode code units are the storage units of a given
Unicode/UCS Transformation Format (a character encoding scheme).
With UTF-16, every code point is represented by either one code unit
or two code units (a surrogate pair).
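The code unit/code point distinction is visible with plain java.lang APIs, with no ICU dependency. In this sketch (the class name is hypothetical), the supplementary character U+1F600 is stored as two UTF-16 code units but counts as one code point. The last line checks that the largest code point, U+10FFFF, needs 21 bits.

```java
/** Hypothetical demo: one supplementary code point, two UTF-16 code units. */
public class CodeUnitsVsCodePoints {
    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 stores it as a lead + trail surrogate.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.printf("U+%04X U+%04X%n", (int) s.charAt(0), (int) s.charAt(1));
        System.out.printf("U+%X%n", s.codePointAt(0));       // the combined scalar value
        // The largest code point, U+10FFFF, still fits in 21 bits.
        System.out.println(Integer.toBinaryString(Character.MAX_CODE_POINT).length());
    }
}
```

Running it prints 2, 1, U+D83D U+DE00, U+1F600 and 21 on separate lines.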
String storage is typically based on code units, while properties
of characters are typically determined using code point values.
Some processes may be designed to work with sequences of code units,
or it may be known that all characters that are important to an
algorithm can be represented with single code units.
Other processes will need to use the code point access functions.
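The following is an illustration of the two kinds of processes, again in plain Java with hypothetical names. Counting ASCII digits can safely walk code units, because no digit is ever part of a surrogate pair. A character property such as Character.isLetter, however, must be tested on whole code points. U+1D400 MATHEMATICAL BOLD CAPITAL A is a letter, but neither of its surrogate code units is.

```java
/** Hypothetical demo: when code units suffice and when code points are needed. */
public class UnitsOrPoints {
    /** Code unit scan: safe, since every ASCII digit is a single UTF-16 code unit. */
    static int countAsciiDigits(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') n++;
        }
        return n;
    }

    /** Misses supplementary letters: each surrogate is tested on its own. */
    static int countLettersByCodeUnit(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) n++;
        }
        return n;
    }

    /** Tests the property on whole code points. */
    static int countLettersByCodePoint(String s) {
        return (int) s.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        // "7", then U+1D400 (two code units), then "b"
        String s = "7" + new String(Character.toChars(0x1D400)) + "b";
        System.out.println(countAsciiDigits(s));        // 1
        System.out.println(countLettersByCodeUnit(s));  // 1 (only "b")
        System.out.println(countLettersByCodePoint(s)); // 2
    }
}
```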
UForwardCharacterIterator provides next() to access
a code unit and advance an internal position into the text object,
similar to return text[position++].
It provides nextCodePoint() to access a code point and advance an internal
position.
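The next() contract over a java.lang.String can be sketched minimally as follows. StringCodeUnitIterator is a hypothetical class that does not implement the ICU interface. Its DONE value of -1 is chosen for this sketch because no code unit (0 to 0xFFFF) can equal it. The surrogate pair comes back as two separate values.

```java
/** Hypothetical sketch of next(): forward-only, one UTF-16 code unit per call. */
public class StringCodeUnitIterator {
    /** End-of-text sentinel; -1 is outside the code unit range 0..0xFFFF. */
    public static final int DONE = -1;

    private final String text;
    private int position = 0; // index of the next code unit to return

    public StringCodeUnitIterator(String text) {
        this.text = text;
    }

    /** Post-increment semantics, like return text[position++]. */
    public int next() {
        if (position >= text.length()) {
            return DONE; // stays at the limit; every further call also returns DONE
        }
        return text.charAt(position++);
    }

    public static void main(String[] args) {
        StringCodeUnitIterator it = new StringCodeUnitIterator("a\uD83D\uDE00");
        int c;
        while ((c = it.next()) != DONE) {
            System.out.printf("U+%04X%n", c); // U+0061, U+D83D, U+DE00
        }
    }
}
```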
nextCodePoint() assumes that the current position is that of
the beginning of a code point, i.e., of its first code unit.
After nextCodePoint(), this will be true again.
In general, code unit access and code point access should not be mixed
in the same iteration loop. In UTF-16, if the current position
is on a second code unit (a low surrogate), then only that code unit
is returned, even by nextCodePoint().
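The nextCodePoint() behavior described above, including the low-surrogate case, can be sketched as follows. StringCodePointIterator is hypothetical, and its constructor takes a start index only so that the example can deliberately begin on a trail surrogate. It pairs a lead surrogate with an immediately following trail surrogate and otherwise returns exactly what next() would.

```java
/** Hypothetical sketch of next() and nextCodePoint() over a String. */
public class StringCodePointIterator {
    public static final int DONE = -1; // sentinel outside 0..0x10FFFF

    private final String text;
    private int position;

    /** Starts at the given code unit index, which may be mid-pair for demonstration. */
    public StringCodePointIterator(String text, int start) {
        this.text = text;
        this.position = start;
    }

    public int next() {
        return position < text.length() ? text.charAt(position++) : DONE;
    }

    /** Joins a lead + trail surrogate; otherwise behaves exactly like next(). */
    public int nextCodePoint() {
        int c = next();
        if (c != DONE && Character.isHighSurrogate((char) c)
                && position < text.length()
                && Character.isLowSurrogate(text.charAt(position))) {
            return Character.toCodePoint((char) c, text.charAt(position++));
        }
        return c; // BMP character, unpaired surrogate, or DONE
    }

    public static void main(String[] args) {
        String s = "x\uD83D\uDE00y";
        StringCodePointIterator aligned = new StringCodePointIterator(s, 0);
        int c;
        while ((c = aligned.nextCodePoint()) != DONE) {
            System.out.printf("U+%04X ", c); // U+0078 U+1F600 U+0079
        }
        System.out.println();
        // Starting on the trail (low) surrogate: only that code unit comes back.
        StringCodePointIterator misaligned = new StringCodePointIterator(s, 2);
        System.out.printf("U+%04X%n", misaligned.nextCodePoint()); // U+DE00
    }
}
```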
Usage:
    public void function1(UForwardCharacterIterator it) {
        int c;
        // next() returns one UTF-16 code unit per call until the end of the text
        while ((c = it.next()) != UForwardCharacterIterator.DONE) {
            // use c
        }
    }
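The code point counterpart of function1 replaces next() with nextCodePoint(). To stay dependency-free, this sketch codes against ForwardIter, a hypothetical local stand-in with the same two methods and a DONE constant. With ICU4J on the classpath, the same loop would take a UForwardCharacterIterator, which ICU's UCharacterIterator implements. The String-backed implementation relies on String.codePointAt, which returns a supplementary code point only for a valid surrogate pair and otherwise returns the single code unit.

```java
/** Hypothetical stand-in for the two-method iterator contract described above. */
interface ForwardIter {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

public class CodePointLoop {
    /** Code point counterpart of function1: each c is a full code point. */
    static int countCodePoints(ForwardIter it) {
        int n = 0;
        int c;
        while ((c = it.nextCodePoint()) != ForwardIter.DONE) {
            n++; // use c
        }
        return n;
    }

    /** Minimal String-backed implementation, for demonstration only. */
    static ForwardIter over(String text) {
        return new ForwardIter() {
            private int pos = 0;

            public int next() {
                return pos < text.length() ? text.charAt(pos++) : DONE;
            }

            public int nextCodePoint() {
                if (pos >= text.length()) return DONE;
                int cp = text.codePointAt(pos); // joins a valid pair, else one unit
                pos += Character.charCount(cp);
                return cp;
            }
        };
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(countCodePoints(over(s))); // 3
    }
}
```

Counting with next() instead would report 4 for the same string, one per code unit.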
Modifier and Type | Field and Description |
---|---|
static int | DONE: Indicator that we have reached the end of the UTF-16 text. |
Modifier and Type | Method and Description |
---|---|
int | next(): Returns the UTF-16 code unit at the current index and advances to the next code unit (post-increment semantics). |
int | nextCodePoint(): Returns the code point at the current index and advances to the next code point (post-increment semantics). |
static final int DONE
int next()
int nextCodePoint()
If the index does not point to a valid surrogate pair, the behavior is the same as next(). Otherwise the iterator is incremented past the surrogate pair, and the code point represented by the pair is returned.
Copyright © 2016 Unicode, Inc. and others.