ICU 76.1
icu::CharacterIterator Class Reference (abstract)

Abstract class that defines an API for iteration on text objects.

#include <chariter.h>

Inheritance diagram for icu::CharacterIterator:
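icu::CharacterIterator derives from icu::ForwardCharacterIterator, which derives from icu::UObject, which derives from icu::UMemory.
icu::UCharCharacterIterator derives from icu::CharacterIterator, and icu::StringCharacterIterator derives from icu::UCharCharacterIterator.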

Public Types

enum EOrigin { kStart, kCurrent, kEnd }
 Origin enumeration for the move() and move32() functions.
 
- Public Types inherited from icu::ForwardCharacterIterator
enum { DONE = 0xffff }
 Value returned by most of ForwardCharacterIterator's functions when the iterator has reached the limits of its iteration.
 

Public Member Functions

virtual ~CharacterIterator ()
 Destructor.
 
virtual CharacterIterator * clone () const =0
 Returns a pointer to a new CharacterIterator of the same concrete class as this one, and referring to the same character in the same text-storage object as this one.
 
virtual char16_t first ()=0
 Sets the iterator to refer to the first code unit in its iteration range, and returns that code unit.
 
virtual char16_t firstPostInc ()
 Sets the iterator to refer to the first code unit in its iteration range, returns that code unit, and moves the position to the second code unit.
 
virtual UChar32 first32 ()=0
 Sets the iterator to refer to the first code point in its iteration range, and returns that code point. This can be used to begin an iteration with next32().
 
virtual UChar32 first32PostInc ()
 Sets the iterator to refer to the first code point in its iteration range, returns that code point, and moves the position to the second code point.
 
int32_t setToStart ()
 Sets the iterator to refer to the first code unit or code point in its iteration range.
 
virtual char16_t last ()=0
 Sets the iterator to refer to the last code unit in its iteration range, and returns that code unit.
 
virtual UChar32 last32 ()=0
 Sets the iterator to refer to the last code point in its iteration range, and returns that code point.
 
int32_t setToEnd ()
 Sets the iterator to the end of its iteration range, just behind the last code unit or code point.
 
virtual char16_t setIndex (int32_t position)=0
 Sets the iterator to refer to the "position"-th code unit in the text-storage object the iterator refers to, and returns that code unit.
 
virtual UChar32 setIndex32 (int32_t position)=0
 Sets the iterator to refer to the beginning of the code point that contains the "position"-th code unit in the text-storage object the iterator refers to, and returns that code point.
 
virtual char16_t current () const =0
 Returns the code unit the iterator currently refers to.
 
virtual UChar32 current32 () const =0
 Returns the code point the iterator currently refers to.
 
virtual char16_t next ()=0
 Advances to the next code unit in the iteration range (toward endIndex()), and returns that code unit.
 
virtual UChar32 next32 ()=0
 Advances to the next code point in the iteration range (toward endIndex()), and returns that code point.
 
virtual char16_t previous ()=0
 Advances to the previous code unit in the iteration range (toward startIndex()), and returns that code unit.
 
virtual UChar32 previous32 ()=0
 Advances to the previous code point in the iteration range (toward startIndex()), and returns that code point.
 
virtual UBool hasPrevious ()=0
 Returns false if there are no more code units or code points before the current position in the iteration range.
 
int32_t startIndex () const
 Returns the numeric index in the underlying text-storage object of the character returned by first().
 
int32_t endIndex () const
 Returns the numeric index in the underlying text-storage object of the position immediately BEYOND the character returned by last().
 
int32_t getIndex () const
 Returns the numeric index in the underlying text-storage object of the character the iterator currently refers to (i.e., the character returned by current()).
 
int32_t getLength () const
 Returns the length of the entire text in the underlying text-storage object.
 
virtual int32_t move (int32_t delta, EOrigin origin)=0
 Moves the current position by a number of code units, relative to the start or end of the iteration range or to the current position itself.
 
virtual int32_t move32 (int32_t delta, EOrigin origin)=0
 Moves the current position by a number of code points, relative to the start or end of the iteration range or to the current position itself.
 
virtual void getText (UnicodeString &result)=0
 Copies the text under iteration into the UnicodeString referred to by "result".
 
- Public Member Functions inherited from icu::ForwardCharacterIterator
virtual ~ForwardCharacterIterator ()
 Destructor.
 
virtual bool operator== (const ForwardCharacterIterator &that) const =0
 Returns true when both iterators refer to the same character in the same character-storage object.
 
bool operator!= (const ForwardCharacterIterator &that) const
 Returns true when the iterators refer to different text-storage objects, or to different characters in the same text-storage object.
 
virtual int32_t hashCode () const =0
 Generates a hash code for this iterator.
 
virtual UClassID getDynamicClassID () const override=0
 Returns a UClassID for this ForwardCharacterIterator ("poor man's RTTI").
 
virtual char16_t nextPostInc ()=0
 Returns the current code unit and advances to the next code unit in the iteration range (toward endIndex()).
 
virtual UChar32 next32PostInc ()=0
 Returns the current code point and advances to the next code point in the iteration range (toward endIndex()).
 
virtual UBool hasNext ()=0
 Returns false if there are no more code units or code points at or after the current position in the iteration range.
 
- Public Member Functions inherited from icu::UObject
virtual ~UObject ()
 Destructor.
 

Protected Member Functions

 CharacterIterator ()
 Empty constructor.
 
 CharacterIterator (int32_t length)
 Constructor, just setting the length field in this base class.
 
 CharacterIterator (int32_t length, int32_t position)
 Constructor, just setting the length and position fields in this base class.
 
 CharacterIterator (int32_t length, int32_t textBegin, int32_t textEnd, int32_t position)
 Constructor, just setting the length, start, end, and position fields in this base class.
 
 CharacterIterator (const CharacterIterator &that)
 Copy constructor.
 
CharacterIterator & operator= (const CharacterIterator &that)
 Assignment operator.
 
- Protected Member Functions inherited from icu::ForwardCharacterIterator
 ForwardCharacterIterator ()
 Default constructor to be overridden in the implementing class.
 
 ForwardCharacterIterator (const ForwardCharacterIterator &other)
 Copy constructor to be overridden in the implementing class.
 
ForwardCharacterIterator & operator= (const ForwardCharacterIterator &)
 Assignment operator to be overridden in the implementing class.
 

Protected Attributes

int32_t textLength
 Base class text length field.
 
int32_t pos
 Base class field for the current position.
 
int32_t begin
 Base class field for the start of the iteration range.
 
int32_t end
 Base class field for the end of the iteration range.
 

Detailed Description

Abstract class that defines an API for iteration on text objects.

This is an interface for forward and backward iteration and random access into a text object.

The API provides backward compatibility with the Java and older ICU CharacterIterator classes but extends them significantly:

  1. CharacterIterator is now a subclass of ForwardCharacterIterator.
  2. While the old API functions provided forward iteration with "pre-increment" semantics, the new one also provides functions with "post-increment" semantics. They are more efficient and should be the preferred iterator functions for new implementations. The backward iteration always had "pre-decrement" semantics, which are efficient.
  3. Just like ForwardCharacterIterator, it provides access to both code units and code points. Code point access versions are available for the old and the new iteration semantics.
  4. There are new functions for setting and moving the current position without returning a character, for efficiency.
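The difference between the two forward styles in item 2 can be seen in a small model. The sketch below is a hypothetical, self-contained stand-in (ToyUnitIter is not an ICU class). It implements only the code-unit subset of the contract over a std::u16string, and it assumes the semantics described on this page: first() and next() move before reading, while setToStart() and nextPostInc() read first and then move.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of the code-unit part of the iteration contract.
// It is NOT icu::CharacterIterator; it only mimics the documented semantics.
class ToyUnitIter {
public:
    enum : char16_t { DONE = 0xffff };  // same sentinel value as the real API

    explicit ToyUnitIter(std::u16string text) : text_(std::move(text)) {}

    // Pre-increment style: position on the first unit and return it.
    char16_t first() {
        pos_ = 0;
        if (text_.empty()) return DONE;
        return text_[0];
    }
    // Pre-increment style: move first, then read the unit at the new position.
    char16_t next() {
        if (pos_ + 1 < text_.size()) return text_[++pos_];
        pos_ = text_.size();  // park at the end of the range
        return DONE;
    }

    // Post-increment style: position on the first unit without reading it.
    int32_t setToStart() { pos_ = 0; return 0; }
    bool hasNext() const { return pos_ < text_.size(); }
    // Post-increment style: read the current unit, then move past it.
    char16_t nextPostInc() {
        if (!hasNext()) return DONE;
        return text_[pos_++];
    }

private:
    std::u16string text_;
    std::size_t pos_ = 0;
};

// The old loop shape: first(), then next() until DONE.
std::vector<char16_t> collectPreIncrement(ToyUnitIter &it) {
    std::vector<char16_t> out;
    for (char16_t c = it.first(); c != ToyUnitIter::DONE; c = it.next()) {
        out.push_back(c);
    }
    return out;
}

// The newer loop shape: setToStart(), then nextPostInc() while hasNext().
std::vector<char16_t> collectPostIncrement(ToyUnitIter &it) {
    std::vector<char16_t> out;
    for (it.setToStart(); it.hasNext();) {
        out.push_back(it.nextPostInc());
    }
    return out;
}
```

Both loops visit the same code units in the same order. They differ only in where the position update happens relative to the read.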

See ForwardCharacterIterator for examples of using the new forward iteration functions. For backward iteration, there is also a hasPrevious() function that can be used analogously to hasNext(). The old functions work as before and are shown below.

Examples for some of the new functions:

Forward iteration with hasNext():

void forward1(CharacterIterator &it) {
    UChar32 c;
    for(it.setToStart(); it.hasNext();) {
        c=it.next32PostInc();
        // use c
    }
}

Forward iteration with a loop that resembles the old forward-iteration loops, showing a way to convert simple for() loops:

void forward2(CharacterIterator &it) {
    char16_t c;
    for(c=it.firstPostInc(); c!=CharacterIterator::DONE; c=it.nextPostInc()) {
        // use c
    }
}

Backward iteration with setToEnd() and hasPrevious():

void backward1(CharacterIterator &it) {
    UChar32 c;
    for(it.setToEnd(); it.hasPrevious();) {
        c=it.previous32();
        // use c
    }
}

Backward iteration with a more traditional for() loop:

void backward2(CharacterIterator &it) {
    char16_t c;
    for(c=it.last(); c!=CharacterIterator::DONE; c=it.previous()) {
        // use c
    }
}

Example for random access:

void random(CharacterIterator &it) {
    // move forward three code points from the start of the iteration range
    // (that is, to the beginning of the fourth code point)
    it.move32(3, CharacterIterator::kStart);
    // get a code point from here without moving the position
    UChar32 c=it.current32();
    // get the position
    int32_t pos=it.getIndex();
    // get the previous code unit
    char16_t u=it.previous();
    // move back one more code unit
    it.move(-1, CharacterIterator::kCurrent);
    // set the position back to where it was
    // and read the same code point c and move beyond it
    it.setIndex(pos);
    if(c!=it.next32PostInc()) {
        exit(1); // CharacterIterator inconsistent
    }
}

Examples, especially for the old API:

A function that processes characters, in this example by simply printing them

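void processChar( char16_t c )
{
    // Streaming a char16_t directly is ill-formed since C++20,
    // so print the numeric code unit value explicitly.
    cout << " " << static_cast<int>(c);
}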

Traverse the text from start to finish

 
void traverseForward(CharacterIterator& iter)
{
    for(char16_t c = iter.first(); c != CharacterIterator::DONE; c = iter.next()) {
        processChar(c);
    }
}

Traverse the text backwards, from end to start

void traverseBackward(CharacterIterator& iter)
{
    for(char16_t c = iter.last(); c != CharacterIterator::DONE; c = iter.previous()) {
        processChar(c);
    }
}

Traverse both forward and backward from a given position in the text. In this example, the letter-or-digit test in the loop conditions stands in for additional, application-specific stopping criteria.

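void traverseOut(CharacterIterator& iter, int32_t pos)
{
    char16_t c;
    // u_isalpha() and u_isdigit() are declared in unicode/uchar.h
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.next()) {}
    int32_t end = iter.getIndex();
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.previous()) {}
    // If previous() returned DONE, the iterator is still on the first code unit
    // of the range (as in UCharCharacterIterator), which belongs to the run.
    int32_t start = (c == CharacterIterator::DONE) ? iter.getIndex() : iter.getIndex() + 1;
    cout << "start: " << start << " end: " << end << endl;
    for (c = iter.setIndex(start); iter.getIndex() < end; c = iter.next() ) {
        processChar(c);
    }
}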

Creating a StringCharacterIterator and calling the test functions

void CharacterIterator_Example( void )
{
    cout << endl << "===== CharacterIterator_Example: =====" << endl;
    UnicodeString text("Ein kleiner Satz.");
    StringCharacterIterator iterator(text);
    cout << "----- traverseForward: -----------" << endl;
    traverseForward( iterator );
    cout << endl << endl << "----- traverseBackward: ----------" << endl;
    traverseBackward( iterator );
    cout << endl << endl << "----- traverseOut: ---------------" << endl;
    traverseOut( iterator, 7 );
    cout << endl << endl << "-----" << endl;
}
Stable:
ICU 2.0

Definition at line 361 of file chariter.h.

Member Enumeration Documentation

◆ EOrigin

Origin enumeration for the move() and move32() functions.

Stable:
ICU 2.0

Definition at line 367 of file chariter.h.

Constructor & Destructor Documentation

◆ ~CharacterIterator()

virtual icu::CharacterIterator::~CharacterIterator ( )
virtual

Destructor.

Stable:
ICU 2.0

◆ CharacterIterator() [1/5]

icu::CharacterIterator::CharacterIterator ( )
protected

Empty constructor.

Stable:
ICU 2.0

◆ CharacterIterator() [2/5]

icu::CharacterIterator::CharacterIterator ( int32_t  length)
protected

Constructor, just setting the length field in this base class.

Stable:
ICU 2.0

◆ CharacterIterator() [3/5]

icu::CharacterIterator::CharacterIterator ( int32_t  length,
int32_t  position 
)
protected

Constructor, just setting the length and position fields in this base class.

Stable:
ICU 2.0

◆ CharacterIterator() [4/5]

icu::CharacterIterator::CharacterIterator ( int32_t  length,
int32_t  textBegin,
int32_t  textEnd,
int32_t  position 
)
protected

Constructor, just setting the length, start, end, and position fields in this base class.

Stable:
ICU 2.0

◆ CharacterIterator() [5/5]

icu::CharacterIterator::CharacterIterator ( const CharacterIterator & that)
protected

Copy constructor.

Parameters
that: The CharacterIterator to be copied
Stable:
ICU 2.0

Member Function Documentation

◆ clone()

virtual CharacterIterator * icu::CharacterIterator::clone ( ) const
pure virtual

Returns a pointer to a new CharacterIterator of the same concrete class as this one, and referring to the same character in the same text-storage object as this one.

The caller is responsible for deleting the new clone.

Returns
a pointer to a new CharacterIterator
Stable:
ICU 2.0

Implemented in icu::StringCharacterIterator and icu::UCharCharacterIterator.
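Because clone() hands ownership of the new object to the caller, it is convenient to wrap the result in a smart pointer right away. With ICU itself that could be std::unique_ptr<icu::CharacterIterator> or icu::LocalPointer<icu::CharacterIterator>. The self-contained sketch below uses hypothetical stand-in classes (ToyIterBase and ToyConcreteIter, not ICU classes) so that it compiles without ICU. It shows the same ownership pattern, including a covariant clone() override in the concrete class.

```cpp
#include <memory>

// Hypothetical stand-ins (not ICU classes) for an abstract iterator whose
// clone() returns a caller-owned raw pointer.
class ToyIterBase {
public:
    virtual ~ToyIterBase() = default;
    virtual ToyIterBase *clone() const = 0;
    virtual int position() const = 0;
};

class ToyConcreteIter : public ToyIterBase {
public:
    explicit ToyConcreteIter(int pos) : pos_(pos) {}
    // Covariant return type: the clone has the same concrete class.
    ToyConcreteIter *clone() const override { return new ToyConcreteIter(*this); }
    int position() const override { return pos_; }

private:
    int pos_;
};

// Take ownership of the clone immediately, so it is deleted automatically
// even if later code returns early or throws.
std::unique_ptr<ToyIterBase> ownedCopy(const ToyIterBase &it) {
    return std::unique_ptr<ToyIterBase>(it.clone());
}
```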

◆ current()

virtual char16_t icu::CharacterIterator::current ( ) const
pure virtual

Returns the code unit the iterator currently refers to.

Returns
the current code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ current32()

virtual UChar32 icu::CharacterIterator::current32 ( ) const
pure virtual

Returns the code point the iterator currently refers to.


Returns
the current code point.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ endIndex()

int32_t icu::CharacterIterator::endIndex ( ) const
inline

Returns the numeric index in the underlying text-storage object of the position immediately BEYOND the character returned by last().


Returns
the numeric index in the underlying text-storage object of the position immediately BEYOND the character returned by last().
Stable:
ICU 2.0

Definition at line 716 of file chariter.h.

◆ first()

virtual char16_t icu::CharacterIterator::first ( )
pure virtual

Sets the iterator to refer to the first code unit in its iteration range, and returns that code unit.

This can be used to begin an iteration with next().

Returns
the first code unit in its iteration range.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ first32()

virtual UChar32 icu::CharacterIterator::first32 ( )
pure virtual

Sets the iterator to refer to the first code point in its iteration range, and returns that code point. This can be used to begin an iteration with next32().

Note that an iteration with next32PostInc(), beginning with, e.g., setToStart() or first32PostInc(), is more efficient.

Returns
the first code point in its iteration range.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ first32PostInc()

virtual UChar32 icu::CharacterIterator::first32PostInc ( )
virtual

Sets the iterator to refer to the first code point in its iteration range, returns that code point, and moves the position to the second code point.

This is an alternative to setToStart() for forward iteration with next32PostInc().

Returns
the first code point in its iteration range.
Stable:
ICU 2.0

Reimplemented in icu::UCharCharacterIterator.
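first32PostInc() and next32PostInc() return whole code points. A supplementary character that occupies two UTF-16 code units (a surrogate pair) is therefore returned once and advances the position by two. The sketch below models that behavior under stated assumptions. ToyCodePointIter and ToyUChar32 are hypothetical names, the text is a std::u16string, and returning an unpaired surrogate as-is is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using ToyUChar32 = int32_t;  // stand-in for ICU's UChar32, which is an int32_t

// Hypothetical toy model (not ICU) of post-increment code point iteration
// over UTF-16 text: a valid surrogate pair is returned as one code point.
class ToyCodePointIter {
public:
    enum : char16_t { DONE = 0xffff };

    explicit ToyCodePointIter(std::u16string text) : text_(std::move(text)) {}

    bool hasNext() const { return pos_ < text_.size(); }

    // Return the code point at the current position, then move past it.
    ToyUChar32 next32PostInc() {
        if (!hasNext()) return DONE;
        char16_t lead = text_[pos_++];
        if (lead >= 0xD800 && lead <= 0xDBFF && pos_ < text_.size()) {
            char16_t trail = text_[pos_];
            if (trail >= 0xDC00 && trail <= 0xDFFF) {
                ++pos_;  // consume the trail surrogate too
                return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
            }
        }
        return lead;  // BMP code point, or an unpaired surrogate as-is
    }

    // Like first32PostInc(): go to the start, return the first code point,
    // and leave the position on the second one.
    ToyUChar32 first32PostInc() {
        pos_ = 0;
        return next32PostInc();
    }

    std::size_t getIndex() const { return pos_; }

private:
    std::u16string text_;
    std::size_t pos_ = 0;
};

// DONE-terminated loop, as in the firstPostInc()/nextPostInc() example above,
// but over code points.
std::vector<ToyUChar32> codePoints(ToyCodePointIter &it) {
    std::vector<ToyUChar32> out;
    for (ToyUChar32 c = it.first32PostInc(); c != ToyCodePointIter::DONE; c = it.next32PostInc()) {
        out.push_back(c);
    }
    return out;
}
```

For the text u"a\U0001F600b" (four code units), the loop yields three code points: U+0061, U+1F600 and U+0062.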

◆ firstPostInc()

virtual char16_t icu::CharacterIterator::firstPostInc ( )
virtual

Sets the iterator to refer to the first code unit in its iteration range, returns that code unit, and moves the position to the second code unit.

This is an alternative to setToStart() for forward iteration with nextPostInc().

Returns
the first code unit in its iteration range.
Stable:
ICU 2.0

Reimplemented in icu::UCharCharacterIterator.

◆ getIndex()

int32_t icu::CharacterIterator::getIndex ( ) const
inline

Returns the numeric index in the underlying text-storage object of the character the iterator currently refers to (i.e., the character returned by current()).


Returns
the numeric index in the text-storage object of the character the iterator currently refers to
Stable:
ICU 2.0

Definition at line 721 of file chariter.h.

◆ getLength()

int32_t icu::CharacterIterator::getLength ( ) const
inline

Returns the length of the entire text in the underlying text-storage object.

Returns
the length of the entire text in the text-storage object
Stable:
ICU 2.0

Definition at line 726 of file chariter.h.

◆ getText()

virtual void icu::CharacterIterator::getText ( UnicodeString & result)
pure virtual

Copies the text under iteration into the UnicodeString referred to by "result".


Parameters
result: Receives a copy of the text under iteration.
Stable:
ICU 2.0

Implemented in icu::StringCharacterIterator and icu::UCharCharacterIterator.

◆ hasPrevious()

virtual UBool icu::CharacterIterator::hasPrevious ( )
pure virtual

Returns false if there are no more code units or code points before the current position in the iteration range.

This is used with previous() or previous32() in backward iteration.

Returns
false if there are no more code units or code points before the current position in the iteration range; true otherwise.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.
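Backward iteration uses pre-decrement semantics: setToEnd() parks the position just behind the last code unit, and each previous32() call first steps back over one code point and then returns it. The hypothetical ToyBackwardIter below (not an ICU class, over a std::u16string) models that loop shape. Treating a trail surrogate preceded by a lead surrogate as one step is the standard UTF-16 rule, and the remaining names are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using ToyUChar32 = int32_t;  // stand-in for ICU's UChar32

// Hypothetical toy model (not ICU) of backward code point iteration.
class ToyBackwardIter {
public:
    enum : char16_t { DONE = 0xffff };

    explicit ToyBackwardIter(std::u16string text) : text_(std::move(text)) {}

    // Position just behind the last code unit, ready for previous32().
    std::size_t setToEnd() { pos_ = text_.size(); return pos_; }
    bool hasPrevious() const { return pos_ > 0; }

    // Pre-decrement: move back over one code point, then return it.
    ToyUChar32 previous32() {
        if (!hasPrevious()) return DONE;
        char16_t unit = text_[--pos_];
        if (unit >= 0xDC00 && unit <= 0xDFFF && pos_ > 0) {
            char16_t lead = text_[pos_ - 1];
            if (lead >= 0xD800 && lead <= 0xDBFF) {
                --pos_;  // step over the lead surrogate as well
                return 0x10000 + ((lead - 0xD800) << 10) + (unit - 0xDC00);
            }
        }
        return unit;
    }

    std::size_t getIndex() const { return pos_; }

private:
    std::u16string text_;
    std::size_t pos_ = 0;
};

// Backward loop in the setToEnd()/hasPrevious() style.
std::vector<ToyUChar32> codePointsBackward(ToyBackwardIter &it) {
    std::vector<ToyUChar32> out;
    for (it.setToEnd(); it.hasPrevious();) {
        out.push_back(it.previous32());
    }
    return out;
}
```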

◆ last()

virtual char16_t icu::CharacterIterator::last ( )
pure virtual

Sets the iterator to refer to the last code unit in its iteration range, and returns that code unit.

This can be used to begin an iteration with previous().

Returns
the last code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ last32()

virtual UChar32 icu::CharacterIterator::last32 ( )
pure virtual

Sets the iterator to refer to the last code point in its iteration range, and returns that code point.

This can be used to begin an iteration with previous32().

Returns
the last code point.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ move()

virtual int32_t icu::CharacterIterator::move ( int32_t  delta,
EOrigin  origin 
)
pure virtual

Moves the current position relative to the start or end of the iteration range, or relative to the current position itself.

The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta.

Parameters
delta: The offset from origin, in code units. A positive delta means forward; a negative delta means backward.
origin: One of the EOrigin values kStart, kCurrent, or kEnd.
Returns
the new position
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.
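The three origins only change the reference point from which delta is counted. The sketch below spells out that arithmetic for code units. ToyRange and ToyOrigin are hypothetical names (the enumerator names are copied from EOrigin). Pinning out-of-range results to [begin, end] is an assumption of the sketch, since the description above does not say what happens when delta overshoots the range.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for CharacterIterator::EOrigin (same enumerator names).
enum ToyOrigin { kStart, kCurrent, kEnd };

// Hypothetical toy (not ICU): the code-unit arithmetic behind move().
// begin and end delimit the iteration range; pos is the current index.
struct ToyRange {
    int32_t begin;
    int32_t end;
    int32_t pos;

    int32_t move(int32_t delta, ToyOrigin origin) {
        switch (origin) {
            case kStart:   pos = begin + delta; break;
            case kCurrent: pos = pos + delta;   break;
            case kEnd:     pos = end + delta;   break;
        }
        // Assumption of this sketch: out-of-range results are pinned to the range.
        pos = std::max(begin, std::min(pos, end));
        return pos;
    }
};
```

For a range [2, 10), move(3, kStart) yields 5, move(-2, kEnd) yields 8, and an overshooting move(100, kCurrent) is pinned to 10.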

◆ move32()

virtual int32_t icu::CharacterIterator::move32 ( int32_t  delta,
EOrigin  origin 
)
pure virtual

Moves the current position relative to the start or end of the iteration range, or relative to the current position itself.

The movement is expressed in numbers of code points forward or backward by specifying a positive or negative delta.

Parameters
delta: The offset from origin, in code points. A positive delta means forward; a negative delta means backward.
origin: One of the EOrigin values kStart, kCurrent, or kEnd.
Returns
the new position
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.
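move32() takes the same origins as move() but counts delta in code points. The position it returns is still an index into the text-storage object, as everywhere else on this page. The sketch below uses a hypothetical ToyMove32 class and Origin enum (not ICU types), and stopping at the ends of the text is an assumption of the sketch. In it, moving two code points from the start of u"a\U0001F600b" lands on code-unit index 3, not 2.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical origin type for this sketch (mirrors kStart/kCurrent/kEnd).
enum class Origin { Start, Current, End };

// Hypothetical toy (not ICU): move32() counts code points, so a surrogate
// pair counts as a single step even though it spans two code units.
class ToyMove32 {
public:
    explicit ToyMove32(std::u16string text) : text_(std::move(text)) {}

    int32_t move32(int32_t delta, Origin origin) {
        if (origin == Origin::Start) pos_ = 0;
        if (origin == Origin::End) pos_ = length();
        for (; delta > 0 && pos_ < length(); --delta) forwardOne();
        for (; delta < 0 && pos_ > 0; ++delta) backwardOne();
        return pos_;  // a code-unit index, not a code point count
    }

private:
    static bool isLead(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
    static bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }
    int32_t length() const { return static_cast<int32_t>(text_.size()); }

    void forwardOne() {  // step forward over one code point
        bool lead = isLead(text_[pos_++]);
        if (lead && pos_ < length() && isTrail(text_[pos_])) ++pos_;
    }
    void backwardOne() {  // step back over one code point
        bool trail = isTrail(text_[--pos_]);
        if (trail && pos_ > 0 && isLead(text_[pos_ - 1])) --pos_;
    }

    std::u16string text_;
    int32_t pos_ = 0;
};
```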

◆ next()

virtual char16_t icu::CharacterIterator::next ( )
pure virtual

Advances to the next code unit in the iteration range (toward endIndex()), and returns that code unit.

If there are no more code units to return, returns DONE.

Returns
the next code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.
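Because DONE is the value 0xFFFF, a DONE-terminated loop cannot tell the end of the range apart from a genuine U+FFFF code unit in the text. U+FFFF is a noncharacter, but it can still occur in strings. A loop controlled by hasNext() checks the position instead. The hypothetical ToySentinelIter below (not an ICU class) demonstrates the difference, assuming that hasNext() is purely position-based, as its description on this page suggests.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy (not ICU). Like the documented API, it uses 0xFFFF as DONE,
// which is also a value that may legitimately appear in the text.
class ToySentinelIter {
public:
    enum : char16_t { DONE = 0xffff };

    explicit ToySentinelIter(const std::u16string &text) : text_(text) {}

    char16_t first() {
        pos_ = 0;
        if (text_.empty()) return DONE;
        return text_[0];
    }
    char16_t next() {
        if (pos_ + 1 < text_.size()) return text_[++pos_];
        pos_ = text_.size();
        return DONE;
    }
    void setToStart() { pos_ = 0; }
    bool hasNext() const { return pos_ < text_.size(); }  // position-based
    char16_t nextPostInc() {
        if (!hasNext()) return DONE;
        return text_[pos_++];
    }

private:
    std::u16string text_;
    std::size_t pos_ = 0;
};

// Stops at the first code unit equal to DONE, even if it is real text.
std::size_t countUntilDone(ToySentinelIter &it) {
    std::size_t n = 0;
    for (char16_t c = it.first(); c != ToySentinelIter::DONE; c = it.next()) {
        ++n;
    }
    return n;
}

// Relies only on the position, so every code unit is visited.
std::size_t countWithHasNext(ToySentinelIter &it) {
    std::size_t n = 0;
    for (it.setToStart(); it.hasNext(); it.nextPostInc()) {
        ++n;
    }
    return n;
}
```

For the five-unit text "ab", U+FFFF, "cd", the DONE-terminated loop counts 2 units and the hasNext() loop counts all 5.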

◆ next32()

virtual UChar32 icu::CharacterIterator::next32 ( )
pure virtual

Advances to the next code point in the iteration range (toward endIndex()), and returns that code point.

If there are no more code points to return, returns DONE. Note that iteration with "pre-increment" semantics is less efficient than iteration with "post-increment" semantics that is provided by next32PostInc().

Returns
the next code point.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ operator=()

CharacterIterator & icu::CharacterIterator::operator= ( const CharacterIterator & that)
protected

Assignment operator.

Sets this CharacterIterator to have the same behavior as the one passed in.

Parameters
that: The CharacterIterator passed in.
Returns
the newly set CharacterIterator.
Stable:
ICU 2.0

◆ previous()

virtual char16_t icu::CharacterIterator::previous ( )
pure virtual

Advances to the previous code unit in the iteration range (toward startIndex()), and returns that code unit.

If there are no more code units to return, returns DONE.

Returns
the previous code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ previous32()

virtual UChar32 icu::CharacterIterator::previous32 ( )
pure virtual

Advances to the previous code point in the iteration range (toward startIndex()), and returns that code point.

If there are no more code points to return, returns DONE.

Returns
the previous code point.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ setIndex()

virtual char16_t icu::CharacterIterator::setIndex ( int32_t  position)
pure virtual

Sets the iterator to refer to the "position"-th code unit in the text-storage object the iterator refers to, and returns that code unit.


Parameters
position: the "position"-th code unit in the text-storage object
Returns
the "position"-th code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.

◆ setIndex32()

virtual UChar32 icu::CharacterIterator::setIndex32 ( int32_t  position)
pure virtual

Sets the iterator to refer to the beginning of the code point that contains the "position"-th code unit in the text-storage object the iterator refers to, and returns that code point.

The current position is adjusted to the beginning of the code point (its first code unit).

Parameters
position: the "position"-th code unit in the text-storage object
Returns
the code point that contains the "position"-th code unit.
Stable:
ICU 2.0

Implemented in icu::UCharCharacterIterator.
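The adjustment described above matters when position points at the second (trail) code unit of a surrogate pair. The iterator then moves back to the lead code unit and returns the whole code point. The hypothetical ToySetIndex32 class below (not ICU) models that rule over a std::u16string. Pinning out-of-range positions, and returning DONE at the end, are assumptions of this sketch.

```cpp
#include <cstdint>
#include <string>
#include <utility>

using ToyUChar32 = int32_t;  // stand-in for ICU's UChar32

// Hypothetical toy (not ICU) of setIndex32(): if the requested code-unit
// index falls on the trail half of a surrogate pair, the position is moved
// back to the lead half and the whole code point is returned.
class ToySetIndex32 {
public:
    enum : char16_t { DONE = 0xffff };

    explicit ToySetIndex32(std::u16string text) : text_(std::move(text)) {}

    ToyUChar32 setIndex32(int32_t position) {
        const int32_t len = static_cast<int32_t>(text_.size());
        // Assumption of this sketch: out-of-range positions are pinned.
        if (position < 0) position = 0;
        if (position >= len) { pos_ = len; return DONE; }
        pos_ = position;
        // Snap back to the start of the code point.
        if (pos_ > 0 && isTrail(text_[pos_]) && isLead(text_[pos_ - 1])) --pos_;
        char16_t u = text_[pos_];
        if (isLead(u) && pos_ + 1 < len && isTrail(text_[pos_ + 1])) {
            return 0x10000 + ((u - 0xD800) << 10) + (text_[pos_ + 1] - 0xDC00);
        }
        return u;
    }

    int32_t getIndex() const { return pos_; }

private:
    static bool isLead(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
    static bool isTrail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

    std::u16string text_;
    int32_t pos_ = 0;
};
```

With u"a\U0001F600b", both setIndex32(1) and setIndex32(2) return U+1F600 and leave getIndex() at 1.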

◆ setToEnd()

int32_t icu::CharacterIterator::setToEnd ( )
inline

Sets the iterator to the end of its iteration range, just behind the last code unit or code point.

This can be used to begin a backward iteration with previous() or previous32().

Returns
the end position of the iteration range
Stable:
ICU 2.0

Definition at line 706 of file chariter.h.

◆ setToStart()

int32_t icu::CharacterIterator::setToStart ( )
inline

Sets the iterator to refer to the first code unit or code point in its iteration range.

This can be used to begin a forward iteration with nextPostInc() or next32PostInc().

Returns
the start position of the iteration range
Stable:
ICU 2.0

Definition at line 701 of file chariter.h.

◆ startIndex()

int32_t icu::CharacterIterator::startIndex ( ) const
inline

Returns the numeric index in the underlying text-storage object of the character returned by first().

Since it's possible to create an iterator that iterates across only part of a text-storage object, this number isn't necessarily 0.

Returns
the numeric index in the underlying text-storage object of the character returned by first().
Stable:
ICU 2.0

Definition at line 711 of file chariter.h.
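As noted above, an iterator may cover only part of its text. All indices (startIndex(), endIndex(), getIndex()) still refer to positions in the whole text, while getLength() reports the length of the entire text. The hypothetical ToySubrangeIter below (not an ICU class) iterates over "kleiner" inside the sample text "Ein kleiner Satz." that is used elsewhere on this page.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical toy (not ICU): an iterator restricted to the range
// [begin, end) of a larger text. All indices refer to the whole text.
class ToySubrangeIter {
public:
    ToySubrangeIter(std::u16string text, int32_t begin, int32_t end)
        : text_(std::move(text)), begin_(begin), end_(end), pos_(begin) {}

    int32_t startIndex() const { return begin_; }
    int32_t endIndex() const { return end_; }
    int32_t getLength() const { return static_cast<int32_t>(text_.size()); }
    int32_t getIndex() const { return pos_; }
    int32_t setToStart() { pos_ = begin_; return pos_; }
    bool hasNext() const { return pos_ < end_; }
    char16_t nextPostInc() {
        if (!hasNext()) return 0xffff;  // DONE
        return text_[pos_++];
    }

private:
    std::u16string text_;
    int32_t begin_;
    int32_t end_;
    int32_t pos_;
};
```

Here startIndex() is 4 and endIndex() is 11, while getLength() is 17, the length of the whole sentence.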

Field Documentation

◆ begin

int32_t icu::CharacterIterator::begin
protected

Base class field for the start of the iteration range.

Stable:
ICU 2.0

Definition at line 686 of file chariter.h.

◆ end

int32_t icu::CharacterIterator::end
protected

Base class field for the end of the iteration range.

Stable:
ICU 2.0

Definition at line 692 of file chariter.h.

◆ pos

int32_t icu::CharacterIterator::pos
protected

Base class field for the current position.

Stable:
ICU 2.0

Definition at line 680 of file chariter.h.

◆ textLength

int32_t icu::CharacterIterator::textLength
protected

Base class text length field.

This field is needed for getText() and hashCode() to work correctly.

Stable:
ICU 2.0

Definition at line 674 of file chariter.h.


The documentation for this class was generated from the following file: chariter.h