ICU 76.1
SearchIterator is an abstract base class that provides methods to search for a pattern within a text string.
#include <search.h>
Public Member Functions | |
SearchIterator (const SearchIterator &other) | |
Copy constructor that creates a SearchIterator instance with the same behavior, and iterating over the same text. | |
virtual | ~SearchIterator () |
Destructor. | |
virtual void | setOffset (int32_t position, UErrorCode &status)=0 |
Sets the index to point to the given position, and clears any state that's affected. | |
virtual int32_t | getOffset () const =0 |
Return the current index in the text being searched. | |
void | setAttribute (USearchAttribute attribute, USearchAttributeValue value, UErrorCode &status) |
Sets the text searching attributes located in the enum USearchAttribute with values from the enum USearchAttributeValue. | |
USearchAttributeValue | getAttribute (USearchAttribute attribute) const |
Gets the text searching attributes. | |
int32_t | getMatchedStart () const |
Returns the index to the match in the text string that was searched. | |
int32_t | getMatchedLength () const |
Returns the length of text in the string which matches the search pattern. | |
void | getMatchedText (UnicodeString &result) const |
Returns the text that was matched by the most recent call to first , next , previous , or last . | |
void | setBreakIterator (BreakIterator *breakiter, UErrorCode &status) |
Set the BreakIterator that will be used to restrict the points at which matches are detected. | |
const BreakIterator * | getBreakIterator () const |
Returns the BreakIterator that is used to restrict the points at which matches are detected. | |
virtual void | setText (const UnicodeString &text, UErrorCode &status) |
Set the string text to be searched. | |
virtual void | setText (CharacterIterator &text, UErrorCode &status) |
Set the string text to be searched. | |
const UnicodeString & | getText () const |
Return the string text to be searched. | |
virtual bool | operator== (const SearchIterator &that) const |
Equality operator. | |
bool | operator!= (const SearchIterator &that) const |
Not-equal operator. | |
virtual SearchIterator * | safeClone () const =0 |
Returns a copy of SearchIterator with the same behavior, and iterating over the same text, as this one. | |
int32_t | first (UErrorCode &status) |
Returns the first index at which the string text matches the search pattern. | |
int32_t | following (int32_t position, UErrorCode &status) |
Returns the first index equal to or greater than position at which the string text matches the search pattern. | |
int32_t | last (UErrorCode &status) |
Returns the last index in the target text at which it matches the search pattern. | |
int32_t | preceding (int32_t position, UErrorCode &status) |
Returns the first index less than position at which the string text matches the search pattern. | |
int32_t | next (UErrorCode &status) |
Returns the index of the next point at which the text matches the search pattern, starting from the current position. The iterator is adjusted so that its current index (as returned by getOffset ) is the match position if one was found. | |
int32_t | previous (UErrorCode &status) |
Returns the index of the previous point at which the string text matches the search pattern, starting at the current position. | |
virtual void | reset () |
Resets the iteration. | |
Public Member Functions inherited from icu::UObject | |
virtual | ~UObject () |
Destructor. | |
virtual UClassID | getDynamicClassID () const |
ICU4C "poor man's RTTI", returns a UClassID for the actual ICU class. | |
Protected Member Functions | |
SearchIterator () | |
Default constructor. | |
SearchIterator (const UnicodeString &text, BreakIterator *breakiter=nullptr) | |
Constructor for use by subclasses. | |
SearchIterator (CharacterIterator &text, BreakIterator *breakiter=nullptr) | |
Constructor for use by subclasses. | |
SearchIterator & | operator= (const SearchIterator &that) |
Assignment operator. | |
virtual int32_t | handleNext (int32_t position, UErrorCode &status)=0 |
Abstract method which subclasses override to provide the mechanism for finding the next match in the target text. | |
virtual int32_t | handlePrev (int32_t position, UErrorCode &status)=0 |
Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text. | |
virtual void | setMatchLength (int32_t length) |
Sets the length of the currently matched string in the text string to be searched. | |
virtual void | setMatchStart (int32_t position) |
Sets the offset of the currently matched string in the text string to be searched. | |
void | setMatchNotFound () |
Sets the match to not found. | |
Protected Attributes | |
USearch * | m_search_ |
C search data struct. | |
BreakIterator * | m_breakiterator_ |
Break iterator. | |
UnicodeString | m_text_ |
Unicode string version of the search text. | |
SearchIterator is an abstract base class that provides methods to search for a pattern within a text string.
Instances of SearchIterator maintain a current position and scan over the target text, returning the indices at which the pattern is matched and the length of each match.
SearchIterator defines a protocol for text searching. Subclasses provide concrete implementations of various search algorithms. For example, StringSearch implements language-sensitive pattern matching based on the comparison rules defined in a RuleBasedCollator object.
Other options for searching include using a BreakIterator to restrict the points at which matches are detected.
SearchIterator provides an API that is similar to that of other text iteration classes such as BreakIterator. Using this class, it is easy to scan through text looking for all occurrences of a given pattern. The following example uses a StringSearch object to find all instances of "fox" in the target string. Any other subclass of SearchIterator can be used in an identical manner.
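#include <cstdio>
#include <unicode/stsearch.h>
using namespace icu;

UErrorCode error = U_ZERO_ERROR;
UnicodeString target("The quick brown fox jumped over the lazy fox");
UnicodeString pattern("fox");
// StringSearch also takes a locale, an optional BreakIterator, and an error code.
SearchIterator *iter = new StringSearch(pattern, target, Locale::getUS(), nullptr, error);
for (int32_t pos = iter->first(error); U_SUCCESS(error) && pos != USEARCH_DONE;
     pos = iter->next(error)) {
    printf("Found match at %d pos, length is %d\n", pos, iter->getMatchedLength());
}
delete iter;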
icu::SearchIterator::SearchIterator | ( | const SearchIterator & | other | ) |
Copy constructor that creates a SearchIterator instance with the same behavior, and iterating over the same text.
other | the SearchIterator instance to be copied. |
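virtual icu::SearchIterator::~SearchIterator | ( | ) |
virtual |
Destructor.
icu::SearchIterator::SearchIterator | ( | ) |
protected |
Default constructor.
icu::SearchIterator::SearchIterator | ( | const UnicodeString & | text, |
BreakIterator * | breakiter = nullptr | ||
) |
protected |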
Constructor for use by subclasses.
text | The target text to be searched. |
breakiter | A BreakIterator that is used to restrict the points at which matches are detected. If handleNext or handlePrev finds a match, but the match's start or end index is not a boundary as determined by the BreakIterator , the match is rejected and handleNext or handlePrev is called again. If this parameter is nullptr , no break detection is attempted. |
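icu::SearchIterator::SearchIterator | ( | CharacterIterator & | text, |
BreakIterator * | breakiter = nullptr | ||
) |
protected |
Constructor for use by subclasses.
Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.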
text | The target text to be searched. |
breakiter | A BreakIterator that is used to restrict the points at which matches are detected. If handleNext or handlePrev finds a match, but the match's start or end index is not a boundary as determined by the BreakIterator , the match is rejected and handleNext or handlePrev is called again. If this parameter is nullptr , no break detection is attempted. |
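The boundary rule described for the breakiter parameter can be illustrated with a small standalone sketch. It does not use ICU: `isWordBoundary` and `findOnBoundaries` are hypothetical helpers, a simple alphanumeric test stands in for a real BreakIterator, and -1 stands in for USEARCH_DONE. A candidate match is accepted only when both its start and its end fall on a boundary; otherwise the search continues.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a BreakIterator: an offset is a boundary when it
// sits at either end of the text or between a word and a non-word character.
inline bool isWordBoundary(const std::string &text, std::size_t offset) {
    if (offset == 0 || offset >= text.size()) return true;
    bool before = std::isalnum(static_cast<unsigned char>(text[offset - 1])) != 0;
    bool after = std::isalnum(static_cast<unsigned char>(text[offset])) != 0;
    return before != after;
}

// Returns the start of the first match at or after `from` whose start and end
// both fall on boundaries, or -1 (standing in for USEARCH_DONE) if none does.
inline int findOnBoundaries(const std::string &text, const std::string &pattern,
                            std::size_t from = 0) {
    if (pattern.empty()) return -1;
    for (std::size_t pos = text.find(pattern, from); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        if (isWordBoundary(text, pos) &&
            isWordBoundary(text, pos + pattern.size())) {
            return static_cast<int>(pos);
        }
        // Candidate rejected: keep looking, much as the base class is
        // described as calling handleNext again.
    }
    return -1;
}
```

With this model, searching "foxes and a fox" for "fox" skips the occurrence inside "foxes" because its end is not a boundary, and reports the standalone word instead.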
int32_t icu::SearchIterator::first | ( | UErrorCode & | status | ) |
Returns the first index at which the string text matches the search pattern.
The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to the index USEARCH_DONE.
status | for errors, if any occur |
Returns the index of the first match, or USEARCH_DONE if there are no matches.
int32_t icu::SearchIterator::following | ( | int32_t | position, |
UErrorCode & | status | ||
) |
Returns the first index equal to or greater than position at which the string text matches the search pattern.
The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to the index USEARCH_DONE.
position | where the search is to start from. If position is less than or greater than the text range for searching, a U_INDEX_OUTOFBOUNDS_ERROR will be returned |
status | for errors, if any occur |
Returns the index of the first match at or after position, or USEARCH_DONE if there are no matches.
USearchAttributeValue icu::SearchIterator::getAttribute | ( | USearchAttribute | attribute | ) | const |
Gets the text searching attributes.
attribute | text attribute (enum USearchAttribute) to be retrieved |
const BreakIterator * icu::SearchIterator::getBreakIterator | ( | ) | const |
Returns the BreakIterator that is used to restrict the points at which matches are detected.
This will be the same object that was passed to the constructor or to setBreakIterator. Note that nullptr is a legal value; it means that break detection should not be attempted.
int32_t icu::SearchIterator::getMatchedLength | ( | ) | const |
Returns the length of text in the string which matches the search pattern.
This call returns a valid result only after a successful call to first, next, previous, or last. Just after construction, or after a searching method returns USEARCH_DONE, this method will return 0.
int32_t icu::SearchIterator::getMatchedStart | ( | ) | const |
Returns the index to the match in the text string that was searched.
This call returns a valid result only after a successful call to first, next, previous, or last. Just after construction, or after a searching method returns USEARCH_DONE, this method will return USEARCH_DONE.
Use getMatchedLength to get the matched string length.
void icu::SearchIterator::getMatchedText | ( | UnicodeString & | result | ) | const |
Returns the text that was matched by the most recent call to first, next, previous, or last.
If the iterator is not pointing at a valid match (e.g. just after construction or after USEARCH_DONE has been returned), returns an empty string.
result | stores the matched string or an empty string if a match is not found. |
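The match-state rules stated for getMatchedStart, getMatchedLength and getMatchedText can be summarized in a small standalone model. This is a sketch only, not ICU code: `MatchState`, `setMatch` and `matchedText` are hypothetical names, and `kDone` is assumed to equal -1, the value of USEARCH_DONE.

```cpp
#include <cstddef>
#include <string>

constexpr int kDone = -1;  // assumed to mirror USEARCH_DONE

// Hypothetical model of the match bookkeeping described above.
struct MatchState {
    std::string text;
    int start = kDone;  // what getMatchedStart would report
    int length = 0;     // what getMatchedLength would report

    // Record a successful match.
    void setMatch(int s, int len) { start = s; length = len; }

    // Return to the "no current match" state.
    void setMatchNotFound() { start = kDone; length = 0; }

    // Mirrors getMatchedText: empty when no valid match is current.
    std::string matchedText() const {
        if (start == kDone) return std::string();
        return text.substr(static_cast<std::size_t>(start),
                           static_cast<std::size_t>(length));
    }
};
```

Just after construction, and again after `setMatchNotFound`, the model reports a start of `kDone`, a length of 0 and an empty matched text, which is the behavior the three accessors document.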
virtual int32_t icu::SearchIterator::getOffset | ( | ) | const |
pure virtual |
Return the current index in the text being searched.
If the iteration has gone past the end of the text (or past the beginning for a backwards search), USEARCH_DONE is returned.
Implemented in icu::StringSearch.
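const UnicodeString & icu::SearchIterator::getText | ( | ) | const |
Return the string text to be searched.
virtual int32_t icu::SearchIterator::handleNext | ( | int32_t | position, |
UErrorCode & | status | ||
) |
protected pure virtual |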
Abstract method which subclasses override to provide the mechanism for finding the next match in the target text.
This allows different subclasses to provide different search algorithms.
If a match is found, the implementation should return the index at which the match starts and should call setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method should return USEARCH_DONE.
position | The index in the target text at which the search should start. |
status | for error codes, if any occur. |
Implemented in icu::StringSearch.
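The handleNext contract can be sketched with a self-contained model that does not depend on ICU. `ToySearchBase` and `LiteralSearch` are hypothetical classes, `kSearchDone` is assumed to equal -1 like USEARCH_DONE, and plain `std::string::find` stands in for a real search algorithm. The base class drives iteration; the subclass only locates the next match, reports its length through `setMatchLength`, and returns its start.

```cpp
#include <cstddef>
#include <string>
#include <utility>

constexpr int kSearchDone = -1;  // assumed to mirror USEARCH_DONE

// Hypothetical base class (not ICU's): next() drives iteration and delegates
// the actual matching to handleNext().
class ToySearchBase {
public:
    explicit ToySearchBase(std::string text) : text_(std::move(text)) {}
    virtual ~ToySearchBase() = default;

    int next() {
        matchLength_ = 0;  // cleared so a failed search reports length 0
        int start = handleNext(offset_);
        if (start == kSearchDone) {
            offset_ = static_cast<int>(text_.size());  // park past the end
        } else {
            // Resume after the match; step by one if the length is zero.
            offset_ = start + (matchLength_ > 0 ? matchLength_ : 1);
        }
        return start;
    }
    int matchedLength() const { return matchLength_; }

protected:
    virtual int handleNext(int position) = 0;
    void setMatchLength(int length) { matchLength_ = length; }
    const std::string &text() const { return text_; }

private:
    std::string text_;
    int offset_ = 0;
    int matchLength_ = 0;
};

// Subclass supplying the search algorithm: an exact substring search.
class LiteralSearch : public ToySearchBase {
public:
    LiteralSearch(std::string pattern, std::string text)
        : ToySearchBase(std::move(text)), pattern_(std::move(pattern)) {}

protected:
    int handleNext(int position) override {
        if (pattern_.empty()) return kSearchDone;
        std::size_t found =
            text().find(pattern_, static_cast<std::size_t>(position));
        if (found == std::string::npos) return kSearchDone;
        setMatchLength(static_cast<int>(pattern_.size()));  // match length
        return static_cast<int>(found);                      // match start
    }

private:
    std::string pattern_;
};
```

Keeping the iteration state in the base class means a different algorithm only needs a different `handleNext`, which is the division of labor the description above sets out.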
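virtual int32_t icu::SearchIterator::handlePrev | ( | int32_t | position, |
UErrorCode & | status | ||
) |
protected pure virtual |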
Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text.
This allows different subclasses to provide different search algorithms.
If a match is found, the implementation should return the index at which the match starts and should call setMatchLength with the number of characters in the target text that make up the match. If no match is found, the method should return USEARCH_DONE.
position | The index in the target text at which the search should start. |
status | for error codes, if any occur. |
Implemented in icu::StringSearch.
int32_t icu::SearchIterator::last | ( | UErrorCode & | status | ) |
Returns the last index in the target text at which it matches the search pattern.
The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to the index USEARCH_DONE.
status | for errors, if any occur |
Returns the index of the last match, or USEARCH_DONE if there are no matches.
int32_t icu::SearchIterator::next | ( | UErrorCode & | status | ) |
Returns the index of the next point at which the text matches the search pattern, starting from the current position. The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found.
If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to a position after the end of the text string.
status | for errors, if any occur |
Returns the index of the next match after the current position, or USEARCH_DONE if there are no more matches.
bool icu::SearchIterator::operator!= | ( | const SearchIterator & | that | ) | const |
inline |
Not-equal operator.
that | SearchIterator instance to be compared. |
Definition at line 569 of file search.h.
References icu::operator==().
SearchIterator & icu::SearchIterator::operator= | ( | const SearchIterator & | that | ) |
protected |
Assignment operator.
Sets this iterator to have the same behavior, and iterate over the same text, as the one passed in.
that | instance to be copied. |
virtual bool icu::SearchIterator::operator== | ( | const SearchIterator & | that | ) | const |
virtual |
Equality operator.
that | SearchIterator instance to be compared. |
Reimplemented in icu::StringSearch.
int32_t icu::SearchIterator::preceding | ( | int32_t | position, |
UErrorCode & | status | ||
) |
Returns the first index less than position at which the string text matches the search pattern.
The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to the index USEARCH_DONE.
When the USEARCH_OVERLAP option is off, the last index of the result match is always less than position. When USEARCH_OVERLAP is on, the result match may span across position.
position | where the search is to start from. If position is less than or greater than the text range for searching, a U_INDEX_OUTOFBOUNDS_ERROR will be returned |
status | for errors, if any occur |
Returns the index of the first match preceding position, or USEARCH_DONE if there are no matches.
int32_t icu::SearchIterator::previous | ( | UErrorCode & | status | ) |
Returns the index of the previous point at which the string text matches the search pattern, starting at the current position.
The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the iterator will be adjusted to the index USEARCH_DONE.
status | for errors, if any occur |
Returns the index of the previous match before the current position, or USEARCH_DONE if there are no more matches.
virtual void icu::SearchIterator::reset | ( | ) |
virtual |
Resets the iteration.
Search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string.
Reimplemented in icu::StringSearch.
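The direction rule for reset can be modeled in a few lines of standalone C++. This is a simplified sketch, not ICU's implementation: `ToyDirectionalSearch` is a hypothetical class doing exact substring matching, `kIterDone` is assumed to equal -1 like USEARCH_DONE, and direction changes in the middle of an iteration are deliberately kept naive. It only shows that the first call after `reset` decides whether the search starts at the beginning or at the end of the text.

```cpp
#include <cstddef>
#include <string>
#include <utility>

constexpr int kIterDone = -1;  // assumed to mirror USEARCH_DONE

// Hypothetical toy iterator (not an ICU class) over exact substring matches.
class ToyDirectionalSearch {
public:
    ToyDirectionalSearch(std::string pattern, std::string text)
        : pattern_(std::move(pattern)), text_(std::move(text)) {}

    // Forget the current position; the next call decides where to start.
    void reset() { fresh_ = true; current_ = kIterDone; }

    // Forward step: starts at the beginning of the text when fresh.
    int next() {
        std::size_t from = 0;
        if (!fresh_) {
            if (current_ == kIterDone) return kIterDone;
            from = static_cast<std::size_t>(current_) + pattern_.size();
        }
        fresh_ = false;
        return record(pattern_.empty() ? std::string::npos
                                       : text_.find(pattern_, from));
    }

    // Backward step: starts at the end of the text when fresh.
    int previous() {
        std::size_t upTo = std::string::npos;
        if (!fresh_) {
            if (current_ == kIterDone) return kIterDone;
            if (static_cast<std::size_t>(current_) < pattern_.size()) {
                return record(std::string::npos);  // no room for another match
            }
            upTo = static_cast<std::size_t>(current_) - pattern_.size();
        }
        fresh_ = false;
        return record(pattern_.empty() ? std::string::npos
                                       : text_.rfind(pattern_, upTo));
    }

private:
    // Store the result of a search and translate npos to kIterDone.
    int record(std::size_t found) {
        current_ = (found == std::string::npos) ? kIterDone
                                                : static_cast<int>(found);
        return current_;
    }

    std::string pattern_;
    std::string text_;
    bool fresh_ = true;
    int current_ = kIterDone;
};
```

In this model, calling `next` right after `reset` yields the first match in the text, while calling `previous` right after `reset` yields the last one.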
virtual SearchIterator * icu::SearchIterator::safeClone | ( | ) | const |
pure virtual |
Returns a copy of SearchIterator with the same behavior, and iterating over the same text, as this one.
Note that all data will be replicated, except for the text string to be searched.
Implemented in icu::StringSearch.
void icu::SearchIterator::setAttribute | ( | USearchAttribute | attribute, |
USearchAttributeValue | value, | ||
UErrorCode & | status | ||
) |
Sets the text searching attributes located in the enum USearchAttribute with values from the enum USearchAttributeValue.
USEARCH_DEFAULT can be used for all attributes for resetting.
attribute | text attribute (enum USearchAttribute) to be set |
value | text attribute value |
status | for errors, if any occur |
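One attribute named elsewhere on this page is USEARCH_OVERLAP. As a rough illustration of what switching it on or off means for iteration, the standalone sketch below enumerates exact substring matches with and without overlap. It is a model under stated assumptions, not ICU code: `matchStarts` is a hypothetical helper, and its `overlap` flag merely plays the role of the attribute value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects match start offsets. With `overlap` set, the search resumes one
// position after each match start; otherwise it resumes after the match end.
inline std::vector<int> matchStarts(const std::string &text,
                                    const std::string &pattern, bool overlap) {
    std::vector<int> starts;
    if (pattern.empty()) return starts;
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        starts.push_back(static_cast<int>(pos));
        pos = text.find(pattern, pos + (overlap ? 1 : pattern.size()));
    }
    return starts;
}
```

For the text "aaaa" and the pattern "aa", the non-overlapping mode in this model reports starts 0 and 2, while the overlapping mode reports 0, 1 and 2.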
void icu::SearchIterator::setBreakIterator | ( | BreakIterator * | breakiter, |
UErrorCode & | status | ||
) |
Set the BreakIterator that will be used to restrict the points at which matches are detected.
The user is responsible for deleting the BreakIterator.
breakiter | A BreakIterator that will be used to restrict the points at which matches are detected. If a match is found, but the match's start or end index is not a boundary as determined by the BreakIterator , the match will be rejected and another will be searched for. If this parameter is nullptr , no break detection is attempted. |
status | for errors, if any occur |
virtual void icu::SearchIterator::setMatchLength | ( | int32_t | length | ) |
protected virtual |
Sets the length of the currently matched string in the text string to be searched.
Subclasses' handleNext and handlePrev methods should call this when they find a match in the target text.
length | length of the matched text. |
virtual void icu::SearchIterator::setMatchStart | ( | int32_t | position | ) |
protected virtual |
Sets the offset of the currently matched string in the text string to be searched.
Subclasses' handleNext and handlePrev methods should call this when they find a match in the target text.
position | start offset of the matched text. |
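virtual void icu::SearchIterator::setOffset | ( | int32_t | position, |
UErrorCode & | status | ||
) |
pure virtual |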
Sets the index to point to the given position, and clears any state that's affected.
This method takes the argument index and sets the position in the text string accordingly without checking if the index is pointing to a valid starting point to begin searching.
position | within the text to be set. If position is less than or greater than the text range for searching, a U_INDEX_OUTOFBOUNDS_ERROR will be returned |
status | for errors, if any occur |
Implemented in icu::StringSearch.
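virtual void icu::SearchIterator::setText | ( | CharacterIterator & | text, |
UErrorCode & | status | ||
) |
virtual |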
Set the string text to be searched.
Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text.
Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is. The user is responsible for deleting the text.
text | string iterator to be searched. |
status | for errors, if any. If the text length is 0, a U_ILLEGAL_ARGUMENT_ERROR is returned. |
Reimplemented in icu::StringSearch.
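virtual void icu::SearchIterator::setText | ( | const UnicodeString & | text, |
UErrorCode & | status | ||
) |
virtual |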
Set the string text to be searched.
Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text. The user is responsible for deleting the text.
text | string to be searched. |
status | for errors. If the text length is 0, a U_ILLEGAL_ARGUMENT_ERROR is returned. |
Reimplemented in icu::StringSearch.
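BreakIterator * icu::SearchIterator::m_breakiterator_ |
protected |
Break iterator.
USearch * icu::SearchIterator::m_search_ |
protected |
C search data struct.
UnicodeString icu::SearchIterator::m_text_ |
protected |
Unicode string version of the search text.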