public final class StringSearch extends SearchIterator
StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.
StringSearch handles language-specific peculiarities. For example, with the German collator, the characters ß and SS will be matched if case is chosen to be ignored.
See the "ICU Collation Design Document" for more information.
There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end [start, end].
A pattern string P matches a text string S at the offsets [start, end] if
option 1. Some canonical equivalent of P matches some canonical equivalent of S'
option 2. P matches S' and, if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S respectively.
Option 2 is the default.
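The notion of "canonical equivalent" used by option 1 can be illustrated without ICU. The following is a minimal sketch that uses the JDK's java.text.Normalizer rather than StringSearch itself; the class and method names are hypothetical and chosen only for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of ICU: shows what "canonically equivalent" means.
class CanonicalEquivalenceSketch {
    /** Returns true if a and b have the same canonical decomposition (NFD). */
    static boolean canonicallyEquivalent(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFD);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // é as a single code point
        String decomposed = "e\u0301"; // e followed by a combining acute accent
        System.out.println(precomposed.equals(decomposed));                 // prints false
        System.out.println(canonicallyEquivalent(precomposed, decomposed)); // prints true
    }
}
```

Under option 1, a pattern and a stretch of text that differ only in this way are candidates for a match. Under the default option 2, the additional combining-mark condition applies.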
This search has APIs similar to those of other text iteration mechanisms, such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. This search iterator allows changing of direction by calling reset() followed by SearchIterator.next() or SearchIterator.previous(). Though a direction change can occur without calling reset() first, this operation comes with some speed penalty. Match results in the forward direction will match the results in the backward direction, in reverse order.
SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setIndex, preceding and following. Since the starting position will be set exactly as it is specified, please take note that there are some danger points at which the search may render incorrect results.
A BreakIterator can be used if only matches at logical breaks are desired. Using a BreakIterator will only give you results that exactly match the boundaries given by the BreakIterator. For instance, the pattern "e" will not be found in the string "é" if a character break iterator is used.
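The boundary filtering described above can be sketched with the JDK's java.text.BreakIterator standing in for ICU's BreakIterator. This is an illustration of the idea only, not how StringSearch is implemented. The helper name isLogicalMatch is hypothetical, and the text is assumed to hold "é" in decomposed form (e followed by U+0301).

```java
import java.text.BreakIterator;

// Hypothetical helper, not part of ICU: accepts a candidate match only if
// both of its ends coincide with boundaries reported by the break iterator.
class BoundaryCheckSketch {
    static boolean isLogicalMatch(BreakIterator boundaries, String text, int start, int end) {
        boundaries.setText(text);
        return boundaries.isBoundary(start) && boundaries.isBoundary(end);
    }

    public static void main(String[] args) {
        String text = "e\u0301"; // "é" written as e + combining acute accent
        BreakIterator chars = BreakIterator.getCharacterInstance();
        // A candidate match for the pattern "e" would cover offsets [0, 1).
        System.out.println(isLogicalMatch(chars, text, 0, 1));
        // The whole accented character covers offsets [0, 2).
        System.out.println(isLogicalMatch(chars, text, 0, 2));
    }
}
```

With a character break iterator, offset 1 falls inside a single user-perceived character, so the candidate [0, 1) is expected to be rejected while [0, 2) is accepted.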
Options are provided to handle overlapping matches. For example, in English, overlapping matches produce the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matches produce only the result 0.
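The difference between overlapping and mutually exclusive matches can be reproduced with plain String.indexOf. The sketch below does not use StringSearch or any collation rules. The class and method names are hypothetical, and only the rule for where the next search resumes is illustrated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICU: enumerates match start offsets.
class OverlapSketch {
    static List<Integer> matchStarts(String text, String pattern, boolean overlapping) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = text.indexOf(pattern, from);
            if (at < 0) {
                break;
            }
            starts.add(at);
            // Overlapping: resume one position after the match start.
            // Mutually exclusive: resume after the end of the match.
            from = overlapping ? at + 1 : at + pattern.length();
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("ababab", "abab", true));  // prints [0, 2]
        System.out.println(matchStarts("ababab", "abab", false)); // prints [0]
    }
}
```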
Options are also provided to implement "asymmetric search" as described in UTS #10 Unicode Collation Algorithm, specifically the ElementComparisonType values.
Though collator attributes will be taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator() and using the APIs in RuleBasedCollator. Lastly, to update StringSearch to the new collator attributes, reset() has to be called.
Restriction:
Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch does not guarantee the results for option 1.
Consult the SearchIterator documentation for information on and examples of how to use instances of this class to implement text searching.
Note, StringSearch is not to be subclassed.
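See Also:
SearchIterator, RuleBasedCollator
Nested classes inherited from class SearchIterator:
SearchIterator.ElementComparisonType
Fields inherited from class SearchIterator:
breakIterator, DONE, matchLength, targetText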
Constructor and Description |
---|
StringSearch(String pattern, CharacterIterator target, Locale locale) Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. |
StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator) Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. |
StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator, BreakIterator breakiter) Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. |
StringSearch(String pattern, CharacterIterator target, ULocale locale) Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. |
StringSearch(String pattern, String target) Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text. |
Modifier and Type | Method and Description |
---|---|
RuleBasedCollator | getCollator() Gets the RuleBasedCollator used for the language rules. |
int | getIndex() Return the current index in the text being searched. |
String | getPattern() Returns the pattern for which StringSearch is searching. |
protected int | handleNext(int position) Abstract method which subclasses override to provide the mechanism for finding the next match in the target text. |
protected int | handlePrevious(int position) Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text. |
boolean | isCanonical() Determines whether canonical match mode (option 1, as described in the class documentation) is set. |
void | reset() Resets the iteration. |
void | setCanonical(boolean allowCanonical) Set the canonical match mode. |
void | setCollator(RuleBasedCollator collator) Sets the RuleBasedCollator to be used for language-specific searching. |
void | setIndex(int position) Sets the position in the target text at which the next search will start. |
protected void | setMatchNotFound() Deprecated. This API is ICU internal only. |
void | setPattern(String pattern) Set the pattern to search for. |
void | setTarget(CharacterIterator text) Set the target text to be searched. |
Methods inherited from class SearchIterator:
first, following, getBreakIterator, getElementComparisonType, getMatchedText, getMatchLength, getMatchStart, getTarget, isOverlapping, last, next, preceding, previous, setBreakIterator, setElementComparisonType, setMatchLength, setOverlapping
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator, BreakIterator breakiter)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. The argument breakiter is used to define logical matches. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
pattern - text to look for.
target - target text to search for pattern.
collator - RuleBasedCollator that defines the language rules
breakiter - A BreakIterator that is used to determine the boundaries of a logical match. This argument can be null.
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0
See Also:
BreakIterator, RuleBasedCollator
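public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. No BreakIterators are set to test for logical matches.
Parameters:
pattern - text to look for.
target - target text to search for pattern.
collator - RuleBasedCollator that defines the language rules
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0
See Also:
RuleBasedCollator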
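public StringSearch(String pattern, CharacterIterator target, Locale locale)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text.
Parameters:
pattern - text to look for.
target - target text to search for pattern.
locale - locale to use for language and break iterator rules
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0. ClassCastException thrown if the collator for the specified locale is not a RuleBasedCollator.
public StringSearch(String pattern, CharacterIterator target, ULocale locale)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
pattern - text to look for.
target - target text to search for pattern.
locale - locale to use for language and break iterator rules
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0. ClassCastException thrown if the collator for the specified locale is not a RuleBasedCollator.
See Also:
BreakIterator, RuleBasedCollator, SearchIterator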
public StringSearch(String pattern, String target)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text.
Parameters:
pattern - text to look for.
target - target text to search for pattern.
Throws:
IllegalArgumentException - thrown when argument target is null, or of length 0. ClassCastException thrown if the collator for the default locale is not a RuleBasedCollator.
public RuleBasedCollator getCollator()
Gets the RuleBasedCollator used for the language rules.
Since StringSearch depends on the returned RuleBasedCollator, any changes to the RuleBasedCollator result should be followed by a call to either reset() or setCollator(RuleBasedCollator) to ensure the correct search behavior.
Returns:
RuleBasedCollator used by this StringSearch
See Also:
RuleBasedCollator, setCollator(com.ibm.icu.text.RuleBasedCollator)
public void setCollator(RuleBasedCollator collator)
Sets the RuleBasedCollator to be used for language-specific searching.
The iterator's position will not be changed by this method.
Parameters:
collator - to use for this StringSearch
Throws:
IllegalArgumentException - thrown when collator is null
See Also:
getCollator()
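public String getPattern()
Returns the pattern for which StringSearch is searching.
public void setPattern(String pattern)
Set the pattern to search for.
Parameters:
pattern - for searching
Throws:
IllegalArgumentException - thrown if pattern is null or of length 0
See Also:
getPattern()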
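public boolean isCanonical()
Determines whether canonical match mode (option 1, as described in the class documentation) is set.
See Also:
setCanonical(boolean)
public void setCanonical(boolean allowCanonical)
Set the canonical match mode.
Parameters:
allowCanonical - flag indicating whether canonical matches are allowed
See Also:
isCanonical()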
public void setTarget(CharacterIterator text)
Set the target text to be searched.
Overrides:
setTarget in class SearchIterator
Parameters:
text - new text iterator in which to look for matches
See Also:
SearchIterator.getTarget()
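public int getIndex()
Return the current index in the text being searched. If the iteration has gone past the end of the text (or past the beginning for a backwards search), SearchIterator.DONE is returned.
Specified by:
getIndex in class SearchIterator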
public void setIndex(int position)
Sets the position in the target text at which the next search will start. This method clears any previous match.
Overrides:
setIndex in class SearchIterator
Parameters:
position - position from which to start the next search
See Also:
SearchIterator.getIndex()
public void reset()
Resets the iteration.
Overrides:
reset in class SearchIterator
protected int handleNext(int position)
Abstract method which subclasses override to provide the mechanism for finding the next match in the target text.
If a match is found, the implementation should return the index at which the match starts and should call SearchIterator.setMatchLength(int) with the number of characters in the target text that make up the match. If no match is found, the method should return SearchIterator.DONE.
Specified by:
handleNext in class SearchIterator
Parameters:
position - The index in the target text at which the search should start.
Returns:
the index at which the match starts; if no match is found, SearchIterator.DONE is returned
See Also:
SearchIterator.setMatchLength(int)
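The handleNext contract (return the match start and record the match length, or return DONE) can be sketched with a toy class. This is not ICU's SearchIterator: HandleNextSketch, its fields, and its plain String.indexOf comparison are all hypothetical stand-ins, the value -1 for DONE is an assumption of the sketch, and resetting the match length to 0 on failure is also an assumption.

```java
// Toy stand-in for the handleNext contract; it performs exact code-unit
// matching and uses no collation rules.
class HandleNextSketch {
    static final int DONE = -1; // plays the role of SearchIterator.DONE (value assumed)

    private final String pattern;
    private final String target;
    private int matchLength;

    HandleNextSketch(String pattern, String target) {
        this.pattern = pattern;
        this.target = target;
    }

    // Plays the role of SearchIterator.setMatchLength(int).
    void setMatchLength(int length) {
        this.matchLength = length;
    }

    int getMatchLength() {
        return matchLength;
    }

    /** Finds the next match at or after position, following the documented contract. */
    int handleNext(int position) {
        int start = target.indexOf(pattern, position);
        if (start < 0) {
            setMatchLength(0); // assumption: clear the length when nothing matches
            return DONE;
        }
        setMatchLength(pattern.length());
        return start;
    }

    public static void main(String[] args) {
        HandleNextSketch search = new HandleNextSketch("abab", "ababab");
        System.out.println(search.handleNext(0)); // prints 0
        System.out.println(search.getMatchLength()); // prints 4
        System.out.println(search.handleNext(3)); // prints -1
    }
}
```

In a collation-based search, the match length reported in the target can differ from the pattern length, which is why the contract asks the implementation to report it explicitly.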
protected int handlePrevious(int position)
Abstract method which subclasses override to provide the mechanism for finding the previous match in the target text.
If a match is found, the implementation should return the index at which the match starts and should call SearchIterator.setMatchLength(int) with the number of characters in the target text that make up the match. If no match is found, the method should return SearchIterator.DONE.
Specified by:
handlePrevious in class SearchIterator
Parameters:
position - The index in the target text at which the search should start.
Returns:
the index at which the match starts; if no match is found, SearchIterator.DONE is returned
See Also:
SearchIterator.setMatchLength(int)
@Deprecated protected void setMatchNotFound()
Deprecated. This API is ICU internal only.
Overrides:
setMatchNotFound in class SearchIterator
Copyright © 2016 Unicode, Inc. and others.