public class RuleBasedBreakIterator extends BreakIterator
Modifier and Type | Field and Description |
---|---|
static String | fDebugEnv: Deprecated. This API is ICU internal only. |
com.ibm.icu.impl.RBBIDataWrapper | fRData: Deprecated. This API is ICU internal only. |
Fields inherited from class BreakIterator: DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD, WORD_IDEO, WORD_IDEO_LIMIT, WORD_KANA, WORD_KANA_LIMIT, WORD_LETTER, WORD_LETTER_LIMIT, WORD_NONE, WORD_NONE_LIMIT, WORD_NUMBER, WORD_NUMBER_LIMIT
Constructor and Description |
---|
RuleBasedBreakIterator(String rules): Construct a RuleBasedBreakIterator from a set of rules supplied as a string. |
Modifier and Type | Method and Description |
---|---|
protected static void | checkOffset(int offset, CharacterIterator text): Throw IllegalArgumentException unless begin <= offset < end. |
Object | clone(): Clones this iterator. |
static void | compileRules(String rules, OutputStream ruleBinary): Compile a set of source break rules into the binary state tables used by the break iterator engine. |
int | current(): Returns the current iteration position. |
void | dump(PrintStream out): Deprecated. This API is ICU internal only. |
boolean | equals(Object that): Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text. |
int | first(): Sets the current iteration position to the beginning of the text. |
int | following(int startPos): Sets the iterator to refer to the first boundary position following the specified position. |
static RuleBasedBreakIterator | getInstanceFromCompiledRules(ByteBuffer bytes): Deprecated. This API is ICU internal only. |
static RuleBasedBreakIterator | getInstanceFromCompiledRules(InputStream is): Create a break iterator from a precompiled set of break rules. |
int | getRuleStatus(): Return the status tag from the break rule that determined the boundary at the current iteration position. |
int | getRuleStatusVec(int[] fillInArray): Get the status (tag) values from the break rule(s) that determined the boundary at the current iteration position. |
CharacterIterator | getText(): Returns a CharacterIterator over the text being analyzed. |
int | hashCode(): Compute a hashcode for this BreakIterator. |
boolean | isBoundary(int offset): Returns true if the specified position is a boundary position. |
int | last(): Sets the current iteration position to the end of the text. |
int | next(): Advances the iterator to the next boundary position. |
int | next(int n): Advances the iterator either forward or backward the specified number of steps. |
int | preceding(int offset): Sets the iterator to refer to the last boundary position before the specified position. |
int | previous(): Moves the iterator backwards, to the boundary preceding the current one. |
void | setText(CharacterIterator newText): Set the iterator to analyze a new piece of text. |
String | toString(): Returns the description (rules) used to create this iterator. |
Methods inherited from class BreakIterator: getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, registerInstance, registerInstance, setText, setText, unregister
@Deprecated public com.ibm.icu.impl.RBBIDataWrapper fRData
@Deprecated public static final String fDebugEnv
public RuleBasedBreakIterator(String rules)
Parameters:
rules - The break rules to be used.
public static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is) throws IOException
Parameters:
is - an input stream supplying the compiled binary rules.
Throws:
IOException - if there is an error while reading the rules from the InputStream.
See Also: compileRules(String, OutputStream)
@Deprecated public static RuleBasedBreakIterator getInstanceFromCompiledRules(ByteBuffer bytes) throws IOException
Parameters:
bytes - a buffer supplying the compiled binary rules.
Throws:
IOException - if there is an error while reading the rules from the buffer.
See Also: compileRules(String, OutputStream)
public Object clone()
Overrides: clone in class BreakIterator
public boolean equals(Object that)
public String toString()
public int hashCode()
@Deprecated public void dump(PrintStream out)
public static void compileRules(String rules, OutputStream ruleBinary) throws IOException
Parameters:
rules - The source form of the break rules.
ruleBinary - An output stream to receive the compiled rules.
Throws:
IOException - If there is an error writing the output.
See Also: getInstanceFromCompiledRules(InputStream)
public int first()
Specified by: first in class BreakIterator
public int last()
Specified by: last in class BreakIterator
public int next(int n)
Specified by: next in class BreakIterator
Parameters:
n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
public int next()
Specified by: next in class BreakIterator
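Forward iteration follows a simple protocol: first() returns the start offset, and each call to next() returns the following boundary until it returns DONE. The sketch below shows that loop. It is written against the JDK's `java.text.BreakIterator`, whose navigation methods closely mirror the ICU ones documented here, so that it runs without ICU4J on the classpath; the class `ForwardWalk` is hypothetical. Assuming ICU4J is available, importing `com.ibm.icu.text.BreakIterator` instead should leave the loop unchanged, although the exact boundaries can differ because ICU and the JDK use different rule sets.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; java.text.BreakIterator stands in for the ICU class here.
final class ForwardWalk {
    /** Returns the pieces of text between consecutive word boundaries, in order. */
    static List<String> segments(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> pieces = new ArrayList<>();
        // first() moves to the start of the text and returns that offset.
        int start = it.first();
        // next() returns each following boundary, and DONE when there is no further boundary.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(segments("one two"));
    }
}
```

For "one two" the loop collects "one", " " and "two": whitespace and punctuation come back as segments of their own, so callers that want only words filter them out, which in ICU is what getRuleStatus() is for.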
public int previous()
Specified by: previous in class BreakIterator
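Iteration can also run backwards: last() positions the iterator at the end of the text, and previous() steps back one boundary at a time, returning DONE from the first boundary. next(int n) takes the same steps in bulk, with the sign of n choosing the direction. This sketch again uses the JDK's `java.text.BreakIterator` as a stand-in for the ICU class so that it runs without ICU4J; `BackwardWalk` and its methods are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; java.text.BreakIterator stands in for the ICU class here.
final class BackwardWalk {
    private static BreakIterator wordIterator(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        return it;
    }

    /** Lists the boundaries from the end of the text back to the start. */
    static List<Integer> reversedBoundaries(String text) {
        BreakIterator it = wordIterator(text);
        List<Integer> out = new ArrayList<>();
        // last() moves to the end of the text; previous() returns DONE when called at the first boundary.
        for (int b = it.last(); b != BreakIterator.DONE; b = it.previous()) {
            out.add(b);
        }
        return out;
    }

    /** Moves |n| boundaries with next(int): forward from the start if n >= 0, backward from the end if n < 0. */
    static int step(String text, int n) {
        BreakIterator it = wordIterator(text);
        if (n >= 0) {
            it.first();
        } else {
            it.last();
        }
        return it.next(n); // a negative n moves backwards
    }

    public static void main(String[] args) {
        System.out.println(reversedBoundaries("one two three"));
        System.out.println(step("one two three", 2) + " " + step("one two three", -2));
    }
}
```

For "one two three" the reversed boundaries are 13, 8, 7, 4, 3, 0; step(text, 2) returns 4 and step(text, -2) returns 7.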
public int following(int startPos)
Specified by: following in class BreakIterator
Parameters:
startPos - The position from which to begin searching for a break position.
public int preceding(int offset)
Overrides: preceding in class BreakIterator
Parameters:
offset - The position to begin searching for a break from.
protected static final void checkOffset(int offset, CharacterIterator text)
public boolean isBoundary(int offset)
Overrides: isBoundary in class BreakIterator
Parameters:
offset - the offset to check.
public int current()
Specified by: current in class BreakIterator
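following(), preceding() and isBoundary() answer questions about arbitrary offsets instead of walking from one end. A common use is finding the segment that contains a given character, such as the word under a cursor. This sketch assumes 0 <= offset < text.length() and uses the JDK's `java.text.BreakIterator` in place of the ICU class so that it runs without ICU4J; `LocateSegment` and its methods are hypothetical. Because preceding() looks strictly before its argument, it is called with offset + 1 so that a boundary located exactly at offset is still found.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper; java.text.BreakIterator stands in for the ICU class here.
final class LocateSegment {
    /** Returns the segment containing the character at offset (assumes 0 <= offset < text.length()). */
    static String segmentAt(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        // preceding(x) returns the last boundary strictly before x, so offset + 1
        // yields the last boundary at or before offset.
        int start = it.preceding(offset + 1);
        // following(x) returns the first boundary strictly after x and makes it
        // the current position, so it.current() would now also return end.
        int end = it.following(offset);
        return text.substring(start, end);
    }

    /** Reports whether offset falls exactly on a word boundary. */
    static boolean atBoundary(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        return it.isBoundary(offset);
    }

    public static void main(String[] args) {
        System.out.println(segmentAt("one two three", 5));
    }
}
```

With "one two three", segmentAt(text, 5) returns "two", atBoundary(text, 4) is true, and atBoundary(text, 5) is false.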
public int getRuleStatus()
Of the standard types of ICU break iterators, only the word and line break iterators provide status values. The values are defined in class RuleBasedBreakIterator, and allow distinguishing between words that contain alphabetic letters, "words" that appear to be numbers, punctuation and spaces, words containing ideographic characters, and more. Call getRuleStatus() after obtaining a boundary position from next(), previous(), or any other break iterator function that returns a boundary position.
Note that getRuleStatus() returns the value corresponding to the current() index even after next() has returned DONE.
Overrides: getRuleStatus in class BreakIterator
public int getRuleStatusVec(int[] fillInArray)
The status values used by the standard ICU break rules are defined as public constants in class RuleBasedBreakIterator.
If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception will be thrown.
Overrides: getRuleStatusVec in class BreakIterator
Parameters:
fillInArray - an array to be filled in with the status values.
public CharacterIterator getText()
Caution: The state of the returned CharacterIterator must not be modified in any way while the BreakIterator is still in use. Doing so will lead to undefined behavior of the BreakIterator. Clone the returned CharacterIterator first and work with that.
The returned CharacterIterator is a reference to the actual iterator being used by the BreakIterator. No guarantees are made about the current position of this iterator when it is returned; it may differ from the BreakIterator's current position. If you need to move that position to examine the text, clone this function's return value first.
Specified by: getText in class BreakIterator
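The cautions on getText() and setText(CharacterIterator) come down to one rule: the break iterator owns its CharacterIterator. This sketch reads the text back through a clone of getText(), so the iterator's own instance is never moved, and gives setText a fresh StringCharacterIterator that no other code holds. It uses the JDK's `java.text.BreakIterator` in place of the ICU class so that it runs without ICU4J; `TextAccess` and its methods are hypothetical.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper; java.text.BreakIterator stands in for the ICU class here.
final class TextAccess {
    /** Copies the analyzed text by walking a clone of getText(), never the iterator's own instance. */
    static String copyText(BreakIterator it) {
        CharacterIterator view = (CharacterIterator) it.getText().clone();
        StringBuilder sb = new StringBuilder();
        // Start from first(): the position of the returned iterator is not guaranteed.
        for (char c = view.first(); c != CharacterIterator.DONE; c = view.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns {boundary after offset 0, current() after copying, length of the copied text}. */
    static int[] demo(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        // Hand the break iterator its own CharacterIterator and keep no other reference to it.
        it.setText(new StringCharacterIterator(text));
        int boundary = it.following(0);
        String copy = copyText(it);
        // Only the clone was moved, so the break iterator's position is unchanged.
        return new int[] {boundary, it.current(), copy.length()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo("one two")));
    }
}
```

demo("one two") returns {3, 3, 7}: the boundary after offset 0, the unchanged current() position, and the length of the copied text.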
public void setText(CharacterIterator newText)
Caution: The supplied CharacterIterator is used directly by the BreakIterator, and must not be altered in any way by code outside of the BreakIterator. Doing so will lead to undefined behavior of the BreakIterator.
Specified by: setText in class BreakIterator
Parameters:
newText - An iterator over the text to analyze.
Copyright © 2016 Unicode, Inc. and others.