public class Bidi extends Object
Note: Libraries that perform a bidirectional algorithm and reorder strings accordingly are sometimes called "Storage Layout Engines". ICU's Bidi and shaping (ArabicShaping) classes can be used at the core of such "Storage Layout Engines".
Some of the API methods provide access to "runs". Such a "run" is defined as a sequence of characters that are at the same embedding level after performing the Bidi algorithm.
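As a minimal sketch of what a "run" means here, the following plain-Java helper splits an array of resolved embedding levels (one level per character, as an array like the one getLevels() returns) into maximal same-level runs. The class name LevelRuns and the method split are hypothetical names for illustration only; they are not part of the ICU API.

```java
import java.util.ArrayList;
import java.util.List;

final class LevelRuns {
    // Split resolved levels into maximal same-level runs.
    // Each result entry is {start, limit, level}, covering text[start..limit-1].
    static List<int[]> split(byte[] levels) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= levels.length; i++) {
            // Close the current run at the end of the text or when the level changes.
            if (i == levels.length || levels[i] != levels[start]) {
                runs.add(new int[] {start, i, levels[start]});
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        byte[] levels = {0, 0, 0, 1, 1, 1, 0, 0};
        for (int[] run : split(levels)) {
            System.out.println("[" + run[0] + "," + run[1] + ") level " + run[2]);
        }
    }
}
```

For the levels above, the helper reports three runs: two at level 0 with one level-1 run between them.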
Block Separator. For handling of paragraphs, see:
Levels can be abstract values when used for the paraLevel and embeddingLevels arguments of setPara(); there:
the high bit of an embeddingLevels[] value indicates whether the using application is specifying the level of a character to override whatever the Bidi implementation would resolve it to.
paraLevel can be set to the pseudo-level values LEVEL_DEFAULT_LTR and LEVEL_DEFAULT_RTL.
The related constants are not real, valid level values. DEFAULT_XXX can be used to specify a default for the paragraph level for when the setPara() method shall determine it but there is no strong directional character in the input.
Note that the value for LEVEL_DEFAULT_LTR is even and the one for LEVEL_DEFAULT_RTL is odd, just like with normal LTR and RTL level values; these special values are designed that way. Also, the implementation assumes that MAX_EXPLICIT_LEVEL is odd.
Note: The numeric values of the related constants will not change: They are tied to the use of 7-bit byte values (plus the override bit) and of the byte data type in this API.
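The level encoding described above (a 7-bit level plus an override bit, with even values meaning LTR and odd values meaning RTL) can be sketched in plain Java. The constant values below are assumptions that mirror what ICU4J is understood to define; check them against the Constant Field Values page of your ICU version. The class LevelBits and its methods are hypothetical helpers, not ICU API.

```java
final class LevelBits {
    // Assumed values mirroring ICU4J's Bidi constants (verify for your ICU version).
    static final byte LEVEL_OVERRIDE = (byte) 0x80;
    static final byte LEVEL_DEFAULT_LTR = (byte) 0x7e;
    static final byte LEVEL_DEFAULT_RTL = (byte) 0x7f;

    // True if the caller asked to override the directional properties.
    static boolean isOverride(byte level) {
        return (level & 0x80) != 0;
    }

    // The 7-bit level value with the override bit stripped.
    static int levelValue(byte level) {
        return level & 0x7f;
    }

    // Odd levels are right-to-left, even levels are left-to-right.
    static boolean isRtl(byte level) {
        return (level & 1) != 0;
    }

    public static void main(String[] args) {
        byte overriddenRtl = (byte) (3 | 0x80);
        System.out.println(isOverride(overriddenRtl) + " " + levelValue(overriddenRtl)
                + " " + isRtl(overriddenRtl));
        System.out.println(isRtl(LEVEL_DEFAULT_LTR) + " " + isRtl(LEVEL_DEFAULT_RTL));
    }
}
```

The parity test works for the pseudo-levels too, which is the property the note above says these special values were designed to have.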
See Also:
LEVEL_DEFAULT_LTR
LEVEL_DEFAULT_RTL
LEVEL_OVERRIDE
MAX_EXPLICIT_LEVEL
setPara(java.lang.String, byte, byte[])
setReorderingMode(int)
REORDER_DEFAULT
REORDER_NUMBERS_SPECIAL
REORDER_GROUP_NUMBERS_WITH_R
REORDER_RUNS_ONLY
REORDER_INVERSE_NUMBERS_AS_L
REORDER_INVERSE_LIKE_DIRECT
REORDER_INVERSE_FOR_NUMBERS_SPECIAL
setReorderingOptions(int)
OPTION_DEFAULT
OPTION_INSERT_MARKS
OPTION_REMOVE_CONTROLS
OPTION_STREAMING
The basic assumptions are:
package com.ibm.icu.dev.test.bidi;

import com.ibm.icu.text.Bidi;
import com.ibm.icu.text.BidiRun;

public class Sample {

    static final int styleNormal = 0;
    static final int styleSelected = 1;
    static final int styleBold = 2;
    static final int styleItalics = 4;
    static final int styleSuper = 8;
    static final int styleSub = 16;

    static class StyleRun {
        int limit;
        int style;

        public StyleRun(int limit, int style) {
            this.limit = limit;
            this.style = style;
        }
    }

    static class Bounds {
        int start;
        int limit;

        public Bounds(int start, int limit) {
            this.start = start;
            this.limit = limit;
        }
    }

    static int getTextWidth(String text, int start, int limit,
                            StyleRun[] styleRuns, int styleRunCount) {
        // simplistic way to compute the width
        return limit - start;
    }

    // set limit and StyleRun limit for a line
    // from text[start] and from styleRuns[styleRunStart]
    // using Bidi.getLogicalRun(...)
    // returns line width
    static int getLineBreak(String text, Bounds line, Bidi para,
                            StyleRun styleRuns[], Bounds styleRun) {
        // dummy return
        return 0;
    }

    // render runs on a line sequentially, always from left to right

    // prepare rendering a new line
    static void startLine(byte textDirection, int lineWidth) {
        System.out.println();
    }

    // render a run of text and advance to the right by the run width
    // the text[start..limit-1] is always in logical order
    static void renderRun(String text, int start, int limit,
                          byte textDirection, int style) {
    }

    // We could compute a cross-product
    // from the style runs with the directional runs
    // and then reorder it.
    // Instead, here we iterate over each run type
    // and render the intersections -
    // with shortcuts in simple (and common) cases.
    // renderParagraph() is the main function.

    // render a directional run with
    // (possibly) multiple style runs intersecting with it
    static void renderDirectionalRun(String text, int start, int limit,
                                     byte direction, StyleRun styleRuns[],
                                     int styleRunCount) {
        int i;

        // iterate over style runs
        if (direction == Bidi.LTR) {
            int styleLimit;
            for (i = 0; i < styleRunCount; ++i) {
                styleLimit = styleRuns[i].limit;
                if (start < styleLimit) {
                    if (styleLimit > limit) {
                        styleLimit = limit;
                    }
                    renderRun(text, start, styleLimit,
                              direction, styleRuns[i].style);
                    if (styleLimit == limit) {
                        break;
                    }
                    start = styleLimit;
                }
            }
        } else {
            int styleStart;

            for (i = styleRunCount-1; i >= 0; --i) {
                if (i > 0) {
                    styleStart = styleRuns[i-1].limit;
                } else {
                    styleStart = 0;
                }
                if (limit >= styleStart) {
                    if (styleStart < start) {
                        styleStart = start;
                    }
                    renderRun(text, styleStart, limit, direction,
                              styleRuns[i].style);
                    if (styleStart == start) {
                        break;
                    }
                    limit = styleStart;
                }
            }
        }
    }

    // the line object represents text[start..limit-1]
    static void renderLine(Bidi line, String text, int start, int limit,
                           StyleRun styleRuns[], int styleRunCount) {
        byte direction = line.getDirection();
        if (direction != Bidi.MIXED) {
            // unidirectional
            if (styleRunCount <= 1) {
                renderRun(text, start, limit, direction, styleRuns[0].style);
            } else {
                renderDirectionalRun(text, start, limit, direction,
                                     styleRuns, styleRunCount);
            }
        } else {
            // mixed-directional
            int count, i;
            BidiRun run;

            try {
                count = line.countRuns();
            } catch (IllegalStateException e) {
                e.printStackTrace();
                return;
            }
            if (styleRunCount <= 1) {
                int style = styleRuns[0].style;

                // iterate over directional runs
                for (i = 0; i < count; ++i) {
                    run = line.getVisualRun(i);
                    renderRun(text, run.getStart(), run.getLimit(),
                              run.getDirection(), style);
                }
            } else {
                // iterate over both directional and style runs
                for (i = 0; i < count; ++i) {
                    run = line.getVisualRun(i);
                    renderDirectionalRun(text, run.getStart(),
                                         run.getLimit(), run.getDirection(),
                                         styleRuns, styleRunCount);
                }
            }
        }
    }

    static void renderParagraph(String text, byte textDirection,
                                StyleRun styleRuns[], int styleRunCount,
                                int lineWidth) {
        int length = text.length();
        Bidi para = new Bidi();
        try {
            para.setPara(text,
                         textDirection != 0 ? Bidi.LEVEL_DEFAULT_RTL
                                            : Bidi.LEVEL_DEFAULT_LTR,
                         null);
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        byte paraLevel = (byte)(1 & para.getParaLevel());
        StyleRun styleRun = new StyleRun(length, styleNormal);

        if (styleRuns == null || styleRunCount <= 0) {
            styleRuns = new StyleRun[1];
            styleRunCount = 1;
            styleRuns[0] = styleRun;
        }
        // assume styleRuns[styleRunCount-1].limit>=length

        int width = getTextWidth(text, 0, length, styleRuns, styleRunCount);
        if (width <= lineWidth) {
            // everything fits onto one line

            // prepare rendering a new line from either left or right
            startLine(paraLevel, width);

            renderLine(para, text, 0, length, styleRuns, styleRunCount);
        } else {
            // we need to render several lines
            Bidi line = new Bidi(length, 0);
            int start = 0, limit;
            int styleRunStart = 0, styleRunLimit;

            for (;;) {
                limit = length;
                styleRunLimit = styleRunCount;
                width = getLineBreak(text, new Bounds(start, limit),
                                     para, styleRuns,
                                     new Bounds(styleRunStart, styleRunLimit));
                try {
                    line = para.setLine(start, limit);
                } catch (Exception e) {
                    e.printStackTrace();
                    return;
                }
                // prepare rendering a new line
                // from either left or right
                startLine(paraLevel, width);

                if (styleRunStart > 0) {
                    int newRunCount = styleRuns.length - styleRunStart;
                    StyleRun[] newRuns = new StyleRun[newRunCount];
                    System.arraycopy(styleRuns, styleRunStart, newRuns, 0,
                                     newRunCount);
                    renderLine(line, text, start, limit, newRuns,
                               styleRunLimit - styleRunStart);
                } else {
                    renderLine(line, text, start, limit, styleRuns,
                               styleRunLimit - styleRunStart);
                }
                if (limit == length) {
                    break;
                }
                start = limit;
                styleRunStart = styleRunLimit - 1;
                if (start >= styleRuns[styleRunStart].limit) {
                    ++styleRunStart;
                }
            }
        }
    }

    public static void main(String[] args) {
        renderParagraph("Some Latin text...", Bidi.LTR, null, 0, 80);
        renderParagraph("Some Hebrew text...", Bidi.RTL, null, 0, 60);
    }
}
Modifier and Type | Field | Description |
---|---|---|
static int | CLASS_DEFAULT | Deprecated. ICU 58 The numeric value may change over time, see ICU ticket #12420. |
static int | DIRECTION_DEFAULT_LEFT_TO_RIGHT | Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. |
static int | DIRECTION_DEFAULT_RIGHT_TO_LEFT | Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. |
static int | DIRECTION_LEFT_TO_RIGHT | Constant indicating base direction is left-to-right. |
static int | DIRECTION_RIGHT_TO_LEFT | Constant indicating base direction is right-to-left. |
static short | DO_MIRRORING | Option bit for writeReordered(): replace characters with the "mirrored" property in RTL runs by their mirror-image mappings. |
static short | INSERT_LRM_FOR_NUMERIC | Option bit for writeReordered(): surround the run with LRMs if necessary; this is part of the approximate "inverse Bidi" algorithm. This option does not imply corresponding adjustment of the index mappings. |
static short | KEEP_BASE_COMBINING | Option bit for writeReordered(): keep combining characters after their base characters in RTL runs. |
static byte | LEVEL_DEFAULT_LTR | Paragraph level setting. Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. |
static byte | LEVEL_DEFAULT_RTL | Paragraph level setting. Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. |
static byte | LEVEL_OVERRIDE | Bit flag for level input. |
static byte | LTR | Left-to-right text. |
static int | MAP_NOWHERE | Special value which can be returned by the mapping methods when a logical index has no corresponding visual index or vice-versa. |
static byte | MAX_EXPLICIT_LEVEL | Maximum explicit embedding level. |
static byte | MIXED | Mixed-directional text. |
static byte | NEUTRAL | No strongly directional text. |
static int | OPTION_DEFAULT | Option value for setReorderingOptions: disable all the options which can be set with this method. |
static int | OPTION_INSERT_MARKS | Option bit for setReorderingOptions: insert Bidi marks (LRM or RLM) when needed to ensure correct result of a reordering to a Logical order. This option must be set or reset before calling setPara. |
static int | OPTION_REMOVE_CONTROLS | Option bit for setReorderingOptions: remove Bidi control characters. This option must be set or reset before calling setPara. |
static int | OPTION_STREAMING | Option bit for setReorderingOptions: process the output as part of a stream to be continued. This option must be set or reset before calling setPara. |
static short | OUTPUT_REVERSE | Option bit for writeReordered(): write the output in reverse order. This has the same effect as calling writeReordered() first without this option, and then calling writeReverse() without mirroring. |
static short | REMOVE_BIDI_CONTROLS | Option bit for writeReordered(): remove Bidi control characters (this does not affect INSERT_LRM_FOR_NUMERIC). This option does not imply corresponding adjustment of the index mappings. |
static short | REORDER_DEFAULT | Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode. |
static short | REORDER_GROUP_NUMBERS_WITH_R | Reordering mode: Logical to Visual algorithm grouping numbers with adjacent R characters (reversible algorithm). |
static short | REORDER_INVERSE_FOR_NUMBERS_SPECIAL | Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the REORDER_NUMBERS_SPECIAL Bidi algorithm. |
static short | REORDER_INVERSE_LIKE_DIRECT | Reordering mode: Visual to Logical algorithm equivalent to the regular Logical to Visual algorithm. |
static short | REORDER_INVERSE_NUMBERS_AS_L | Reordering mode: Visual to Logical algorithm which handles numbers like L (same algorithm as selected by setInverse(true)). |
static short | REORDER_NUMBERS_SPECIAL | Reordering mode: Logical to Visual algorithm which handles numbers in a way which mimics the behavior of Windows XP. |
static short | REORDER_RUNS_ONLY | Reordering mode: Reorder runs only to transform a Logical LTR string to the logical RTL string with the same display, or vice-versa. |
static byte | RTL | Right-to-left text. |

Constructor | Description |
---|---|
Bidi() | Allocate a Bidi object. |
Bidi(AttributedCharacterIterator paragraph) | Create Bidi from the given paragraph of text. |
Bidi(char[] text, int textStart, byte[] embeddings, int embStart, int paragraphLength, int flags) | Create Bidi from the given text, embedding, and direction information. |
Bidi(int maxLength, int maxRunCount) | Allocate a Bidi object with preallocated memory for internal structures. |
Bidi(String paragraph, int flags) | Create Bidi from the given paragraph of text and base direction. |

Modifier and Type | Method | Description |
---|---|---|
boolean | baseIsLeftToRight() | Return true if the base direction is left-to-right. |
int | countParagraphs() | Get the number of paragraphs. |
int | countRuns() | Get the number of runs. |
Bidi | createLineBidi(int lineStart, int lineLimit) | Create a Bidi object representing the bidi information on a line of text within the paragraph represented by the current Bidi. |
static byte | getBaseDirection(CharSequence paragraph) | Get the base direction of the text provided according to the Unicode Bidirectional Algorithm. |
int | getBaseLevel() | Return the base level (0 if left-to-right, 1 if right-to-left). |
BidiClassifier | getCustomClassifier() | Gets the current custom class classifier used for Bidi class determination. |
int | getCustomizedClass(int c) | Retrieves the Bidi class for a given code point. |
byte | getDirection() | Get the directionality of the text. |
int | getLength() | Get the length of the text. |
byte | getLevelAt(int charIndex) | Get the level for one character. |
byte[] | getLevels() | Get an array of levels for each character. |
int | getLogicalIndex(int visualIndex) | Get the logical text position from a visual position. |
int[] | getLogicalMap() | Get a logical-to-visual index map (array) for the characters in the Bidi (paragraph or line) object. |
BidiRun | getLogicalRun(int logicalPosition) | Get a logical run. |
BidiRun | getParagraph(int charIndex) | Get a paragraph, given a position within the text. |
BidiRun | getParagraphByIndex(int paraIndex) | Get a paragraph, given the index of this paragraph. |
int | getParagraphIndex(int charIndex) | Get the index of a paragraph, given a position within the text. |
byte | getParaLevel() | Get the paragraph level of the text. |
int | getProcessedLength() | Get the length of the source text processed by the last call to setPara(). |
int | getReorderingMode() | What is the requested reordering mode for a given Bidi object? |
int | getReorderingOptions() | What are the reordering options applied to a given Bidi object? |
int | getResultLength() | Get the length of the reordered text resulting from the last call to setPara(). |
int | getRunCount() | Return the number of level runs. |
int | getRunLevel(int run) | Return the level of the nth logical run in this line. |
int | getRunLimit(int run) | Return the index of the character past the end of the nth logical run in this line, as an offset from the start of the line. |
int | getRunStart(int run) | Return the index of the character at the start of the nth logical run in this line, as an offset from the start of the line. |
char[] | getText() | Get the text. |
String | getTextAsString() | Get the text. |
int | getVisualIndex(int logicalIndex) | Get the visual position from a logical text position. |
int[] | getVisualMap() | Get a visual-to-logical index map (array) for the characters in the Bidi (paragraph or line) object. |
BidiRun | getVisualRun(int runIndex) | Get a BidiRun object according to its index. |
static int[] | invertMap(int[] srcMap) | Invert an index map. |
boolean | isInverse() | Is this Bidi object set to perform the inverse Bidi algorithm? |
boolean | isLeftToRight() | Return true if the line is all left-to-right text and the base direction is left-to-right. |
boolean | isMixed() | Return true if the line is not left-to-right or right-to-left. |
boolean | isOrderParagraphsLTR() | Is this Bidi object set to allocate level 0 to block separators so that successive paragraphs progress from left to right? |
boolean | isRightToLeft() | Return true if the line is all right-to-left text, and the base direction is right-to-left. |
void | orderParagraphsLTR(boolean ordarParaLTR) | Specify whether block separators must be allocated level zero, so that successive paragraphs will progress from left to right. |
static int[] | reorderLogical(byte[] levels) | This is a convenience method that does not use a Bidi object. |
static int[] | reorderVisual(byte[] levels) | This is a convenience method that does not use a Bidi object. |
static void | reorderVisually(byte[] levels, int levelStart, Object[] objects, int objectStart, int count) | Reorder the objects in the array into visual order based on their levels. |
static boolean | requiresBidi(char[] text, int start, int limit) | Return true if the specified text requires bidi analysis. |
void | setContext(String prologue, String epilogue) | Set the context before a call to setPara(). |
void | setCustomClassifier(BidiClassifier classifier) | Set a custom Bidi classifier used by the UBA implementation for Bidi class determination. |
void | setInverse(boolean isInverse) | Modify the operation of the Bidi algorithm such that it approximates an "inverse Bidi" algorithm. |
Bidi | setLine(int start, int limit) | setLine() returns a Bidi object to contain the reordering information, especially the resolved levels, for all the characters in a line of text. |
void | setPara(AttributedCharacterIterator paragraph) | Perform the Unicode Bidi algorithm on a given paragraph, as defined in the Unicode Standard Annex #9, version 13, also described in The Unicode Standard, Version 4.0. |
void | setPara(char[] chars, byte paraLevel, byte[] embeddingLevels) | Perform the Unicode Bidi algorithm. |
void | setPara(String text, byte paraLevel, byte[] embeddingLevels) | Perform the Unicode Bidi algorithm. |
void | setReorderingMode(int reorderingMode) | Modify the operation of the Bidi algorithm such that it implements some variant to the basic Bidi algorithm or approximates an "inverse Bidi" algorithm, depending on different values of the "reordering mode". |
void | setReorderingOptions(int options) | Specify which of the reordering options should be applied during Bidi transformations. |
String | writeReordered(int options) | Take a Bidi object containing the reordering information for a piece of text (one or more paragraphs) set by setPara() or for a line of text set by setLine() and return a string containing the reordered text. |
static String | writeReverse(String src, int options) | Reverse a Right-To-Left run of Unicode text. |

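
The reorderVisually entry in the method summary above reorders arbitrary objects by their levels. As a small sketch of that behavior, the example below uses the JDK's java.text.Bidi.reorderVisually, which has the same parameter list; treating it as a stand-in for the ICU method is an assumption made so the example runs without ICU on the classpath. The class ReorderDemo and its method visualOrder are hypothetical names.

```java
import java.text.Bidi;
import java.util.Arrays;

final class ReorderDemo {
    // Return a copy of pieces arranged in visual order for the given levels.
    static String[] visualOrder(byte[] levels, String[] pieces) {
        String[] copy = pieces.clone();
        // JDK stand-in for the ICU method with the same parameters (assumption).
        Bidi.reorderVisually(levels, 0, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        byte[] levels = {0, 1, 1, 0};
        String[] pieces = {"a", "B", "C", "d"};
        // The two level-1 pieces in the middle swap places; the rest stay put.
        System.out.println(Arrays.toString(visualOrder(levels, pieces)));
    }
}
```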
public static final byte LEVEL_DEFAULT_LTR
Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. If no strong directional character is present, then set the paragraph level to 0 (left-to-right).
If this value is used in conjunction with reordering modes REORDER_INVERSE_LIKE_DIRECT or REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the text to reorder is assumed to be visual LTR, and the text after reordering is required to be the corresponding logical string with appropriate contextual direction. The direction of the result string will be RTL if either the rightmost or leftmost strong character of the source text is RTL or Arabic Letter; the direction will be LTR otherwise.
If reordering option OPTION_INSERT_MARKS is set, an RLM may be added at the beginning of the result string to ensure round trip (that the result string, when reordered back to visual, will produce the original source text).
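The fallback rule for the default paragraph levels (first strong character decides; otherwise level 0 for the LTR default and level 1 for the RTL default) can be observed with a short sketch. It uses the JDK's java.text.Bidi and its DIRECTION_DEFAULT_* flags as a stand-in for the ICU class, which is an assumption made to keep the example runnable without ICU. DefaultLevelDemo and baseLevel are hypothetical names.

```java
import java.text.Bidi;

final class DefaultLevelDemo {
    // Resolve the paragraph (base) level of text under the given direction flag.
    static int baseLevel(String text, int flags) {
        return new Bidi(text, flags).getBaseLevel();
    }

    public static void main(String[] args) {
        String neutralOnly = "123 ?!";            // digits and punctuation, no strong character
        String hebrew = "\u05D0\u05D1\u05D2";     // strong right-to-left letters

        // No strong character: the chosen default decides.
        System.out.println(baseLevel(neutralOnly, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT));
        System.out.println(baseLevel(neutralOnly, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT));
        // A strong character is present: it decides, whatever the default.
        System.out.println(baseLevel(hebrew, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT));
        System.out.println(baseLevel("abc", Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT));
    }
}
```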
See Also:
REORDER_INVERSE_LIKE_DIRECT
REORDER_INVERSE_FOR_NUMBERS_SPECIAL
Constant Field Values
public static final byte LEVEL_DEFAULT_RTL
Constant indicating that the base direction depends on the first strong directional character in the text according to the Unicode Bidirectional Algorithm. If no strong directional character is present, then set the paragraph level to 1 (right-to-left).
If this value is used in conjunction with reordering modes REORDER_INVERSE_LIKE_DIRECT or REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the text to reorder is assumed to be visual LTR, and the text after reordering is required to be the corresponding logical string with appropriate contextual direction. The direction of the result string will be RTL if either the rightmost or leftmost strong character of the source text is RTL or Arabic Letter, or if the text contains no strong character; the direction will be LTR otherwise.
If reordering option OPTION_INSERT_MARKS is set, an RLM may be added at the beginning of the result string to ensure round trip (that the result string, when reordered back to visual, will produce the original source text).
See Also:
REORDER_INVERSE_LIKE_DIRECT
REORDER_INVERSE_FOR_NUMBERS_SPECIAL
Constant Field Values
public static final byte MAX_EXPLICIT_LEVEL
Maximum explicit embedding level. (The maximum resolved level can be up to MAX_EXPLICIT_LEVEL+1).
public static final byte LEVEL_OVERRIDE
Bit flag for level input.
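public static final int MAP_NOWHERE
Special value which can be returned by the mapping methods when a logical index has no corresponding visual index or vice-versa. This may happen for the logical-to-visual mapping of a Bidi control when option OPTION_REMOVE_CONTROLS is specified. This can also happen for the visual-to-logical mapping of a Bidi mark (LRM or RLM) inserted by option OPTION_INSERT_MARKS.
See Also:
getVisualIndex(int)
getVisualMap()
getLogicalIndex(int)
getLogicalMap()
OPTION_INSERT_MARKS
OPTION_REMOVE_CONTROLS
Constant Field Values
public static final byte LTR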
Left-to-right text.
As return value for getDirection(), it means that the source string contains no right-to-left characters, or that the source string is empty and the paragraph level is even.
As return value for getBaseDirection(), it means that the first strong character of the source string has a left-to-right direction.
public static final byte RTL
Right-to-left text.
As return value for getDirection(), it means that the source string contains no left-to-right characters, or that the source string is empty and the paragraph level is odd.
As return value for getBaseDirection(), it means that the first strong character of the source string has a right-to-left direction.
public static final byte MIXED
Mixed-directional text.
As return value for getDirection(), it means that the source string contains both left-to-right and right-to-left characters.
public static final byte NEUTRAL
No strongly directional text.
As return value for getBaseDirection(), it means that the source string is missing or empty, or contains neither left-to-right nor right-to-left characters.
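To illustrate the LTR, RTL, and MIXED outcomes described above, here is a hedged sketch that classifies a string with the JDK's java.text.Bidi, using isMixed() and isRightToLeft() in place of ICU's getDirection(). The stand-in class and the string labels returned are assumptions for illustration; DirectionDemo and classify are hypothetical names.

```java
import java.text.Bidi;

final class DirectionDemo {
    // Classify text as "LTR", "RTL", or "MIXED", letting the first strong
    // character choose the base direction (defaulting to left-to-right).
    static String classify(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isMixed()) {
            return "MIXED";
        }
        return bidi.isRightToLeft() ? "RTL" : "LTR";
    }

    public static void main(String[] args) {
        System.out.println(classify("abc"));
        System.out.println(classify("\u05D0\u05D1"));
        System.out.println(classify("abc \u05D0\u05D1"));
    }
}
```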
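public static final short KEEP_BASE_COMBINING
Option bit for writeReordered(): keep combining characters after their base characters in RTL runs.
See Also:
writeReordered(int)
Constant Field Values
public static final short DO_MIRRORING
Option bit for writeReordered(): replace characters with the "mirrored" property in RTL runs by their mirror-image mappings.
See Also:
writeReordered(int)
Constant Field Values
public static final short INSERT_LRM_FOR_NUMERIC
Option bit for writeReordered(): surround the run with LRMs if necessary; this is part of the approximate "inverse Bidi" algorithm.
This option does not imply corresponding adjustment of the index mappings.
See Also:
setInverse(boolean)
writeReordered(int)
Constant Field Values
public static final short REMOVE_BIDI_CONTROLS
Option bit for writeReordered(): remove Bidi control characters (this does not affect INSERT_LRM_FOR_NUMERIC).
This option does not imply corresponding adjustment of the index mappings.
See Also:
writeReordered(int)
INSERT_LRM_FOR_NUMERIC
Constant Field Values
public static final short OUTPUT_REVERSE
Option bit for writeReordered(): write the output in reverse order.
This has the same effect as calling writeReordered() first without this option, and then calling writeReverse() without mirroring. Doing this in the same step is faster and avoids a temporary buffer. An example for using this option is output to a character terminal that is designed for RTL scripts and stores text in reverse order.
See Also:
writeReordered(int)
Constant Field Values
public static final short REORDER_DEFAULT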
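Reordering mode: Regular Logical to Visual Bidi algorithm according to Unicode.
See Also:
setReorderingMode(int)
Constant Field Values
public static final short REORDER_NUMBERS_SPECIAL
Reordering mode: Logical to Visual algorithm which handles numbers in a way which mimics the behavior of Windows XP.
See Also:
setReorderingMode(int)
Constant Field Values
public static final short REORDER_GROUP_NUMBERS_WITH_R
Reordering mode: Logical to Visual algorithm grouping numbers with adjacent R characters (reversible algorithm).
See Also:
setReorderingMode(int)
Constant Field Values
public static final short REORDER_RUNS_ONLY
Reordering mode: Reorder runs only to transform a Logical LTR string to the logical RTL string with the same display, or vice-versa.
If this mode is set together with option OPTION_INSERT_MARKS, some Bidi controls in the source text may be removed and other controls may be added to produce the minimum combination which has the required display.
See Also:
OPTION_INSERT_MARKS
setReorderingMode(int)
Constant Field Values
public static final short REORDER_INVERSE_NUMBERS_AS_L
Reordering mode: Visual to Logical algorithm which handles numbers like L (same algorithm as selected by setInverse(true)).
See Also:
setInverse(boolean)
setReorderingMode(int)
Constant Field Values
public static final short REORDER_INVERSE_LIKE_DIRECT
Reordering mode: Visual to Logical algorithm equivalent to the regular Logical to Visual algorithm.
See Also:
setReorderingMode(int)
Constant Field Values
public static final short REORDER_INVERSE_FOR_NUMBERS_SPECIAL
Reordering mode: Inverse Bidi (Visual to Logical) algorithm for the REORDER_NUMBERS_SPECIAL Bidi algorithm.
See Also:
setReorderingMode(int)
Constant Field Values
public static final int OPTION_DEFAULT
Option value for setReorderingOptions: disable all the options which can be set with this method.
See Also:
setReorderingOptions(int)
Constant Field Values
public static final int OPTION_INSERT_MARKS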
Option bit for setReorderingOptions: insert Bidi marks (LRM or RLM) when needed to ensure correct result of a reordering to a Logical order.
This option must be set or reset before calling setPara.
This option is significant only with reordering modes which generate a result with Logical order, specifically:
REORDER_RUNS_ONLY
REORDER_INVERSE_NUMBERS_AS_L
REORDER_INVERSE_LIKE_DIRECT
REORDER_INVERSE_FOR_NUMBERS_SPECIAL
If this option is set in conjunction with reordering mode REORDER_INVERSE_NUMBERS_AS_L or with calling setInverse(true), it implies option INSERT_LRM_FOR_NUMERIC in calls to method writeReordered().
For other reordering modes, a minimum number of LRM or RLM characters will be added to the source text after reordering it so as to ensure round trip, i.e. when applying the inverse reordering mode on the resulting logical text with removal of Bidi marks (option OPTION_REMOVE_CONTROLS set before calling setPara() or option REMOVE_BIDI_CONTROLS in writeReordered), the result will be identical to the source text in the first transformation.
This option will be ignored if specified together with option OPTION_REMOVE_CONTROLS. It inhibits option REMOVE_BIDI_CONTROLS in calls to method writeReordered() and it implies option INSERT_LRM_FOR_NUMERIC in calls to method writeReordered() if the reordering mode is REORDER_INVERSE_NUMBERS_AS_L.
public static final int OPTION_REMOVE_CONTROLS
Option bit for setReorderingOptions: remove Bidi control characters.
This option must be set or reset before calling setPara.
This option nullifies option OPTION_INSERT_MARKS. It inhibits option INSERT_LRM_FOR_NUMERIC in calls to method writeReordered() and it implies option REMOVE_BIDI_CONTROLS in calls to that method.
See Also:
setReorderingMode(int)
setReorderingOptions(int)
OPTION_INSERT_MARKS
INSERT_LRM_FOR_NUMERIC
REMOVE_BIDI_CONTROLS
Constant Field Values
public static final int OPTION_STREAMING
Option bit for setReorderingOptions: process the output as part of a stream to be continued.
This option must be set or reset before calling setPara.
This option specifies that the caller is interested in processing a large text object in parts. The results of the successive calls are expected to be concatenated by the caller. Only the call for the last part will have this option bit off.
When this option bit is on, setPara() may process less than the full source text in order to truncate the text at a meaningful boundary. The caller should call getProcessedLength() immediately after calling setPara() in order to determine how much of the source text has been processed. Source text beyond that length should be resubmitted in following calls to setPara. The processed length may be less than the length of the source text if a character preceding the last character of the source text constitutes a reasonable boundary (like a block separator) for text to be continued.
If the last character of the source text constitutes a reasonable boundary, the whole text will be processed at once.
If nowhere in the source text there exists such a reasonable boundary, the processed length will be zero.
The caller should check for such an occurrence and do one of the following:
submit a larger amount of text with a better chance to include a reasonable boundary;
resubmit the same text after turning off option OPTION_STREAMING.
When the OPTION_STREAMING option is used, it is recommended to call orderParagraphsLTR(true) before calling setPara() so that later paragraphs may be concatenated to previous paragraphs on the right.
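The caller-side loop that this option implies (submit a part, ask how much was processed, carry the unprocessed tail into the next call, and switch streaming off for the last part) can be sketched without ICU. In the sketch below, processedLength is a hypothetical stand-in for calling setPara() followed by getProcessedLength(): it assumes the only "reasonable boundary" is a line feed, which is a simplification of whatever the real implementation accepts. StreamingSketch and feed are likewise hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamingSketch {
    // Hypothetical stand-in for setPara() + getProcessedLength() with streaming on:
    // everything up to and including the last line feed counts as processed,
    // and nothing is processed if the text contains no line feed.
    static int processedLength(String text) {
        return text.lastIndexOf('\n') + 1;
    }

    // Feed chunks in order and return the pieces that were processed per call.
    static List<String> feed(List<String> chunks) {
        List<String> processed = new ArrayList<>();
        String pending = "";
        for (int i = 0; i < chunks.size(); i++) {
            String text = pending + chunks.get(i);
            boolean last = (i == chunks.size() - 1);
            // The last part is submitted with streaming off, so it is taken whole.
            int n = last ? text.length() : processedLength(text);
            if (n > 0) {
                processed.add(text.substring(0, n));
            }
            // n == 0 means no boundary was found; this sketch waits for more text.
            pending = text.substring(n);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(feed(List.of("ab\ncd", "ef\ngh", "ij")).size());
    }
}
```

Concatenating the returned pieces reproduces the original input, which mirrors the expectation that the caller concatenates the results of successive calls.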
See Also:
setReorderingMode(int)
setReorderingOptions(int)
getProcessedLength()
Constant Field Values
@Deprecated public static final int CLASS_DEFAULT
Value returned by BidiClassifier when there is no need to override the standard Bidi class for a given code point.
This constant is deprecated; use UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1 instead.
See Also:
BidiClassifier
Constant Field Values
public static final int DIRECTION_LEFT_TO_RIGHT
public static final int DIRECTION_RIGHT_TO_LEFT
public static final int DIRECTION_DEFAULT_LEFT_TO_RIGHT
public static final int DIRECTION_DEFAULT_RIGHT_TO_LEFT
public Bidi()
Allocate a Bidi object.
Such an object is initially empty. It is assigned the Bidi properties of a piece of text containing one or more paragraphs by setPara() or the Bidi properties of a line within a paragraph by setLine().
This object can be reused. setPara() and setLine() will allocate additional memory for internal structures as necessary.
public Bidi(int maxLength, int maxRunCount)
Allocate a Bidi object with preallocated memory for internal structures.
This method provides a Bidi object like the default constructor but it also preallocates memory for internal structures according to the sizings supplied by the caller.
The preallocation can be limited to some of the internal memory by setting some values to 0 here. That means that if, e.g., maxRunCount cannot be reasonably predetermined and should not be set to maxLength (the only failproof value) to avoid wasting memory, then maxRunCount could be set to 0 here and the internal structures that are associated with it will be allocated on demand, just like with the default constructor.
Parameters:
maxLength - is the maximum text or line length that internal memory will be preallocated for. An attempt to associate this object with a longer text will fail, unless this value is 0, which leaves the allocation up to the implementation.
maxRunCount - is the maximum anticipated number of same-level runs that internal memory will be preallocated for. An attempt to access visual runs on an object that was not preallocated for as many runs as the text was actually resolved to will fail, unless this value is 0, which leaves the allocation up to the implementation. The number of runs depends on the actual text and may be anywhere between 1 and maxLength. It is typically small.
Throws:
IllegalArgumentException - if maxLength or maxRunCount is less than 0
public Bidi(String paragraph, int flags)
Create Bidi from the given paragraph of text and base direction.
Parameters:
paragraph - a paragraph of text
flags - a collection of flags that control the algorithm. The algorithm understands the flags DIRECTION_LEFT_TO_RIGHT, DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
See Also:
DIRECTION_LEFT_TO_RIGHT
DIRECTION_RIGHT_TO_LEFT
DIRECTION_DEFAULT_LEFT_TO_RIGHT
DIRECTION_DEFAULT_RIGHT_TO_LEFT
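A short usage sketch for the (String, flags) form follows. It is written against the JDK's java.text.Bidi, whose constructor and run accessors (getRunCount, getRunStart, getRunLimit, getRunLevel) have the same names as the compatibility methods listed for this class; using the JDK class as a stand-in is an assumption made so the example runs without ICU. RunsDemo and describe are hypothetical names.

```java
import java.text.Bidi;

final class RunsDemo {
    // Describe the level runs of text as "[start,limit)@level" entries.
    static String describe(String text, int flags) {
        Bidi bidi = new Bidi(text, flags);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            sb.append('[').append(bidi.getRunStart(i)).append(',')
              .append(bidi.getRunLimit(i)).append(")@")
              .append(bidi.getRunLevel(i)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Latin text, three Hebrew letters, Latin text, in a left-to-right paragraph.
        String text = "abc \u05D0\u05D1\u05D2 def";
        System.out.println(describe(text, Bidi.DIRECTION_LEFT_TO_RIGHT));
    }
}
```

With a left-to-right base direction, the Hebrew letters form a single level-1 run between two level-0 runs; the surrounding spaces take the paragraph level.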
public Bidi(AttributedCharacterIterator paragraph)
Create Bidi from the given paragraph of text.
The RUN_DIRECTION attribute in the text, if present, determines the base direction (left-to-right or right-to-left). If not present, the base direction is computed using the Unicode Bidirectional Algorithm, defaulting to left-to-right if there are no strong directional characters in the text. This attribute, if present, must be applied to all the text in the paragraph.
The BIDI_EMBEDDING attribute in the text, if present, represents embedding level information. Negative values indicate overrides at the absolute value of the level. Positive values indicate embeddings. (See MAX_EXPLICIT_LEVEL.) Where values are zero or not defined, the base embedding level as determined by the base direction is assumed.
The NUMERIC_SHAPING attribute in the text, if present, converts European digits to other decimal digits before running the bidi algorithm. This attribute, if present, must be applied to all the text in the paragraph.
Note: this constructor calls setPara() internally.
Parameters:
paragraph - a paragraph of text with optional character and paragraph attribute information
public Bidi(char[] text, int textStart, byte[] embeddings, int embStart, int paragraphLength, int flags)
The embeddings array may be null. If present, the values represent
embedding level information.
Negative values indicate overrides at the absolute value of the level.
Positive values indicate embeddings. (See MAX_EXPLICIT_LEVEL.)
Where values are zero, the base embedding level
as determined by the base direction is assumed,
except for paragraph separators which remain at 0 to prevent reordering of paragraphs.
Note: This constructor calls setPara() internally,
after converting the java.text.Bidi-style embeddings with negative overrides
into ICU-style embeddings with bit fields for LEVEL_OVERRIDE
and the level.
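The conversion mentioned in the note can be sketched as follows. This is only an illustration, not the library's code: the class and method names are hypothetical, and the override flag is assumed to be the high bit of the level byte (0x80), in line with the 7-bit level values plus override bit described for this class. Zero values are passed through unchanged here.

```java
final class EmbeddingConversionSketch {
    // Assumption for this sketch: the override flag is the high bit of the byte.
    static final byte LEVEL_OVERRIDE = (byte) 0x80;

    /** Converts java.text.Bidi-style values (negative = override) to bit-field style. */
    static byte[] toBitFieldStyle(byte[] embeddings) {
        byte[] result = new byte[embeddings.length];
        for (int i = 0; i < embeddings.length; i++) {
            byte e = embeddings[i];
            // A negative value means "override at the absolute value of the level".
            result[i] = (e < 0) ? (byte) ((-e) | LEVEL_OVERRIDE) : e;
        }
        return result;
    }

    /** Extracts the level from a bit-field style value. */
    static int levelOf(byte value) {
        return value & 0x7F;
    }

    /** Tells whether the override flag is set in a bit-field style value. */
    static boolean isOverride(byte value) {
        return (value & LEVEL_OVERRIDE) != 0;
    }
}
```

For instance, an input value of -3 becomes a byte whose level part is 3 and whose override flag is set, while a value of 2 stays a plain embedding level.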
text
- an array containing the paragraph of text to process.
textStart
- the index into the text array of the start of the
paragraph.
embeddings
- an array containing embedding values for each character
in the paragraph. This can be null, in which case it is assumed
that there is no external embedding information.
embStart
- the index into the embedding array of the start of the
paragraph.
paragraphLength
- the length of the paragraph in the text and
embeddings arrays.
flags
- a collection of flags that control the algorithm. The
algorithm understands the flags DIRECTION_LEFT_TO_RIGHT,
DIRECTION_RIGHT_TO_LEFT, DIRECTION_DEFAULT_LEFT_TO_RIGHT, and
DIRECTION_DEFAULT_RIGHT_TO_LEFT. Other values are reserved.
IllegalArgumentException
- if the values in embeddings are
not within the allowed range
See Also:
DIRECTION_LEFT_TO_RIGHT
DIRECTION_RIGHT_TO_LEFT
DIRECTION_DEFAULT_LEFT_TO_RIGHT
DIRECTION_DEFAULT_RIGHT_TO_LEFT
public void setInverse(boolean isInverse)
setPara()
.
The normal operation of the Bidi algorithm as described in the Unicode Standard Annex #9 is to take text stored in logical (keyboard, typing) order and to determine the reordering of it for visual rendering. Some legacy systems store text in visual order, and for operations with standard, Unicode-based algorithms, the text needs to be transformed to logical order. This is effectively the inverse algorithm of the described Bidi algorithm. Note that there is no standard algorithm for this "inverse Bidi" and that the current implementation provides only an approximation of "inverse Bidi".
With isInverse set to true, this method changes the behavior of some of the subsequent methods in a way that they can be used for the inverse Bidi algorithm.
Specifically, runs of text with numeric characters will be treated in a
special way and may need to be surrounded with LRM characters when they are
written in reordered sequence.
Output runs should be retrieved using getVisualRun().
Since the actual input for "inverse Bidi" is visually ordered text and getVisualRun() gets the reordered runs, these are actually the runs of the logically ordered output.
Calling this method with argument isInverse set to true is equivalent to calling setReorderingMode with argument reorderingMode set to REORDER_INVERSE_NUMBERS_AS_L.
Calling this method with argument isInverse set to false is equivalent to calling setReorderingMode with argument reorderingMode set to REORDER_DEFAULT.
isInverse
- specifies "forward" or "inverse" Bidi operation.
See Also:
setPara(java.lang.String, byte, byte[])
writeReordered(int)
setReorderingMode(int)
REORDER_INVERSE_NUMBERS_AS_L
REORDER_DEFAULT
public boolean isInverse()
Is this Bidi object set to perform the inverse Bidi algorithm?
Note: calling this method after setting the reordering mode with setReorderingMode will return true if the reordering mode was set to REORDER_INVERSE_NUMBERS_AS_L, false for all other values.
true if the Bidi object is set to perform the inverse Bidi algorithm by handling numbers as L.
See Also:
setInverse(boolean)
setReorderingMode(int)
REORDER_INVERSE_NUMBERS_AS_L
public void setReorderingMode(int reorderingMode)
setPara()
, and stays in
effect until called again with a different argument.
The normal operation of the Bidi algorithm as described in the Unicode Standard Annex #9 is to take text stored in logical (keyboard, typing) order and to determine how to reorder it for visual rendering.
With the reordering mode set to a value other than REORDER_DEFAULT, this method changes the behavior of some of the subsequent methods in a way such that they implement an inverse Bidi algorithm or some other algorithm variants.
Some legacy systems store text in visual order, and for operations with standard, Unicode-based algorithms, the text needs to be transformed into logical order. This is effectively the inverse algorithm of the described Bidi algorithm. Note that there is no standard algorithm for this "inverse Bidi", so a number of variants are implemented here.
In other cases, it may be desirable to emulate some variant of the Logical to Visual algorithm (e.g. one used in MS Windows), or perform a Logical to Logical transformation.
With REORDER_DEFAULT, the standard Bidi Logical to Visual algorithm is applied.
With REORDER_NUMBERS_SPECIAL, the algorithm used to perform Bidi transformations when calling setPara should approximate the algorithm used in Microsoft Windows XP rather than strictly conform to the Unicode Bidi algorithm.
With REORDER_GROUP_NUMBERS_WITH_R, numbers located between LTR text and RTL text are associated with the RTL text. For instance, an LTR paragraph with content "abc 123 DEF" (where upper case letters represent RTL characters) will be transformed to "abc FED 123" (and not "abc 123 FED"), "DEF 123 abc" will be transformed to "123 FED abc" and "123 FED abc" will be transformed to "DEF 123 abc". This makes the algorithm reversible and makes it useful when round trip (from visual to logical and back to visual) must be achieved without adding LRM characters. However, this is a variation from the standard Unicode Bidi algorithm.
With REORDER_RUNS_ONLY, a "Logical to Logical" transformation must be performed: if the default level of the source text (argument paraLevel in setPara) is even, the source text will be handled as LTR logical text and will be transformed to the RTL logical text which has the same LTR visual display.
With REORDER_INVERSE_NUMBERS_AS_L, an "inverse Bidi" algorithm is applied. Runs of text with numeric characters will be treated like LTR letters and may need to be surrounded with LRM characters when they are written in reordered sequence (the option INSERT_LRM_FOR_NUMERIC can be used with method writeReordered to this end). This mode is equivalent to calling setInverse() with argument isInverse set to true.
With REORDER_INVERSE_LIKE_DIRECT, the "direct" Logical to Visual Bidi algorithm is used as an approximation of an "inverse Bidi" algorithm. This mode is similar to mode REORDER_INVERSE_NUMBERS_AS_L but is closer to the regular Bidi algorithm.
REORDER_INVERSE_NUMBERS_AS_L.
When used with option OPTION_INSERT_MARKS, this mode generally adds Bidi marks to the output significantly more sparingly than mode REORDER_INVERSE_NUMBERS_AS_L with option INSERT_LRM_FOR_NUMERIC in calls to writeReordered.
With REORDER_INVERSE_FOR_NUMBERS_SPECIAL, the Logical to Visual Bidi algorithm used in Windows XP is used as an approximation of an "inverse Bidi" algorithm.
In all the reordering modes specifying an "inverse Bidi" algorithm (i.e. those with a name starting with REORDER_INVERSE), output runs should be retrieved using getVisualRun(), and the output text with writeReordered(). The caller should keep in mind that in "inverse Bidi" modes the input is actually visually ordered text, so the reordered output returned by getVisualRun() or writeReordered() consists of runs or a character string of logically ordered output.
For all the "inverse Bidi" modes, the source text should not contain Bidi control characters other than LRM or RLM.
Note that option OUTPUT_REVERSE of writeReordered has no useful meaning and should not be used in conjunction with any value of the reordering mode specifying "inverse Bidi" or with value REORDER_RUNS_ONLY.
reorderingMode
- specifies the required variant of the Bidi
algorithm.
See Also:
setInverse(boolean)
setPara(java.lang.String, byte, byte[])
writeReordered(int)
INSERT_LRM_FOR_NUMERIC
OUTPUT_REVERSE
REORDER_DEFAULT
REORDER_NUMBERS_SPECIAL
REORDER_GROUP_NUMBERS_WITH_R
REORDER_RUNS_ONLY
REORDER_INVERSE_NUMBERS_AS_L
REORDER_INVERSE_LIKE_DIRECT
REORDER_INVERSE_FOR_NUMBERS_SPECIAL
public int getReorderingMode()
setReorderingMode(int)
public void setReorderingOptions(int options)
options
- A combination of zero or more of the following
reordering options: OPTION_DEFAULT, OPTION_INSERT_MARKS, OPTION_REMOVE_CONTROLS, OPTION_STREAMING.
See Also:
getReorderingOptions()
OPTION_DEFAULT
OPTION_INSERT_MARKS
OPTION_REMOVE_CONTROLS
OPTION_STREAMING
public int getReorderingOptions()
setReorderingOptions(int)
public static byte getBaseDirection(CharSequence paragraph)
public void setContext(String prologue, String epilogue)
setPara() computes the left-right directionality for a given piece of text which is supplied as one of its arguments. Sometimes this piece of text (the "main text") should be considered in context, because text appearing before ("prologue") and/or after ("epilogue") the main text may affect the result of this computation.
This function specifies the prologue and/or the epilogue for the next call to setPara(). If successive calls to setPara() all need specification of a context, setContext() must be called before each call to setPara(). In other words, a context is not "remembered" after the following successful call to setPara().
If a call to setPara() specifies DEFAULT_LTR or DEFAULT_RTL as paraLevel and is preceded by a call to setContext() which specifies a prologue, the paragraph level will be computed taking into consideration the text in the prologue.
When setPara() is called without a previous call to setContext, the main text is handled as if preceded and followed by strong directional characters at the current paragraph level. Calling setContext() with specification of a prologue will change this behavior by handling the main text as if preceded by the last strong character appearing in the prologue, if any. Calling setContext() with specification of an epilogue will change the behavior of setPara() by handling the main text as if followed by the first strong character or digit appearing in the epilogue, if any.
Note 1: if setContext is called repeatedly without calling setPara, the earlier calls have no effect, only the last call will be remembered for the next call to setPara.
Note 2: calling setContext(null, null) cancels any previous setting of non-empty prologue or epilogue. The next call to setPara() will process no prologue or epilogue.
Note 3: users must be aware that even after setting the context
before a call to setPara() to perform e.g. a logical to visual
transformation, the resulting string may not be identical to what it
would have been if all the text, including prologue and epilogue, had
been processed together.
Example (upper case letters represent RTL characters):
prologue = "abc DE"
epilogue = none
main text = "FGH xyz"
paraLevel = LTR
display without prologue = "HGF xyz" ("HGF" is adjacent to "xyz")
display with prologue = "abc HGFED xyz" ("HGF" is not adjacent to "xyz")
prologue
- is the text which precedes the text that
will be specified in a coming call to setPara().
If there is no prologue to consider, this parameter can be null.
epilogue
- is the text which follows the text that
will be specified in a coming call to setPara().
If there is no epilogue to consider, this parameter can be null.
See Also:
setPara(java.lang.String, byte, byte[])
public void setPara(String text, byte paraLevel, byte[] embeddingLevels)
This method takes a piece of plain text containing one or more paragraphs, with or without externally specified embedding levels from styled text and computes the left-right-directionality of each character.
If the entire text is all of the same directionality, then
the method may not perform all the steps described by the algorithm,
i.e., some levels may not be the same as if all steps were performed.
This is not relevant for unidirectional text.
For example, in pure LTR text with numbers the numbers would get
a resolved level of 2 higher than the surrounding text according to
the algorithm. This implementation may set all resolved levels to
the same value in such a case.
The text can be composed of multiple paragraphs. Occurrence of a block separator in the text terminates a paragraph, and whatever comes next starts a new paragraph. The exception to this rule is when a Carriage Return (CR) is followed by a Line Feed (LF). Both CR and LF are block separators, but in that case, the pair of characters is considered as terminating the preceding paragraph, and a new paragraph will be started by a character coming after the LF.
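The paragraph-splitting rule above, including the CR LF exception, can be illustrated with a small stand-alone sketch. It does not use this class and is not its implementation: the class and method names are hypothetical, and block separators are detected with the JDK's `Character.getDirectionality`, which is an assumption about how the Bidi class B can be looked up outside of ICU.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphSplitSketch {
    /** Returns the limit (exclusive end index) of each paragraph in text. */
    static List<Integer> paragraphLimits(String text) {
        List<Integer> limits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i++);
            if (Character.getDirectionality(c)
                    == Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR) {
                // CR followed by LF: the pair terminates the paragraph together.
                if (c == '\r' && i < text.length() && text.charAt(i) == '\n') {
                    i++;
                }
                limits.add(i);
            }
        }
        // Text after the last separator (or empty text) forms a final paragraph.
        if (limits.isEmpty() || limits.get(limits.size() - 1) != text.length()) {
            limits.add(text.length());
        }
        return limits;
    }
}
```

With the input "ab\r\ncd\nef", the sketch reports three paragraphs with limits 4, 7 and 9: the CR LF pair belongs to the first paragraph, and the second paragraph starts at the character after the LF.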
Although the text is passed here as a String, it is stored internally as an array of characters. Therefore the documentation will refer to indexes of the characters in the text.
text
- contains the text that the Bidi algorithm will be performed
on. This text can be retrieved with getText() or getTextAsString.
paraLevel
- specifies the default level for the text;
it is typically 0 (LTR) or 1 (RTL).
If the method shall determine the paragraph level from the text,
then paraLevel can be set to either LEVEL_DEFAULT_LTR or LEVEL_DEFAULT_RTL;
if the text contains multiple
paragraphs, the paragraph level shall be determined separately for
each paragraph; if a paragraph does not include any strongly typed
character, then the desired default is used (0 for LTR or 1 for RTL).
Any other value between 0 and MAX_EXPLICIT_LEVEL is also valid, with odd levels indicating RTL.
embeddingLevels
- (in) may be used to preset the embedding and override levels,
ignoring characters like LRE and PDF in the text.
A level overrides the directional property of its corresponding
(same index) character if the level has the LEVEL_OVERRIDE bit set.
Apart from that bit, the level must satisfy paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL,
except that level 0 is always allowed.
Level 0 for a paragraph separator prevents reordering of paragraphs;
this only works reliably if LEVEL_OVERRIDE is also set for paragraph separators.
Level 0 for other characters is treated as a wildcard
and is lifted up to the resolved level of the surrounding paragraph.
Caution: a reference to this array, not a copy of the levels, is stored in the Bidi object;
the embeddingLevels should not be modified to avoid unexpected results on subsequent
Bidi operations. However, the setPara() and setLine() methods may modify some or all of the levels.
Note: the embeddingLevels array must have one entry for each character in text.
IllegalArgumentException
- if the values in embeddingLevels are
not within the allowed range
See Also:
LEVEL_DEFAULT_LTR
LEVEL_DEFAULT_RTL
LEVEL_OVERRIDE
MAX_EXPLICIT_LEVEL
public void setPara(char[] chars, byte paraLevel, byte[] embeddingLevels)
This method takes a piece of plain text containing one or more paragraphs, with or without externally specified embedding levels from styled text and computes the left-right-directionality of each character.
If the entire text is all of the same directionality, then
the method may not perform all the steps described by the algorithm,
i.e., some levels may not be the same as if all steps were performed.
This is not relevant for unidirectional text.
For example, in pure LTR text with numbers the numbers would get
a resolved level of 2 higher than the surrounding text according to
the algorithm. This implementation may set all resolved levels to
the same value in such a case.
The text can be composed of multiple paragraphs. Occurrence of a block separator in the text terminates a paragraph, and whatever comes next starts a new paragraph. The exception to this rule is when a Carriage Return (CR) is followed by a Line Feed (LF). Both CR and LF are block separators, but in that case, the pair of characters is considered as terminating the preceding paragraph, and a new paragraph will be started by a character coming after the LF.
The text is stored internally as an array of characters. Therefore the documentation will refer to indexes of the characters in the text.
chars
- contains the text that the Bidi algorithm will be performed
on. This text can be retrieved with getText() or getTextAsString.
paraLevel
- specifies the default level for the text;
it is typically 0 (LTR) or 1 (RTL).
If the method shall determine the paragraph level from the text,
then paraLevel can be set to either LEVEL_DEFAULT_LTR or LEVEL_DEFAULT_RTL;
if the text contains multiple
paragraphs, the paragraph level shall be determined separately for
each paragraph; if a paragraph does not include any strongly typed
character, then the desired default is used (0 for LTR or 1 for RTL).
Any other value between 0 and MAX_EXPLICIT_LEVEL is also valid, with odd levels indicating RTL.
embeddingLevels
- (in) may be used to preset the embedding and
override levels, ignoring characters like LRE and PDF in the text.
A level overrides the directional property of its corresponding
(same index) character if the level has the LEVEL_OVERRIDE bit set.
Apart from that bit, the level must satisfy paraLevel<=embeddingLevels[]<=MAX_EXPLICIT_LEVEL,
except that level 0 is always allowed.
Level 0 for a paragraph separator prevents reordering of paragraphs;
this only works reliably if LEVEL_OVERRIDE is also set for paragraph separators.
Level 0 for other characters is treated as a wildcard
and is lifted up to the resolved level of the surrounding paragraph.
Caution: a reference to this array, not a copy of the levels, is stored in the Bidi object;
the embeddingLevels should not be modified to avoid unexpected results on subsequent
Bidi operations. However, the setPara() and setLine() methods may modify some or all of the levels.
Note: the embeddingLevels array must have one entry for each character in text.
IllegalArgumentException
- if the values in embeddingLevels are
not within the allowed range
See Also:
LEVEL_DEFAULT_LTR
LEVEL_DEFAULT_RTL
LEVEL_OVERRIDE
MAX_EXPLICIT_LEVEL
public void setPara(AttributedCharacterIterator paragraph)
This method takes a paragraph of text and computes the left-right-directionality of each character. The text should not contain any Unicode block separators.
The RUN_DIRECTION attribute in the text, if present, determines the base direction (left-to-right or right-to-left). If not present, the base direction is computed using the Unicode Bidirectional Algorithm, defaulting to left-to-right if there are no strong directional characters in the text. This attribute, if present, must be applied to all the text in the paragraph.
The BIDI_EMBEDDING attribute in the text, if present, represents
embedding level information.
Negative values indicate overrides at the absolute value of the level.
Positive values indicate embeddings. (See MAX_EXPLICIT_LEVEL.)
Where values are zero or not defined, the base
embedding level as determined by the base direction is assumed.
The NUMERIC_SHAPING attribute in the text, if present, converts European
digits to other decimal digits before running the bidi algorithm. This
attribute, if present, must be applied to all the text in the paragraph.
If the entire text is all of the same directionality, then
the method may not perform all the steps described by the algorithm,
i.e., some levels may not be the same as if all steps were performed.
This is not relevant for unidirectional text.
For example, in pure LTR text with numbers the numbers would get
a resolved level of 2 higher than the surrounding text according to
the algorithm. This implementation may set all resolved levels to
the same value in such a case.
paragraph
- a paragraph of text with optional character and
paragraph attribute information
public void orderParagraphsLTR(boolean ordarParaLTR)
setPara()
.
Paragraph separators (B) may appear in the text. Setting them to level zero
means that all paragraph separators (including one possibly appearing
in the last text position) are kept in the reordered text after the text
that they follow in the source text.
When this feature is not enabled, a paragraph separator at the last
position of the text before reordering will go to the first position
of the reordered text when the paragraph level is odd.
ordarParaLTR
- specifies whether paragraph separators (B) must
receive level 0, so that successive paragraphs progress from left to right.
See Also:
setPara(java.lang.String, byte, byte[])
public boolean isOrderParagraphsLTR()
Is this Bidi object set to allocate level 0 to block separators so that successive paragraphs progress from left to right?
true if the Bidi object is set to allocate level 0 to block separators.
public byte getDirection()
LTR, RTL or MIXED that indicates if the entire text represented by this object is unidirectional, and which direction, or if it is mixed-directional.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
LTR
RTL
MIXED
public String getTextAsString()
String containing the text that the Bidi object was created for.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
setPara(java.lang.String, byte, byte[])
setLine(int, int)
public char[] getText()
char array containing the text that the Bidi object was created for.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
setPara(java.lang.String, byte, byte[])
setLine(int, int)
public int getLength()
Bidi object was created for.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
public int getProcessedLength()
setPara(). This length may be different from the length of the source text if option OPTION_STREAMING has been set.
setPara (which receives unprocessed source text) and getLength (which returns the original length of the source text).
- limit argument of setLine
- charIndex argument of getParagraph
- charIndex argument of getLevelAt
- getLevels
- logicalStart argument of getLogicalRun
- logicalIndex argument of getVisualIndex
- getLogicalMap
- writeReordered
setPara.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
setPara(java.lang.String, byte, byte[])
OPTION_STREAMING
public int getResultLength()
setPara(). This length may be different from the length of the source text if option OPTION_INSERT_MARKS or option OPTION_REMOVE_CONTROLS has been set.
- visualIndex argument of getLogicalIndex
- getVisualMap
writeReordered, or if option REORDER_INVERSE_NUMBERS_AS_L has been set.
setPara.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
setPara(java.lang.String, byte, byte[])
OPTION_INSERT_MARKS
OPTION_REMOVE_CONTROLS
REORDER_INVERSE_NUMBERS_AS_L
public byte getParaLevel()
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
See Also:
LEVEL_DEFAULT_LTR
LEVEL_DEFAULT_RTL
getParagraph(int)
getParagraphByIndex(int)
public int countParagraphs()
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
public BidiRun getParagraphByIndex(int paraIndex)
paraIndex
- is the number of the paragraph, in the range [0..countParagraphs()-1].
start will receive the index of the first character of the paragraph in the text.
limit will receive the limit of the paragraph.
embeddingLevel will receive the level of the paragraph.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if paraIndex is not in the range [0..countParagraphs()-1]
See Also:
BidiRun
public BidiRun getParagraph(int charIndex)
charIndex
- is the index of a character within the text, in the range [0..getProcessedLength()-1].
start will receive the index of the first character of the paragraph in the text.
limit will receive the limit of the paragraph.
embeddingLevel will receive the level of the paragraph.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if charIndex is not within the legal range
See Also:
BidiRun
getParagraphByIndex(int)
getProcessedLength()
public int getParagraphIndex(int charIndex)
charIndex
- is the index of a character within the text, in the range [0..getProcessedLength()-1].
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if charIndex is not within the legal range
See Also:
BidiRun
getProcessedLength()
public void setCustomClassifier(BidiClassifier classifier)
classifier
- A new custom classifier. This can be null.
See Also:
getCustomClassifier()
public BidiClassifier getCustomClassifier()
BidiClassifier
setCustomClassifier(com.ibm.icu.text.BidiClassifier)
public int getCustomizedClass(int c)
If a BidiClassifier is defined and returns a value other than UCharacter.getIntPropertyMaxValue(UProperty.BIDI_CLASS)+1, that value is used; otherwise the default class determination mechanism is invoked.
c
- The code point to get a Bidi class for.
c that is in effect for this Bidi instance.
See Also:
BidiClassifier
public Bidi setLine(int start, int limit)
setLine() returns a Bidi object to contain the reordering information, especially the resolved levels, for all the characters in a line of text. This line of text is specified by referring to a Bidi object representing this information for a piece of text containing one or more paragraphs, and by specifying a range of indexes in this text.
In the new line object, the indexes will range from 0 to limit-start-1.
This is used after calling setPara() for a piece of text, and after line-breaking on that text.
It is not necessary if each paragraph is treated as a single line.
After line-breaking, rules (L1) and (L2) for the treatment of trailing WS and for reordering are performed on a Bidi object that represents a line.
Important: the line Bidi object may reference data within the global text Bidi object.
You should not alter the content of the global text object until
you are finished using the line object.
start
- is the line's first index into the text.
limit
- is just behind the line's last index into the text
(its last index +1).
Bidi object that will now represent a line of the text.
IllegalStateException
- if this call is not preceded by a successful call to setPara
IllegalArgumentException
- if start and limit are not in the range 0<=start<limit<=getProcessedLength(),
or if the specified line crosses a paragraph boundary
See Also:
setPara(java.lang.String, byte, byte[])
getProcessedLength()
public byte getLevelAt(int charIndex)
charIndex
- the index of a character.
charIndex.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if charIndex is not in the range 0<=charIndex<getProcessedLength()
See Also:
getProcessedLength()
public byte[] getLevels()
Note that this method may allocate memory under some circumstances, unlike getLevelAt().
null if an error occurs.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
public BidiRun getLogicalRun(int logicalPosition)
This is especially useful for line-breaking on a paragraph.
logicalPosition
- is a logical position within the source text.
start containing the first character of the run, limit containing the limit of the run, and embeddingLevel containing the level of the run.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if logicalPosition is not in the range 0<=logicalPosition<getProcessedLength()
See Also:
BidiRun
BidiRun.getStart()
BidiRun.getLimit()
BidiRun.getEmbeddingLevel()
public int countRuns()
Bidi object, after setPara() may have resolved only the levels of the text. Therefore, countRuns() may have to allocate memory, and may throw an exception if it fails to do so.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
public BidiRun getVisualRun(int runIndex)
BidiRun object according to its index. BidiRun methods may be used to retrieve the run's logical start, length and level, which can be even for an LTR run or odd for an RTL run.
In an RTL run, the character at the logical start is visually on the right of the displayed run.
The length is the number of characters in the run.
countRuns() is normally called before the runs are retrieved.
Example:
    Bidi bidi = new Bidi();
    String text = "abc 123 DEFG xyz";
    bidi.setPara(text, Bidi.RTL, null);
    int i, count = bidi.countRuns(), logicalStart, visualIndex = 0, length;
    BidiRun run;
    for (i = 0; i < count; ++i) {
        run = bidi.getVisualRun(i);
        logicalStart = run.getStart();
        length = run.getLength();
        // Test the direction (parity of the level): in an RTL paragraph
        // an LTR run has an even level greater than 0.
        if (Bidi.LTR == run.getDirection()) {
            do { // LTR
                show_char(text.charAt(logicalStart++), visualIndex++);
            } while (--length > 0);
        } else {
            logicalStart += length; // logicalLimit
            do { // RTL
                show_char(text.charAt(--logicalStart), visualIndex++);
            } while (--length > 0);
        }
    }
Note that in right-to-left runs, code like this places second surrogates before first ones (which is generally a bad idea) and combining characters before base characters.
Use of writeReordered(int), optionally with the KEEP_BASE_COMBINING option, can be considered in order to avoid these issues.
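The surrogate problem noted above can be illustrated with a stand-alone sketch that reverses a run by code point instead of by char. The class and method names are hypothetical and the code does not use this class. It only keeps surrogate pairs intact; it does not keep combining characters after their base characters.

```java
final class RtlRunSketch {
    /** Returns text[start..limit-1] reversed by code point, keeping surrogate pairs intact. */
    static String reverseByCodePoint(String text, int start, int limit) {
        StringBuilder out = new StringBuilder(limit - start);
        int i = limit;
        while (i > start) {
            int cp = text.codePointBefore(i);
            // Do not look past the start of the run: a pair that straddles
            // the run boundary is handled as a single char.
            if (Character.isSupplementaryCodePoint(cp) && i - 2 < start) {
                cp = text.charAt(i - 1);
            }
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A renderer built along the lines of the loop in the example could call such a helper for each RTL run instead of stepping backwards one char at a time.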
runIndex
- is the number of the run in visual order, in the range [0..countRuns()-1].
LTR==0 or RTL==1, never MIXED.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if runIndex is not in the range 0<=runIndex<countRuns()
See Also:
countRuns()
BidiRun
BidiRun.getStart()
BidiRun.getLength()
BidiRun.getEmbeddingLevel()
public int getVisualIndex(int logicalIndex)
Bidi object, then calling getLogicalMap() is more efficient.
The value returned may be MAP_NOWHERE if there is no visual position because the corresponding text character is a Bidi control removed from output by the option OPTION_REMOVE_CONTROLS.
When the visual output is altered by using options of writeReordered() such as INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, the visual position returned may not be correct. It is advised to use, when possible, reordering options such as OPTION_INSERT_MARKS and OPTION_REMOVE_CONTROLS.
Note that in right-to-left runs, this mapping places second surrogates before first ones (which is generally a bad idea) and combining characters before base characters. Use of writeReordered(int), optionally with the KEEP_BASE_COMBINING option, can be considered instead of using the mapping, in order to avoid these issues.
logicalIndex
- is the index of a character in the text.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if logicalIndex is not in the range 0<=logicalIndex<getProcessedLength()
See Also:
getLogicalMap()
getLogicalIndex(int)
getProcessedLength()
MAP_NOWHERE
OPTION_REMOVE_CONTROLS
writeReordered(int)
public int getLogicalIndex(int visualIndex)
Bidi object, then calling getVisualMap() is more efficient.
The value returned may be MAP_NOWHERE if there is no logical position because the corresponding text character is a Bidi mark inserted in the output by option OPTION_INSERT_MARKS.
This is the inverse method to getVisualIndex().
When the visual output is altered by using options of writeReordered() such as INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, the logical position returned may not be correct. It is advised to use, when possible, reordering options such as OPTION_INSERT_MARKS and OPTION_REMOVE_CONTROLS.
visualIndex
- is the visual position of a character.
IllegalStateException
- if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException
- if visualIndex is not in the range 0<=visualIndex<getResultLength()
See Also:
getVisualMap()
getVisualIndex(int)
getResultLength()
MAP_NOWHERE
OPTION_INSERT_MARKS
writeReordered(int)
public int[] getLogicalMap()
Bidi
(paragraph or line) object.
Some values in the map may be MAP_NOWHERE if the corresponding text characters are Bidi controls removed from the visual output by the option OPTION_REMOVE_CONTROLS.
When the visual output is altered by using options of writeReordered() such as INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, the visual positions returned may not be correct. It is advised to use, when possible, reordering options such as OPTION_INSERT_MARKS and OPTION_REMOVE_CONTROLS.
Note that in right-to-left runs, this mapping places second surrogates before first ones (which is generally a bad idea) and combining characters before base characters. Use of writeReordered(int), optionally with the KEEP_BASE_COMBINING option, can be considered instead of using the mapping, in order to avoid these issues.
Returns an array of getProcessedLength() indexes which will reflect the reordering of the characters. The index map will result in indexMap[logicalIndex]==visualIndex, where indexMap represents the returned array.
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
See Also:
getVisualMap(), getVisualIndex(int), getProcessedLength(), MAP_NOWHERE, OPTION_REMOVE_CONTROLS, writeReordered(int)
public int[] getVisualMap()
Get a visual-to-logical index map (array) for the characters in the Bidi (paragraph or line) object.
Some values in the map may be MAP_NOWHERE if the corresponding text characters are Bidi marks inserted in the visual output by the option OPTION_INSERT_MARKS.
When the visual output is altered by using options of writeReordered() such as INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, the logical positions returned may not be correct. It is advised to use, when possible, reordering options such as OPTION_INSERT_MARKS and OPTION_REMOVE_CONTROLS.
Returns an array of getResultLength() indexes which will reflect the reordering of the characters. The index map will result in indexMap[visualIndex]==logicalIndex, where indexMap represents the returned array.
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
See Also:
getLogicalMap(), getLogicalIndex(int), getResultLength(), MAP_NOWHERE, OPTION_INSERT_MARKS, writeReordered(int)
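As an illustration of how a visual-to-logical map can be consumed, the following minimal sketch builds a visual-order string from logical text. It does not call ICU: the class VisualMapSketch, the method applyVisualMap, and the hard-coded map are hypothetical, and MAP_NOWHERE is assumed to be -1. The map values stand in for what getVisualMap() might return for this text.

```java
class VisualMapSketch {
    // Assumption: stands in for Bidi.MAP_NOWHERE, taken here to be -1.
    static final int MAP_NOWHERE = -1;

    // Builds a visual-order string from logical text and a map in which
    // visualMap[visualIndex] == logicalIndex.
    static String applyVisualMap(String logicalText, int[] visualMap) {
        StringBuilder visual = new StringBuilder(visualMap.length);
        for (int logicalIndex : visualMap) {
            if (logicalIndex == MAP_NOWHERE) {
                continue; // an inserted mark has no source character in the text
            }
            visual.append(logicalText.charAt(logicalIndex));
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        // Two Latin letters followed by two Hebrew letters, in logical order.
        String text = "AB\u05D0\u05D1";
        // Hypothetical map for this text: the Hebrew pair is reversed.
        int[] visualMap = {0, 1, 3, 2};
        String visual = applyVisualMap(text, visualMap);
        System.out.println(visual.equals("AB\u05D1\u05D0")); // prints "true"
    }
}
```

Because this sketch copies one UTF-16 code unit at a time, it would split surrogate pairs and detach combining characters in right-to-left runs, which is the reason writeReordered(int) is suggested for producing display text.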
public static int[] reorderLogical(byte[] levels)
This is a convenience method that does not use a Bidi object.
It is intended for cases where an application has determined the levels of objects (character sequences) and just needs to have them reordered (L2). This is equivalent to using getLogicalMap() on a Bidi object.
levels - is an array of levels that have been determined by the application.
Returns an array of levels.length indexes which will reflect the reordering of the characters. The index map will result in indexMap[logicalIndex]==visualIndex, where indexMap represents the returned array.
public static int[] reorderVisual(byte[] levels)
This is a convenience method that does not use a Bidi object.
It is intended for cases where an application has determined the levels of objects (character sequences) and just needs to have them reordered (L2). This is equivalent to using getVisualMap() on a Bidi object.
levels - is an array of levels that have been determined by the application.
Returns an array of levels.length indexes which will reflect the reordering of the characters. The index map will result in indexMap[visualIndex]==logicalIndex, where indexMap represents the returned array.
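To make the L2 reordering step concrete, here is a minimal sketch of how both maps could be derived from an array of levels. It is not ICU's implementation: the class ReorderSketch is hypothetical, its two methods only borrow the names used above, and the levels are assumed to be plain resolved levels without the override bit.

```java
import java.util.Arrays;

class ReorderSketch {
    // Visual-to-logical map: result[visualIndex] == logicalIndex.
    static int[] reorderVisual(byte[] levels) {
        int n = levels.length;
        int[] map = new int[n];
        int maxLevel = 0;
        int minOddLevel = Integer.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            map[i] = i;
            maxLevel = Math.max(maxLevel, levels[i]);
            if ((levels[i] & 1) == 1) {
                minOddLevel = Math.min(minOddLevel, levels[i]);
            }
        }
        // Rule L2: from the highest level down to the lowest odd level,
        // reverse every contiguous sequence at that level or higher.
        for (int level = maxLevel; level >= minOddLevel; level--) {
            int start = 0;
            while (start < n) {
                if (levels[start] < level) {
                    start++;
                    continue;
                }
                int limit = start;
                while (limit < n && levels[limit] >= level) {
                    limit++;
                }
                for (int lo = start, hi = limit - 1; lo < hi; lo++, hi--) {
                    int tmp = map[lo];
                    map[lo] = map[hi];
                    map[hi] = tmp;
                }
                start = limit;
            }
        }
        return map;
    }

    // Logical-to-visual map: result[logicalIndex] == visualIndex.
    static int[] reorderLogical(byte[] levels) {
        int[] visualMap = reorderVisual(levels);
        int[] logicalMap = new int[visualMap.length];
        for (int v = 0; v < visualMap.length; v++) {
            logicalMap[visualMap[v]] = v;
        }
        return logicalMap;
    }

    public static void main(String[] args) {
        // Two RTL characters followed by two characters of embedded LTR text.
        byte[] levels = {1, 1, 2, 2};
        System.out.println(Arrays.toString(reorderVisual(levels)));  // prints "[2, 3, 1, 0]"
        System.out.println(Arrays.toString(reorderLogical(levels))); // prints "[3, 2, 0, 1]"
    }
}
```

The sketch computes the logical map by inverting the visual map, which keeps the two results consistent with each other by construction.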
public static int[] invertMap(int[] srcMap)
srcMap - is an array whose elements define the original mapping from a source array to a destination array. Some elements of the source array may have no mapping in the destination array. In that case, their value will be the special value MAP_NOWHERE. All elements must be >=0 or equal to MAP_NOWHERE. Some elements in the source map may have a value greater than srcMap.length if the destination array has more elements than the source array. There must be no duplicate indexes (two or more elements with the same value except MAP_NOWHERE).
Returns an array representing the inverse map. This array has a number of elements equal to 1 + the highest value in srcMap.
For elements of the result array which have no matching elements in the source array, the corresponding elements in the inverse map will receive a value equal to MAP_NOWHERE.
If element with index i in srcMap has a value k different from MAP_NOWHERE, this means that element i of the source array maps to element k in the destination array. The inverse map will have value i in its k-th element.
For all elements of the destination array which do not map to an element in the source array, the corresponding element in the inverse map will have a value equal to MAP_NOWHERE.
See Also:
MAP_NOWHERE
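The inversion rule described above (element i with value k in the source map yields value i in the k-th element of the inverse map) can be sketched as follows. This is an illustrative reimplementation, not ICU's code: the class InvertMapSketch is hypothetical, MAP_NOWHERE is assumed to be -1, and the result length is assumed to be one more than the highest value in the source map.

```java
import java.util.Arrays;

class InvertMapSketch {
    // Assumption: stands in for Bidi.MAP_NOWHERE, taken here to be -1.
    static final int MAP_NOWHERE = -1;

    static int[] invertMap(int[] srcMap) {
        // Assumed sizing: one more than the highest mapped destination index.
        int destLength = 0;
        for (int value : srcMap) {
            if (value != MAP_NOWHERE) {
                destLength = Math.max(destLength, value + 1);
            }
        }
        int[] inverse = new int[destLength];
        // Destination positions that nothing maps to stay at MAP_NOWHERE.
        Arrays.fill(inverse, MAP_NOWHERE);
        for (int i = 0; i < srcMap.length; i++) {
            int k = srcMap[i];
            if (k != MAP_NOWHERE) {
                inverse[k] = i; // source element i maps to destination element k
            }
        }
        return inverse;
    }

    public static void main(String[] args) {
        int[] srcMap = {2, MAP_NOWHERE, 0, 4};
        System.out.println(Arrays.toString(invertMap(srcMap))); // prints "[2, -1, 0, -1, 3]"
    }
}
```

In the example, source element 1 has no destination, and destination elements 1 and 3 have no source, so both directions show MAP_NOWHERE entries.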
public Bidi createLineBidi(int lineStart, int lineLimit)
lineStart - the offset from the start of the paragraph to the start of the line.
lineLimit - the offset from the start of the paragraph to the limit of the line.
IllegalStateException - if this call is not preceded by a successful call to setPara
IllegalArgumentException - if lineStart and lineLimit are not in the range 0<=lineStart<lineLimit<=getProcessedLength(), or if the specified line crosses a paragraph boundary
public boolean isMixed()
IllegalStateException - if this call is not preceded by a successful call to setPara
public boolean isLeftToRight()
IllegalStateException - if this call is not preceded by a successful call to setPara
public boolean isRightToLeft()
IllegalStateException - if this call is not preceded by a successful call to setPara
public boolean baseIsLeftToRight()
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
public int getBaseLevel()
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
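A short sketch of how the direction queries above might be combined. To stay runnable without extra libraries it uses the JDK class java.text.Bidi, which exposes methods with the same names; that ICU's Bidi behaves the same way for these inputs is an assumption. The class DirectionSketch and the method describe are hypothetical, and the JDK constructor analyzes the text immediately, so there is no separate setPara() step here.

```java
import java.text.Bidi;

class DirectionSketch {
    // Summarizes the overall direction of a paragraph.
    static String describe(String text) {
        // The base direction is taken from the first strong character,
        // defaulting to left-to-right when there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isMixed()) {
            return "mixed, base level " + bidi.getBaseLevel();
        }
        return bidi.isLeftToRight() ? "left-to-right" : "right-to-left";
    }

    public static void main(String[] args) {
        System.out.println(describe("hello"));            // prints "left-to-right"
        System.out.println(describe("\u05D0\u05D1"));     // prints "right-to-left"
        System.out.println(describe("abc \u05D0\u05D1")); // prints "mixed, base level 0"
    }
}
```

Checking isMixed() first lets a caller skip run-by-run processing for uniformly directed text.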
public int getRunCount()
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
public int getRunLevel(int run)
run - the index of the run, between 0 and countRuns()-1
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException - if run is not in the range 0<=run<countRuns()
public int getRunStart(int run)
run - the index of the run, between 0 and countRuns()-1
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException - if run is not in the range 0<=run<countRuns()
public int getRunLimit(int run)
run - the index of the run, between 0 and countRuns()-1
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
IllegalArgumentException - if run is not in the range 0<=run<countRuns()
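The run accessors above are typically used together in a loop. The following sketch again relies on the JDK class java.text.Bidi as a stand-in with same-named methods; treating it as equivalent to ICU's Bidi for this input is an assumption. The class RunsSketch and the method describeRuns are hypothetical.

```java
import java.text.Bidi;

class RunsSketch {
    // Lists each run as [start,limit)@level, in logical order.
    static String describeRuns(Bidi bidi) {
        StringBuilder sb = new StringBuilder();
        for (int run = 0; run < bidi.getRunCount(); run++) {
            sb.append('[').append(bidi.getRunStart(run))
              .append(',').append(bidi.getRunLimit(run))
              .append(")@").append(bidi.getRunLevel(run))
              .append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Latin text, three Hebrew letters, Latin text.
        Bidi para = new Bidi("abc \u05D0\u05D1\u05D2 def", Bidi.DIRECTION_LEFT_TO_RIGHT);
        System.out.println(describeRuns(para)); // prints "[0,4)@0 [4,7)@1 [7,11)@0"

        // A line cut from the middle of the paragraph; indexes restart at 0.
        Bidi line = para.createLineBidi(2, 9);
        System.out.println(describeRuns(line)); // prints "[0,2)@0 [2,5)@1 [5,7)@0"
    }
}
```

Valid run indexes are 0 up to the run count minus 1, and each run's limit equals the next run's start.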
public static boolean requiresBidi(char[] text, int start, int limit)
text - the text containing the characters to test
start - the start of the range of characters to test
limit - the limit of the range of characters to test
public static void reorderVisually(byte[] levels, int levelStart, Object[] objects, int objectStart, int count)
The elements at index from objectStart up to objectStart + count in the objects array will be reordered into visual order assuming each run of text has the level indicated by the corresponding element in the levels array (at index - objectStart + levelStart).
levels - an array representing the bidi level of each object
levelStart - the start position in the levels array
objects - the array of objects to be reordered into visual order
objectStart - the start position in the objects array
count - the number of objects to reorder
public String writeReordered(int options)
Take a Bidi object containing the reordering information for a piece of text (one or more paragraphs) set by setPara() or for a line of text set by setLine() and return a string containing the reordered text.
The text may have been aliased (only a reference was stored without copying the contents), thus it must not have been modified since the setPara() call.
This method preserves the integrity of characters with multiple code units and (optionally) combining characters. Characters in RTL runs can be replaced by mirror-image characters in the returned string. Note that "real" mirroring has to be done in a rendering engine by glyph selection and that for many "mirrored" characters there are no Unicode characters as mirror-image equivalents. There are also options to insert or remove Bidi control characters; see the descriptions of the return value and the options parameter, and of the option bit flags.
options - A bit set of options for the reordering that control how the reordered text is written. The options include mirroring the characters on a code point basis and inserting LRM characters, which is used especially for transforming visually stored text to logically stored text (although this is still an imperfect implementation of an "inverse Bidi" algorithm because it uses the "forward Bidi" algorithm at its core). The available options are: DO_MIRRORING, INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, STREAMING
Returns the reordered text. If the INSERT_LRM_FOR_NUMERIC option is set, then the length of the returned string could be as large as getLength()+2*countRuns(). If the REMOVE_BIDI_CONTROLS option is set, then the length of the returned string may be less than getLength(). If none of these options is set, then the length of the returned string will be exactly getProcessedLength().
IllegalStateException - if this call is not preceded by a successful call to setPara or setLine
See Also:
DO_MIRRORING, INSERT_LRM_FOR_NUMERIC, KEEP_BASE_COMBINING, OUTPUT_REVERSE, REMOVE_BIDI_CONTROLS, OPTION_STREAMING, getProcessedLength()
public static String writeReverse(String src, int options)
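Reverse a right-to-left run of Unicode text. This method is the implementation for reversing RTL runs as part of writeReordered(). For detailed descriptions of the parameters, see there.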
Since no Bidi controls are inserted here, the output string length will never exceed src.length().
src - The RTL run text.
options - A bit set of options for the reordering that control how the reordered text is written. See the options parameter in writeReordered().
Returns the reordered text. If the REMOVE_BIDI_CONTROLS option is set, then the length of the returned string may be less than src.length(). If this option is not set, then the length of the returned string will be exactly src.length().
IllegalArgumentException - if src is null.
See Also:
writeReordered(int)
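The length guarantee above follows from reversing whole characters rather than individual code units. The sketch below illustrates that idea with a plain code-point reversal; it is a simplified stand-in, not ICU's writeReverse(), and it ignores every option (no mirroring, no combining-character handling, no control removal). The class ReverseSketch and the method reverseByCodePoint are hypothetical.

```java
class ReverseSketch {
    // Reverses text one code point at a time so that surrogate pairs
    // keep their internal order.
    static String reverseByCodePoint(String src) {
        if (src == null) {
            throw new IllegalArgumentException("src must not be null");
        }
        StringBuilder out = new StringBuilder(src.length());
        int i = src.length();
        while (i > 0) {
            int cp = src.codePointBefore(i); // reads a full pair when present
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', a supplementary character (one surrogate pair), 'b'.
        String src = "a\uD83D\uDE00b";
        String reversed = reverseByCodePoint(src);
        System.out.println(reversed.equals("b\uD83D\uDE00a")); // prints "true"
        System.out.println(reversed.length() == src.length()); // prints "true"
    }
}
```

Since every code point is copied exactly once, the output has the same number of code units as the input, matching the case where no controls are removed.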
Copyright © 2016 Unicode, Inc. and others.