public abstract class Transliterator extends Object implements StringTransform
Transliterator is an abstract class that transliterates text from one format to another. The most common kind of transliterator is a script, or alphabet, transliterator. For example, a Russian to Latin transliterator changes Russian text written in Cyrillic characters to phonetically equivalent Latin characters. It does not translate Russian to English! Transliteration, unlike translation, operates on characters, without reference to the meanings of words and sentences.
Although script conversion is its most common use, a transliterator can actually perform a more general class of tasks. In fact, Transliterator defines a very general API which specifies only that a segment of the input text is replaced by new text. The particulars of this conversion are determined entirely by subclasses of Transliterator.
Transliterators are stateless
Transliterator objects are stateless; they retain no information between calls to transliterate(). As a result, threads may share transliterators without synchronizing them. This might seem to limit the complexity of the transliteration operation. In practice, subclasses perform complex transliterations by delaying the replacement of text until it is known that no other replacements are possible. In other words, although the Transliterator objects are stateless, the source text itself embodies all the needed information, and delayed operation allows arbitrary complexity.
Batch transliteration
The simplest way to perform transliteration is all at once, on a string of existing text. This is referred to as batch transliteration. For example, given a string input and a transliterator t, the call
String result = t.transliterate(input);
will transliterate it and return the result. Other methods allow the client to specify a substring to be transliterated and to use Replaceable objects instead of strings, in order to preserve out-of-band information (such as text styles).
Keyboard transliteration
Somewhat more involved is keyboard, or incremental, transliteration. This is the transliteration of text that is arriving from some source (typically the user's keyboard) one character at a time, or in some other piecemeal fashion.
In keyboard transliteration, a Replaceable buffer stores the text. As text is inserted, as much as possible is transliterated on the fly. This means a GUI that displays the contents of the buffer may show text being modified as each new character arrives.
Consider the simple rule-based Transliterator:
th>{theta}
t>{tau}
When the user types 't', nothing will happen, since the transliterator is waiting to see if the next character is 'h'. To remedy this, we introduce the notion of a cursor, marked by a '|' in the output string:
t>|{tau}
{tau}h>{theta}
Now when the user types 't', tau appears, and if the next character is 'h', the tau changes to a theta. This is accomplished by maintaining a cursor position (independent of the insertion point, and invisible in the GUI) across calls to transliterate(). Typically, the cursor will be coincident with the insertion point, but in a case like the one above, it will precede the insertion point.
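The cursor behaviour of the two rules above can be imitated in a few lines of plain Java. This is only a toy model, not ICU code: the class KeyboardCursorSketch and its type() method are hypothetical names, and the two rules are hard-coded rather than parsed.

```java
// Toy model of the rules "t>|{tau}; {tau}h>{theta};" (hypothetical, not ICU code).
final class KeyboardCursorSketch {
    private static final String TAU = "\u03C4";
    private static final String THETA = "\u03B8";

    private final StringBuilder buffer = new StringBuilder();
    private int cursor = 0; // text before this index is committed

    /** Appends one typed character, then transliterates as far as possible. */
    String type(char c) {
        buffer.append(c);
        while (cursor < buffer.length()) {
            String rest = buffer.substring(cursor);
            if (rest.startsWith(TAU + "h")) {
                // {tau}h>{theta}: replace both characters, cursor moves past theta
                buffer.replace(cursor, cursor + 2, THETA);
                cursor += 1;
            } else if (rest.startsWith("t")) {
                // t>|{tau}: show tau at once, but leave the cursor in front of it
                buffer.replace(cursor, cursor + 1, TAU);
            } else if (rest.equals(TAU)) {
                break; // tau might still become theta; wait for the next key
            } else {
                cursor++; // nothing can match here; commit this character
            }
        }
        return buffer.toString();
    }
}
```

Typing 't' shows tau immediately, and a following 'h' turns it into theta, because the cursor was left in front of the tau.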
Keyboard transliteration methods maintain a set of three indices that are updated with each call to transliterate(): the cursor, start, and limit. These indices are changed by the method, and they are passed in and out via a Position object. The start index marks the beginning of the substring that the transliterator will look at. It is advanced as text becomes committed (but it is not the committed index; that's the cursor). The cursor index, described above, marks the point at which the transliterator last stopped, either because it reached the end, or because it required more characters to disambiguate between possible inputs. The cursor can also be explicitly set by rules. Any characters before the cursor index are frozen; future keyboard transliteration calls within this input sequence will not change them. New text is inserted at the limit index, which marks the end of the substring that the transliterator looks at.
Because keyboard transliteration assumes that more characters are to arrive, it is conservative in its operation. It only transliterates when it can do so unambiguously. Otherwise it waits for more characters to arrive. When the client code knows that no more characters are forthcoming, perhaps because the user has performed some input termination operation, then it should call finishTransliteration() to complete any pending transliterations.
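The conservative behaviour described above can be sketched with the earlier rule pair th>{theta}; t>{tau}. The following plain-Java toy is an assumption-laden illustration, not the ICU implementation: PendingInputSketch, insert() and finish() are hypothetical names standing in for the incremental transliterate() calls and finishTransliteration().

```java
// Toy model of "th>{theta}; t>{tau};" with a pending, ambiguous 't' (not ICU code).
final class PendingInputSketch {
    private final StringBuilder text = new StringBuilder();
    private int start = 0; // everything before this index is already processed

    /** Inserts a character at the end and transliterates only what is unambiguous. */
    void insert(char c) {
        text.append(c);
        run(true);
    }

    /** No more input will arrive: resolve anything that was left pending. */
    void finish() {
        run(false);
    }

    String text() {
        return text.toString();
    }

    private void run(boolean incremental) {
        while (start < text.length()) {
            String rest = text.substring(start);
            if (rest.startsWith("th")) {
                text.replace(start, start + 2, "\u03B8"); // th > theta
                start += 1;
            } else if (incremental && rest.equals("t")) {
                return; // could still become "th"; wait for more input
            } else if (rest.startsWith("t")) {
                text.replace(start, start + 1, "\u03C4"); // t > tau
                start += 1;
            } else {
                start++;
            }
        }
    }
}
```

After insert('t') the buffer still holds "t"; only finish() (or a following character) resolves it.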
Inverses
Pairs of transliterators may be inverses of one another. For example, if transliterator A transliterates
characters by incrementing their Unicode value (so "abc" -> "def"), and transliterator B decrements character
values, then A is an inverse of B and vice versa. If we compose A with B in a compound
transliterator, the result is the identity transliterator, that is, a transliterator that does not change its input
text.
The Transliterator method getInverse() returns a transliterator's inverse, if one exists, or null otherwise. However, the result of getInverse() usually will not be a true mathematical inverse. This is because true inverse transliterators are difficult to formulate. For example, consider two transliterators: AB, which transliterates the character 'A' to 'B', and BA, which transliterates 'B' to 'A'. It might seem that these are exact inverses, since
"A" x AB -> "B"
"B" x BA -> "A"
where 'x' represents transliteration. However,
"ABCD" x AB -> "BBCD"
"BBCD" x BA -> "AACD"
so AB composed with BA is not the identity. Nonetheless, BA may be usefully considered to be AB's inverse, and it is on this basis that AB.getInverse() could legitimately return BA.
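Both cases from this section can be checked with ordinary string operations. The sketch below uses hypothetical helper names (shift, ab, ba) and plain Java rather than Transliterator objects; it only illustrates why the shifting pair is a true inverse while AB and BA are not.

```java
// Plain-Java illustration of true and approximate inverses (hypothetical helpers).
final class InverseSketch {
    /** Adds delta to every character value, e.g. shift("abc", 3) gives "def". */
    static String shift(String s, int delta) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) + delta));
        }
        return sb.toString();
    }

    /** Stand-in for transliterator AB: every 'A' becomes 'B'. */
    static String ab(String s) {
        return s.replace('A', 'B');
    }

    /** Stand-in for transliterator BA: every 'B' becomes 'A'. */
    static String ba(String s) {
        return s.replace('B', 'A');
    }

    public static void main(String[] args) {
        System.out.println(shift(shift("abc", 3), -3)); // abc: a true inverse pair
        System.out.println(ba(ab("ABCD")));             // AACD: not the identity
    }
}
```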
Filtering
Each transliterator has a filter, which restricts changes to those characters selected by the filter. The filter affects just the characters that are changed -- the characters outside of the filter are still part of the context for the filter. For example, in the following, even though 'x' is filtered out and doesn't convert to 'y', it does affect the conversion of 'a'.
String rules = "x > y; x{a} > b; ";
Transliterator tempTrans = Transliterator.createFromRules("temp", rules, Transliterator.FORWARD);
tempTrans.setFilter(new UnicodeSet("[a]"));
String tempResult = tempTrans.transform("xa"); // results in "xb"
IDs and display names
A transliterator is designated by a short identifier string or ID. IDs follow the format source-destination, where source describes the entity being replaced, and destination describes the entity replacing source. The entities may be the names of scripts, particular sequences of characters, or whatever else it is that the transliterator converts to or from. For example, a transliterator from Russian to Latin might be named "Russian-Latin". A transliterator from keyboard escape sequences to Latin-1 characters might be named "KeyboardEscape-Latin1". By convention, system entity names are in English, with the initial letters of words capitalized; user entity names may follow any format so long as they do not contain dashes.
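As a rough illustration of the source-destination format, an ID can be split at its first dash. The helper below is hypothetical (IdSketch.splitId is not part of the API) and deliberately ignores anything beyond the basic two-part form described here.

```java
// Hypothetical helper: splits "source-destination" at the first '-'.
final class IdSketch {
    /** Returns {source, destination}, or a single-element array if there is no dash. */
    static String[] splitId(String id) {
        int dash = id.indexOf('-');
        if (dash < 0) {
            return new String[] { id };
        }
        return new String[] { id.substring(0, dash), id.substring(dash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitId("Russian-Latin");
        System.out.println(parts[0] + " / " + parts[1]); // Russian / Latin
    }
}
```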
In addition to programmatic IDs, transliterator objects have display names for presentation in user interfaces, returned by getDisplayName(java.lang.String).
Factory methods and registration
In general, client code should use the factory method getInstance() to obtain an instance of a transliterator given its ID. Valid IDs may be enumerated using getAvailableIDs(). Since transliterators are stateless, multiple calls to getInstance() with the same ID will return the same object.
In addition to the system transliterators registered at startup, user transliterators may be registered by calling registerInstance() at run time. To register a transliterator subclass without instantiating it (until it is needed), users may call registerClass().
Composed transliterators
In addition to built-in system transliterators like "Latin-Greek", there are also built-in composed transliterators. These are implemented by composing two or more component transliterators. For example, if we have scripts "A", "B", "C", and "D", and we want to transliterate between all pairs of them, then we need to write 12 transliterators: "A-B", "A-C", "A-D", "B-A",..., "D-A", "D-B", "D-C". If it is possible to convert all scripts to an intermediate script "M", then instead of writing 12 rule sets, we only need to write 8: "A~M", "B~M", "C~M", "D~M", "M~A", "M~B", "M~C", "M~D". (This might not seem like a big win, but it's really 2n vs. n^2 - n, so as n gets larger the gain becomes significant. With 9 scripts, it's 18 vs. 72 rule sets, a big difference.) Note the use of "~" rather than "-" for the script separator here; this indicates that the given transliterator is intended to be composed with others, rather than be used as is.
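The rule-set counts quoted above follow directly from the two formulas. A small sketch with hypothetical method names makes the arithmetic explicit:

```java
// Counts rule sets needed for n scripts, with and without an intermediate script.
final class RuleSetCount {
    /** One rule set per ordered pair of distinct scripts: n^2 - n. */
    static int direct(int n) {
        return n * n - n;
    }

    /** One rule set into and one out of the intermediate script per script: 2n. */
    static int viaIntermediate(int n) {
        return 2 * n;
    }

    public static void main(String[] args) {
        System.out.println(direct(4) + " vs. " + viaIntermediate(4)); // 12 vs. 8
        System.out.println(direct(9) + " vs. " + viaIntermediate(9)); // 72 vs. 18
    }
}
```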
Composed transliterators can be instantiated as usual. For example, the system transliterator "Devanagari-Gujarati" is a composed transliterator built internally as "Devanagari~InterIndic;InterIndic~Gujarati". When this transliterator is instantiated, it appears externally to be a standard transliterator (e.g., getID() returns "Devanagari-Gujarati").
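Composition through an intermediate form can be pictured as ordinary function chaining. In this sketch A_TO_M and M_TO_B are made-up stand-ins for component transliterators such as "A~M" and "M~B"; the character mappings are arbitrary and only serve to show the chaining.

```java
import java.util.function.UnaryOperator;

// Toy composition of two stages through an intermediate representation.
final class PivotCompositionSketch {
    // Hypothetical stand-ins for "A~M" and "M~B"; real rule sets would go here.
    static final UnaryOperator<String> A_TO_M = s -> s.replace('a', '1');
    static final UnaryOperator<String> M_TO_B = s -> s.replace('1', 'x');

    /** Behaves like a single "A-B" transform although it is built from two parts. */
    static String aToB(String s) {
        return A_TO_M.andThen(M_TO_B).apply(s);
    }

    public static void main(String[] args) {
        System.out.println(aToB("banana")); // bxnxnx
    }
}
```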
Subclassing
Subclasses must implement the abstract method handleTransliterate().
Subclasses should override the transliterate() method taking a Replaceable and the transliterate() method taking a String and StringBuffer if the performance of these methods can be improved over the performance obtained by the default implementations in this class.
Rule syntax
A set of rules determines how to perform transliterations. Rules within a rule set are separated by semicolons (';'). To include a literal semicolon, prefix it with a backslash ('\'). Unicode Pattern_White_Space is ignored. If the first non-blank character on a line is '#', the entire line is ignored as a comment.
Each set of rules consists of two groups, one forward, and one reverse. This is a convention that is not enforced; rules for one direction may be omitted, with the result that transliterations in that direction will not modify the source text. In addition, bidirectional forward-reverse rules may be specified for symmetrical transformations.
Note: Another description of the Transliterator rule syntax is available in section Transform Rules Syntax of UTS #35: Unicode LDML. The rules are shown there using arrow symbols ← and → and ↔. ICU supports both those and the equivalent ASCII symbols < and > and <>.
Rule statements take one of the following forms:
$alefmadda=\\u0622;
Variable definition. After this statement, instances of the name "$alefmadda" will be replaced by the Unicode character U+0622. Variable names must begin with a letter and consist only of letters, digits, and underscores. Case is significant. Duplicate names cause an exception to be thrown, that is, variables cannot be redefined. The right hand side may contain well-formed text of any length, including no text at all ("$empty=;"). The right hand side may contain embedded UnicodeSet patterns, for example, "$softvowel=[eiyEIY]".
ai>$alefmadda;
Forward rule, written with '>'.
ai<$alefmadda;
Reverse rule, written with '<'.
ai<>$alefmadda;
Bidirectional rule, written with '<>'.
Translation rules consist of a match pattern and an output string. The match pattern consists of literal characters, optionally preceded by context, and optionally followed by context. Context characters, like literal pattern characters, must be matched in the text being transliterated. However, unlike literal pattern characters, they are not replaced by the output text. For example, the pattern "abc{def}" indicates the characters "def" must be preceded by "abc" for a successful match. If there is a successful match, "def" will be replaced, but not "abc". The final '}' is optional, so "abc{def" is equivalent to "abc{def}". Another example is "{123}456" (or "123}456") in which the literal pattern "123" must be followed by "456".
The output string of a forward or reverse rule consists of characters to replace the literal pattern characters. If the output string contains the character '|', this is taken to indicate the location of the cursor after replacement. The cursor is the point in the text at which the next replacement, if any, will be applied. The cursor is usually placed within the replacement text; however, it can actually be placed into the preceding or following context by using the special character '@'. Examples:
a {foo} z > | @ bar; # foo -> bar, move cursor before a
{foo} xyz > bar @@|; # foo -> bar, cursor between y and z
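For readers more familiar with regular expressions, the preceding and following context of a pattern such as "abc{def}" or "{123}456" behaves much like lookbehind and lookahead in java.util.regex. The sketch below is only an analogy in plain Java, with hypothetical method names; it is not how rules are implemented, and it does not model the cursor.

```java
import java.util.regex.Pattern;

// Regex analogy for rule context: context must match but is never replaced.
final class ContextSketch {
    // Like "abc{def}": "def" is replaced only when preceded by "abc".
    private static final Pattern AFTER_ABC = Pattern.compile("(?<=abc)def");
    // Like "{123}456": "123" is replaced only when followed by "456".
    private static final Pattern BEFORE_456 = Pattern.compile("123(?=456)");

    static String replaceDefAfterAbc(String text, String output) {
        return AFTER_ABC.matcher(text).replaceAll(output);
    }

    static String replace123Before456(String text, String output) {
        return BEFORE_456.matcher(text).replaceAll(output);
    }

    public static void main(String[] args) {
        System.out.println(replaceDefAfterAbc("abcdef xdef", "X"));  // abcX xdef
        System.out.println(replace123Before456("123456 1237", "_")); // _456 1237
    }
}
```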
UnicodeSet
UnicodeSet patterns may appear anywhere that makes sense. They may appear in variable definitions. Contrariwise, UnicodeSet patterns may themselves contain variable references, such as "$a=[a-z];$not_a=[^$a]", or "$range=a-z;$ll=[$range]".
UnicodeSet patterns may also be embedded directly into rule strings. Thus, the following two rules are equivalent:
$vowel=[aeiou]; $vowel>'*'; # One way to do this
[aeiou]>'*'; # Another way
See UnicodeSet for more documentation and examples.
Segments
Segments of the input string can be matched and copied to the output string. This makes certain sets of rules simpler and more general, and makes reordering possible. For example:
([a-z]) > $1 $1; # double lowercase letters
([:Lu:]) ([:Ll:]) > $2 $1; # reverse order of Lu-Ll pairs
The segment of the input string to be copied is delimited by "(" and ")". Up to nine segments may be defined. Segments may not overlap. In the output string, "$1" through "$9" represent the input string segments, in left-to-right order of definition.
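Segments work much like capturing groups in java.util.regex. As an analogy only (plain Java, hypothetical method names, not the rule engine itself), the two segment rules shown in this section correspond roughly to the following replacements:

```java
// Regex analogy for the two segment rules (not ICU code).
final class SegmentSketch {
    /** Like "([a-z]) > $1 $1;": doubles each lowercase ASCII letter. */
    static String doubleLowercase(String text) {
        return text.replaceAll("([a-z])", "$1$1");
    }

    /** Like "([:Lu:]) ([:Ll:]) > $2 $1;": swaps an uppercase-lowercase pair. */
    static String swapUpperLower(String text) {
        return text.replaceAll("(\\p{Lu})(\\p{Ll})", "$2$1");
    }

    public static void main(String[] args) {
        System.out.println(doubleLowercase("abC"));  // aabbC
        System.out.println(swapUpperLower("Ab Cd")); // bA dC
    }
}
```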
Anchors
Patterns can be anchored to the beginning or the end of the text. This is done with the special characters '^' and '$'. For example:
^ a > 'BEG_A'; # match 'a' at start of text
a > 'A'; # match other instances of 'a'
z $ > 'END_Z'; # match 'z' at end of text
z > 'Z'; # match other instances of 'z'
It is also possible to match the beginning or the end of the text using a UnicodeSet. This is done by including a virtual anchor character '$' at the end of the set pattern. Although this is usually the match character for the end anchor, the set will match either the beginning or the end of the text, depending on its placement. For example:
$x = [a-z$]; # match 'a' through 'z' OR anchor
$x 1 > 2; # match '1' after a-z or at the start
3 $x > 4; # match '3' before a-z or at the end
Example
The following example rules illustrate many of the features of the rule language.
Rule 1. abc{def}>x|y
Rule 2. xyz>r
Rule 3. yz>q
Applying these rules to the string "adefabcdefz" yields the following results:
|adefabcdefz : Initial state, no rules match. Advance cursor.
a|defabcdefz : Still no match. Rule 1 does not match because the preceding context is not present.
ad|efabcdefz : Still no match. Keep advancing until there is a match...
ade|fabcdefz : ...
adef|abcdefz : ...
adefa|bcdefz : ...
adefab|cdefz : ...
adefabc|defz : Rule 1 matches; replace "def" with "xy" and back up the cursor to before the 'y'.
adefabcx|yz : Although "xyz" is present, rule 2 does not match because the cursor is before the 'y', not before the 'x'. Rule 3 does match. Replace "yz" with "q".
adefabcxq| : The cursor is at the end; transliteration is complete.
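The walk-through above can be reproduced with a small cursor-driven loop. This is a simplified sketch in plain Java, not the rule engine: the Rule class and apply() method are hypothetical, only preceding context is modelled, and the cursor position inside the output is given as a plain offset.

```java
import java.util.Arrays;
import java.util.List;

// Toy cursor-driven matcher for the three example rules (not ICU code).
final class RuleTraceSketch {
    /** Preceding context, matched key, output, and cursor offset into the output. */
    static final class Rule {
        final String before;
        final String key;
        final String output;
        final int cursorOffset;

        Rule(String before, String key, String output, int cursorOffset) {
            this.before = before;
            this.key = key;
            this.output = output;
            this.cursorOffset = cursorOffset;
        }
    }

    static final List<Rule> RULES = Arrays.asList(
            new Rule("abc", "def", "xy", 1), // Rule 1: abc{def}>x|y
            new Rule("", "xyz", "r", 1),     // Rule 2: xyz>r
            new Rule("", "yz", "q", 1));     // Rule 3: yz>q

    static String apply(String input) {
        String text = input;
        int cursor = 0;
        while (cursor < text.length()) {
            boolean matched = false;
            for (Rule r : RULES) { // first matching rule wins
                boolean keyHere = text.startsWith(r.key, cursor);
                boolean contextOk = text.substring(0, cursor).endsWith(r.before);
                if (keyHere && contextOk) {
                    text = text.substring(0, cursor) + r.output
                            + text.substring(cursor + r.key.length());
                    cursor += r.cursorOffset;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                cursor++; // no rule matches here; advance the cursor
            }
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(apply("adefabcdefz")); // adefabcxq
    }
}
```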
The order of rules is significant. If multiple rules may match at some point, the first matching rule is applied.
Forward and reverse rules may have an empty output string. Otherwise, an empty left or right hand side of any statement is a syntax error.
Single quotes are used to quote any character other than a digit or letter. To specify a single quote itself, inside or outside of quotes, use two single quotes in a row. For example, the rule "'>'>o''clock" changes the string ">" to the string "o'clock".
Notes
While a Transliterator is being built from rules, it checks that the rules are added in proper order. For example, if the rule "a>x" is followed by the rule "ab>y", then the second rule will throw an exception. The reason is that the second rule can never be triggered, since the first rule always matches anything it matches. In other words, the first rule masks the second rule.
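The masking condition for the simple case described above (no context, literal keys only) amounts to a prefix test. The helper below is a hypothetical simplification; the check performed while rules are built is assumed to consider more than this, such as context and sets.

```java
// Simplified masking test for literal rule keys without context (hypothetical).
final class MaskingSketch {
    /** True if a rule keyed on earlierKey always matches wherever laterKey would. */
    static boolean masks(String earlierKey, String laterKey) {
        return laterKey.startsWith(earlierKey);
    }

    public static void main(String[] args) {
        System.out.println(masks("a", "ab")); // true: "a>x" masks "ab>y"
        System.out.println(masks("ab", "a")); // false: "ab>y" then "a>x" is fine
    }
}
```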
Modifier and Type | Class and Description |
---|---|
static interface | Transliterator.Factory: The factory interface for transliterators. |
static class | Transliterator.Position: Position structure for incremental transliteration. |
Modifier and Type | Field and Description |
---|---|
static int | FORWARD: Direction constant indicating the forward direction in a transliterator, e.g., the forward rules of a rule-based Transliterator. |
static int | REVERSE: Direction constant indicating the reverse direction in a transliterator, e.g., the reverse rules of a rule-based Transliterator. |
Modifier | Constructor and Description |
---|---|
protected | Transliterator(String ID, UnicodeFilter filter): Default constructor. |
Modifier and Type | Method and Description |
---|---|
void | addSourceTargetSet(UnicodeSet inputFilter, UnicodeSet sourceSet, UnicodeSet targetSet): Deprecated. This API is ICU internal only. |
protected String | baseToRules(boolean escapeUnprintable): Returns a rule string for this transliterator. |
static Transliterator | createFromRules(String ID, String rules, int dir): Returns a Transliterator object constructed from the given rule string. |
void | filteredTransliterate(Replaceable text, Transliterator.Position index, boolean incremental): Transliterate a substring of text, as specified by index, taking filters into account. |
void | finishTransliteration(Replaceable text, Transliterator.Position index): Finishes any pending transliterations that were waiting for more characters. |
static Enumeration<String> | getAvailableIDs(): Returns an enumeration over the programmatic names of registered Transliterator objects. |
static Enumeration<String> | getAvailableSources(): Returns an enumeration over the source names of registered transliterators. |
static Enumeration<String> | getAvailableTargets(String source): Returns an enumeration over the target names of registered transliterators having a given source name. |
static Enumeration<String> | getAvailableVariants(String source, String target): Returns an enumeration over the variant names of registered transliterators having a given source name and target name. |
static String | getDisplayName(String ID): Returns a name for this transliterator that is appropriate for display to the user in the default DISPLAY locale. |
static String | getDisplayName(String id, Locale inLocale): Returns a name for this transliterator that is appropriate for display to the user in the given locale. |
static String | getDisplayName(String id, ULocale inLocale): Returns a name for this transliterator that is appropriate for display to the user in the given locale. |
Transliterator[] | getElements(): Return the elements that make up this transliterator. |
UnicodeFilter | getFilter(): Returns the filter used by this transliterator, or null if this transliterator uses no filter. |
UnicodeSet | getFilterAsUnicodeSet(UnicodeSet externalFilter): Deprecated. This API is ICU internal only. |
String | getID(): Returns a programmatic identifier for this transliterator. |
static Transliterator | getInstance(String ID): Returns a Transliterator object given its ID. |
static Transliterator | getInstance(String ID, int dir): Returns a Transliterator object given its ID. |
Transliterator | getInverse(): Returns this transliterator's inverse. |
int | getMaximumContextLength(): Returns the length of the longest context required by this transliterator. |
UnicodeSet | getSourceSet(): Returns the set of all characters that may be modified in the input text by this Transliterator. |
UnicodeSet | getTargetSet(): Returns the set of all characters that may be generated as replacement text by this transliterator. |
protected UnicodeSet | handleGetSourceSet(): Framework method that returns the set of all characters that may be modified in the input text by this Transliterator, ignoring the effect of this object's filter. |
protected abstract void | handleTransliterate(Replaceable text, Transliterator.Position pos, boolean incremental): Abstract method that concrete subclasses define to implement their transliteration algorithm. |
static void | registerAlias(String aliasID, String realID): Register an ID as an alias of another ID. |
static void | registerAny(): Deprecated. This API is ICU internal only. |
static void | registerClass(String ID, Class<? extends Transliterator> transClass, String displayName): Registers a subclass of Transliterator with the system. |
static void | registerFactory(String ID, Transliterator.Factory factory): Register a factory object with the given ID. |
static void | registerInstance(Transliterator trans): Register a Transliterator object with the given ID. |
void | setFilter(UnicodeFilter filter): Changes the filter used by this transliterator. |
protected void | setID(String id): Set the programmatic identifier for this transliterator. |
protected void | setMaximumContextLength(int a): Method for subclasses to use to set the maximum context length. |
String | toRules(boolean escapeUnprintable): Returns a rule string for this transliterator. |
String | transform(String source): Implements StringTransform via this method. |
void | transliterate(Replaceable text): Transliterates an entire string in place. |
int | transliterate(Replaceable text, int start, int limit): Transliterates a segment of a string, with optional filtering. |
void | transliterate(Replaceable text, Transliterator.Position index): Transliterates the portion of the text buffer that can be transliterated unambiguously. |
void | transliterate(Replaceable text, Transliterator.Position index, int insertion): Transliterates the portion of the text buffer that can be transliterated unambiguously after a new character has been inserted, typically as a result of a keyboard event. |
void | transliterate(Replaceable text, Transliterator.Position index, String insertion): Transliterates the portion of the text buffer that can be transliterated unambiguously after new text has been inserted, typically as a result of a keyboard event. |
String | transliterate(String text): Transliterates an entire string and returns the result. |
static void | unregister(String ID): Unregisters a transliterator or class. |
public static final int FORWARD
public static final int REVERSE
protected Transliterator(String ID, UnicodeFilter filter)
ID - the string identifier for this transliterator
filter - the filter. Any character for which filter.contains() returns false will not be altered by this transliterator. If filter is null then no filtering is applied.
public final int transliterate(Replaceable text, int start, int limit)
text - the string to be transliterated
start - the beginning index, inclusive; 0 <= start <= limit.
limit - the ending index, exclusive; start <= limit <= text.length().
The text in [start, limit) has been transliterated, possibly to a string of a different length, at [start, new-limit), where new-limit is the return value. If the input offsets are out of bounds, the returned value is -1 and the input string remains unchanged.
public final void transliterate(Replaceable text)
text - the string to be transliterated
public final String transliterate(String text)
text - the string to be transliterated
public final void transliterate(Replaceable text, Transliterator.Position index, String insertion)
The text in insertion will be inserted into text at index.contextLimit, advancing index.contextLimit by insertion.length(). Then the transliterator will try to transliterate characters of text between index.start and index.contextLimit. Characters before index.start will not be changed.
Upon return, values in index will be updated. index.contextStart will be advanced to the first character that future calls to this method will read. index.start and index.contextLimit will be adjusted to delimit the range of text that future calls to this method may change.
Typical usage of this method begins with an initial call with index.contextStart and index.contextLimit set to indicate the portion of text to be transliterated, and index.start == index.contextStart. Thereafter, index can be used without modification in future calls, provided that all changes to text are made via this method.
This method assumes that future calls may be made that will insert new text into the buffer. As a result, it only performs unambiguous transliterations. After the last call to this method, there may be untransliterated text that is waiting for more input to resolve an ambiguity. In order to perform these pending transliterations, clients should call finishTransliteration(com.ibm.icu.text.Replaceable, com.ibm.icu.text.Transliterator.Position) after the last call to this method has been made.
text - the buffer holding transliterated and untransliterated text
index - the start and limit of the text, the position of the cursor, and the start and limit of transliteration.
insertion - text to be inserted and possibly transliterated into the translation buffer at index.contextLimit. If null then no text is inserted.
IllegalArgumentException - if index is invalid
See also: handleTransliterate(com.ibm.icu.text.Replaceable, com.ibm.icu.text.Transliterator.Position, boolean)
public final void transliterate(Replaceable text, Transliterator.Position index, int insertion)
See transliterate(Replaceable, Transliterator.Position, String) for details.
text - the buffer holding transliterated and untransliterated text
index - the start and limit of the text, the position of the cursor, and the start and limit of transliteration.
insertion - text to be inserted and possibly transliterated into the translation buffer at index.contextLimit.
See also: transliterate(Replaceable, Transliterator.Position, String)
public final void transliterate(Replaceable text, Transliterator.Position index)
See transliterate(Replaceable, Transliterator.Position, String) for details.
text - the buffer holding transliterated and untransliterated text
index - the start and limit of the text, the position of the cursor, and the start and limit of transliteration.
See also: transliterate(Replaceable, Transliterator.Position, String)
public final void finishTransliteration(Replaceable text, Transliterator.Position index)
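Finishes any pending transliterations that were waiting for more characters. Call it after the last call to transliterate().
text - the buffer holding transliterated and untransliterated text.
index - the array of indices previously passed to transliterate(com.ibm.icu.text.Replaceable, int, int)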
protected abstract void handleTransliterate(Replaceable text, Transliterator.Position pos, boolean incremental)
Let originalStart refer to the value of pos.start upon entry.
If incremental is false, then this method should transliterate all characters between pos.start and pos.limit. Upon return pos.start must == pos.limit.
If incremental is true, then this method should transliterate all characters between pos.start and pos.limit that can be unambiguously transliterated, regardless of future insertions of text at pos.limit. Upon return, pos.start should be in the range [originalStart, pos.limit). pos.start should be positioned such that characters [originalStart, pos.start) will not be changed in the future by this transliterator and characters [pos.start, pos.limit) are unchanged.
Implementations of this method should also obey the following invariants:
pos.limit and pos.contextLimit should be updated to reflect changes in length of the text between pos.start and pos.limit. The difference pos.contextLimit - pos.limit should not change.
pos.contextStart should not change.
Upon return, neither pos.start nor pos.limit should be less than originalStart.
Text before originalStart and text after pos.limit should not change.
Text before pos.contextStart and text after pos.contextLimit should be ignored.
Subclasses may safely assume that all characters in [pos.start, pos.limit) are filtered. In other words, the filter has already been applied by the time this method is called. See filteredTransliterate().
This method is not for public consumption. Calling this method directly will transliterate [pos.start, pos.limit) without applying the filter. End user code should call transliterate() instead of this method. Subclass code should call filteredTransliterate() instead of this method.
text - the buffer holding transliterated and untransliterated text
pos - the indices indicating the start, limit, context start, and context limit of the text.
incremental - if true, assume more text may be inserted at pos.limit and act accordingly. Otherwise, transliterate all text between pos.start and pos.limit and move pos.start up to pos.limit.
See also: transliterate(com.ibm.icu.text.Replaceable, int, int)
public void filteredTransliterate(Replaceable text, Transliterator.Position index, boolean incremental)
text - the text to be transliterated
index - the position indices
incremental - if true, then assume more characters may be inserted at index.limit, and postpone processing to accommodate future incoming characters
public final int getMaximumContextLength()
Returns the length of the longest context required by this transliterator; subclasses set it with setMaximumContextLength(). For example, if a transliterator translates "ddd" (where d is any digit) to "555" when preceded by "(ddd)", then the preceding context length is 5, the length of "(ddd)".
protected void setMaximumContextLength(int a)
See also: getMaximumContextLength()
public final String getID()
If this identifier is passed to getInstance(), it will return this object, if it has been registered.
See also: registerClass(java.lang.String, java.lang.Class<? extends com.ibm.icu.text.Transliterator>, java.lang.String), getAvailableIDs()
protected final void setID(String id)
public static final String getDisplayName(String ID)
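Returns a name for this transliterator that is appropriate for display to the user in the default DISPLAY locale. See getDisplayName(String,Locale) for details.
See also: ULocale.Category.DISPLAY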
public static String getDisplayName(String id, Locale inLocale)
The name is taken from the locale resource data in the standard manner of the java.text package.
If no localized names exist in the system resource bundles, a name is synthesized using a localized MessageFormat pattern from the resource data. The arguments to this pattern are an integer followed by one or two strings. The integer is the number of strings, either 1 or 2. The strings are formed by splitting the ID for this transliterator at the first '-'. If there is no '-', then the entire ID forms the only string.
inLocale - the Locale in which the display name should be localized.
See also: MessageFormat
public static String getDisplayName(String id, ULocale inLocale)
The name is taken from the locale resource data in the standard manner of the java.text package.
If no localized names exist in the system resource bundles, a name is synthesized using a localized MessageFormat pattern from the resource data. The arguments to this pattern are an integer followed by one or two strings. The integer is the number of strings, either 1 or 2. The strings are formed by splitting the ID for this transliterator at the first '-'. If there is no '-', then the entire ID forms the only string.
inLocale - the ULocale in which the display name should be localized.
See also: MessageFormat
public final UnicodeFilter getFilter()
public void setFilter(UnicodeFilter filter)
Callers must take care if a transliterator is in use by multiple threads. The filter should not be changed by one thread while another thread may be transliterating.
public static final Transliterator getInstance(String ID)
Returns a Transliterator object given its ID. The ID must be either a system transliterator ID or an ID registered using registerClass().
ID - a valid ID, as enumerated by getAvailableIDs()
Returns: a Transliterator object with the given ID
IllegalArgumentException - if the given ID is invalid.
public static Transliterator getInstance(String ID, int dir)
Returns a Transliterator object given its ID. The ID must be either a system transliterator ID or an ID registered using registerClass().
ID - a valid ID, as enumerated by getAvailableIDs()
dir - either FORWARD or REVERSE. If REVERSE then the inverse of the given ID is instantiated.
Returns: a Transliterator object with the given ID
IllegalArgumentException - if the given ID is invalid.
See also: registerClass(java.lang.String, java.lang.Class<? extends com.ibm.icu.text.Transliterator>, java.lang.String), getAvailableIDs(), getID()
public static final Transliterator createFromRules(String ID, String rules, int dir)
Returns a Transliterator object constructed from the given rule string. This will be a rule-based Transliterator if the rule string contains only rules, a compound Transliterator if it contains ID blocks, or a null Transliterator if it contains ID blocks which parse as empty for the given direction.
ID - the ID for the transliterator.
rules - rules, separated by ';'
dir - either FORWARD or REVERSE.
IllegalArgumentException - if there is a problem with the ID or the rules
public String toRules(boolean escapeUnprintable)
escapeUnprintable - if true, then unprintable characters will be converted to escape form backslash-'u' or backslash-'U'.
protected final String baseToRules(boolean escapeUnprintable)
escapeUnprintable - if true, then unprintable characters will be converted to escape form backslash-'u' or backslash-'U'.
public Transliterator[] getElements()
If this transliterator is not composed of other transliterators, then this method will return an array of length one containing a reference to this transliterator.
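The rule-related methods (createFromRules(), toRules() and getElements()) can be tried together as in this sketch, which assumes ICU4J is on the classpath. The ID "Demo-AtoB" and the single rule are made up for illustration.

```java
import com.ibm.icu.text.Transliterator;

public class RulesSketch {
    // Builds a rule-based transliterator from one rule that rewrites 'a' as 'b'.
    static Transliterator build() {
        return Transliterator.createFromRules("Demo-AtoB", "a > b;", Transliterator.FORWARD);
    }

    static String apply(String text) {
        return build().transliterate(text);
    }

    // A transliterator that is not composed of others reports itself as its only element.
    static int elementCount() {
        return build().getElements().length;
    }

    public static void main(String[] args) {
        Transliterator t = build();
        System.out.println(apply("banana"));
        System.out.println(t.toRules(true)); // rule source, with unprintable characters escaped
        System.out.println(elementCount());
    }
}
```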
public final UnicodeSet getSourceSet()
Subclasses may override handleGetSourceSet() to return a more precise result. The return result is approximate in any case and is intended for use by tests, tools, or utilities.
getTargetSet(), handleGetSourceSet()
protected UnicodeSet handleGetSourceSet()
getSourceSet(), getTargetSet()
public UnicodeSet getTargetSet()
Warning. You might expect an empty filter to always produce an empty target. However, consider the following:
[Pp]{}[ΣςσϷϸϺϻ] > \';
With a filter of [], you still get some elements in the target set, because this rule will still match. It could be recast to the following if it were important.
[Pp]{([ΣςσϷϸϺϻ])} > \' | $1;
getTargetSet()
@Deprecated public void addSourceTargetSet(UnicodeSet inputFilter, UnicodeSet sourceSet, UnicodeSet targetSet)
SHOULD BE OVERRIDDEN BY SUBCLASSES. It is probably an error for any transliterator to NOT override this, but we can't force them to for backwards compatibility.
Other methods vector through this.
When gathering the information on source and target, the compound transliterator makes things complicated. For example, suppose we have:
Global FILTER = [ax] a > b; :: NULL; b > c; x > d;
While the filter just allows a and x, b is an intermediate result, which could produce c. So the source and target sets cannot be gathered independently. Instead, the sources for the first transliterator are filtered according to the global filter, intersected with that transliterator's own filter. Based on that we get the target. The next transliterator then gets as its global filter the union of the global filter and the last target. And so on.
There is another complication:
Global FILTER = [ax] a >|b; b >c;
Even though b would be filtered from the input, whenever we have a backup, it could be part of the input. So ideally we will change the global filter as we go.
targetSet - TODO
getTargetSet()
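The filter propagation described for the first example (a > b; :: NULL; b > c; x > d; under a global filter of [ax]) can be modelled with plain Java collections. This is a simplified sketch, not ICU's implementation: each pass is reduced to a hypothetical map from one source character to one target character, per-transliterator filters are omitted, and the backup complication is ignored.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FilterPropagationSketch {
    // Returns [sources, targets] for a chain of passes under a global filter.
    static List<Set<Character>> gather(List<Map<Character, Character>> passes,
                                       Set<Character> globalFilter) {
        Set<Character> sources = new TreeSet<>();
        Set<Character> targets = new TreeSet<>();
        Set<Character> filter = new TreeSet<>(globalFilter);
        for (Map<Character, Character> pass : passes) {
            Set<Character> passTargets = new TreeSet<>();
            for (Map.Entry<Character, Character> rule : pass.entrySet()) {
                // A rule only contributes if its source survives the current filter.
                if (filter.contains(rule.getKey())) {
                    sources.add(rule.getKey());
                    passTargets.add(rule.getValue());
                }
            }
            targets.addAll(passTargets);
            // Targets of this pass may be consumed by later passes, so widen the filter.
            filter.addAll(passTargets);
        }
        return List.of(sources, targets);
    }

    // The example from the text: pass 1 is "a > b", pass 2 is "b > c; x > d".
    static List<Set<Character>> example() {
        List<Map<Character, Character>> passes =
                List.of(Map.of('a', 'b'), Map.of('b', 'c', 'x', 'd'));
        return gather(passes, Set.of('a', 'x'));
    }

    public static void main(String[] args) {
        List<Set<Character>> result = example();
        System.out.println("sources=" + result.get(0));
        System.out.println("targets=" + result.get(1));
    }
}
```

In this model the intermediate b shows up among the sources and c among the targets, even though the global filter only allows a and x.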
@Deprecated public UnicodeSet getFilterAsUnicodeSet(UnicodeSet externalFilter)
public final Transliterator getInverse()
If getID() returns "A-B", then this method will return the result of getInstance("B-A"), or null if that call fails.
Subclasses with knowledge of their inverse may wish to override this method.
The inverse transliterator, or null if no such transliterator is registered.
registerClass(java.lang.String, java.lang.Class<? extends com.ibm.icu.text.Transliterator>, java.lang.String)
public static void registerClass(String ID, Class<? extends Transliterator> transClass, String displayName)
Registers a subclass of Transliterator with the system. This subclass must have a public constructor taking no arguments. When that constructor is called, the resulting object must return the ID passed to this method if its getID() method is called.
ID - the result of getID() for this transliterator
transClass - a subclass of Transliterator
unregister(java.lang.String)
public static void registerFactory(String ID, Transliterator.Factory factory)
Because ICU may choose to cache Transliterator objects internally, this must be called at application startup, prior to any calls to Transliterator.getInstance to avoid undefined behavior.
ID - the ID of this transliterator
factory - the factory object
public static void registerInstance(Transliterator trans)
Because ICU may choose to cache Transliterator objects internally, this must be called at application startup, prior to any calls to Transliterator.getInstance to avoid undefined behavior.
trans - the Transliterator object
public static void registerAlias(String aliasID, String realID)
Because ICU may choose to cache Transliterator objects internally, this must be called at application startup, prior to any calls to Transliterator.getInstance to avoid undefined behavior.
aliasID - The new ID being registered.
realID - The existing ID that the new ID should be an alias of.
public static void unregister(String ID)
ID - the ID of the transliterator or class
registerClass(java.lang.String, java.lang.Class<? extends com.ibm.icu.text.Transliterator>, java.lang.String)
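A registration round trip might look like the following sketch, which assumes ICU4J is on the classpath. The ID "Demo-Swap" and its rule are invented for illustration. In a real application the registration would happen once at startup, before any other getInstance() call, as the notes above require; the immediate unregister() here only keeps the example from leaving state behind.

```java
import com.ibm.icu.text.Transliterator;

public class RegistrationSketch {
    // Registers a custom transliterator, looks it up by ID, then removes it again.
    static String roundTrip(String text) {
        Transliterator custom =
                Transliterator.createFromRules("Demo-Swap", "a > b;", Transliterator.FORWARD);
        Transliterator.registerInstance(custom);
        try {
            // The object is now reachable through the registry by its own ID.
            return Transliterator.getInstance("Demo-Swap").transliterate(text);
        } finally {
            Transliterator.unregister("Demo-Swap");
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("aa"));
    }
}
```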
public static final Enumeration<String> getAvailableIDs()
Returns an enumeration over the programmatic names of registered Transliterator objects. This includes both system transliterators and user transliterators registered using registerClass(). The enumerated names may be passed to getInstance().
An Enumeration over String objects
getInstance(java.lang.String), registerClass(java.lang.String, java.lang.Class<? extends com.ibm.icu.text.Transliterator>, java.lang.String)
public static final Enumeration<String> getAvailableSources()
public static final Enumeration<String> getAvailableTargets(String source)
public static final Enumeration<String> getAvailableVariants(String source, String target)
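The enumeration methods can be combined to walk the registry, as in this sketch, which assumes ICU4J is on the classpath. The class and helper names are illustrative, and the actual contents depend on the ICU data that is installed, so no particular output is shown.

```java
import com.ibm.icu.text.Transliterator;
import java.util.Collections;
import java.util.List;

public class AvailableIdsSketch {
    // Copies the enumeration of registered IDs into a list for easier handling.
    static List<String> ids() {
        return Collections.list(Transliterator.getAvailableIDs());
    }

    public static void main(String[] args) {
        System.out.println(ids().size() + " IDs registered");
        // Walk the source / target / variant hierarchy.
        for (String source : Collections.list(Transliterator.getAvailableSources())) {
            for (String target : Collections.list(Transliterator.getAvailableTargets(source))) {
                List<String> variants =
                        Collections.list(Transliterator.getAvailableVariants(source, target));
                System.out.println(source + "-" + target + " variants=" + variants);
            }
        }
    }
}
```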
@Deprecated public static void registerAny()
Copyright © 2016 Unicode, Inc. and others.