public abstract class Collator extends Object implements Comparator<Object>, Freezable<Collator>, Cloneable
[icu enhancement] ICU's replacement for java.text.Collator. Methods, fields, and other functionality specific to ICU are labeled '[icu]'.
Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.
A Collator is thread-safe only when frozen. See isFrozen() and Freezable.
Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons: PRIMARY, SECONDARY, TERTIARY, QUATERNARY, and IDENTICAL.
For more information about the collation service see the User Guide.
Examples of use
    // Get the Collator for US English and set its strength to PRIMARY
    Collator usCollator = Collator.getInstance(Locale.US);
    usCollator.setStrength(Collator.PRIMARY);
    if (usCollator.compare("abc", "ABC") == 0) {
        System.out.println("Strings are equivalent");
    }

The following example shows how to compare two strings using the Collator for the default locale.

    // Compare two strings in the default locale
    Collator myCollator = Collator.getInstance();
    myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
    if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
        System.out.println("à\u0325 is not equal to a\u0325̀ without decomposition");
        myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
            System.out.println("Error: à\u0325 should be equal to a\u0325̀ with decomposition");
        } else {
            System.out.println("à\u0325 is equal to a\u0325̀ with decomposition");
        }
    } else {
        System.out.println("Error: à\u0325 should not be equal to a\u0325̀ without decomposition");
    }
See Also: RuleBasedCollator, CollationKey
Modifier and Type | Class and Description |
---|---|
static class | Collator.CollatorFactory: A factory used with registerFactory to register multiple collators and provide display names for them. |
static interface | Collator.ReorderCodes: Reordering codes for non-script groups that can be reordered under collation. |
Modifier and Type | Field and Description |
---|---|
static int | CANONICAL_DECOMPOSITION: Decomposition mode value. |
static int | FULL_DECOMPOSITION: [icu] Note: This is for backwards compatibility with Java APIs only. |
static int | IDENTICAL: Smallest Collator strength value. |
static int | NO_DECOMPOSITION: Decomposition mode value. |
static int | PRIMARY: Strongest collator strength value. |
static int | QUATERNARY: [icu] Fourth level collator strength value. |
static int | SECONDARY: Second level collator strength value. |
static int | TERTIARY: Third level collator strength value. |
Modifier | Constructor and Description |
---|---|
protected | Collator(): Empty default constructor to make javadocs happy |
Modifier and Type | Method and Description |
---|---|
Object | clone(): Clones the collator. |
Collator | cloneAsThawed(): Provides for the clone operation. |
int | compare(Object source, Object target): Compares the source Object to the target Object. |
abstract int | compare(String source, String target): Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. |
protected int | doCompare(CharSequence left, CharSequence right): Deprecated. This API is ICU internal only. |
boolean | equals(Object obj): Compares the equality of two Collator objects. |
boolean | equals(String source, String target): Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. |
Collator | freeze(): Freezes the collator. |
static Locale[] | getAvailableLocales(): Returns the set of locales, as Locale objects, for which collators are installed. |
static ULocale[] | getAvailableULocales(): [icu] Returns the set of locales, as ULocale objects, for which collators are installed. |
abstract CollationKey | getCollationKey(String source): Transforms the String into a CollationKey suitable for efficient repeated comparison. |
int | getDecomposition(): Returns the decomposition mode of this Collator. |
static String | getDisplayName(Locale objectLocale): [icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale. |
static String | getDisplayName(Locale objectLocale, Locale displayLocale): [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale. |
static String | getDisplayName(ULocale objectLocale): [icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale. |
static String | getDisplayName(ULocale objectLocale, ULocale displayLocale): [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale. |
static int[] | getEquivalentReorderCodes(int reorderCode): Retrieves all the reorder codes that are grouped with the given reorder code. |
static ULocale | getFunctionalEquivalent(String keyword, ULocale locID): [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. |
static ULocale | getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable): [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. |
static Collator | getInstance(): Returns the Collator for the current default locale. |
static Collator | getInstance(Locale locale): Returns the Collator for the desired locale. |
static Collator | getInstance(ULocale locale): [icu] Returns the Collator for the desired locale. |
static String[] | getKeywords(): [icu] Returns an array of all possible keywords that are relevant to collation. |
static String[] | getKeywordValues(String keyword): [icu] Given a keyword, returns an array of all values for that keyword that are currently in use. |
static String[] | getKeywordValuesForLocale(String key, ULocale locale, boolean commonlyUsed): [icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. |
ULocale | getLocale(ULocale.Type type): [icu] Returns the locale that was used to create this object, or null. |
int | getMaxVariable(): [icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior. |
abstract RawCollationKey | getRawCollationKey(String source, RawCollationKey key): [icu] Returns the simpler form of a CollationKey for the String source following the rules of this Collator and stores the result into the user provided argument key. |
int[] | getReorderCodes(): Retrieves the reordering codes for this collator. |
int | getStrength(): Returns this Collator's strength attribute. |
UnicodeSet | getTailoredSet(): [icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator. |
abstract VersionInfo | getUCAVersion(): [icu] Returns the UCA version of this collator object. |
abstract int | getVariableTop(): [icu] Gets the variable top value of a Collator. |
abstract VersionInfo | getVersion(): [icu] Returns the version of this collator object. |
int | hashCode(): Generates a hash code for this Collator object. |
boolean | isFrozen(): Determines whether the object has been frozen or not. |
static Object | registerFactory(Collator.CollatorFactory factory): [icu] Registers a collator factory. |
static Object | registerInstance(Collator collator, ULocale locale): [icu] Registers a collator as the default collator for the provided locale. |
void | setDecomposition(int decomposition): Sets the decomposition mode of this Collator. |
Collator | setMaxVariable(int group): [icu] Sets the variable top to the top of the specified reordering group. |
void | setReorderCodes(int... order): Sets the reordering codes for this collator. |
void | setStrength(int newStrength): Sets this Collator's strength attribute. |
Collator | setStrength2(int newStrength): Deprecated. This API is ICU internal only. |
abstract void | setVariableTop(int varTop): Deprecated. ICU 53 Call setMaxVariable() instead. |
abstract int | setVariableTop(String varTop): Deprecated. ICU 53 Call setMaxVariable(int) instead. |
static boolean | unregister(Object registryKey): [icu] Unregisters a collator previously registered using registerInstance. |
Methods inherited from class Object: finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Comparator: comparing, comparing, comparingDouble, comparingInt, comparingLong, naturalOrder, nullsFirst, nullsLast, reversed, reverseOrder, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
public static final int PRIMARY
See Also: setStrength(int), getStrength(), Constant Field Values
public static final int SECONDARY
See Also: setStrength(int), getStrength(), Constant Field Values
public static final int TERTIARY
See Also: setStrength(int), getStrength(), Constant Field Values
public static final int QUATERNARY
See Also: setStrength(int), getStrength(), Constant Field Values
public static final int IDENTICAL
Note this value is different from the JDK's.
public static final int FULL_DECOMPOSITION
[icu] Note: This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.
public static final int NO_DECOMPOSITION
Note this value is different from the JDK's.
See Also: CANONICAL_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
public static final int CANONICAL_DECOMPOSITION
CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.
See Also: NO_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
protected Collator()
public boolean equals(Object obj)
The base class checks for null and for equal types. Subclasses should override.
Specified by: equals in interface Comparator<Object>
Overrides: equals in class Object
Parameters:
obj - the Collator to compare to.
public int hashCode()
The implementation exists just for consistency with equals(Object)
implementation in this class and does not generate a useful hash code.
Subclasses should override this implementation.
public void setStrength(int newStrength)
The base class method does nothing. Subclasses should override it if appropriate.
See the Collator class description for an example of use.
Parameters:
newStrength - the new strength value.
Throws:
IllegalArgumentException - if the new strength value is not valid.
See Also: getStrength(), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
@Deprecated public Collator setStrength2(int newStrength)
public void setDecomposition(int decomposition)
Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.
The base class method does nothing. Subclasses should override it if appropriate.
See getDecomposition for a description of decomposition mode.
Parameters:
decomposition - the new decomposition mode
Throws:
IllegalArgumentException - if the given value is not a valid decomposition mode.
See Also: getDecomposition(), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
public void setReorderCodes(int... order)
Sets the reordering codes for this collator. The groups to reorder are specified using UScript codes and Collator.ReorderCodes entries.
By default, reordering codes specified for the start of the order are placed in the order given after several special non-script blocks. These special groups of characters are space, punctuation, symbol, currency, and digit. They are represented with Collator.ReorderCodes entries. Script groups can be intermingled with these special non-script groups if those special groups are explicitly specified in the reordering.
The special code OTHERS stands for any script that is not explicitly mentioned in the list of reordering codes given. Anything that is after OTHERS will go at the very end of the reordering in the order given.
The special reorder code DEFAULT will reset the reordering for this collator to the default for this collator. The default reordering may be the DUCET/CLDR order or may be a reordering that was specified when this collator was created from resource data or from rules. The DEFAULT code must be the sole code supplied when it is used. If not, then an IllegalArgumentException will be thrown.
The special reorder code NONE will remove any reordering for this collator. The result of setting no reordering will be to have the DUCET/CLDR ordering used. The NONE code must be the sole code supplied when it is used.
Parameters:
order - the reordering codes to apply to this collator; if this is null or an empty array then this clears any existing reordering
See Also: getReorderCodes(), getEquivalentReorderCodes(int), Collator.ReorderCodes, UScript
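The effect of script reordering can be seen with a small sketch. It assumes ICU4J is on the classpath and that Collator.getInstance(ULocale.ROOT) returns a collator that supports reordering (the usual RuleBasedCollator); the class and method names are hypothetical.

```java
import com.ibm.icu.lang.UScript;
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class ReorderCodesSketch {
    /** Compares Greek alpha with Latin "a" on a fresh root collator using the given reorder codes. */
    static int compareAlphaWithA(int... reorderCodes) {
        Collator collator = Collator.getInstance(ULocale.ROOT);
        collator.setReorderCodes(reorderCodes);
        return collator.compare("\u03b1", "a");
    }

    public static void main(String[] args) {
        // NONE keeps the DUCET/CLDR order, in which Latin sorts before Greek.
        System.out.println(compareAlphaWithA(Collator.ReorderCodes.NONE) > 0);
        // Listing GREEK moves Greek ahead of every script that is not listed.
        System.out.println(compareAlphaWithA(UScript.GREEK) < 0);
        // Codes after OTHERS go to the very end, so Greek now follows Latin again.
        System.out.println(compareAlphaWithA(Collator.ReorderCodes.OTHERS, UScript.GREEK) > 0);
    }
}
```

Each call builds its own collator, so the reorderings do not interfere with one another.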
public static final Collator getInstance()
See Also: Locale.getDefault(), getInstance(Locale)
public Object clone() throws CloneNotSupportedException
Overrides: clone in class Object
Throws:
CloneNotSupportedException
public static final Collator getInstance(ULocale locale)
For some languages, multiple collation types are available; for example, "de@collation=phonebook". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
Parameters:
locale - the desired locale.
See Also: Locale, ResourceBundle, getInstance(Locale), getInstance()
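As a rough illustration of the locale keywords described above, the following sketch compares the standard and phonebook German orderings and tries the "kf" (case first) attribute. It assumes ICU4J 54 or later with its standard collation data on the classpath; the class and method names are hypothetical.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class LocaleKeywordSketch {
    /** Compares two strings with the collator selected by an ICU locale ID. */
    static int compareIn(String localeId, String left, String right) {
        return Collator.getInstance(new ULocale(localeId)).compare(left, right);
    }

    /** Compares two strings with the collator selected by a BCP 47 language tag. */
    static int compareInTag(String languageTag, String left, String right) {
        return Collator.getInstance(ULocale.forLanguageTag(languageTag)).compare(left, right);
    }

    public static void main(String[] args) {
        // Standard German treats a-umlaut as an accented "a", so "\u00e4b" sorts before "ad".
        System.out.println(compareIn("de", "\u00e4b", "ad") < 0);
        // The phonebook type treats a-umlaut like "ae", so "\u00e4b" sorts after "ad".
        System.out.println(compareIn("de@collation=phonebook", "\u00e4b", "ad") > 0);
        // With the "kf-upper" attribute, uppercase sorts before lowercase.
        System.out.println(compareInTag("el-u-kf-upper", "A", "a") < 0);
    }
}
```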
public static final Collator getInstance(Locale locale)
For some languages, multiple collation types are available; for example, "de-u-co-phonebk". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper", only with ULocale) or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
Parameters:
locale - the desired locale.
See Also: Locale, ResourceBundle, getInstance(ULocale), getInstance()
public static final Object registerInstance(Collator collator, ULocale locale)
Because ICU may choose to cache Collator objects internally, this must be called at application startup, prior to any calls to Collator.getInstance to avoid undefined behavior.
Parameters:
collator - the collator to register
locale - the locale for which this is the default collator
public static final Object registerFactory(Collator.CollatorFactory factory)
Because ICU may choose to cache Collator objects internally, this must be called at application startup, prior to any calls to Collator.getInstance to avoid undefined behavior.
Parameters:
factory - the factory to register
public static final boolean unregister(Object registryKey)
Parameters:
registryKey - the object previously returned by registerInstance.
public static Locale[] getAvailableLocales()
public static final ULocale[] getAvailableULocales()
public static final String[] getKeywords()
See Also: getKeywordValues(java.lang.String)
public static final String[] getKeywordValues(String keyword)
Parameters:
keyword - one of the keywords returned by getKeywords.
See Also: getKeywords()
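A minimal sketch of how the two keyword methods might be combined to list every collation keyword with its values. It assumes ICU4J is on the classpath; the class and method names are hypothetical, and the values printed depend on the installed collation data.

```java
import com.ibm.icu.text.Collator;
import java.util.Arrays;

public class KeywordSketch {
    /** Reports whether the collation service lists the given keyword. */
    static boolean hasKeyword(String keyword) {
        return Arrays.asList(Collator.getKeywords()).contains(keyword);
    }

    public static void main(String[] args) {
        // Walk every keyword the service knows and print the values in use for it.
        for (String keyword : Collator.getKeywords()) {
            String[] values = Collator.getKeywordValues(keyword);
            System.out.println(keyword + " -> " + Arrays.toString(values));
        }
    }
}
```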
public static final String[] getKeywordValuesForLocale(String key, ULocale locale, boolean commonlyUsed)
Parameters:
key - one of the keys supported by this service. For now, only "collation" is supported.
locale - the locale
commonlyUsed - if set to true it will return only commonly used values with the given locale in preferred order. Otherwise, it will return all the available values for the locale.
public static final ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)
Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
isAvailable - If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. If non-null, isAvailable must have length >= 1.
public static final ULocale getFunctionalEquivalent(String keyword, ULocale locID)
Parameters:
keyword - a particular keyword as enumerated by getKeywords.
locID - The requested locale
See Also: getFunctionalEquivalent(String,ULocale,boolean[])
public static String getDisplayName(Locale objectLocale, Locale displayLocale)
Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
public static String getDisplayName(ULocale objectLocale, ULocale displayLocale)
Parameters:
objectLocale - the locale of the collator
displayLocale - the locale for the collator's display name
public static String getDisplayName(Locale objectLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.
Parameters:
objectLocale - the locale of the collator
See Also: ULocale.Category.DISPLAY
public static String getDisplayName(ULocale objectLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.
Parameters:
objectLocale - the locale of the collator
See Also: ULocale.Category.DISPLAY
public int getStrength()
[icu] Note: This can return QUATERNARY strength, which is not supported by the JDK version.
See the Collator class description for more details.
The base class method always returns TERTIARY. Subclasses should override it if appropriate.
See Also: setStrength(int), PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
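The strength attribute decides which kinds of difference count in a comparison. The sketch below is a rough illustration using an English collator; it assumes ICU4J is on the classpath, and the class and method names are hypothetical.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class StrengthSketch {
    /** Compares two strings with an English collator set to the given strength. */
    static int compareAt(int strength, String left, String right) {
        Collator collator = Collator.getInstance(ULocale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        // PRIMARY looks at base letters only, so the accent is ignored.
        System.out.println(compareAt(Collator.PRIMARY, "role", "r\u00f4le") == 0);
        // SECONDARY also looks at accents, but still ignores case.
        System.out.println(compareAt(Collator.SECONDARY, "role", "r\u00f4le") < 0);
        System.out.println(compareAt(Collator.SECONDARY, "role", "Role") == 0);
        // TERTIARY, the usual default, distinguishes case as well.
        System.out.println(compareAt(Collator.TERTIARY, "role", "Role") < 0);
    }
}
```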
public int getDecomposition()
See the Collator class description for more details.
The base class method always returns NO_DECOMPOSITION. Subclasses should override it if appropriate.
See Also: setDecomposition(int), NO_DECOMPOSITION, CANONICAL_DECOMPOSITION
public boolean equals(String source, String target)
Parameters:
source - the source string to be compared.
target - the target string to be compared.
Throws:
NullPointerException - thrown if either argument is null.
See Also: compare(java.lang.String, java.lang.String)
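Collation equality is not the same as String.equals. The following sketch, which assumes ICU4J on the classpath and uses hypothetical class and method names, compares a precomposed "é" with its decomposed form using a root collator at its default strength.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class CollatorEqualsSketch {
    /** Tests two strings for equality under the root collator's default settings. */
    static boolean collatorEquals(String left, String right) {
        return Collator.getInstance(ULocale.ROOT).equals(left, right);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // single code point U+00E9
        String decomposed = "e\u0301";   // "e" followed by a combining acute accent
        // The code unit sequences differ, so String.equals reports false.
        System.out.println(precomposed.equals(decomposed));
        // The collator assigns both spellings the same weights.
        System.out.println(collatorEquals(precomposed, decomposed));
        // Case is a tertiary difference, so these are not equal at the default strength.
        System.out.println(collatorEquals("a", "A"));
    }
}
```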
public UnicodeSet getTailoredSet()
public abstract int compare(String source, String target)
Parameters:
source - the source String.
target - the target String.
Throws:
NullPointerException - thrown if either argument is null.
See Also: CollationKey, getCollationKey(java.lang.String)
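Because Collator implements Comparator<Object>, it can be handed directly to the standard sorting APIs. A minimal sketch, assuming ICU4J on the classpath and using hypothetical class and method names:

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollatorSortSketch {
    /** Returns a copy of the input sorted with an English collator. */
    static List<String> sortEnglish(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collator collator = Collator.getInstance(ULocale.ENGLISH);
        copy.sort(collator); // a Comparator<Object> is accepted where Comparator<? super String> is needed
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "A", "B", "a");
        // Unlike String.compareTo, which puts all uppercase ASCII first,
        // the collator groups letters together regardless of case.
        System.out.println(sortEnglish(words));
    }
}
```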
public int compare(Object source, Object target)
Specified by: compare in interface Comparator<Object>
Parameters:
source - the source Object.
target - the target Object.
Throws:
ClassCastException - thrown if either argument cannot be cast to CharSequence.
@Deprecated protected int doCompare(CharSequence left, CharSequence right)
public abstract CollationKey getCollationKey(String source)
Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.
Note that collation keys are often less efficient than simply doing comparison. For more details, see the ICU User Guide.
See the CollationKey class documentation for more information.
Parameters:
source - the string to be transformed into a CollationKey.
See Also: CollationKey, compare(String, String), getRawCollationKey(java.lang.String, com.ibm.icu.text.RawCollationKey)
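One way to use collation keys is to compute a key once per string and then sort the keys. This is only a sketch, assuming ICU4J on the classpath; the class and method names are hypothetical, and whether keys pay off depends on how many times each string is compared.

```java
import com.ibm.icu.text.CollationKey;
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollationKeySketch {
    /** Sorts strings by precomputed collation keys and returns them in key order. */
    static List<String> sortByKeys(List<String> input) {
        Collator collator = Collator.getInstance(ULocale.ENGLISH);
        List<CollationKey> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(collator.getCollationKey(s)); // one key computation per string
        }
        Collections.sort(keys); // CollationKey is Comparable; comparing keys does not re-run the collator
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByKeys(Arrays.asList("b", "A", "B", "a")));
    }
}
```

Keys are only comparable with other keys produced by the same collator configuration, since the key depends on the collator's rules, strength and decomposition mode.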
public abstract RawCollationKey getRawCollationKey(String source, RawCollationKey key)
Note that collation keys are often less efficient than simply doing comparison. For more details, see the ICU User Guide.
Parameters:
source - the text String to be transformed into a RawCollationKey
See Also: compare(String, String), getCollationKey(java.lang.String), RawCollationKey
public Collator setMaxVariable(int group)
The base class implementation throws an UnsupportedOperationException.
Parameters:
group - one of Collator.ReorderCodes.SPACE, Collator.ReorderCodes.PUNCTUATION, Collator.ReorderCodes.SYMBOL, Collator.ReorderCodes.CURRENCY; or Collator.ReorderCodes.DEFAULT to restore the default max variable group
See Also: getMaxVariable()
public int getMaxVariable()
The base class implementation returns Collator.ReorderCodes.PUNCTUATION.
See Also: setMaxVariable(int)
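The max variable group only matters when alternate handling is shifted. The sketch below assumes ICU4J on the classpath and that the root collator is a RuleBasedCollator (hence the cast); the class and method names are hypothetical.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

public class MaxVariableSketch {
    /** Compares two strings with shifted alternate handling and the given max variable group. */
    static int compareShifted(int maxVariableGroup, String left, String right) {
        // Assumption: no custom factory is registered, so this is a RuleBasedCollator.
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(ULocale.ROOT);
        collator.setAlternateHandlingShifted(true);
        collator.setMaxVariable(maxVariableGroup);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        // With PUNCTUATION as the max variable group, the hyphen is ignored.
        System.out.println(compareShifted(Collator.ReorderCodes.PUNCTUATION, "a-b", "ab") == 0);
        // With SPACE as the max variable group, only whitespace is ignored.
        System.out.println(compareShifted(Collator.ReorderCodes.SPACE, "a-b", "ab") != 0);
        System.out.println(compareShifted(Collator.ReorderCodes.SPACE, "a b", "ab") == 0);
    }
}
```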
@Deprecated public abstract int setVariableTop(String varTop)
Deprecated. ICU 53 Call setMaxVariable(int) instead.
Beginning with ICU 53, the variable top is pinned to the top of one of the supported reordering groups, and it must not be beyond the last of those groups. See setMaxVariable(int).
Parameters:
varTop - one or more (if contraction) characters to which the variable top should be set
Throws:
IllegalArgumentException - is thrown if the varTop argument is not a valid variable top element.
See Also: getVariableTop(), RuleBasedCollator.setAlternateHandlingShifted(boolean)
public abstract int getVariableTop()
See Also: getMaxVariable()
@Deprecated public abstract void setVariableTop(int varTop)
Beginning with ICU 53, the variable top is pinned to the top of one of the supported reordering groups, and it must not be beyond the last of those groups. See setMaxVariable(int).
Parameters:
varTop - primary weight, as returned by setVariableTop or getVariableTop
See Also: getVariableTop(), setVariableTop(String)
public abstract VersionInfo getVersion()
public abstract VersionInfo getUCAVersion()
public int[] getReorderCodes()
See Also: setReorderCodes(int...), getEquivalentReorderCodes(int), Collator.ReorderCodes, UScript
public static int[] getEquivalentReorderCodes(int reorderCode)
Parameters:
reorderCode - The reorder code to determine equivalence for.
See Also: setReorderCodes(int...), getReorderCodes(), Collator.ReorderCodes, UScript
public boolean isFrozen()
An unfrozen Collator is mutable and not thread-safe. A frozen Collator is immutable and thread-safe.
public Collator freeze()
public Collator cloneAsThawed()
Specified by: cloneAsThawed in interface Freezable<Collator>
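A minimal sketch of the freeze and thaw cycle, assuming ICU4J on the classpath and that the collator returned by getInstance rejects modification with an UnsupportedOperationException once frozen (as RuleBasedCollator does). The class and method names are hypothetical.

```java
import com.ibm.icu.text.Collator;
import com.ibm.icu.util.ULocale;

public class FreezeSketch {
    /**
     * Returns {frozen.isFrozen(), setter rejected on frozen, thawed.isFrozen(),
     * thawed strength changed to PRIMARY}.
     */
    static boolean[] freezeBehaviour() {
        // A frozen collator is immutable and can be shared between threads.
        Collator frozen = Collator.getInstance(ULocale.ENGLISH).freeze();
        boolean rejected = false;
        try {
            frozen.setStrength(Collator.PRIMARY);
        } catch (UnsupportedOperationException e) {
            rejected = true; // setters fail on a frozen collator
        }
        // cloneAsThawed() gives a private, mutable copy; the frozen original is unchanged.
        Collator thawed = frozen.cloneAsThawed();
        thawed.setStrength(Collator.PRIMARY);
        return new boolean[] {
            frozen.isFrozen(), rejected, thawed.isFrozen(),
            thawed.getStrength() == Collator.PRIMARY
        };
    }

    public static void main(String[] args) {
        for (boolean flag : freezeBehaviour()) {
            System.out.println(flag);
        }
    }
}
```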
public ULocale getLocale(ULocale.Type type)
Note: This method will be implemented in ICU 3.0; ICU 2.8 contains a partial preview implementation. The actual locale is returned correctly, but the valid locale is not, in most cases.
The base class method always returns ULocale.ROOT. Subclasses should override it if appropriate.
Parameters:
type - type of information requested, either ULocale.VALID_LOCALE or ULocale.ACTUAL_LOCALE.
See Also: ULocale, ULocale.VALID_LOCALE, ULocale.ACTUAL_LOCALE
Copyright © 2016 Unicode, Inc. and others.