ICU 74.1
ucoleitr.h File Reference

C API: UCollationElements.

#include "unicode/utypes.h"
#include "unicode/ucol.h"

Macros

#define UCOL_NULLORDER   ((int32_t)0xFFFFFFFF)
 Indicates that an error has occurred during processing or that no more CEs are to be returned.
 

Typedefs

typedef struct UCollationElements UCollationElements
 The UCollationElements struct.
 

Functions

U_CAPI UCollationElements * ucol_openElements (const UCollator *coll, const UChar *text, int32_t textLength, UErrorCode *status)
 Open the collation elements for a string.
 
U_CAPI int32_t ucol_keyHashCode (const uint8_t *key, int32_t length)
 Get a hash code for a key. Not very useful!
 
U_CAPI void ucol_closeElements (UCollationElements *elems)
 Close a UCollationElements.
 
U_CAPI void ucol_reset (UCollationElements *elems)
 Reset the collation elements to their initial state.
 
U_CAPI int32_t ucol_next (UCollationElements *elems, UErrorCode *status)
 Get the ordering priority of the next collation element in the text.
 
U_CAPI int32_t ucol_previous (UCollationElements *elems, UErrorCode *status)
 Get the ordering priority of the previous collation element in the text.
 
U_CAPI int32_t ucol_getMaxExpansion (const UCollationElements *elems, int32_t order)
 Get the maximum length of any expansion sequences that end with the specified comparison order.
 
U_CAPI void ucol_setText (UCollationElements *elems, const UChar *text, int32_t textLength, UErrorCode *status)
 Set the text containing the collation elements.
 
U_CAPI int32_t ucol_getOffset (const UCollationElements *elems)
 Get the offset of the current source character.
 
U_CAPI void ucol_setOffset (UCollationElements *elems, int32_t offset, UErrorCode *status)
 Set the offset of the current source character.
 
U_CAPI int32_t ucol_primaryOrder (int32_t order)
 Get the primary order of a collation order.
 
U_CAPI int32_t ucol_secondaryOrder (int32_t order)
 Get the secondary order of a collation order.
 
U_CAPI int32_t ucol_tertiaryOrder (int32_t order)
 Get the tertiary order of a collation order.
 

Detailed Description

C API: UCollationElements.

The UCollationElements API is used as an iterator to walk through each character of an international string. Use the iterator to return the ordering priority of the positioned character. The ordering priority of a character, which we refer to as a key, defines how a character is collated in the given collation object. For example, consider the following in Slovak and in traditional Spanish collation:

.       "ca" -> the first key is key('c') and second key is key('a').
.       "cha" -> the first key is key('ch') and second key is key('a').

And in German phonebook collation,

.       "<ae ligature>b"-> the first key is key('a'), the second key is key('e'), and
.       the third key is key('b').

Example of the iterator usage: (without error checking)

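.  #include <stdlib.h>
.  #include <string.h>
.  #include "unicode/ustring.h"
.  #include "unicode/ucol.h"
.  #include "unicode/ucoleitr.h"
.
.  void CollationElementIterator_Example()
.  {
.      UChar *str;
.      int32_t order;
.      UCollationElements *c;
.      UCollator *coll;
.      UErrorCode success = U_ZERO_ERROR;
.      str = (UChar*)malloc(sizeof(UChar) * (strlen("This is a test") + 1));
.      u_uastrcpy(str, "This is a test");
.      coll = ucol_open(NULL, &success);
.      c = ucol_openElements(coll, str, u_strlen(str), &success);
.      order = ucol_next(c, &success);      /* first collation order */
.      ucol_reset(c);                       /* required before changing direction */
.      order = ucol_previous(c, &success);  /* last collation order */
.      ucol_closeElements(c);               /* close the iterator before freeing its text */
.      free(str);
.      ucol_close(coll);
.  }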

ucol_next() returns the collation order of the next character. ucol_previous() returns the collation order of the previous character. The Collation Element Iterator moves in only one direction between calls to ucol_reset(). That is, ucol_next() and ucol_previous() cannot be mixed. Whenever ucol_previous() is to be called after ucol_next(), or vice versa, ucol_reset() has to be called first to reset the status, shifting pointers to either the end or the start of the string. Hence, at the next call of ucol_previous() or ucol_next(), the last or first collation order, respectively, will be returned. If the direction is changed without a call to ucol_reset(), the result is undefined.

The result of a forward iteration (ucol_next()) and the reversed result of a backward iteration (ucol_previous()) on the same string are equivalent if collation orders with the value 0 are ignored. Character based on the comparison level of the collator. A collation order consists of a primary order, a secondary order and a tertiary order. The data type of the collation order is int32_t.
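The equivalence between a forward pass and a reversed backward pass can be checked with a small helper. The following is a minimal sketch that works on plain arrays of collation orders and does not call ICU. The name orders_equivalent is hypothetical, and the sketch assumes the caller has already collected the values returned by ucol_next() and ucol_previous(), without the terminating UCOL_NULLORDER.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU.
 * Returns 1 if fwd (read left to right) and bwd (read right to left)
 * hold the same collation orders once entries equal to 0 are skipped,
 * and 0 otherwise. */
int orders_equivalent(const int32_t *fwd, size_t fwd_len,
                      const int32_t *bwd, size_t bwd_len)
{
    size_t i = 0;        /* next unread index in fwd */
    size_t j = bwd_len;  /* one past the next unread index in bwd */

    for (;;) {
        /* Skip collation orders with the value 0 on both sides. */
        while (i < fwd_len && fwd[i] == 0) i++;
        while (j > 0 && bwd[j - 1] == 0) j--;

        if (i == fwd_len || j == 0) {
            /* Equivalent only if both sequences are exhausted together. */
            return i == fwd_len && j == 0;
        }
        if (fwd[i] != bwd[j - 1]) {
            return 0;
        }
        i++;
        j--;
    }
}
```

For instance, a forward sequence {5, 0, 7} and a backward sequence {7, 5} compare as equivalent, because the 0 is skipped and the backward sequence is read in reverse.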

See also
UCollator

Definition in file ucoleitr.h.

Macro Definition Documentation

◆ UCOL_NULLORDER

#define UCOL_NULLORDER   ((int32_t)0xFFFFFFFF)

Indicates that an error has occurred during processing or that no more CEs are to be returned.

Stable:
ICU 2.0

Definition at line 30 of file ucoleitr.h.
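Because the same sentinel is returned for an error and for the end of the text, a caller has to consult the status code to tell the two apart. The sketch below illustrates that decision without calling ICU. SKETCH_NULLORDER copies the definition shown above, while OrderKind and classify_order are hypothetical names, and the failed flag is assumed to hold the result of testing the UErrorCode for failure.

```c
#include <stdint.h>

/* Mirrors the definition of UCOL_NULLORDER shown above. */
#define SKETCH_NULLORDER ((int32_t)0xFFFFFFFF)

/* Hypothetical result categories, not part of ICU. */
typedef enum { ORDER_VALID, ORDER_END_OF_TEXT, ORDER_ERROR } OrderKind;

/* order:  value returned by an iteration call.
 * failed: nonzero if the status code reports a failure. */
OrderKind classify_order(int32_t order, int failed)
{
    if (failed) {
        return ORDER_ERROR;          /* the status code takes precedence */
    }
    if (order == SKETCH_NULLORDER) {
        return ORDER_END_OF_TEXT;    /* sentinel without an error */
    }
    return ORDER_VALID;
}
```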

Typedef Documentation

◆ UCollationElements

The UCollationElements struct.

For usage in C programs.

Stable:
ICU 2.0

Definition at line 39 of file ucoleitr.h.

Function Documentation

◆ ucol_closeElements()

U_CAPI void ucol_closeElements ( UCollationElements *  elems)

Close a UCollationElements.

Once closed, a UCollationElements may no longer be used.

Parameters
elems    The UCollationElements to close.
Stable:
ICU 2.0

◆ ucol_getMaxExpansion()

U_CAPI int32_t ucol_getMaxExpansion ( const UCollationElements *  elems,
int32_t  order 
)

Get the maximum length of any expansion sequences that end with the specified comparison order.

This is useful for .... ?

Parameters
elems    The UCollationElements containing the text.
order    A collation order returned by ucol_previous() or ucol_next().
Returns
The maximum size of the expansion sequences ending with the collation element, or 1 if the collation element does not occur at the end of any expansion sequence.
Stable:
ICU 2.0

◆ ucol_getOffset()

U_CAPI int32_t ucol_getOffset ( const UCollationElements *  elems)

Get the offset of the current source character.

This is an offset into the text of the character containing the current collation elements.

Parameters
elems    The UCollationElements to query.
Returns
The offset of the current source character.
See also
ucol_setOffset
Stable:
ICU 2.0

◆ ucol_keyHashCode()

U_CAPI int32_t ucol_keyHashCode ( const uint8_t *  key,
int32_t  length 
)

Get a hash code for a key. Not very useful!

Parameters
key       The given key.
length    The size of the key array.
Returns
The hash code.
Stable:
ICU 2.0

◆ ucol_next()

U_CAPI int32_t ucol_next ( UCollationElements *  elems,
UErrorCode *  status 
)

Get the ordering priority of the next collation element in the text.

A single character may contain more than one collation element.

Parameters
elems     The UCollationElements containing the text.
status    A pointer to a UErrorCode to receive any errors.
Returns
The ordering of the next collation element, or UCOL_NULLORDER if an error has occurred or the end of the string has been reached.
Stable:
ICU 2.0

◆ ucol_openElements()

U_CAPI UCollationElements * ucol_openElements ( const UCollator *  coll,
const UChar *  text,
int32_t  textLength,
UErrorCode *  status 
)

Open the collation elements for a string.

The UCollationElements retains a pointer to the supplied text. The caller must not modify or delete the text while the UCollationElements object is used to iterate over this text.

Parameters
coll          The collator containing the desired collation rules.
text          The text to iterate over.
textLength    The number of characters in text, or -1 if null-terminated.
status        A pointer to a UErrorCode to receive any errors.
Returns
a struct containing collation element information
Stable:
ICU 2.0

◆ ucol_previous()

U_CAPI int32_t ucol_previous ( UCollationElements *  elems,
UErrorCode *  status 
)

Get the ordering priority of the previous collation element in the text.

A single character may contain more than one collation element. Note that internally a stack is used to store buffered collation elements.

Parameters
elems     The UCollationElements containing the text.
status    A pointer to a UErrorCode to receive any errors. Notably a U_BUFFER_OVERFLOW_ERROR is returned if the internal stack buffer has been exhausted.
Returns
The ordering of the previous collation element, or UCOL_NULLORDER if an error has occurred or the start of the string has been reached.
Stable:
ICU 2.0

◆ ucol_primaryOrder()

U_CAPI int32_t ucol_primaryOrder ( int32_t  order)

Get the primary order of a collation order.

Parameters
order    The collation order.
Returns
the primary order of a collation order.
Stable:
ICU 2.6

◆ ucol_reset()

U_CAPI void ucol_reset ( UCollationElements *  elems)

Reset the collation elements to their initial state.

This will move the 'cursor' to the beginning of the text. Property settings for collation will be reset to the current status.

Parameters
elems    The UCollationElements to reset.
See also
ucol_next
ucol_previous
Stable:
ICU 2.0

◆ ucol_secondaryOrder()

U_CAPI int32_t ucol_secondaryOrder ( int32_t  order)

Get the secondary order of a collation order.

Parameters
order    The collation order.
Returns
the secondary order of a collation order.
Stable:
ICU 2.6

◆ ucol_setOffset()

U_CAPI void ucol_setOffset ( UCollationElements *  elems,
int32_t  offset,
UErrorCode *  status 
)

Set the offset of the current source character.

This is an offset into the text of the character to be processed. Property settings for collation will remain the same. In order to reset the iterator to the current collation property settings, ucol_reset() has to be called.

Parameters
elems     The UCollationElements to set.
offset    The desired character offset.
status    A pointer to a UErrorCode to receive any errors.
See also
ucol_getOffset
Stable:
ICU 2.0

◆ ucol_setText()

U_CAPI void ucol_setText ( UCollationElements *  elems,
const UChar *  text,
int32_t  textLength,
UErrorCode *  status 
)

Set the text containing the collation elements.

Property settings for collation will remain the same. In order to reset the iterator to the current collation property settings, ucol_reset() has to be called.

The UCollationElements retains a pointer to the supplied text. The caller must not modify or delete the text while the UCollationElements object is used to iterate over this text.

Parameters
elems         The UCollationElements to set.
text          The source text containing the collation elements.
textLength    The length of text, or -1 if null-terminated.
status        A pointer to a UErrorCode to receive any errors.
See also
ucol_getText
Stable:
ICU 2.0

◆ ucol_tertiaryOrder()

U_CAPI int32_t ucol_tertiaryOrder ( int32_t  order)

Get the tertiary order of a collation order.

Parameters
order    The collation order.
Returns
the tertiary order of a collation order.
Stable:
ICU 2.6
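How a 32-bit collation order might be split into its three parts can be sketched without calling ICU. This page does not state the bit layout, so the sketch below assumes one: the upper 16 bits hold the primary order, the next 8 bits the secondary order, and the lowest 8 bits the tertiary order. The sketch_* functions are hypothetical stand-ins; real code should call ucol_primaryOrder(), ucol_secondaryOrder() and ucol_tertiaryOrder() rather than rely on this layout.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the three accessor functions, written
 * under the ASSUMED 16/8/8-bit layout described in the lead-in.
 * The cast to uint32_t keeps the shift well defined when the order
 * is negative. */
int32_t sketch_primary(int32_t order)
{
    return (int32_t)(((uint32_t)order >> 16) & 0xFFFFu);
}

int32_t sketch_secondary(int32_t order)
{
    return (int32_t)(((uint32_t)order >> 8) & 0xFFu);
}

int32_t sketch_tertiary(int32_t order)
{
    return (int32_t)((uint32_t)order & 0xFFu);
}
```

Under that assumption, an order of 0x12345678 splits into 0x1234, 0x56 and 0x78.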