ICU 74.1
ucasemap.h File Reference

C API: Unicode case mapping functions using a UCaseMap service object.

#include "unicode/utypes.h"
#include "unicode/stringoptions.h"
#include "unicode/ustring.h"
#include "unicode/localpointer.h"

Namespaces

namespace  icu
 File coll.h.
 

Typedefs

typedef struct UCaseMap UCaseMap
 C typedef for struct UCaseMap.
 

Functions

U_CAPI UCaseMap * ucasemap_open (const char *locale, uint32_t options, UErrorCode *pErrorCode)
 Open a UCaseMap service object for a locale and a set of options.

U_CAPI void ucasemap_close (UCaseMap *csm)
 Close a UCaseMap service object.

U_CAPI const char * ucasemap_getLocale (const UCaseMap *csm)
 Get the locale ID that is used for language-dependent case mappings.

U_CAPI uint32_t ucasemap_getOptions (const UCaseMap *csm)
 Get the options bit set that is used for case folding and string comparisons.

U_CAPI void ucasemap_setLocale (UCaseMap *csm, const char *locale, UErrorCode *pErrorCode)
 Set the locale ID that is used for language-dependent case mappings.

U_CAPI void ucasemap_setOptions (UCaseMap *csm, uint32_t options, UErrorCode *pErrorCode)
 Set the options bit set that is used for case folding and string comparisons.

U_CAPI const UBreakIterator * ucasemap_getBreakIterator (const UCaseMap *csm)
 Get the break iterator that is used for titlecasing.

U_CAPI void ucasemap_setBreakIterator (UCaseMap *csm, UBreakIterator *iterToAdopt, UErrorCode *pErrorCode)
 Set the break iterator that is used for titlecasing.

U_CAPI int32_t ucasemap_toTitle (UCaseMap *csm, UChar *dest, int32_t destCapacity, const UChar *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-16 string.

U_CAPI int32_t ucasemap_utf8ToLower (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Lowercase the characters in a UTF-8 string.

U_CAPI int32_t ucasemap_utf8ToUpper (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Uppercase the characters in a UTF-8 string.

U_CAPI int32_t ucasemap_utf8ToTitle (UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-8 string.

U_CAPI int32_t ucasemap_utf8FoldCase (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Case-folds the characters in a UTF-8 string.
 

Detailed Description

C API: Unicode case mapping functions using a UCaseMap service object.

The service object takes care of memory allocations, data loading, and setup for the attributes, as usual.

Currently, the functionality provided here does not overlap with uchar.h and ustring.h, except for ucasemap_toTitle().

ucasemap_utf8XYZ() functions operate directly on UTF-8 strings.

Definition in file ucasemap.h.

Typedef Documentation

◆ UCaseMap

typedef struct UCaseMap UCaseMap

C typedef for struct UCaseMap.

Stable:
ICU 3.4

Definition at line 51 of file ucasemap.h.

Function Documentation

◆ ucasemap_close()

U_CAPI void ucasemap_close ( UCaseMap *  csm )

Close a UCaseMap service object.

Parameters
csm  Object to be closed.
Stable:
ICU 3.4

◆ ucasemap_getBreakIterator()

U_CAPI const UBreakIterator * ucasemap_getBreakIterator ( const UCaseMap *  csm )

Get the break iterator that is used for titlecasing.

Do not modify the returned break iterator.

Parameters
csm  UCaseMap service object.
Returns
titlecasing break iterator
Stable:
ICU 3.8

◆ ucasemap_getLocale()

U_CAPI const char * ucasemap_getLocale ( const UCaseMap *  csm )

Get the locale ID that is used for language-dependent case mappings.

Parameters
csm  UCaseMap service object.
Returns
locale ID
Stable:
ICU 3.4

◆ ucasemap_getOptions()

U_CAPI uint32_t ucasemap_getOptions ( const UCaseMap *  csm )

Get the options bit set that is used for case folding and string comparisons.

Parameters
csm  UCaseMap service object.
Returns
options bit set
Stable:
ICU 3.4

◆ ucasemap_open()

U_CAPI UCaseMap * ucasemap_open ( const char *  locale,
uint32_t  options,
UErrorCode *  pErrorCode
)

Open a UCaseMap service object for a locale and a set of options.

The locale ID and options are preprocessed so that functions using the service object need not process them in each call.

Parameters
locale      ICU locale ID, used for language-dependent upper-/lower-/title-casing according to the Unicode standard. Usual semantics: ""=root, NULL=default locale, etc.
options     Options bit set, used for case folding and string comparisons. Same flags as for u_foldCase(), u_strFoldCase(), u_strCaseCompare(), etc. Use 0 or U_FOLD_CASE_DEFAULT for default behavior.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
Pointer to a UCaseMap service object, if successful.
See also
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.4

◆ ucasemap_setBreakIterator()

U_CAPI void ucasemap_setBreakIterator ( UCaseMap *  csm,
UBreakIterator *  iterToAdopt,
UErrorCode *  pErrorCode
)

Set the break iterator that is used for titlecasing.

The UCaseMap service object releases a previously set break iterator and "adopts" this new one, taking ownership of it. It will be released in a subsequent call to ucasemap_setBreakIterator() or ucasemap_close().

Break iterator operations are not thread-safe. Therefore, titlecasing functions use non-const UCaseMap objects. It is not possible to titlecase strings concurrently using the same UCaseMap.

Parameters
csm          UCaseMap service object.
iterToAdopt  Break iterator to be adopted for titlecasing.
pErrorCode   Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_toTitle
ucasemap_utf8ToTitle
Stable:
ICU 3.8

◆ ucasemap_setLocale()

U_CAPI void ucasemap_setLocale ( UCaseMap *  csm,
const char *  locale,
UErrorCode *  pErrorCode
)

Set the locale ID that is used for language-dependent case mappings.

Parameters
csm         UCaseMap service object.
locale      Locale ID, see ucasemap_open().
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_open
Stable:
ICU 3.4

◆ ucasemap_setOptions()

U_CAPI void ucasemap_setOptions ( UCaseMap *  csm,
uint32_t  options,
UErrorCode *  pErrorCode
)

Set the options bit set that is used for case folding and string comparisons.

Parameters
csm         UCaseMap service object.
options     Options bit set, see ucasemap_open().
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_open
Stable:
ICU 3.4

◆ ucasemap_toTitle()

U_CAPI int32_t ucasemap_toTitle ( UCaseMap *  csm,
UChar *  dest,
int32_t  destCapacity,
const UChar *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-16 string.

This function is almost a duplicate of u_strToTitle(), except that it takes ucasemap_setOptions() into account and has performance advantages from being able to use a UCaseMap object for multiple case mapping operations, saving setup time.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setText(), first() and next() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm           UCaseMap service object. This pointer is non-const! See the note above for details.
dest          A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of UChars). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src           The original string.
srcLength     The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode    Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, if successful. On buffer overflow, the return value is the length that the full result requires, which is greater than destCapacity.
See also
u_strToTitle
Stable:
ICU 3.8

◆ ucasemap_utf8FoldCase()

U_CAPI int32_t ucasemap_utf8FoldCase ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Case-folds the characters in a UTF-8 string.

Case-folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm           UCaseMap service object.
dest          A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src           The original string.
srcLength     The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode    Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, if successful. On buffer overflow, the return value is the length that the full result requires, which is greater than destCapacity.
See also
u_strFoldCase
ucasemap_setOptions
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
Stable:
ICU 3.8

◆ ucasemap_utf8ToLower()

U_CAPI int32_t ucasemap_utf8ToLower ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Lowercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm           UCaseMap service object.
dest          A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src           The original string.
srcLength     The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode    Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, if successful. On buffer overflow, the return value is the length that the full result requires, which is greater than destCapacity.
See also
u_strToLower
Stable:
ICU 3.4

◆ ucasemap_utf8ToTitle()

U_CAPI int32_t ucasemap_utf8ToTitle ( UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-8 string.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm           UCaseMap service object. This pointer is non-const! See the note above for details.
dest          A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src           The original string.
srcLength     The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode    Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, if successful. On buffer overflow, the return value is the length that the full result requires, which is greater than destCapacity.
See also
u_strToTitle
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.8

◆ ucasemap_utf8ToUpper()

U_CAPI int32_t ucasemap_utf8ToUpper ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Uppercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm           UCaseMap service object.
dest          A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src           The original string.
srcLength     The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode    Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, if successful. On buffer overflow, the return value is the length that the full result requires, which is greater than destCapacity.
See also
u_strToUpper
Stable:
ICU 3.4