ICU 75.1
C API: Unicode case mapping functions using a UCaseMap service object.
#include "unicode/utypes.h"
#include "unicode/stringoptions.h"
#include "unicode/ustring.h"
#include "unicode/localpointer.h"
Namespaces | |
namespace | icu |
File coll.h. | |
Typedefs | |
typedef struct UCaseMap | UCaseMap |
C typedef for struct UCaseMap. | |
Functions | |
U_CAPI UCaseMap * | ucasemap_open (const char *locale, uint32_t options, UErrorCode *pErrorCode) |
Open a UCaseMap service object for a locale and a set of options. | |
U_CAPI void | ucasemap_close (UCaseMap *csm) |
Close a UCaseMap service object. | |
U_CAPI const char * | ucasemap_getLocale (const UCaseMap *csm) |
Get the locale ID that is used for language-dependent case mappings. | |
U_CAPI uint32_t | ucasemap_getOptions (const UCaseMap *csm) |
Get the options bit set that is used for case folding and string comparisons. | |
U_CAPI void | ucasemap_setLocale (UCaseMap *csm, const char *locale, UErrorCode *pErrorCode) |
Set the locale ID that is used for language-dependent case mappings. | |
U_CAPI void | ucasemap_setOptions (UCaseMap *csm, uint32_t options, UErrorCode *pErrorCode) |
Set the options bit set that is used for case folding and string comparisons. | |
U_CAPI const UBreakIterator * | ucasemap_getBreakIterator (const UCaseMap *csm) |
Get the break iterator that is used for titlecasing. | |
U_CAPI void | ucasemap_setBreakIterator (UCaseMap *csm, UBreakIterator *iterToAdopt, UErrorCode *pErrorCode) |
Set the break iterator that is used for titlecasing. | |
U_CAPI int32_t | ucasemap_toTitle (UCaseMap *csm, UChar *dest, int32_t destCapacity, const UChar *src, int32_t srcLength, UErrorCode *pErrorCode) |
Titlecase a UTF-16 string. | |
U_CAPI int32_t | ucasemap_utf8ToLower (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode) |
Lowercase the characters in a UTF-8 string. | |
U_CAPI int32_t | ucasemap_utf8ToUpper (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode) |
Uppercase the characters in a UTF-8 string. | |
U_CAPI int32_t | ucasemap_utf8ToTitle (UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode) |
Titlecase a UTF-8 string. | |
U_CAPI int32_t | ucasemap_utf8FoldCase (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode) |
Case-folds the characters in a UTF-8 string. | |
C API: Unicode case mapping functions using a UCaseMap service object.
The service object takes care of memory allocations, data loading, and setup for the attributes, as usual.
Currently, the functionality provided here does not overlap with uchar.h and ustring.h, except for ucasemap_toTitle().
ucasemap_utf8XYZ() functions operate directly on UTF-8 strings.
Definition in file ucasemap.h.
U_CAPI const UBreakIterator * ucasemap_getBreakIterator | ( | const UCaseMap * | csm | ) |
Get the break iterator that is used for titlecasing.
Do not modify the returned break iterator.
csm | UCaseMap service object. |
U_CAPI const char * ucasemap_getLocale | ( | const UCaseMap * | csm | ) |
Get the locale ID that is used for language-dependent case mappings.
csm | UCaseMap service object. |
U_CAPI uint32_t ucasemap_getOptions | ( | const UCaseMap * | csm | ) |
Get the options bit set that is used for case folding and string comparisons.
csm | UCaseMap service object. |
U_CAPI UCaseMap * ucasemap_open | ( | const char * | locale, |
uint32_t | options, | ||
UErrorCode * | pErrorCode | ||
) |
Open a UCaseMap service object for a locale and a set of options.
The locale ID and options are preprocessed so that functions using the service object need not process them in each call.
locale | ICU locale ID, used for language-dependent upper-/lower-/title-casing according to the Unicode standard. Usual semantics: ""=root, NULL=default locale, etc. |
options | Options bit set, used for case folding and string comparisons. Same flags as for u_foldCase(), u_strFoldCase(), u_strCaseCompare(), etc. Use 0 or U_FOLD_CASE_DEFAULT for default behavior. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void ucasemap_setBreakIterator | ( | UCaseMap * | csm, |
UBreakIterator * | iterToAdopt, | ||
UErrorCode * | pErrorCode | ||
) |
Set the break iterator that is used for titlecasing.
The UCaseMap service object releases a previously set break iterator and "adopts" this new one, taking ownership of it. It will be released in a subsequent call to ucasemap_setBreakIterator() or ucasemap_close().
Break iterator operations are not thread-safe. Therefore, titlecasing functions use non-const UCaseMap objects. It is not possible to titlecase strings concurrently using the same UCaseMap.
csm | UCaseMap service object. |
iterToAdopt | Break iterator to be adopted for titlecasing. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void ucasemap_setLocale | ( | UCaseMap * | csm, |
const char * | locale, | ||
UErrorCode * | pErrorCode | ||
) |
Set the locale ID that is used for language-dependent case mappings.
csm | UCaseMap service object. |
locale | Locale ID, see ucasemap_open(). |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI void ucasemap_setOptions | ( | UCaseMap * | csm, |
uint32_t | options, | ||
UErrorCode * | pErrorCode | ||
) |
Set the options bit set that is used for case folding and string comparisons.
csm | UCaseMap service object. |
options | Options bit set, see ucasemap_open(). |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI int32_t ucasemap_toTitle | ( | UCaseMap * | csm, |
UChar * | dest, | ||
int32_t | destCapacity, | ||
const UChar * | src, | ||
int32_t | srcLength, | ||
UErrorCode * | pErrorCode | ||
) |
Titlecase a UTF-16 string.
This function is almost a duplicate of u_strToTitle(), except that it takes ucasemap_setOptions() into account and has performance advantages from being able to use a UCaseMap object for multiple case mapping operations, saving setup time.
Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)
Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.
The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.
This function uses only the setText(), first() and next() methods of the provided break iterator.
The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.
csm | UCaseMap service object. This pointer is non-const! See the note above for details. |
dest | A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure. |
destCapacity | The size of the buffer (number of UChars). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string. |
src | The original string. |
srcLength | The length of the original string. If -1, then src must be NUL-terminated. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI int32_t ucasemap_utf8FoldCase | ( | const UCaseMap * | csm, |
char * | dest, | ||
int32_t | destCapacity, | ||
const char * | src, | ||
int32_t | srcLength, | ||
UErrorCode * | pErrorCode | ||
) |
Case-folds the characters in a UTF-8 string.
Case-folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.
The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.
csm | UCaseMap service object. |
dest | A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure. |
destCapacity | The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string. |
src | The original string. |
srcLength | The length of the original string. If -1, then src must be NUL-terminated. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI int32_t ucasemap_utf8ToLower | ( | const UCaseMap * | csm, |
char * | dest, | ||
int32_t | destCapacity, | ||
const char * | src, | ||
int32_t | srcLength, | ||
UErrorCode * | pErrorCode | ||
) |
Lowercase the characters in a UTF-8 string.
Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.
csm | UCaseMap service object. |
dest | A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure. |
destCapacity | The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string. |
src | The original string. |
srcLength | The length of the original string. If -1, then src must be NUL-terminated. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI int32_t ucasemap_utf8ToTitle | ( | UCaseMap * | csm, |
char * | dest, | ||
int32_t | destCapacity, | ||
const char * | src, | ||
int32_t | srcLength, | ||
UErrorCode * | pErrorCode | ||
) |
Titlecase a UTF-8 string.
Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)
Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.
The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.
This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.
The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.
csm | UCaseMap service object. This pointer is non-const! See the note above for details. |
dest | A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure. |
destCapacity | The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string. |
src | The original string. |
srcLength | The length of the original string. If -1, then src must be NUL-terminated. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |
U_CAPI int32_t ucasemap_utf8ToUpper | ( | const UCaseMap * | csm, |
char * | dest, | ||
int32_t | destCapacity, | ||
const char * | src, | ||
int32_t | srcLength, | ||
UErrorCode * | pErrorCode | ||
) |
Uppercase the characters in a UTF-8 string.
Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.
csm | UCaseMap service object. |
dest | A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure. |
destCapacity | The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string. |
src | The original string. |
srcLength | The length of the original string. If -1, then src must be NUL-terminated. |
pErrorCode | Must be a valid pointer to an error code value, which must not indicate a failure before the function call. |