ICU 75.1
C API: UConverter predefined error callbacks.
#include "unicode/utypes.h"
Data Structures | |
struct | UConverterFromUnicodeArgs |
The structure for the fromUnicode callback function parameter. | |
struct | UConverterToUnicodeArgs |
The structure for the toUnicode callback function parameter. | |
Macros | |
#define | UCNV_SUB_STOP_ON_ILLEGAL "i" |
FROM_U, TO_U context options for sub callback. | |
#define | UCNV_SKIP_STOP_ON_ILLEGAL "i" |
FROM_U, TO_U context options for skip callback. | |
#define | UCNV_ESCAPE_ICU NULL |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to ICU (%UXXXX) | |
#define | UCNV_ESCAPE_JAVA "J" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to JAVA (\uXXXX) | |
#define | UCNV_ESCAPE_C "C" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to C (\uXXXX \UXXXXXXXX). TO_U_CALLBACK_ESCAPE context option to escape the character value according to C (\xXXXX). | |
#define | UCNV_ESCAPE_XML_DEC "D" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Decimal escape (&#DDDD;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Decimal escape (&#DDDD;). | |
#define | UCNV_ESCAPE_XML_HEX "X" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Hex escape (&#xXXXX;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Hex escape (&#xXXXX;). | |
#define | UCNV_ESCAPE_UNICODE "U" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to Unicode (U+XXXXX) | |
#define | UCNV_ESCAPE_CSS2 "S" |
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to CSS2 conventions (\HH..H<space>, that is, a backslash, 1..6 hex digits, and a space) | |
Typedefs | |
typedef struct UConverter | UConverter |
Enumerations | |
enum | UConverterCallbackReason { UCNV_UNASSIGNED = 0 , UCNV_ILLEGAL = 1 , UCNV_IRREGULAR = 2 , UCNV_RESET = 3 , UCNV_CLOSE = 4 , UCNV_CLONE = 5 } |
The process condition code to be used with the callbacks. | |
Functions | |
U_CAPI void | UCNV_FROM_U_CALLBACK_STOP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately. | |
U_CAPI void | UCNV_TO_U_CALLBACK_STOP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately. | |
U_CAPI void | UCNV_FROM_U_CALLBACK_SKIP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters. | |
U_CAPI void | UCNV_FROM_U_CALLBACK_SUBSTITUTE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the current substitution string for the converter. | |
U_CAPI void | UCNV_FROM_U_CALLBACK_ESCAPE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal codepoints. | |
U_CAPI void | UCNV_TO_U_CALLBACK_SKIP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters. | |
U_CAPI void | UCNV_TO_U_CALLBACK_SUBSTITUTE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will substitute the ILLEGAL_SEQUENCE, or UNASSIGNED_SEQUENCE depending on the context parameter, with the Unicode substitution character, U+FFFD. | |
U_CAPI void | UCNV_TO_U_CALLBACK_ESCAPE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err) |
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will substitute the ILLEGAL_SEQUENCE with the hexadecimal representation of the illegal bytes (in the format %XNN, e.g. "%XFF%X0A%XC8%X03"). | |
C API: UConverter predefined error callbacks.
Defines some error behaviour functions called by ucnv_{from,to}Unicode. These are provided as part of ICU and many are stable, but they can also be considered only as an example of what can be done with callbacks. You may of course write your own.
If you write your own callbacks, you may also find the functions from ucnv_cb.h useful.
These functions, although public, should NEVER be called directly. They should be used as parameters to the ucnv_setFromUCallBack and ucnv_setToUCallBack functions, to set the behaviour of a converter when it encounters ILLEGAL/UNMAPPED/INVALID sequences.
Usage example: 'STOP' doesn't need any context, but newContext could be set to something other than 'NULL' if needed. The available contexts in this header can modify the default behavior of the callback.
Installing UCNV_FROM_U_CALLBACK_STOP with ucnv_setFromUCallBack in this way tells the converter ("myConverter") to stop when it encounters an ILLEGAL/TRUNCATED/INVALID sequence while converting from Unicode -> Codepage. The behavior from Codepage to Unicode is not changed; ucnv_setToUCallBack would need to be called to change that behavior too.
Here is an example with a context:
Installing UCNV_TO_U_CALLBACK_SUBSTITUTE with the UCNV_SUB_STOP_ON_ILLEGAL context through ucnv_setToUCallBack tells the converter ("myConverter") to stop when it encounters an ILLEGAL/TRUNCATED/INVALID sequence while converting from Codepage -> Unicode. Any unmapped but legal characters are replaced with the default substitution character.
Definition in file ucnv_err.h.
#define UCNV_ESCAPE_C "C"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to C (\uXXXX \UXXXXXXXX). TO_U_CALLBACK_ESCAPE context option to escape the character value according to C (\xXXXX).
Definition at line 125 of file ucnv_err.h.
#define UCNV_ESCAPE_CSS2 "S"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to CSS2 conventions (\HH..H<space>, that is, a backslash, 1..6 hex digits, and a space)
Definition at line 149 of file ucnv_err.h.
#define UCNV_ESCAPE_ICU NULL
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to ICU (%UXXXX)
Definition at line 114 of file ucnv_err.h.
#define UCNV_ESCAPE_JAVA "J"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to JAVA (\uXXXX)
Definition at line 119 of file ucnv_err.h.
#define UCNV_ESCAPE_UNICODE "U"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to Unicode (U+XXXXX)
Definition at line 142 of file ucnv_err.h.
#define UCNV_ESCAPE_XML_DEC "D"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Decimal escape (&#DDDD;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Decimal escape (&#DDDD;).
Definition at line 131 of file ucnv_err.h.
#define UCNV_ESCAPE_XML_HEX "X"
FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Hex escape (&#xXXXX;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Hex escape (&#xXXXX;).
Definition at line 137 of file ucnv_err.h.
#define UCNV_SKIP_STOP_ON_ILLEGAL "i"
FROM_U, TO_U context options for skip callback.
Definition at line 108 of file ucnv_err.h.
#define UCNV_SUB_STOP_ON_ILLEGAL "i"
FROM_U, TO_U context options for sub callback.
Definition at line 102 of file ucnv_err.h.
typedef struct UConverter UConverter
Definition at line 96 of file ucnv_err.h.
enum UConverterCallbackReason
The process condition code to be used with the callbacks.
Codes which are greater than UCNV_IRREGULAR should be passed on to any chained callbacks.
Enumerator | |
---|---|
UCNV_UNASSIGNED | The code point is unassigned. The error code U_INVALID_CHAR_FOUND will be set. |
UCNV_ILLEGAL | The code point is illegal. For example, \x81\x2E is illegal in SJIS because \x2E is not a valid trail byte for the \x81 lead byte. Also, starting with Unicode 3.0.1, non-shortest byte sequences in UTF-8 (like \xC1\xA1 instead of \x61 for U+0061) are also illegal, not just irregular. The error code U_ILLEGAL_CHAR_FOUND will be set. |
UCNV_IRREGULAR | The codepoint is not a regular sequence in the encoding. For example, \xED\xA0\x80..\xED\xBF\xBF are irregular UTF-8 byte sequences for single surrogate code points. The error code U_INVALID_CHAR_FOUND will be set. |
UCNV_RESET | The callback is called with this reason when a 'reset' has occurred. Callback should reset all state. |
UCNV_CLOSE | Called when the converter is closed. The callback should release any allocated memory. |
UCNV_CLONE | Called when ucnv_safeClone() is called on the converter. The pointer available as the 'context' is an alias to the original converter's context pointer. If the context must be owned by the new converter, the callback must clone the data and call ucnv_setFromUCallBack (or ucnv_setToUCallBack) with the correct pointer. |
Definition at line 157 of file ucnv_err.h.
U_CAPI void UCNV_FROM_U_CALLBACK_ESCAPE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal codepoints.
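context | The function currently recognizes the callback options: UCNV_ESCAPE_ICU, UCNV_ESCAPE_JAVA, UCNV_ESCAPE_C, UCNV_ESCAPE_XML_DEC, UCNV_ESCAPE_XML_HEX, UCNV_ESCAPE_UNICODE and UCNV_ESCAPE_CSS2 (see the macro definitions in this file). |
fromUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' UChars of the concerned Unicode sequence |
length | Number of UChars in the concerned Unicode sequence |
codePoint | Single UChar32 (UTF-32) containing the concerned Unicode codepoint. |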
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |
U_CAPI void UCNV_FROM_U_CALLBACK_SKIP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.
context | The function currently recognizes the callback options: UCNV_SKIP_STOP_ON_ILLEGAL (stops at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately) and NULL (skips any ILLEGAL_SEQUENCE). |
fromUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' UChars of the concerned Unicode sequence |
length | Number of UChars in the concerned Unicode sequence |
codePoint | Single UChar32 (UTF-32) containing the concerned Unicode codepoint. |
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |
U_CAPI void UCNV_FROM_U_CALLBACK_STOP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
context | Pointer to the callback's private data |
fromUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' UChars of the concerned Unicode sequence |
length | Number of UChars in the concerned Unicode sequence |
codePoint | Single UChar32 (UTF-32) containing the concerned Unicode codepoint. |
reason | Defines the reason the callback was invoked |
err | This should always be set to a failure status prior to calling. |
U_CAPI void UCNV_FROM_U_CALLBACK_SUBSTITUTE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the current substitution string for the converter.
This is the default callback.
context | The function currently recognizes the callback options: UCNV_SUB_STOP_ON_ILLEGAL (stops at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately) and NULL (substitutes any ILLEGAL_SEQUENCE). |
fromUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' UChars of the concerned Unicode sequence |
length | Number of UChars in the concerned Unicode sequence |
codePoint | Single UChar32 (UTF-32) containing the concerned Unicode codepoint. |
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |
U_CAPI void UCNV_TO_U_CALLBACK_ESCAPE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will substitute the ILLEGAL_SEQUENCE with the hexadecimal representation of the illegal bytes (in the format %XNN, e.g. "%XFF%X0A%XC8%X03").
context | This function currently recognizes the callback options: UCNV_ESCAPE_ICU, UCNV_ESCAPE_JAVA, UCNV_ESCAPE_C, UCNV_ESCAPE_XML_DEC, UCNV_ESCAPE_XML_HEX and UCNV_ESCAPE_UNICODE. |
toUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' bytes of the concerned codepage sequence |
length | Size (in bytes) of the concerned codepage sequence |
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |
U_CAPI void UCNV_TO_U_CALLBACK_SKIP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.
context | The function currently recognizes the callback options: UCNV_SKIP_STOP_ON_ILLEGAL (stops at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately) and NULL (skips any ILLEGAL_SEQUENCE). |
toUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' bytes of the concerned codepage sequence |
length | Size (in bytes) of the concerned codepage sequence |
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |
U_CAPI void UCNV_TO_U_CALLBACK_STOP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
context | Pointer to the callback's private data |
toUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' bytes of the concerned codepage sequence |
length | Size (in bytes) of the concerned codepage sequence |
reason | Defines the reason the callback was invoked |
err | This should always be set to a failure status prior to calling. |
U_CAPI void UCNV_TO_U_CALLBACK_SUBSTITUTE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will substitute the ILLEGAL_SEQUENCE, or UNASSIGNED_SEQUENCE depending on the context parameter, with the Unicode substitution character, U+FFFD.
context | The function currently recognizes the callback options: UCNV_SUB_STOP_ON_ILLEGAL (stops at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately) and NULL (substitutes any ILLEGAL_SEQUENCE). |
toUArgs | Information about the conversion in progress |
codeUnits | Points to 'length' bytes of the concerned codepage sequence |
length | Size (in bytes) of the concerned codepage sequence |
reason | Defines the reason the callback was invoked |
err | Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status. |