ICU 76.1
utrans.h File Reference

C API: Transliterator.

#include "unicode/utypes.h"
#include "unicode/urep.h"
#include "unicode/parseerr.h"
#include "unicode/uenum.h"
#include "unicode/uset.h"
#include "unicode/localpointer.h"

Data Structures

struct  UTransPosition
 Position structure for utrans_transIncremental() incremental transliteration.
 

Namespaces

namespace  icu
 File coll.h.
 

Typedefs

typedef void * UTransliterator
 An opaque transliterator for use in C.
 
typedef enum UTransDirection UTransDirection
 Direction constant indicating the direction in a transliterator, e.g., the forward or reverse rules of a RuleBasedTransliterator.
 
typedef struct UTransPosition UTransPosition
 Position structure for utrans_transIncremental() incremental transliteration.
 

Enumerations

enum  UTransDirection { UTRANS_FORWARD , UTRANS_REVERSE }
 Direction constant indicating the direction in a transliterator, e.g., the forward or reverse rules of a RuleBasedTransliterator.
 

Functions

U_CAPI UTransliterator * utrans_openU (const UChar *id, int32_t idLength, UTransDirection dir, const UChar *rules, int32_t rulesLength, UParseError *parseError, UErrorCode *pErrorCode)
 Open a custom transliterator, given a custom rules string OR a system transliterator, given its ID.
 
U_CAPI UTransliterator * utrans_openInverse (const UTransliterator *trans, UErrorCode *status)
 Open an inverse of an existing transliterator.
 
U_CAPI UTransliterator * utrans_clone (const UTransliterator *trans, UErrorCode *status)
 Create a copy of a transliterator.
 
U_CAPI void utrans_close (UTransliterator *trans)
 Close a transliterator.
 
U_CAPI const UChar * utrans_getUnicodeID (const UTransliterator *trans, int32_t *resultLength)
 Return the programmatic identifier for this transliterator.
 
U_CAPI void utrans_register (UTransliterator *adoptedTrans, UErrorCode *status)
 Register an open transliterator with the system.
 
U_CAPI void utrans_unregisterID (const UChar *id, int32_t idLength)
 Unregister a transliterator from the system.
 
U_CAPI void utrans_setFilter (UTransliterator *trans, const UChar *filterPattern, int32_t filterPatternLen, UErrorCode *status)
 Set the filter used by a transliterator.
 
U_CAPI int32_t utrans_countAvailableIDs (void)
 Return the number of system transliterators.
 
U_CAPI UEnumeration * utrans_openIDs (UErrorCode *pErrorCode)
 Return a UEnumeration for the available transliterators.
 
U_CAPI void utrans_trans (const UTransliterator *trans, UReplaceable *rep, const UReplaceableCallbacks *repFunc, int32_t start, int32_t *limit, UErrorCode *status)
 Transliterate a segment of a UReplaceable string.
 
U_CAPI void utrans_transIncremental (const UTransliterator *trans, UReplaceable *rep, const UReplaceableCallbacks *repFunc, UTransPosition *pos, UErrorCode *status)
 Transliterate the portion of the UReplaceable text buffer that can be transliterated unambiguously.
 
U_CAPI void utrans_transUChars (const UTransliterator *trans, UChar *text, int32_t *textLength, int32_t textCapacity, int32_t start, int32_t *limit, UErrorCode *status)
 Transliterate a segment of a UChar* string.
 
U_CAPI void utrans_transIncrementalUChars (const UTransliterator *trans, UChar *text, int32_t *textLength, int32_t textCapacity, UTransPosition *pos, UErrorCode *status)
 Transliterate the portion of the UChar* text buffer that can be transliterated unambiguously.
 
U_CAPI int32_t utrans_toRules (const UTransliterator *trans, UBool escapeUnprintable, UChar *result, int32_t resultLength, UErrorCode *status)
 Create a rule string that can be passed to utrans_openU to recreate this transliterator.
 
U_CAPI USet * utrans_getSourceSet (const UTransliterator *trans, UBool ignoreFilter, USet *fillIn, UErrorCode *status)
 Returns the set of all characters that may be modified in the input text by this UTransliterator, optionally ignoring the transliterator's current filter.
 
UTransliterator * utrans_open (const char *id, UTransDirection dir, const UChar *rules, int32_t rulesLength, UParseError *parseError, UErrorCode *status)
 Deprecated, use utrans_openU() instead.
 
int32_t utrans_getID (const UTransliterator *trans, char *buf, int32_t bufCapacity)
 Deprecated, use utrans_getUnicodeID() instead.
 
void utrans_unregister (const char *id)
 Deprecated, use utrans_unregisterID() instead.
 
int32_t utrans_getAvailableID (int32_t index, char *buf, int32_t bufCapacity)
 Deprecated, use utrans_openIDs() instead.
 

Detailed Description

C API: Transliterator.

Transliteration

The data structures and functions described in this header provide transliteration services. Transliteration services are implemented as C++ classes. The comments and documentation in this header assume the reader is familiar with the C++ header translit.h and its associated documentation.

A significant but incomplete subset of the C++ transliteration services is available to C code through this header. To access more complex transliteration services, refer to the C++ headers and documentation.

There are two sets of functions for working with transliterator IDs:

An old, deprecated set uses char * IDs. These work for the plain identifiers that the APIs were originally designed for, for example "Cyrillic-Latin". They do not work when the ID contains filters ("[:Script=Cyrl:]") or a complete set of rules, because the ID string then contains more than just "invariant" characters (see utypes.h).

A new set of functions replaces the old ones and uses UChar * IDs, paralleling the UnicodeString IDs in the C++ API. (New in ICU 2.8.)

Definition in file utrans.h.

Typedef Documentation

◆ UTransDirection

Direction constant indicating the direction in a transliterator, e.g., the forward or reverse rules of a RuleBasedTransliterator.

Specified when a transliterator is opened. An "A-B" transliterator transliterates A to B when operating in the forward direction, and B to A when operating in the reverse direction.

Stable:
ICU 2.0

◆ UTransliterator

typedef void* UTransliterator

An opaque transliterator for use in C.

Open with utrans_openXxx() and close with utrans_close() when done. Equivalent to the C++ class Transliterator and its subclasses.

See also
Transliterator
Stable:
ICU 2.0

Definition at line 73 of file utrans.h.

◆ UTransPosition

Position structure for utrans_transIncremental() incremental transliteration.

This structure defines two substrings of the text being transliterated. The first region, [contextStart, contextLimit), defines what characters the transliterator will read as context. The second region, [start, limit), defines what characters will actually be transliterated. The second region should be a subset of the first.

After a transliteration operation, some of the indices in this structure will be modified. See the field descriptions for details.

contextStart <= start <= limit <= contextLimit

Note: All index values in this structure must be at code point boundaries. That is, none of them may occur between two code units of a surrogate pair. If any index does split a surrogate pair, results are unspecified.

Stable:
ICU 2.0
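
The ordering constraint and the surrogate-pair note above can be checked before a position is handed to the incremental functions. The following is a minimal sketch and not part of ICU: DemoPosition is a hypothetical stand-in that mirrors the four UTransPosition fields so that the code builds without ICU headers, and the extra bounds 0 <= contextStart and contextLimit <= length are assumptions about what a caller would normally want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UTransPosition (same four fields). */
typedef struct {
    int32_t contextStart;
    int32_t contextLimit;
    int32_t start;
    int32_t limit;
} DemoPosition;

/* True if index i lies between the two code units of a surrogate pair. */
static bool splits_surrogate_pair(const uint16_t *text, int32_t length, int32_t i) {
    if (i <= 0 || i >= length) {
        return false; /* the ends of the text never split a pair */
    }
    return (text[i - 1] & 0xFC00) == 0xD800 && (text[i] & 0xFC00) == 0xDC00;
}

/* Checks contextStart <= start <= limit <= contextLimit, the assumed text
 * bounds, and that no index splits a surrogate pair. */
static bool position_is_valid(const uint16_t *text, int32_t length,
                              const DemoPosition *pos) {
    assert(text != NULL && pos != NULL);
    if (pos->contextStart < 0 || pos->contextStart > pos->start ||
        pos->start > pos->limit || pos->limit > pos->contextLimit ||
        pos->contextLimit > length) {
        return false;
    }
    return !splits_surrogate_pair(text, length, pos->contextStart) &&
           !splits_surrogate_pair(text, length, pos->start) &&
           !splits_surrogate_pair(text, length, pos->limit) &&
           !splits_surrogate_pair(text, length, pos->contextLimit);
}
```

A caller could run such a check once after editing the text buffer and before each incremental call.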

Enumeration Type Documentation

◆ UTransDirection

Direction constant indicating the direction in a transliterator, e.g., the forward or reverse rules of a RuleBasedTransliterator.

Specified when a transliterator is opened. An "A-B" transliterator transliterates A to B when operating in the forward direction, and B to A when operating in the reverse direction.

Stable:
ICU 2.0
Enumerator
UTRANS_FORWARD 

UTRANS_FORWARD means from <source> to <target> for a transliterator with ID <source>-<target>.

For a transliterator opened using a rule, it means forward direction rules, e.g., "A > B".

UTRANS_REVERSE 

UTRANS_REVERSE means from <target> to <source> for a transliterator with ID <source>-<target>.

For a transliterator opened using a rule, it means reverse direction rules, e.g., "A < B".

Definition at line 83 of file utrans.h.

Function Documentation

◆ utrans_clone()

U_CAPI UTransliterator * utrans_clone ( const UTransliterator *  trans,
UErrorCode *  status
)

Create a copy of a transliterator.

Any non-NULL result from this function should later be closed with utrans_close().

Parameters
trans  the transliterator to be copied.
status  a pointer to the UErrorCode
Returns
a transliterator pointer that may be passed to other utrans_xxx() functions, or NULL if the clone call fails.
Stable:
ICU 2.0

◆ utrans_close()

U_CAPI void utrans_close ( UTransliterator *  trans )

Close a transliterator.

Any non-NULL pointer returned by utrans_openXxx() or utrans_clone() should eventually be closed.

Parameters
trans  the transliterator to be closed.
Stable:
ICU 2.0

◆ utrans_countAvailableIDs()

U_CAPI int32_t utrans_countAvailableIDs ( void  )

Return the number of system transliterators.

It is recommended to use utrans_openIDs() instead.

Returns
the number of system transliterators.
Stable:
ICU 2.0

◆ utrans_getAvailableID()

int32_t utrans_getAvailableID ( int32_t  index,
char *  buf,
int32_t  bufCapacity 
)

Deprecated, use utrans_openIDs() instead.

Return the ID of the index-th system transliterator. The result is placed in the given buffer. If the given buffer is too small, the initial substring is copied to buf. The result in buf is always zero-terminated.

Parameters
index  the number of the transliterator to return. Must satisfy 0 <= index < utrans_countAvailableIDs(). If index is out of range then it is treated as if it were 0.
buf  the buffer in which to receive the ID. This may be NULL, in which case no characters are copied.
bufCapacity  the capacity of the buffer. Ignored if buf is NULL.
Returns
the actual length of the index-th ID, not including zero-termination. This may be greater than bufCapacity.
Deprecated:
ICU 2.8 Use utrans_openIDs() instead, see utrans.h

◆ utrans_getID()

int32_t utrans_getID ( const UTransliterator *  trans,
char *  buf,
int32_t  bufCapacity
)

Deprecated, use utrans_getUnicodeID() instead.

Return the programmatic identifier for this transliterator. If this identifier is passed to utrans_open(), it will open a transliterator equivalent to this one, if the ID has been registered.

Parameters
trans  the transliterator to return the ID of.
buf  the buffer in which to receive the ID. This may be NULL, in which case no characters are copied.
bufCapacity  the capacity of the buffer. Ignored if buf is NULL.
Returns
the actual length of the ID, not including zero-termination. This may be greater than bufCapacity.
Deprecated:
ICU 2.8 Use utrans_getUnicodeID() instead, see utrans.h

◆ utrans_getSourceSet()

U_CAPI USet * utrans_getSourceSet ( const UTransliterator *  trans,
UBool  ignoreFilter,
USet *  fillIn,
UErrorCode *  status
)

Returns the set of all characters that may be modified in the input text by this UTransliterator, optionally ignoring the transliterator's current filter.

Parameters
trans  The transliterator.
ignoreFilter  If false, the returned set incorporates the UTransliterator's current filter; if the filter is changed, the return value of this function will change. If true, the returned set ignores the effect of the UTransliterator's current filter.
fillIn  Pointer to a USet object to receive the modifiable characters set. Previous contents of fillIn are lost. If fillIn is NULL, then a new USet is created and returned. The caller owns the result and must dispose of it by calling uset_close.
status  A pointer to the UErrorCode.
Returns
USet* Either fillIn, or if fillIn is NULL, a pointer to a newly-allocated USet that the user must close. In case of error, NULL is returned.
Stable:
ICU 53

◆ utrans_getUnicodeID()

U_CAPI const UChar * utrans_getUnicodeID ( const UTransliterator *  trans,
int32_t *  resultLength
)

Return the programmatic identifier for this transliterator.

If this identifier is passed to utrans_openU(), it will open a transliterator equivalent to this one, if the ID has been registered.

Parameters
trans  the transliterator to return the ID of.
resultLength  pointer to an output variable receiving the length of the ID string; can be NULL
Returns
the NUL-terminated ID string. This pointer remains valid until utrans_close() is called on this transliterator.
Stable:
ICU 2.8

◆ utrans_open()

UTransliterator * utrans_open ( const char *  id,
UTransDirection  dir,
const UChar *  rules,
int32_t  rulesLength,
UParseError *  parseError,
UErrorCode *  status
)

Deprecated, use utrans_openU() instead.

Open a custom transliterator, given a custom rules string OR a system transliterator, given its ID.

Any non-NULL result from this function should later be closed with utrans_close().

Parameters
id  a valid ID, as returned by utrans_getAvailableID()
dir  the desired direction
rules  the transliterator rules. See the C++ header rbt.h for rules syntax. If NULL then a system transliterator matching the ID is returned.
rulesLength  the length of the rules, or -1 if the rules are zero-terminated.
parseError  a pointer to a UParseError struct to receive the details of any parsing errors. This parameter may be NULL if no parsing error details are desired.
status  a pointer to the UErrorCode
Returns
a transliterator pointer that may be passed to other utrans_xxx() functions, or NULL if the open call fails.
Deprecated:
ICU 2.8 Use utrans_openU() instead, see utrans.h

◆ utrans_openIDs()

U_CAPI UEnumeration * utrans_openIDs ( UErrorCode *  pErrorCode )

Return a UEnumeration for the available transliterators.

Parameters
pErrorCode  Pointer to the UErrorCode in/out parameter.
Returns
UEnumeration for the available transliterators. Close with uenum_close().
Stable:
ICU 2.8

◆ utrans_openInverse()

U_CAPI UTransliterator * utrans_openInverse ( const UTransliterator *  trans,
UErrorCode *  status
)

Open an inverse of an existing transliterator.

For this to work, the inverse must be registered with the system. For example, if the Transliterator "A-B" is opened, and then its inverse is opened, the result is the Transliterator "B-A", if such a transliterator is registered with the system. Otherwise the result is NULL and a failing UErrorCode is set. Any non-NULL result from this function should later be closed with utrans_close().

Parameters
trans  the transliterator to open the inverse of.
status  a pointer to the UErrorCode
Returns
a pointer to a newly-opened transliterator that is the inverse of trans, or NULL if the open call fails.
Stable:
ICU 2.0

◆ utrans_openU()

U_CAPI UTransliterator * utrans_openU ( const UChar *  id,
int32_t  idLength,
UTransDirection  dir,
const UChar *  rules,
int32_t  rulesLength,
UParseError *  parseError,
UErrorCode *  pErrorCode
)

Open a custom transliterator, given a custom rules string OR a system transliterator, given its ID.

Any non-NULL result from this function should later be closed with utrans_close().

Parameters
id  a valid transliterator ID
idLength  the length of the ID string, or -1 if NUL-terminated
dir  the desired direction
rules  the transliterator rules. See the C++ header rbt.h for rules syntax. If NULL then a system transliterator matching the ID is returned.
rulesLength  the length of the rules, or -1 if the rules are NUL-terminated.
parseError  a pointer to a UParseError struct to receive the details of any parsing errors. This parameter may be NULL if no parsing error details are desired.
pErrorCode  a pointer to the UErrorCode
Returns
a transliterator pointer that may be passed to other utrans_xxx() functions, or NULL if the open call fails.
Stable:
ICU 2.8

◆ utrans_register()

U_CAPI void utrans_register ( UTransliterator *  adoptedTrans,
UErrorCode *  status
)

Register an open transliterator with the system.

When utrans_open() is called with an ID string that is equal to that returned by utrans_getID(adoptedTrans,...), then utrans_clone(adoptedTrans,...) is returned.

NOTE: After this call the system owns the adoptedTrans and will close it. The user must not call utrans_close() on adoptedTrans.

Parameters
adoptedTrans  a transliterator, typically the result of opening a transliterator from rules with utrans_openU(), to be registered with the system.
status  a pointer to the UErrorCode
Stable:
ICU 2.0

◆ utrans_setFilter()

U_CAPI void utrans_setFilter ( UTransliterator *  trans,
const UChar *  filterPattern,
int32_t  filterPatternLen,
UErrorCode *  status
)

Set the filter used by a transliterator.

A filter can be used to make the transliterator pass certain characters through untouched. The filter is expressed using a UnicodeSet pattern. If the filterPattern is NULL or the empty string, then the transliterator will be reset to use no filter.

Parameters
trans  the transliterator
filterPattern  a pattern string, in the form accepted by UnicodeSet, specifying which characters to apply the transliteration to. May be NULL or the empty string to indicate no filter.
filterPatternLen  the length of filterPattern, or -1 if filterPattern is zero-terminated
status  a pointer to the UErrorCode
See also
UnicodeSet
Stable:
ICU 2.0

◆ utrans_toRules()

U_CAPI int32_t utrans_toRules ( const UTransliterator *  trans,
UBool  escapeUnprintable,
UChar *  result,
int32_t  resultLength,
UErrorCode *  status
)

Create a rule string that can be passed to utrans_openU to recreate this transliterator.

Parameters
trans  The transliterator
escapeUnprintable  if true then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
result  A pointer to a buffer to receive the rules.
resultLength  The maximum size of result.
status  A pointer to the UErrorCode. In case of error status, the contents of result are undefined.
Returns
int32_t The length of the rule string (may be greater than resultLength, in which case an error is returned).
Stable:
ICU 53

◆ utrans_trans()

U_CAPI void utrans_trans ( const UTransliterator *  trans,
UReplaceable *  rep,
const UReplaceableCallbacks *  repFunc,
int32_t  start,
int32_t *  limit,
UErrorCode *  status
)

Transliterate a segment of a UReplaceable string.

The string is passed in as a UReplaceable pointer rep and a UReplaceableCallbacks function pointer struct repFunc. Functions in the repFunc struct will be called in order to modify the rep string.

Parameters
trans  the transliterator
rep  a pointer to the string. This will be passed to the repFunc functions.
repFunc  a set of function pointers that will be used to modify the string pointed to by rep.
start  the beginning index, inclusive; 0 <= start <= limit.
limit  pointer to the ending index, exclusive; start <= limit <= repFunc->length(rep). Upon return, *limit will contain the new limit index. The text previously occupying [start, limit) has been transliterated, possibly to a string of a different length, at [start, new-limit), where new-limit is the value stored in *limit on return.
status  a pointer to the UErrorCode
Stable:
ICU 2.0

◆ utrans_transIncremental()

U_CAPI void utrans_transIncremental ( const UTransliterator *  trans,
UReplaceable *  rep,
const UReplaceableCallbacks *  repFunc,
UTransPosition *  pos,
UErrorCode *  status
)

Transliterate the portion of the UReplaceable text buffer that can be transliterated unambiguously.

This method is typically called after new text has been inserted, e.g. as a result of a keyboard event. The transliterator will try to transliterate characters of rep between pos->start and pos->limit. Characters before pos->start will not be changed.

Upon return, values in pos will be updated. pos->contextStart will be advanced to the first character that future calls to this method will read. pos->start and pos->limit will be adjusted to delimit the range of text that future calls to this method may change.

Typical usage of this method begins with an initial call with pos->contextStart and pos->contextLimit set to indicate the portion of text that may be read as context, pos->start and pos->limit set to indicate the portion of text to be transliterated, and pos->start == pos->contextStart. Thereafter, pos can be used without modification in future calls, provided that all changes to text are made via this method.

This method assumes that future calls may be made that will insert new text into the buffer. As a result, it only performs unambiguous transliterations. After the last call to this method, there may be untransliterated text that is waiting for more input to resolve an ambiguity. In order to perform these pending transliterations, clients should call utrans_trans() with a start of pos->start and a limit of pos->limit after the last call to this method has been made.

Parameters
trans  the transliterator
rep  a pointer to the string. This will be passed to the repFunc functions.
repFunc  a set of function pointers that will be used to modify the string pointed to by rep.
pos  a struct containing the start and limit indices of the text to be read and the text to be transliterated
status  a pointer to the UErrorCode
Stable:
ICU 2.0

◆ utrans_transIncrementalUChars()

U_CAPI void utrans_transIncrementalUChars ( const UTransliterator *  trans,
UChar *  text,
int32_t *  textLength,
int32_t  textCapacity,
UTransPosition *  pos,
UErrorCode *  status
)

Transliterate the portion of the UChar* text buffer that can be transliterated unambiguously.

See utrans_transIncremental() for usage details. The string is passed in as a UChar* buffer and is modified in place. If the result is longer than textCapacity, it is truncated. The actual length of the result is returned in *textLength, if textLength is non-NULL. *textLength may be greater than textCapacity, but only textCapacity UChars will be written to text, including the zero terminator.

Parameters
trans  the transliterator
text  a pointer to a buffer containing the text to be transliterated on input and the result text on output.
textLength  a pointer to the length of the string in text. If the length is -1 then the string is assumed to be zero-terminated. Upon return, the new length is stored in *textLength. If textLength is NULL then the string is assumed to be zero-terminated.
textCapacity  the capacity of the text buffer, in UChars
pos  a struct containing the start and limit indices of the text to be read and the text to be transliterated
status  a pointer to the UErrorCode
See also
utrans_transIncremental
Stable:
ICU 2.0

◆ utrans_transUChars()

U_CAPI void utrans_transUChars ( const UTransliterator *  trans,
UChar *  text,
int32_t *  textLength,
int32_t  textCapacity,
int32_t  start,
int32_t *  limit,
UErrorCode *  status
)

Transliterate a segment of a UChar* string.

The string is passed in as a UChar* buffer and is modified in place. If the result is longer than textCapacity, it is truncated. The actual length of the result is returned in *textLength, if textLength is non-NULL. *textLength may be greater than textCapacity, but only textCapacity UChars will be written to text, including the zero terminator.

Parameters
trans  the transliterator
text  a pointer to a buffer containing the text to be transliterated on input and the result text on output.
textLength  a pointer to the length of the string in text. If the length is -1 then the string is assumed to be zero-terminated. Upon return, the new length is stored in *textLength. If textLength is NULL then the string is assumed to be zero-terminated.
textCapacity  the capacity of the text buffer, in UChars
start  the beginning index, inclusive; 0 <= start <= limit.
limit  pointer to the ending index, exclusive; start <= limit <= the length of the string in text. Upon return, *limit will contain the new limit index. The text previously occupying [start, limit) has been transliterated, possibly to a string of a different length, at [start, new-limit), where new-limit is the value stored in *limit on return.
status  a pointer to the UErrorCode
Stable:
ICU 2.0

◆ utrans_unregister()

void utrans_unregister ( const char *  id)

Deprecated, use utrans_unregisterID() instead.

Unregister a transliterator from the system. After this call the system will no longer recognize the given ID when passed to utrans_open(). If the id is invalid then nothing is done.

Parameters
id  a zero-terminated ID
Deprecated:
ICU 2.8 Use utrans_unregisterID() instead, see utrans.h

◆ utrans_unregisterID()

U_CAPI void utrans_unregisterID ( const UChar *  id,
int32_t  idLength
)

Unregister a transliterator from the system.

After this call the system will no longer recognize the given ID when passed to utrans_open(). If the ID is invalid then nothing is done.

Parameters
id  an ID to unregister
idLength  the length of id, or -1 if id is zero-terminated
Stable:
ICU 2.8