ICU 75.1
unistr.h File Reference

C++ API: Unicode String.

#include "unicode/utypes.h"
#include <cstddef>
#include "unicode/char16ptr.h"
#include "unicode/rep.h"
#include "unicode/std_string.h"
#include "unicode/stringpiece.h"
#include "unicode/bytestream.h"


Data Structures

class  icu::UnicodeString
 UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to that of the Java String and StringBuffer/StringBuilder classes.
 

Namespaces

namespace  icu
 

Macros

#define US_INV   icu::UnicodeString::kInvariant
 Constant to be passed to the UnicodeString(char *, int32_t, EInvariant) constructor, which constructs a Unicode string from an invariant-character char * string.
 
#define UNICODE_STRING(cs, _length)   icu::UnicodeString(true, u ## cs, _length)
 Unicode String literals in C++.
 
#define UNICODE_STRING_SIMPLE(cs)   UNICODE_STRING(cs, -1)
 Unicode String literals in C++.
 
#define UNISTR_FROM_CHAR_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_FROM_STRING_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_OBJECT_SIZE   64
 Desired sizeof(UnicodeString) in bytes.
 

Typedefs

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)
 Internal string case mapping function type.
 

Functions

U_CAPI int32_t u_strlen (const UChar *s)
 
U_COMMON_API UnicodeString icu::operator+ (const UnicodeString &s1, const UnicodeString &s2)
 Create a new UnicodeString with the concatenation of two others.
 

Detailed Description

C++ API: Unicode String.

Definition in file unistr.h.

Macro Definition Documentation

◆ UNICODE_STRING

#define UNICODE_STRING(cs, _length)   icu::UnicodeString(true, u ## cs, _length)

Unicode String literals in C++.

Note: These macros are not recommended for new code. Before C++11 and u"unicode string literals" were available, these macros were provided for portability and efficiency when initializing UnicodeStrings from literals.

They work only for strings that contain "invariant characters", i.e., only Latin letters, digits, and some punctuation. See utypes.h for details.

The string parameter must be a C string literal. The length of the string, not including the terminating NUL, must be specified as a constant.

Stable:
ICU 2.0

Definition at line 117 of file unistr.h.
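The macro relies on preprocessor token pasting: u ## cs glues the prefix u onto the narrow literal, yielding a char16_t literal. The sketch below isolates that step with a hypothetical TO_UTF16_LITERAL macro (not part of ICU; no ICU header is included) and computes the length that UNICODE_STRING expects as its second argument. In C++11 and later, new code can usually pass a u"..." literal directly to a UnicodeString constructor instead.

```cpp
#include <cstddef>

// Hypothetical macro mirroring only the "u ## cs" part of UNICODE_STRING:
// token pasting glues the prefix u onto a narrow literal "abc", producing
// the UTF-16 literal u"abc".
#define TO_UTF16_LITERAL(cs) u ## cs

// Number of char16_t code units in an array literal, excluding the final NUL.
template <std::size_t N>
constexpr std::size_t literal_length(const char16_t (&)[N]) {
    return N - 1;
}

// UNICODE_STRING("abc", 3) passes u"abc" together with the constant 3; that
// constant has to match the literal's length, which literal_length() computes.
constexpr std::size_t kAbcLength = literal_length(TO_UTF16_LITERAL("abc"));
static_assert(kAbcLength == 3, "u\"abc\" has three code units");
```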

◆ UNICODE_STRING_SIMPLE

#define UNICODE_STRING_SIMPLE(cs)   UNICODE_STRING(cs, -1)

Unicode String literals in C++.

Depending on platform properties, different UnicodeString constructors should be used to create a UnicodeString object from a string literal. The macros are defined for improved performance. They work only for strings that contain "invariant characters", i.e., only Latin letters, digits, and some punctuation. See utypes.h for details.

The string parameter must be a C string literal.

Stable:
ICU 2.0

Definition at line 135 of file unistr.h.
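UNICODE_STRING_SIMPLE passes -1 as the length. Assuming the usual ICU convention that a negative length means "NUL-terminated", the length is then determined by scanning for the terminating NUL. The hypothetical resolve_length() helper below models that convention in plain C++14; it is an illustration, not ICU's implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not ICU code) modeling the length convention used by
// UNICODE_STRING_SIMPLE: a length of -1 means "count up to the terminating NUL".
constexpr std::int32_t resolve_length(const char16_t *s, std::int32_t length) {
    if (length >= 0) {
        return length;  // explicit length, as passed to UNICODE_STRING(cs, n)
    }
    std::int32_t n = 0;
    while (s[n] != u'\0') {  // scan for the NUL, like u_strlen()
        ++n;
    }
    return n;
}

// Under this model, UNICODE_STRING_SIMPLE("invariant") ends up with the same
// length as UNICODE_STRING("invariant", 9).
static_assert(resolve_length(u"invariant", -1) == 9, "");
```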

◆ UNISTR_FROM_CHAR_EXPLICIT

#define UNISTR_FROM_CHAR_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(char16_t) and UnicodeString(UChar32) constructors are marked as explicit, preventing their inadvertent use.

Stable:
ICU 49

Definition at line 150 of file unistr.h.
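The effect of the macro can be shown with a toy class whose single-character constructors are prefixed by a stand-in macro. DEMO_FROM_CHAR_EXPLICIT and CharDemoString are hypothetical names used only for this sketch. With the macro set to explicit, direct construction still works, but a character or integer no longer converts to the string type implicitly.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for UNISTR_FROM_CHAR_EXPLICIT, set to "explicit".
// Defining it as empty would allow the implicit conversions rejected below.
#define DEMO_FROM_CHAR_EXPLICIT explicit

// Toy class (not ICU's UnicodeString) that only records its UTF-16 length.
class CharDemoString {
public:
    DEMO_FROM_CHAR_EXPLICIT constexpr CharDemoString(char16_t) : length_(1) {}
    // UChar32 is a signed 32-bit integer; code points above U+FFFF need
    // two char16_t code units (a surrogate pair).
    DEMO_FROM_CHAR_EXPLICIT constexpr CharDemoString(std::int32_t c)
        : length_(c > 0xFFFF ? 2 : 1) {}
    constexpr int length() const { return length_; }

private:
    int length_;
};

// Direct construction still works...
static_assert(CharDemoString(u'a').length() == 1, "");
static_assert(CharDemoString(std::int32_t{0x1F600}).length() == 2, "");

// ...but a character or code point no longer turns into a string silently.
static_assert(!std::is_convertible<char16_t, CharDemoString>::value, "");
static_assert(!std::is_convertible<std::int32_t, CharDemoString>::value, "");
```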

◆ UNISTR_FROM_STRING_EXPLICIT

#define UNISTR_FROM_STRING_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(const char *) and UnicodeString(const char16_t *) constructors are marked as explicit, preventing their inadvertent use.

In particular, this helps prevent accidentally depending on ICU conversion code by passing a string literal into an API with a const UnicodeString & parameter.

Stable:
ICU 49

Definition at line 170 of file unistr.h.
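The same mechanism applies to the string constructors. In this sketch, StrDemoString, accepts(), and DEMO_FROM_STRING_EXPLICIT are hypothetical stand-ins for UnicodeString, an ICU API taking a const UnicodeString & parameter, and UNISTR_FROM_STRING_EXPLICIT. With the macro set to explicit, a call such as accepts("text") fails to compile, so any conversion has to be written out at the call site. To get this behavior from ICU itself, the macro would presumably be defined before any ICU header is included, for example with -DUNISTR_FROM_STRING_EXPLICIT=explicit on the compiler command line; treat that as an assumption about the build setup.

```cpp
#include <type_traits>

// Hypothetical stand-in for UNISTR_FROM_STRING_EXPLICIT, set to "explicit".
// Defining it as empty would make both constructors converting again.
#define DEMO_FROM_STRING_EXPLICIT explicit

// Toy string class (not ICU's UnicodeString). In ICU, the const char *
// constructor is the one that can pull in charset conversion code.
class StrDemoString {
public:
    DEMO_FROM_STRING_EXPLICIT constexpr StrDemoString(const char *) {}
    DEMO_FROM_STRING_EXPLICIT constexpr StrDemoString(const char16_t *) {}
};

// Hypothetical API with a const string-reference parameter.
constexpr bool accepts(const StrDemoString &) { return true; }

// accepts("text") would not compile: no implicit conversion exists.
static_assert(!std::is_convertible<const char *, StrDemoString>::value, "");
static_assert(!std::is_convertible<const char16_t *, StrDemoString>::value, "");

// Spelling out the construction still works and makes the dependency visible.
static_assert(accepts(StrDemoString(u"text")), "");
```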

◆ UNISTR_OBJECT_SIZE

#define UNISTR_OBJECT_SIZE   64

Desired sizeof(UnicodeString) in bytes.

It should be a multiple of sizeof(pointer) to avoid wasting space on padding. It may also be useful for the object size to be a multiple of 16 bytes, which is a common granularity for heap allocation.

Any space inside the object beyond sizeof(vtable pointer) + 2 is available for storing short strings inside the object. The bigger the object, the longer the strings that can be stored inside it without additional heap allocation.

Depending on a platform's pointer size, pointer alignment requirements, and struct padding, the compiler will usually round up sizeof(UnicodeString) to 4 * sizeof(pointer) (or 3 * sizeof(pointer) for P128 data models), to hold the fields for heap-allocated strings. Such a minimum size also ensures that the object is easily large enough to hold at least 2 char16_ts, for one supplementary code point (U16_MAX_LENGTH).

sizeof(UnicodeString) >= 48 should work for all known platforms.

For example, on a 64-bit machine where sizeof(vtable pointer) is 8, sizeof(UnicodeString) = 64 would leave space for (64 - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR = (64 - 8 - 2) / 2 = 27 char16_ts stored inside the object.

The minimum object size on a 64-bit machine would be 4 * sizeof(pointer) = 4 * 8 = 32 bytes, and the internal buffer would hold up to 11 char16_ts in that case.

See also
U16_MAX_LENGTH
Stable:
ICU 56

Definition at line 208 of file unistr.h.
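The capacity arithmetic above can be checked at compile time. inline_capacity() is a hypothetical helper that simply encodes the stated formula, (object size - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR with U_SIZEOF_UCHAR = 2; it does not inspect the real UnicodeString layout. The 32-bit line is an extrapolation of the same formula, not a figure from the text.

```cpp
#include <cstddef>

// Hypothetical helper encoding the formula from the text:
// (objectSize - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR, with U_SIZEOF_UCHAR == 2.
// It only does the arithmetic; it does not inspect the real UnicodeString layout.
constexpr std::size_t inline_capacity(std::size_t objectSize, std::size_t vtablePointerSize) {
    return (objectSize - vtablePointerSize - 2) / 2;
}

// 64-bit platform, UNISTR_OBJECT_SIZE == 64: 27 inline char16_ts.
static_assert(inline_capacity(64, 8) == 27, "");
// 64-bit minimum size of 4 * sizeof(pointer) == 32 bytes: 11 inline char16_ts.
static_assert(inline_capacity(32, 8) == 11, "");
// Extrapolation, not from the text: a 32-bit platform with 4-byte pointers.
static_assert(inline_capacity(64, 4) == 29, "");
```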

◆ US_INV

#define US_INV   icu::UnicodeString::kInvariant

Constant to be passed to the UnicodeString(char *, int32_t, EInvariant) constructor, which constructs a Unicode string from an invariant-character char * string.

For details about invariant characters, see utypes.h. This constructor has no runtime dependency on conversion code and is therefore recommended over constructors that take a charset name string (where the empty string "" indicates invariant-character conversion).

Stable:
ICU 3.2

Definition at line 97 of file unistr.h.

Typedef Documentation

◆ UStringCaseMapper

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)

Internal string case mapping function type.

All error checking must be done. src and dest must not overlap.

Internal:
Do not use. This API is for internal use only.

Definition at line 70 of file unistr.h.