ICU 76.1
unistr.h File Reference

C++ API: Unicode String. More...

#include "unicode/utypes.h"
#include <cstddef>
#include <string_view>
#include "unicode/char16ptr.h"
#include "unicode/rep.h"
#include "unicode/std_string.h"
#include "unicode/stringpiece.h"
#include "unicode/bytestream.h"

Data Structures

class  icu::UnicodeString
 UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to that of the Java String and StringBuffer/StringBuilder classes. More...
 

Namespaces

namespace  icu
 File coll.h.
 

Macros

#define US_INV   icu::UnicodeString::kInvariant
 Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.
 
#define UNICODE_STRING(cs, _length)   icu::UnicodeString(true, u ## cs, _length)
 Obsolete macro approximating UnicodeString literals.
 
#define UNICODE_STRING_SIMPLE(cs)   UNICODE_STRING(cs, -1)
 Unicode String literals in C++.
 
#define UNISTR_FROM_CHAR_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_FROM_STRING_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_OBJECT_SIZE   64
 Desired sizeof(UnicodeString) in bytes.
 

Typedefs

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)
 Internal string case mapping function type.
 

Functions

U_CAPI int32_t u_strlen (const UChar *s)
 
U_COMMON_API UnicodeString icu::operator+ (const UnicodeString &s1, const UnicodeString &s2)
 Creates a new UnicodeString from the concatenation of two others.
 
template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
UnicodeString icu::operator+ (const UnicodeString &s1, const S &s2)
 Creates a new UnicodeString from the concatenation of a UnicodeString and s2 which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
 
U_COMMON_API UnicodeString icu::unistr_internalConcat (const UnicodeString &s1, std::u16string_view s2)
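
The templated operator+ above is enabled only when S converts to a UTF-16 string view. The following is a minimal sketch of that constraint pattern using only standard library types, so it can be compiled without ICU. SketchConvertibleToU16 and sketchConcat are hypothetical names for illustration; ICU's real ConvertibleToU16StringView may be defined differently, for example in how it treats wchar_t.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical trait standing in for ConvertibleToU16StringView<S>.
// Assumption: a plain is_convertible check; ICU's definition may differ.
template<typename S>
constexpr bool SketchConvertibleToU16 =
    std::is_convertible_v<const S &, std::u16string_view>;

// Hypothetical concatenation on std::u16string, constrained in the same
// style as the documented operator+ template.
template<typename S, typename = std::enable_if_t<SketchConvertibleToU16<S>>>
std::u16string sketchConcat(const std::u16string &s1, const S &s2) {
    std::u16string result(s1);
    result.append(std::u16string_view(s2));
    return result;
}

// Usage: any right-hand side that converts to std::u16string_view is accepted.
inline void sketchConcatDemo() {
    std::u16string base = u"ab";
    assert(sketchConcat(base, u"cd") == u"abcd");                       // literal
    assert(sketchConcat(base, std::u16string(u"ef")) == u"abef");       // string
    assert(sketchConcat(base, std::u16string_view(u"gh")) == u"abgh");  // view
}
```

A type that does not convert to std::u16string_view, such as int, removes the template from overload resolution instead of producing an error inside the function body.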
 

Detailed Description

C++ API: Unicode String.

Definition in file unistr.h.

Macro Definition Documentation

◆ UNICODE_STRING

#define UNICODE_STRING (   cs,
  _length 
)    icu::UnicodeString(true, u ## cs, _length)

Obsolete macro approximating UnicodeString literals.

Prior to the availability of C++11 and u"UTF-16 string literals", this macro was provided for portability and efficiency when initializing UnicodeStrings from literals.

Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:

UnicodeString str(u"literal");
if (str == u"other literal") { ... }

The string parameter must be a C string literal. The length of the string, not including the terminating NUL, must be specified as a constant.
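
The macro works by pasting a u prefix onto a narrow string literal with the ## operator. The sketch below shows that mechanism in isolation; SKETCH_U16_LITERAL is a hypothetical macro that builds a std::u16string_view instead of an icu::UnicodeString, so that the example does not depend on ICU.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for UNICODE_STRING: pastes the u prefix onto a
// narrow string literal and pairs it with a caller-supplied length.
#define SKETCH_U16_LITERAL(cs, length) \
    std::u16string_view(u ## cs, length)

// What modern code can write instead: the length comes from the literal.
constexpr std::u16string_view kModern = u"literal";

// Returns true when the macro form and the plain u"..." form agree.
inline bool sketchFormsAgree() {
    std::u16string_view viaMacro = SKETCH_U16_LITERAL("literal", 7);
    return viaMacro == kModern && viaMacro.size() == 7;
}

// Usage: both spellings describe the same seven UTF-16 code units.
inline void sketchLiteralDemo() {
    assert(sketchFormsAgree());
}
```

Because the length is passed by hand in the macro form, a wrong constant silently truncates or overruns the literal, which is one reason to prefer the plain u"..." form.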

Stable:
ICU 2.0

Definition at line 121 of file unistr.h.

◆ UNICODE_STRING_SIMPLE

#define UNICODE_STRING_SIMPLE (   cs)    UNICODE_STRING(cs, -1)

Unicode String literals in C++.

Obsolete macro approximating UnicodeString literals. See UNICODE_STRING.

The string parameter must be a C string literal.

Stable:
ICU 2.0
See also
UNICODE_STRING

Definition at line 135 of file unistr.h.

◆ UNISTR_FROM_CHAR_EXPLICIT

#define UNISTR_FROM_CHAR_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(char16_t) and UnicodeString(UChar32) constructors are marked as explicit, preventing their inadvertent use.

Stable:
ICU 49

Definition at line 150 of file unistr.h.

◆ UNISTR_FROM_STRING_EXPLICIT

#define UNISTR_FROM_STRING_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(const char *) and UnicodeString(const char16_t *) constructors are marked as explicit, preventing their inadvertent use.

In particular, this helps prevent accidentally depending on ICU conversion code by passing a string literal into an API with a const UnicodeString & parameter.
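
As a rough illustration of the effect, the sketch below applies the same pattern to a hypothetical class. SKETCH_FROM_STRING_EXPLICIT, SketchString, and sketchLength are invented for this example and are not part of ICU; the macro is defined as "explicit" here, which is an assumption about how a project might configure it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical switch mirroring UNISTR_FROM_STRING_EXPLICIT. Defining it as
// empty instead would allow implicit conversion from string literals.
#define SKETCH_FROM_STRING_EXPLICIT explicit

// Hypothetical string class, not icu::UnicodeString.
class SketchString {
public:
    SKETCH_FROM_STRING_EXPLICIT SketchString(const char16_t *text) : text_(text) {}
    const std::u16string &text() const { return text_; }
private:
    std::u16string text_;
};

// An API that takes a const reference parameter.
inline std::size_t sketchLength(const SketchString &s) { return s.text().size(); }

// With "explicit", a literal no longer converts silently at the call site:
// sketchLength(u"abc") would not compile, but spelled-out construction does.
static_assert(!std::is_convertible_v<const char16_t *, SketchString>,
              "implicit conversion is disabled");
static_assert(std::is_constructible_v<SketchString, const char16_t *>,
              "explicit construction still works");

// Usage: the conversion has to be written out by the caller.
inline void sketchExplicitDemo() {
    assert(sketchLength(SketchString(u"abc")) == 3);
}
```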

Stable:
ICU 49

Definition at line 170 of file unistr.h.

◆ UNISTR_OBJECT_SIZE

#define UNISTR_OBJECT_SIZE   64

Desired sizeof(UnicodeString) in bytes.

It should be a multiple of sizeof(pointer) to avoid unusable space for padding. Ideally, the object size is also a multiple of 16 bytes, which is a common granularity for heap allocation.

Any space inside the object beyond sizeof(vtable pointer) + 2 is available for storing short strings inside the object. The bigger the object, the longer the string that can be stored inside it without additional heap allocation.

Depending on a platform's pointer size, pointer alignment requirements, and struct padding, the compiler will usually round up sizeof(UnicodeString) to 4 * sizeof(pointer) (or 3 * sizeof(pointer) for P128 data models), to hold the fields for heap-allocated strings. Such a minimum size also ensures that the object is easily large enough to hold at least 2 char16_ts, for one supplementary code point (U16_MAX_LENGTH).

sizeof(UnicodeString) >= 48 should work for all known platforms.

For example, on a 64-bit machine where sizeof(vtable pointer) is 8, sizeof(UnicodeString) = 64 would leave space for (64 - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR = (64 - 8 - 2) / 2 = 27 char16_ts stored inside the object.

The minimum object size on a 64-bit machine would be 4 * sizeof(pointer) = 4 * 8 = 32 bytes, and the internal buffer would hold up to 11 char16_ts in that case.
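
The arithmetic in the two examples above can be checked with a small helper. sketchInlineCapacity is a hypothetical function written for this illustration; it only restates the formula from the text and assumes sizeof(char16_t) is 2, as U_SIZEOF_UCHAR is in the example.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the formula in the text: the bytes left after the vtable pointer
// and the 2 extra bytes the text subtracts, divided by the size of a char16_t.
constexpr std::size_t sketchInlineCapacity(std::size_t objectSize,
                                           std::size_t pointerSize) {
    return (objectSize - pointerSize - 2) / sizeof(char16_t);
}

// The two cases worked through in the text, for 8-byte pointers.
static_assert(sketchInlineCapacity(64, 8) == 27, "default 64-byte object");
static_assert(sketchInlineCapacity(32, 8) == 11, "minimum 64-bit object");

// Usage: a 48-byte object falls between the two documented cases.
inline void sketchCapacityDemo() {
    assert(sketchInlineCapacity(48, 8) == 19);
}
```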

See also
U16_MAX_LENGTH
Stable:
ICU 56

Definition at line 208 of file unistr.h.

◆ US_INV

#define US_INV   icu::UnicodeString::kInvariant

Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.

For details about invariant characters, see utypes.h. This constructor has no runtime dependency on conversion code and is therefore recommended over constructors that take a charset name string (where the empty string "" indicates invariant-character conversion).

Stable:
ICU 3.2

Definition at line 98 of file unistr.h.

Typedef Documentation

◆ UStringCaseMapper

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)

Internal string case mapping function type.

All error checking must be done. src and dest must not overlap.

Internal:
Do not use. This API is for internal use only.

Definition at line 71 of file unistr.h.