ICU 76.1
C++ API: Unicode String.
#include "unicode/utypes.h"
#include <cstddef>
#include <string_view>
#include "unicode/char16ptr.h"
#include "unicode/rep.h"
#include "unicode/std_string.h"
#include "unicode/stringpiece.h"
#include "unicode/bytestream.h"
Data Structures | |
class | icu::UnicodeString |
UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to that of the Java String and StringBuffer/StringBuilder classes. | |
Namespaces | |
namespace | icu |
File coll.h. | |
Macros | |
#define | US_INV icu::UnicodeString::kInvariant |
Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string. | |
#define | UNICODE_STRING(cs, _length) icu::UnicodeString(true, u ## cs, _length) |
Obsolete macro approximating UnicodeString literals. | |
#define | UNICODE_STRING_SIMPLE(cs) UNICODE_STRING(cs, -1) |
Unicode String literals in C++. | |
#define | UNISTR_FROM_CHAR_EXPLICIT |
This can be defined to be empty or "explicit". | |
#define | UNISTR_FROM_STRING_EXPLICIT |
This can be defined to be empty or "explicit". | |
#define | UNISTR_OBJECT_SIZE 64 |
Desired sizeof(UnicodeString) in bytes. | |
Typedefs | |
typedef int32_t | UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode) |
Internal string case mapping function type. | |
Functions | |
U_CAPI int32_t | u_strlen (const UChar *s) |
U_COMMON_API UnicodeString | icu::operator+ (const UnicodeString &s1, const UnicodeString &s2) |
Creates a new UnicodeString from the concatenation of two others. | |
template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>> | |
UnicodeString | icu::operator+ (const UnicodeString &s1, const S &s2) |
Creates a new UnicodeString from the concatenation of a UnicodeString and s2 which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view. | |
U_COMMON_API UnicodeString | icu::unistr_internalConcat (const UnicodeString &s1, std::u16string_view s2) |
C++ API: Unicode String.
Definition in file unistr.h.
#define UNICODE_STRING(cs, _length) icu::UnicodeString(true, u ## cs, _length)
Obsolete macro approximating UnicodeString literals.
Prior to the availability of C++11 and u"UTF-16 string literals", this macro was provided for portability and efficiency when initializing UnicodeStrings from literals.
Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination, for example by constructing a UnicodeString directly from a u"literal".
The string parameter must be a C string literal. The length of the string, not including the terminating NUL, must be specified as a constant.
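The compile-time length determination mentioned above can be illustrated without ICU, using only the C++17 standard library. This is a minimal sketch, not ICU code: `kGreeting` and `codeUnitCount` are hypothetical names introduced here, and `std::u16string_view` stands in for whatever receives the literal.

```cpp
#include <cstddef>
#include <string_view>

// Standard C++17 only; no ICU headers are used in this illustration.
// The compiler knows the length of a u"..." literal, so no explicit
// length argument (as UNICODE_STRING requires) has to be written out.
constexpr std::u16string_view kGreeting = u"hello";

// Hypothetical helper: number of char16_t code units in a UTF-16 view,
// not counting the terminating NUL of a literal.
constexpr std::size_t codeUnitCount(std::u16string_view s) {
    return s.size();
}

// Both checks are evaluated by the compiler, not at run time.
static_assert(codeUnitCount(kGreeting) == 5, "five BMP code units");
// A supplementary code point occupies two char16_t code units.
static_assert(codeUnitCount(u"\U0001F600") == 2, "one surrogate pair");
```

Because the length is carried by the view, a caller never has to count characters by hand, which is the main inconvenience of the older macro.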
#define UNICODE_STRING_SIMPLE(cs) UNICODE_STRING(cs, -1)
Unicode String literals in C++.
Obsolete macro approximating UnicodeString literals. See UNICODE_STRING.
The string parameter must be a C string literal.
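#define UNISTR_FROM_CHAR_EXPLICIT
This can be defined to be empty or "explicit".
If explicit, then the UnicodeString(char16_t) and UnicodeString(UChar32) constructors are marked as explicit, preventing their inadvertent use.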
#define UNISTR_FROM_STRING_EXPLICIT
This can be defined to be empty or "explicit".
If explicit, then the UnicodeString(const char *) and UnicodeString(const char16_t *) constructors are marked as explicit, preventing their inadvertent use.
In particular, this helps prevent accidentally depending on ICU conversion code by passing a string literal into an API with a const UnicodeString & parameter.
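The effect of defining such a macro as "explicit" can be sketched with a small stand-in class. Everything below is hypothetical and only mirrors the pattern: `DEMO_FROM_STRING_EXPLICIT` and `DemoString` are illustration names, not part of ICU.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for UNISTR_FROM_STRING_EXPLICIT, defined here as
// "explicit". Defining it as empty would allow implicit conversion again.
#define DEMO_FROM_STRING_EXPLICIT explicit

// Hypothetical class; this is not icu::UnicodeString.
class DemoString {
public:
    DEMO_FROM_STRING_EXPLICIT DemoString(const char16_t *text) : text_(text) {}
    std::size_t length() const { return text_.size(); }

private:
    std::u16string text_;
};

// Writing DemoString(u"...") explicitly is still allowed ...
static_assert(std::is_constructible_v<DemoString, const char16_t *>,
              "explicit construction works");
// ... but a literal no longer converts silently where a
// const DemoString & parameter is expected.
static_assert(!std::is_convertible_v<const char16_t *, DemoString>,
              "implicit conversion is rejected");
```

With the macro defined this way, a call such as `f(u"text")` to a function taking `const DemoString &` fails to compile, so the conversion has to be spelled out at the call site.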
#define UNISTR_OBJECT_SIZE 64
Desired sizeof(UnicodeString) in bytes.
It should be a multiple of sizeof(pointer) so that no space is wasted on padding. Ideally the object size is also a multiple of 16 bytes, which is a common granularity for heap allocation.
Any space inside the object beyond sizeof(vtable pointer) + 2 is available for storing short strings inside the object. The bigger the object, the longer a string that can be stored inside the object, without additional heap allocation.
Depending on a platform's pointer size, pointer alignment requirements, and struct padding, the compiler will usually round up sizeof(UnicodeString) to 4 * sizeof(pointer) (or 3 * sizeof(pointer) for P128 data models), to hold the fields for heap-allocated strings. Such a minimum size also ensures that the object is easily large enough to hold at least 2 char16_ts, for one supplementary code point (U16_MAX_LENGTH).
sizeof(UnicodeString) >= 48 should work for all known platforms.
For example, on a 64-bit machine where sizeof(vtable pointer) is 8, sizeof(UnicodeString) = 64 would leave space for (64 - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR = (64 - 8 - 2) / 2 = 27 char16_ts stored inside the object.
The minimum object size on a 64-bit machine would be 4 * sizeof(pointer) = 4 * 8 = 32 bytes, and the internal buffer would hold up to 11 char16_ts in that case.
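The arithmetic in the two examples above can be checked with a small constexpr function. This is a sketch under the stated assumptions only (a vtable pointer of `pointerSize` bytes, 2 further reserved bytes, and 2-byte code units); `inlineCapacity` is a hypothetical helper, not an ICU function.

```cpp
#include <cstddef>

// Assumed size of one char16_t code unit in bytes (U_SIZEOF_UCHAR in the text).
constexpr std::size_t kCodeUnitSize = 2;

// Hypothetical helper: number of char16_t code units that fit inside an
// object of objectSize bytes after the vtable pointer and 2 extra bytes.
constexpr std::size_t inlineCapacity(std::size_t objectSize, std::size_t pointerSize) {
    return (objectSize - pointerSize - 2) / kCodeUnitSize;
}

// 64-byte object on a 64-bit machine: (64 - 8 - 2) / 2 = 27.
static_assert(inlineCapacity(64, 8) == 27, "default object size");
// Minimum 32-byte object on a 64-bit machine: (32 - 8 - 2) / 2 = 11.
static_assert(inlineCapacity(32, 8) == 11, "minimum object size");
```

Doubling the object size from 32 to 64 bytes more than doubles the number of code units stored inline, because the fixed overhead stays the same.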
#define US_INV icu::UnicodeString::kInvariant
Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.
For details about invariant characters, see utypes.h. This constructor has no runtime dependency on conversion code and is therefore recommended over the constructors that take a charset name string (where the empty string "" indicates invariant-character conversion).
typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)