ICU 78.1
C++ header-only API: C++ iterators over Unicode strings (=UTF-8/16/32 if well-formed).
#include "unicode/utypes.h"
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>
#include "unicode/utf16.h"
#include "unicode/utf8.h"
#include "unicode/uversion.h"
Typedefs
typedef enum UTFIllFormedBehavior | UTFIllFormedBehavior |
Some defined behaviors for handling ill-formed Unicode strings.
template<typename Iter >
using | U_HEADER_ONLY_NAMESPACE::prv::iter_value_t = typename std::iterator_traits< Iter >::value_type |
template<typename Iter >
using | U_HEADER_ONLY_NAMESPACE::prv::iter_difference_t = typename std::iterator_traits< Iter >::difference_type |
Enumerations
enum | UTFIllFormedBehavior { UTF_BEHAVIOR_NEGATIVE , UTF_BEHAVIOR_FFFD , UTF_BEHAVIOR_SURROGATE } |
Some defined behaviors for handling ill-formed Unicode strings.
Functions
template<typename CP32 , UTFIllFormedBehavior behavior, typename UnitIter , typename LimitIter = UnitIter>
auto | U_HEADER_ONLY_NAMESPACE::utfIterator (UnitIter start, UnitIter p, LimitIter limit) |
UTFIterator factory function for start <= p < limit.
template<typename CP32 , UTFIllFormedBehavior behavior, typename UnitIter , typename LimitIter = UnitIter>
auto | U_HEADER_ONLY_NAMESPACE::utfIterator (UnitIter p, LimitIter limit) |
UTFIterator factory function for start = p < limit.
template<typename CP32 , UTFIllFormedBehavior behavior, typename UnitIter >
auto | U_HEADER_ONLY_NAMESPACE::utfIterator (UnitIter p) |
UTFIterator factory function for a start or limit sentinel.
template<typename CP32 , typename UnitIter >
auto | U_HEADER_ONLY_NAMESPACE::unsafeUTFIterator (UnitIter iter) |
UnsafeUTFIterator factory function.
Variables
template<typename Iter >
constexpr bool | U_HEADER_ONLY_NAMESPACE::prv::forward_iterator |
template<typename Iter >
constexpr bool | U_HEADER_ONLY_NAMESPACE::prv::bidirectional_iterator |
template<typename Range >
constexpr bool | U_HEADER_ONLY_NAMESPACE::prv::range = range_type<Range>::value |
template<typename CP32 , UTFIllFormedBehavior behavior>
constexpr UTFStringCodePointsAdaptor< CP32, behavior > | U_HEADER_ONLY_NAMESPACE::utfStringCodePoints |
Range adaptor function object returning a UTFStringCodePoints object that represents a "range" of code points in a code unit range, which validates while decoding.
template<typename CP32 >
constexpr UnsafeUTFStringCodePointsAdaptor< CP32 > | U_HEADER_ONLY_NAMESPACE::unsafeUTFStringCodePoints |
Range adaptor function object returning an UnsafeUTFStringCodePoints object that represents a "range" of code points in a code unit range.
Definition in file utfiterator.h.
using U_HEADER_ONLY_NAMESPACE::prv::iter_difference_t = typename std::iterator_traits<Iter>::difference_type |
This API is for internal use only.
Definition at line 203 of file utfiterator.h.
using U_HEADER_ONLY_NAMESPACE::prv::iter_value_t = typename std::iterator_traits<Iter>::value_type |
This API is for internal use only.
Definition at line 199 of file utfiterator.h.
typedef enum UTFIllFormedBehavior UTFIllFormedBehavior |
Some defined behaviors for handling ill-formed Unicode strings.
This is a template parameter for UTFIterator and related classes.
When a validating UTFIterator encounters an ill-formed code unit sequence, CodeUnits.codePoint() returns a value determined by this parameter.
enum UTFIllFormedBehavior |
Some defined behaviors for handling ill-formed Unicode strings.
This is a template parameter for UTFIterator and related classes.
When a validating UTFIterator encounters an ill-formed code unit sequence, CodeUnits.codePoint() returns a value determined by this parameter.
Enumerator | Description |
---|---|
UTF_BEHAVIOR_NEGATIVE | Returns a negative value (-1=U_SENTINEL) instead of a code point. If the CP32 template parameter for the relevant classes is an unsigned type, then the negative value becomes 0xffffffff=UINT32_MAX. |
UTF_BEHAVIOR_FFFD | Returns U+FFFD Replacement Character. |
UTF_BEHAVIOR_SURROGATE | UTF-8: Not allowed; UTF-16: returns the unpaired surrogate; UTF-32: returns the surrogate code point, or U+FFFD if out of range. |
Definition at line 149 of file utfiterator.h.
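The effect of the three enumerators can be illustrated with a small standalone decoder. This is a minimal sketch, not ICU code: `IllFormed` and `decodeOne16` are hypothetical names, the sketch covers UTF-16 only, and it uses no ICU headers. It shows what value comes back for an unpaired surrogate under each behavior, including the wrap-around to 0xffffffff when the code point type is unsigned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three documented behaviors (not the ICU enum).
enum class IllFormed { Negative, FFFD, Surrogate };

// Decodes one code point from UTF-16 at p, advancing p past the units it consumed.
// For an unpaired surrogate, the returned value depends on `behavior`.
template <typename CP32>
CP32 decodeOne16(const char16_t *&p, const char16_t *limit, IllFormed behavior) {
    assert(p != limit);  // precondition: at least one code unit is available
    char16_t c = *p++;
    if (c < 0xD800 || c > 0xDFFF) {
        return static_cast<CP32>(c);  // BMP code point, not a surrogate
    }
    if (c <= 0xDBFF && p != limit && *p >= 0xDC00 && *p <= 0xDFFF) {
        // Well-formed pair: lead surrogate followed by trail surrogate.
        char16_t trail = *p++;
        return static_cast<CP32>(0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00));
    }
    // Unpaired surrogate: the sequence is ill-formed.
    switch (behavior) {
    case IllFormed::Negative:  return static_cast<CP32>(-1);  // 0xffffffff if CP32 is unsigned
    case IllFormed::FFFD:      return static_cast<CP32>(0xFFFD);
    case IllFormed::Surrogate: return static_cast<CP32>(c);
    }
    return static_cast<CP32>(0xFFFD);  // not reached; keeps compilers quiet
}
```

With a signed code point type, the negative result can be tested with a simple `< 0` check, which is why a signed CP32 is the natural partner for the negative-value behavior.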
auto U_HEADER_ONLY_NAMESPACE::unsafeUTFIterator | ( | UnitIter | iter | ) |
UnsafeUTFIterator factory function.
Deduces the UnitIter template parameter from the input.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t |
UnitIter | Can usually be omitted/deduced: An iterator (often a pointer) that returns a code unit type: UTF-8: char or char8_t or uint8_t; UTF-16: char16_t or uint16_t or (on Windows) wchar_t; UTF-32: char32_t or UChar32=int32_t or (on Linux) wchar_t |
iter | code unit iterator |
Definition at line 2454 of file utfiterator.h.
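Unlike the validating factories, this one takes no limit. One plausible reading is that a non-validating decoder can trust the lead unit to say how many units follow. The sketch below illustrates that idea for UTF-8 with a hypothetical helper, `unsafeNext8`; it is not the ICU implementation, and its behavior on ill-formed input is undefined by design.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an ICU function: decodes one code point from
// well-formed UTF-8 at p and advances p. There is no limit and no validation;
// the lead byte alone determines how many trail bytes are read.
inline char32_t unsafeNext8(const char *&p) {
    assert(p != nullptr);
    uint8_t lead = static_cast<uint8_t>(*p++);
    if (lead < 0x80) {
        return lead;  // single byte: ASCII
    }
    int trailCount = lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    char32_t c = lead & (0x3F >> trailCount);  // payload bits of the lead byte
    while (trailCount-- > 0) {
        c = (c << 6) | (static_cast<uint8_t>(*p++) & 0x3F);  // 6 payload bits per trail byte
    }
    return c;
}
```

Because nothing is checked, a decoder like this is only appropriate for text that has already been validated.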
auto U_HEADER_ONLY_NAMESPACE::utfIterator | ( | UnitIter | p | ) |
UTFIterator factory function for a start or limit sentinel.
Deduces the UnitIter template parameter from the input. Requires UnitIter to be copyable.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t |
behavior | How to handle ill-formed Unicode strings |
UnitIter | Can usually be omitted/deduced: An iterator (often a pointer) that returns a code unit type: UTF-8: char or char8_t or uint8_t; UTF-16: char16_t or uint16_t or (on Windows) wchar_t; UTF-32: char32_t or UChar32=int32_t or (on Linux) wchar_t |
p | code unit iterator. When using a code unit sentinel, then that sentinel also works as a sentinel for the code point iterator. |
Definition at line 1735 of file utfiterator.h.
auto U_HEADER_ONLY_NAMESPACE::utfIterator | ( | UnitIter | p, |
LimitIter | limit | ||
) |
UTFIterator factory function for start = p < limit.
Deduces the UnitIter and LimitIter template parameters from the inputs.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t |
behavior | How to handle ill-formed Unicode strings |
UnitIter | Can usually be omitted/deduced: An iterator (often a pointer) that returns a code unit type: UTF-8: char or char8_t or uint8_t; UTF-16: char16_t or uint16_t or (on Windows) wchar_t; UTF-32: char32_t or UChar32=int32_t or (on Linux) wchar_t |
LimitIter | Either the same as UnitIter, or an iterator sentinel type. |
p | start and current-position code unit iterator |
limit | limit (exclusive-end) code unit iterator. When using a code unit sentinel (UnitIter≠LimitIter), then that sentinel also works as a sentinel for the code point iterator. |
Definition at line 1705 of file utfiterator.h.
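The LimitIter parameter allows the limit to be a sentinel of a different type than the code unit iterator. The following sketch shows the general idea with a NUL-terminated UTF-16 string. `NulSentinel` and `countCodePoints16` are hypothetical names used only for illustration, and the loop is a hand-written stand-in for a code point iterator, not ICU code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sentinel type: compares equal to a pointer that points at the terminating NUL.
struct NulSentinel {};
inline bool operator==(const char16_t *p, NulSentinel) { return *p == 0; }
inline bool operator!=(const char16_t *p, NulSentinel s) { return !(p == s); }

// Counts code points between p and limit.
// LimitIter is either the pointer type itself or a sentinel such as NulSentinel.
template <typename LimitIter>
std::size_t countCodePoints16(const char16_t *p, LimitIter limit) {
    assert(p != nullptr);
    std::size_t count = 0;
    while (p != limit) {
        char16_t c = *p++;
        // A lead surrogate followed by a trail surrogate forms one code point.
        if (c >= 0xD800 && c <= 0xDBFF && p != limit && *p >= 0xDC00 && *p <= 0xDFFF) {
            ++p;
        }
        ++count;  // an unpaired surrogate still counts as one (ill-formed) unit sequence
    }
    return count;
}
```

The same function body works with a pointer limit and with the sentinel, since both only need to support the `!=` comparison against the current position.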
auto U_HEADER_ONLY_NAMESPACE::utfIterator | ( | UnitIter | start, |
UnitIter | p, | ||
LimitIter | limit | ||
) |
UTFIterator factory function for start <= p < limit.
Deduces the UnitIter and LimitIter template parameters from the inputs. Only enabled if UnitIter is a (multi-pass) forward_iterator or better.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t |
behavior | How to handle ill-formed Unicode strings |
UnitIter | Can usually be omitted/deduced: An iterator (often a pointer) that returns a code unit type: UTF-8: char or char8_t or uint8_t; UTF-16: char16_t or uint16_t or (on Windows) wchar_t; UTF-32: char32_t or UChar32=int32_t or (on Linux) wchar_t |
LimitIter | Either the same as UnitIter, or an iterator sentinel type. |
start | start code unit iterator |
p | current-position code unit iterator |
limit | limit (exclusive-end) code unit iterator. When using a code unit sentinel (UnitIter≠LimitIter), then that sentinel also works as a sentinel for the code point iterator. |
Definition at line 1678 of file utfiterator.h.
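A separate start iterator matters when moving backward: the decoder must know where the string begins so that it does not look at units before it. The sketch below shows this for UTF-16 with a hypothetical helper, `previous16`. It is an illustration under simplifying assumptions (raw pointers, unpaired surrogates returned unchanged), not the ICU implementation.

```cpp
#include <cassert>

// Hypothetical helper: moves p back by one code point in UTF-16 text and returns that code point.
// `start` bounds the backward scan, so a trail surrogate at the very beginning
// is never paired with whatever precedes the string in memory.
inline char32_t previous16(const char16_t *start, const char16_t *&p) {
    assert(start < p);  // precondition: there is at least one unit before p
    char16_t c = *--p;
    if (c >= 0xDC00 && c <= 0xDFFF && p != start && p[-1] >= 0xD800 && p[-1] <= 0xDBFF) {
        // Trail surrogate preceded by a lead surrogate: combine the pair.
        char16_t lead = *--p;
        return 0x10000 + ((lead - 0xD800) << 10) + (c - 0xDC00);
    }
    return c;  // BMP code point, or an unpaired surrogate returned as is
}
```

Stepping backward requires reading the same units more than once, which is consistent with the requirement above that UnitIter be a multi-pass iterator.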
constexpr bool U_HEADER_ONLY_NAMESPACE::prv::bidirectional_iterator |
This API is for internal use only.
Definition at line 214 of file utfiterator.h.
constexpr bool U_HEADER_ONLY_NAMESPACE::prv::forward_iterator |
This API is for internal use only.
Definition at line 207 of file utfiterator.h.
constexpr bool U_HEADER_ONLY_NAMESPACE::prv::range = range_type<Range>::value |
This API is for internal use only.
Definition at line 232 of file utfiterator.h.
constexpr UnsafeUTFStringCodePointsAdaptor< CP32 > U_HEADER_ONLY_NAMESPACE::unsafeUTFStringCodePoints |
Range adaptor function object returning an UnsafeUTFStringCodePoints object that represents a "range" of code points in a code unit range.
The string must be well-formed. Deduces the Range template parameter from the input, taking into account the value category: the code units will be referenced if possible, and moved if necessary.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t |
Range | A C++ "range" of Unicode UTF-8/16/32 code units |
unitRange | input range |
Definition at line 2604 of file utfiterator.h.
constexpr UTFStringCodePointsAdaptor< CP32, behavior > U_HEADER_ONLY_NAMESPACE::utfStringCodePoints |
Range adaptor function object returning a UTFStringCodePoints object that represents a "range" of code points in a code unit range, which validates while decoding.
Deduces the Range template parameter from the input, taking into account the value category: the code units will be referenced if possible, and moved if necessary.
CP32 | Code point type: UChar32 (=int32_t) or char32_t or uint32_t; should be signed if UTF_BEHAVIOR_NEGATIVE |
behavior | How to handle ill-formed Unicode strings |
Range | A C++ "range" of Unicode UTF-8/16/32 code units |
unitRange | input range |
Definition at line 1894 of file utfiterator.h.
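The phrase "referenced if possible, and moved if necessary" describes a common C++ pattern based on forwarding references. The sketch below shows one way such behavior can be achieved. `UnitRangeHolder`, `holdUnits`, and `holdUnitsDemo` are hypothetical names, and nothing here is taken from the ICU source; it only demonstrates how the value category of the argument can select between referencing and owning.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper. With a forwarding reference, Range is deduced as T& for an
// lvalue argument (the member is a reference) and as T for an rvalue argument
// (the member owns a moved-in object).
template <typename Range>
struct UnitRangeHolder {
    Range unitRange;
};

template <typename Range>
UnitRangeHolder<Range> holdUnits(Range &&unitRange) {
    return UnitRangeHolder<Range>{std::forward<Range>(unitRange)};
}

// Usage check: the lvalue case aliases the caller's string, the rvalue case owns its own.
inline void holdUnitsDemo() {
    std::u16string s = u"abc";
    auto byRef = holdUnits(s);                       // Range = std::u16string&
    assert(&byRef.unitRange == &s);                  // same object, no copy
    auto owned = holdUnits(std::u16string(u"xyz"));  // Range = std::u16string
    assert(owned.unitRange == u"xyz");               // the temporary was moved in
}
```

Owning the rvalue case avoids a dangling reference when the adaptor is applied to a temporary string, for example in a range-based for loop.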