Module icu_properties::props
This module defines all available properties.

Properties may be empty marker types that implement BinaryProperty, or enumerations (either Rust enums, or Rust structs with associated constants, i.e. open enums) that implement EnumeratedProperty.

BinaryProperty types are queried through CodePointSetData, while EnumeratedProperty types are queried through CodePointMapData.

In addition, some EnumeratedProperty types also implement ParseableEnumeratedProperty or NamedEnumeratedProperty. For these properties, PropertyParser, PropertyNamesLong, and PropertyNamesShort can be constructed.
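The split between marker types and enumerations can be illustrated with a small standard-library-only model. This is a sketch of the pattern, not the crate's implementation: every name prefixed with Toy, the range tables, and the numeric values are assumptions made up for illustration, and the real crate backs its lookups with Unicode data rather than hand-written ranges.

```rust
use std::marker::PhantomData;

/// Toy stand-in for a binary property: an empty marker type names the
/// property, and the trait carries the data used to answer queries.
trait ToyBinaryProperty {
    /// Inclusive code point ranges for which the property is true.
    const RANGES: &'static [(u32, u32)];
}

/// Toy stand-in for an enumerated property: the implementing type is
/// itself the value assigned to each code point.
trait ToyEnumeratedProperty: Copy + 'static {
    /// Value returned for code points not covered by any range.
    const DEFAULT: Self;
    /// Inclusive code point ranges and the value assigned to each.
    const RANGES: &'static [(u32, u32, Self)];
}

/// An empty marker type (hypothetical) for ASCII hexadecimal digits.
struct ToyAsciiHexDigit;

impl ToyBinaryProperty for ToyAsciiHexDigit {
    const RANGES: &'static [(u32, u32)] = &[(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)];
}

/// An "open enum" (hypothetical): a struct with associated constants.
/// Unlike a Rust enum, any u8 is a representable value.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct ToyCase(u8);

impl ToyCase {
    const OTHER: ToyCase = ToyCase(0);
    const UPPER: ToyCase = ToyCase(1);
    const LOWER: ToyCase = ToyCase(2);
}

impl ToyEnumeratedProperty for ToyCase {
    const DEFAULT: Self = ToyCase::OTHER;
    const RANGES: &'static [(u32, u32, Self)] =
        &[(0x41, 0x5A, ToyCase::UPPER), (0x61, 0x7A, ToyCase::LOWER)];
}

/// Set-style query object, generic over a binary property marker.
struct ToySetData<P>(PhantomData<P>);

impl<P: ToyBinaryProperty> ToySetData<P> {
    fn new() -> Self {
        ToySetData(PhantomData)
    }

    /// Returns true if the code point lies in any of the property's ranges.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        P::RANGES.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
    }
}

/// Map-style query object, generic over an enumerated property.
struct ToyMapData<P>(PhantomData<P>);

impl<P: ToyEnumeratedProperty> ToyMapData<P> {
    fn new() -> Self {
        ToyMapData(PhantomData)
    }

    /// Returns the value for the code point, or the default if unlisted.
    fn get(&self, c: char) -> P {
        let cp = c as u32;
        P::RANGES
            .iter()
            .find(|&&(lo, hi, _)| lo <= cp && cp <= hi)
            .map(|&(_, _, v)| v)
            .unwrap_or(P::DEFAULT)
    }
}
```

In this model the property is selected purely by the type parameter, for example ToySetData::<ToyAsciiHexDigit>::new().contains('f') or ToyMapData::<ToyCase>::new().get('Q'), which mirrors the division of labor described above: a set answers yes/no questions, a map returns a value.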
Structs
- Characters with the Alphabetic or Decimal_Number property.
- Alphabetic characters.
- ASCII characters commonly used for the representation of hexadecimal numbers.
- Characters and character sequences intended for general-purpose, independent, direct input.
- Enumerated property Bidi_Class.
- Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
- Characters that are mirrored in bidirectional text.
- This is a bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
- Horizontal whitespace characters.
- Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
- Characters which are ignored for casing purposes.
- Characters that are either the source of a case mapping or in the target of a case mapping.
- Uppercase, lowercase, and titlecase characters.
- Characters whose normalized forms are not stable under case folding.
- Characters which may change when they undergo case mapping.
- Characters whose normalized forms are not stable under a toLowercase mapping.
- Characters which are not identical to their NFKC_Casefold mapping.
- Characters whose normalized forms are not stable under a toTitlecase mapping.
- Characters whose normalized forms are not stable under a toUppercase mapping.
- Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
- For programmatic determination of default ignorable code points.
- Deprecated characters.
- Characters that linguistically modify the meaning of another character to which they apply.
- Enumerated property East_Asian_Width.
- Characters that are emoji.
- Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
- Characters that are emoji modifiers.
- Characters that can serve as a base for emoji modifiers.
- Characters that have emoji presentation by default.
- Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
- Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
- Characters that are excluded from composition.
- Groupings of multiple General_Category property values.
- Error value for impl TryFrom<u8> for GeneralCategory.
- Visible characters.
- Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”.
- Enumerated property Grapheme_Cluster_Break.
- Property used to define “Grapheme extender”.
- Deprecated property.
- Enumerated property Hangul_Syllable_Type.
- Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
- Deprecated property.
- Characters that can come after the first character in an identifier.
- Characters that can begin an identifier.
- Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs.
- Characters used in Ideographic Description Sequences.
- Characters used in Ideographic Description Sequences.
- Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
- Format control characters which have specific functions for control of cursive joining and ligation.
- Enumerated property Joining_Type. See Section 9.2, Arabic Cursive Joining in The Unicode Standard for the summary of each property value.
- Enumerated property Line_Break.
- A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
- Lowercase characters.
- Characters used in mathematical notation.
- Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
- Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
- Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
- Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
- Code points permanently reserved for internal use.
- Characters used as syntax in patterns (such as regular expressions).
- Characters used as whitespace in patterns (such as regular expressions).
- A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
- Printable characters (visible characters and whitespace).
- Punctuation characters that function as quotation marks.
- Characters used in the definition of Ideographic Description Sequences.
- Regional indicator characters, U+1F1E6..U+1F1FF.
- Enumerated property Script.
- Characters that are starters in terms of Unicode normalization and combining character sequences.
- Enumerated property Sentence_Break. See “Default Sentence Boundary Specification” in UAX #29 for the summary of each property value: https://www.unicode.org/reports/tr29/#Default_Sentence_Boundaries.
- Punctuation characters that generally mark the end of sentences.
- Characters with a “soft dot”, like i or j.
- Punctuation characters that generally mark the end of textual units.
- A property which specifies the exact set of Unified CJK Ideographs in the standard.
- Uppercase characters.
- Characters that are Variation Selectors.
- Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements.
- Enumerated property Word_Break.
- Hexadecimal digits. This is defined for POSIX compatibility.
- Characters that can come after the first character in an identifier.
- Characters that can begin an identifier.
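Two of the descriptions above are precise enough to check with the standard library alone: the regional indicator range U+1F1E6..U+1F1FF and the ASCII hexadecimal digits. The sketch below is only a sanity check of those definitions. The function names are hypothetical, and it does not use or stand in for the crate's data-backed lookups.

```rust
/// Hypothetical helper: true for the regional indicator characters,
/// i.e. the inclusive range U+1F1E6..U+1F1FF given in the list above.
fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// Hypothetical helper: true for ASCII characters commonly used to
/// represent hexadecimal numbers (0-9, A-F, a-f), via the standard library.
fn is_ascii_hex_digit(c: char) -> bool {
    c.is_ascii_hexdigit()
}

/// Number of code points in the regional indicator range, one per
/// letter A through Z.
fn regional_indicator_count() -> usize {
    ('\u{1F1E6}'..='\u{1F1FF}').count()
}
```

Most other properties in the list have no such closed-form definition and depend on the Unicode Character Database, which is why they are looked up in property data rather than computed.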
Enums
- The enum represents Bidi_Paired_Bracket_Type.
- Enumerated property General_Category.
Traits
- A binary Unicode character property.
- An Emoji set as defined by Unicode Technical Standard #51.
- A Unicode character property that assigns a value to each code point.
- A property whose value names can be represented as strings.
- A property whose value names can be parsed from strings.