Crate icu_properties

icu_properties is one of the ICU4X components.

This component provides definitions of Unicode Properties and APIs for retrieving property data in an appropriate data structure.

APIs that return a UnicodeSet exist for binary properties and certain enumerated properties. See the sets module for more details.

APIs that return a CodePointTrie exist for certain enumerated properties. See the maps module for more details.


Property data as UnicodeSets

use icu::properties::{sets, GeneralCategory};

let provider = icu_testdata::get_provider();

// A binary property as a `UnicodeSet`

let payload =
    sets::get_emoji(&provider)
        .expect("The data should be valid");
let data_struct = payload.get();
let emoji = &data_struct.inv_list;

assert!(emoji.contains('🎃'));  // U+1F383 JACK-O-LANTERN
assert!(!emoji.contains('木'));  // U+6728

// An individual enumerated property value as a `UnicodeSet`

let payload =
    sets::get_for_general_category(&provider, GeneralCategory::LineSeparator)
        .expect("The data should be valid");
let data_struct = payload.get();
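let line_sep = &data_struct.inv_list;

assert!(line_sep.contains('\u{2028}'));  // U+2028 LINE SEPARATOR
assert!(!line_sep.contains('\u{2029}'));  // U+2029 PARAGRAPH SEPARATOR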
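
The `inv_list` field name suggests that the set is stored as an inversion list. The following is a minimal, self-contained sketch of that idea only, not the ICU4X implementation: the `InvList` type, its `new` constructor, and the toy data are all hypothetical and do not come from this crate.

```rust
/// Hypothetical, simplified inversion list (not the ICU4X type).
/// `bounds` is a sorted list [start0, end0, start1, end1, ...] where each
/// range includes its start and excludes its end.
struct InvList {
    bounds: Vec<u32>,
}

impl InvList {
    /// Assumes `bounds` is sorted and has an even number of entries.
    fn new(bounds: Vec<u32>) -> Self {
        InvList { bounds }
    }

    /// A code point is in the set when an odd number of boundaries
    /// are less than or equal to it.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        // Binary search for the number of boundaries <= cp.
        let idx = self.bounds.partition_point(|&b| b <= cp);
        idx % 2 == 1
    }
}
```

With toy data such as `InvList::new(vec![0x2028, 0x2029])`, the set holds only U+2028, so `contains('\u{2028}')` is true and `contains('\u{2029}')` is false. Membership costs one binary search over the boundaries, which is why this layout suits properties whose code points cluster into a modest number of ranges.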


Property data as CodePointTries

use icu::properties::{maps, Script};

let provider = icu_testdata::get_provider();

let payload =
    maps::get_script(&provider)
        .expect("The data should be valid");
let data_struct = payload.get();
let script = &data_struct.code_point_trie;

assert_eq!(script.get('🎃' as u32), Script::Common);  // U+1F383 JACK-O-LANTERN
assert_eq!(script.get('木' as u32), Script::Han);  // U+6728
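
To give a feel for how a lookup from every code point to a property value can stay compact, here is a rough sketch of a two-stage table. It is a deliberately simplified stand-in and not the actual CodePointTrie layout; `TwoStageMap`, `from_ranges`, the block size, and the `u8` value encoding are all assumptions made for illustration.

```rust
const SHIFT: u32 = 6;
const BLOCK_LEN: usize = 1 << SHIFT; // 64 code points per block
const MAX_CP: u32 = 0x10FFFF;

/// Hypothetical two-stage lookup table (not the ICU4X CodePointTrie).
struct TwoStageMap {
    /// For each block of 64 code points, the offset of its values in `data`.
    /// Offset 0 is a shared block that holds only the default value.
    index: Vec<u32>,
    data: Vec<u8>,
}

impl TwoStageMap {
    /// Builds the map from inclusive `(start, end, value)` ranges.
    fn from_ranges(default: u8, ranges: &[(u32, u32, u8)]) -> Self {
        let num_blocks = ((MAX_CP >> SHIFT) + 1) as usize;
        let mut data = vec![default; BLOCK_LEN];
        let mut index = vec![0u32; num_blocks];
        for &(start, end, value) in ranges {
            for cp in start..=end.min(MAX_CP) {
                let block = (cp >> SHIFT) as usize;
                if index[block] == 0 {
                    // First write into this block: give it private storage.
                    index[block] = data.len() as u32;
                    data.extend(std::iter::repeat(default).take(BLOCK_LEN));
                }
                let offset = index[block] as usize + (cp as usize & (BLOCK_LEN - 1));
                data[offset] = value;
            }
        }
        TwoStageMap { index, data }
    }

    /// Returns the value for `cp`, or the default for out-of-range input.
    fn get(&self, cp: u32) -> u8 {
        if cp > MAX_CP {
            return self.data[0];
        }
        let block = (cp >> SHIFT) as usize;
        self.data[self.index[block] as usize + (cp as usize & (BLOCK_LEN - 1))]
    }
}
```

For example, `TwoStageMap::from_ranges(0, &[(0x4E00, 0x9FFF, 2)])` maps '木' (U+6728) to 2 and every other code point to 0. All untouched blocks share one default block, so the table grows with the number of distinct regions rather than with the full code point range.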


The functions in the maps module return a CodePointTrie that gives, for every code point in the full Unicode range, the value of a particular Unicode property.

Data provider struct definitions for this ICU4X component.

The functions in the sets module return a UnicodeSet containing the set of characters with a particular Unicode property.


Enumerated property East_Asian_Width.

Enumerated property General_Category.

Enumerated property Grapheme_Cluster_Break.

Enumerated property Line_Break.

Enumerated property Script.

Enumerated property Sentence_Break. See “Default Sentence Boundary Specification” in UAX #29 for a summary of each property value.

Enumerated property Word_Break.


Selection constants for Unicode properties. Each constant selects one Unicode property. See UProperty in ICU4C.

Enumerated Unicode general category types. GeneralSubcategory only supports specific subcategories (e.g. UppercaseLetter). It does not support grouped categories (e.g. Letter). For grouped categories, use GeneralCategory.
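
One way to picture the split between specific subcategories and grouped categories is a bit mask in which each subcategory owns one bit and a group is the union of its members' bits. The sketch below is illustrative only: `Sub`, `Group`, `CASED_LETTER`, and the numeric values are hypothetical and are not this crate's definitions.

```rust
/// Hypothetical specific subcategories (values chosen for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Sub {
    UppercaseLetter = 0,
    LowercaseLetter = 1,
    TitlecaseLetter = 2,
    DecimalNumber = 3,
}

/// Hypothetical grouped category: one bit per member subcategory.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Group(u32);

impl Group {
    /// A group holding exactly one subcategory.
    const fn of(sub: Sub) -> Self {
        Group(1 << (sub as u32))
    }

    /// The union of two groups.
    const fn union(self, other: Group) -> Self {
        Group(self.0 | other.0)
    }

    /// Whether `sub` is a member of this group.
    fn contains(self, sub: Sub) -> bool {
        self.0 & (1 << (sub as u32)) != 0
    }
}

/// A grouped category built from three specific subcategories.
const CASED_LETTER: Group = Group::of(Sub::UppercaseLetter)
    .union(Group::of(Sub::LowercaseLetter))
    .union(Group::of(Sub::TitlecaseLetter));
```

Under these assumptions, `CASED_LETTER.contains(Sub::LowercaseLetter)` is true and `CASED_LETTER.contains(Sub::DecimalNumber)` is false. A plain enum can name only one subcategory at a time, which is why grouped categories need a separate type.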