Crate icu::normalizer
Normalizing text into Unicode Normalization Forms.
This module is published as its own crate (icu_normalizer) and as part of the icu crate. See the latter for more details on the ICU4X project.
§Functionality
The top level of the crate provides normalization of input into the four normalization forms defined in UAX #15: Unicode Normalization Forms: NFC, NFD, NFKC, and NFKD.
Three kinds of contiguous inputs are supported: known-well-formed UTF-8 (&str), potentially-not-well-formed UTF-8, and potentially-not-well-formed UTF-16. Additionally, an iterator over char can be wrapped in a normalizing iterator.
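To make the three contiguous input kinds concrete, the following standard-library-only sketch shows how each can be viewed as a sequence of char, which is the shape the normalizing iterator consumes. It does not call this crate. The helper names are hypothetical, and replacing ill-formed sequences with U+FFFD is an assumption made for illustration, not something this section states about the crate.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Known-well-formed UTF-8: a `&str` already yields `char`s directly.
fn chars_from_str(text: &str) -> Vec<char> {
    text.chars().collect()
}

/// Potentially-not-well-formed UTF-8: ill-formed byte sequences are
/// replaced with U+FFFD (assumed error policy for this sketch).
fn chars_from_utf8_lossy(bytes: &[u8]) -> Vec<char> {
    String::from_utf8_lossy(bytes).chars().collect()
}

/// Potentially-not-well-formed UTF-16: unpaired surrogates are
/// replaced with U+FFFD (assumed error policy for this sketch).
fn chars_from_utf16_lossy(units: &[u16]) -> Vec<char> {
    decode_utf16(units.iter().copied())
        .map(|unit| unit.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

For instance, the bytes `b"a\xCC\x88\xFF"` decode to 'a', U+0308, and one U+FFFD for the stray 0xFF byte, while the UTF-16 units `[0x0061, 0x0308, 0xD800]` decode to 'a', U+0308, and one U+FFFD for the lone surrogate.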
The uts46 module provides the combination of mapping and normalization operations for UTS #46: Unicode IDNA Compatibility Processing. This functionality is not meant to be used by applications directly. Instead, it is meant as a building block for a full implementation of UTS #46, such as the idna crate.
The properties module provides the non-recursive canonical decomposition operation on a per-char basis and the canonical composition operation given two chars. It also provides access to the Canonical Combining Class property. These operations are primarily meant for HarfBuzz via the icu_harfbuzz crate.
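As an illustration of what "non-recursive decomposition" and "composition given two chars" mean, here is a minimal sketch restricted to Hangul syllables, whose mappings the Unicode Standard defines arithmetically. This is not the properties module's API: the function names are hypothetical, and the real operations cover all of Unicode using data tables.

```rust
// Constants from the Unicode Standard's Hangul syllable algorithm.
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// One step of canonical decomposition for a Hangul syllable (hypothetical helper).
/// An LVT syllable yields (LV, T); an LV syllable yields (L, V).
/// The LV half of an LVT result is not decomposed further: that is the
/// "non-recursive" part.
fn decompose_hangul_once(c: char) -> Option<(char, char)> {
    let s_index = (c as u32).checked_sub(S_BASE).filter(|&i| i < S_COUNT)?;
    let t_index = s_index % T_COUNT;
    let (first, second) = if t_index == 0 {
        (L_BASE + s_index / N_COUNT, V_BASE + (s_index % N_COUNT) / T_COUNT)
    } else {
        (S_BASE + s_index - t_index, T_BASE + t_index)
    };
    Some((char::from_u32(first)?, char::from_u32(second)?))
}

/// Canonical composition of two chars, Hangul only (hypothetical helper).
/// Returns None when the pair does not compose under the Hangul rules.
fn compose_hangul(a: char, b: char) -> Option<char> {
    let (a, b) = (a as u32, b as u32);
    // L + V -> LV
    if (L_BASE..L_BASE + L_COUNT).contains(&a) && (V_BASE..V_BASE + V_COUNT).contains(&b) {
        return char::from_u32(S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT);
    }
    // LV + T -> LVT
    if (S_BASE..S_BASE + S_COUNT).contains(&a)
        && (a - S_BASE) % T_COUNT == 0
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&b)
    {
        return char::from_u32(a + (b - T_BASE));
    }
    None
}
```

With these helpers, U+D55C (한) decomposes in one step to U+D558 (하) plus U+11AB, and U+D558 in turn decomposes to U+1112 plus U+1161; composing the pairs in the opposite order restores the original syllable.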
Notably, this normalizer does not provide the normalization “quick check” that can result in “maybe” in addition to “yes” and “no”. The normalization checks provided by this crate always give a definitive non-“maybe” answer.
§Examples
let nfc = icu_normalizer::ComposingNormalizerBorrowed::new_nfc();
assert_eq!(nfc.normalize("a\u{0308}"), "ä");
assert!(nfc.is_normalized("ä"));
let nfd = icu_normalizer::DecomposingNormalizerBorrowed::new_nfd();
assert_eq!(nfd.normalize("ä"), "a\u{0308}");
assert!(!nfd.is_normalized("ä"));
Modules§
- Access to the Unicode properties or property-based operations that are required for NFC and NFD.
- 🚧 [Unstable] Data provider struct definitions for this ICU4X component.
- Bundles the part of UTS 46 that makes sense to implement as a normalization.
Structs§
- A normalizer for performing composing normalization.
- Borrowed version of a normalizer for performing composing normalization.
- An iterator adaptor that turns an Iterator over char into a lazily-decomposed and then canonically composed char sequence.
- A normalizer for performing decomposing normalization.
- Borrowed version of a normalizer for performing decomposing normalization.
- An iterator adaptor that turns an Iterator over char into a lazily-decomposed char sequence.