// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).
//! Bundles the part of UTS 46 that makes sense to implement as a
//! normalization.
//!
//! This is meant to be used as a building block of a UTS 46
//! implementation, such as the `idna` crate.
use crate::CanonicalCompositionsV1Marker;
use crate::CanonicalDecompositionDataV1Marker;
use crate::CanonicalDecompositionTablesV1Marker;
use crate::CompatibilityDecompositionTablesV1Marker;
use crate::ComposingNormalizer;
use crate::ComposingNormalizerBorrowed;
use crate::Uts46DecompositionSupplementV1Marker;
use icu_provider::DataError;
use icu_provider::DataProvider;
// Implementation note: Despite merely wrapping a `ComposingNormalizer`,
// having a `Uts46Mapper` serves two purposes:
//
// 1. Denying public access to parts of the `ComposingNormalizer` API
// that don't work when the data contains markers for ignorables.
// 2. Providing a place where additional iterator pre-processing or
// post-processing can take place if needed in the future. (When
// this was written, such processing looked necessary, but it
// turned out not to be needed after all.)
/// A borrowed version of a mapper that knows how to perform the
/// subsets of UTS 46 processing documented on the methods.
#[derive(Debug)]
pub struct Uts46MapperBorrowed<'a> {
normalizer: ComposingNormalizerBorrowed<'a>,
}
#[cfg(feature = "compiled_data")]
impl Default for Uts46MapperBorrowed<'static> {
fn default() -> Self {
Self::new()
}
}
impl Uts46MapperBorrowed<'static> {
/// Cheaply converts a [`Uts46MapperBorrowed<'static>`] into a [`Uts46Mapper`].
///
/// Note: Due to branching and indirection, using [`Uts46Mapper`] might inhibit some
/// compile-time optimizations that are possible with [`Uts46MapperBorrowed`].
pub const fn static_to_owned(self) -> Uts46Mapper {
Uts46Mapper {
normalizer: self.normalizer.static_to_owned(),
}
}
/// Construct with compiled data.
#[cfg(feature = "compiled_data")]
pub const fn new() -> Self {
Uts46MapperBorrowed {
normalizer: ComposingNormalizerBorrowed::new_uts46(),
}
}
}
impl Uts46MapperBorrowed<'_> {
/// Returns an iterator adaptor that turns an `Iterator` over `char`
/// into an iterator yielding a `char` sequence that gets the following
/// operations from the "Map" and "Normalize" steps of the "Processing"
/// section of UTS 46 lazily applied to it:
///
/// 1. The _ignored_ characters are ignored.
/// 2. The _mapped_ characters are mapped.
/// 3. The _disallowed_ characters are replaced with U+FFFD,
/// which itself is a disallowed character.
/// 4. The _deviation_ characters are treated as _mapped_ or _valid_
/// as appropriate.
/// 5. The _disallowed_STD3_valid_ characters are treated as allowed.
/// 6. The _disallowed_STD3_mapped_ characters are treated as
/// _mapped_.
/// 7. The result is normalized to NFC.
///
/// Notably:
///
/// * The STD3 or WHATWG ASCII deny list should be implemented as a
/// post-processing step.
/// * Transitional processing is not performed. Transitional mapping
/// would be a pre-processing step, but transitional processing is
/// deprecated, and none of Firefox, Safari, or Chrome use it.
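///
/// A minimal sketch of typical use, assuming the `compiled_data` feature
/// is enabled and that this module is reachable as
/// `icu_normalizer::uts46` (the input strings are illustrative only):
///
/// ```
/// use icu_normalizer::uts46::Uts46Mapper;
///
/// let mapper = Uts46Mapper::new();
///
/// // U+00AD SOFT HYPHEN is an _ignored_ character, so it is dropped.
/// let mapped: String = mapper.map_normalize("a\u{00AD}b".chars()).collect();
/// assert_eq!(mapped, "ab");
///
/// // The uppercase letter is _mapped_ to lowercase, and the result is
/// // composed to NFC (U+00E4).
/// let mapped: String = mapper.map_normalize("A\u{0308}".chars()).collect();
/// assert_eq!(mapped, "\u{00E4}");
/// ```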
pub fn map_normalize<'delegate, I: Iterator<Item = char> + 'delegate>(
&'delegate self,
iter: I,
) -> impl Iterator<Item = char> + 'delegate {
self.normalizer
.normalize_iter_private(iter, crate::IgnorableBehavior::Ignored)
}
/// Returns an iterator adaptor that turns an `Iterator` over `char`
/// into an iterator yielding a `char` sequence that gets the following
/// operations from the NFC check and status steps of the "Validity
/// Criteria" section of UTS 46 lazily applied to it:
///
/// 1. The _ignored_ characters are treated as _disallowed_.
/// 2. The _mapped_ characters are mapped.
/// 3. The _disallowed_ characters are replaced with U+FFFD,
/// which itself is a disallowed character.
/// 4. The _deviation_ characters are treated as _mapped_ or _valid_
/// as appropriate.
/// 5. The _disallowed_STD3_valid_ characters are treated as allowed.
/// 6. The _disallowed_STD3_mapped_ characters are treated as
/// _mapped_.
/// 7. The result is normalized to NFC.
///
/// Notably:
///
/// * The STD3 or WHATWG ASCII deny list should be implemented as a
/// post-processing step.
/// * Transitional processing is not performed. Transitional mapping
/// would be a pre-processing step, but transitional processing is
/// deprecated, and none of Firefox, Safari, or Chrome use it.
/// * The caller needs to compare the output with the input to see if
/// anything changed. This check catches failures to adhere to the
/// normalization and status requirements. In particular, a _mapped_
/// character makes the output differ from the input, so it is reported
/// as an error, as "Validity Criteria" requires.
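///
/// A minimal sketch of the output-versus-input comparison, assuming the
/// `compiled_data` feature is enabled and that this module is reachable
/// as `icu_normalizer::uts46` (the input strings are illustrative only):
///
/// ```
/// use icu_normalizer::uts46::Uts46Mapper;
///
/// let mapper = Uts46Mapper::new();
///
/// // Already lowercase and in NFC: the output equals the input.
/// let input = "\u{00E4}";
/// assert!(mapper.normalize_validate(input.chars()).eq(input.chars()));
///
/// // U+00AD SOFT HYPHEN is an _ignored_ character, which this method
/// // treats as _disallowed_: it is replaced with U+FFFD, so the
/// // comparison with the input fails.
/// let input = "a\u{00AD}b";
/// let output: String = mapper.normalize_validate(input.chars()).collect();
/// assert_eq!(output, "a\u{FFFD}b");
/// assert_ne!(output, input);
/// ```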
pub fn normalize_validate<'delegate, I: Iterator<Item = char> + 'delegate>(
&'delegate self,
iter: I,
) -> impl Iterator<Item = char> + 'delegate {
self.normalizer
.normalize_iter_private(iter, crate::IgnorableBehavior::ReplacementCharacter)
}
}
/// A mapper that knows how to perform the subsets of UTS 46 processing
/// documented on the methods.
#[derive(Debug)]
pub struct Uts46Mapper {
normalizer: ComposingNormalizer,
}
#[cfg(feature = "compiled_data")]
impl Default for Uts46Mapper {
fn default() -> Self {
Self::new().static_to_owned()
}
}
impl Uts46Mapper {
/// Constructs a borrowed version of this type for more efficient querying.
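///
/// A minimal sketch, assuming the `compiled_data` feature is enabled (it
/// is needed for `Default`) and that this module is reachable as
/// `icu_normalizer::uts46`:
///
/// ```
/// use icu_normalizer::uts46::Uts46Mapper;
///
/// // Hold the owned mapper, then borrow it for the actual mapping.
/// let owned = Uts46Mapper::default();
/// let borrowed = owned.as_borrowed();
/// let mapped: String = borrowed.map_normalize("A".chars()).collect();
/// assert_eq!(mapped, "a");
/// ```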
pub fn as_borrowed(&self) -> Uts46MapperBorrowed<'_> {
Uts46MapperBorrowed {
normalizer: self.normalizer.as_borrowed(),
}
}
/// Construct with compiled data.
#[cfg(feature = "compiled_data")]
#[allow(clippy::new_ret_no_self)]
pub const fn new() -> Uts46MapperBorrowed<'static> {
Uts46MapperBorrowed::new()
}
/// Construct with provider.
#[doc = icu_provider::gen_any_buffer_unstable_docs!(UNSTABLE, Self::new)]
pub fn try_new<D>(provider: &D) -> Result<Self, DataError>
where
D: DataProvider<CanonicalDecompositionDataV1Marker>
+ DataProvider<Uts46DecompositionSupplementV1Marker>
+ DataProvider<CanonicalDecompositionTablesV1Marker>
+ DataProvider<CompatibilityDecompositionTablesV1Marker>
// UTS 46 tables merged into CompatibilityDecompositionTablesV1Marker
+ DataProvider<CanonicalCompositionsV1Marker>
+ ?Sized,
{
let normalizer = ComposingNormalizer::try_new_uts46_unstable(provider)?;
Ok(Uts46Mapper { normalizer })
}
}