Struct icu_locale_core::LanguageIdentifier

pub struct LanguageIdentifier {
    pub language: Language,
    pub script: Option<Script>,
    pub region: Option<Region>,
    pub variants: Variants,
}

A core struct representing a Unicode BCP47 Language Identifier.

§Ordering

This type deliberately does not implement Ord or PartialOrd because there are multiple possible orderings. Depending on your use case, two orderings are available:

  1. A string ordering, suitable for stable serialization: LanguageIdentifier::strict_cmp
  2. A struct ordering, suitable for use with a BTreeSet: LanguageIdentifier::total_cmp

See issue: https://github.com/unicode-org/icu4x/issues/1215

§Parsing

Unicode recognizes three levels of standard conformance for any language identifier:

  • well-formed - syntactically correct
  • valid - well-formed and only uses registered language, region, script and variant subtags…
  • canonical - valid and no deprecated codes or structure.

At the moment, parsing normalizes a well-formed language identifier, converting _ separators to - and adjusting casing to conform to the Unicode standard.

Any syntactically invalid subtags will cause the parsing to fail with an error.

This operation normalizes syntax to be well-formed. No legacy subtag replacements are performed. For validation and canonicalization, see LocaleCanonicalizer.
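
The separator and casing rules described above can be illustrated without ICU4X. The following is a simplified, std-only sketch, not the library's implementation: normalize_sketch and its helper are hypothetical names, the subtag shapes it checks are an approximation of the Unicode syntax, and any further processing the library may apply to variants is omitted.

```rust
/// Hypothetical helper: true if `s` is non-empty and consists of ASCII letters only.
fn is_alpha(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_alphabetic())
}

/// Hypothetical helper, not part of ICU4X: normalizes the separators and
/// casing of a language identifier. Returns `None` if any subtag is malformed.
fn normalize_sketch(input: &str) -> Option<String> {
    // Accept both `-` and `_` as separators.
    let mut subtags = input.split(|c: char| c == '-' || c == '_').peekable();

    // Language: 2-3 or 5-8 ASCII letters, lowercased.
    let lang = subtags.next()?;
    if !is_alpha(lang) || !matches!(lang.len(), 2..=3 | 5..=8) {
        return None;
    }
    let mut out = lang.to_ascii_lowercase();

    // Optional script: exactly 4 ASCII letters, title-cased.
    if let Some(script) = subtags.next_if(|s| s.len() == 4 && is_alpha(s)) {
        let (first, rest) = script.split_at(1);
        out.push('-');
        out.push_str(&first.to_ascii_uppercase());
        out.push_str(&rest.to_ascii_lowercase());
    }

    // Optional region: 2 ASCII letters or 3 ASCII digits, uppercased.
    if let Some(region) = subtags.next_if(|s| {
        (s.len() == 2 && is_alpha(s))
            || (s.len() == 3 && s.bytes().all(|b| b.is_ascii_digit()))
    }) {
        out.push('-');
        out.push_str(&region.to_ascii_uppercase());
    }

    // Everything else must look like a variant: 5-8 alphanumerics,
    // or a digit followed by 3 alphanumerics. Variants are lowercased.
    for variant in subtags {
        let alnum = variant.bytes().all(|b| b.is_ascii_alphanumeric());
        let shape_ok = matches!(variant.len(), 5..=8)
            || (variant.len() == 4 && variant.as_bytes()[0].is_ascii_digit());
        if !(alnum && shape_ok) {
            return None;
        }
        out.push('-');
        out.push_str(&variant.to_ascii_lowercase());
    }
    Some(out)
}

fn main() {
    // Well-formed input is normalized.
    assert_eq!(
        normalize_sketch("eN_latn_Us-Valencia").as_deref(),
        Some("en-Latn-US-valencia")
    );
    // A syntactically invalid subtag makes the whole operation fail.
    assert_eq!(normalize_sketch("e"), None);
}
```

Note that the sketch only checks syntax. Like the parser described above, it makes no attempt to decide whether a subtag is registered or deprecated.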

§Examples

Simple example:

use icu::locale::{
    langid,
    subtags::{language, region},
};

let li = langid!("en-US");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, None);
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.len(), 0);

More complex example:

use icu::locale::{
    langid,
    subtags::{language, region, script, variant},
};

let li = langid!("eN_latn_Us-Valencia");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, Some(script!("Latn")));
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.get(0), Some(&variant!("valencia")));

Fields§

§language: Language

Language subtag of the language identifier.

§script: Option<Script>

Script subtag of the language identifier.

§region: Option<Region>

Region subtag of the language identifier.

§variants: Variants

Variant subtags of the language identifier.

Implementations§

impl LanguageIdentifier

pub fn try_from_str(s: &str) -> Result<Self, ParseError>

A constructor which takes a string slice, parses it and produces a well-formed LanguageIdentifier.

§Examples
use icu::locale::LanguageIdentifier;

LanguageIdentifier::try_from_str("en-US").expect("Parsing failed");

pub fn try_from_utf8(code_units: &[u8]) -> Result<Self, ParseError>

pub fn try_from_locale_bytes(v: &[u8]) -> Result<Self, ParseError>

A constructor which takes a UTF-8 byte slice that may contain extension keys, parses it, and produces a well-formed LanguageIdentifier.

§Examples
use icu::locale::{langid, LanguageIdentifier};

let li = LanguageIdentifier::try_from_locale_bytes(b"en-US-x-posix")
    .expect("Parsing failed.");

assert_eq!(li, langid!("en-US"));

This method should be used for input that may be a locale identifier. All extensions will be lost.

pub const fn default() -> Self

Const-friendly version of Default::default.

pub const fn is_default(&self) -> bool

Whether this language identifier equals Self::default.

pub fn normalize_utf8(input: &[u8]) -> Result<Cow<'_, str>, ParseError>

Normalize the language identifier (operating on UTF-8 formatted byte slices)

This operation will normalize casing and the separator.

§Examples
use icu::locale::LanguageIdentifier;

assert_eq!(
    LanguageIdentifier::normalize_utf8(b"pL_latn_pl").as_deref(),
    Ok("pl-Latn-PL")
);

pub fn normalize(input: &str) -> Result<Cow<'_, str>, ParseError>

Normalize the language identifier (operating on strings)

This operation will normalize casing and the separator.

§Examples
use icu::locale::LanguageIdentifier;

assert_eq!(
    LanguageIdentifier::normalize("pL_latn_pl").as_deref(),
    Ok("pl-Latn-PL")
);

pub fn strict_cmp(&self, other: &[u8]) -> Ordering

Compare this LanguageIdentifier with BCP-47 bytes.

The return value is equivalent to what would happen if you first converted this LanguageIdentifier to a BCP-47 string and then performed a byte comparison.

This function is case-sensitive and results in a total order, so it is appropriate for binary search. The only argument producing Ordering::Equal is self.to_string().

§Examples

Sorting a list of langids with this method requires converting one of them to a string:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;
use writeable::Writeable;

// Random input order:
let bcp47_strings: &[&str] = &[
    "ar-Latn",
    "zh-Hant-TW",
    "zh-TW",
    "und-fonipa",
    "zh-Hant",
    "ar-SA",
];

let mut langids = bcp47_strings
    .iter()
    .map(|s| s.parse().unwrap())
    .collect::<Vec<LanguageIdentifier>>();
langids.sort_by(|a, b| {
    let b = b.write_to_string();
    a.strict_cmp(b.as_bytes())
});
let strict_cmp_strings = langids
    .iter()
    .map(|l| l.to_string())
    .collect::<Vec<String>>();

// Output ordering, sorted alphabetically
let expected_ordering: &[&str] = &[
    "ar-Latn",
    "ar-SA",
    "und-fonipa",
    "zh-Hant",
    "zh-Hant-TW",
    "zh-TW",
];

assert_eq!(expected_ordering, strict_cmp_strings);
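
Because the comparison is a case-sensitive total order over bytes, it can drive a binary search over identifiers that are already sorted in that order. The sketch below is std-only and uses plain strings in place of LanguageIdentifier values; strict_cmp_sketch and find are hypothetical stand-ins that mirror the documented "serialize, then compare bytes" behavior, not ICU4X functions.

```rust
use std::cmp::Ordering;

/// Identifiers already sorted by their BCP-47 byte representation.
const SORTED: [&str; 6] = [
    "ar-Latn",
    "ar-SA",
    "und-fonipa",
    "zh-Hant",
    "zh-Hant-TW",
    "zh-TW",
];

/// Hypothetical stand-in for `strict_cmp`: compares the serialized form
/// of an identifier with raw bytes.
fn strict_cmp_sketch(serialized: &str, other: &[u8]) -> Ordering {
    serialized.as_bytes().cmp(other)
}

/// Binary search for `needle`; `Ok(index)` on a hit, `Err(insertion_point)` otherwise.
fn find(sorted: &[&str], needle: &[u8]) -> Result<usize, usize> {
    sorted.binary_search_by(|probe| strict_cmp_sketch(probe, needle))
}

fn main() {
    // Exact byte match is found.
    assert_eq!(find(&SORTED, b"zh-Hant"), Ok(3));
    // The comparison is case-sensitive, so unnormalized input is not found.
    assert!(find(&SORTED, b"ZH-Hant").is_err());
}
```

The second assertion shows why the input bytes should be normalized first when they come from an untrusted source.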

pub fn total_cmp(&self, other: &Self) -> Ordering

Compare this LanguageIdentifier with another LanguageIdentifier field-by-field. The result is a total ordering sufficient for use in a BTreeSet.

Unlike LanguageIdentifier::strict_cmp, the ordering may or may not be equivalent to string ordering, and it may or may not be stable across ICU4X releases.

§Examples

This method returns an arbitrary ordering (not an alphabetical one) derived from the fields of the struct:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;

// Input strings, sorted alphabetically
let bcp47_strings: &[&str] = &[
    "ar-Latn",
    "ar-SA",
    "und-fonipa",
    "zh-Hant",
    "zh-Hant-TW",
    "zh-TW",
];
assert!(bcp47_strings.windows(2).all(|w| w[0] < w[1]));

let mut langids = bcp47_strings
    .iter()
    .map(|s| s.parse().unwrap())
    .collect::<Vec<LanguageIdentifier>>();
langids.sort_by(LanguageIdentifier::total_cmp);
let total_cmp_strings = langids
    .iter()
    .map(|l| l.to_string())
    .collect::<Vec<String>>();

// Output ordering, sorted arbitrarily
let expected_ordering: &[&str] = &[
    "ar-SA",
    "ar-Latn",
    "und-fonipa",
    "zh-TW",
    "zh-Hant",
    "zh-Hant-TW",
];

assert_eq!(expected_ordering, total_cmp_strings);
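
The expected ordering above is what one would get from comparing fields in declaration order, with a missing subtag (None) sorting before a present one. The following std-only sketch reproduces that pattern with a hypothetical MiniLangId type that has no variants. It illustrates field-by-field comparison in general; ICU4X's actual comparison may differ and, as noted, is not guaranteed to be stable across releases.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified stand-in for a language identifier (no variants).
#[derive(Debug, Clone, PartialEq, Eq)]
struct MiniLangId {
    language: &'static str,
    script: Option<&'static str>,
    region: Option<&'static str>,
}

impl MiniLangId {
    /// Field-by-field comparison: language, then script, then region.
    /// `Option` orders `None` before `Some(_)`.
    fn total_cmp(&self, other: &Self) -> Ordering {
        (self.language, self.script, self.region)
            .cmp(&(other.language, other.script, other.region))
    }

    /// Serializes the identifier with `-` separators.
    fn to_bcp47(&self) -> String {
        let mut s = String::from(self.language);
        if let Some(script) = self.script {
            s.push('-');
            s.push_str(script);
        }
        if let Some(region) = self.region {
            s.push('-');
            s.push_str(region);
        }
        s
    }
}

/// Sorts by fields and returns the serialized identifiers.
fn sorted_by_fields(mut ids: Vec<MiniLangId>) -> Vec<String> {
    ids.sort_by(MiniLangId::total_cmp);
    ids.iter().map(MiniLangId::to_bcp47).collect()
}

fn main() {
    let ids = vec![
        MiniLangId { language: "ar", script: Some("Latn"), region: None },
        MiniLangId { language: "ar", script: None, region: Some("SA") },
        MiniLangId { language: "zh", script: Some("Hant"), region: None },
        MiniLangId { language: "zh", script: Some("Hant"), region: Some("TW") },
        MiniLangId { language: "zh", script: None, region: Some("TW") },
    ];
    // Identifiers without a script sort before those with one.
    assert_eq!(
        sorted_by_fields(ids),
        ["ar-SA", "ar-Latn", "zh-TW", "zh-Hant", "zh-Hant-TW"]
    );
}
```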

Use a wrapper to add a LanguageIdentifier to a BTreeSet:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;
use std::collections::BTreeSet;

#[derive(PartialEq, Eq)]
struct LanguageIdentifierTotalOrd(LanguageIdentifier);

impl Ord for LanguageIdentifierTotalOrd {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

impl PartialOrd for LanguageIdentifierTotalOrd {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

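// The wrapper can now be stored in an ordered set.
let mut set: BTreeSet<LanguageIdentifierTotalOrd> = BTreeSet::new();
set.insert(LanguageIdentifierTotalOrd("en-US".parse().unwrap()));
assert_eq!(set.len(), 1);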

pub fn normalizing_eq(&self, other: &str) -> bool

Compare this LanguageIdentifier with a potentially unnormalized BCP-47 string.

The return value is equivalent to what would happen if you first parsed the BCP-47 string to a LanguageIdentifier and then performed a structural comparison.

§Examples
use icu::locale::LanguageIdentifier;

let bcp47_strings: &[&str] = &[
    "pl-LaTn-pL",
    "uNd",
    "UnD-adlm",
    "uNd-GB",
    "UND-FONIPA",
    "ZH",
];

for a in bcp47_strings {
    assert!(a.parse::<LanguageIdentifier>().unwrap().normalizing_eq(a));
}
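
One way to picture a comparison of this kind is to walk both inputs subtag by subtag and compare them case-insensitively, which avoids building an intermediate value. The std-only sketch below does that for a normalized string on the left and a possibly unnormalized one on the right. normalizing_eq_sketch is a hypothetical helper; accepting _ as a separator is an assumption taken from the Parsing section, and the sketch does not validate subtags or account for any reordering the library may apply.

```rust
/// Hypothetical helper, not part of ICU4X: compares a normalized identifier
/// with a possibly unnormalized one, subtag by subtag, ignoring ASCII case.
fn normalizing_eq_sketch(normalized: &str, other: &str) -> bool {
    let mut ours = normalized.split('-');
    // Assumption: the unnormalized side may use `_` as well as `-`.
    let mut theirs = other.split(|c: char| c == '-' || c == '_');
    loop {
        match (ours.next(), theirs.next()) {
            // Matching subtags: keep going.
            (Some(a), Some(b)) if a.eq_ignore_ascii_case(b) => {}
            // Both inputs ended at the same time: equal.
            (None, None) => return true,
            // Different subtag or different number of subtags.
            _ => return false,
        }
    }
}

fn main() {
    assert!(normalizing_eq_sketch("pl-Latn-PL", "pl-LaTn-pL"));
    assert!(!normalizing_eq_sketch("pl-Latn-PL", "pl-PL"));
}
```
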

impl LanguageIdentifier

pub fn to_string(&self) -> String

Converts the given value to a String.

Under the hood, this uses an efficient Writeable implementation. However, in order to avoid allocating a string, it is more efficient to use Writeable directly.
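
The difference between allocating a string per value and writing into a sink can be sketched with std::fmt::Write alone. In the example below, MiniLangId and write_all are hypothetical; the write_to method only mirrors the shape of the write_to signature listed under the Writeable implementation and is not the ICU4X code.

```rust
use std::fmt::{self, Write};

/// Hypothetical, simplified stand-in for a language identifier.
struct MiniLangId {
    language: &'static str,
    region: Option<&'static str>,
}

impl MiniLangId {
    /// Writes the identifier straight into `sink` without building a
    /// temporary `String` for this value.
    fn write_to<W: Write + ?Sized>(&self, sink: &mut W) -> fmt::Result {
        sink.write_str(self.language)?;
        if let Some(region) = self.region {
            sink.write_char('-')?;
            sink.write_str(region)?;
        }
        Ok(())
    }
}

/// Writes all identifiers, comma-separated, into one shared buffer.
fn write_all(ids: &[MiniLangId]) -> Result<String, fmt::Error> {
    let mut buf = String::new();
    for (i, id) in ids.iter().enumerate() {
        if i > 0 {
            buf.write_str(", ")?;
        }
        id.write_to(&mut buf)?;
    }
    Ok(buf)
}

fn main() {
    let ids = [
        MiniLangId { language: "en", region: Some("US") },
        MiniLangId { language: "pl", region: None },
    ];
    assert_eq!(write_all(&ids), Ok(String::from("en-US, pl")));
}
```

Calling to_string on each value instead would allocate one intermediate String per identifier before the pieces are joined.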

Trait Implementations§

source§

impl Bake for LanguageIdentifier

source§

fn bake(&self, env: &CrateEnv) -> TokenStream

Returns a TokenStream that would evaluate to self. Read more
source§

impl Clone for LanguageIdentifier

source§

fn clone(&self) -> LanguageIdentifier

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for LanguageIdentifier

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Default for LanguageIdentifier

source§

fn default() -> LanguageIdentifier

Returns the “default value” for a type. Read more
source§

impl<'de> Deserialize<'de> for LanguageIdentifier

source§

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
source§

impl Display for LanguageIdentifier

This trait is implemented for compatibility with std::fmt formatting macros such as format!. To create a string, Writeable::write_to_string is usually more efficient.

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl From<&LanguageIdentifier> for (Language, Option<Script>, Option<Region>)

Convert from a LanguageIdentifier to an LSR tuple.

§Examples

use icu::locale::{
    langid,
    subtags::{language, region, script},
};

let lid = langid!("en-Latn-US");
let (lang, script, region) = (&lid).into();

assert_eq!(lang, language!("en"));
assert_eq!(script, Some(script!("Latn")));
assert_eq!(region, Some(region!("US")));
source§

fn from(langid: &LanguageIdentifier) -> Self

Converts to this type from the input type.
source§

impl From<&LanguageIdentifier> for LocalePreferences

source§

fn from(lid: &LanguageIdentifier) -> Self

Converts to this type from the input type.
source§

impl From<(Language, Option<Script>, Option<Region>)> for LanguageIdentifier

Convert from an LSR tuple to a LanguageIdentifier.

§Examples

use icu::locale::{
    langid,
    subtags::{language, region, script},
    LanguageIdentifier,
};

let lang = language!("en");
let script = script!("Latn");
let region = region!("US");
assert_eq!(
    LanguageIdentifier::from((lang, Some(script), Some(region))),
    langid!("en-Latn-US")
);
source§

fn from(lsr: (Language, Option<Script>, Option<Region>)) -> Self

Converts to this type from the input type.
source§

impl From<Language> for LanguageIdentifier

§Examples

use icu::locale::{langid, subtags::language, LanguageIdentifier};

assert_eq!(LanguageIdentifier::from(language!("en")), langid!("en"));
source§

fn from(language: Language) -> Self

Converts to this type from the input type.
source§

impl From<LanguageIdentifier> for Locale

source§

fn from(id: LanguageIdentifier) -> Self

Converts to this type from the input type.
source§

impl From<Locale> for LanguageIdentifier

source§

fn from(loc: Locale) -> Self

Converts to this type from the input type.
source§

impl From<Option<Region>> for LanguageIdentifier

§Examples

use icu::locale::{langid, subtags::region, LanguageIdentifier};

assert_eq!(
    LanguageIdentifier::from(Some(region!("US"))),
    langid!("und-US")
);
source§

fn from(region: Option<Region>) -> Self

Converts to this type from the input type.
source§

impl From<Option<Script>> for LanguageIdentifier

§Examples

use icu::locale::{langid, subtags::script, LanguageIdentifier};

assert_eq!(
    LanguageIdentifier::from(Some(script!("latn"))),
    langid!("und-Latn")
);
source§

fn from(script: Option<Script>) -> Self

Converts to this type from the input type.
source§

impl FromStr for LanguageIdentifier

source§

type Err = ParseError

The associated error which can be returned from parsing.
source§

fn from_str(s: &str) -> Result<Self, Self::Err>

Parses a string s to return a value of this type. Read more
source§

impl Hash for LanguageIdentifier

source§

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher. Read more
1.3.0 · source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
source§

impl PartialEq for LanguageIdentifier

source§

fn eq(&self, other: &LanguageIdentifier) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Serialize for LanguageIdentifier

source§

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer. Read more
source§

impl Writeable for LanguageIdentifier

source§

fn write_to<W: Write + ?Sized>(&self, sink: &mut W) -> Result

Writes a string to the given sink. Errors from the sink are bubbled up. The default implementation delegates to write_to_parts, and discards any Part annotations.
source§

fn writeable_length_hint(&self) -> LengthHint

Returns a hint for the number of UTF-8 bytes that will be written to the sink. Read more
source§

fn write_to_string(&self) -> Cow<'_, str>

Creates a new String with the data from this Writeable. Like ToString, but smaller and faster. Read more
§

fn write_to_parts<S>(&self, sink: &mut S) -> Result<(), Error>
where S: PartsWrite + ?Sized,

Write bytes and Part annotations to the given sink. Errors from the sink are bubbled up. The default implementation delegates to write_to, and doesn’t produce any Part annotations.
source§

impl Eq for LanguageIdentifier

source§

impl StructuralPartialEq for LanguageIdentifier

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> IntoEither for T

source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T> ToString for T
where T: Display + ?Sized,

source§

default fn to_string(&self) -> String

Converts the given value to a String. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

§

impl<T> ErasedDestructor for T
where T: 'static,