Struct icu_capi::segmenter_word::ffi::WordSegmenter
pub struct WordSegmenter(/* private fields */);
An ICU4X word-break segmenter, capable of finding word breakpoints in strings.
Implementations
impl WordSegmenter
pub fn create_auto() -> Box<WordSegmenter>
Construct a WordSegmenter that automatically selects the best available LSTM or dictionary payload data, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
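The note above describes a per-script choice of model. The stdlib-only sketch below illustrates that policy by classifying a character according to its main Unicode block; it is not ICU4X's actual dispatch code. The Model enum and model_for_char function are hypothetical, only each script's primary block is checked (extension blocks are ignored), and the RuleBased fallback for all other text is an assumption of this sketch.

```rust
/// Hypothetical label for the kind of data the note says is used per script.
#[derive(Debug, PartialEq)]
enum Model {
    Dictionary, // Chinese and Japanese
    Lstm,       // Burmese, Khmer, Lao, Thai
    RuleBased,  // assumption: other text uses ordinary rule-based word breaking
}

/// Classify a char by its primary Unicode block (simplified sketch).
fn model_for_char(c: char) -> Model {
    match c as u32 {
        // Hiragana, Katakana, CJK Unified Ideographs
        0x3040..=0x309F | 0x30A0..=0x30FF | 0x4E00..=0x9FFF => Model::Dictionary,
        // Thai, Lao, Myanmar (Burmese), Khmer
        0x0E00..=0x0E7F | 0x0E80..=0x0EFF | 0x1000..=0x109F | 0x1780..=0x17FF => Model::Lstm,
        _ => Model::RuleBased,
    }
}
```

For example, model_for_char('\u{0E01}') (THAI CHARACTER KO KAI) yields Model::Lstm.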
pub fn create_auto_with_provider( provider: &DataProvider, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter that automatically selects the best available LSTM or dictionary payload data, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
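Unlike create_auto, the _with_provider constructors return a Result, since loading from a caller-supplied data source can fail with a DataError. One way a caller might handle that is to fall back to compiled data. The sketch below models the pattern with hypothetical stand-in types (StandInSegmenter, StandInDataError, Source) and functions; none of them are the icu_capi definitions.

```rust
/// Hypothetical stand-in for DataError; not the icu_capi type.
#[derive(Debug)]
struct StandInDataError(&'static str);

/// Records where the stand-in segmenter's data came from.
#[derive(Debug, PartialEq)]
enum Source {
    Provider,
    Compiled,
}

/// Hypothetical stand-in for WordSegmenter.
#[derive(Debug)]
struct StandInSegmenter {
    source: Source,
}

/// Analogue of create_auto_with_provider: fails when the provider lacks data.
fn create_with_provider(provider_has_data: bool) -> Result<Box<StandInSegmenter>, StandInDataError> {
    if provider_has_data {
        Ok(Box::new(StandInSegmenter { source: Source::Provider }))
    } else {
        Err(StandInDataError("segmenter data not found"))
    }
}

/// Analogue of create_auto: compiled data, so construction cannot fail.
fn create_compiled() -> Box<StandInSegmenter> {
    Box::new(StandInSegmenter { source: Source::Compiled })
}

/// Prefer the provider, but fall back to compiled data on error.
fn segmenter_with_fallback(provider_has_data: bool) -> Box<StandInSegmenter> {
    create_with_provider(provider_has_data).unwrap_or_else(|err| {
        eprintln!("provider failed ({}); falling back to compiled data", err.0);
        create_compiled()
    })
}
```

Whether a silent fallback is appropriate depends on the application; a caller that must use its own data would propagate the error instead.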
pub fn create_auto_with_content_locale( locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale that automatically selects the best available LSTM or dictionary payload data, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_auto_with_content_locale_and_provider( provider: &DataProvider, locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale that automatically selects the best available LSTM or dictionary payload data, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm() -> Box<WordSegmenter>
Construct a WordSegmenter with LSTM payload data for Burmese, Khmer, Lao, and Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm_with_provider( provider: &DataProvider, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter with LSTM payload data for Burmese, Khmer, Lao, and Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm_with_content_locale( locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale with LSTM payload data for Burmese, Khmer, Lao, and Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm_with_content_locale_and_provider( provider: &DataProvider, locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale with LSTM payload data for Burmese, Khmer, Lao, and Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
pub fn create_dictionary() -> Box<WordSegmenter>
Construct a WordSegmenter with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese as well as for Burmese, Khmer, Lao, and Thai.
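Dictionary-based segmentation finds word boundaries by looking up candidate words in a lexicon. The toy sketch below uses greedy longest-match (maximal matching) over a tiny hypothetical word list to show the basic technique; ICU4X's dictionary segmenter has its own data format and algorithm, so this is only an illustration, not what create_dictionary does internally. The function longest_match_breaks is hypothetical.

```rust
use std::collections::HashSet;

/// Toy greedy longest-match segmentation (not ICU4X's algorithm).
/// Returns breakpoints as byte offsets, including 0 and text.len().
/// A character that starts no lexicon word becomes a one-character segment.
fn longest_match_breaks(text: &str, lexicon: &HashSet<&str>) -> Vec<usize> {
    // Longest entry in bytes; no match can be longer than this.
    let max_len = lexicon.iter().map(|w| w.len()).max().unwrap_or(0);
    let mut breaks = vec![0];
    let mut start = 0;
    while start < text.len() {
        // Fallback: advance by one character if nothing in the lexicon matches.
        let first_len = text[start..].chars().next().map_or(1, char::len_utf8);
        let mut end = start + first_len;
        // Check each character boundary after start, keeping the longest hit.
        for (offset, c) in text[start..].char_indices() {
            let candidate = start + offset + c.len_utf8();
            if candidate - start > max_len {
                break;
            }
            if lexicon.contains(&text[start..candidate]) {
                end = candidate;
            }
        }
        breaks.push(end);
        start = end;
    }
    breaks
}
```

With the lexicon {"the", "tab", "table"}, the input "thetable" yields breakpoints [0, 3, 8]. Greedy longest-match is the simplest dictionary strategy and can mis-segment when the longest match at one position forces a bad split later.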
pub fn create_dictionary_with_provider( provider: &DataProvider, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese as well as for Burmese, Khmer, Lao, and Thai.
pub fn create_dictionary_with_content_locale( locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese as well as for Burmese, Khmer, Lao, and Thai.
pub fn create_dictionary_with_content_locale_and_provider( provider: &DataProvider, locale: &Locale, ) -> Result<Box<WordSegmenter>, DataError>
Construct a WordSegmenter for the given content locale with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese as well as for Burmese, Khmer, Lao, and Thai.
pub fn segment_utf8<'a>( &'a self, input: &'a DiplomatStr, ) -> Box<WordBreakIteratorUtf8<'a>>
Segments a UTF-8 string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs (U+FFFD) according to the WHATWG Encoding Standard.
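To make "treated as if errors had been replaced" concrete, Rust's standard library has an analogous lossy decoder: String::from_utf8_lossy substitutes one U+FFFD for each maximal ill-formed subsequence, essentially the same replacement policy the WHATWG UTF-8 decoder uses. The sketch below shows the text a UTF-8 segmenter conceptually sees; the helper visible_text is hypothetical, and the segmenter itself need not build such a copy.

```rust
use std::borrow::Cow;

/// Hypothetical helper: the text a UTF-8 segmenter conceptually operates on
/// when handed possibly ill-formed bytes. Invalid sequences become U+FFFD;
/// well-formed input is borrowed without allocating.
fn visible_text(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}
```

Here the truncated four-byte sequence F0 90 80 collapses into a single U+FFFD rather than three.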
pub fn segment_utf16<'a>( &'a self, input: &'a DiplomatStr16, ) -> Box<WordBreakIteratorUtf16<'a>>
Segments a UTF-16 string.
Ill-formed input (such as unpaired surrogates) is treated as if errors had been replaced with REPLACEMENT CHARACTERs (U+FFFD) according to the WHATWG Encoding Standard.
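In UTF-16 the typical ill-formed input is an unpaired surrogate. Rust's String::from_utf16_lossy replaces each one with U+FFFD, which is a convenient way to preview the text the segmenter conceptually sees. The helper name visible_text_utf16 is hypothetical.

```rust
/// Hypothetical helper: what a UTF-16 segmenter conceptually sees when the
/// input contains unpaired surrogates (each one becomes U+FFFD).
fn visible_text_utf16(input: &[u16]) -> String {
    String::from_utf16_lossy(input)
}
```

A well-formed surrogate pair such as D834 DD1E still decodes to a single code point (U+1D11E); only the lone DD1E and trailing D834 are replaced.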
pub fn segment_latin1<'a>( &'a self, input: &'a [u8], ) -> Box<WordBreakIteratorLatin1<'a>>
Segments a Latin-1 string.
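In Latin-1 (ISO/IEC 8859-1), each byte maps to the Unicode code point with the same value (U+0000 to U+00FF), so every byte sequence is well-formed and no replacement is ever needed. The minimal sketch below shows that mapping; the helper name is hypothetical, and it assumes segment_latin1 interprets bytes this way rather than as windows-1252, which the WHATWG Encoding Standard uses for the "latin1" label.

```rust
/// Hypothetical helper: decode Latin-1 bytes by mapping each byte to the
/// code point of the same value (U+0000..=U+00FF). Cannot fail.
fn latin1_to_string(input: &[u8]) -> String {
    input.iter().map(|&b| char::from(b)).collect()
}
```

Note that the decoded String can be longer in bytes than the input: 0xE9 is one Latin-1 byte but two UTF-8 bytes.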
Auto Trait Implementations
impl Freeze for WordSegmenter
impl RefUnwindSafe for WordSegmenter
impl Send for WordSegmenter
impl Sync for WordSegmenter
impl Unpin for WordSegmenter
impl UnwindSafe for WordSegmenter
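Because the list above marks WordSegmenter as Send and Sync, a single instance can be shared by several Rust threads, for example behind an Arc, without cloning it. The sketch below demonstrates the pattern with a hypothetical StandInSegmenter type rather than the real one, plus a hypothetical compile-time check, assert_send_sync, that rejects any type lacking either trait.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a segmenter handle (not the real WordSegmenter).
struct StandInSegmenter {
    name: &'static str,
}

/// Compile-time check: only compiles for types that are Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

/// Share one segmenter across n threads; each thread reads from it.
fn use_from_threads(seg: Arc<StandInSegmenter>, n: usize) -> Vec<&'static str> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let seg = Arc::clone(&seg); // cheap reference-count bump, no data copy
            thread::spawn(move || seg.name)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```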
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
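The two IntoEither methods simply wrap a value in one side of an Either. To make the documented behavior concrete without depending on the either crate, the sketch below defines local stand-ins (an Either enum and a hypothetical IntoEitherSketch trait) that mirror the descriptions; they are not the crate's actual definitions.

```rust
/// Local stand-in for the either crate's Either type.
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical trait mirroring the documented IntoEither behavior.
trait IntoEitherSketch: Sized {
    /// Left if into_left is true, Right otherwise.
    fn into_either(self, into_left: bool) -> Either<Self, Self> {
        if into_left { Either::Left(self) } else { Either::Right(self) }
    }

    /// Left if the predicate returns true for &self, Right otherwise.
    fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
    where
        F: FnOnce(&Self) -> bool,
    {
        if into_left(&self) { Either::Left(self) } else { Either::Right(self) }
    }
}

// Blanket impl for every sized type, as in the listing above.
impl<T> IntoEitherSketch for T {}
```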