Struct icu::segmenter::SentenceSegmenter
pub struct SentenceSegmenter { /* private fields */ }
Supports loading sentence break data, and creating sentence break iterators for different string encodings.
§Examples
Segment a string:
use icu::segmenter::SentenceSegmenter;
let segmenter = SentenceSegmenter::new();
let breakpoints: Vec<usize> =
    segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Segment a Latin1 byte string:
use icu::segmenter::SentenceSegmenter;
let segmenter = SentenceSegmenter::new();
let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Successive boundaries can be used to retrieve the sentences. In particular, the first boundary is always 0, and the last one is the length of the segmented text in code units.
use icu::segmenter::SentenceSegmenter;
use itertools::Itertools;
let segmenter = SentenceSegmenter::new();
let text = "Ceci tuera cela. Le livre tuera l’édifice.";
let sentences: Vec<&str> = segmenter
    .segment_str(text)
    .tuple_windows()
    .map(|(i, j)| &text[i..j])
    .collect();
assert_eq!(
    &sentences,
    &["Ceci tuera cela. ", "Le livre tuera l’édifice."]
);
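The pairing of successive boundaries can also be done without itertools, using the standard library's slice windows on a collected list of breakpoints. The following is a minimal sketch; `split_at_boundaries` is a hypothetical helper name, not part of `icu`, and it assumes the offsets are sorted and fall on UTF-8 character boundaries, as the offsets yielded by `segment_str` are expected to.

```rust
/// Hypothetical helper (not part of `icu`): slices `text` at each pair of
/// successive boundaries.
///
/// Assumes `breakpoints` is sorted and that every offset lies on a UTF-8
/// character boundary; slicing panics otherwise.
fn split_at_boundaries<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        // Each window is one `[start, end]` pair of adjacent boundaries.
        .windows(2)
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}
```

With the breakpoints `[0, 11]` from the first example, this returns a single slice covering all of "Hello World". A lone breakpoint at 0 (the empty-string case) yields no slices.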
Implementations§
impl SentenceSegmenter
pub fn new() -> SentenceSegmenter
Constructs a SentenceSegmenter with an invariant locale and compiled data.
✨ Enabled with the compiled_data Cargo feature.
pub fn try_new_with_any_provider(
    provider: &(impl AnyProvider + ?Sized),
) -> Result<SentenceSegmenter, DataError>
A version of Self::new that uses custom data provided by an AnyProvider.
pub fn try_new_with_buffer_provider(
    provider: &(impl BufferProvider + ?Sized),
) -> Result<SentenceSegmenter, DataError>
A version of Self::new that uses custom data provided by a BufferProvider.
✨ Enabled with the serde feature.
pub fn try_new_unstable<D>(provider: &D) -> Result<SentenceSegmenter, DataError>
A version of Self::new that uses custom data provided by a DataProvider.
pub fn try_new_with_options(
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
Constructs a SentenceSegmenter for the given options, using compiled data.
✨ Enabled with the compiled_data Cargo feature.
pub fn try_new_with_options_with_any_provider(
    provider: &(impl AnyProvider + ?Sized),
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
A version of Self::try_new_with_options that uses custom data provided by an AnyProvider.
pub fn try_new_with_options_with_buffer_provider(
    provider: &(impl BufferProvider + ?Sized),
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
A version of Self::try_new_with_options that uses custom data provided by a BufferProvider.
✨ Enabled with the serde feature.
pub fn try_new_with_options_unstable<D>(
    provider: &D,
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
A version of Self::try_new_with_options that uses custom data provided by a DataProvider.
pub fn segment_str<'l, 's>(
    &'l self,
    input: &'s str,
) -> SentenceBreakIterator<'l, 's, RuleBreakTypeUtf8>
Creates a sentence break iterator for a str (a UTF-8 string).
There are always breakpoints at 0 and the string length, or only at 0 for the empty string.
pub fn segment_utf8<'l, 's>(
    &'l self,
    input: &'s [u8],
) -> SentenceBreakIterator<'l, 's, RuleBreakTypePotentiallyIllFormedUtf8>
Creates a sentence break iterator for a potentially ill-formed UTF-8 string.
Invalid characters are treated as REPLACEMENT CHARACTER (U+FFFD).
There are always breakpoints at 0 and the string length, or only at 0 for the empty string.
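To see what treating invalid input as REPLACEMENT CHARACTER looks like, the standard library's lossy decoder is a convenient analogy. This sketch uses only `std` and says nothing about how the segmenter handles such bytes internally; `decode_lossy` is a hypothetical wrapper name. The decoded string can be longer than the input, so offsets into the original byte slice should not be assumed to index the decoded text.

```rust
use std::borrow::Cow;

/// Hypothetical wrapper around `String::from_utf8_lossy`: decodes possibly
/// ill-formed UTF-8, substituting U+FFFD for each invalid sequence.
/// Borrows the input when it is already valid UTF-8.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

For the four input bytes `b"Hi\xFF!"`, the decoded text is "Hi", U+FFFD, "!", which is six bytes of UTF-8 because U+FFFD encodes to three bytes.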
pub fn segment_latin1<'l, 's>(
    &'l self,
    input: &'s [u8],
) -> SentenceBreakIterator<'l, 's, RuleBreakTypeLatin1>
Creates a sentence break iterator for a Latin-1 (8-bit) string.
There are always breakpoints at 0 and the string length, or only at 0 for the empty string.
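In Latin-1, each byte is one character whose value equals its Unicode code point, so a Latin-1 buffer and its UTF-8 form can differ in length. The following std-only sketch shows the conversion; `latin1_to_string` is a hypothetical helper, not part of `icu`. It suggests why offsets into a Latin-1 buffer should not be reused as offsets into a converted `String`.

```rust
/// Hypothetical helper (not part of `icu`): converts Latin-1 bytes to a
/// `String`. Each byte maps to the code point of the same value
/// (U+0000..=U+00FF), which is what `u8 as char` produces.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

The four bytes `b"caf\xE9"` become "café", which takes five bytes in UTF-8 because "é" needs two.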
pub fn segment_utf16<'l, 's>(
    &'l self,
    input: &'s [u16],
) -> SentenceBreakIterator<'l, 's, RuleBreakTypeUtf16>
Creates a sentence break iterator for a UTF-16 string.
There are always breakpoints at 0 and the string length, or only at 0 for the empty string.
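Because "the string length" is the length of the input slice, the final breakpoint is counted in the code units of whichever encoding is being segmented. The sketch below uses only `std` to compare the UTF-8 and UTF-16 lengths of the same text; `code_unit_lengths` is a hypothetical helper name.

```rust
/// Hypothetical helper: returns the length of `text` in UTF-8 code units
/// (bytes) and in UTF-16 code units (`u16` values), in that order.
fn code_unit_lengths(text: &str) -> (usize, usize) {
    let utf8_units = text.len();
    let utf16_units = text.encode_utf16().count();
    (utf8_units, utf16_units)
}
```

For "l’édifice." the two lengths are 13 and 10, so the same text ends at a different offset depending on whether it is passed to `segment_str` or, after encoding, to `segment_utf16`.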
Trait Implementations§
impl Debug for SentenceSegmenter
impl Default for SentenceSegmenter
fn default() -> SentenceSegmenter
Auto Trait Implementations§
impl Freeze for SentenceSegmenter
impl RefUnwindSafe for SentenceSegmenter
impl Send for SentenceSegmenter
impl Sync for SentenceSegmenter
impl Unpin for SentenceSegmenter
impl UnwindSafe for SentenceSegmenter
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.