Interface Segments
-
public interface Segments
An interface that represents the segmentation results, including the APIs for iterating over them, that are produced by passing an input CharSequence to a Segmenter.
The segmentation results can be provided either as the segmentation boundary indices (ints) or as segments, which are represented by the Segment class. In turn, the Segment object can also provide the subsequence of the original input that it represents.
Example:
    Segmenter wordSeg = LocalizedSegmenter.builder()
        .setLocale(ULocale.forLanguageTag("de"))
        .setSegmentationType(SegmentationType.WORD)
        .build();
    Segments segments = wordSeg.segment("Das 21ste Jahrh. ist das beste.");
    List<CharSequence> words = segments.subSequences().collect(Collectors.toList());
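The relationship between boundary indices and subsequences described above can be sketched with the JDK alone. The following is a minimal illustration that uses java.text.BreakIterator as a stand-in for the ICU Segmenter, so the exact boundaries it reports may differ from ICU's. The class name BoundaryVsSubsequenceSketch and its helper methods are hypothetical and are not part of the Segments API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BoundaryVsSubsequenceSketch {
    // Collects every boundary index reported by a JDK word BreakIterator
    // (a stand-in for the ICU Segmenter, which is assumed to behave similarly).
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    // Turns each consecutive boundary pair [start, limit) into a subsequence of the input.
    static List<CharSequence> subSequences(String text, List<Integer> bounds) {
        List<CharSequence> result = new ArrayList<>();
        for (int k = 0; k + 1 < bounds.size(); k++) {
            result.add(text.subSequence(bounds.get(k), bounds.get(k + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Das 21ste Jahrh. ist das beste.";
        List<Integer> bounds = boundaries(text, Locale.GERMAN);
        System.out.println(bounds);                     // boundary indices
        System.out.println(subSequences(text, bounds)); // the same result viewed as subsequences
    }
}
```

Concatenating the subsequences in order reproduces the input, which is one way to see that the two views carry the same information.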
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Interface                      Description
static class         Segments.IterationDirection
-
Method Summary
Modifier and Type  Method
    Description
default IntStream  boundaries()
    Returns all segmentation boundaries, starting from the beginning and moving forwards.
IntStream  boundariesAfter(int i)
    Returns all segmentation boundaries after the provided index.
IntStream  boundariesBackFrom(int i)
    Returns all segmentation boundaries on or before the provided index.
boolean  isBoundary(int i)
    Returns whether offset i is a segmentation boundary.
Segment  segmentAt(int i)
    Returns the segment that contains index i.
default Stream<Segment>  segments()
    Returns a Stream of all Segments in the source sequence.
Stream<Segment>  segmentsBefore(int i)
    Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i.
Stream<Segment>  segmentsFrom(int i)
    Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l.
default Stream<CharSequence>  subSequences()
    Returns a Stream of the CharSequences for all of the segments in the source sequence.
-
-
-
Method Detail
-
subSequences
default Stream<CharSequence> subSequences()
Returns a Stream of the CharSequences for all of the segments in the source sequence. Iteration starts from the beginning of the sequence and moves forwards until the end.
- Returns:
  a Stream of the CharSequences for all Segments in the source sequence.
- Status:
  Draft ICU 78.
-
segmentAt
Segment segmentAt(int i)
Returns the segment that contains index i. Containment is inclusive of the start index and exclusive of the limit index.
Specifically, the containing segment is defined as the segment with start s and limit l such that s ≤ i < l.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  A segment that either starts at or contains index i
- Throws:
  IndexOutOfBoundsException - if i is less than 0 or greater than or equal to the length of the input CharSequence to the Segmenter
- Status:
  Draft ICU 78.
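The containment rule s ≤ i < l can be illustrated without ICU. The sketch below models the segmentation result as a hypothetical sorted array of boundary indices that starts at 0 and ends at the input length, and returns the matching {start, limit} pair. It is an illustration of the documented semantics only, not the ICU implementation; SegmentAtSketch and its method are made-up names.

```java
import java.util.Arrays;

final class SegmentAtSketch {
    // Returns {start, limit} of the modeled segment with start <= i < limit.
    // `boundaries` is a hypothetical sorted array: first entry 0, last entry the input length.
    static int[] segmentAt(int[] boundaries, int i) {
        int length = boundaries[boundaries.length - 1];
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + " is outside [0, " + length + ")");
        }
        for (int k = 0; k + 1 < boundaries.length; k++) {
            if (boundaries[k] <= i && i < boundaries[k + 1]) {
                return new int[] {boundaries[k], boundaries[k + 1]};
            }
        }
        throw new AssertionError("unreachable for a well-formed boundary array");
    }

    public static void main(String[] args) {
        // Assumed boundaries for "Das 21ste": "Das" = [0, 3), " " = [3, 4), "21ste" = [4, 9).
        int[] boundaries = {0, 3, 4, 9};
        System.out.println(Arrays.toString(segmentAt(boundaries, 2))); // [0, 3]
        System.out.println(Arrays.toString(segmentAt(boundaries, 3))); // [3, 4]: index 3 is a start, not a limit
    }
}
```

In this model, index 3 belongs to the segment that starts there rather than to the one that ends there, and index 9 (the input length) is rejected.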
-
segments
default Stream<Segment> segments()
Returns a Stream of all Segments in the source sequence. Iteration starts with the first segment and moves forwards until the end of the sequence.
This is equivalent to segmentsFrom(0).
- Returns:
  a Stream of all Segments in the source sequence.
- Status:
  Draft ICU 78.
-
segmentsFrom
Stream<Segment> segmentsFrom(int i)
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l. Iteration moves forwards.
This means that the first segment in the stream is the same as what is returned by segmentAt(i).
The word "from" is used here to mean "at or after", with the semantics of "at" for a Segment defined by segmentAt(int). We cannot describe the segments all as being "after" since the first segment might contain i in the middle, meaning that in the forward direction, its start position precedes i.
segmentsFrom and segmentsBefore(int) create a partitioning of the space of all Segments.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  a Stream of all Segments at or after i
- Status:
  Draft ICU 78.
-
segmentsBefore
Stream<Segment> segmentsBefore(int i)
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i. Iteration moves backwards.
This means that all the segments in the stream come before the one that is returned by segmentAt(i). A segment is not considered to contain index i if i is equal to limit l. Thus, "before" encapsulates the invariant l ≤ i.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  a Stream of all Segments before i
- Status:
  Draft ICU 78.
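The way the conditions i < l and l ≤ i split the segments into two disjoint groups can be sketched with a simple model. The code below again assumes a hypothetical sorted boundary array and represents each segment as a {start, limit} pair. It only mirrors the documented inequalities and iteration directions; SegmentPartitionSketch is a made-up name and not ICU code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentPartitionSketch {
    // Forward iteration over modeled segments whose limit l satisfies i < l.
    static List<int[]> segmentsFrom(int[] boundaries, int i) {
        List<int[]> result = new ArrayList<>();
        for (int k = 0; k + 1 < boundaries.length; k++) {
            if (i < boundaries[k + 1]) {
                result.add(new int[] {boundaries[k], boundaries[k + 1]});
            }
        }
        return result;
    }

    // Backward iteration over modeled segments whose limit l satisfies l <= i.
    static List<int[]> segmentsBefore(int[] boundaries, int i) {
        List<int[]> result = new ArrayList<>();
        for (int k = boundaries.length - 2; k >= 0; k--) {
            if (boundaries[k + 1] <= i) {
                result.add(new int[] {boundaries[k], boundaries[k + 1]});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // assumed boundaries for "Das 21ste"
        for (int[] s : segmentsFrom(boundaries, 5)) {
            System.out.println("from:   " + Arrays.toString(s)); // [4, 9], which contains index 5
        }
        for (int[] s : segmentsBefore(boundaries, 5)) {
            System.out.println("before: " + Arrays.toString(s)); // [3, 4], then [0, 3]
        }
    }
}
```

Every modeled segment lands in exactly one of the two lists, because a limit is either greater than i or not.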
-
isBoundary
boolean isBoundary(int i)
Returns whether offset i is a segmentation boundary. Throws an exception when i is not a valid index position for the source sequence.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  whether offset i is a segmentation boundary.
- Throws:
  IllegalArgumentException - if i is less than 0 or greater than the length of the input CharSequence to the Segmenter
- Status:
  Draft ICU 78.
-
boundaries
default IntStream boundaries()
Returns all segmentation boundaries, starting from the beginning and moving forwards.
Note: boundaries() != boundariesAfter(0). This difference naturally results from the strict inequality condition in boundariesAfter, and the fact that 0 is the first boundary returned from the start of an input sequence.
- Returns:
  An IntStream of all segmentation boundaries, starting at the first boundary with index 0, and moving forwards in the input sequence.
- Status:
  Draft ICU 78.
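The note about boundaries() and boundariesAfter(0) can be made concrete with a small model. The sketch below treats the boundaries as a hypothetical sorted int array and applies the strict inequality b > i as a filter. BoundariesSketch and its methods are illustrative names only, not the ICU implementation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class BoundariesSketch {
    // All modeled boundaries, in forward order.
    static IntStream boundaries(int[] sorted) {
        return Arrays.stream(sorted);
    }

    // Modeled boundaries b with b > i (strict inequality), in forward order.
    static IntStream boundariesAfter(int[] sorted, int i) {
        return Arrays.stream(sorted).filter(b -> b > i);
    }

    public static void main(String[] args) {
        int[] sorted = {0, 3, 4, 9}; // assumed boundaries for "Das 21ste"
        System.out.println(Arrays.toString(boundaries(sorted).toArray()));         // [0, 3, 4, 9]
        System.out.println(Arrays.toString(boundariesAfter(sorted, 0).toArray())); // [3, 4, 9]
    }
}
```

The boundary at index 0 appears only in the first result, which is the difference the note describes.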
-
boundariesAfter
IntStream boundariesAfter(int i)
Returns all segmentation boundaries after the provided index. Iteration moves forwards.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  An IntStream of all boundaries b such that b > i
- Status:
  Draft ICU 78.
-
boundariesBackFrom
IntStream boundariesBackFrom(int i)
Returns all segmentation boundaries on or before the provided index. Iteration moves backwards.
The phrase "back from" is used to indicate both that: 1) boundaries are "on or before" the input index; 2) the direction of iteration is backwards (towards the beginning). "On or before" indicates that the result set consists of the boundaries b where b ≤ i, which is a weak inequality, while "before" might suggest the strict inequality b < i.
boundariesBackFrom and boundariesAfter(int) create a partitioning of the space of all boundaries.
- Parameters:
  i - index in the input CharSequence to the Segmenter
- Returns:
  An IntStream of all boundaries b such that b ≤ i
- Status:
  Draft ICU 78.
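The partitioning of boundaries by b ≤ i and b > i, along with the backward iteration order, can be sketched as follows. The boundaries are modeled as a hypothetical sorted int array; BoundaryPartitionSketch and its methods are illustrative names and do not reflect how ICU computes the result.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class BoundaryPartitionSketch {
    // Modeled boundaries b with b <= i, emitted from the largest down to the smallest.
    static IntStream boundariesBackFrom(int[] sorted, int i) {
        return IntStream.range(0, sorted.length)
            .map(k -> sorted[sorted.length - 1 - k]) // walk the sorted array from its end
            .filter(b -> b <= i);
    }

    // Modeled boundaries b with b > i, in forward order.
    static IntStream boundariesAfter(int[] sorted, int i) {
        return Arrays.stream(sorted).filter(b -> b > i);
    }

    public static void main(String[] args) {
        int[] sorted = {0, 3, 4, 9}; // assumed boundaries for "Das 21ste"
        System.out.println(Arrays.toString(boundariesBackFrom(sorted, 4).toArray())); // [4, 3, 0]
        System.out.println(Arrays.toString(boundariesAfter(sorted, 4).toArray()));    // [9]
    }
}
```

Because the index 4 is itself a boundary, the weak inequality places it in the "back from" result and keeps it out of the "after" result.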
-
-