Interface Segments
-
public interface Segments
An interface that represents the segmentation results, including the APIs for iteration therein, that are yielded from passing an input CharSequence to a Segmenter.
The segmentation results can be provided either as the segmentation boundary indices (ints) or as segments, which are represented by the Segment class. In turn, the Segment object can also provide the subsequence of the original input that it represents.
Example:
    Segmenter wordSeg = LocalizedSegmenter.builder()
        .setLocale(ULocale.forLanguageTag("de"))
        .setSegmentationType(SegmentationType.WORD)
        .build();
    Segments segments = wordSeg.segment("Das 21ste Jahrh. ist das beste.");
    List<CharSequence> words = segments.subSequences().collect(Collectors.toList());
-
Nested Class Summary
Nested Classes
Modifier and Type | Interface | Description
static class | Segments.IterationDirection |
-
Method Summary
Modifier and Type | Method | Description
default IntStream | boundaries() | Returns all segmentation boundaries, starting from the beginning and moving forwards.
IntStream | boundariesAfter(int i) | Returns all segmentation boundaries after the provided index.
IntStream | boundariesBackFrom(int i) | Returns all segmentation boundaries on or before the provided index.
boolean | isBoundary(int i) | Returns whether offset i is a segmentation boundary.
Segment | segmentAt(int i) | Returns the segment that contains index i.
default Stream<Segment> | segments() | Returns a Stream of all Segments in the source sequence.
Stream<Segment> | segmentsBefore(int i) | Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i.
Stream<Segment> | segmentsFrom(int i) | Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l.
default Stream<CharSequence> | subSequences() | Returns a Stream of the CharSequences for all of the segments in the source sequence.
-
Method Detail
-
subSequences
default Stream<CharSequence> subSequences()
Returns a Stream of the CharSequences for all of the segments in the source sequence. Starts from the beginning of the sequence and iterates forwards until the end.
- Returns:
- a Stream of the CharSequences for all Segments in the source sequence.
- Status:
- Draft ICU 78.
-
segmentAt
Segment segmentAt(int i)
Returns the segment that contains index i. Containment is inclusive of the start index and exclusive of the limit index.
Specifically, the containing segment is defined as the segment with start s and limit l such that s ≤ i < l.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- A segment that either starts at or contains index i
- Throws:
- IndexOutOfBoundsException - if i is less than 0 or greater than or equal to the length of the input CharSequence to the Segmenter
- Status:
- Draft ICU 78.
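The containment rule s ≤ i < l can be illustrated with a small stand-alone model. This is only a sketch: the class name `SegmentAtSketch`, the hand-written boundary array, and the `{start, limit}` return shape are assumptions made for illustration, and none of it is ICU code or the ICU `Segment` type.

```java
import java.util.Arrays;

// Hypothetical model, not ICU code: boundaries are a sorted int array
// that starts at 0 and ends at the length of the input.
class SegmentAtSketch {

    // Returns {start, limit} of the segment with start <= i < limit.
    static int[] segmentAt(int[] boundaries, int i) {
        int length = boundaries[boundaries.length - 1];
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + " is outside [0, " + length + ")");
        }
        int pos = Arrays.binarySearch(boundaries, i);
        // Exact hit: i is itself a boundary, so it starts a segment.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the segment
        // start is the boundary just before the insertion point.
        int startIdx = (pos >= 0) ? pos : -pos - 2;
        return new int[] {boundaries[startIdx], boundaries[startIdx + 1]};
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // segments [0,3) [3,4) [4,9)
        System.out.println(Arrays.toString(segmentAt(boundaries, 3))); // [3, 4]
        System.out.println(Arrays.toString(segmentAt(boundaries, 2))); // [0, 3]
    }
}
```

In this model, index 3 belongs to the segment that starts at 3, not to the one that ends there, which matches the inclusive-start, exclusive-limit wording above.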
-
segments
default Stream<Segment> segments()
Returns a Stream of all Segments in the source sequence. Starts with the first segment and iterates forwards until the end of the sequence.
This is equivalent to segmentsFrom(0).
- Returns:
- a Stream of all Segments in the source sequence.
- Status:
- Draft ICU 78.
-
segmentsFrom
Stream<Segment> segmentsFrom(int i)
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l. Iteration moves forwards.
This means that the first segment in the stream is the same as what is returned by segmentAt(i).
The word "from" is used here to mean "at or after", with the semantics of "at" for a Segment defined by segmentAt(int). We cannot describe the segments all as being "after" since the first segment might contain i in the middle, meaning that in the forward direction, its start position precedes i.
segmentsFrom and segmentsBefore(int) create a partitioning of the space of all Segments.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- a Stream of all Segments at or after i
- Status:
- Draft ICU 78.
-
segmentsBefore
Stream<Segment> segmentsBefore(int i)
Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i. Iteration moves backwards.
This means that all segments in the stream come before the one that is returned by segmentAt(i). A segment is not considered to contain index i if i is equal to limit l. Thus, "before" encapsulates the invariant l ≤ i.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- a Stream of all Segments before i
- Status:
- Draft ICU 78.
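The limit conditions i < l (segmentsFrom) and l ≤ i (segmentsBefore) split the full set of segments in two. The sketch below models that with a hand-written boundary array. The class name `SegmentPartitionSketch` and the string rendering of each segment are hypothetical choices for illustration, not part of ICU.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not ICU code: each adjacent pair of boundaries
// (b[k], b[k+1]) is one segment with start b[k] and limit b[k+1].
class SegmentPartitionSketch {

    // Forward iteration over segments whose limit l satisfies i < l.
    static List<String> segmentsFrom(int[] b, int i) {
        List<String> out = new ArrayList<>();
        for (int k = 0; k + 1 < b.length; k++) {
            if (i < b[k + 1]) {
                out.add("[" + b[k] + "," + b[k + 1] + ")");
            }
        }
        return out;
    }

    // Backward iteration over segments whose limit l satisfies l <= i.
    static List<String> segmentsBefore(int[] b, int i) {
        List<String> out = new ArrayList<>();
        for (int k = b.length - 2; k >= 0; k--) {
            if (b[k + 1] <= i) {
                out.add("[" + b[k] + "," + b[k + 1] + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9}; // segments [0,3) [3,4) [4,9)
        System.out.println(segmentsFrom(boundaries, 5));   // [[4,9)]
        System.out.println(segmentsBefore(boundaries, 5)); // [[3,4), [0,3)]
    }
}
```

Every segment lands in exactly one of the two lists for any given i, and the first element of the forward list is the segment that contains i.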
-
isBoundary
boolean isBoundary(int i)
Returns whether offset i is a segmentation boundary. Throws an exception when i is not a valid index position for the source sequence.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- whether offset i is a segmentation boundary.
- Throws:
- IllegalArgumentException - if i is less than 0 or greater than the length of the input CharSequence to the Segmenter
- Status:
- Draft ICU 78.
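The valid range here differs from segmentAt: an offset equal to the input length is accepted, since the end of the input is itself a position where a boundary can fall. A minimal sketch of that range check, assuming a hand-written sorted boundary array and a hypothetical class name `IsBoundarySketch` (not ICU code):

```java
import java.util.Arrays;

// Hypothetical model, not ICU code: boundaries are a sorted int array
// that starts at 0 and ends at the length of the input.
class IsBoundarySketch {

    static boolean isBoundary(int[] boundaries, int i) {
        int length = boundaries[boundaries.length - 1];
        // Offsets 0..length inclusive are accepted; anything else is rejected.
        if (i < 0 || i > length) {
            throw new IllegalArgumentException("offset " + i + " is outside [0, " + length + "]");
        }
        return Arrays.binarySearch(boundaries, i) >= 0;
    }

    public static void main(String[] args) {
        int[] boundaries = {0, 3, 4, 9};
        System.out.println(isBoundary(boundaries, 9)); // true
        System.out.println(isBoundary(boundaries, 5)); // false
    }
}
```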
-
boundaries
default IntStream boundaries()
Returns all segmentation boundaries, starting from the beginning and moving forwards.
Note: boundaries() != boundariesAfter(0). This difference naturally results from the strict inequality condition in boundariesAfter, and the fact that 0 is the first boundary returned from the start of an input sequence.
- Returns:
- An IntStream of all segmentation boundaries, starting at the first boundary with index 0, and moving forwards in the input sequence.
- Status:
- Draft ICU 78.
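The note about boundaries() and boundariesAfter(0) can be seen in a small model built on a plain IntStream. The class name `BoundariesSketch` and the boundary array are hypothetical and stand in for real segmentation output; this is a sketch of the documented semantics, not ICU's implementation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical model, not ICU code: boundaries are a sorted int array
// that starts at 0 and ends at the length of the input.
class BoundariesSketch {

    // All boundaries, moving forwards, starting with 0.
    static IntStream boundaries(int[] b) {
        return Arrays.stream(b);
    }

    // Only boundaries strictly greater than i, moving forwards.
    static IntStream boundariesAfter(int[] b, int i) {
        return Arrays.stream(b).filter(x -> x > i);
    }

    public static void main(String[] args) {
        int[] b = {0, 3, 4, 9};
        System.out.println(Arrays.toString(boundaries(b).toArray()));         // [0, 3, 4, 9]
        System.out.println(Arrays.toString(boundariesAfter(b, 0).toArray())); // [3, 4, 9]
    }
}
```

The strict comparison drops the leading 0, which is the whole difference between the two results.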
-
boundariesAfter
IntStream boundariesAfter(int i)
Returns all segmentation boundaries after the provided index. Iteration moves forwards.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- An IntStream of all boundaries b such that b > i
- Status:
- Draft ICU 78.
-
boundariesBackFrom
IntStream boundariesBackFrom(int i)
Returns all segmentation boundaries on or before the provided index. Iteration moves backwards.
The phrase "back from" is used to indicate both that: 1) boundaries are "on or before" the input index; 2) the direction of iteration is backwards (towards the beginning). "On or before" indicates that the result set contains all boundaries b where b ≤ i, which is a weak inequality, while "before" might suggest the strict inequality b < i.
boundariesBackFrom and boundariesAfter(int) create a partitioning of the space of all boundaries.
- Parameters:
- i - index in the input CharSequence to the Segmenter
- Returns:
- An IntStream of all boundaries b such that b ≤ i
- Status:
- Draft ICU 78.
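The partitioning between b ≤ i (iterated backwards) and b > i (iterated forwards) can be sketched with plain IntStreams. The class name `BoundaryPartitionSketch` and the boundary array are assumptions for illustration only and do not reflect how ICU computes boundaries.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical model, not ICU code: boundaries are a sorted int array
// that starts at 0 and ends at the length of the input.
class BoundaryPartitionSketch {

    // Boundaries b with b <= i, visited from the highest down to 0.
    static IntStream boundariesBackFrom(int[] b, int i) {
        return IntStream.range(0, b.length)
                .map(k -> b[b.length - 1 - k]) // walk the array in reverse
                .filter(x -> x <= i);
    }

    // Boundaries b with b > i, visited in ascending order.
    static IntStream boundariesAfter(int[] b, int i) {
        return Arrays.stream(b).filter(x -> x > i);
    }

    public static void main(String[] args) {
        int[] b = {0, 3, 4, 9};
        System.out.println(Arrays.toString(boundariesBackFrom(b, 4).toArray())); // [4, 3, 0]
        System.out.println(Arrays.toString(boundariesAfter(b, 4).toArray()));    // [9]
    }
}
```

A boundary that equals i appears only in the backward stream, so the two results never overlap and together cover every boundary.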