Interface Segments


  • public interface Segments
    An interface that represents the segmentation results produced by passing an input CharSequence to a Segmenter, including the APIs for iterating over those results.

    The segmentation results can be provided either as segmentation boundary indices (ints) or as segments, which are represented by the Segment class. A Segment object can, in turn, provide the subsequence of the original input that it represents.

    Example:

     Segmenter wordSeg =
         LocalizedSegmenter.builder()
             .setLocale(ULocale.forLanguageTag("de"))
             .setSegmentationType(SegmentationType.WORD)
             .build();
    
     Segments segments = wordSeg.segment("Das 21ste Jahrh. ist das beste.");
    
     List<CharSequence> words = segments.subSequences().collect(Collectors.toList());
     
    See Also:
    Segmenter, Segment
    Status:
    Draft ICU 78.
    • Method Detail

      • subSequences

        default Stream<CharSequence> subSequences()
        Returns a Stream of the CharSequences for all of the segments in the source sequence. Iteration starts at the beginning of the sequence and moves forwards to the end.
        Returns:
        a Stream of the CharSequences of all Segments in the source sequence.
        Status:
        Draft ICU 78.
      • segmentAt

        Segment segmentAt(int i)
        Returns the segment that contains index i. Containment is inclusive of the start index and exclusive of the limit index.

        Specifically, the containing segment is defined as the segment with start s and limit l such that s ≤ i < l.
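        The containment rule s ≤ i < l can be illustrated with a small stand-alone model. This is only a sketch: the class name, the helper method, and the boundary values are hypothetical and are not produced by or taken from ICU.

```java
import java.util.Arrays;

public class SegmentAtSketch {
    // Hypothetical boundaries for a 10-character input (assumption, not ICU output).
    static final int[] BOUNDARIES = {0, 3, 4, 9, 10};

    /** Returns {start, limit} of the modeled segment with start <= i < limit. */
    static int[] segmentAt(int i) {
        int length = BOUNDARIES[BOUNDARIES.length - 1];
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int pos = Arrays.binarySearch(BOUNDARIES, i);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the segment
        // start is the boundary just before the insertion point.
        int startIdx = pos >= 0 ? pos : -pos - 2;
        return new int[] {BOUNDARIES[startIdx], BOUNDARIES[startIdx + 1]};
    }

    public static void main(String[] args) {
        // Index 3 is a segment start, so it belongs to [3, 4), not to [0, 3).
        System.out.println(Arrays.toString(segmentAt(3))); // [3, 4]
        // Index 8 lies in the middle of [4, 9).
        System.out.println(Arrays.toString(segmentAt(8))); // [4, 9]
    }
}
```

        In this model, an index equal to a boundary always belongs to the segment that starts there, which is why the last valid index is one less than the input length.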

        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        A segment that either starts at or contains index i
        Throws:
        IndexOutOfBoundsException - if i is less than 0 or greater than or equal to the length of the input CharSequence to the Segmenter
        Status:
        Draft ICU 78.
      • segments

        default Stream<Segment> segments()
        Returns a Stream of all Segments in the source sequence. Iteration starts with the first segment and moves forwards to the end of the sequence.

        This is equivalent to segmentsFrom(0).

        Returns:
        a Stream of all Segments in the source sequence.
        Status:
        Draft ICU 78.
      • segmentsFrom

        Stream<Segment> segmentsFrom(int i)
        Returns a Stream of all Segments in the source sequence where all segment limits l satisfy i < l. Iteration moves forwards.

        This means that the first segment in the stream is the same as what is returned by segmentAt(i).

        The word "from" is used here to mean "at or after", with the semantics of "at" for a Segment defined by segmentAt(int). We cannot describe all of the segments as being "after", since the first segment might contain i in the middle, meaning that its start position precedes i in the forward direction.

        segmentsFrom and segmentsBefore(int) create a partitioning of the space of all Segments.
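        The partitioning can be sketched with a stand-alone model in which segments are filtered by their limit l: i < l for segmentsFrom and l ≤ i for segmentsBefore. The class, the "start-limit" string representation, and the boundary values are hypothetical illustrations, not the ICU implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPartitionSketch {
    // Hypothetical boundaries for a 10-character input (assumption, not ICU output).
    static final int[] BOUNDARIES = {0, 3, 4, 9, 10};

    /** Segments whose limit l satisfies i < l, in forward order. */
    static List<String> segmentsFrom(int i) {
        List<String> out = new ArrayList<>();
        for (int k = 0; k + 1 < BOUNDARIES.length; k++) {
            if (i < BOUNDARIES[k + 1]) {
                out.add(BOUNDARIES[k] + "-" + BOUNDARIES[k + 1]);
            }
        }
        return out;
    }

    /** Segments whose limit l satisfies l <= i, in backward order. */
    static List<String> segmentsBefore(int i) {
        List<String> out = new ArrayList<>();
        for (int k = BOUNDARIES.length - 2; k >= 0; k--) {
            if (BOUNDARIES[k + 1] <= i) {
                out.add(BOUNDARIES[k] + "-" + BOUNDARIES[k + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Index 5 lies inside the segment 4-9, so that segment comes first.
        System.out.println(segmentsFrom(5));   // [4-9, 9-10]
        System.out.println(segmentsBefore(5)); // [3-4, 0-3]
    }
}
```

        Every modeled segment appears in exactly one of the two results for a given i, because each limit satisfies exactly one of i < l and l ≤ i.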

        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        a Stream of all Segments at or after i
        Status:
        Draft ICU 78.
      • segmentsBefore

        Stream<Segment> segmentsBefore(int i)
        Returns a Stream of all Segments in the source sequence where all segment limits l satisfy l ≤ i. Iteration moves backwards.

        This means that all segments in the stream come before the one that is returned by segmentAt(i). A segment is not considered to contain index i if i is equal to its limit l. Thus, "before" encapsulates the invariant l ≤ i.

        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        a Stream of all Segments before i
        Status:
        Draft ICU 78.
      • isBoundary

        boolean isBoundary(int i)
        Returns whether offset i is a segmentation boundary. Throws an exception when i is not a valid index position for the source sequence.
        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        true if offset i is a segmentation boundary, false otherwise
        Throws:
        IllegalArgumentException - if i is less than 0 or greater than the length of the input CharSequence to the Segmenter
        Status:
        Draft ICU 78.
      • boundaries

        default IntStream boundaries()
        Returns all segmentation boundaries, starting from the beginning and moving forwards.

        Note: boundaries() != boundariesAfter(0). This difference naturally results from the strict inequality condition in boundariesAfter, and the fact that 0 is the first boundary returned from the start of an input sequence.
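        The difference can be seen in a small stand-alone model that applies the strict inequality b > i to a fixed list of boundaries. The class and the boundary values are hypothetical and only mimic the documented semantics.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class BoundariesSketch {
    // Hypothetical boundaries for a 10-character input (assumption, not ICU output).
    static final int[] BOUNDARIES = {0, 3, 4, 9, 10};

    /** All modeled boundaries, moving forwards. */
    static IntStream boundaries() {
        return Arrays.stream(BOUNDARIES);
    }

    /** Modeled boundaries b with b > i, moving forwards. */
    static IntStream boundariesAfter(int i) {
        return Arrays.stream(BOUNDARIES).filter(b -> b > i);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(boundaries().toArray()));       // [0, 3, 4, 9, 10]
        // The strict inequality drops the leading boundary 0.
        System.out.println(Arrays.toString(boundariesAfter(0).toArray())); // [3, 4, 9, 10]
    }
}
```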

        Returns:
        An IntStream of all segmentation boundaries, starting at the first boundary with index 0, and moving forwards in the input sequence.
        Status:
        Draft ICU 78.
      • boundariesAfter

        IntStream boundariesAfter(int i)
        Returns all segmentation boundaries after the provided index. Iteration moves forwards.
        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        An IntStream of all boundaries b such that b > i
        Status:
        Draft ICU 78.
      • boundariesBackFrom

        IntStream boundariesBackFrom(int i)
        Returns all segmentation boundaries on or before the provided index. Iteration moves backwards.

        The phrase "back from" indicates both that (1) the boundaries are "on or before" the input index, and (2) the direction of iteration is backwards (towards the beginning). "On or before" means that the result set is every boundary b where b ≤ i, which is a weak inequality, whereas "before" alone might suggest the strict inequality b < i.

        boundariesBackFrom and boundariesAfter(int) create a partitioning of the space of all boundaries.
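        As a sketch of this partitioning, the model below filters a fixed list of boundaries with b ≤ i (iterating backwards) and b > i (iterating forwards). The class and the boundary values are hypothetical and are not the ICU implementation.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class BoundaryPartitionSketch {
    // Hypothetical boundaries for a 10-character input (assumption, not ICU output).
    static final int[] BOUNDARIES = {0, 3, 4, 9, 10};

    /** Modeled boundaries b with b > i, moving forwards. */
    static IntStream boundariesAfter(int i) {
        return Arrays.stream(BOUNDARIES).filter(b -> b > i);
    }

    /** Modeled boundaries b with b <= i, moving backwards. */
    static IntStream boundariesBackFrom(int i) {
        int n = BOUNDARIES.length;
        return IntStream.range(0, n)
                .map(k -> BOUNDARIES[n - 1 - k]) // walk the array from the end
                .filter(b -> b <= i);
    }

    public static void main(String[] args) {
        // 4 is itself a boundary, so it appears in the "back from" result only.
        System.out.println(Arrays.toString(boundariesBackFrom(4).toArray())); // [4, 3, 0]
        System.out.println(Arrays.toString(boundariesAfter(4).toArray()));    // [9, 10]
    }
}
```

        For any i, each modeled boundary lands in exactly one of the two streams, since b ≤ i and b > i are mutually exclusive and exhaustive.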

        Parameters:
        i - index in the input CharSequence to the Segmenter
        Returns:
        An IntStream of all boundaries b such that b ≤ i
        Status:
        Draft ICU 78.