ICU 76.1
Records lengths of string edits but not replacement text.
#include <edits.h>
Data Structures | |
struct | Iterator |
Access to the list of edits. | |
Public Member Functions | |
Edits () | |
Constructs an empty object. | |
Edits (const Edits &other) | |
Copy constructor. | |
Edits (Edits &&src) noexcept | |
Move constructor, might leave src empty. | |
~Edits () | |
Destructor. | |
Edits & | operator= (const Edits &other) |
Assignment operator. | |
Edits & | operator= (Edits &&src) noexcept |
Move assignment operator, might leave src empty. | |
void | reset () noexcept |
Resets the data but may not release memory. | |
void | addUnchanged (int32_t unchangedLength) |
Adds a no-change edit: a record for an unchanged segment of text. | |
void | addReplace (int32_t oldLength, int32_t newLength) |
Adds a change edit: a record for a text replacement/insertion/deletion. | |
UBool | copyErrorTo (UErrorCode &outErrorCode) const |
Sets the UErrorCode if an error occurred while recording edits. | |
int32_t | lengthDelta () const |
How much longer is the new text compared with the old text? | |
UBool | hasChanges () const |
int32_t | numberOfChanges () const |
Iterator | getCoarseChangesIterator () const |
Returns an Iterator for coarse-grained change edits (adjacent change edits are treated as one). | |
Iterator | getCoarseIterator () const |
Returns an Iterator for coarse-grained change and no-change edits (adjacent change edits are treated as one). | |
Iterator | getFineChangesIterator () const |
Returns an Iterator for fine-grained change edits (full granularity of change edits is retained). | |
Iterator | getFineIterator () const |
Returns an Iterator for fine-grained change and no-change edits (full granularity of change edits is retained). | |
Edits & | mergeAndAppend (const Edits &ab, const Edits &bc, UErrorCode &errorCode) |
Merges the two input Edits and appends the result to this object. | |
Records lengths of string edits but not replacement text.
Supports replacements, insertions, deletions in linear progression. Does not support moving/reordering of text.
There are two types of edits: change edits and no-change edits. Add edits to instances of this class using addReplace(int32_t, int32_t) (for change edits) and addUnchanged(int32_t) (for no-change edits). Change edits are retained with full granularity, whereas adjacent no-change edits are always merged together. In no-change edits, there is a one-to-one mapping between code points in the source and destination strings.
After all edits have been added, instances of this class should be considered immutable, and an Edits::Iterator can be used for queries.
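The recording rules above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not ICU's implementation: `EditRecord`, `EditLog`, and `recordCaseFoldExample` are hypothetical names, the storage is a plain vector, and zero-length calls are simply ignored. Only the two behaviors described in the text are modeled: adjacent no-change edits merge, and change edits keep full granularity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one recorded edit; not ICU's internal representation.
struct EditRecord {
    bool changed;       // true for a change edit, false for a no-change edit
    int32_t oldLength;  // segment length in the source string
    int32_t newLength;  // segment length in the destination string
};

// Hypothetical model of the recording phase of an edits log.
class EditLog {
public:
    // Adjacent no-change edits are merged into one record.
    void addUnchanged(int32_t length) {
        if (length <= 0) return;  // simplification of this model
        if (!records_.empty() && !records_.back().changed) {
            records_.back().oldLength += length;
            records_.back().newLength += length;
        } else {
            records_.push_back({false, length, length});
        }
    }

    // Change edits are never merged while recording.
    void addReplace(int32_t oldLength, int32_t newLength) {
        if (oldLength == 0 && newLength == 0) return;  // simplification of this model
        records_.push_back({true, oldLength, newLength});
        delta_ += newLength - oldLength;
        ++changes_;
    }

    int32_t lengthDelta() const { return delta_; }
    bool hasChanges() const { return changes_ != 0; }
    int32_t numberOfChanges() const { return changes_; }
    const std::vector<EditRecord>& records() const { return records_; }

private:
    std::vector<EditRecord> records_;
    int32_t delta_ = 0;
    int32_t changes_ = 0;
};

// Records a seven-unit source whose fourth unit grows to two units.
EditLog recordCaseFoldExample() {
    EditLog log;
    log.addUnchanged(2);   // two unchanged units
    log.addUnchanged(1);   // one more, merged into the previous record
    log.addReplace(1, 2);  // one unit replaced by two
    log.addReplace(1, 1);  // one unit replaced by one
    log.addUnchanged(1);
    log.addReplace(1, 1);
    assert(log.lengthDelta() == 1);  // 8 units out, 7 units in
    return log;
}
```

In this model the two `addUnchanged` calls at the start produce a single record of length 3, while the two consecutive `addReplace` calls stay separate.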
There are four flavors of Edits::Iterator:
- getFineIterator() retains full granularity of change edits.
- getFineChangesIterator() retains full granularity of change edits, and when calling next() on the iterator, skips over no-change edits (unchanged regions).
- getCoarseIterator() treats adjacent change edits as a single edit. (Adjacent no-change edits are automatically merged during the construction phase.)
- getCoarseChangesIterator() treats adjacent change edits as a single edit, and when calling next() on the iterator, skips over no-change edits (unchanged regions).
For example, consider the string "abcßDeF", which case-folds to "abcssdef". This string has the following fine edits:
- abc ⇨ abc (no-change)
- ß ⇨ ss (change)
- D ⇨ d (change)
- e ⇨ e (no-change)
- F ⇨ f (change)
and the following coarse edits (note how adjacent change edits get merged together):
- abc ⇨ abc (no-change)
- ßD ⇨ ssd (change)
- e ⇨ e (no-change)
- F ⇨ f (change)
The "fine changes" and "coarse changes" iterators will step through only the change edits when their Edits::Iterator::next() methods are called. They are identical to the non-change iterators when their Edits::Iterator::findSourceIndex() or Edits::Iterator::findDestinationIndex() methods are used to walk through the string.
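The relationship between the four iterator flavors can be sketched as two transformations over a list of fine edits: merging adjacent change edits (coarse) and dropping no-change edits (changes only). The code below is a standalone model, not ICU code; `Span`, `coarsen`, `changesOnly`, and `fineEditsForExample` are hypothetical names, and the real iterators compute these views lazily rather than building vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record type for this sketch.
struct Span {
    bool changed;
    int32_t oldLength;
    int32_t newLength;
};

// Coarse view: adjacent change edits are treated as a single edit.
std::vector<Span> coarsen(const std::vector<Span>& fine) {
    std::vector<Span> coarse;
    for (const Span& s : fine) {
        if (s.changed && !coarse.empty() && coarse.back().changed) {
            coarse.back().oldLength += s.oldLength;
            coarse.back().newLength += s.newLength;
        } else {
            coarse.push_back(s);
        }
    }
    return coarse;
}

// "Changes" view: stepping skips over no-change edits.
std::vector<Span> changesOnly(const std::vector<Span>& spans) {
    std::vector<Span> result;
    for (const Span& s : spans) {
        if (s.changed) result.push_back(s);
    }
    return result;
}

// Fine edits for "abcßDeF" -> "abcssdef", lengths in UTF-16 code units.
std::vector<Span> fineEditsForExample() {
    return {
        {false, 3, 3},  // abc -> abc
        {true, 1, 2},   // ß -> ss
        {true, 1, 1},   // D -> d
        {false, 1, 1},  // e -> e
        {true, 1, 1},   // F -> f
    };
}

// Checks the counts that the four views yield for the example string.
void checkCoarseningExample() {
    std::vector<Span> fine = fineEditsForExample();
    assert(fine.size() == 5);                        // fine
    assert(changesOnly(fine).size() == 3);           // fine changes
    assert(coarsen(fine).size() == 4);               // coarse
    assert(changesOnly(coarsen(fine)).size() == 2);  // coarse changes
}
```

The merged coarse change edit covers two source units and three destination units, which corresponds to ßD ⇨ ssd in the example string.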
For examples of how to use this class, see the test TestCaseMapEditsIteratorDocs in UCharacterCaseTest.java.
An Edits object tracks a separate UErrorCode, but ICU string transformation functions (e.g., case mapping functions) merge any such errors into their API's UErrorCode.
icu::Edits::~Edits()
Destructor.
void icu::Edits::addReplace(int32_t oldLength, int32_t newLength)
Adds a change edit: a record for a text replacement/insertion/deletion.
Normally called from inside ICU string transformation functions, not user code.
void icu::Edits::addUnchanged(int32_t unchangedLength)
Adds a no-change edit: a record for an unchanged segment of text.
Normally called from inside ICU string transformation functions, not user code.
UBool icu::Edits::copyErrorTo(UErrorCode &outErrorCode) const
Sets the UErrorCode if an error occurred while recording edits.
Preserves older error codes in the outErrorCode. Normally called from inside ICU string transformation functions, not user code.
outErrorCode | Set to an error code if it does not contain one already and an error occurred while recording edits. Otherwise unchanged. |
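The "preserves older error codes" rule can be sketched with a standalone model. Everything here is hypothetical and independent of ICU: `ModelErrorCode` stands in for UErrorCode (following the ICU convention that values above zero are failures), and `ErrorTrackingLog` stands in for an edits object that tracks its own error. The returned flag is assumed to report whether the output code holds a failure afterwards; the description above does not spell out the return value.

```cpp
#include <cassert>

// Hypothetical stand-in for UErrorCode; values above zero denote failure.
enum ModelErrorCode {
    MODEL_ZERO_ERROR = 0,
    MODEL_ILLEGAL_ARGUMENT_ERROR = 1,
    MODEL_MEMORY_ALLOCATION_ERROR = 7
};

inline bool modelFailure(ModelErrorCode code) {
    return code > MODEL_ZERO_ERROR;
}

// Hypothetical object that tracks a separate error while recording.
struct ErrorTrackingLog {
    ModelErrorCode internalError = MODEL_ZERO_ERROR;

    // Copies the internal error out only if the caller has no error yet.
    // Assumption: the result says whether `out` holds a failure afterwards.
    bool copyErrorTo(ModelErrorCode& out) const {
        if (modelFailure(out)) {
            return true;  // an older error is preserved
        }
        if (modelFailure(internalError)) {
            out = internalError;
            return true;
        }
        return false;  // nothing went wrong; `out` is unchanged
    }
};

// Walks through the three cases: no error, new error, older error kept.
void demoErrorPropagation() {
    ErrorTrackingLog clean;
    ModelErrorCode code = MODEL_ZERO_ERROR;
    assert(!clean.copyErrorTo(code) && code == MODEL_ZERO_ERROR);

    ErrorTrackingLog broken;
    broken.internalError = MODEL_MEMORY_ALLOCATION_ERROR;
    assert(broken.copyErrorTo(code) && code == MODEL_MEMORY_ALLOCATION_ERROR);

    ModelErrorCode older = MODEL_ILLEGAL_ARGUMENT_ERROR;
    assert(broken.copyErrorTo(older) && older == MODEL_ILLEGAL_ARGUMENT_ERROR);
}
```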
Edits & icu::Edits::mergeAndAppend(const Edits &ab, const Edits &bc, UErrorCode &errorCode)
Merges the two input Edits and appends the result to this object.
Consider two string transformations (for example, normalization and case mapping) where each records Edits in addition to writing an output string.
Edits ab reflect how substrings of input string a map to substrings of intermediate string b.
Edits bc reflect how substrings of intermediate string b map to substrings of output string c.
This function merges ab and bc such that the additional edits recorded in this object reflect how substrings of input string a map to substrings of output string c.
If unrelated Edits are passed in where the output string of the first has a different length than the input string of the second, then a U_ILLEGAL_ARGUMENT_ERROR is reported.
ab | reflects how substrings of input string a map to substrings of intermediate string b. |
bc | reflects how substrings of intermediate string b map to substrings of output string c. |
errorCode | ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.) |
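The length condition described above can be sketched in isolation. This standalone model covers only the compatibility check between the two inputs, not the merge itself; `Segment`, `totalOld`, `totalNew`, and `canCompose` are hypothetical names that do not exist in ICU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: lengths of one edit in its input and output strings.
struct Segment {
    int32_t oldLength;
    int32_t newLength;
};

// Total length of the input string described by a list of edits.
int32_t totalOld(const std::vector<Segment>& edits) {
    int32_t sum = 0;
    for (const Segment& s : edits) sum += s.oldLength;
    return sum;
}

// Total length of the output string described by a list of edits.
int32_t totalNew(const std::vector<Segment>& edits) {
    int32_t sum = 0;
    for (const Segment& s : edits) sum += s.newLength;
    return sum;
}

// ab maps a -> b and bc maps b -> c, so both must agree on the length of b.
bool canCompose(const std::vector<Segment>& ab, const std::vector<Segment>& bc) {
    return totalNew(ab) == totalOld(bc);
}

// One compatible pair and one unrelated pair.
void demoComposeCheck() {
    std::vector<Segment> ab = {{3, 3}, {1, 2}};     // a has 4 units, b has 5
    std::vector<Segment> bc = {{2, 2}, {3, 1}};     // b has 5 units, c has 3
    std::vector<Segment> unrelated = {{4, 4}};      // expects a 4-unit input
    assert(canCompose(ab, bc));
    assert(!canCompose(ab, unrelated));
    // A merged a -> c mapping would span totalOld(ab) input units
    // and totalNew(bc) output units.
    assert(totalOld(ab) == 4 && totalNew(bc) == 3);
}
```

In this model the unrelated pair is the case for which the text above says a U_ILLEGAL_ARGUMENT_ERROR is reported.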
Edits & icu::Edits::operator=(Edits &&src) noexcept
Move assignment operator, might leave src empty.
This object will have the same contents that the source object had. The behavior is undefined if *this and src are the same object.
src | source edits |