Normalization filtered by a UnicodeSet.

#include <normalizer2.h>

Inherits icu::Normalizer2, which inherits icu::UObject.

Public Member Functions
	FilteredNormalizer2 (const Normalizer2 &n2, const UnicodeSet &filterSet)
	Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set.

	~FilteredNormalizer2 ()
	Destructor.

virtual UnicodeString &	normalize (const UnicodeString &src, UnicodeString &dest, UErrorCode &errorCode) const override
	Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string.

virtual void	normalizeUTF8 (uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode) const override
	Normalizes a UTF-8 string and optionally records how source substrings relate to changed and unchanged result substrings.

virtual UnicodeString &	normalizeSecondAndAppend (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const override
	Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string.

virtual UnicodeString &	append (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const override
	Appends the second string to the first string (merging them at the boundary) and returns the first string.

virtual UBool	getDecomposition (UChar32 c, UnicodeString &decomposition) const override
	Gets the decomposition mapping of c.

virtual UBool	getRawDecomposition (UChar32 c, UnicodeString &decomposition) const override
	Gets the raw decomposition mapping of c.

virtual UChar32	composePair (UChar32 a, UChar32 b) const override
	Performs pairwise composition of a & b and returns the composite if there is one.

virtual uint8_t	getCombiningClass (UChar32 c) const override
	Gets the combining class of c.

virtual UBool	isNormalized (const UnicodeString &s, UErrorCode &errorCode) const override
	Tests if the string is normalized.

virtual UBool	isNormalizedUTF8 (StringPiece s, UErrorCode &errorCode) const override
	Tests if the UTF-8 string is normalized.

virtual UNormalizationCheckResult	quickCheck (const UnicodeString &s, UErrorCode &errorCode) const override
	Tests if the string is normalized.

virtual int32_t	spanQuickCheckYes (const UnicodeString &s, UErrorCode &errorCode) const override
	Returns the end of the normalized substring of the input string.

virtual UBool	hasBoundaryBefore (UChar32 c) const override
	Tests if the character always has a normalization boundary before it, regardless of context.

virtual UBool	hasBoundaryAfter (UChar32 c) const override
	Tests if the character always has a normalization boundary after it, regardless of context.

virtual UBool	isInert (UChar32 c) const override
	Tests if the character is normalization-inert.

Public Member Functions inherited from icu::Normalizer2
	~Normalizer2 ()
	Destructor.

UnicodeString	normalize (const UnicodeString &src, UErrorCode &errorCode) const
	Returns the normalized form of the source string.

Public Member Functions inherited from icu::UObject
virtual	~UObject ()
	Destructor.

virtual UClassID	getDynamicClassID () const
	ICU4C "poor man's RTTI", returns a UClassID for the actual ICU class.

Additional Inherited Members
Static Public Member Functions inherited from icu::Normalizer2
static const Normalizer2 *	getNFCInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for Unicode NFC normalization.

static const Normalizer2 *	getNFDInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for Unicode NFD normalization.

static const Normalizer2 *	getNFKCInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for Unicode NFKC normalization.

static const Normalizer2 *	getNFKDInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for Unicode NFKD normalization.

static const Normalizer2 *	getNFKCCasefoldInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for Unicode toNFKC_Casefold() normalization, which is equivalent to applying the NFKC_Casefold mappings and then NFC.

static const Normalizer2 *	getNFKCSimpleCasefoldInstance (UErrorCode &errorCode)
	Returns a Normalizer2 instance for a variant of Unicode toNFKC_Casefold() normalization, which is equivalent to applying the NFKC_Simple_Casefold mappings and then NFC.

static const Normalizer2 *	getInstance (const char *packageName, const char *name, UNormalization2Mode mode, UErrorCode &errorCode)
	Returns a Normalizer2 instance which uses the specified data file (packageName/name similar to ucnv_openPackage() and ures_open()/ResourceBundle) and which composes or decomposes text according to the specified mode.

Detailed Description

Normalization filtered by a UnicodeSet.

Normalizes the portions of the text that are contained in the filter set and leaves all other portions unchanged. Filtering is done via UnicodeSet::span(..., USET_SPAN_SIMPLE). Text outside the filter set is treated as "is normalized" and "quick check yes". This class implements all of (and only) the Normalizer2 API. An instance of this class is unmodifiable/immutable, but it is constructed and must be destructed by its owner.

Stable: ICU 4.4

Definition at line 519 of file normalizer2.h.

Constructor & Destructor Documentation

◆ FilteredNormalizer2()

icu::FilteredNormalizer2::FilteredNormalizer2 ( const Normalizer2 &n2, const UnicodeSet &filterSet )

inline

Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set.

Both are aliased and must not be modified or deleted while this object is used. The filter set should be frozen; otherwise the performance will suffer greatly.

Parameters

n2	wrapped Normalizer2 instance
filterSet	UnicodeSet which determines the characters to be normalized

Stable: ICU 4.4

Definition at line 531 of file normalizer2.h.

◆ ~FilteredNormalizer2()

icu::FilteredNormalizer2::~FilteredNormalizer2 ( )

Destructor.

Stable: ICU 4.4

Member Function Documentation

◆ append()

virtual UnicodeString& icu::FilteredNormalizer2::append ( UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode ) const

override virtual

Appends the second string to the first string (merging them at the boundary) and returns the first string.

The result is normalized if both the strings were normalized. The first and second strings must be different objects.

Parameters

first	string, should be normalized
second	string, should be normalized
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: first

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ composePair()

virtual UChar32 icu::FilteredNormalizer2::composePair ( UChar32 a, UChar32 b ) const

override virtual

Performs pairwise composition of a & b and returns the composite if there is one.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters

a	A (normalization starter) code point.
b	Another code point.

Returns: The non-negative composite code point if there is one; otherwise a negative value.

Stable: ICU 49

Reimplemented from icu::Normalizer2.

◆ getCombiningClass()

virtual uint8_t icu::FilteredNormalizer2::getCombiningClass ( UChar32 c ) const

override virtual

Gets the combining class of c.

The default implementation returns 0 but all standard implementations return the Unicode Canonical_Combining_Class value.

Parameters

c	code point

Returns: c's combining class

Stable: ICU 49

Reimplemented from icu::Normalizer2.

◆ getDecomposition()

virtual UBool icu::FilteredNormalizer2::getDecomposition ( UChar32 c, UnicodeString &decomposition ) const

override virtual

Gets the decomposition mapping of c.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters

c	code point
decomposition	String object which will be set to c's decomposition mapping, if there is one.

Returns: true if c has a decomposition, otherwise false

Stable: ICU 4.6

Implements icu::Normalizer2.

◆ getRawDecomposition()

virtual UBool icu::FilteredNormalizer2::getRawDecomposition ( UChar32 c, UnicodeString &decomposition ) const

override virtual

Gets the raw decomposition mapping of c.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters

c	code point
decomposition	String object which will be set to c's raw decomposition mapping, if there is one.

Returns: true if c has a decomposition, otherwise false

Stable: ICU 49

Reimplemented from icu::Normalizer2.

◆ hasBoundaryAfter()

virtual UBool icu::FilteredNormalizer2::hasBoundaryAfter ( UChar32 c ) const

override virtual

Tests if the character always has a normalization boundary after it, regardless of context.

For details see the Normalizer2 base class documentation.

Parameters

c	character to test

Returns: true if c has a normalization boundary after it

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ hasBoundaryBefore()

virtual UBool icu::FilteredNormalizer2::hasBoundaryBefore ( UChar32 c ) const

override virtual

Tests if the character always has a normalization boundary before it, regardless of context.

For details see the Normalizer2 base class documentation.

Parameters

c	character to test

Returns: true if c has a normalization boundary before it

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ isInert()

virtual UBool icu::FilteredNormalizer2::isInert ( UChar32 c ) const

override virtual

Tests if the character is normalization-inert.

For details see the Normalizer2 base class documentation.

Parameters

c	character to test

Returns: true if c is normalization-inert

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ isNormalized()

virtual UBool icu::FilteredNormalizer2::isNormalized ( const UnicodeString &s, UErrorCode &errorCode ) const

override virtual

Tests if the string is normalized.

For details see the Normalizer2 base class documentation.

Parameters

s	input string
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: true if s is normalized

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ isNormalizedUTF8()

virtual UBool icu::FilteredNormalizer2::isNormalizedUTF8 ( StringPiece s, UErrorCode &errorCode ) const

override virtual

Tests if the UTF-8 string is normalized.

Internally, in cases where the quickCheck() method would return "maybe" (which is only possible for the two COMPOSE modes) this method resolves to "yes" or "no" to provide a definitive result, at the cost of doing more work in those cases.

This works for all normalization modes. It is optimized for UTF-8 for all built-in modes except for FCD. The base class implementation converts to UTF-16 and calls isNormalized().

Parameters

s	UTF-8 input string
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: true if s is normalized

Stable: ICU 60

Reimplemented from icu::Normalizer2.

◆ normalize()

virtual UnicodeString& icu::FilteredNormalizer2::normalize ( const UnicodeString &src, UnicodeString &dest, UErrorCode &errorCode ) const

override virtual

Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string.

The source and destination strings must be different objects.

Parameters

src	source string
dest	destination string; its contents are replaced with normalized src
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: dest

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ normalizeSecondAndAppend()

virtual UnicodeString& icu::FilteredNormalizer2::normalizeSecondAndAppend ( UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode ) const

override virtual

Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string.

The result is normalized if the first string was normalized. The first and second strings must be different objects.

Parameters

first	string, should be normalized
second	string, will be normalized
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: first

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ normalizeUTF8()

virtual void icu::FilteredNormalizer2::normalizeUTF8 ( uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode ) const

override virtual

Normalizes a UTF-8 string and optionally records how source substrings relate to changed and unchanged result substrings.

Implemented completely for all built-in modes except for FCD. The base class implementation converts to & from UTF-16 and does not support edits.

Parameters

options	Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src	Source UTF-8 string.
sink	A ByteSink to which the normalized UTF-8 result string is written. sink.Flush() is called at the end.
edits	Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be nullptr.
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Stable: ICU 60

Reimplemented from icu::Normalizer2.

◆ quickCheck()

virtual UNormalizationCheckResult icu::FilteredNormalizer2::quickCheck ( const UnicodeString &s, UErrorCode &errorCode ) const

override virtual

Tests if the string is normalized.

For details see the Normalizer2 base class documentation.

Parameters

s	input string
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: UNormalizationCheckResult

Stable: ICU 4.4

Implements icu::Normalizer2.

◆ spanQuickCheckYes()

virtual int32_t icu::FilteredNormalizer2::spanQuickCheckYes ( const UnicodeString &s, UErrorCode &errorCode ) const

override virtual

Returns the end of the normalized substring of the input string.

For details see the Normalizer2 base class documentation.

Parameters

s	input string
errorCode	Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)

Returns: "yes" span end index

Stable: ICU 4.4

Implements icu::Normalizer2.

The documentation for this class was generated from the following file:

common/unicode/normalizer2.h