ICU 76

ICU is the premier library for software internationalization, used by a wide array of companies and organizations.

Release Overview

ICU 76 updates to Unicode 16 (blog), including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations.

It also updates to CLDR 46 (beta blog) locale data with new locales, significant updates to existing locales, and various additions and corrections. For example, the CLDR and Unicode default sort orders are now very nearly the same.

Most of the java.time (Temporal) types can now be formatted directly using the existing ICU4J date/time formatting classes.

There are some new APIs to make ICU easier to use with modern C++ and Java patterns. Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs, and usable on top of binary stable C APIs, which is a first for ICU.

The Java and C++ technology preview implementations of the (also in tech preview) CLDR MessageFormat 2.0 specification have been updated to match recent changes.

ICU 76 and CLDR 46 are major releases, including a new version of Unicode and major locale data improvements.

For more details, including migration issues, see below.

For questions and problems, please use the icu-support mailing list and/or find or submit error reports.

Version Number

The initial release has library version number 76.1.

If there are maintenance releases, they will be 76.2, 76.3, etc. (During ICU 76 development, the library version number was 76.0.x.)

Note: There may be additional commits on the maint/maint-76 branch that are not included in the prepackaged download files.

Common Changes

  • Unicode 16 (blog):
    • Adds five modern-use scripts: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar
    • Adds two historic scripts & almost 4000 additional Egyptian Hieroglyphs
    • Seven new emoji characters
    • Over 700 symbols from legacy computing environments
    • ICU line breaking improvements have been upstreamed into UAX #14
    • ICU 76 adds support for the new UCD property Modifier_Combining_Mark for UAX #53 Arabic Mark Rendering
    • ICU 76 also adds support for the UCD property Indic_Conjunct_Break which was new in Unicode 15.1. (ICU-22503)
    • IDNA: The handling of UseSTD3ASCIIRules was simplified. Some existing characters changed from disallowed (when that was only for compatibility with long-obsolete IDNA2003) to valid.
  • CLDR 46 (beta blog):
    • Significant data updates across all locales
    • Locales which are now at modern coverage level: Nigerian Pidgin, Tigrinya
    • Locales which are now at moderate coverage level: Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
    • New measurement units “night” and “light-speed”
    • Note: ICU 76 does not yet support portion-per-1e9 (aka per-billion). (See ICU-22781)
    • MessageFormat 2.0 tech preview updates
    • Language matching: Dropped the fallback mapping desired="uk" → supported="ru" (so that Ukrainian (uk) doesn’t fall back to Russian (ru))
    • Collation: Significant changes to the CLDR root collation (CLDR default sort order)
      • Realigned With DUCET: The order of groups of characters which sort below letters is now the same. In both sort orders, non-decimal-digit numeric characters now sort after decimal digits, and the CLDR root collation no longer tailors any currency symbols (making some of them sort like letter sequences, as in the DUCET). These changes eliminate sort order differences among almost all regular characters between the CLDR root collation and the DUCET.
      • Improved Han Radical-Stroke Order: The CLDR radical-stroke order now matches that of the Unicode Radical-Stroke Index; traditional vs. simplified forms of radicals are now distinguished on a lower level than the number of residual strokes. In alphabetic indexes for radical-stroke sort orders, only the traditional forms of radicals are now available as index characters.
  • Time zone data (tzdata) version 2024b (2024-sep). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
    • The Asia/Almaty time zone has become an alias following IANA TZ database changes.
    • CLDR added support for deprecated timezone codes by remapping: CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York, MST7MDT → America/Denver, PST8PDT → America/Los_Angeles (These IANA TZ changes were motivated by CLDR, see CLDR-17111)

ICU4C Specific Changes

  • API changes since ICU4C 75 (Markdown) / (HTML)
    • A UnicodeString can now be converted to & from UTF-16 standard string_view types (std::u16string_view, and on Windows to/from std::wstring_view) and other UTF-16 types (string literals, standard string classes). Several other member functions have been widened to accept standard UTF-16 types as well. (ICU-22843)
    • New APIs for colloquial iteration over the elements of a C++ UnicodeSet or a C USet. (ICU-22876)
      • For details and an example see the “C++ Header-Only APIs” section of the Migration Issues below.
    • New APIs for colloquial use of C++ Collator / C UCollator with standard C++ algorithms (e.g., sort) & data structures (e.g., map). (ICU-22879) (The UCollator wrappers are also C++ header-only APIs.)
    • Note: Some APIs were changed to accept a wider range of input types than before. In the API change report, it looks as if the old, stable signatures were removed and the wider signatures were added as “born stable”. For example, several UnicodeString constructors that take a raw pointer have been replaced with a signature that accepts such raw pointers but also additional input types.
    • Note: Similarly, the API change report appears to show removal+addition of certain UnicodeString::remove() and UnicodeString::removeBetween() overloads, but only the expression of one of their default parameter values has changed.
    • Many changes for more robust string and memory handling.

ICU4J Specific Changes

  • API Changes since ICU4J 75
    • Most of the java.time (Temporal) types can now be formatted directly using the existing ICU4J date/time formatting classes. (ICU-22853)
    • New APIs for colloquial iteration over the elements of a UnicodeSet. In addition to the existing ranges(), strings(), and UnicodeSet-is-an-Iterable, there is a new codePoints() (returns an Iterable), and new methods that return Streams (e.g., codePointStream() & rangeStream()). (ICU-22845)

Known Issues

  • None yet

Migration Issues

IDNA Default Option Changed to Nontransitional Processing

After all major browsers had switched to nontransitional processing, Unicode 15.1 (a year ago) changed the UTS #46 spec to declare transitional processing deprecated.

ICU 76 changes the “DEFAULT” API constants from 0 to UIDNA_NONTRANSITIONAL_TO_ASCII | UIDNA_NONTRANSITIONAL_TO_UNICODE.

ICU 76 does not change the behavior of options value 0, because that would change the behavior of existing binaries that link with new ICU libraries. However, code that uses the DEFAULT constant and is recompiled against ICU 76 or later will pass the nontransitional option flags into the factory method.

See ICU-22294

SimpleNumber::truncateStart() Removed

ICU 75 renamed the still-draft SimpleNumber::truncateStart() to setMaximumIntegerDigits(). ICU 76 removes the never-stable, original function. Same for the C API usnum_truncateStart(). (ICU-22900)

C++ Header-Only APIs

ICU 76 is the first version where we add what we call C++ header-only APIs. These are especially intended for users who rely on only binary stable DLL/library exports of C APIs (C++ APIs cannot be binary stable).

Please test these new APIs and let us know if you find problems, especially if you find a platform/compiler/options combination where the call site does end up calling into ICU DLL/library exports.

Remember that regular C++ APIs can be hidden by callers defining U_SHOW_CPLUSPLUS_API=0. The new header-only APIs can be separately enabled via U_SHOW_CPLUSPLUS_HEADER_API=1.

(GitHub query for U_SHOW_CPLUSPLUS_HEADER_API in public header files)

These are C++ definitions that the ICU DLLs/libraries do not export, so they are inlined into the calling code. They may call ICU C APIs but not ICU C++ APIs that are not header-only.

The header-only APIs are defined in a nested header namespace. If entry point renaming is turned off (the main namespace is icu rather than icu_76 etc.), then the new U_HEADER_ONLY_NAMESPACE is icu::header.

(Link to the API proposal which introduced this concept)

For example, for iterating over the code point ranges in a USet (excluding the strings):

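#include <cstdio>
#include "unicode/uset.h"

U_NAMESPACE_USE
using U_HEADER_NESTED_NAMESPACE::USetRanges;

int main() {
    UErrorCode errorCode = U_ZERO_ERROR;
    LocalUSetPointer uset(uset_openPattern(u"[abcçカ🚴]", -1, &errorCode));
    if (U_FAILURE(errorCode)) { return 1; }
    // Iterate over the code point ranges as (start, end) pairs.
    for (auto [start, end] : USetRanges(uset.getAlias())) {
        printf("uset.range U+%04lx..U+%04lx\n", (long)start, (long)end);
    }
    // Iterate over every code point inside each range.
    for (auto range : USetRanges(uset.getAlias())) {
        for (UChar32 c : range) {
            printf("uset.range.c U+%04lx\n", (long)c);
        }
    }
    return 0;
}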

(Implementation note: On most platforms, when compiling ICU itself, the U_HEADER_ONLY_NAMESPACE is icu::internal, so that any such symbols that get exported differ from the ones that calling code sees. On Windows, where DLL exports are explicit, the namespace is always the same, but these header-only APIs are not marked for export.)

ICU4C Platform Support

ICU4C requires C++17 and has been tested with up to C++20.

We routinely test on recent versions of Linux, macOS, and Windows.

We accept patches for other platforms.

For ICU 76, we have received a contribution to make ICU4C work again on z/OS, using a newer (clang-based) compiler. (ICU-22714 icu/pull/3008 + ICU-22916 icu/pull/3208)

Windows: The minimum supported version is Windows 7. (See How To Build And Install On Windows for more details.)

ICU4J Platform Support

ICU4J works on Java 8..21 (at least).

ICU4J should work on Android API level 21 and later but may require “library desugaring”.

Download

Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-76-rc

See the Source Code Setup page for how to download the ICU file tree directly from GitHub.

ICU locale data was generated from CLDR data equivalent to:

  • https://github.com/unicode-org/cldr/releases/tag/release-46-beta3
  • https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta3

Maven dependency: TODO

<dependency>
  <groupId>com.ibm.icu</groupId>
  <artifactId>icu4j</artifactId>
  <version>76.1</version>
</dependency>