Why Use ICU4J?

Summary

  • Fully implements current standards
    • Unicode collation, normalization, break iteration
    • Updated more frequently than Java
    • Full CLDR Locale data
  • Improved performance

Details

  • Normalization
    • Addresses lack of Unicode normalization support in Java 5
    • Addresses outdated Unicode normalization support in Java 6
  • Up-To-Date Unicode version
    • Java 5 and 6 support Unicode 4.0, while ICU 4.0 supports Unicode 5.1
    • Characters added after Unicode 4.0 have no character properties in Java 5 or 6
  • IDNA and StringPrep
    • Addresses lack of Internationalized Domain Name support in Java 5
    • Adds generic StringPrep (RFC 3454) support, which is required by various internet protocols (NFS, LDAP…)
  • Collation
    • Provides Unicode standard compliant collation support
    • The ICU Collator fully implements UTS #10 (the Unicode Collation Algorithm), while the Java implementation is outdated and incompatible with it
  • Provides ICU UnicodeSet for easy character range validation
    • much more flexible and convenient for validating identifiers/text tokens with a given syntax
    • full boolean operations (union, intersection, difference)
    • all Unicode properties supported
  • Locales
    • BCP 47 (language tag) support in the locale class (supporting “script”, 3-letter language codes, 3-digit region codes)
    • Much better locale data coverage: many more locales, kept up to date
  • Broader charset converter coverage
    • ICU4J 4.2 also adds output charset selection
    • Custom fallback in charset converter
  • Other features missing in the JDK
    • Dates:
      • Many more date formats: month+day, year+month,…
      • Date interval formats: “Dec 15-17, 2009”
      • APIs for returning time zone transitions
    • Other formatting
      • Plural formatting, including units: “1 hour” / “2 hours”
      • Rule based number format (“three thousand two hundred”)
      • Extensive Non-Gregorian calendar support
    • Transliterator (for flexible text/script transformations)
    • Collation-sensitive string search
    • Same data as ICU4C, allowing same behavior across programming languages
    • All Unicode character properties (over 80); Java provides access to only about 10
    • Thai word breaking
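
The normalization point above can be illustrated with a minimal sketch. It uses only the JDK's java.text.Normalizer (available from Java 6) so that it runs without extra dependencies; ICU4J's counterpart is com.ibm.icu.text.Normalizer, which is not exercised here. The names NormalizationDemo and canonicallyEqual are hypothetical and chosen for illustration only.

```java
import java.text.Normalizer;

final class NormalizationDemo {

    // Returns true if both strings are identical after NFC normalization,
    // i.e. they are canonically equivalent.
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // U+00E9, e with acute as one code point
        String decomposed = "e\u0301";  // 'e' followed by U+0301 combining acute

        // A plain comparison sees two different code point sequences.
        System.out.println(precomposed.equals(decomposed));           // prints false
        // After normalization the two spellings compare as equal.
        System.out.println(canonicallyEqual(precomposed, decomposed)); // prints true
    }
}
```

Results for characters added in later Unicode versions depend on the Unicode version the normalizer implements, which is where the version gap described above matters.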

Performance & Size

  • Instantiation times are comparable
    • Common instantiate and reuse model
    • ICU4J and Java both use caches to limit impact
  • Collation performance many times faster
    • sorting: 2 to 20 times faster
    • sort key generation: 1.5 to 4 times faster
    • sort key length: 2/3 to 1/4 the length of Java sort keys
  • Property access much faster (isLetter, isWhitespace,…)
  • A scaled-down version can easily be produced by removing data
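
The instantiate-and-reuse model and the sort keys mentioned above can be sketched as follows. The sketch uses java.text.Collator and java.text.CollationKey from the JDK so that it runs as is; the ICU4J classes com.ibm.icu.text.Collator and com.ibm.icu.text.CollationKey are assumed to be usable in the same way, but that is not verified here. CollatorReuse and sortWithKeys are hypothetical names.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class CollatorReuse {

    // Sorts the words by first converting each one to a sort key.
    // The collator is created once and reused for every word.
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Generate one sort key per word; key comparison is then a cheap
        // byte-wise comparison instead of a full collation comparison.
        List<CollationKey> keys = new ArrayList<CollationKey>(words.size());
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        Collections.sort(keys);

        // Map the sorted keys back to their source strings.
        List<String> sorted = new ArrayList<String>(keys.size());
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "cherry");
        System.out.println(sortWithKeys(words, Locale.ENGLISH));
    }
}
```

Sort keys pay off when the same strings are compared many times, for example when sorting a large list; for a single comparison, calling compare on the collator directly is simpler.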

API

  • ICU4J classes subclass the corresponding JDK classes where possible
  • Where they cannot, they are drop-in replacements (only the import changes)
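
A minimal sketch of the change-of-import idea, written against the JDK so that it runs without ICU4J on the classpath. The commented-out import shows the assumed ICU4J replacement, com.ibm.icu.text.NumberFormat; DropInFormatting and formatAmount are hypothetical names.

```java
import java.text.Format;
import java.text.NumberFormat;
// Assumed ICU4J swap: replace the NumberFormat import above with
// import com.ibm.icu.text.NumberFormat;
import java.util.Locale;

final class DropInFormatting {

    // Formats a number for the given locale. The formatter is held as a
    // java.text.Format, so this method body would not need to change if the
    // imported NumberFormat were a subclass of the JDK Format class.
    static String formatAmount(double value, Locale locale) {
        Format format = NumberFormat.getInstance(locale);
        return format.format(Double.valueOf(value));
    }

    public static void main(String[] args) {
        System.out.println(formatAmount(1234.5, Locale.US)); // prints 1,234.5
    }
}
```

Typing variables against the JDK base class where one exists keeps the calling code independent of which implementation the import selects.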

Summary

  • ICU4J is not for you if

    • you have tight size constraints
    • you require behavior identical to the Java runtime's
  • ICU4J is for you if

    • you need full compliance with current standards
    • you need current or additional locale and property data
    • you need customizability
    • you need features missing from Java (normalization, collation,…)
    • you need better performance