Why Use ICU4J?
Summary
- Fully implements current standards
  - Unicode collation, normalization, break iteration
- Updated more frequently than Java
- Full CLDR Locale data
- Improved performance
Details
- Normalization
  - Addresses lack of Unicode normalization support in Java 5
  - Addresses outdated Unicode normalization support in Java 6
- Up-to-date Unicode version
  - Java 5 and 6 support Unicode 4.0, while ICU 4.0 supports Unicode 5.1
  - Characters added after Unicode 4.0 do not have character properties in Java
- IDNA and StringPrep
  - Addresses lack of Internationalized Domain Name support in Java 5
  - Adds generic StringPrep (RFC 3454) support, which is required for supporting various internet protocols (NFS, LDAP…)
- Collation
  - Provides Unicode standard compliant collation support
    - ICU Collator fully implements UTS #10 (the Unicode Collation Algorithm), while the Java implementation is outdated and not compatible
- Provides ICU UnicodeSet for easy character range validation
  - Much more flexible and convenient for validating identifiers/text tokens with a given syntax
  - Full boolean operations (union, intersection, difference)
  - All Unicode properties supported
- Locales
  - BCP47 (language tag) support in locale class (supporting “script”, 3-letter language codes, 3-digit region codes)
  - Locale data coverage: much broader, with many more locales and up-to-date data
- Broader charset converter coverage
  - In ICU4J 4.2, also output charset selection
  - Custom fallback in charset converter
- Other features missing in the JDK
  - Dates:
    - Many more date formats: month+day, year+month,…
    - Date interval formats: “Dec 15-17, 2009”
    - APIs for returning time zone transitions
  - Other formatting
    - Plural formatting, including units: “1 hour” / “2 hours”
    - Rule based number format (“three thousand two hundred”)
  - Extensive Non-Gregorian calendar support
  - Transliterator (for flexible text/script transformations)
  - Collation-sensitive string search
  - Same data as ICU4C, allowing same behavior across programming languages
  - All Unicode character properties: over 80, whereas Java provides access to only about 10
  - Thai word breaking
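To make the normalization item above concrete, here is a minimal sketch of what Unicode normalization does. It deliberately uses the JDK's own `java.text.Normalizer` (available from Java 6) so that it runs without extra dependencies. ICU4J ships its own normalization classes in the `com.ibm.icu.text` package, which are not shown here. The class name `NormalizationSketch` and the helper `canonicallyEqual` are hypothetical names chosen for illustration.

```java
import java.text.Normalizer;

// Hypothetical illustration class; not part of ICU4J or the JDK.
class NormalizationSketch {

    // Two strings are treated as equal if their NFC forms match.
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // "é" as a single code point
        String decomposed = "e\u0301";  // "e" followed by a combining acute accent

        // A plain comparison sees two different code unit sequences.
        System.out.println(precomposed.equals(decomposed));            // false
        // After normalization the two spellings compare equal.
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```

Without normalization, visually identical strings can fail equality checks, lookups, and duplicate detection, which is why the Unicode version behind the normalizer matters.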
Performance & Size
- Instantiation times are comparable
  - Common instantiate and reuse model
  - ICU4J and Java both use caches to limit impact
- Collation performance is many times faster
  - Sorting: 2 to 20 times faster
  - Sort key generation: 1.5 to 4 times faster
  - Sort key length: 2/3 to 1/4 the length of Java sort keys
- Property access is much faster (isLetter, isWhitespace,…)
- A scaled-down version is easy to produce (by removing data)
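The instantiate-and-reuse model and the sort key figures above can be sketched as follows. The example is written against the JDK's `java.text.Collator` and `java.text.CollationKey` so that it runs as-is. It assumes that the ICU4J classes of the same names in `com.ibm.icu.text` offer the same methods used here, so that changing the two imports would be the only edit needed. The class `SortKeySketch` and the method `sortWithKeys` are hypothetical names, and no timings are claimed for this snippet.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration class; swap the java.text imports for the
// com.ibm.icu.text ones to try the same code against ICU4J (assumption).
class SortKeySketch {

    static String[] sortWithKeys(String[] words, Locale locale) {
        // Instantiate the collator once and reuse it for every string.
        Collator collator = Collator.getInstance(locale);

        // Generate one sort key per string; keys compare bytewise,
        // so repeated comparisons during sorting stay cheap.
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey is Comparable

        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] sorted = sortWithKeys(
                new String[] {"Zebra", "apple", "banana"}, Locale.ENGLISH);
        System.out.println(Arrays.toString(sorted)); // [apple, banana, Zebra]
    }
}
```

Sort keys pay off when the same strings are compared many times, for example in a large sort or a database index. Shorter keys, as quoted in the list above, reduce both memory use and comparison cost.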
API
- Subclasses of JDK classes where possible
- Drop-in (change of import) if not
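As an illustration of the drop-in idea, the sketch below extracts words with a break iterator. It is written against the JDK's `java.text.BreakIterator` so that it runs without ICU4J on the classpath. Under the assumption that `com.ibm.icu.text.BreakIterator` exposes the same `getWordInstance`, `setText`, `first`, `next`, and `DONE` members, replacing the single import line would switch the code to ICU4J. The class `WordBreakSketch` and the method `words` are hypothetical names.

```java
import java.text.BreakIterator; // assumed drop-in: com.ibm.icu.text.BreakIterator
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of ICU4J or the JDK.
class WordBreakSketch {

    // Returns the segments between word boundaries that start with a
    // letter or digit, skipping punctuation and whitespace segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world", Locale.ENGLISH)); // [Hello, world]
    }
}
```

Because only the import differs, the rest of the calling code should stay unchanged, which keeps the cost of trying ICU4J on an existing code base low.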
Summary
- ICU4J is not for you if
  - you have tight size constraints
  - you require the Java runtime behavior
- ICU4J is for you if
  - you need full compliance with current standards
  - you need current or additional locale and property data
  - you need customizability
  - you need features missing from Java (normalization, collation,…)
  - you need better performance