Unicode Technical Standard #35

Unicode Locale Data Markup Language (LDML)
Part 9: MessageFormat

Version	49 (draft)
Editors	Eemeli Aro, Addison Phillips and other CLDR committee members

For the full header, summary, and status, see Part 1: Core.

Summary

This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages.

This is a partial document, describing only those parts of the LDML that are relevant for message format. For the other parts of the LDML see the main LDML document and the links above.

Status

This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

Please submit corrigenda and other comments with the CLDR bug reporting form [Bugs]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For more information see About Unicode Technical Reports and the Specifications FAQ. Unicode Technical Reports are governed by the Unicode Terms of Use.

Parts

The LDML specification is divided into the following parts:

Part 1: Core (languages, locales, basic structure)
Part 2: General (display names & transforms, etc.)
Part 3: Numbers (number & currency formatting)
Part 4: Dates (date, time, time zone formatting)
Part 5: Collation (sorting, searching, grouping)
Part 6: Supplemental (supplemental data)
Part 7: Keyboards (keyboard mappings)
Part 8: Person Names (person names)
Part 9: MessageFormat (message format)
Appendix A: Modifications
Appendix B: Acknowledgments

Contents of Part 9, MessageFormat

Introduction
Syntax
message.abnf
Formatting
Errors
Default Functions
Unicode Namespace
- Unicode Namespace Options
  - u:id
  - u:dir
Interchange Data Model
Appendices

Introduction

One of the challenges in adapting software to work for users with different languages and cultures is the need for dynamic messages. Whenever a user interface needs to present data as part of a larger string, that data needs to be formatted (and the message may need to be altered) to make it culturally accepted and grammatically correct.

For example, if your US English (en-US) interface has a message like:

Your item had 1,023 views on April 3, 2023

You want the translated message to be appropriately formatted into French:

Votre article a eu 1 023 vues le 3 avril 2023

Or Japanese:

あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。

This specification defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages and APIs. This will enable the integration of existing internationalization APIs (such as the date and number formats shown above), grammatical matching (such as plurals or genders), as well as user-defined formats and message selectors.

This specification is the successor to ICU MessageFormat, referred to below as ICU MessageFormat 1.0.

Conformance

Everything in this specification is normative except for: sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Terminology and Conventions

A term looks like this when it is defined in this specification.

A reference to a term looks like this.

Examples are non-normative and styled like this.

Important

Text marked "Important" like this is normative.

Note

Notes are non-normative.

Stability Policy

Updates to this specification will not make any valid message become not valid.

Updates to this specification will not specify an error for any message that previously did not specify an error.

Updates to this specification will not specify the use of a fallback value for any message that previously did not specify a fallback value.

Updates to this specification will not change the syntactical meaning of any syntax defined in this specification.

Updates to this specification will not remove any default functions.

Updates to this specification will not remove any options or option values defined for default functions.

Important

Functions that are not marked Draft are Stable and subject to the provisions of this stability policy.

Functions or options marked as Draft are not stable. Their names, operands, options, option values, and other requirements might change or be removed before being declared Stable in a future release.

Note

The foregoing policies are not a guarantee that the results of formatting will never change. Even when neither this specification nor its implementation changes, the function handlers for date formatting, number formatting, and so on can change their results over time or behave differently due to local runtime differences in implementation or changes to locale data (such as the release of new CLDR versions).

Updates to this specification will only reserve, define, or require identifiers which are reserved identifiers.

Future versions of this specification will not introduce changes to the data model that would result in a data model representation based on this version being invalid.

For example, existing interfaces or fields will not be removed.

Important

This stability policy allows changes such as the following in future versions of this specification (the list is not exhaustive):

Future versions may define new syntax and structures that would not be supported by this version of the specification.
Future versions may add additional structure or meaning to existing syntax.
Future versions may define new keywords.
Future versions may make previously invalid messages valid.
Future versions may define additional default functions or may reserve the names of functions for the purposes of interoperability.
Future versions may define additional options to existing functions.
Future versions may define additional option values for existing options.
Future versions may deprecate (but not remove) keywords, functions, options, or option values.
Future versions of this specification may introduce changes to the data model that would result in future data model representations not being valid for implementations of this version of the data model.
- For example, a future version could introduce a new keyword, whose data model representation would be a new interface that is not recognized by this version's data model.

Syntax

This section defines the formal grammar describing the syntax of a single message.

Design Goals

This section is non-normative.

The design goals of the syntax specification are as follows:

The syntax should leverage the familiarity with ICU MessageFormat 1.0 in order to lower the barrier to entry and increase the chance of adoption. At the same time, the syntax should fix the pain points of ICU MessageFormat 1.0.
- Non-Goal: Be backwards-compatible with the ICU MessageFormat 1.0 syntax.
The syntax inside translatable content should be easy to understand for humans. This includes making it clear which parts of the message body are translatable content, which parts inside it are placeholders for expressions, as well as making the selection logic predictable and easy to reason about.
- Non-Goal: Make the syntax intuitive enough for non-technical translators to hand-edit. Instead, we assume that most translators will work with MessageFormat by means of GUI tooling, CAT workbenches etc.
The syntax surrounding translatable content should be easy for developers and localization engineers to write and edit, and easy for machines to parse.
The syntax should make a single message easily embeddable inside many container formats: .properties, YAML, XML, inlined as string literals in programming languages, etc. This includes a future MessageResource specification.
- Non-Goal: Support unnecessary escape sequences, which would themselves require additional escaping when embedded. Instead, we tolerate direct use of nearly all characters (including line breaks, control characters, etc.) and rely upon escaping in those outer formats to aid human comprehension (e.g., depending upon container format, a U+000A LINE FEED might be represented as \n, \012, \x0A, \u000A, \U0000000A, &#xA;, &NewLine;, %0A, <LF>, or something else entirely).

Design Restrictions

This section is non-normative.

The syntax specification takes into account the following design restrictions:

Whitespace outside the translatable content should be insignificant. It should be possible to define a message entirely on a single line with no ambiguity, as well as to format it over multiple lines for clarity.
The syntax should define as few special characters and sigils as possible. Note that this necessitates extra care when presenting messages for human consumption, because they may contain invisible characters such as U+200B ZERO WIDTH SPACE, control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10), private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD), unassigned code points, unpaired surrogates (U+D800 through U+DFFF), and other potentially confusing content.
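
The following non-normative sketch illustrates one way a tool might present such content to a human reader. It replaces code points in the general categories Cc, Cf, Co, Cs, and Cn with a visible tag. The function name `reveal`, the choice of categories, and the `[U+XXXX]` notation are assumptions made for this example and are not defined by this specification.

```python
import unicodedata

# General categories that are invisible or easily misread when shown raw:
# Cc control, Cf format, Co private use, Cs surrogate,
# Cn unassigned (which includes noncharacters).
HIDDEN_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn"}


def reveal(message: str) -> str:
    """Return a display form of message with hard-to-see code points spelled out."""
    out = []
    for ch in message:
        if unicodedata.category(ch) in HIDDEN_CATEGORIES:
            # Hypothetical notation chosen for this sketch only.
            out.append(f"[U+{ord(ch):04X}]")
        else:
            out.append(ch)
    return "".join(out)
```

A display form like this is meant for inspection only. The message itself is stored and processed unchanged.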

Messages and their Syntax

The purpose of MessageFormat is to allow content to vary at runtime. This variation might come from placing a value into the content, from selecting a different piece of content based on some data value, or from a combination of the two.

MessageFormat calls the template for a given formatting operation a message.

The values passed in at runtime (which are to be placed into the content or used to select between different content items) are called external variables. The author of a message can also assign local variables, including variables that modify external variables.

This part of the MessageFormat specification defines the syntax for a message, along with the concepts and terminology needed when processing a message during the formatting of a message at runtime.

The complete formal syntax of a message is described by the ABNF.

Well-formed vs. Valid Messages

A message is well-formed if it satisfies all the rules of the grammar. Attempting to parse a message that is not well-formed will result in a Syntax Error.

A message is valid if it is well-formed and also meets the additional content restrictions and semantic requirements about its structure defined below for declarations, matcher, and options. Attempting to parse a message that is not valid will result in a Data Model Error.

The Message

A message is the complete template for a specific message formatting request.

A variable is a name associated with a resolved value.

An external variable is a variable whose name and initial value are supplied by the caller to MessageFormat or available in the formatting context. Only an external variable can appear as an operand in an input declaration.

A local variable is a variable created as the result of a local declaration.

Note

This syntax is designed to be embeddable into many different programming languages and formats. As such, it avoids constructs, such as character escapes, that are specific to any given file format or processor. In particular, it avoids using quote characters common to many file formats and formal languages so that these do not need to be escaped in the body of a message.

Note

Text and quoted literals allow unpaired surrogate code points (U+D800 to U+DFFF). This is for compatibility with formats or data structures that use the UTF-16 encoding and do not check for unpaired surrogates. (Strings in Java or JavaScript are examples of this.) Unpaired surrogate code points are likely an indication of mistakes or errors in the creation, serialization, or processing of the message. Many processes will convert them to � U+FFFD REPLACEMENT CHARACTER during processing or display. Implementations not based on UTF-16 might not be able to represent a message containing such code points.

Note

In general (and except where required by the syntax), whitespace carries no meaning in the structure of a message. While many of the examples in this spec are written on multiple lines, the formatting shown is primarily for readability.

Example This message:
.local $foo   =   { |horse| }
{{You have a {$foo}!}}
Can also be written as:
.local $foo={|horse|}{{You have a {$foo}!}}
An exception to this is: whitespace inside a pattern is always significant.

Note

The MessageFormat syntax assumes that each message will be displayed with a left-to-right display order and be processed in the logical character order. The syntax permits the use of right-to-left characters in identifiers, literals, and other values. This can cause confusion when viewing the message, and users might incorrectly insert bidi controls or marks that negatively affect the output of the message.

To assist with this, the syntax permits the use of various controls and strongly-directional markers in both optional and required whitespace in a message, as well as encouraging the use of isolating controls with expressions and quoted patterns. See: whitespace (below) for more information.

A message can be a simple message or it can be a complex message.

message = simple-message / complex-message

A simple message contains a single pattern, with restrictions on its first non-whitespace character. An empty string is a valid simple message.

Whitespace at the start or end of a simple message is significant, and a part of the text of the message.

simple-message = o [simple-start pattern]
simple-start   = simple-start-char / escaped-char / placeholder

A complex message is any message that contains declarations, a matcher, or both. A complex message always begins with either a keyword that has a . prefix or a quoted pattern and consists of:

an optional list of declarations, followed by
a complex body

Whitespace at the start or end of a complex message is not significant, and does not affect the processing of the message.

complex-message = o *(declaration o) complex-body o
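
As a non-normative illustration, the distinction between the two message forms can be decided from the first non-whitespace characters alone. The sketch below is a classifier, not a parser: it does not check that the message is well-formed. The whitespace set follows the code points omitted from simple-start-char. The bidi set is an assumption taken from the full ABNF, which is not reproduced in this section, and the name `message_kind` is hypothetical.

```python
# Whitespace code points: SP, HTAB, CR, LF, IDEOGRAPHIC SPACE.
WS = " \t\r\n\u3000"
# Assumed set of bidi marks and isolates allowed in optional whitespace.
BIDI = "\u061c\u200e\u200f\u2066\u2067\u2068\u2069"


def message_kind(source: str) -> str:
    """Classify a message as 'simple' or 'complex' without validating it."""
    rest = source.lstrip(WS + BIDI)
    # A complex message begins with a "."-prefixed keyword or a quoted pattern.
    if rest.startswith(".") or rest.startswith("{{"):
        return "complex"
    return "simple"
```

Note that a message beginning with a single `{` is classified as simple, because a simple message can start with a placeholder.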

Declarations

A declaration binds a variable identifier to a value within the scope of a message. This variable can then be used in other expressions within the same message. Declarations are optional: many messages will not contain any declarations.

An input-declaration binds a variable to an external input value. The variable-expression of an input-declaration MAY include a function that is applied to the external value.

A local-declaration binds a variable to the resolved value of an expression.

declaration       = input-declaration / local-declaration
input-declaration = input o variable-expression
local-declaration = local s variable o "=" o expression

Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid and will produce a Duplicate Declaration error during processing:

A declaration MUST NOT bind a variable that appears as a variable anywhere within a previous declaration.
An input-declaration MUST NOT bind a variable that appears anywhere within the function of its variable-expression.
A local-declaration MUST NOT bind a variable that appears in its expression.

A local-declaration MAY overwrite an external input value as long as the external input value does not appear in a previous declaration.
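
The three rules above can be checked in a single pass over the declarations. The following non-normative sketch assumes a simplified, hypothetical representation in which each declaration records the variable it binds and the set of other variables it mentions. For an input-declaration, that set holds the variables used in the options of its function. For a local-declaration, it holds every variable in its expression. The names `Declaration` and `find_duplicate_declarations` are not part of this specification.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    kind: str                               # "input" or "local"
    name: str                               # variable being bound, without "$"
    refs: set = field(default_factory=set)  # other variables mentioned (see lead-in)


def find_duplicate_declarations(declarations):
    """Return the names whose binding would be a Duplicate Declaration error."""
    seen = set()   # every variable that appears in an earlier declaration
    errors = []
    for decl in declarations:
        # Rule 1: the name appears anywhere in a previous declaration.
        # Rules 2 and 3: the name appears in this declaration's own function or expression.
        if decl.name in seen or decl.name in decl.refs:
            errors.append(decl.name)
        seen.add(decl.name)
        seen |= decl.refs
    return errors
```

For example, `.local $a = {$b :number}` followed by `.input {$b}` is reported, because `$b` already appears in the earlier declaration.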

Note

These restrictions only apply to declarations. A placeholder can apply a different function to a variable than one applied to the same variable named in a declaration. For example, this message is valid:

.input {$var :number maximumFractionDigits=0}
.local $var2 = {$var :number maximumFractionDigits=2}
.match $var2
0 {{The selector can apply a different function to {$var} for the purposes of selection}}
* {{A placeholder in a pattern can apply a different function to {$var :number maximumFractionDigits=3}}}

(See the Errors section for examples of invalid messages)

Complex Body

The complex body of a complex message is the part that will be formatted. The complex body consists of either a quoted pattern or a matcher.

complex-body = quoted-pattern / matcher

Pattern

A pattern contains a sequence of text and placeholders to be formatted as a unit. Unless there is an error, resolving a message always results in the formatting of a single pattern.

pattern = *(text-char / escaped-char / placeholder)

A pattern MAY be empty.

A pattern MAY contain an arbitrary number of placeholders to be evaluated during the formatting process.

Quoted Pattern

A quoted pattern is a pattern that is "quoted" to prevent interference with other parts of the message. A quoted pattern starts with a sequence of two U+007B LEFT CURLY BRACKET {{ and ends with a sequence of two U+007D RIGHT CURLY BRACKET }}.

quoted-pattern = "{{" pattern "}}"

A quoted pattern MAY be empty.

An empty quoted pattern:
{{}}

Text

text is the translatable content of a pattern. Any Unicode code point is allowed, except for U+0000 NULL.

The characters U+005C REVERSE SOLIDUS \, U+007B LEFT CURLY BRACKET {, and U+007D RIGHT CURLY BRACKET } MUST be escaped as \\, \{, and \} respectively.

In the ABNF, text is represented by non-empty sequences of simple-start-char, text-char, escaped-char, and s. The production simple-start-char represents the first non-whitespace character in a simple message and matches text-char except that it does not allow U+002E FULL STOP ..

Whitespace in text, including tabs, spaces, and newlines is significant and MUST be preserved during formatting.

simple-start-char = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { (%x7B)
                  / %x7C           ; omit } (%x7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-10FFFF
text-char         = %x01-5B        ; omit NULL (%x00) and \ (%x5C)
                  / %x5D-7A        ; omit { (%x7B)
                  / %x7C           ; omit } (%x7D)
                  / %x7E-10FFFF
quoted-char       = %x01-5B        ; omit NULL (%x00) and \ (%x5C)
                  / %x5D-7B        ; omit | (%x7C)
                  / %x7D-10FFFF
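
As a non-normative illustration, the escaping rule for text and the text-char production can be expressed as follows. The function names are hypothetical. The sketch covers only text inside a pattern and does not handle the additional restriction on the first character of a simple message.

```python
def escape_text(raw: str) -> str:
    """Escape literal content so that it can be used as text in a pattern."""
    # Replace the backslash first so that the backslashes added for
    # the curly brackets are not themselves doubled.
    return raw.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def is_text_char(ch: str) -> bool:
    """True if the single code point ch matches the text-char production."""
    cp = ord(ch)
    # text-char allows everything except NULL, "\", "{", and "}".
    return cp != 0x00 and cp not in (0x5C, 0x7B, 0x7D)
```

After escaping, every remaining backslash, `{`, or `}` in the result is part of an escape sequence, so the result contains no unintended placeholders.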

Note

Unpaired surrogate code points (U+D800 through U+DFFF inclusive) are allowed for compatibility with UTF-16 based implementations that do not check for this encoding error.

When a pattern is quoted by embedding the pattern in curly brackets, the resulting message can be embedded into various formats regardless of the container's whitespace trimming rules. Otherwise, care must be taken to ensure that pattern-significant whitespace is preserved.

Example In a Java .properties file, the values hello and hello2 both contain an identical message which consists of a single pattern. This pattern consists of text with exactly three spaces before and after the word "Hello":
hello = {{   Hello   }}
hello2=\   Hello  \

Placeholder

A placeholder is an expression or markup that appears inside of a pattern and which will be replaced during the formatting of a message.

placeholder = expression / markup

Matcher

A matcher is the complex body of a message that allows runtime selection of the pattern to use for formatting. This allows the form or content of a message to vary based on values determined at runtime.

A matcher consists of the keyword .match followed by at least one selector and at least one variant.

When the matcher is processed, the result will be a single pattern that serves as the template for the formatting process.

A message can only be considered valid if the following requirements are satisfied; otherwise, a corresponding Data Model Error will be produced during processing:

Variant Key Mismatch: The number of keys on each variant MUST be equal to the number of selectors.
Missing Fallback Variant: At least one variant MUST exist whose keys are all equal to the "catch-all" key *.
Missing Selector Annotation: Each selector MUST be a variable that directly or indirectly references a declaration with a function.
Duplicate Variant: Each variant MUST use a list of keys that is unique from that of all other variants in the message. Literal keys are compared by their string values, not their syntactical appearance.

matcher         = match-statement s variant *(o variant)
match-statement = match 1*(s selector)
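
The following non-normative sketch checks three of the requirements above: Variant Key Mismatch, Missing Fallback Variant, and Duplicate Variant. It assumes a hypothetical representation in which each variant is given as a tuple of keys, a literal key is its string value, and the catch-all key is the sentinel `CATCH_ALL`. Using a sentinel keeps the catch-all key distinct from a quoted literal whose value is `*`. Missing Selector Annotation is not covered, because it depends on the declarations.

```python
import unicodedata

CATCH_ALL = object()  # stands for the "*" key; distinct from the literal |*|


def check_matcher(selector_count, variants):
    """Return the names of the Data Model Errors found in a matcher."""
    errors = []
    seen = set()
    has_fallback = False
    for keys in variants:
        if len(keys) != selector_count:
            errors.append("Variant Key Mismatch")
        # Literal keys are compared by string value, as if in NFC.
        normalized = tuple(
            k if k is CATCH_ALL else unicodedata.normalize("NFC", k) for k in keys
        )
        if normalized in seen:
            errors.append("Duplicate Variant")
        seen.add(normalized)
        if keys and all(k is CATCH_ALL for k in keys):
            has_fallback = True
    if not has_fallback:
        errors.append("Missing Fallback Variant")
    return errors
```

Because keys are normalized before comparison, two variants whose keys differ only in normalization form are reported as duplicates.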

A message with a matcher:

.input {$count :number}
.match $count
one {{You have {$count} notification.}}
*   {{You have {$count} notifications.}}

A message containing a matcher formatted on a single line:
.local $os = {:platform} .match $os windows {{Settings}} * {{Preferences}}

Selector

A selector is a variable whose resolved value ranks or excludes the variants based on the value of the corresponding key in each variant. The combination of selectors in a matcher thus determines which pattern will be used during formatting.

selector = variable

There MUST be at least one selector in a matcher. There MAY be any number of additional selectors.

A message with a single selector that uses a custom function :ns:hasCase, which allows the message to choose a pattern based on grammatical case:
.local $hasCase = {$userName :ns:hasCase}
.match $hasCase
vocative {{Hello, {$userName :ns:person case=vocative}!}}
accusative {{Please welcome {$userName :ns:person case=accusative}!}}
* {{Hello!}}

A message with two selectors:

.input {$numLikes :integer}
.input {$numShares :integer}
.match $numLikes $numShares
0   0   {{Your item has no likes and has not been shared.}}
0   one {{Your item has no likes and has been shared {$numShares} time.}}
0   *   {{Your item has no likes and has been shared {$numShares} times.}}
one 0   {{Your item has {$numLikes} like and has not been shared.}}
one one {{Your item has {$numLikes} like and has been shared {$numShares} time.}}
one *   {{Your item has {$numLikes} like and has been shared {$numShares} times.}}
*   0   {{Your item has {$numLikes} likes and has not been shared.}}
*   one {{Your item has {$numLikes} likes and has been shared {$numShares} time.}}
*   *   {{Your item has {$numLikes} likes and has been shared {$numShares} times.}}

Variant

A variant is a quoted pattern associated with a list of keys in a matcher. Each variant MUST begin with a sequence of keys, and terminate with a valid quoted pattern. The number of keys in each variant MUST match the number of selectors in the matcher.

Keys are separated from each other by whitespace. Whitespace is permitted but not required between the last key and the quoted pattern.

variant = key *(s key) o quoted-pattern
key     = literal / "*"

Key

A key is a value in a variant for use by a selector when ranking or excluding variants during the matcher process. A key can be either a literal value or the "catch-all" key *.

The catch-all key is a special key, represented by *, that matches all values for a given selector.

Note

To represent a key consisting of the character * U+002A ASTERISK, use a quoted literal:

.input {$value :string}
.match $value
|*| {{Matches the string *}}
*   {{Matches any other string}}

The value of each literal key MUST be treated as if it were in Unicode Normalization Form C ("NFC"). Two literal keys are considered equal if their string values are canonically equivalent strings, that is, if they consist of the same sequence of Unicode code points after Unicode Normalization Form C has been applied to both.

Expressions

An expression is a part of a message that will be determined during the message's formatting.

An expression MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }. An expression MUST NOT be empty. An expression cannot contain another expression. An expression MAY contain one or more attributes.

A literal-expression contains a literal, optionally followed by a function.

A variable-expression contains a variable, optionally followed by a function.

A function-expression contains a function without an operand.

expression          = literal-expression
                    / variable-expression
                    / function-expression
literal-expression  = "{" o literal [s function] *(s attribute) o "}"
variable-expression = "{" o variable [s function] *(s attribute) o "}"
function-expression = "{" o function *(s attribute) o "}"

There are several types of expression that can appear in a message. All expressions share a common syntax. The types of expression are:

The value of a local-declaration
A kind of placeholder in a pattern

Additionally, an input-declaration can contain a variable-expression.

Examples of different types of expression

Declarations:

.input {$x :ns:func option=value}
.local $y = {|This is an expression|}

Placeholders:

This placeholder contains a literal expression: {|literal|}
This placeholder contains a variable expression: {$variable}
This placeholder references a function on a variable: {$variable :ns:func with=options}
This placeholder contains a function expression with a variable-valued option: {:ns:func option=$variable}
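
As a non-normative illustration, the kind of a placeholder can be read from the first character that follows the opening curly bracket and any optional whitespace. The sketch below is a classifier only and does not validate the rest of the placeholder. The name `placeholder_kind` is hypothetical, and the bidi set is an assumption taken from the full ABNF.

```python
# Whitespace code points: SP, HTAB, CR, LF, IDEOGRAPHIC SPACE.
WS = " \t\r\n\u3000"
# Assumed set of bidi marks and isolates allowed in optional whitespace.
BIDI = "\u061c\u200e\u200f\u2066\u2067\u2068\u2069"


def placeholder_kind(source: str) -> str:
    """Classify the placeholder written in source, including its curly brackets."""
    if not (source.startswith("{") and source.endswith("}")):
        raise ValueError("a placeholder must begin with { and end with }")
    body = source[1:-1].strip(WS + BIDI)
    if not body:
        raise ValueError("an expression must not be empty")
    first = body[0]
    if first == "$":
        return "variable-expression"
    if first == ":":
        return "function-expression"
    if first in "#/":
        return "markup"
    # Anything else has to be a quoted or unquoted literal.
    return "literal-expression"
```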

Operand

An operand is the literal of a literal-expression or the variable of a variable-expression.

Function

A function is named functionality in an expression. Functions are used to evaluate, format, select, or otherwise process data values during formatting.

A function can appear in an expression by itself or following a single operand. When following an operand, the operand serves as input to the function.

The resolution of a function relies on an implementation-defined function handler. Some functions can be used both as a selector as well as in a placeholder; others are only valid in one of these positions. Functions also differ in their requirements on the operand and options that they accept. See Function Resolution and Default Functions for more information.

A function starts with a prefix sigil : followed by an identifier. The identifier MAY be followed by one or more options. Options are not required.

function = ":" identifier *(s option)

A message with a function operating on the variable $now:
It is now {$now :datetime}.

Options

An option is a key-value pair containing a named argument that is passed to a function.

An option has an identifier and an option value. The identifier is separated from the option value by a U+003D EQUALS SIGN = along with optional whitespace. The option value can be either a literal or a variable.

Multiple options are permitted in a function. Options are separated from the preceding function identifier and from each other by whitespace. Each option's identifier MUST be unique within the function: a function with duplicate option identifiers is not valid and will produce a Duplicate Option Name error during processing.

The order of options is not significant.

option = identifier o "=" o (literal / variable)
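
The uniqueness requirement can be checked as sketched below. This non-normative example assumes that the options of one function are available as a list of (identifier, value) pairs in source order, and it compares identifiers after applying NFC, following the rule for name equality. The function name `duplicate_option_names` is hypothetical.

```python
import unicodedata


def duplicate_option_names(options):
    """Return the option identifiers that occur more than once in a function."""
    seen = set()
    duplicates = []
    for name, _value in options:
        key = unicodedata.normalize("NFC", name)
        # Report each repeated identifier once, in order of first repetition.
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

A non-empty result corresponds to a Duplicate Option Name error for that function.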

Examples of functions with options

A message using the :date function. The option length has the literal long as its value:
Today is {$now :date length=long}!

A message using the :date function. The option length has a variable $dateLength as its value:
Today is {$now :date length=$dateLength}!

Markup

Markup placeholders are pattern parts that can be used to represent non-language parts of a message, such as inline elements or styling that should apply to a span of parts.

Markup MUST begin with U+007B LEFT CURLY BRACKET { and end with U+007D RIGHT CURLY BRACKET }. Markup MAY contain one or more attributes.

Markup comes in three forms:

Markup-open starts with U+0023 NUMBER SIGN # and represents an opening element within the message, such as markup used to start a span. It MAY include options.

Markup-standalone starts with U+0023 NUMBER SIGN # and has a U+002F SOLIDUS / immediately before its closing } representing a self-closing or standalone element within the message. It MAY include options.

Markup-close starts with U+002F SOLIDUS / and is a pattern part ending a span.

markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}"  ; open and standalone
       / "{" o "/" identifier *(s option) *(s attribute) o "}"  ; close

A message with one button markup span and a standalone img markup element:
{#button}Submit{/button} or {#img alt=Cancel src=|../cancel.jpg| /}.

A message containing markup that uses options to pair two closing markup placeholders to the one open markup placeholder:
{#ansi attr=|bold,italic|}Bold and italic{/ansi attr=bold} italic only {/ansi attr=italic} no formatting.

A markup-open can appear without a corresponding markup-close. A markup-close can appear without a corresponding markup-open. Markup placeholders can appear in any order without making the message invalid. However, specifications or implementations defining markup might impose requirements on the pairing, ordering, or contents of markup during formatting.
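
As a non-normative illustration, the three forms of markup can be told apart by the sigil after the opening curly bracket and by whether a U+002F SOLIDUS immediately precedes the closing curly bracket. The sketch below classifies a markup placeholder without validating its identifier, options, or attributes. The name `markup_kind` is hypothetical, and the bidi set is an assumption taken from the full ABNF.

```python
# Whitespace code points: SP, HTAB, CR, LF, IDEOGRAPHIC SPACE.
WS = " \t\r\n\u3000"
# Assumed set of bidi marks and isolates allowed in optional whitespace.
BIDI = "\u061c\u200e\u200f\u2066\u2067\u2068\u2069"


def markup_kind(source: str) -> str:
    """Classify a markup placeholder as 'open', 'standalone', or 'close'."""
    if not (source.startswith("{") and source.endswith("}")):
        raise ValueError("markup must begin with { and end with }")
    body = source[1:-1].lstrip(WS + BIDI)
    if body.startswith("/"):
        return "close"
    if body.startswith("#"):
        # The "/" of a standalone element sits directly before the closing "}".
        return "standalone" if source.endswith("/}") else "open"
    raise ValueError("not a markup placeholder")
```

A quoted option value that ends in a solidus, as in `{#a href=|x/|}`, is still classified as open, because the character before the closing bracket is then `|`.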

Attributes

An attribute is an identifier with an optional value that appears in an expression or in markup. During formatting, attributes have no effect, and they can be treated as code comments.

Attributes are prefixed by a U+0040 COMMERCIAL AT @ sign, followed by an identifier. An attribute MAY have a literal value which is separated from the identifier by a U+003D EQUALS SIGN = along with optional whitespace.

Multiple attributes are permitted in an expression or markup. Each attribute is separated by whitespace.

Each attribute's identifier SHOULD be unique within the expression or markup: all but the last attribute with the same identifier are ignored. The order of attributes is not otherwise significant.

attribute = "@" identifier [o "=" o literal]
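
The rule that all but the last attribute with a given identifier are ignored can be sketched as follows. This non-normative example assumes that attributes arrive as (identifier, value) pairs in source order, with `None` standing for an attribute that has no value. The name `resolve_attributes` is hypothetical.

```python
def resolve_attributes(pairs):
    """Collapse repeated attribute identifiers, keeping the last occurrence."""
    resolved = {}
    for name, value in pairs:
        # A later attribute with the same identifier replaces the earlier one.
        resolved[name] = value
    return resolved
```

Since attributes have no effect on formatting, a tool would typically use a mapping like this only to pass information to translators or other tooling.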

Examples of expressions and markup with attributes:

A message including a literal that should not be translated:
In French, "{|bonjour| @translate=no}" is a greeting
A message with markup that can be copied:
Have a {#span @can-copy}great and wonderful{/span @can-copy} birthday!

Other Syntax Elements

This section defines common elements used to construct messages.

Keywords

A keyword is a reserved token that has a unique meaning in the message syntax.

The following three keywords are defined: .input, .local, and .match. Keywords are always lowercase and start with U+002E FULL STOP ..

input = %s".input"
local = %s".local"
match = %s".match"

Literals

A literal is a character sequence that appears outside of text in various parts of a message. A literal can appear as a key value, as the operand of a literal-expression, or as an option value. A literal MAY include any Unicode code point except for U+0000 NULL.

All code points are preserved.

Important

Most text, including that produced by common keyboards and input methods, is already encoded in the canonical form known as Unicode Normalization Form C ("NFC"). A few languages, legacy character encoding conversions, or operating environments can result in literal values that are not in this form. Some uses of literals in MessageFormat, notably as the value of keys, apply NFC to the literal value during processing or comparison. While there is no requirement that the literal value actually be entered in a normalized form, users are cautioned to employ the same character sequences for equivalent values and, whenever possible, ensure literals are in NFC.

A quoted literal begins and ends with U+007C VERTICAL BAR |. The characters \ and | within a quoted literal MUST be escaped as \\ and \|.

Note

Unpaired surrogate code points (U+D800 through U+DFFF inclusive) are allowed in quoted literals for compatibility with UTF-16 based implementations that do not check for this encoding error.

An unquoted literal is a literal that does not require the | quotes around it to be distinct from the rest of the message syntax. An unquoted literal MAY be used when the string value of the literal matches the unquoted-literal production. It will thus contain no whitespace (nor certain other characters). Implementations MUST NOT distinguish between quoted literals and unquoted literals that have the same sequence of code points.

Unquoted literals can contain any characters also valid in name, less name's additional restrictions on the first character.

literal          = quoted-literal / unquoted-literal
quoted-literal   = "|" *(quoted-char / escaped-char) "|"
unquoted-literal = 1*name-char

The string value of a literal for unquoted literals is the text content of that literal; or for quoted literals, the text content of that literal after removing the enclosing | characters then unescaping any escaped characters.
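
As an illustration of the rule above, the following sketch computes the string value of a literal from its source text. The function name literalStringValue is hypothetical, and the input is assumed to have already matched the literal production.

```typescript
// Returns the string value of a literal given its source text.
// Assumption: `source` has already been validated against the
// `literal` production, so no error handling is attempted here.
function literalStringValue(source: string): string {
  const isQuoted =
    source.length >= 2 &&
    source.charAt(0) === "|" &&
    source.charAt(source.length - 1) === "|";
  if (!isQuoted) {
    // Unquoted literal: the text content is the string value.
    return source;
  }
  // Quoted literal: drop the enclosing | characters, then unescape.
  const inner = source.slice(1, -1);
  // escaped-char = backslash ( backslash / "{" / "|" / "}" )
  return inner.replace(/\\([\\{|}])/g, "$1");
}
```

With this sketch, the sources 42 and |42| yield the same string value, which is what allows implementations to treat quoted and unquoted literals identically.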

Names and Identifiers

A name is a character sequence used in an identifier or as the name for a variable or the value of an unquoted literal.

A name can be preceded or followed by bidirectional marks or isolating controls to aid in presenting names that contain right-to-left or neutral characters. These characters are not part of the value of the name and MUST be treated as if they were not present when matching name or identifier strings or unquoted literal values.

Variable names are prefixed with $.

Two names are considered equal if they are canonically equivalent strings, that is, if they consist of the same sequence of Unicode code points after Unicode Normalization Form C ("NFC") has been applied to both.

The names are immutable identifiers.

Note

Implementations are not required to normalize all names. Comparisons of name values need only be done "as-if" normalization has occurred. Since most text in the wild is already in NFC and since checking for NFC is fast and efficient, implementations can often substitute checking for actually applying normalization to name values.
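
A minimal sketch of name comparison under these rules is shown below. The helper name namesEqual is hypothetical. For simplicity it always normalizes, rather than first checking whether the input is already in NFC, and it strips at most one bidi character from each end, as the name production permits.

```typescript
// One leading or one trailing character from the `bidi` production:
// ALM, LRM, RLM, LRI, RLI, FSI, PDI.
const SURROUNDING_BIDI =
  /^[\u061C\u200E\u200F\u2066-\u2069]|[\u061C\u200E\u200F\u2066-\u2069]$/g;

// Compare two names as if NFC had been applied to both, ignoring
// bidirectional marks or isolates placed before or after the name.
function namesEqual(a: string, b: string): boolean {
  const stripBidi = (name: string): string => name.replace(SURROUNDING_BIDI, "");
  return stripBidi(a).normalize("NFC") === stripBidi(b).normalize("NFC");
}
```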

Note

External variables can be passed in that are not valid names. Such variables cannot be referenced in a message, but are not otherwise errors.

An identifier is a character sequence that identifies a function, markup, or option. Each identifier consists of a name optionally preceded by a namespace. When present, the namespace is separated from the name by a U+003A COLON :. Built-in functions and their options do not have a namespace identifier.

The namespace u (U+0075 LATIN SMALL LETTER U) is reserved for future standardization.

Function identifiers are prefixed with :. Markup identifiers are prefixed with # or /. Option identifiers have no prefix.

Examples:

A variable:
This has a {$variable}
A default function:
This has an {42 :integer}
A function from the ns namespace:
This has a {:ns:function}
Options with and without a namespace:
This has {:ns:function option=value ns:option=value}

Support for namespaces and their interpretation is implementation-defined in this release.

variable   = "$" name
option     = identifier o "=" o (literal / variable)

identifier = [namespace ":"] name
namespace  = name
name       = [bidi] name-start *name-char [bidi]
name-start = ALPHA
                                    ;          omit Cc: %x0-1F, Whitespace: « », Ascii: «!"#$%&'()*»
                  / %x2B            ; «+»      omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
                  / %x5F            ; «_»      omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
                  / %xA1-61B        ;          omit BidiControl: %x61C
                  / %x61D-167F      ;          omit Whitespace: %x1680
                  / %x1681-1FFF     ;          omit Whitespace: %x2000-200A
                  / %x200B-200D     ;          omit BidiControl: %x200E-200F
                  / %x2010-2027     ;          omit Whitespace: %x2028-2029 %x202F, BidiControl: %x202A-202E
                  / %x2030-205E     ;          omit Whitespace: %x205F
                  / %x2060-2065     ;          omit BidiControl: %x2066-2069
                  / %x206A-2FFF     ;          omit Whitespace: %x3000
                  / %x3001-D7FF     ;          omit Cs: %xD800-DFFF
                  / %xE000-FDCF     ;          omit NChar: %xFDD0-FDEF
                  / %xFDF0-FFFD     ;          omit NChar: %xFFFE-FFFF
                  / %x10000-1FFFD   ;          omit NChar: %x1FFFE-1FFFF
                  / %x20000-2FFFD   ;          omit NChar: %x2FFFE-2FFFF
                  / %x30000-3FFFD   ;          omit NChar: %x3FFFE-3FFFF
                  / %x40000-4FFFD   ;          omit NChar: %x4FFFE-4FFFF
                  / %x50000-5FFFD   ;          omit NChar: %x5FFFE-5FFFF
                  / %x60000-6FFFD   ;          omit NChar: %x6FFFE-6FFFF
                  / %x70000-7FFFD   ;          omit NChar: %x7FFFE-7FFFF
                  / %x80000-8FFFD   ;          omit NChar: %x8FFFE-8FFFF
                  / %x90000-9FFFD   ;          omit NChar: %x9FFFE-9FFFF
                  / %xA0000-AFFFD   ;          omit NChar: %xAFFFE-AFFFF
                  / %xB0000-BFFFD   ;          omit NChar: %xBFFFE-BFFFF
                  / %xC0000-CFFFD   ;          omit NChar: %xCFFFE-CFFFF
                  / %xD0000-DFFFD   ;          omit NChar: %xDFFFE-DFFFF
                  / %xE0000-EFFFD   ;          omit NChar: %xEFFFE-EFFFF
                  / %xF0000-FFFFD   ;          omit NChar: %xFFFFE-FFFFF
                  / %x100000-10FFFD ;          omit NChar: %x10FFFE-10FFFF
name-char  = name-start / DIGIT / "-" / "."
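
The name-start and name-char productions can be checked with a small range table, as in the following sketch. The function names are hypothetical. The supplementary planes are handled arithmetically: each allows every code point except the final two noncharacters of the plane.

```typescript
// Inclusive code point ranges for name-start within the BMP,
// transcribed from the ABNF above.
const NAME_START_BMP: ReadonlyArray<readonly [number, number]> = [
  [0x41, 0x5a], [0x61, 0x7a], // ALPHA
  [0x2b, 0x2b], [0x5f, 0x5f], // "+" and "_"
  [0xa1, 0x61b], [0x61d, 0x167f], [0x1681, 0x1fff], [0x200b, 0x200d],
  [0x2010, 0x2027], [0x2030, 0x205e], [0x2060, 0x2065], [0x206a, 0x2fff],
  [0x3001, 0xd7ff], [0xe000, 0xfdcf], [0xfdf0, 0xfffd],
];

function isNameStart(cp: number): boolean {
  if (cp >= 0x10000 && cp <= 0x10ffff) {
    // Planes 1 to 16: everything except xxFFFE and xxFFFF.
    return (cp & 0xffff) <= 0xfffd;
  }
  return NAME_START_BMP.some(([lo, hi]) => cp >= lo && cp <= hi);
}

function isNameChar(cp: number): boolean {
  // name-char = name-start / DIGIT / "-" / "."
  return isNameStart(cp) || (cp >= 0x30 && cp <= 0x39) || cp === 0x2d || cp === 0x2e;
}

// unquoted-literal = 1*name-char
function isUnquotedLiteral(text: string): boolean {
  if (text.length === 0) return false;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    if (!isNameChar(cp)) return false;
    i += cp > 0xffff ? 2 : 1; // step over surrogate pairs
  }
  return true;
}
```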

Note

Syntactically, the definitions of identifier and name-char provide backwards compatibility over time by allowing a stable, wide range of characters. So when there is a new character in a version of Unicode, it can be used in any conformant implementation of MessageFormat. The definition currently excludes:

Most ASCII except for letters and characters used for numbers
- This avoids conflicts with syntax characters, and reserves some characters for future syntax.
Bidirectional controls (Bidi_C)
Control characters (GC=Cc, but not Format characters: GC=Cf)
Whitespace characters (WSpace)
Surrogate code points (GC=Cs)
Non-Characters (NChar)

A reserved identifier is one that satisfies the following conditions:

Includes no namespace or uses a namespace consisting of a single letter in the ranges a-z and A-Z.
Has a name that matches the following ABNF:

reserved-identifier = ALPHA *[ALPHA / DIGIT / "." / "-" / "_"]

A custom identifier is any identifier that is not a reserved identifier.
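
The two conditions for a reserved identifier can be sketched as a single check. The helper name isReservedIdentifier is hypothetical, and the input is assumed to be an identifier without its sigil and without surrounding bidi characters.

```typescript
// Returns true when `identifier` is a reserved identifier:
// no namespace (or a single ASCII letter namespace), and a name
// matching ALPHA followed by ALPHA / DIGIT / "." / "-" / "_".
function isReservedIdentifier(identifier: string): boolean {
  const colon = identifier.indexOf(":");
  const namespace = colon === -1 ? undefined : identifier.slice(0, colon);
  const name = colon === -1 ? identifier : identifier.slice(colon + 1);
  if (namespace !== undefined && !/^[A-Za-z]$/.test(namespace)) {
    return false;
  }
  return /^[A-Za-z][A-Za-z0-9._-]*$/.test(name);
}

// Any identifier that is not reserved is a custom identifier.
function isCustomIdentifier(identifier: string): boolean {
  return !isReservedIdentifier(identifier);
}
```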

Note

Choose a custom identifier for any functions, markup, or attributes not defined by this specification. Use a namespace in a custom identifier to identify a function that is not a default function or when defining a custom option for a default function.

Variable names are encouraged to use reserved identifiers. Option names for custom functions are encouraged to use reserved identifiers.

The syntax allows a wide range of characters in names and identifiers. Implementers and authors of functions and messages, including functions, options, and variables, SHOULD avoid creating names that could produce confusion or harm usability by choosing names consistent with the following guidelines. MessageFormat tools, such as linters, SHOULD warn when names chosen by users violate these constraints.

Unicode Default Identifier Syntax

Unicode General Security Profile for Identifiers

Escape Sequences

An escape sequence is a two-character sequence starting with U+005C REVERSE SOLIDUS \.

An escape sequence allows the appearance of lexically meaningful characters in the body of text or quoted literal sequences. Each escape sequence represents the literal character immediately following the initial \.

escaped-char = backslash ( backslash / "{" / "|" / "}" )
backslash    = %x5C ; U+005C REVERSE SOLIDUS "\"

Note

The escaped-char rule allows escaping some characters in places where they do not need to be escaped, such as braces in a quoted literal. For example, |foo {bar}| and |foo \{bar\}| are synonymous.

When writing or generating a message, escape sequences SHOULD NOT be used unless required by the syntax. That is, inside literals only escape | and inside patterns only escape { and }.
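
A serializer that follows this recommendation might escape only what each context requires, as in the sketch below. The function names are hypothetical. The backslash itself is always escaped because it would otherwise start an escape sequence. Other serialization concerns, such as deciding when a literal can be left unquoted, are out of scope here.

```typescript
// Escape text for use inside a pattern: only \, { and } are escaped.
function escapePatternText(text: string): string {
  return text.replace(/[\\{}]/g, "\\$&");
}

// Serialize a string value as a quoted literal: only \ and | are escaped.
function serializeQuotedLiteral(value: string): string {
  return "|" + value.replace(/[\\|]/g, "\\$&") + "|";
}
```

For example, braces inside a quoted literal and vertical lines inside pattern text are left as they are.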

Whitespace

Outside of the text parts of patterns and outside of quoted literals the syntax limits whitespace characters to the following: U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (new line), U+000D CARRIAGE RETURN, U+3000 IDEOGRAPHIC SPACE, or U+0020 SPACE.

In the text parts of patterns and in quoted literals, whitespace is part of the content and is recorded and stored verbatim. Whitespace is not significant outside translatable text, except where required by the syntax.

There are two whitespace productions in the syntax. Optional whitespace is whitespace that is not required by the syntax, but which users might want to include to increase the readability of a message. Required whitespace is whitespace that is required by the syntax.

Both types of whitespace optionally permit the use of the bidirectional isolate controls and certain strongly directional marks. These can assist users in presenting messages that contain right-to-left text, literals, or names (including those for functions, options, option values, and keys).

Messages that contain right-to-left (aka RTL) characters SHOULD use one of the following mechanisms to make messages display intelligibly in plain-text editors:

Use paired isolating bidi controls U+2066 LEFT-TO-RIGHT ISOLATE ("LRI") and U+2069 POP DIRECTIONAL ISOLATE ("PDI") as permitted by the ABNF around parts of any message containing RTL characters:
- inside of placeholder markers { and }
- outside quoted-pattern markers {{ and }}
- outside of variable, function, markup, or attribute, including the identifying sigil (e.g. <LRI>$var</PDI> or <LRI>:ns:name</PDI>)
Use the 'local-effect' bidi marks U+061C ARABIC LETTER MARK, U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK as permitted by the ABNF before or after identifiers, names, unquoted literals, or option values, especially when the values contain a mix of neutral, weakly directional, and strongly directional characters.

Important

Always take care not to add bidirectional controls or marks where they would be semantically significant or where they would unintentionally become part of the message's output:

do not put them inside of a literal except when they are part of the value (instead, put them outside of literal quotes, such as <LRM>|...|<LRM>)
do not put them inside quoted patterns except when they are part of the text (instead, put them outside of quoted patterns, such as <LRI>{{...}}<PDI>)
do not put them outside placeholders (instead, put them inside the placeholder, such as {<LRI>$foo :number<PDI>})

Controls placed inside literal quotes or quoted patterns are part of the literal or pattern. Controls in a pattern will appear in the output of the message. Controls inside literal quotes are part of the literal and will be considered in operations such as matching a key to a selector.

Note

Users cannot be expected to create or manage bidirectional controls or marks in messages, since the characters are invisible and can be difficult to manage. Tools (such as resource editors or translation editors) and other implementations of MessageFormat serialization are strongly encouraged to provide paired isolates around any right-to-left syntax as described above so that messages display appropriately as plain text.

These definitions of whitespace implement UAX#31 Requirement R3a-2. It is a profile of R3a-1 in that specification because:

The following pattern whitespace characters are not allowed: U+000B VERTICAL TABULATION, U+000C FORM FEED, U+0085 NEXT LINE, U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
The character U+3000 IDEOGRAPHIC SPACE is interpreted as whitespace.
The following directional marks and isolates are treated as ignorable format controls: U+061C ARABIC LETTER MARK, U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK, U+2066 LEFT-TO-RIGHT ISOLATE, U+2067 RIGHT-TO-LEFT ISOLATE, U+2068 FIRST STRONG ISOLATE, and U+2069 POP DIRECTIONAL ISOLATE. (The character U+061C is an addition according to R3a.)

Note

The character U+3000 IDEOGRAPHIC SPACE is included in whitespace for compatibility with certain East Asian keyboards and input methods, in which users might accidentally create these characters in a message.

; Required whitespace
s = *bidi ws o

; Optional whitespace
o = *(ws / bidi)

; Bidirectional marks and isolates
; ALM / LRM / RLM / LRI, RLI, FSI & PDI
bidi = %x061C / %x200E / %x200F / %x2066-2069

; Whitespace characters
ws = SP / HTAB / CR / LF / %x3000
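
A hand-written parser might consume the two whitespace productions as sketched below. This assumes that o is read as zero or more ws or bidi characters, which is the form used in the complete ABNF. The function names are hypothetical. All of the characters involved are in the Basic Multilingual Plane, so indexing by UTF-16 code unit is sufficient.

```typescript
// ws = SP / HTAB / CR / LF / %x3000
const isWhitespaceChar = (ch: string): boolean =>
  ch === " " || ch === "\t" || ch === "\r" || ch === "\n" || ch === "\u3000";

// bidi = %x061C / %x200E / %x200F / %x2066-2069
const isBidiChar = (ch: string): boolean =>
  ch === "\u061C" || ch === "\u200E" || ch === "\u200F" ||
  (ch >= "\u2066" && ch <= "\u2069");

// Optional whitespace: returns the index just past any ws or bidi characters.
function skipOptionalWhitespace(source: string, pos: number): number {
  let i = pos;
  while (i < source.length &&
         (isWhitespaceChar(source.charAt(i)) || isBidiChar(source.charAt(i)))) {
    i++;
  }
  return i;
}

// Required whitespace (s = *bidi ws o): returns the index just past it,
// or -1 when no ws character follows the optional leading bidi characters.
function skipRequiredWhitespace(source: string, pos: number): number {
  let i = pos;
  while (i < source.length && isBidiChar(source.charAt(i))) i++;
  if (i >= source.length || !isWhitespaceChar(source.charAt(i))) return -1;
  return skipOptionalWhitespace(source, i + 1);
}
```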

Complete ABNF

The grammar is formally defined in message.abnf using the ABNF notation [STD68], including the modifications found in RFC 7405.

RFC 7405 defines a variation of ABNF that is case-sensitive. Some ABNF tools are only compatible with the specification found in RFC 5234. To make message.abnf compatible with that version of ABNF, replace the rules of the same name with this block:

input = %x2E.69.6E.70.75.74  ; ".input"
local = %x2E.6C.6F.63.61.6C  ; ".local"
match = %x2E.6D.61.74.63.68  ; ".match"
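
The hexadecimal forms above follow mechanically from the ASCII code of each character in the keyword. As a small illustration, a helper with the hypothetical name toCaseSensitiveAbnf can reproduce them:

```typescript
// Convert an ASCII keyword into an RFC 5234 numeric value sequence,
// which matches the exact characters and is therefore case-sensitive.
function toCaseSensitiveAbnf(keyword: string): string {
  const hex: string[] = [];
  for (let i = 0; i < keyword.length; i++) {
    hex.push(keyword.charCodeAt(i).toString(16).toUpperCase());
  }
  return "%x" + hex.join(".");
}
```

For example, toCaseSensitiveAbnf(".input") returns %x2E.69.6E.70.75.74.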

message.abnf

message           = simple-message / complex-message

simple-message    = o [simple-start pattern]
simple-start      = simple-start-char / escaped-char / placeholder
pattern           = *(text-char / escaped-char / placeholder)
placeholder       = expression / markup

complex-message   = o *(declaration o) complex-body o
declaration       = input-declaration / local-declaration
complex-body      = quoted-pattern / matcher

input-declaration = input o variable-expression
local-declaration = local s variable o "=" o expression

quoted-pattern    = "{{" pattern "}}"

matcher           = match-statement s variant *(o variant)
match-statement   = match 1*(s selector)
selector          = variable
variant           = key *(s key) o quoted-pattern
key               = literal / "*"

; Expressions
expression          = literal-expression
                    / variable-expression
                    / function-expression
literal-expression  = "{" o literal [s function] *(s attribute) o "}"
variable-expression = "{" o variable [s function] *(s attribute) o "}"
function-expression = "{" o function *(s attribute) o "}"

markup = "{" o "#" identifier *(s option) *(s attribute) o ["/"] "}"  ; open and standalone
       / "{" o "/" identifier *(s option) *(s attribute) o "}"  ; close

; Expression and literal parts
function       = ":" identifier *(s option)
option         = identifier o "=" o (literal / variable)

attribute      = "@" identifier [o "=" o literal]

variable       = "$" name

literal          = quoted-literal / unquoted-literal
quoted-literal   = "|" *(quoted-char / escaped-char) "|"
unquoted-literal = 1*name-char

; Keywords; Note that these are case-sensitive
input = %s".input"
local = %s".local"
match = %s".match"

; Names and identifiers
identifier = [namespace ":"] name
namespace  = name
name       = [bidi] name-start *name-char [bidi]
name-start = ALPHA
                                    ;          omit Cc: %x0-1F, Whitespace: SPACE, Ascii: «!"#$%&'()*»
                  / %x2B            ; «+»      omit Ascii: «,-./0123456789:;<=>?@» «[\]^»
                  / %x5F            ; «_»      omit Cc: %x7F-9F, Whitespace: %xA0, Ascii: «`» «{|}~»
                  / %xA1-61B        ;          omit BidiControl: %x61C
                  / %x61D-167F      ;          omit Whitespace: %x1680
                  / %x1681-1FFF     ;          omit Whitespace: %x2000-200A
                  / %x200B-200D     ;          omit BidiControl: %x200E-200F
                  / %x2010-2027     ;          omit Whitespace: %x2028-2029 %x202F, BidiControl: %x202A-202E
                  / %x2030-205E     ;          omit Whitespace: %x205F
                  / %x2060-2065     ;          omit BidiControl: %x2066-2069
                  / %x206A-2FFF     ;          omit Whitespace: %x3000
                  / %x3001-D7FF     ;          omit Cs: %xD800-DFFF
                  / %xE000-FDCF     ;          omit NChar: %xFDD0-FDEF
                  / %xFDF0-FFFD     ;          omit NChar: %xFFFE-FFFF
                  / %x10000-1FFFD   ;          omit NChar: %x1FFFE-1FFFF
                  / %x20000-2FFFD   ;          omit NChar: %x2FFFE-2FFFF
                  / %x30000-3FFFD   ;          omit NChar: %x3FFFE-3FFFF
                  / %x40000-4FFFD   ;          omit NChar: %x4FFFE-4FFFF
                  / %x50000-5FFFD   ;          omit NChar: %x5FFFE-5FFFF
                  / %x60000-6FFFD   ;          omit NChar: %x6FFFE-6FFFF
                  / %x70000-7FFFD   ;          omit NChar: %x7FFFE-7FFFF
                  / %x80000-8FFFD   ;          omit NChar: %x8FFFE-8FFFF
                  / %x90000-9FFFD   ;          omit NChar: %x9FFFE-9FFFF
                  / %xA0000-AFFFD   ;          omit NChar: %xAFFFE-AFFFF
                  / %xB0000-BFFFD   ;          omit NChar: %xBFFFE-BFFFF
                  / %xC0000-CFFFD   ;          omit NChar: %xCFFFE-CFFFF
                  / %xD0000-DFFFD   ;          omit NChar: %xDFFFE-DFFFF
                  / %xE0000-EFFFD   ;          omit NChar: %xEFFFE-EFFFF
                  / %xF0000-FFFFD   ;          omit NChar: %xFFFFE-FFFFF
                  / %x100000-10FFFD ;          omit NChar: %x10FFFE-10FFFF
name-char  = name-start / DIGIT / "-" / "."

; Restrictions on characters in various contexts
simple-start-char = %x01-08        ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
                  / %x0B-0C        ; omit CR (%x0D)
                  / %x0E-1F        ; omit SP (%x20)
                  / %x21-2D        ; omit . (%x2E)
                  / %x2F-5B        ; omit \ (%x5C)
                  / %x5D-7A        ; omit { (%x7B)
                  / %x7C           ; omit } (%x7D)
                  / %x7E-2FFF      ; omit IDEOGRAPHIC SPACE (%x3000)
                  / %x3001-10FFFF
text-char         = %x01-5B        ; omit NULL (%x00) and \ (%x5C)
                  / %x5D-7A        ; omit { (%x7B)
                  / %x7C           ; omit } (%x7D)
                  / %x7E-10FFFF
quoted-char       = %x01-5B        ; omit NULL (%x00) and \ (%x5C)
                  / %x5D-7B        ; omit | (%x7C)
                  / %x7D-10FFFF

; Character escapes
escaped-char = backslash ( backslash / "{" / "|" / "}" )
backslash    = %x5C ; U+005C REVERSE SOLIDUS "\"

; Required whitespace
s = *bidi ws o

; Optional whitespace
o = *(ws / bidi)

; Bidirectional marks and isolates
; ALM / LRM / RLM / LRI, RLI, FSI & PDI
bidi = %x061C / %x200E / %x200F / %x2066-2069

; Whitespace characters
ws = SP / HTAB / CR / LF / %x3000

Formatting

This section defines the behavior of a MessageFormat implementation when formatting a message for display in a user interface, or for some later processing.

To start, we presume that a message has either been parsed from its syntax or created from a data model description. If the resulting message is not well-formed, a Syntax Error is emitted. If the resulting message is well-formed but is not valid, a Data Model Error is emitted.

The formatting of a message is defined by the following operations:

Pattern Selection determines which of a message's patterns is formatted. For a message with no selectors, this is simple as there is only one pattern. With selectors, this will depend on their resolution.
Formatting takes the resolved values of the text and placeholder parts of the selected pattern, and produces the formatted result for the message. Depending on the implementation, this result could be a single concatenated string, an array of objects, an attributed string, or some other locally appropriate data type.
Expression and Markup Resolution determines the value of an expression or markup, with reference to the current formatting context. This can include multiple steps, such as looking up the value of a variable and calling formatting functions. The form of the resolved value is implementation defined and the value might not be evaluated or formatted yet. However, it needs to be "formattable", i.e. it contains everything required by the eventual formatting.

The resolution of text is rather straightforward, and is detailed under literal resolution.

Implementations are not required to expose the expression resolution and pattern selection operations to their users, or even use them in their internal processing, as long as the final formatting result is made available to users and the observable behavior of the formatting matches that described here.

Attributes MUST NOT have any effect on the formatted output of a message, nor be made available to function handlers.

Important

This specification does not require either eager or lazy expression resolution of message parts; do not construe any requirement in this document as requiring either.

Implementations are not required to evaluate all parts of a message when parsing, processing, or formatting. In particular, an implementation MAY choose not to evaluate or resolve the value of a given expression until it is actually used by a selection or formatting process. However, when an expression is resolved, it MUST behave as if all preceding declarations affecting variables referenced by that expression have already been evaluated in the order in which the relevant declarations appear in the message. An implementation MUST ensure that every expression in a message is evaluated at most once.

Important

Implementations with lazy evaluation MUST NOT use a call-by-name evaluation strategy. Instead, they must evaluate expressions at most once ("call-by-need"). This is to prevent expressions from having different values when used in different parts of a given message. Function handlers are not necessarily pure: they can access external mutable state such as the current system clock time. Thus, evaluating the same expression more than once could yield different results. That behavior violates this specification.
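
One common way to obtain call-by-need behavior is to wrap each expression in a memoizing thunk, as in the following sketch. The class name Lazy is hypothetical, and error handling (where a failed resolution would produce a fallback value) is deliberately left out.

```typescript
// A memoizing thunk: the computation runs on first use only,
// and every later use sees the same cached result.
class Lazy<T> {
  private readonly compute: () => T;
  private evaluated = false;
  private value: T | undefined = undefined;

  constructor(compute: () => T) {
    this.compute = compute;
  }

  get(): T {
    if (!this.evaluated) {
      this.value = this.compute();
      this.evaluated = true;
    }
    return this.value as T;
  }
}
```

A declaration such as .local $now = {:ns:currentTime} could then be bound to a Lazy instance, so that a handler reading the system clock is called at most once, however many placeholders refer to $now. The function name :ns:currentTime is an illustrative assumption.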

Important

Implementations and users SHOULD NOT create function handlers that mutate external program state, particularly since such a function handler can present a remote execution hazard.

Formatting Context

A message's formatting context represents the data and procedures that are required for the message's expression resolution, pattern selection and formatting.

At a minimum, it includes:

Information on the current locale, potentially including a fallback chain of locales. This will be passed on to formatting functions.
Information on the base directionality of the message and its text tokens. This will be used by strategies for bidirectional isolation, and can be used to set the base direction of the message upon display.
An input mapping of string identifiers to values, defining variable values that are available during variable resolution. This is often determined by a user-provided argument of a formatting function call.
A mapping of string identifiers to the function handlers that are available during function resolution.
Optionally, a fallback string to use for the message if it is not valid.

Implementations MAY include additional fields in their formatting context.
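
As a non-normative illustration, the minimum contents listed above could be modeled as follows. All names in this sketch are hypothetical, and the "auto" direction value is an assumption standing in for an undetermined base directionality.

```typescript
// Hypothetical handler signature; real implementations define their own.
type FunctionHandlerSketch = (
  context: unknown,
  options: Map<string, unknown>,
  operand?: unknown,
) => unknown;

interface FormattingContext {
  locales: string[];                              // current locale plus fallback chain
  dir: "ltr" | "rtl" | "auto";                    // base directionality of the message
  input: Map<string, unknown>;                    // input mapping for variable resolution
  functions: Map<string, FunctionHandlerSketch>;  // available function handlers
  fallback?: string;                              // optional fallback string
}

function createFormattingContext(
  locales: string[],
  input: Iterable<[string, unknown]> = [],
  functions: Iterable<[string, FunctionHandlerSketch]> = [],
): FormattingContext {
  return {
    locales: locales.slice(),
    dir: "auto",
    input: new Map(input),
    functions: new Map(functions),
  };
}
```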

Resolved Values

A resolved value is the result of resolving a text, literal, variable, expression, or markup. The resolved value is determined using the formatting context. The form of the resolved value is implementation-defined.

In a declaration, the resolved value of an expression is bound to a variable, which makes it available for use in later expressions and markup options.

For example, in
.input {$a :number minimumFractionDigits=3}
.local $b = {$a :integer useGrouping=never}
.match $a
0 {{The value is zero.}}
* {{Without grouping separators, the value {$a} is rendered as {$b}.}}
the resolved value bound to $a is used as the operand of the :integer function when resolving the value of the variable $b, as a selector in the .match statement, as well as for formatting the placeholder {$a}.

In an input-declaration, the variable operand of the variable-expression identifies not only the name of the external input value, but also the variable to which the resolved value of the variable-expression is bound.

In a pattern, the resolved value of an expression or markup is used in its formatting. To support the Default Bidi Strategy, the resolved value of each expression SHOULD include information about the directionality of its formatted string representation, as well as a flag to indicate whether its formatted representation requires isolation from the surrounding text. (See "Handling Bidirectional Text".)

For each option value, the resolved value MUST indicate if the value was directly set with a literal, as opposed to being resolved from a variable. This is to allow function handlers to require specific options to be set using literals.

For example, the default functions :number and :integer require that the option select be set with a literal option value (plural, ordinal, or exact).

The form that resolved values take is implementation-dependent, and different implementations MAY choose to perform different levels of resolution.

While this specification does not require it, a resolved value could be implemented by requiring each function handler to return a value matching the following interface:
interface MessageValue {
  formatToString(): string
  formatToX(): X // where X is an implementation-defined type
  unwrap(): unknown
  resolvedOptions(): { [key: string]: MessageValue }
  match(key: string): boolean
  betterThan(key1: string, key2: string): boolean
  directionality(): 'LTR' | 'RTL' | 'unknown'
  isolate(): boolean
  isLiteralOptionValue(): boolean
}
With this approach:

An expression could be used as a placeholder if calling the formatToString() or formatToX() method of its resolved value did not emit an error.

A variable could be used as a selector if calling the match(key) and betterThan(key1, key2) methods of its resolved value did not emit an error.

The resolved value of an expression could be used as an operand or option value if calling the unwrap() method of its resolved value did not emit an error. (This requires an intermediate variable declaration.) In this use case, the resolvedOptions() method could also provide a set of option values that could be taken into account by the called function.

The unwrap() method returns the function-specific result of the function's operation. For example, the handlers for the following functions might behave as follows:

The handler for the default function :number returns a value whose unwrap() method returns the implementation-defined numeric value of the operand.

The handler for a custom :uppercase function might return a value whose unwrap() method returns an uppercase string in place of the original operand value.

The handler for a custom function that extracts a field from a data structure might return a value whose unwrap() method returns the extracted value.

Other functions' handlers might return a value whose unwrap() method returns the original operand value.

The directionality(), isolate(), and isLiteralOptionValue() methods fulfill requirements and recommendations mentioned elsewhere in this specification.

Extensions of the base MessageValue interface could be provided for different data types, such as numbers or strings, for which the unknown return type of unwrap() and the generic MessageValue type used in resolvedOptions() could be narrowed appropriately. An implementation could also allow MessageValue values to be passed in as input variables, or automatically wrap each variable as a MessageValue to provide a uniform interface for custom functions.

Expression and Markup Resolution

Expressions are used in declarations and patterns. Markup is only used in patterns. Options are used in expressions and markup.

Expression Resolution

Expression resolution determines the value of an expression. Depending on the presence or absence of a variable or literal operand and a function, the resolved value of the expression is determined as follows:

If the expression contains a function, its resolved value is defined by function resolution.

Else, if the expression consists of a variable, its resolved value is defined by variable resolution. An implementation MAY perform additional processing when resolving the value of an expression that consists only of a variable.

For example, it could apply function resolution using a function and a set of options chosen based on the value or type of the variable. So, given a message like this:
Today is {$date}
If the value passed in the variable were a date object, such as a JavaScript Date or a Java java.util.Date or java.time.Temporal, the implementation could interpret the placeholder {$date} as if the pattern included the function :datetime with some set of default options.

Else, the expression consists of a literal. Its resolved value is defined by literal resolution.

Note

This means that a literal value with no function is always treated as a string. To represent values that are not strings as a literal, a function needs to be provided:

.local $aNumber = {1234 :number}
.local $aDate = {|2023-08-30| :datetime}
.local $aFoo = {|some foo| :ns:foo}
{{You have {42 :number}}}

Literal Resolution

Literal resolution: The resolved value of a text or a literal contains the character sequence of the text or literal after any character escape has been converted to the escaped character.

When a literal is used as an operand or as an option value, the formatting function MUST treat its resolved value the same whether its value was originally a quoted literal or an unquoted literal.

For example, the option foo=42 and the option foo=|42| are treated as identical.

For example, in a JavaScript formatter, the resolved value of a text or a literal could have the following implementation:

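class MessageLiteral implements MessageValue {
  formatToString: () => string;
  unwrap: () => unknown;
  constructor(value: string) {
    // A literal formats to, and unwraps as, its own string value.
    this.formatToString = () => value;
    this.unwrap = () => value;
  }
  // A literal without a function has no options.
  resolvedOptions = () => ({});
  match(_key: string): boolean {
    throw Error("Selection on unannotated literals is not supported");
  }
  // Remaining MessageValue members are omitted for brevity.
}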

Variable Resolution

Variable resolution: To resolve the value of a variable, its name is used to identify either a local variable or an input variable. If a declaration exists for the variable, its resolved value is used. Otherwise, the variable is an implicit reference to an input value, and its value is looked up from the formatting context input mapping.

The resolution of a variable fails if no value is identified for its name. If this happens, an Unresolved Variable error is emitted and a fallback value is used as the resolved value of the variable.

If the resolved value identified for the variable name is a fallback value, a fallback value is used as the resolved value of the variable.

The fallback value representation of a variable has a string representation consisting of the U+0024 DOLLAR SIGN $ followed by the name of the variable.
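
The lookup order and fallback behavior described above can be sketched as follows. The ResolvedSketch shape, the resolveVariable helper, and the error string are hypothetical. Declarations are assumed to have been resolved already and stored in a map.

```typescript
// Hypothetical minimal resolved value: either a real value or a fallback.
interface ResolvedSketch {
  kind: "value" | "fallback";
  value?: unknown;
  source: string; // string representation used for a fallback value
}

function resolveVariable(
  name: string,
  declarations: Map<string, ResolvedSketch>,
  input: Map<string, unknown>,
  errors: string[],
): ResolvedSketch {
  // The fallback representation is "$" followed by the variable name.
  const fallback: ResolvedSketch = { kind: "fallback", source: "$" + name };

  // 1. A declaration for the variable takes precedence.
  const declared = declarations.get(name);
  if (declared !== undefined) {
    // A fallback bound to the variable resolves to this variable's own fallback.
    return declared.kind === "fallback" ? fallback : declared;
  }
  // 2. Otherwise the variable is an implicit reference to an input value.
  if (input.has(name)) {
    return { kind: "value", value: input.get(name), source: "$" + name };
  }
  // 3. No value identified: emit an Unresolved Variable error and fall back.
  errors.push("unresolved-variable: $" + name);
  return fallback;
}
```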

Function Resolution

Function resolution: To resolve an expression with a function, the following steps are taken:

If the expression includes an operand, resolve its value. If this is a fallback value, return a fallback value as the resolved value of the expression.
Resolve the identifier of the function and find the appropriate function handler to call. If the implementation cannot find the function handler, or if the identifier includes a namespace that the implementation does not support, emit an Unknown Function error and return a fallback value as the resolved value of the expression.

Implementations are not required to implement namespaces or support functions other than the default functions.
Perform option resolution.
Determine the function context for calling the function handler.

The function context contains the context necessary for the function handler to resolve the expression. This includes:
- The current locale, potentially including a fallback chain of locales.
- The base directionality of the expression. By default, this is undefined or empty.
If the resolved mapping of options includes any u: options supported by the implementation, process them as specified. Such u: options MAY be removed from the resolved mapping of options.
Call the function handler with the following arguments:
- The function context.
- The resolved mapping of options.
- If the expression includes an operand, its resolved value.
The form that resolved operand and option values take is implementation-defined.

An implementation MAY pass additional arguments to the function handler, as long as reasonable precautions are taken to keep the function interface simple and minimal, and avoid introducing potential security vulnerabilities.
If the call succeeds, resolve the value of the expression as the result of that function call. The value MUST NOT be marked as a literal option value.

If the call fails or does not return a valid value, emit the appropriate Message Function Error for the failure.

Implementations MAY provide a mechanism for the function handler to provide additional detail about internal failures. Specifically, if the cause of the failure was that the datatype, value, or format of the operand did not match that expected by the function, the function SHOULD cause a Bad Operand error to be emitted.

In all failure cases, return a fallback value as the resolved value of the expression.
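
As an illustration of the steps above, the following TypeScript sketch resolves a function expression. All names here (`Resolved`, `FunctionContext`, `HandlerResult`, `FunctionHandler`, `resolveFunction`) are hypothetical, and the sketch assumes that option resolution and the processing of u: options have already been done by the caller. The way a handler reports a specific error type is also an assumption, since the specification leaves that mechanism to the implementation.

```typescript
// Hypothetical shapes for this sketch; the specification leaves them
// implementation-defined.
type Resolved =
  | { kind: "value"; value: unknown }
  | { kind: "fallback"; source: string };

interface FunctionContext {
  locale: string;
  dir?: string; // base directionality; undefined by default
}

// A handler reports failure either by throwing or by returning an error type.
type HandlerResult = { ok: true; value: unknown } | { ok: false; error: string };

type FunctionHandler = (
  context: FunctionContext,
  options: Map<string, Resolved>,
  operand?: Resolved,
) => HandlerResult;

function resolveFunction(
  name: string,
  operand: Resolved | undefined,
  options: Map<string, Resolved>, // assumed to be the result of option resolution
  handlers: Map<string, FunctionHandler>,
  context: FunctionContext,
  fallbackSource: string, // string representation to use on failure
  errors: string[],
): Resolved {
  const fallback: Resolved = { kind: "fallback", source: fallbackSource };

  // A fallback operand makes the whole expression a fallback value.
  if (operand !== undefined && operand.kind === "fallback") return fallback;

  // Find the handler, or emit Unknown Function.
  const handler = handlers.get(name);
  if (handler === undefined) {
    errors.push("Unknown Function");
    return fallback;
  }

  // Call the handler; any failure results in an error and a fallback value.
  try {
    const result = handler(context, options, operand);
    if (result.ok) return { kind: "value", value: result.value };
    errors.push(result.error); // for example "Bad Operand"
  } catch {
    errors.push("Message Function Error");
  }
  return fallback;
}
```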

Function Handler

A function handler is an implementation-defined process such as a function or method which accepts a set of arguments and returns a resolved value. A function handler is required to resolve a function.

An implementation MAY define its own functions and their handlers. An implementation MAY allow custom functions to be defined by users.

Implementations that provide a means for defining custom functions MUST provide a means for function handlers to return resolved values that contain enough information to be used as operands or option values in subsequent expressions.

The resolved value returned by a function handler MAY be different from the value of the operand of the function. It MAY be an implementation specified type. It is not required to be the same type as the operand.

A function handler MAY include resolved options in its resolved value. The resolved options MAY be different from the options of the function.

A function handler SHOULD emit a Bad Operand error for operands whose resolved value or type is not supported.

Function handler access to the formatting context MUST be minimal and read-only, and execution time SHOULD be limited.

Implementation-defined functions SHOULD use an implementation-defined namespace.

Markup Resolution

Markup resolution determines the value of markup. Unlike functions, the resolution of markup is not customizable.

The resolved value of markup includes the following fields:

The type of the markup: open, standalone, or close
The identifier of the markup
The resolved mapping of options after option resolution.

If the resolved mapping of options includes any u: options supported by the implementation, process them as specified. Such u: options MAY be removed from the resolved mapping of options.

The resolution of markup MUST always succeed. (Any errors emitted by option resolution are non-fatal.)

Option Resolution

Option resolution is the process of computing the options for a given expression or markup. Option resolution results in a mapping of string identifiers to resolved values. The order of options MUST NOT be significant.

For example, the following message treats both placeholders identically:
{$x :ns:func option1=foo option2=bar} {$x :ns:func option2=bar option1=foo}

Option resolution proceeds as follows:

Let res be a new empty mapping.
For each option:
1. Let id be the string value of the identifier of the option.
2. Let rv be the resolved value of the option value.
3. If rv is a fallback value:
  1. Emit a Bad Option error, if supported.
4. Else:
  1. If the option value consists of a literal:
    1. Mark rv as a literal option value.
  2. Set res[id] to be rv.
Return res.

Note

If the resolved value of an option value is a fallback value, the option is intentionally omitted from the mapping of resolved options.

The result of option resolution MUST be a (possibly empty) mapping of string identifiers to values; that is, errors MAY be emitted, but such errors MUST NOT be fatal.
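
The option resolution steps can be sketched in TypeScript as shown below. The `Option` shape, its `resolve` thunk, and the `literalOption` marker are hypothetical names chosen for this illustration. Using a `Map` keeps the result independent of the order in which options appear in the source.

```typescript
// Hypothetical shapes; the form of resolved values is implementation-defined.
type Resolved =
  | { kind: "value"; value: unknown; literalOption?: boolean }
  | { kind: "fallback"; source: string };

interface Option {
  name: string; // string value of the option identifier
  isLiteral: boolean; // true when the option value is a literal in the source
  resolve: () => Resolved; // produces the resolved value of the option value
}

function resolveOptions(options: Option[], errors: string[]): Map<string, Resolved> {
  const res = new Map<string, Resolved>();
  for (const opt of options) {
    const rv = opt.resolve();
    if (rv.kind === "fallback") {
      // The option is omitted from the mapping; the error is not fatal.
      errors.push("Bad Option");
      continue;
    }
    // Mark values that came from a literal, then store them by identifier.
    res.set(opt.name, opt.isLiteral ? { ...rv, literalOption: true } : rv);
  }
  return res;
}
```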

Note

The resolved value of a function operand can also include resolved option values. These are not included in the option resolution result, and need to be processed separately by a function handler.

Fallback Resolution

A fallback value is the resolved value for an expression or variable when that expression or variable fails to resolve. It contains a string representation that is used for its formatting. All options are removed.

The resolved value of text, literal, and markup MUST NOT be a fallback value.

A variable fails to resolve when no value is identified for its name. The string representation of its fallback value is U+0024 DOLLAR SIGN $ followed by the name of the variable.

An expression fails to resolve when:

A variable used as its operand resolves to a fallback value. Note that an expression does not necessarily fail to resolve if an option value resolves with a fallback value.
No function handler is found for a function identifier.
Calling a function handler fails or does not return a valid value.

The string representation of the fallback value of an expression depends on its contents:

expression with a literal operand (either quoted or unquoted): U+007C VERTICAL LINE | followed by the value of the literal with escaping applied to U+005C REVERSE SOLIDUS \ and U+007C VERTICAL LINE |, and then by U+007C VERTICAL LINE |.

Examples: In a context where :ns:func fails to resolve, {42 :ns:func} resolves to a fallback value with a string representation |42| and {|C:\\| :ns:func} resolves to a fallback value with a string representation |C:\\|.
expression with variable operand: the fallback value representation of that variable, U+0024 DOLLAR SIGN $ followed by the name of the variable
Examples: In a context where $var fails to resolve, {$var} and {$var :number} both resolve to a fallback value with a string representation $var (even if :number fails to resolve).

In a context where :ns:func fails to resolve, the placeholder in .local $var = {|val| :ns:func} {{{$var}}} resolves to a fallback value with a string representation $var.

In a context where either :ns:now or :ns:pretty fails to resolve, the placeholder in
```
.local $time = {:ns:now format=iso8601}
{{{$time :ns:pretty}}}
```
resolves to a fallback value with a string representation $time.
function expression with no operand: U+003A COLON : followed by the function identifier

Example: In a context where :ns:func fails to resolve, {:ns:func} resolves to a fallback value with a string representation :ns:func.
Otherwise: the U+FFFD REPLACEMENT CHARACTER �

This is not currently used by any expression, but may apply in future revisions.

Options and attributes are not included in the fallback value.

Pattern selection is not supported for fallback values.

For example, in a JavaScript formatter the fallback value could have the following implementation, where source is one of the above-defined strings:
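class MessageFallback implements MessageValue {
  formatToString: () => string;
  getValue: () => undefined;
  constructor(source: string) {
    // The string representation is wrapped in curly brackets when formatted.
    this.formatToString = () => `{${source}}`;
    this.getValue = () => undefined;
  }
  // All options are removed from a fallback value.
  resolvedOptions = () => ({});
  match(_key: string) {
    throw Error("Selection on fallback values is not supported");
  }
}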
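
The rules for choosing the string representation of an expression's fallback value could be computed along these lines. The `Operand` and `Expression` shapes and the `fallbackSource` helper are hypothetical, and the literal's `value` is assumed to be its unescaped string value.

```typescript
// Hypothetical, simplified view of an expression for this sketch.
type Operand =
  | { type: "literal"; value: string } // unescaped value of the literal
  | { type: "variable"; name: string };

interface Expression {
  operand?: Operand;
  functionName?: string; // identifier including any namespace, such as "ns:func"
}

function fallbackSource(expr: Expression): string {
  const { operand, functionName } = expr;
  if (operand !== undefined && operand.type === "literal") {
    // Escape REVERSE SOLIDUS and VERTICAL LINE, then wrap in vertical lines.
    const escaped = operand.value.replace(/[\\|]/g, "\\$&");
    return "|" + escaped + "|";
  }
  if (operand !== undefined && operand.type === "variable") {
    return "$" + operand.name;
  }
  if (functionName !== undefined) {
    return ":" + functionName;
  }
  return "\uFFFD"; // not currently reachable for any expression
}
```

With these assumptions, an expression such as {42 :ns:func} maps to |42|, and {$var :number} maps to $var regardless of the function.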

Pattern Selection

If the message being formatted is not well-formed and valid, the result of pattern selection is a pattern consisting of a single fallback value. This fallback value uses the message's fallback string defined in the formatting context or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER �.

If the message being formatted does not contain a matcher, the result of pattern selection is its pattern value.

When a message contains a matcher with one or more selectors, the implementation needs to determine which variant will be used to provide the pattern for the formatting operation. This is done by traversing the list of available variant statements and maintaining a provisional "best variant". Each subsequent variant is compared to the previous best variant according to its key values, yielding a single best variant.

Note

At least one variant is required to have all of its keys consist of the fallback value *. Some selectors might be implemented in a way that the key value * cannot be selected in a valid message. In other cases, this key value might be unreachable only in certain locales. This could result in the need in some locales to create one or more variants that do not make sense grammatically for that language.

For example, in the pl (Polish) locale, this message cannot reach the * variant:
.input {$num :integer}
.match $num
0    {{ }}
one  {{ }}
few  {{ }}
many {{ }}
*    {{Only used by fractions in Polish.}}

The number of keys in each variant MUST equal the number of selectors.

Each key corresponds to a selector by its position in the variant.

For example, in this message:
.input {$one :number}
.input {$two :number}
.input {$three :number}
.match $one $two $three
1 2 3 {{ ... }}
The first key 1 corresponds to the first selector ($one), the second key 2 to the second selector ($two), and the third key 3 to the third selector ($three).

This selection method is defined in more detail below. An implementation MAY use any pattern selection method, as long as its observable behavior matches the results of the method defined here.

Operations on Resolved Values

For a resolved value to support selection, the operations Match and BetterThan need to be defined on it.

If rv is a resolved value that supports selection, then Match(rv, k) returns true for any key k that matches rv and returns false otherwise. BetterThan(rv, k1, k2) returns true for any keys k1 and k2 for which Match(rv, k1) is true, Match(rv, k2) is true, and k1 is a better match than k2, and returns false otherwise. On any error, both operations return false.

Other than the Match(rv, k) and BetterThan(rv, k1, k2) operations on resolved values, the form of the resolved values is determined by each implementation, along with the manner of determining their support for selection.

Resolve Selectors

First, resolve the values of each selector:

Let res be a new empty list of resolved values that support selection.
For each selector sel, in source order,
1. Let rv be the resolved value of sel.
2. If selection is supported for rv:
  1. Append rv as the last element of the list res.
3. Else:
  1. Let nomatch be a resolved value for which Match(nomatch, k) is false for any key k.
  2. Append nomatch as the last element of the list res.
  3. Emit a Bad Selector error.
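
A minimal TypeScript sketch of this step follows. The `Selector` interface stands in for the Match and BetterThan operations, and `ResolvedValue`, `NOMATCH`, and `resolveSelectors` are hypothetical names. Modelling selection support as an optional `selector` property is an assumption of the sketch.

```typescript
// Stand-in for the Match and BetterThan operations on a resolved value.
interface Selector {
  match(key: string): boolean;
  betterThan(key1: string, key2: string): boolean;
}

// Hypothetical resolved value: selection support is optional.
interface ResolvedValue {
  selector?: Selector;
}

// A selector that never matches any key.
const NOMATCH: Selector = {
  match: () => false,
  betterThan: () => false,
};

function resolveSelectors(values: ResolvedValue[], errors: string[]): Selector[] {
  const res: Selector[] = [];
  for (const rv of values) {
    if (rv.selector !== undefined) {
      res.push(rv.selector);
    } else {
      // Keep the list aligned with the selectors, but match nothing.
      res.push(NOMATCH);
      errors.push("Bad Selector");
    }
  }
  return res;
}
```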

Compare Variants

Next, using res:

Let bestVariant be UNSET.
For each variant var of the message, in source order:
1. Let keys be the keys of var.
2. Let match be SelectorsMatch(res, keys).
3. If match is false:
  1. Continue the loop.
4. If bestVariant is UNSET:
  1. Set bestVariant to var.
5. Else:
  1. Let bestVariantKeys be the keys of bestVariant.
  2. If SelectorsCompare(res, keys, bestVariantKeys) is true:
    1. Set bestVariant to var.
Assert that bestVariant is not UNSET.
Select the pattern of bestVariant.

SelectorsMatch

SelectorsMatch(selectors, keys) is defined as follows, where selectors is a list of resolved values and keys is a list of keys:

Let i be 0.
For each key key in keys:
1. If key is not the catch-all key '*':
  1. Let k be NormalizeKey(key).
  2. Let sel be the ith element of selectors.
  3. If Match(sel, k) is false:
    1. Return false.
2. Set i to i + 1.
Return true.

SelectorsCompare

SelectorsCompare(selectors, keys1, keys2) is defined as follows, where selectors is a list of resolved values and keys1 and keys2 are lists of keys.

Let i be 0.
For each key key1 in keys1:
1. Let key2 be the ith element of keys2.
2. If key1 is the catch-all key '*' and key2 is not the catch-all key:
  1. Return false.
3. If key1 is not the catch-all key '*' and key2 is the catch-all key:
  1. Return true.
4. If key1 and key2 are both the catch-all key '*':
  1. Set i to i + 1.
  2. Continue the loop.
5. Let k1 be NormalizeKey(key1).
6. Let k2 be NormalizeKey(key2).
7. If k1 and k2 consist of the same sequence of Unicode code points, then:
  1. Set i to i + 1.
  2. Continue the loop.
8. Let sel be the ith element of selectors.
9. Let result be BetterThan(sel, k1, k2).
10. Return result.
Return false.

NormalizeKey

NormalizeKey(key) is defined as follows, where key is a key.

Let rv be the resolved value of key (see Literal Resolution).
Let k be the string value of rv.
Let k1 be the result of applying Unicode Normalization Form C [UAX#15] to k.
Return k1.

For examples of how the algorithms work, see the appendix.
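
The following TypeScript sketch puts Compare Variants, SelectorsMatch, SelectorsCompare, and NormalizeKey together. It is one possible rendering under stated assumptions: the `Selector`, `Key`, and `Variant` shapes are hypothetical, the catch-all key is modelled separately from literal keys so that the literal |*| is not confused with it, and the example selector at the end is invented for illustration.

```typescript
interface Selector {
  match(key: string): boolean;
  betterThan(key1: string, key2: string): boolean;
}

type Key = { catchAll: true } | { catchAll: false; value: string };

interface Variant<P> {
  keys: Key[];
  pattern: P;
}

// NormalizeKey: apply Normalization Form C to the key's string value.
const normalizeKey = (value: string): string => value.normalize("NFC");

function selectorsMatch(selectors: Selector[], keys: Key[]): boolean {
  return keys.every(
    (key, i) => key.catchAll || selectors[i].match(normalizeKey(key.value)),
  );
}

function selectorsCompare(selectors: Selector[], keys1: Key[], keys2: Key[]): boolean {
  for (let i = 0; i < keys1.length; i++) {
    const key1 = keys1[i];
    const key2 = keys2[i];
    if (key1.catchAll && key2.catchAll) continue;
    if (key1.catchAll) return false; // key2 is more specific
    if (key2.catchAll) return true; // key1 is more specific
    const k1 = normalizeKey(key1.value);
    const k2 = normalizeKey(key2.value);
    if (k1 === k2) continue;
    return selectors[i].betterThan(k1, k2);
  }
  return false;
}

function selectPattern<P>(selectors: Selector[], variants: Variant<P>[]): P {
  let best: Variant<P> | undefined;
  for (const variant of variants) {
    if (!selectorsMatch(selectors, variant.keys)) continue;
    if (best === undefined || selectorsCompare(selectors, variant.keys, best.keys)) {
      best = variant;
    }
  }
  // A valid message always has a variant with only catch-all keys.
  if (best === undefined) throw new Error("No variant matched");
  return best.pattern;
}

// Invented selector for the value 1: it matches the exact key "1" and the
// category "one", and prefers the exact key.
const exampleSelector: Selector = {
  match: (k) => k === "1" || k === "one",
  betterThan: (k1, k2) => k1 === "1" && k2 === "one",
};

const exampleVariants: Variant<string>[] = [
  { keys: [{ catchAll: false, value: "one" }], pattern: "A" },
  { keys: [{ catchAll: false, value: "1" }], pattern: "B" },
  { keys: [{ catchAll: true }], pattern: "C" },
];

const chosen = selectPattern([exampleSelector], exampleVariants); // "B"
```

Because a later variant replaces the provisional best variant only when SelectorsCompare returns true, variants that compare as equally good are resolved in favour of the one that appears first in the source.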

Formatting of the Selected Pattern

After pattern selection, each text and placeholder part of the selected pattern is resolved and formatted.

Resolved values cannot always be formatted by a given implementation. When such an error occurs during formatting, an appropriate Message Function Error is emitted and a fallback value is used for the placeholder with the error.

Implementations MAY represent the result of formatting using the most appropriate data type or structure. Some examples of these include:

A single string concatenated from the parts of the resolved pattern.
A string with associated attributes for portions of its text.
A flat sequence of objects corresponding to each resolved value.
A hierarchical structure of objects that group spans of resolved values, such as sequences delimited by markup-open and markup-close placeholders.

Implementations SHOULD provide formatting result types that match user needs, including situations that require further processing of formatted messages. Implementations SHOULD encourage users to consider a formatted localised string as an opaque data structure, suitable only for presentation.

When formatting to a string, the default representation of all markup MUST be an empty string. Implementations MAY offer functionality for customizing this, such as by emitting XML-ish tags for each markup.

Formatting Examples

This section is non-normative.

An implementation might choose to return an interstitial object so that the caller can "decorate" portions of the formatted value. In ICU4J, the NumberFormatter class returns a FormattedNumber object, so a pattern such as This is my number {42 :number} might return the character sequence This is my number followed by a FormattedNumber object representing the value 42 in the current locale.
A formatter in a web browser could format a message as a DOM fragment rather than as a representation of its HTML source.

Formatting Fallback Values

If the resolved pattern includes any fallback values and the formatting result is a concatenated string or a sequence of strings, the string representation of each fallback value MUST be the concatenation of a U+007B LEFT CURLY BRACKET {, the fallback value as a string, and a U+007D RIGHT CURLY BRACKET }.

For example, a message that is not well-formed would format to a string as {�}, unless a fallback string is defined in the formatting context, in which case that string would be used instead.

Handling Bidirectional Text

Messages contain text. Any text can be bidirectional text. That is, the text can consist of a mixture of left-to-right and right-to-left spans of text. The display of bidirectional text is defined by the Unicode Bidirectional Algorithm [UAX9].

The directionality of the formatted message as a whole is provided by the formatting context.

Note

Keep in mind the difference between the formatted output of a message, which is the topic of this section, and the syntax of a message prior to formatting. The processing of a message depends on the logical sequence of Unicode code points, not on the presentation of the message. Affordances to allow users appropriate control over the appearance of the message's syntax have been provided.

When a message is formatted, placeholders are replaced with their formatted representation. Applying the Unicode Bidirectional Algorithm to the text of a formatted message (including its formatted parts) can result in unexpected or undesirable spillover effects. Applying bidi isolation to each affected formatted value helps avoid this spillover in a formatted message.

Note that both the message and, separately, each placeholder need to have direction metadata for this to work. If an implementation supports formatting to something other than a string (such as a sequence of parts), the directionality of each formatted placeholder needs to be available to the caller.

If a formatted expression itself contains spans with differing directionality, its formatter SHOULD perform any necessary processing, such as inserting controls or isolating such parts to ensure that the formatted value displays correctly in a plain text context.

For example, an implementation could provide a :currency formatting function which inserts strongly directional characters, such as U+200F RIGHT-TO-LEFT MARK (RLM), U+200E LEFT-TO-RIGHT MARK (LRM), or U+061C ARABIC LETTER MARKER (ALM), to coerce proper display of the sign and currency symbol next to a formatted number. An example of this is formatting the value -1234.56 as the currency AED in the ar-AE locale. The formatted value appears like this:
‎-1,234.56 د.إ.‏
The code point sequence for this string, as produced by the ICU4J NumberFormat function, includes U+200F U+200E at the start and U+200F at the end of the string. Without these marks, the sign and currency symbol would not display in their proper positions next to the number.

A bidirectional isolation strategy is functionality in the formatter's processing of a message that produces bidirectional output text that is ready for display.

The Default Bidi Strategy is a bidirectional isolation strategy that uses isolating Unicode control characters around the formatted values of placeholders. It is primarily intended for use in plain-text strings, where markup or other mechanisms are not available. The Default Bidi Strategy MUST be the default bidirectional isolation strategy when formatting a message as a single string.

Implementations MAY provide other bidirectional isolation strategies.

Implementations MAY supply a bidirectional isolation strategy that performs no processing.

The Default Bidi Strategy is defined as follows:

Let out be the empty string.
Let msgdir be the directionality of the whole message, one of « 'LTR', 'RTL', 'unknown' ». These correspond to the message having left-to-right directionality, right-to-left directionality, and to the message's directionality not being known.
For each part part in pattern:
1. If part is a plain literal (text) part, append part to out.
2. Else if part is a markup placeholder:
  1. Let fmt be the formatted string representation of the resolved value of part. Note that this is normally the empty string.
  2. Append fmt to out.
3. Else:
  1. Let resval be the resolved value of part.
  2. Let fmt be the formatted string representation of resval.
  3. Let dir be the directionality of resval, one of « 'LTR', 'RTL', 'unknown' », with the same meanings as for msgdir.
  4. Let the boolean value isolate be True if the u:dir option of resval has a value other than 'inherit', or False otherwise.
  5. If dir is 'LTR':
    1. If msgdir is 'LTR' and isolate is False:
      1. Append fmt to out.
    2. Else:
      1. Append U+2066 LEFT-TO-RIGHT ISOLATE to out.
      2. Append fmt to out.
      3. Append U+2069 POP DIRECTIONAL ISOLATE to out.
  6. Else if dir is 'RTL':
    1. Append U+2067 RIGHT-TO-LEFT ISOLATE to out.
    2. Append fmt to out.
    3. Append U+2069 POP DIRECTIONAL ISOLATE to out.
  7. Else:
    1. Append U+2068 FIRST STRONG ISOLATE to out.
    2. Append fmt to out.
    3. Append U+2069 POP DIRECTIONAL ISOLATE to out.
Emit out as the formatted output of the message.
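
The Default Bidi Strategy steps can be sketched in TypeScript as follows. The `Part` shape and the `defaultBidiStrategy` function are hypothetical. The sketch assumes each non-markup placeholder already carries its formatted string, its directionality, and the value of its u:dir option if one was set. Treating a missing u:dir option as not requesting isolation is also an assumption.

```typescript
type Dir = "LTR" | "RTL" | "unknown";

// Hypothetical, already-formatted parts of the selected pattern.
type Part =
  | { type: "text"; value: string }
  | { type: "markup"; formatted: string }
  | { type: "value"; formatted: string; dir: Dir; uDir?: string };

const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

function defaultBidiStrategy(pattern: Part[], msgdir: Dir): string {
  let out = "";
  for (const part of pattern) {
    if (part.type === "text") {
      out += part.value;
    } else if (part.type === "markup") {
      out += part.formatted; // normally the empty string
    } else {
      const fmt = part.formatted;
      // Isolation is forced when u:dir is set to anything other than "inherit".
      const isolate = part.uDir !== undefined && part.uDir !== "inherit";
      if (part.dir === "LTR") {
        out += msgdir === "LTR" && !isolate ? fmt : LRI + fmt + PDI;
      } else if (part.dir === "RTL") {
        out += RLI + fmt + PDI;
      } else {
        out += FSI + fmt + PDI;
      }
    }
  }
  return out;
}
```

Note that an LTR placeholder is left bare only inside an LTR message. There is no matching shortcut for an RTL placeholder inside an RTL message, which is always isolated.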

Note

As mentioned in the "Resolved Values" section, the representation of a resolved value can track everything needed to determine the directionality of the formatted string representation of a resolved value. Each function handler can have its own means for determining the directionality annotation on the resolved value it returns. Alternately, an implementation could simply determine directionality based on the locale.

Important

Directionality SHOULD NOT be determined by introspecting the character sequence in the formatted string representation of resval.

Errors

Errors can occur during the processing of a message. Some errors can be detected statically, such as those due to problems with message syntax, violations of requirements in the data model, or requirements defined by a function. Other errors might be detected during selection or formatting of a given message. Where available, the use of validation tools is recommended, as early detection of errors makes their correction easier.

Error Handling

Syntax Errors and Data Model Errors apply to all message processors, and MUST be emitted as soon as possible. The other error categories are only emitted during formatting, but it might be possible to detect them with validation tools.

During selection and formatting, expression handlers MUST only emit Message Function Errors.

Implementations do not have to check for or emit Resolution Errors or Message Function Errors in expressions that are not otherwise used by the message, such as placeholders in unselected patterns or declarations that are never referenced during formatting.

When formatting a message with one or more errors, an implementation MUST provide a mechanism to discover and identify at least one of the errors. The exact form of error signaling is implementation defined. Some examples include throwing an exception, returning an error code, or providing a function or method for enumerating any errors.

For all valid messages, an implementation MUST enable a user to get a formatted result. The formatted result might include fallback values such as when a placeholder's expression produced an error during formatting.

The two above requirements MAY be fulfilled by a single formatting method, or separately by more than one such method.

When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors MUST prioritise Syntax Errors and Data Model Errors over others.

When an error occurs while resolving a selector or calling Match or BetterThan with its resolved value, the selector MUST NOT match any variant key other than the catch-all * and a Bad Selector error MUST be emitted.

Syntax Errors

Syntax Errors occur when the syntax representation of a message is not well-formed.

Example invalid messages resulting in a Syntax Error:
{{Missing end braces
{{Missing one end brace}
Unknown {{expression}}
.local $var = {|no message body|}

Data Model Errors

Data Model Errors occur when a message is not valid due to violating one of the semantic requirements on its structure.

Variant Key Mismatch

A Variant Key Mismatch occurs when the number of keys on a variant does not equal the number of selectors.

Example invalid messages resulting in a Variant Key Mismatch error:

.input {$one :ns:func}
.match $one
1 2 {{Too many}}
* {{Otherwise}}

.input {$one :ns:func}
.input {$two :ns:func}
.match $one $two
1 2 {{Two keys}}
* {{Missing a key}}
* * {{Otherwise}}
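
A validation tool could detect this error with a check like the one below. This is a sketch under simple assumptions: `Matcher` and `findVariantKeyMismatches` are hypothetical names, and keys are represented as plain strings because only their count matters here.

```typescript
// Hypothetical, simplified view of a matcher.
interface Matcher {
  selectors: string[]; // names of the selector variables
  variants: string[][]; // the list of keys of each variant
}

// Returns the indices of variants whose key count differs from the
// number of selectors.
function findVariantKeyMismatches(matcher: Matcher): number[] {
  const expected = matcher.selectors.length;
  const mismatched: number[] = [];
  matcher.variants.forEach((keys, index) => {
    if (keys.length !== expected) mismatched.push(index);
  });
  return mismatched;
}
```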

Missing Fallback Variant

A Missing Fallback Variant error occurs when the message does not include a variant with only catch-all keys.

Example invalid messages resulting in a Missing Fallback Variant error:

.input {$one :ns:func}
.match $one
1 {{Value is one}}
2 {{Value is two}}

.input {$one :ns:func}
.input {$two :ns:func}
.match $one $two
1 * {{First is one}}
* 1 {{Second is one}}
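
The corresponding check is short. In this sketch the `Key` type and `hasFallbackVariant` are hypothetical names, and the catch-all key is modelled separately from literal keys so that a quoted literal |*| does not count as a catch-all.

```typescript
type Key = { catchAll: true } | { catchAll: false; value: string };

// True when at least one variant consists only of catch-all keys.
function hasFallbackVariant(variants: Key[][]): boolean {
  return variants.some((keys) => keys.every((key) => key.catchAll));
}
```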

Missing Selector Annotation

A Missing Selector Annotation error occurs when the message contains a selector that does not directly or indirectly reference a declaration with a function.

Examples of invalid messages resulting in a Missing Selector Annotation error:

.match $one
1 {{Value is one}}
* {{Value is not one}}

.local $one = {|The one|}
.match $one
1 {{Value is one}}
* {{Value is not one}}

.input {$one}
.match $one
1 {{Value is one}}
* {{Value is not one}}

Duplicate Declaration

A Duplicate Declaration error occurs when a variable is declared more than once. Note that an input variable is implicitly declared when it is first used, so explicitly declaring it after such use is also an error.

Examples of invalid messages resulting in a Duplicate Declaration error:

.input {$var :number maximumFractionDigits=0}
.input {$var :number minimumFractionDigits=0}
{{Redeclaration of the same variable}}

.local $var = {$ext :number maximumFractionDigits=0}
.input {$var :number minimumFractionDigits=0}
{{Redeclaration of a local variable}}

.input {$var :number minimumFractionDigits=0}
.local $var = {$ext :number maximumFractionDigits=0}
{{Redeclaration of an input variable}}

.input {$var :number minimumFractionDigits=$var2}
.input {$var2 :number}
{{Redeclaration of the implicit input variable $var2}}

.local $var = {$ext :ns:func}
.local $var = {$error}
.local $var2 = {$var2 :ns:error}
{{{$var} cannot be redefined. {$var2} cannot refer to itself}}

Duplicate Option Name

A Duplicate Option Name error occurs when the same identifier appears on the left-hand side of more than one option in the same expression.

Examples of invalid messages resulting in a Duplicate Option Name error:
Value is {42 :number style=percent style=decimal}
.local $foo = {horse :ns:func one=1 two=2 one=1}
{{This is {$foo}}}
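
Detecting this error only needs the option identifiers of a single expression, as the sketch below shows. The function name `findDuplicateOptionNames` is hypothetical.

```typescript
// Returns each option identifier that appears more than once, in order of
// first repetition.
function findDuplicateOptionNames(optionNames: string[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const name of optionNames) {
    if (seen.has(name) && duplicates.indexOf(name) === -1) {
      duplicates.push(name);
    }
    seen.add(name);
  }
  return duplicates;
}
```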

Duplicate Variant

A Duplicate Variant error occurs when the same list of keys is used for more than one variant.

Examples of invalid messages resulting in a Duplicate Variant error:

.input {$var :string}
.match $var
* {{The first default}}
* {{The second default}}

.input {$x :string}
.input {$y :string}
.match $x $y
*   foo   {{The first "foo" variant}}
bar *     {{The "bar" variant}}
*   |foo| {{The second "foo" variant}}
*   *     {{The default variant}}
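
A check for this error has to treat quoted and unquoted literals with the same value as the same key, as the second example shows with foo and |foo|. The sketch below additionally assumes that keys are compared after applying Normalization Form C, in the same way as NormalizeKey. The `Key` type and both function names are hypothetical.

```typescript
type Key = { catchAll: true } | { catchAll: false; value: string };

// Builds a comparable identity for a list of keys. The catch-all key maps to
// null so that it stays distinct from a literal key with the value "*".
function keyListId(keys: Key[]): string {
  return JSON.stringify(
    keys.map((key) => (key.catchAll ? null : key.value.normalize("NFC"))),
  );
}

// Returns the indices of variants whose key list repeats an earlier one.
function findDuplicateVariants(variants: Key[][]): number[] {
  const seen = new Set<string>();
  const duplicates: number[] = [];
  variants.forEach((keys, index) => {
    const id = keyListId(keys);
    if (seen.has(id)) {
      duplicates.push(index);
    } else {
      seen.add(id);
    }
  });
  return duplicates;
}
```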

Resolution Errors

Resolution Errors occur when the runtime value of a part of a message cannot be determined.

Unresolved Variable

An Unresolved Variable error occurs when a variable reference cannot be resolved.

For example, attempting to format either of the following messages would result in an Unresolved Variable error if done within a context that does not provide for the variable reference $var to be successfully resolved:
The value is {$var}.
.input {$var :ns:func}
.match $var
1 {{The value is one.}}
* {{The value is not one.}}

Unknown Function

An Unknown Function error occurs when an expression includes a reference to a function which cannot be resolved.

For example, attempting to format either of the following messages would result in an Unknown Function error if done within a context that does not provide for the function :ns:func to be successfully resolved:
The value is {horse :ns:func}.
.local $horse = {|horse| :ns:func}
.match $horse
1 {{The value is one.}}
* {{The value is not one.}}

Bad Selector

A Bad Selector error occurs when a message includes a selector with a resolved value which does not support selection.

For example, attempting to format this message would result in a Bad Selector error:
.local $day = {|2024-05-01| :date}
.match $day
* {{The due date is {$day}}}

Message Function Errors

A Message Function Error is any error that occurs when calling a function handler or which depends on validation associated with a specific function.

Implementations SHOULD provide a way for function handlers to emit (or cause to be emitted) any of the types of error defined in this section. Implementations MAY also provide implementation-defined Message Function Error types.

For example, attempting to format any of the following messages might result in a Message Function Error if done within a context that

Provides for the variable reference $user to resolve to an object { name: 'Kat', id: 1234 },

Provides for the variable reference $field to resolve to a string 'address', and

Uses a :ns:get message function which requires its argument to be an object and an option field to be provided with a string value.

The exact type of Message Function Error is determined by the function handler.
Hello, {horse :ns:get field=name}!
Hello, {$user :ns:get}!
.local $id = {$user :ns:get field=id}
{{Hello, {$id :ns:get field=name}!}}
Your {$field} is {$id :ns:get field=$field}

Bad Operand

A Bad Operand error is any error that occurs due to the content or format of the operand, such as when the operand provided to a function during function resolution does not match one of the expected implementation-defined types for that function; or in which a literal operand value does not have the required format and thus cannot be processed into one of the expected implementation-defined types for that specific function.

For example, the following messages each produce a Bad Operand error because the literal |horse| does not match the number-literal production, which is a requirement of the function :number for its operand:
.local $horse = {|horse| :number}
{{You have a {$horse}.}}
.local $horse = {|horse| :number}
.match $horse
1 {{The value is one.}}
* {{The value is not one.}}

Bad Option

A Bad Option error is an error that occurs when there is an implementation-defined error with an option or an option value. These might include:

A required option is missing.
Mutually exclusive options are supplied.
An option value provided to a function during function resolution does not match one of the implementation-defined types or values for that function; or in which the string value of an option does not have the required format and thus cannot be processed into one of the expected implementation-defined types for that specific function.

For example, the following message might produce a Bad Option error because the literal foo does not match the production digit-size-option, which is a requirement of the function :number for its minimumFractionDigits option:
The answer is {42 :number minimumFractionDigits=foo}.

Bad Variant Key

A Bad Variant Key error is an error that occurs when a variant key does not match the expected implementation-defined format.

For example, the following message produces a Bad Variant Key error because horse is not a recognized plural category and does not match the number-literal production, which is a requirement of the :number function:
.local $answer = {42 :number}
.match $answer
1     {{The value is one.}}
horse {{The value is a horse.}}
*     {{The value is not one.}}

Unsupported Operation

An Unsupported Operation error is an implementation-specific error that occurs when a given option, option value, operand, or some combination of these are incompatible or not supported by a given function or its function handler.

Default Functions

This section defines the default functions which are REQUIRED for conformance with this specification, along with default functions that SHOULD be implemented to support additional functionality.

To accept a function means that an implementation MUST NOT emit an Unknown Function error for that function's identifier. To accept an option means that a function handler MUST NOT emit a Bad Option error for that option's identifier when used with the function it is defined for and MUST NOT emit a Bad Option error for any of the option values defined for that option. Accepting a function or its options does not mean that a particular output is produced. Implementations MAY emit an Unsupported Operation error for options or option values that they cannot support.

Functions can define options. An option can be REQUIRED or RECOMMENDED.

Implementations MUST accept each REQUIRED default function and MUST accept all options defined as REQUIRED for those functions.

Implementations SHOULD accept each RECOMMENDED default function. For each such function, the implementation MUST accept all options listed as REQUIRED for that function.

Implementations SHOULD accept options that are marked as RECOMMENDED.

Implementations MAY accept functions not defined in this specification. In addition, implementations SHOULD provide mechanisms for users to register and use user-defined functions and their associated function handlers. Functions not defined by any version of this specification SHOULD use an implementation-defined or user-defined namespace.

Implementations MAY implement additional options not defined by any version of this specification for default functions. Such options MUST use an implementation-specific namespace.

Implementations MAY accept, for options defined in this specification, option values which are not defined in this specification. However, such values might become defined with a different meaning in the future, including with a different, incompatible name or using an incompatible value space. Supporting implementation-specific option values for default functions is NOT RECOMMENDED.

Implementations MAY accept, for operands or options defined in this specification, values with implementation-defined types. Such values can be useful to users in cases where local usage and support exists (including cases in which details vary from those defined by Unicode and CLDR).

For example:

Implementations are encouraged to accept some native representation for currency amounts as the operand in the function :currency.

A Java implementation might accept a java.time.chrono.Chronology object as a value for the date/time override option calendar.

Future versions of this specification MAY define additional options and option values, subject to the rules in the Stability Policy, for functions found in this specification. As implementations are permitted to ignore options that they do not support, it is possible to write messages using options not defined here which currently format with no error, but which could produce errors when formatted with a later edition of this specification. Therefore, using options not explicitly defined here is NOT RECOMMENDED.

String Value Selection and Formatting

The `:string` function

The function :string provides string selection and formatting.

`:string` Operands

The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported, or any literal value. All other values produce a Bad Operand error.

For example, in Java, implementations of the java.lang.CharSequence interface (such as java.lang.String or java.lang.StringBuilder), the type char, or the class java.lang.Character might be considered as the "implementation-defined types". Such an implementation might also support other classes via the method toString(). This might be used to enable selection of an enum value by name, for example.

Other programming languages would define string and character sequence types or classes according to their local needs, including, where appropriate, coercion to string.

`:string` Options

The function :string has no options.

Note

While :string has no built-in options, options in the u: namespace can be used. For example:

{$s :string u:dir=ltr u:id=my-string}

`:string` Resolved Value

The resolved value of an expression with a :string function contains the string value of the operand of the annotated expression, together with its resolved locale and directionality. None of the options set on the expression are part of the resolved value.

Selection with `:string`

When implementing Match(resolvedSelector, key) where resolvedSelector is the resolved value of a selector and key is a string, the :string selector function performs as described below.

Let compare be the string value of resolvedSelector in Unicode Normalization Form C (NFC) [UAX#15]
If key and compare consist of the same sequence of Unicode code points, then
1. Return true.
Return false.

When implementing BetterThan(resolvedSelector, key1, key2) where resolvedSelector is the resolved value of a selector and key1 and key2 are strings, the :string selector function performs as described below, as the BetterThan operation should only be called on keys that match.

Return false.
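
The two operations above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation: the function names string_match and string_better_than are hypothetical, and the sketch assumes that key has already been normalized to NFC by the caller, since the steps above normalize only the selector's string value.

```python
import unicodedata


def string_match(selector_value: str, key: str) -> bool:
    """Sketch of Match for :string (hypothetical helper name).

    Only the selector's string value is normalized here; the key is
    assumed to be in NFC already.
    """
    compare = unicodedata.normalize("NFC", selector_value)
    # Python compares str objects code point by code point.
    return key == compare


def string_better_than(selector_value: str, key1: str, key2: str) -> bool:
    """Sketch of BetterThan for :string: matching keys are never ranked."""
    return False
```

With this sketch, a selector value of "e" followed by U+0301 COMBINING ACUTE ACCENT matches the key U+00E9, while the value "space key" does not match the quoted key with surrounding spaces.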

Note

Unquoted string literals in a variant do not include spaces. If users wish to match strings that include whitespace (including U+3000 IDEOGRAPHIC SPACE) to a key, the key needs to be quoted.

For example:

.input {$string :string}
.match $string
| space key | {{Matches the string " space key "}}
*             {{Matches the string "space key"}}

`:string` Formatting

The :string function returns the string value of the resolved value of the operand.

Important

The function :string does not perform Unicode Normalization of its formatted output. Users SHOULD encode messages and their parts in Unicode Normalization Form C (NFC) unless there is a very good reason not to.

Numeric Value Selection and Formatting

The `:number` function

The function :number is a selector and formatter for numeric values.

`:number` Operands

The function :number requires a numeric operand as its operand.

`:number` Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.

Note

The names of options and their option values were derived from the options in JavaScript's Intl.NumberFormat.

The following options are REQUIRED to be available on the function :number:

select (see Number Selection below)
- plural (default)
- ordinal
- exact
signDisplay
- auto (default)
- always
- exceptZero
- negative
- never
useGrouping
- auto (default)
- always
- never
- min2
minimumIntegerDigits
- digit size option, default: 1
minimumFractionDigits
- digit size option
maximumFractionDigits
- digit size option
minimumSignificantDigits
- digit size option
maximumSignificantDigits
- digit size option
trailingZeroDisplay
- auto (default)
- stripIfInteger
roundingPriority
- auto (default)
- morePrecision
- lessPrecision
roundingIncrement
- 1 (default), 2, 5, 10, 20, 25, 50, 100, 200, 250, 500, 1000, 2000, 2500, and 5000
roundingMode
- ceil
- floor
- expand
- trunc
- halfCeil
- halfFloor
- halfExpand (default)
- halfTrunc
- halfEven

If the operand of the expression is an implementation-defined type, such as the resolved value of an expression with a :number or :integer annotation, it can include option values. These are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand.

For example, the placeholder in this message:
.input {$n :number minimumFractionDigits=2 signDisplay=always}
{{{$n :number minimumFractionDigits=1}}}
would be formatted with the resolved options { minimumFractionDigits: '1', signDisplay: 'always' }.
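
The precedence rule above amounts to a simple dictionary merge. The following Python sketch is illustrative only; resolve_options is a hypothetical helper name, and option values are modeled as plain strings.

```python
def resolve_options(operand_options: dict, expression_options: dict) -> dict:
    """Merge inherited options with the expression's own options.

    Options set on the expression take priority over options carried
    by the operand's resolved value.
    """
    resolved = dict(operand_options)       # start from the inherited options
    resolved.update(expression_options)    # expression options override them
    return resolved


inherited = {"minimumFractionDigits": "2", "signDisplay": "always"}
on_expression = {"minimumFractionDigits": "1"}
print(resolve_options(inherited, on_expression))
```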

`:number` Resolved Value

The resolved value of an expression with a :number function contains an implementation-defined numerical value of the operand of the annotated expression, together with the resolved options' values.

Selection with `:number`

The function :number performs selection as described in Number Selection below.

The `:integer` function

The function :integer is a selector and formatter for matching or formatting numeric values as integers.

`:integer` Operands

The function :integer requires a numeric operand as its operand.

`:integer` Options

Note

The names of options and their option values were derived from the options in JavaScript's Intl.NumberFormat.

The following options are REQUIRED to be available on the function :integer:

select (see Number Selection below)
- plural (default)
- ordinal
- exact
signDisplay
- auto (default)
- always
- exceptZero
- negative
- never
useGrouping
- auto (default)
- always
- never
- min2
minimumIntegerDigits
- digit size option, default: 1
maximumSignificantDigits
- digit size option

If the operand of the expression is an implementation-defined type, such as the resolved value of an expression with a :number or :integer annotation, it can include option values. In general, these are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand. Options with the following names are however discarded if included in the operand:

minimumFractionDigits
maximumFractionDigits
minimumSignificantDigits
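
As a rough Python sketch of the rule above, the listed options can be filtered out of the operand's options before the expression's own options are applied. The names INTEGER_DISCARDED and resolve_integer_options are hypothetical and not part of this specification.

```python
# Options that :integer drops when they arrive via the operand.
INTEGER_DISCARDED = frozenset({
    "minimumFractionDigits",
    "maximumFractionDigits",
    "minimumSignificantDigits",
})


def resolve_integer_options(operand_options: dict, expression_options: dict) -> dict:
    """Inherit operand options except the discarded ones, then apply overrides."""
    resolved = {
        name: value
        for name, value in operand_options.items()
        if name not in INTEGER_DISCARDED
    }
    resolved.update(expression_options)  # expression options take priority
    return resolved
```

For an operand carrying minimumFractionDigits=2 and signDisplay=always, an :integer expression with no options of its own would resolve to signDisplay=always only.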

`:integer` Resolved Value

The resolved value of an expression with an :integer function contains the implementation-defined integer value of the operand of the annotated expression, together with the resolved options' values.

Selection with `:integer`

The function :integer performs selection as described in Number Selection below.

The `:offset` function

The function :offset is a selector and formatter for matching or formatting numeric values to which an offset has been applied. The "offset" is a small integer adjustment of the operand's value.

This function is useful for selection and formatting of values that differ from the input value by a specified amount. For example, it can be used in a message such as this:
.input {$like_count :integer}
.local $others_count = {$like_count :offset subtract=1}
.match $like_count $others_count
0 *   {{Your post has no likes.}}
1 *   {{{$name} liked your post.}}
* one {{{$name} and {$others_count} other user liked your post.}}
* *   {{{$name} and {$others_count} other users liked your post.}}

Note

The purpose of this function is to supply compatibility with ICU's PluralFormat and its offset feature, also found in ICU MessageFormat.

`:offset` Operands

The function :offset requires a numeric operand as its operand.

`:offset` Options

The options on :offset are mutually exclusive, and exactly one option is always required. The options do not have default values.

The following options are REQUIRED to be available on the function :offset:

add
- digit size option
subtract
- digit size option

If no options or more than one option is set, or if an option value is not a digit size option, a Bad Option error is emitted and a fallback value used as the resolved value of the expression.

`:offset` Resolved Value

The resolved value of an expression with an :offset function contains the implementation-defined numeric value of the operand of the annotated expression.

If the add option is set, the numeric value of the resolved value is formed by incrementing the numeric value of the operand by the integer value of the digit size option.

If the subtract option is set, the numeric value of the resolved value is formed by decrementing the numeric value of the operand by the integer value of the digit size option.

If the operand of the expression is an implementation-defined numeric type, such as the resolved value of an expression with a :number or :integer annotation, it can include option values. These are included in the resolved option values of the expression. The :offset options are not included in the resolved option values.
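
The option handling described above might be sketched in Python as follows. This is an assumption-laden illustration: resolve_offset and BadOptionError are hypothetical names, the operand is modeled as a Python int, and the sketch raises an exception where a real implementation would emit a Bad Option error and use a fallback value. Only the add and subtract options are inspected.

```python
import re

# String form of a digit size option: "0" or one to two digits without a leading zero.
DIGIT_SIZE_OPTION = re.compile(r"0|[1-9][0-9]?")


class BadOptionError(ValueError):
    """Stand-in for the Bad Option error (hypothetical class)."""


def resolve_offset(operand_value: int, options: dict) -> int:
    """Apply exactly one of the add / subtract options to the operand value."""
    present = [name for name in ("add", "subtract") if name in options]
    if len(present) != 1:
        raise BadOptionError("exactly one of 'add' or 'subtract' is required")
    name = present[0]
    raw = str(options[name])
    if not DIGIT_SIZE_OPTION.fullmatch(raw):
        raise BadOptionError(f"{name}={raw!r} is not a digit size option")
    amount = int(raw)
    return operand_value + amount if name == "add" else operand_value - amount
```

In the likes example above, an operand of 3 with subtract=1 would resolve to 2. Python integers do not overflow, so the practical limits discussed for other number types do not show up in this sketch.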

Note

Implementations can encounter practical limits with :offset expressions, such as the result of adding two integers exceeding the storage or precision of some implementation-defined number type. In such cases, implementations can emit an Unsupported Operation error or they might just silently overflow the underlying data value.

Selection with `:offset`

The function :offset performs selection as described in Number Selection below.

The `:currency` function

The function :currency is a formatter for currency values, which is a specialized form of numeric formatting.

`:currency` Operands

The operand of the :currency function can be one of any number of implementation-defined types, each of which contains a numerical value and a currency; or it can be a numeric operand, as long as the option currency is provided. The option currency MUST NOT be used to override the currency of an implementation-defined type. Using this option in such a case results in a Bad Option error.

The value of the operand's currency MUST be either a string containing a well-formed Unicode Currency Identifier or an implementation-defined currency type. Currency codes are case-insensitive. A well-formed Unicode Currency Identifier matches the production currency_code in this ABNF:

currency_code = 3ALPHA

A numeric operand without a currency option results in a Bad Operand error.
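
A well-formedness check for the currency_code production can be sketched with a regular expression. The helper name is_well_formed_currency is hypothetical; the check deliberately does not test whether the code identifies an actual currency.

```python
import re

# currency_code = 3ALPHA; ALPHA covers A-Z and a-z, so codes are case-insensitive.
CURRENCY_CODE = re.compile(r"[A-Za-z]{3}")


def is_well_formed_currency(code: str) -> bool:
    """Return True if code consists of exactly three ASCII letters."""
    return CURRENCY_CODE.fullmatch(code) is not None
```

Under this sketch, "EUR" and "eur" are both well-formed, while "EURO" and "E1R" are not.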

Note

For example, in ICU4J, the type com.ibm.icu.util.CurrencyAmount can be used to set the amount and currency.

Note

The currency is only required to be well-formed rather than checked for validity. This allows new currency codes to be defined (there are many recent examples of this occurring). It also avoids requiring implementations to check currency codes for validity, although implementations are permitted to emit Bad Option or Bad Operand for invalid codes.

Note

For runtime environments that do not provide a ready-made data structure, class, or type for currency values, the implementation ought to provide a data structure, convenience function, or documentation on how to encode the value and currency code for formatting. For example, such an implementation might define a "currency operand" to include a key-value structure with specific keys to be the local currency operand, which might look like the following:

{
   "value": 123.45,
   "currency": "EUR"
}

`:currency` Options

Fraction digits for currency values behave differently than for other numeric formatters. The number of fraction digits displayed is usually set by the currency used. For example, USD uses 2 fraction digits, while JPY uses none. Setting fractionDigits to some other number allows greater display precision (such as when performing currency conversions or other specialized operations) or, if set to 0, disables fraction digits.

The option trailingZeroDisplay has an option value stripIfInteger that is useful for displaying currencies with their fraction digits removed when the fraction part of the operand is zero. This is sometimes used in messages to make the displayed value omit the fraction part automatically.

For example, this message:
The special price is {$price :currency trailingZeroDisplay=stripIfInteger}.
When used with the value 5.00 USD in the en-US locale, it displays as:
The special price is $5.
But it displays like this when the value is 5.01 USD:
The special price is $5.01.
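
The effect of stripIfInteger can be approximated with a small Python sketch. It is heavily simplified: format_usd is a hypothetical helper that hard-codes the "$" symbol and the 2 fraction digits of USD, and it ignores grouping and locale data entirely.

```python
from decimal import Decimal


def format_usd(amount: str, strip_if_integer: bool) -> str:
    """Format amount as USD with 2 fraction digits, optionally stripping ".00"."""
    value = Decimal(amount).quantize(Decimal("0.01"))
    whole = value.to_integral_value()
    if strip_if_integer and value == whole:
        # The fraction part is zero, so only the integer part is shown.
        return f"${whole}"
    return f"${value}"


print(format_usd("5.00", strip_if_integer=True))
print(format_usd("5.01", strip_if_integer=True))
```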

Implementations MAY internally alias option values that they do not have data or a backing implementation for. Notably, the currencyDisplay option has a rich set of values that mirrors developments in CLDR data. Some implementations might not be able to produce all of these formats for every currency.

Note

Except where noted otherwise, the names of options and their option values were derived from the options in JavaScript's Intl.NumberFormat.

The following options are REQUIRED to be available on the function :currency:

currency
- well-formed Unicode Currency Identifier (no default)
currencySign
- accounting
- standard (default)
currencyDisplay
- narrowSymbol
- symbol (default)
- name
- code
- never (this is called hidden in ICU)
useGrouping
- auto (default)
- always
- never
- min2
minimumIntegerDigits
- digit size option, default: 1
fractionDigits (unlike number/integer formats, the fraction digits for currency formatting are fixed)
- auto (default) (the number of digits used by the currency)
- digit size option
minimumSignificantDigits
- digit size option
maximumSignificantDigits
- digit size option
trailingZeroDisplay
- auto (default)
- stripIfInteger
roundingPriority
- auto (default)
- morePrecision
- lessPrecision
roundingIncrement
- 1 (default), 2, 5, 10, 20, 25, 50, 100, 200, 250, 500, 1000, 2000, 2500, and 5000
roundingMode
- ceil
- floor
- expand
- trunc
- halfCeil
- halfFloor
- halfExpand (default)
- halfTrunc
- halfEven

If the operand of the expression is an implementation-defined type, such as the resolved value of an expression with a :currency annotation, it can include option values. These are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand.

For example, the placeholder in this message:
.input {$n :currency currency=USD trailingZeroDisplay=stripIfInteger}
{{{$n :currency currencySign=accounting}}}
would be formatted with the resolved options { currencySign: 'accounting', trailingZeroDisplay: 'stripIfInteger', currency: 'USD' }.

`:currency` Resolved Value

The resolved value of an expression with a :currency function contains an implementation-defined currency value of the operand of the annotated expression, together with the resolved options' values.

The `:percent` function

The function :percent is a selector and formatter for percent values.

`:percent` Operands

The function :percent requires a numeric operand as its operand.

When either selecting or formatting the expression, the numeric value of the operand is multiplied by 100.

`:percent` Options

Note

The names of options and their option values were derived from the options in JavaScript's Intl.NumberFormat.

The following options are REQUIRED to be available on the function :percent:

signDisplay
- auto (default)
- always
- exceptZero
- negative
- never
useGrouping
- auto (default)
- always
- never
- min2
minimumFractionDigits
- digit size option, default: 0
maximumFractionDigits
- digit size option, default: 0
minimumSignificantDigits
- digit size option
maximumSignificantDigits
- digit size option
trailingZeroDisplay
- auto (default)
- stripIfInteger
roundingPriority
- auto (default)
- morePrecision
- lessPrecision
roundingMode
- ceil
- floor
- expand
- trunc
- halfCeil
- halfFloor
- halfExpand (default)
- halfTrunc
- halfEven

The numeric value of the operand is multiplied by 100 at the start of formatting or selection. Each option is applied to the formatted (or selected) value rather than the unaltered value of the operand.

For example, this placeholder:
{0.1234 :percent maximumFractionDigits=1}
might be formatted as "12.3%" in an English locale.

If the operand of the expression is an implementation-defined type, such as the resolved value of an expression with a :number or :integer annotation, it can include option values. In general, these are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand. Options with the following names are however discarded if included in the operand:

minimumIntegerDigits
roundingIncrement
select

`:percent` Resolved Value

The resolved value of an expression with a :percent function contains an implementation-defined numerical value of the operand of the annotated expression together with the resolved options' values. The numerical value of the resolved value of the expression is the same as the numerical value of its operand; it is not multiplied by 100.

Selection with `:percent`

The function :percent performs selection as described in Number Selection below. This selection always uses the plural selection mode, and is performed on the numerical value of the operand multiplied by 100.

For example, this message:
.local $pct = {1 :percent}
.match $pct
1   {{Would match with 0.01 as the operand}}
100 {{Matches 💯}}
*   {{Otherwise}}
would be formatted as "Matches 💯".
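
The scaling step in this example can be sketched in Python. The sketch covers only exact numeric keys and the fallback key; plural categories are left out. The name select_percent_variant is hypothetical, and the operand is passed as a decimal string to avoid binary floating-point rounding.

```python
from decimal import Decimal


def select_percent_variant(operand: str, keys: list) -> str:
    """Pick an exact numeric key for a :percent selector, else the fallback '*'."""
    scaled = Decimal(operand) * 100          # selection uses the operand times 100
    if scaled == scaled.to_integral_value():
        exact = str(int(scaled))             # integer serialization, e.g. "100"
        if exact in keys:
            return exact
    return "*"


keys = ["1", "100", "*"]
print(select_percent_variant("1", keys))
print(select_percent_variant("0.01", keys))
```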

The `:unit` function

Important

The function :unit has a status of Draft. It is proposed for inclusion in a future release of this specification and is not Stable.

The function :unit is proposed to be a RECOMMENDED formatter for unitized values, that is, for numeric values associated with a unit of measurement. This is a specialized form of numeric formatting.

`:unit` Operands

The operand of the :unit function can be one of any number of implementation-defined types, each of which contains a numerical value plus a unit; or it can be a numeric operand, as long as the option unit is provided.

Valid values of the operand's unit are either a string containing a valid Unit Identifier or an implementation-defined unit type.

A numeric operand without a unit option results in a Bad Operand error.

Note

For example, in ICU4J, the type com.ibm.icu.util.Measure might be used as an operand for :unit because it contains the value and unit.

Note

For runtime environments that do not provide a ready-made data structure, class, or type for unit values, the implementation ought to provide a data structure, convenience function, or documentation on how to encode the value and unit for formatting. For example, such an implementation might define a "unit operand" to include a key-value structure with specific keys to be the local unit operand, which might look like the following:

{
   "value": 123.45,
   "unit": "kilometer-per-hour"
}

`:unit` Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the unit, the value of other options, or all of these.

The following options are REQUIRED to be available on the function :unit, unless otherwise indicated:

unit
- valid Unit Identifier (no default)
usage [RECOMMENDED]
- valid Unicode Unit Preference (no default, see Unit Conversion below)
unitDisplay
- short (default)
- narrow
- long
signDisplay
- auto (default)
- always
- exceptZero
- negative
- never
useGrouping
- auto (default)
- always
- never
- min2
minimumIntegerDigits
- digit size option, default: 1
minimumFractionDigits
- digit size option
maximumFractionDigits
- digit size option
minimumSignificantDigits
- digit size option
maximumSignificantDigits
- digit size option
roundingPriority
- auto (default)
- morePrecision
- lessPrecision
roundingIncrement
- 1 (default), 2, 5, 10, 20, 25, 50, 100, 200, 250, 500, 1000, 2000, 2500, and 5000
roundingMode
- ceil
- floor
- expand
- trunc
- halfCeil
- halfFloor
- halfExpand (default)
- halfTrunc
- halfEven

If the operand of the expression is an implementation-defined type, such as the resolved value of an expression with a :unit annotation, it can include option values. These are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand.

For example, the placeholder in this message:
.input {$n :unit unit=furlong minimumFractionDigits=2}
{{{$n :unit minimumIntegerDigits=1}}}
would have the resolved options: { unit: 'furlong', minimumFractionDigits: '2', minimumIntegerDigits: '1' }.

`:unit` Resolved Value

The resolved value of an expression with a :unit function consists of an implementation-defined unit value of the operand of the annotated expression, together with the resolved options and their resolved values.

Unit Conversion

Implementations MAY support conversion to the locale's preferred units via the usage option. Implementing this option is optional. Not all usage option values are compatible with a given unit. Implementations SHOULD emit an Unsupported Operation error if the requested conversion is not supported.

For example, trying to convert a length unit (such as "meters") to a volume usage (which might be a unit akin to "liters" or "gallons", depending on the locale) could produce an Unsupported Operation error.

Implementations MUST NOT substitute the unit without performing the associated conversion.

For example, consider the value:
{
   "value": 123.5,
   "unit": "meter"
}
The following message might convert the formatted result to U.S. customary units in the en-US locale:
You have {$v :unit usage=road maximumFractionDigits=0} to go.
This can produce "You have 405 feet to go."
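
The arithmetic behind this example can be checked with a short Python sketch. It is not a model of CLDR unit preferences: the choice of feet as the target unit is simply assumed here, and convert_meters_to_feet is a hypothetical helper. The factor 0.3048 meters per foot is the exact definition of the international foot.

```python
from decimal import Decimal, ROUND_HALF_UP

METERS_PER_FOOT = Decimal("0.3048")


def convert_meters_to_feet(meters: str) -> Decimal:
    """Convert meters to feet and round to 0 fraction digits (half away from zero)."""
    feet = Decimal(meters) / METERS_PER_FOOT
    return feet.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


print(f"You have {convert_meters_to_feet('123.5')} feet to go.")
```

The point of the sketch is that the numeric value changes together with the unit: 123.5 meters becomes roughly 405 feet, rather than being relabeled as "123.5 feet".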

Numeric Operands

A numeric operand is either an implementation-defined type or a literal whose contents match the following number-literal production. All other values produce a Bad Operand error.

number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]

For example, in Java, any subclass of java.lang.Number plus the primitive types (byte, short, int, long, float, double, etc.) might be considered as the "implementation-defined numeric types". Implementations in other programming languages would define different types or classes according to their local needs.
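
The number-literal production above translates directly into a regular expression. The following Python sketch is one possible rendering; the names NUMBER_LITERAL and is_number_literal are hypothetical.

```python
import re

# number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT]
# [0-9] is used instead of \d so that only ASCII digits are accepted.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_number_literal(text: str) -> bool:
    """Return True if the whole string matches the number-literal production."""
    return NUMBER_LITERAL.fullmatch(text) is not None
```

Under this pattern, "-1234.567", "0", and "1E+3" match, while "01", "+1", ".5", and "1." do not.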

Note

String values passed as variables in the formatting context's input mapping can be formatted as numeric values as long as their contents match the number-literal production.

For example, if the value of the variable num were the string -1234.567, it would behave identically to the local variable in this example:

.local $example = {|-1234.567| :number}
{{{$num :number} == {$example}}}

Note

Implementations are encouraged to provide support for compound types or data structures that provide additional semantic meaning to the formatting of number-like values. For example, in ICU4J, the type com.ibm.icu.util.Measure can be used to communicate a value that includes a unit or the type com.ibm.icu.util.CurrencyAmount can be used to set the currency and related options (such as the number of fraction digits).

Digit Size Options

Some options of number functions are defined to take a digit size option. The function handlers for number functions use these options to control aspects of numeric display such as the number of fraction, integer, or significant digits.

A digit size option is an option whose option value is interpreted by the function as a small integer greater than or equal to zero. Implementations MAY define upper and lower limits on the resolved value of a digit size option consistent with that implementation's practical limits.

In most cases, the value of a digit size option will be a string that encodes the value as a non-negative integer. Implementations MAY also accept implementation-defined types as the option value. When provided as a string, the representation of a digit size option matches the following ABNF:

digit-size-option = "0" / (("1"-"9") [DIGIT])

If the value of a digit size option does not evaluate as a non-negative integer, or if the value exceeds any implementation-defined and option-specific upper or lower limit, the implementation will emit a Bad Option error and ignore the option. An implementation MAY replace a digit size option that exceeds an implementation-defined or option-specific upper or lower limit with an implementation-defined value rather than ignoring the option. Any such replacement value becomes the resolved value of that option.

For example, if an implementation imposed an upper limit of 20 on the option minimumIntegerDigits for the function :number then the resolved value of the option minimumIntegerDigits for both $x and $y in the following message would be 20:
.input {$x :number minimumIntegerDigits=999}
.local $y = {$x}
{{{$y}}}
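
The replacement behavior in this example might look like the following Python sketch. The name resolve_digit_size and the limit of 20 are illustrative assumptions. The sketch accepts any decimal string without leading zeros, which is slightly wider than the two-digit digit-size-option production, so that the out-of-range value 999 from the example reaches the limit check; that choice is an assumption of this sketch, not a statement about the grammar.

```python
import re
from typing import Optional

# Non-negative decimal integer without leading zeros (assumption: wider than the ABNF).
NON_NEGATIVE_INTEGER = re.compile(r"0|[1-9][0-9]*")


def resolve_digit_size(raw: str, upper_limit: int) -> Optional[int]:
    """Return the resolved option value, or None when the option is ignored.

    None stands for "emit a Bad Option error and ignore the option".
    Values above upper_limit are replaced by the limit, which is the
    optional replacement behavior described above.
    """
    if not NON_NEGATIVE_INTEGER.fullmatch(raw):
        return None
    return min(int(raw), upper_limit)


print(resolve_digit_size("999", upper_limit=20))
```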

Number Selection

The option value of the select option MUST be set by a literal. Allowing a variable option value for select would produce a message that is impossible to translate because the set of keys is tied to the selector chosen. If the option value is a variable or if the select option is set by an implementation-defined type used as an operand, a Bad Option error is emitted and the resolved value of the expression MUST NOT support selection. The formatting of the resolved value is not affected by the select option.

Number selection has three modes:

exact selection matches the operand to explicit numeric keys exactly
plural selection matches the operand to explicit numeric keys exactly followed by a plural rule category if there is no explicit match
ordinal selection matches the operand to explicit numeric keys exactly followed by an ordinal rule category if there is no explicit match

When implementing Match(resolvedSelector, key) where resolvedSelector is the resolved value of a selector and key is a string, numeric selectors perform as described below.

Let exact be the serialized representation of the numeric value of resolvedSelector. (See Exact Literal Match Serialization for details)
Let keyword be a string which is the result of rule selection on resolvedSelector.
If the value of key matches the production number-literal, then
1. If key and exact consist of the same sequence of Unicode code points, then
  1. Return true.
2. Return false.
If key is one of the keywords zero, one, two, few, many, or other, then
1. If key and keyword consist of the same sequence of Unicode code points, then
  1. Return true.
2. Return false.
Emit a Bad Variant Key error.

When implementing BetterThan(resolvedSelector, key1, key2) where resolvedSelector is the resolved value of a selector and key1 and key2 are strings, numeric selectors perform as described below.

Assert that Match(resolvedSelector, key1) is true.
Assert that Match(resolvedSelector, key2) is true.
If the value of key1 matches the production number-literal, then
1. If the value of key2 does not match the production number-literal, then
  1. Return true.
Return false.

Note

Implementations are not required to implement this exactly as written. However, the observed behavior must be consistent with what is described here.
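
As one possible reading of the Match and BetterThan steps above, the following Python sketch takes the already-computed exact serialization and rule-selected keyword as inputs. The names number_match, number_better_than, and BadVariantKeyError are hypothetical, and the sketch raises an exception where an implementation would emit a Bad Variant Key error.

```python
import re

NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")
PLURAL_KEYWORDS = frozenset({"zero", "one", "two", "few", "many", "other"})


class BadVariantKeyError(ValueError):
    """Stand-in for the Bad Variant Key error (hypothetical class)."""


def number_match(exact: str, keyword: str, key: str) -> bool:
    """Match: numeric keys compare against exact, keyword keys against keyword."""
    if NUMBER_LITERAL.fullmatch(key):
        return key == exact
    if key in PLURAL_KEYWORDS:
        return key == keyword
    raise BadVariantKeyError(key)


def number_better_than(key1: str, key2: str) -> bool:
    """BetterThan: an exact numeric key beats a keyword key; otherwise no ranking.

    Both keys are assumed to have matched already.
    """
    key1_numeric = NUMBER_LITERAL.fullmatch(key1) is not None
    key2_numeric = NUMBER_LITERAL.fullmatch(key2) is not None
    return key1_numeric and not key2_numeric
```

For a selector whose exact form is "1" and whose keyword is "one", both the keys 1 and one match, and 1 is the better match.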

Default Value of `select` Option

The option value plural is the default for the option select because it is the most common use case for numeric selection. It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules. This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.

For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:
.input {$var :number}
.match $var
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}
The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.

Rule Selection

Rule selection is intended to support the grammatical matching needs of different languages/locales in order to support plural or ordinal numeric values.

If the select option value is exact, rule-based selection is not used. Otherwise rule selection matches the operand, as modified by function options, to exactly one of these keywords: zero, one, two, few, many, or other. The keyword other is the default.

Note

Since valid keys cannot be the empty string in a numeric expression, returning the empty string disables keyword selection.

The meaning of the keywords is locale-dependent and implementation-defined. A key that matches the rule-selected keyword is a stronger match than the fallback key * but a weaker match than any exact match key value.

The rules for a given locale might not produce all of the keywords. A given operand value might produce different keywords depending on the locale.

Apply the rules to the resolved value of the operand and the relevant function options, and return the resulting keyword. If no rules match, return other.

If the select option value is plural, the rules applied to selection SHOULD be the CLDR plural rule data of type cardinal. See charts for examples.

If the select option value is ordinal, the rules applied to selection SHOULD be the CLDR plural rule data of type ordinal. See charts for examples.

Example. In CLDR 44, the Czech (cs) plural rule set can be found here.

A message in Czech might be:
.input {$numDays :number}
.match $numDays
one  {{{$numDays} den}}
few  {{{$numDays} dny}}
many {{{$numDays} dne}}
*    {{{$numDays} dní}}
Using the rules found above, the results of various operand values might look like:

Operand value	Keyword	Formatted Message
1	`one`	1 den
2	`few`	2 dny
5	`other`	5 dní
22	`other`	22 dní
27	`other`	27 dní
2.4	`many`	2,4 dne

Exact Literal Match Serialization

If the numeric value of resolvedSelector is an integer and none of the following options are set for resolvedSelector, the serialized form of the numeric value MUST match the ABNF defined below for integer, representing its decimal value:

minimumFractionDigits
minimumIntegerDigits
minimumSignificantDigits
maximumSignificantDigits

integer = "0" / ["-"] ("1"-"9") *DIGIT

Otherwise, the serialized form of the numeric value is implementation-defined.
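The integer production above can be checked with a regular expression. The sketch below is one possible translation, not part of this specification; the names INTEGER_KEY, isIntegerKey, and serializeInteger are hypothetical, and the serializer assumes that the numeric value is held in a JavaScript number.

```typescript
// Regular-expression equivalent of: integer = "0" / ["-"] ("1"-"9") *DIGIT
const INTEGER_KEY = /^(?:0|-?[1-9][0-9]*)$/;

// Returns true when a key has the well-defined integer form.
function isIntegerKey(key: string): boolean {
  return INTEGER_KEY.test(key);
}

// Hypothetical serializer: returns the decimal form of an integer value,
// or undefined when the value falls outside the case covered by the ABNF.
function serializeInteger(value: number): string | undefined {
  if (!Number.isSafeInteger(value)) return undefined;
  // String(-0) is "0", so negative zero also serializes to a valid key.
  return String(value);
}
```

Keys such as 01, -0, or 1.0 do not match the production, so they fall into the implementation-defined case.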

Important

The exact behavior of exact literal match is only well defined for integer values without leading zeros. Functions that use fraction digits or significant digits might work in specific implementation-defined ways. Users should avoid depending on these types of keys in message selection.

Date and Time Value Formatting

This subsection describes the functions and options for date/time formatting.

Important

The functions in this section have a status of Draft. They are proposed for inclusion in a future release and are not Stable. The options and option values used by :datetime, :date, and :time are based on [Semantic Skeletons], which are in technical preview. The set of options and option values will be extended by later versions of this specification.

Note

Selection based on date/time types is not required by this release of MessageFormat. Use care when defining implementation-specific selectors based on date/time types. The types of queries found in implementations such as java.time.TemporalAccessor are complex and user expectations might be inconsistent with good I18N practices.

The `:datetime` function

The function :datetime is used to format a date/time value. Its formatted result will always include both the date and the time, and optionally a timezone.

If no options are specified, this function defaults to the following:

{$d :datetime} is the same as
{$d :datetime dateFields=year-month-day timePrecision=minute}

Note

The formatting behavior of :datetime is inconsistent with Intl.DateTimeFormat in JavaScript and with {d,date} in ICU MessageFormat 1.0. This is because, unlike those implementations, :datetime is distinct from :date and :time.

`:datetime` Operands

The operand of the :datetime function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

`:datetime` Options

The following options are REQUIRED to be available on the function :datetime:

dateFields
- weekday
- day-weekday
- month-day
- month-day-weekday
- year-month-day (default)
- year-month-day-weekday
dateLength
- long
- medium (default)
- short
timePrecision
- hour
- minute (default)
- second
timeZoneStyle
- long
- short
Date/time override options

If the timeZoneStyle option is not included in the expression, its formatted result will not include a timezone indicator.

Except for date/time override options, each :datetime option value MUST be set by a literal. If such an option value is a variable, a Bad Option Error is emitted and the option is ignored when formatting the expression.

If the operand of the expression is an implementation-defined date/time type, it can include other option values. Any date/time override options of the operand are included in the resolved option values of the expression, with options on the expression taking priority over any options of the operand. Any operand options not matching the date/time override options are ignored.
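The merging described above might be sketched as follows. This is an illustration only: options are modelled as plain string records, the function name resolveDateTimeOptions is hypothetical, and the list of override options (timeZone, hour12, calendar) is taken from the Date and Time Override Options section.

```typescript
type OptionMap = Record<string, string>;

// Option names treated as date/time override options in this sketch.
const OVERRIDE_OPTIONS = ["timeZone", "hour12", "calendar"];

// Hypothetical helper: combines the options carried by an operand with the
// options written on the expression itself.
function resolveDateTimeOptions(
  operandOptions: OptionMap,
  expressionOptions: OptionMap
): OptionMap {
  const inherited: OptionMap = {};
  for (const name of OVERRIDE_OPTIONS) {
    const value = operandOptions[name];
    // Only override options are inherited; other operand options are ignored.
    if (value !== undefined) inherited[name] = value;
  }
  // Options on the expression take priority over those of the operand.
  return { ...inherited, ...expressionOptions };
}
```

For instance, an operand carrying timeZone and dateFields contributes only timeZone, and an hour12 value on the expression replaces any hour12 value inherited from the operand.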

`:datetime` Resolved Value

The resolved value of an expression with a :datetime function contains an implementation-defined date/time value of the operand of the annotated expression, together with the resolved options values.

The `:date` function

The function :date is used to format the date portion of date/time values.

If no options are specified, this function defaults to the following:

{$d :date} is the same as {$d :date fields=year-month-day length=medium}

`:date` Operands

The operand of the :date function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

`:date` Options

The following options are REQUIRED to be available on the function :date:

fields
- weekday
- day-weekday
- month-day
- month-day-weekday
- year-month-day (default)
- year-month-day-weekday
length
- long
- medium (default)
- short
Date/time override options

The fields and length option values MUST each be set by a literal. If such an option value is a variable, a Bad Option Error is emitted and the option is ignored when formatting the expression.

`:date` Resolved Value

The resolved value of an expression with a :date function is implementation-defined.

An implementation MAY emit a Bad Operand or Bad Option error (as appropriate) when a variable annotated directly or indirectly by a :date annotation is used as an operand or an option value.

The `:time` function

The function :time is used to format the time portion of date/time values. Its formatted result will always include the time, and optionally a timezone.

If no options are specified, this function defaults to the following:

{$t :time} is the same as {$t :time precision=minute}

`:time` Operands

The operand of the :time function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce a Bad Operand error.

`:time` Options

The following options are REQUIRED to be available on the function :time:

precision
- hour
- minute (default)
- second
timeZoneStyle
- long
- short
Date/time override options

If the timeZoneStyle option is not included in the expression, its formatted result will not include a timezone indicator.

The precision and timeZoneStyle option values MUST each be set by a literal. If such an option value is a variable, a Bad Option Error is emitted and the option is ignored when formatting the expression.

`:time` Resolved Value

The resolved value of an expression with a :time function is implementation-defined.

An implementation MAY emit a Bad Operand or Bad Option error (as appropriate) when a variable annotated directly or indirectly by a :time annotation is used as an operand or an option value.

Date and Time Operands

The operand of a date/time function is either an implementation-defined date/time type or a date/time literal value, as defined below. All other operand values produce a Bad Operand error.

A date/time literal value is a non-empty string consisting of an ISO 8601 date, or an ISO 8601 datetime optionally followed by a timezone offset. As implementations differ slightly in their parsing of such strings, ISO 8601 date and datetime values not matching the following regular expression MAY also be supported. Furthermore, matching this regular expression does not guarantee validity, given the variable number of days in each month.

(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?

When the time is not present, implementations SHOULD use 00:00:00 as the time. When the offset is not present, implementations SHOULD use a floating time type (such as Java's java.time.LocalDateTime) to represent the time value. For more information, see Working with Timezones.
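The regular expression above can be applied as in the following sketch, which anchors it to the whole string and reports which of the three shapes a literal has. The names DATE_TIME_LITERAL, LiteralKind, and classifyLiteral are hypothetical, and the check is purely syntactic, so a value such as 2024-02-30 still passes.

```typescript
// The expression from this section, anchored so the whole string must match.
const DATE_TIME_LITERAL = new RegExp(
  "^(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])" +
    "(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\\.[0-9]{1,3})?" +
    "(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?$"
);

type LiteralKind = "date" | "floating-datetime" | "zoned-datetime" | "invalid";

// Hypothetical helper: classifies a candidate date/time literal.
function classifyLiteral(literal: string): LiteralKind {
  const m = DATE_TIME_LITERAL.exec(literal);
  if (m === null) return "invalid";
  // Capture group 3 is the time part, capture group 6 the offset.
  if (m[3] === undefined) return "date"; // time defaults to 00:00:00
  return m[6] === undefined ? "floating-datetime" : "zoned-datetime";
}
```

A literal classified as floating-datetime has no offset and so corresponds to the floating time type mentioned above.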

Important

The ABNF and syntax of Unicode MessageFormat do not formally define date/time literals. This means that a message can be syntactically valid but produce a Bad Operand error at runtime.

Note

String values passed as variables in the formatting context's input mapping can be formatted as date/time values as long as their contents are date/time literals.

For example, if the value of the variable now were the string 2024-02-06T16:40:00Z, it would behave identically to the local variable in this example:

.local $example = {|2024-02-06T16:40:00Z| :datetime}
{{{$now :datetime} == {$example}}}

Note

True time zone support in serializations is expected to coincide with the adoption of Temporal in JavaScript. The form of these serializations is known and is a de facto standard. Support for these extensions is expected to be required in the post-tech preview. See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/

Date and Time Override Options

Date/time override options are options that allow an expression to override values set by the current locale, or provided by the formatting context (such as the default time zone), or embedded in an implementation-defined date/time operand value.

Note

These options do not have default values because they are only to be used as overrides for locale-and-value dependent implementation-defined defaults.

The following option is REQUIRED to be available on the functions :datetime, :date, and :time.

timeZone
- A valid time zone identifier (see TZDB and LDML for information on identifiers)
- input
- UTC

The default value for timeZone is the default time zone provided by the formatting context.

The value input corresponds to the time zone of the operand. If it is used and the resolved value of the operand does not include a time zone or offset, a Bad Operand error is emitted and the default time zone is used to format the expression.

If the resolved value of the operand includes a time zone or offset, and the resolved value of the timeZone option is different from that, an implementation SHOULD convert the resolved value of the operand to the time zone indicated by the resolved value of the timeZone option. If such conversion is not supported, an implementation MAY alternatively emit a Bad Option error and use a fallback value as the resolved value of the expression.
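A minimal sketch of the timeZone resolution described above is shown below. The DateTimeOperand and TimeZoneResolution shapes and the function resolveTimeZone are hypothetical; an implementation's date/time type will differ, and the sketch does not validate identifiers or perform any conversion.

```typescript
// Hypothetical operand shape: an instant plus an optional zone or offset.
interface DateTimeOperand {
  epochMilliseconds: number;
  timeZone?: string;
}

interface TimeZoneResolution {
  timeZone: string;
  badOperand: boolean; // true when a Bad Operand error should be emitted
}

function resolveTimeZone(
  option: string | undefined,
  operand: DateTimeOperand,
  contextDefault: string
): TimeZoneResolution {
  // No option: use the default time zone of the formatting context.
  if (option === undefined) return { timeZone: contextDefault, badOperand: false };
  // Explicit identifier (including UTC): use it as given.
  if (option !== "input") return { timeZone: option, badOperand: false };
  // "input" requires the operand to carry a time zone or offset.
  if (operand.timeZone === undefined) {
    return { timeZone: contextDefault, badOperand: true };
  }
  return { timeZone: operand.timeZone, badOperand: false };
}
```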

The following option is REQUIRED to be available on the functions :datetime and :time:

hour12
- true
- false

The following option is RECOMMENDED to be available on the functions :datetime, :date, and :time.

calendar
- valid Unicode Calendar Identifier

Unicode Namespace

The u: namespace is reserved for the definition of options which affect the function context of the specific expressions in which they appear, or for the definition of options that are universally applicable rather than function-specific. It might also be used to define functions in a future release.

The CLDR Technical Committee of the Unicode Consortium manages the specification for this namespace, hence the namespace u:.

Unicode Namespace Options

This section describes u: options. When implemented, they apply to all functions and markup, including user-defined functions in that implementation.

`u:id`

Implementations providing a formatting target other than a concatenated string SHOULD support this option.

A string value that is included as an id or other suitable value in the formatted parts for the placeholder, or any other structured formatted results.

For example, u:id could be used to distinguish two otherwise matching placeholders from each other:
The first number was {$a :number u:id=first} and the second {$b :number u:id=second}.

Ignored when formatting a message to a string.

The u:id option value MUST be a literal or a variable whose resolved value is either a string or can be resolved to a string without error. For other values, a Bad Option error is emitted and the u:id option and its option value are ignored.

`u:dir`

Implementations SHOULD support this option.

Replaces the base directionality defined in the function context for this expression and applies bidirectional isolation to it.

If this option is set on markup, a Bad Option error is emitted and the u:dir option and its option value are ignored.

During processing, the u:dir option MUST be removed from the resolved mapping of options before calling the function handler. Its value is retained in the resolved value of the expression.

The u:dir option value MUST be one of the following literal values or a variable whose resolved value is one of the following strings:

ltr: left-to-right directionality
rtl: right-to-left directionality
auto: directionality determined from expression contents
inherit (default): directionality inherited from the message or from the resolved value of the operand without requiring isolation of the expression value.

For other values, a Bad Option error is emitted and the u:dir option and its option value are ignored.
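The u:dir handling above could be sketched as follows, assuming that option values have already been resolved and are stored in a Map. The names extractDir and DirResult are hypothetical.

```typescript
const DIRECTIONS = ["ltr", "rtl", "auto", "inherit"];
type Direction = "ltr" | "rtl" | "auto" | "inherit";

interface DirResult {
  dir: Direction; // retained in the resolved value of the expression
  handlerOptions: Map<string, unknown>; // options passed to the function handler
  badOption: boolean; // true when a Bad Option error should be emitted
}

// Hypothetical helper: strips u:dir from the options and validates its value.
function extractDir(options: Map<string, unknown>, isMarkup: boolean): DirResult {
  const handlerOptions = new Map(options);
  const raw = handlerOptions.get("u:dir");
  // u:dir is always removed before the function handler is called.
  handlerOptions.delete("u:dir");
  if (raw === undefined) {
    return { dir: "inherit", handlerOptions, badOption: false };
  }
  const valid =
    !isMarkup && typeof raw === "string" && DIRECTIONS.indexOf(raw) !== -1;
  // Invalid values and any use on markup are reported and then ignored.
  return valid
    ? { dir: raw as Direction, handlerOptions, badOption: false }
    : { dir: "inherit", handlerOptions, badOption: true };
}
```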

Interchange Data Model

This section defines a data model representation of Unicode MessageFormat messages.

Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.

The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as ICU MessageFormat) to a Unicode MessageFormat implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.

This data model provides these capabilities:

any Unicode MessageFormat message can be parsed into this representation
this data model representation can be serialized as a well-formed Unicode MessageFormat message
parsing a Unicode MessageFormat message into a data model representation and then serializing it results in an equivalently functional message

This data model might also be used to:

parse non Unicode MessageFormat messages into a data model (and therefore re-serialize it as Unicode MessageFormat). Note that this depends on compatibility between the two syntaxes.
re-serialize a Unicode MessageFormat message into some other format including (but not limited to) other formatting syntaxes or translation formats.

To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. An equivalent JSON Schema definition message.json is also provided, for use with message data encoded as JSON or compatible formats, such as YAML.

Note that while the data model description below is the canonical one, the JSON Schema definition is intended for interchange between systems and processors. To that end, it relaxes some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.

Important

The data model uses the field name name to denote various interface identifiers. In the Unicode MessageFormat syntax, the source for these name fields sometimes uses the production identifier. This happens when the named item, such as a function, supports namespacing.

Message Model

A SelectMessage corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a PatternMessage.

In the syntax, a PatternMessage may be represented either as a simple message or as a complex message, depending on whether it has declarations and whether its pattern is allowed in a simple message.

type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: VariableRef[];
  variants: Variant[];
}

Each message declaration is represented by a Declaration, which connects the name of a variable with its expression value. The name does not include the initial $ of the variable.

The name of an InputDeclaration MUST be the same as the name in the VariableRef of its VariableExpression value.

type Declaration = InputDeclaration | LocalDeclaration;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}

In a SelectMessage, the keys and value of each variant are represented as an array of Variant. For the CatchallKey, a string value may be provided to retain an identifier. This is always '*' in the Unicode MessageFormat syntax, but may vary in other formats.

interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}

Pattern Model

Each Pattern contains a linear sequence of text and placeholders corresponding to potential output of a message.

Each element of the Pattern MUST either be a non-empty string, an Expression, or a Markup object. String values represent literal text. String values include all processing of the underlying text values, including escape sequence processing. Expression wraps each of the potential expression shapes. Markup wraps each of the potential markup shapes.

Implementations MUST NOT rely on the set of Expression and Markup interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.

type Pattern = Array<string | Expression | Markup>;

type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression;

interface LiteralExpression {
  type: "expression";
  arg: Literal;
  function?: FunctionRef;
  attributes: Attributes;
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  function: FunctionRef;
  attributes: Attributes;
}

Expression Model

The Literal and VariableRef correspond to the literal and variable syntax rules. When they are used as the body of an Expression, they represent expression values with no function.

Literal represents all literal values, both quoted literal and unquoted literal. The presence or absence of quotes is not preserved by the data model. The value of Literal is the "cooked" value (i.e. escape sequences are processed).

In a VariableRef, the name does not include the initial $ of the variable.

interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}

A FunctionRef represents a function. The name does not include the : starting sigil.

Options is a key-value mapping containing options, and is used to represent the function and markup options.

interface FunctionRef {
  type: "function";
  name: string;
  options: Options;
}

type Options = Map<string, Literal | VariableRef>;

Markup Model

A Markup object has a kind of either "open", "standalone", or "close", each corresponding to open, standalone, and close markup. The name in these does not include the starting sigils # and / or the ending sigil /. The options for markup use the same key-value mapping as FunctionRef.

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}

Attribute Model

Attributes is a key-value mapping used to represent the expression and markup attributes.

Attributes with no value are represented by true here.

type Attributes = Map<string, Literal | true>;
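To show how these interfaces fit together, the following sketch writes out one possible data model value for a small message with a single selector. The message itself is an illustration chosen for this example, and the object is written without type annotations so that the block stands alone; it is intended to have the shape of a SelectMessage.

```typescript
// Data model for the message:
//   .input {$count :number}
//   .match $count
//   one {{You have {$count} chance remaining}}
//   *   {{You have {$count} chances remaining}}
const countRef = { type: "variable", name: "count" }; // name has no leading "$"

const message = {
  type: "select",
  declarations: [
    {
      type: "input",
      name: "count", // must equal the name of the VariableRef in value.arg
      value: {
        type: "expression",
        arg: countRef,
        // The function name has no leading ":".
        function: { type: "function", name: "number", options: new Map() },
        attributes: new Map(),
      },
    },
  ],
  selectors: [countRef],
  variants: [
    {
      keys: [{ type: "literal", value: "one" }],
      value: [
        "You have ",
        { type: "expression", arg: countRef, attributes: new Map() },
        " chance remaining",
      ],
    },
    {
      keys: [{ type: "*" }], // catch-all key
      value: [
        "You have ",
        { type: "expression", arg: countRef, attributes: new Map() },
        " chances remaining",
      ],
    },
  ],
};
```

Each pattern is an array that mixes plain strings with expression objects, and empty Map instances stand in for options and attributes that were not set.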

Model Extensions

Implementations MAY extend this data model with additional interfaces, as well as adding new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a span field on all interfaces encoding the corresponding start and end positions in its source syntax.

In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.

`message.json`

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://github.com/unicode-org/message-format-wg/blob/main/spec/data-model/message.json",

  "oneOf": [{ "$ref": "#/$defs/message" }, { "$ref": "#/$defs/select" }],

  "$defs": {
    "literal": {
      "type": "object",
      "properties": {
        "type": { "const": "literal" },
        "value": { "type": "string" }
      },
      "required": ["type", "value"]
    },
    "variable": {
      "type": "object",
      "properties": {
        "type": { "const": "variable" },
        "name": { "type": "string" }
      },
      "required": ["type", "name"]
    },
    "literal-or-variable": {
      "oneOf": [{ "$ref": "#/$defs/literal" }, { "$ref": "#/$defs/variable" }]
    },

    "options": {
      "type": "object",
      "additionalProperties": { "$ref": "#/$defs/literal-or-variable" }
    },
    "attributes": {
      "type": "object",
      "additionalProperties": {
        "oneOf": [{ "$ref": "#/$defs/literal" }, { "const": true }]
      }
    },

    "function": {
      "type": "object",
      "properties": {
        "type": { "const": "function" },
        "name": { "type": "string" },
        "options": { "$ref": "#/$defs/options" }
      },
      "required": ["type", "name"]
    },
    "expression": {
      "type": "object",
      "properties": {
        "type": { "const": "expression" },
        "arg": { "$ref": "#/$defs/literal-or-variable" },
        "function": { "$ref": "#/$defs/function" },
        "attributes": { "$ref": "#/$defs/attributes" }
      },
      "anyOf": [
        { "required": ["type", "arg"] },
        { "required": ["type", "function"] }
      ]
    },

    "markup": {
      "type": "object",
      "properties": {
        "type": { "const": "markup" },
        "kind": { "enum": ["open", "standalone", "close"] },
        "name": { "type": "string" },
        "options": { "$ref": "#/$defs/options" },
        "attributes": { "$ref": "#/$defs/attributes" }
      },
      "required": ["type", "kind", "name"]
    },

    "pattern": {
      "type": "array",
      "items": {
        "oneOf": [
          { "type": "string" },
          { "$ref": "#/$defs/expression" },
          { "$ref": "#/$defs/markup" }
        ]
      }
    },

    "input-declaration": {
      "type": "object",
      "properties": {
        "type": { "const": "input" },
        "name": { "type": "string" },
        "value": {
          "allOf": [
            { "$ref": "#/$defs/expression" },
            {
              "properties": {
                "arg": { "$ref": "#/$defs/variable" }
              },
              "required": ["arg"]
            }
          ]
        }
      },
      "required": ["type", "name", "value"]
    },
    "local-declaration": {
      "type": "object",
      "properties": {
        "type": { "const": "local" },
        "name": { "type": "string" },
        "value": { "$ref": "#/$defs/expression" }
      },
      "required": ["type", "name", "value"]
    },
    "declarations": {
      "type": "array",
      "items": {
        "oneOf": [
          { "$ref": "#/$defs/input-declaration" },
          { "$ref": "#/$defs/local-declaration" }
        ]
      }
    },

    "variant-key": {
      "oneOf": [
        { "$ref": "#/$defs/literal" },
        {
          "type": "object",
          "properties": {
            "type": { "const": "*" },
            "value": { "type": "string" }
          },
          "required": ["type"]
        }
      ]
    },
    "message": {
      "type": "object",
      "properties": {
        "type": { "const": "message" },
        "declarations": { "$ref": "#/$defs/declarations" },
        "pattern": { "$ref": "#/$defs/pattern" }
      },
      "required": ["type", "declarations", "pattern"]
    },
    "select": {
      "type": "object",
      "properties": {
        "type": { "const": "select" },
        "declarations": { "$ref": "#/$defs/declarations" },
        "selectors": {
          "type": "array",
          "items": { "$ref": "#/$defs/variable" }
        },
        "variants": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "keys": {
                "type": "array",
                "items": { "$ref": "#/$defs/variant-key" }
              },
              "value": { "$ref": "#/$defs/pattern" }
            },
            "required": ["keys", "value"]
          }
        }
      },
      "required": ["type", "declarations", "selectors", "variants"]
    }
  }
}

Appendices

Security Considerations

Unicode MessageFormat patterns are meant to allow a message to include any string value which users might normally wish to use in their environment. Programming languages and other environments vary in what characters are permitted to appear in a valid string. In many cases, certain types of characters, such as invisible control characters, require escaping by these host formats. In other cases, strings are not permitted to contain certain characters at all. Since messages are subject to the restrictions and limitations of their host environments, their serializations and resource formats, that might be sufficient to prevent most problems. However, MessageFormat itself does not supply such a restriction.

MessageFormat messages permit nearly all Unicode code points to appear in literals, including the text portions of a pattern. This means that it can be possible for a message to contain invisible characters (such as bidirectional controls, ASCII control characters in the range U+0000 to U+001F, or characters that might be interpreted as escapes or syntax in the host format) that abnormally affect the display of the message when viewed as source code, or in resource formats or translation tools, but do not generate errors from MessageFormat parsers or processing APIs.

Bidirectional text containing right-to-left characters (such as used for Arabic or Hebrew) also poses a potential source of confusion for users. Since MessageFormat's syntax makes use of keywords and symbols that are left-to-right or consist of neutral characters (including characters subject to mirroring under the Unicode Bidirectional Algorithm), it is possible to create messages that, when displayed in source code, or in resource formats or translation tools, have a misleading appearance or are difficult to parse visually.

For more information, see [UTS#55] Unicode Source Code Handling.
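Tools that display or review message source might flag the invisible characters discussed above. The sketch below is one possible check and is not a requirement of this specification. The function name findInvisibleCharacters is hypothetical, and the choice to allow TAB, LF, and CR is an assumption of this example. A hit is a prompt for human review rather than proof of a problem, since bidirectional controls can have legitimate uses.

```typescript
interface Hit {
  index: number; // UTF-16 index into the source string
  codePoint: string; // for example "U+202E"
}

// Assumption: C0 controls other than TAB, LF and CR, plus the Unicode
// bidirectional control characters, are worth reporting to a reviewer.
function isSuspicious(code: number): boolean {
  if (code <= 0x1f) return code !== 0x09 && code !== 0x0a && code !== 0x0d;
  return (
    code === 0x061c || // ARABIC LETTER MARK
    code === 0x200e || // LEFT-TO-RIGHT MARK
    code === 0x200f || // RIGHT-TO-LEFT MARK
    (code >= 0x202a && code <= 0x202e) || // embeddings and overrides
    (code >= 0x2066 && code <= 0x2069) // isolates
  );
}

function findInvisibleCharacters(source: string): Hit[] {
  const hits: Hit[] = [];
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    if (isSuspicious(code)) {
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      hits.push({ index: i, codePoint: "U+" + hex });
    }
  }
  return hits;
}
```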

MessageFormat implementations might allow end-users to install selectors, functions, or markup from third-party sources. Such functionality can be a vector for various exploits, including buffer overflow, code injection, user tracking, fingerprinting, and other types of bad behavior. Any installed code needs to be appropriately sandboxed. In addition, end-users need to be aware of the risks involved.

Non-normative Examples

Pattern Selection Examples

Selection Example 1

Presuming a minimal implementation that supports only the :string function, which matches keys by string comparison, and a formatting context in which the variable reference $foo resolves to the string 'foo' and the variable reference $bar resolves to the string 'bar', pattern selection proceeds as follows for this message:

.input {$foo :string}
.input {$bar :string}
.match $foo $bar
bar bar {{All bar}}
foo foo {{All foo}}
* * {{Otherwise}}

Each selector is resolved, yielding the list res = {foo, bar}.
bestVariant is set to UNSET.
keys is set to {bar, bar}.
match is set to SelectorsMatch({foo, bar}, {bar, bar}). The result of SelectorsMatch({foo, bar}, {bar, bar}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. k is set to the string bar.
4. sel is set to a resolved value corresponding to the string foo.
5. Match(sel, 'bar') is false.
6. The result of SelectorsMatch({foo, bar}, {bar, bar}) is false. Thus, match is set to false.
keys is set to {foo, foo}.
match is set to SelectorsMatch({foo, bar}, {foo, foo}). The result of SelectorsMatch({foo, bar}, {foo, foo}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. k is set to the string foo.
4. sel is set to a resolved value corresponding to the string foo.
5. Match(sel, 'foo') is true.
6. i is set to 1.
7. k is set to the string foo.
8. sel is set to a resolved value corresponding to the string bar.
9. Match(sel, 'bar') is false.
10. The result of SelectorsMatch({foo, bar}, {foo, foo}) is false.
keys is set to * *.
The result of SelectorsMatch({foo, bar}, {*, *}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. i is set to 1.
4. i is set to 2.
5. The result of SelectorsMatch({foo, bar}, {*, *}) is true.
bestVariant is set to the variant * * {{Otherwise}}
The pattern Otherwise is selected.

Selection Example 2

Alternatively, with the same implementation and formatting context as in Example 1, pattern selection would proceed as follows for this message:

.input {$foo :string}
.input {$bar :string}
.match $foo $bar
* bar {{Any and bar}}
foo * {{Foo and any}}
foo bar {{Foo and bar}}
* * {{Otherwise}}

Each selector is resolved, yielding the list res = {foo, bar}.
bestVariant is set to UNSET.
keys is set to {*, bar}.
match is set to SelectorsMatch({foo, bar}, {*, bar}). The result of SelectorsMatch({foo, bar}, {*, bar}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. i is set to 1.
4. k is set to the string bar.
5. sel is set to a resolved value corresponding to the string bar.
6. Match(sel, 'bar') is true.
7. i is set to 2.
8. The result of SelectorsMatch({foo, bar}, {*, bar}) is true.
bestVariant is set to the variant * bar {{Any and bar}}.
keys is set to {foo, *}.
match is set to SelectorsMatch({foo, bar}, {foo, *}). The result of SelectorsMatch({foo, bar}, {foo, *}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. k is set to the string foo.
4. sel is set to a resolved value corresponding to the string foo.
5. Match(sel, 'foo') is true.
6. i is set to 1.
7. i is set to 2.
8. The result of SelectorsMatch({foo, bar}, {foo, *}) is true.
bestVariantKeys is set to {*, bar}.
SelectorsCompare({foo, bar}, {foo, *}, {*, bar}) is determined as follows:
1. result is set to false.
2. i is set to 0.
3. key1 is set to foo.
4. key2 is set to '*'
5. The result of SelectorsCompare({foo, bar}, {foo, *}, {*, bar}) is true.
bestVariant is set to foo * {{Foo and any}}.
keys is set to {foo, bar}.
match is set to SelectorsMatch({foo, bar}, {foo, bar}).
1. match is true (details elided)
bestVariantKeys is set to {foo, *}.
SelectorsCompare({foo, bar}, {foo, bar}, {foo, *}) is determined as follows:
1. result is set to false.
2. i is set to 0.
3. key1 is set to foo.
4. key2 is set to foo.
5. k1 is set to foo.
6. k2 is set to foo.
7. sel is set to a resolved value corresponding to foo.
8. i is set to 1.
9. key1 is set to bar.
10. key2 is set to *.
11. The result of SelectorsCompare({foo, bar}, {foo, bar}, {foo, *}) is true.
bestVariant is set to foo bar {{Foo and bar}}.
keys is set to * *.
match is set to true (details elided).
bestVariantKeys is set to foo bar.
SelectorsCompare({foo, bar}, {*, *}, {foo, bar}) is false (details elided).

The pattern {{Foo and bar}} is selected.
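The traces in Examples 1 and 2 can be condensed into the following sketch. It is reconstructed from those traces rather than copied from the normative algorithm, and it assumes :string-style selectors for which Match is plain string equality, so two literal keys that both match are never better than each other. The catch-all key is modelled as null, and all function names are hypothetical.

```typescript
// null stands for the catch-all key "*"; any other key is a literal string.
type Key = string | null;

interface Variant {
  keys: Key[];
  pattern: string;
}

// A variant matches when every non-catch-all key equals its selector value.
function selectorsMatch(selectors: string[], keys: Key[]): boolean {
  return keys.every((key, i) => key === null || selectors[i] === key);
}

// True when keys1 is a better match than keys2. Under string equality,
// only positions where exactly one side is the catch-all can decide.
function selectorsCompare(keys1: Key[], keys2: Key[]): boolean {
  for (let i = 0; i < keys1.length; i++) {
    const key1 = keys1[i];
    const key2 = keys2[i];
    if (key1 === null && key2 !== null) return false;
    if (key1 !== null && key2 === null) return true;
  }
  return false;
}

function selectPattern(selectors: string[], variants: Variant[]): string | undefined {
  let best: Variant | undefined;
  for (const variant of variants) {
    if (!selectorsMatch(selectors, variant.keys)) continue;
    if (best === undefined || selectorsCompare(variant.keys, best.keys)) {
      best = variant;
    }
  }
  return best === undefined ? undefined : best.pattern;
}
```

Called with the selector values foo and bar, this sketch walks the variants of Example 2 in source order and ends with the foo bar variant, as in the trace above.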

Selection Example 3

A more complex example is the matching found in selection APIs such as ICU's PluralFormat. Suppose that this API is represented here by the function :number. This :number function can match a given numeric value to a specific number literal and also to a plural category (zero, one, two, few, many, other) according to locale rules defined in CLDR.

Given a variable reference $count whose value resolves to the number 1 and an en (English) locale, the pattern selection proceeds as follows for this message:

.input {$count :number}
.match $count
one {{Category match for {$count}}}
1   {{Exact match for {$count}}}
*   {{Other match for {$count}}}

Each selector is resolved, yielding the list {1}.
bestVariant is set to UNSET.
keys is set to {one}.
match is set to SelectorsMatch({1}, {one}). The result of SelectorsMatch({1}, {one}) is determined as follows:
1. result is set to true.
2. i is set to 0.
3. k is set to one.
4. sel is set to 1.
5. Match(sel, one) is true.
6. i is set to 1.
7. The result of SelectorsMatch({1}, {one}) is true.
bestVariant is set to one {{Category match for {$count}}}.
keys is set to {1}.
match is set to SelectorsMatch({1}, {1}).
1. The details are the same as the previous case, as Match(sel, 1) is also true.
bestVariantKeys is set to {one}.
SelectorsCompare({1}, {1}, {one}) is determined as follows:
1. result is set to false.
2. i is set to 0.
3. key1 is set to 1.
4. key2 is set to one.
5. k1 is set to 1.
6. k2 is set to one.
7. sel is set to 1.
8. result is set to BetterThan(sel, 1, one), which is true.
  1. NOTE: The specification of the :number selector function states that the exact match 1 is a better match than the category match one.
9. The result of SelectorsCompare({1}, {1}, {one}) is true.
bestVariant is set to 1 {{Exact match for {$count}}}.
keys is set to {*}.
1. Details elided; since * is the catch-all key, SelectorsCompare({1}, {*}, {1}) is false.
The pattern {{Exact match for {$count}}} is selected.
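
The following Python sketch replays this example for a single selector. It is a simplified illustration under stated assumptions, not the :number implementation: plural_category_en is a hypothetical stand-in that covers only English integers (a real implementation would use CLDR plural rules), and the helper names is_exact, match, better_than, and select are invented for the example. The only behavior taken from this specification is that an exact numeric match is better than a plural category match, and that the catch-all key never displaces an already selected variant.

```python
def plural_category_en(n):
    # Hypothetical stand-in for the CLDR English cardinal rules, integers only.
    return "one" if n == 1 else "other"

def is_exact(sel, key):
    """True if the key is an integer literal equal to the selector value."""
    try:
        return int(key) == sel
    except ValueError:
        return False

def match(sel, key):
    """A key matches exactly, or by naming the selector's plural category."""
    return is_exact(sel, key) or key == plural_category_en(sel)

def better_than(sel, key1, key2):
    """An exact match beats a category match; otherwise neither key is better."""
    return is_exact(sel, key1) and not is_exact(sel, key2)

def select(sel, variants):
    best = None  # corresponds to bestVariant being UNSET
    for key, pattern in variants:
        if key != "*" and not match(sel, key):
            continue
        if best is None:
            best = (key, pattern)
        elif key != "*" and (best[0] == "*" or better_than(sel, key, best[0])):
            best = (key, pattern)
    return None if best is None else best[1]

VARIANTS = [
    ("one", "Category match for {$count}"),
    ("1", "Exact match for {$count}"),
    ("*", "Other match for {$count}"),
]

print(select(1, VARIANTS))  # Exact match for {$count}
print(select(2, VARIANTS))  # Other match for {$count}
```

With the value 1, both one and 1 match, and the exact key wins regardless of the order in which the variants appear. With the value 2, neither keyed variant matches in English, so the catch-all variant is selected.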

Acknowledgments

Special thanks to the following people for their contributions to the Unicode MessageFormat Standard. They contributed to our GitHub repository and are listed in order of contribution size:

Addison Phillips, Eemeli Aro, Romulo Cintra, Tim Chevalier, Stanisław Małolepszy, Elango Cheran, Richard Gibson, Mark Davis, Mihai Niță, Steven R. Loomis, Shane F. Carr, Matt Radbourne, Caleb Maclennan, David Filip, Christopher Dieringer, Danny Gleckler, Bruno Haible, Daniel Minor, George Rhoten, Ujjwal Sharma, Markus Scherer, Lionel Rowe, Luca Casonato, Daniel Ehrenberg, Zibi Braniecki, and Rafael Xavier de Souza.

Eemeli Aro is the current chair of the working group. Addison Phillips was chair of the working group from January 2023 to July 2025. Prior to 2023, the group was governed by a chair group, consisting of Romulo Cintra, Elango Cheran, Mihai Niță, David Filip, Nicolas Bouvrette, Stanisław Małolepszy, Rafael Xavier de Souza, Addison Phillips, and Daniel Minor. Romulo Cintra chaired the chair group.

© 2001–2026 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.

Use of all Unicode Products, including this publication, is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
