Unicode Group Releases Message Format 2, Hopes for ‘Broad Adoption’ in Software Localization

MessageFormat 2.0

The Message Format Working Group (MFWG), the subgroup of the Unicode Common Locale Data Repository (CLDR) project that develops this industry standard for software localization, has announced the initial release of Message Format 2.

The Message Format framework has been used to write and localize a significant number of software applications worldwide. It accounts for complex features of different languages, such as syntax, inflections, and gender, so that the localized text it outputs is grammatically correct in each target language.

Message Format 2 (MF2) is the first update to the framework in over two decades, and it is designed to help software developers build localization, internationalization, and globalization into modern software.

It does so by organizing the translatable portions of a message into logical segments, without the nesting or excessive amounts of code that complex messages would otherwise require.
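To make the idea of "logical segments without nesting" concrete, here is a toy Python sketch. It is not the MF2 syntax or any real MF2 library API; the message text, the simplified English plural rule, and the function names are all hypothetical. In the original MessageFormat, a message that varies by both gender and plural count typically nests a plural construct inside a gender select. The sketch instead lists every combination as a flat variant keyed by one value per selector, with `*` as a catch-all, and picks the first match from a list ordered from most to least specific (a simplification; the MF2 specification defines its own variant-selection rules).

```python
# Toy illustration only (not the MF2 API): flat variant selection,
# where each variant is keyed by one value per selector and "*" matches anything.

def plural_category_en(n: int) -> str:
    """Simplified English plural rule (assumption): 'one' for 1, else 'other'."""
    return "one" if n == 1 else "other"

# Hypothetical message selecting on the host's gender and the guest count.
# All combinations sit side by side in one flat table, ordered from most
# to least specific, instead of one construct nested inside another.
VARIANTS = [
    (("female", "one"), "{host} invited {count} guest to her party."),
    (("female", "*"), "{host} invited {count} guests to her party."),
    (("male", "one"), "{host} invited {count} guest to his party."),
    (("male", "*"), "{host} invited {count} guests to his party."),
    (("*", "one"), "{host} invited {count} guest to their party."),
    (("*", "*"), "{host} invited {count} guests to their party."),
]

def format_message(host: str, gender: str, count: int) -> str:
    """Return the first variant whose keys all match the selector values."""
    selector_values = (gender, plural_category_en(count))
    for variant_keys, pattern in VARIANTS:
        # A variant matches when each key equals its selector value or is "*".
        if all(k == "*" or k == v for k, v in zip(variant_keys, selector_values)):
            return pattern.format(host=host, count=count)
    # Not reached in practice, because the ("*", "*") variant always matches.
    raise ValueError("no matching variant")

print(format_message("Ana", "female", 1))
print(format_message("Sam", "unknown", 3))
```

Because every combination is spelled out side by side, a translator can add or adjust a single variant (for example, a language that needs more plural categories) without restructuring a nested block.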

Addison Phillips, Chair of the Unicode Message Format Working Group, told Slator: “We designed MF2 to be portable, between resource formats and run time formats, with the hope there will be broad adoption.”

“What’s interesting about Message Format 2 is that we worked hard to make syntax embeddable. There are no Unicode escapes, no character escapes, or ASCII quote marks. Our syntax avoids these,” he added.

As for the benefits to linguists who specialize in software localization, Phillips said that the translator “doesn’t have to change locale-specific fields as they are formatted automatically.”

The Role of Large Language Models in Software Localization

On the subject of large language models (LLMs) and their impact on software localization, Phillips told Slator that the question has “multiple dimensions”.

Phillips acknowledged that there are reasonable solutions to many localization problems in modern interfaces that rely on extensive user-generated content or runtime-generated text. In general, however, he said that LLMs are “too large to run in many devices or applications” and make quality management “difficult”.

In addition, he mentioned that not every LLM is customized to every language and locale.

On the potential use cases of LLMs in software localization, Phillips told Slator that “there’s going to be a role for that in message management and creation. For now, using LLMs to mutate gender, for example, are too large to be reasonably handled.”

Connect the Dots

Phillips hopes that MF2 will “connect the dots a little better”. MF2’s syntax was designed with built-in extension mechanisms, and the framework includes evolving function registries, among other developments.

MF2 is currently in tech preview, so software developers can already deploy it and submit initial feedback ahead of the approved release in fall 2024.