As of Apple’s annual online conference, WWDC 2023, BERT is now making its way into the (Apple) developer mainstream. Bidirectional Encoder Representations from Transformers, in short, BERT, was open-sourced by Google for NLP pre-training in late 2018.
Fast forward to Apple’s June 7, 2023, WWDC session, where the iPhone maker featured BERT as the key to creating new multilingual models in its Create ML app / framework. Create ML is a tool for training models for a variety of machine learning tasks in areas like image, sound, or activity but also tasks involving text such as text classification and word tagging.
Apple reminded developers that transformer-based contextual embeddings are trained on large amounts of text using a masked model style of training, in which the model is prompted to suggest a missing word in a sentence. The multi-headed self-attention mechanism behind Transformers allows models to train on large amounts of textual data — including multilingual data.
“It makes it possible to support many languages immediately and even multiple languages at once,” NLP Engineer Doug Davidson explained. “But even more than that, because of similarities between languages, there’s some synergy such that data for one language helps with others.”
Using BERT embeddings and three separate models, one each for a group of languages with related writing systems, Davidson continued, Create ML can now support 27 different languages.
One model supports 20 Latin-script languages; a second supports four languages written using the Cyrillic alphabet; and a third model supports Chinese, Japanese, and Korean.
Davidson walked participants through the process of training a multilingual model in the Create ML app. Users create a new project, select training data, and, under the algorithm section, choose a new option, the BERT embeddings. They then select one of the three script-based models and set the language selection to “Automatic.”
“The most time-consuming part of the training is applying these powerful embeddings to the text,” Davidson said. “Then the model trains fairly quickly to a high degree of accuracy.”
Davidson demonstrated the process using a model that classified text messages in English, Spanish, German, and Italian as personal, business-related, or commercial.
“As an example of the synergies that are possible, this model hasn’t been trained on French, but it can still classify some French text as well,” he pointed out, adding that the best practice is for developers to use training data for each language they plan to offer.
According to a June 6, 2023 session on Create ML, the multilingual BERT embedding model can also boost the accuracy of monolingual text classifiers.
In short — Apple is handing developers a range of new tools to embed NLP into their apps.