Insatiable Appetite for Language Data a Boon for Niche Providers

The artificial intelligence (AI) and machine learning boom is reflected not only in developments in neural machine translation (NMT), but also in the growing revenues of language data providers.

Slator’s NMT report 2018 noted “reinvigorated demand for high quality training corpora.” Some language service providers (LSPs) have their own datasets, such as Linguee’s one billion translations that power DeepL, or the 4.4 billion human contributions currently in MyMemory.

Those who do not have their own datasets can get access from providers such as Appen or Flitto, both of which are growing quickly.

Appen Ups Guidance

Australian company Appen updated its full-year 2018 guidance on November 15, noting that “full year underlying EBITDA for FY2018… is estimated to be in the range of USD62m to USD65m.” This is up from the company’s previous guidance of USD54m to USD59m.

Listed on the Australian Securities Exchange, Appen is a provider of annotated datasets for artificial intelligence (AI) and machine learning. The company has two divisions: a Language Resources Division that trains AI engines with audio, text, image and video datasets, and a Content Relevance Division that uses human evaluation and feedback to help clients train AI-driven products (mainly search engines).

Since its 2015 IPO, Appen shares have been on a tear, coinciding with what some have labelled artificial intelligence’s third wave. Slator began covering Appen as early as 2016, when the company’s shares were up 500% since the IPO. At the end of that year, Appen logged a 34% year-on-year (YoY) increase in revenues to AUD 111m (USD 85m).

At the close of the first half of 2017, the AI and machine learning boom continued to lift Appen’s revenues, which rose 39% YoY to AUD 74.1m (USD 58.8m). Full-year 2017 results saw revenues grow 50.1% YoY to AUD 166.6m (USD 130m), and the company broke through a billion (Aussie) dollars in market cap.

In the first half of 2018, Appen’s six-month revenues jumped 106% to AUD 152.8m (USD 112.3m) on the back of its acquisition of competitor Leapforce. Appen also recently brought in a new Chief Technology Officer: Wilson Pang, previously Chief Data Officer of China’s CTrip, one of the world’s largest online travel agencies.

For investors, meanwhile, their bet on Appen has paid off handsomely over the past twelve months with shares up nearly 120%, pushing the company’s market cap to nearly USD 1 billion.

Flitto Sells Segments

Appen is not the only one riding the AI and machine learning boom. South Korean company Flitto began as a translation crowdsourcing platform in 2012. The company leveraged the translation data it generated (around 100 million sets of translated language data), and by 2017, 80% of its revenues came from selling this language data to companies like Microsoft, NTT DoCoMo, and Baidu.

Flitto is now branding itself as a more well-rounded and generalist—but still very much tech-oriented—LSP. Along with selling training data (text, voice, and image), the company’s main business lines include language services (crowdsourced and professional translations) and tech integrations via API.

In October 2018, Flitto was reportedly benefitting from servicing digital entertainment companies and the content they push out on platforms like YouTube.

According to a news report, Flitto’s sales of translated language data grew from 2.1 million units in 2015 to 4.76 million in 2016, 6.89 million in 2017, and 30 million in 2018. By the end of this year, Flitto also expects to have generated 200 million sets of translated language data.

Flitto’s revenues reflect the growth: from KRW 400 million (USD 0.36m) in 2015 to KRW 1.4 billion (USD 1.2m) in 2016, KRW 2.5 billion (USD 2.2m) in 2017, and KRW 5 billion (USD 4.4m) as of August 2018. CEO Simon Lee expects this year’s sales to reach KRW 7 billion (USD 6.2m).

Data Hunger

Demand for data is starting to outstrip supply. Korea’s largest telecom company, KT Corp., found this out for itself after it launched GiGA Genie, an AI-powered voice assistant that responds to commands in English and Korean, in January 2017.

The service had gained a million subscribers by July 2018 and KT intended to increase that to 1.5 million by the end of 2018. “The biggest difficulty has been acquiring sufficient data on languages,” according to Kang Da-Som, a Manager at KT Corp.

Finally, even Lionbridge, one of the world’s largest LSPs, has begun to market its existing business in supplying and curating language data in a new Machine Intelligence division.