Early investors in AI support services company Scale AI will be cheering this week, as the startup’s valuation broke through the billion-dollar mark following a USD 100m Series C round. The news was widely reported by various media outlets, including Bloomberg and TechCrunch. At its essence, Scale AI is a tech-driven AI support services company that relies on a large crowd of human workers to label data. Sound familiar?
This is the business Lionbridge CEO John Fennelly mentioned to Slator in early 2019, saying he could envision it becoming larger than the LSP’s traditional localization business.
Fennelly made the statement after announcing the acquisition of Tokyo-based Gengo, which started out as a crowdsourced translation services provider but branched into AI support services a couple of years ago. The move paid off for Gengo as it factored heavily into Lionbridge’s decision to buy the company, according to CEO Fennelly.
Data labeling as a service has exploded, driven by rapid advances in AI. Media outlets from the New York Times to the Financial Times (paywall) have begun covering the industry, which has captured the imagination of aspiring entrepreneurs.
Gen Z Founder
Founded as recently as 2016, Scale AI’s numbers are impressive. Bloomberg reports that the company has 100 employees in San Francisco and has already built up a network of 30,000 contractors. Its 22-year-old CEO and Co-founder, Alexandr Wang, has guided Scale AI through four funding rounds, this latest one securing the company’s status as a bona fide unicorn.
With the help of its network of contractors, Scale AI provides an array of AI-destined services, broadly divided into the categories of computer vision and natural language processing (NLP).
Scale AI’s computer vision services include video content tagging and image categorization. This labeled video and image data is then used by Scale AI’s customers to improve their AI systems; for instance, to train self-driving cars and drones to “see” better by being able to understand and respond to different landscapes.
Under the umbrella of NLP, the company provides text classification, speech and voice transcription, and OCR (optical character recognition) transcription services. These, in turn, can be used in search relevance and e-commerce listing matching, for example.
The Scale AI website says its tasks are performed by humans with “additional layers of both human, data and machine learning driven quality control checks” — meaning they have found a way to semi-automate the data-labeling process.
On the Lionbridge Radar
While Lionbridge has only recently begun to aggressively compete in the space, Australia-listed Appen’s early bet has paid off handsomely. Since IPO’ing in 2015, Appen’s share price has skyrocketed by over 4,000%, pushing its market cap to nearly USD 2bn. Appen CEO Mark Brayan is scheduled to speak at SlatorCon San Francisco on September 12, 2019.
Similar to Scale AI, Appen has built an army of crowd workers who manually sort and annotate data. Taking a shortcut on what Appen said was five years’ worth of tech development, the company, in March 2019, acquired data annotation platform Figure Eight for up to USD 300m.
Scale AI has gone directly to an Appen-plus-Figure-Eight model, combining Appen’s crowd power with Figure Eight-type automation capabilities.
Scale AI has also found itself on the radar of the world’s second largest LSP by revenue, Lionbridge. In a blog post published in February 2019, Lionbridge identified Scale AI as one of the Top 10 Crowdsourcing Companies for Tech Solutions.
One language industry leader that is not about to enter AI support services is SDL, whose CEO Adolfo Hernandez told Slator in an August 2019 interview that such a move was not part of their strategy.
Similarities and Where They End
AI support services share a number of similarities with the language industry in terms of organizational model (i.e., crowdsourcing) and processes (human-in-the-loop). But there is a fundamental conceptual difference between translation and data labeling. Translation has intrinsic value and typically serves its own specific purpose, be it marketing, compliance, or something else. By contrast, labeling data is a means to an end. It is used to train AI and is not a standalone product; hence the term “AI support service.”
Put simply, a translation is requested because a translation is required, while data labeling is requested because a better AI model is required. Granted, translation output can be, and is, used to train neural machine translation models, but this is not its primary function in the marketplace.