The fast-growing market of Data for Artificial Intelligence (AI) was the topic of discussion at SlatorCon Remote in May 2021. Experts in data-for-AI came together to discuss what makes the industry so dynamic and what it takes to succeed as a data provider.
The panel set the scene for the uninitiated, explaining how language datasets are used to build a wide range of AI applications. Speech and text data are collected, annotated by humans, then fed through algorithms to produce AI models. Major buyers include big tech, small AI startups, and companies from general industry that are launching their own AI initiatives.
For Michel Lopez, CEO of e2f, a language service provider (LSP) headquartered in Silicon Valley, moving into AI data was a client-driven decision. He described how, a few years back, “some of our tech clients were developing voice assistants and asked us if we could collect and annotate some datasets. So we said, why not?”
Meanwhile, Kåre Lindahl told the panel that AI training data was a natural progression for the tech-focused LSP Venga Global, which he leads as CEO. “We focused on technology from the get-go. So when we were asked if we could do data, we already had several people in the company with that background. The combination of high-quality control learned from localization projects, with a tech-savvy approach made a big difference,” Lindahl said.
Casper Grathwohl, President of Oxford Languages at Oxford University Press, explained how the publisher had developed a dataset as an offshoot of its work creating dictionaries and thesauri. The dataset later proved useful for training AI.
“We started thinking about this as highly structured language data that could help refine AI engines. We started with English, and then moved into multilingual data,” Grathwohl told the SlatorCon audience.
Diverse, Dynamic, Competitive
The panel went on to describe the different types of data projects and their end uses. Each AI data project is unique, according to e2f’s Lopez: “Helping a client develop a particular dataset is a one-time shot. After you’ve helped them solve one problem, they’ll come back to you with a different problem they want to solve.”
AI data has many use cases, including powering smart writing assistants such as Grammarly and gamifying word-based tasks, applications that fall under education technology (edtech). Grathwohl said, “Edtech uses AI to create a conversational experience, one that’s adaptive and personalized to a particular student. Our data can underpin a lot of that.”
Competition between AI developers is intense, so data providers often have no visibility into how their data will ultimately be used. As Grathwohl explained, “AI companies can be quite secretive and protective. With big tech, half the time they don’t tell us what they’re going to use the data for.”
Moreover, project specifications may change mid-project. “You may think you understand what the client wants, provide a quote, and start working. Three days later, it changes,” Lopez said. Such changes may originate from the client’s data scientists or from communication glitches in the supply chain.
On top of this, broader market forces — such as competition between big tech, changes in client revenue, and data privacy concerns — can cause clients to put projects on hold or quickly launch new ones. All these factors create an industry that is dynamic, fast-moving and, in Grathwohl’s words, “choppy.”
Staying Strong in the Game
The SlatorCon Remote panelists next turned to the question of what it takes to succeed as a data provider. Venga Global’s Lindahl advised that a tech-first approach is essential. “You can’t succeed unless you’re willing and have the resources to work on technology solutions. You need automation to be able to make any money,” he said.
Mastering recruitment is also key, and quite distinct from the relationship-led vendor management practices typical of LSPs. “In translation vendor management, you build a relationship and work with the vendor for many years,” Lindahl explained, adding that, by contrast, “for AI, it’s about recruiting people quickly and finding out how to pay them quickly so they can do the work.”
Lopez agreed, saying, “We need to be very quick and creative in gathering a crowd.”
The panelists then tackled the factors that help them stay competitive in a landscape that includes large, full-service providers such as Appen and Telus (which acquired Lionbridge’s AI division in 2020), crowdsourcing marketplaces like Amazon Mechanical Turk, and startups such as Scale AI.
One key differentiator is speed, according to Lindahl. “A key selling point is that we’re quicker than some of the larger suppliers in this space. It’s easier for us to pull people together,” he said.
The panel also identified flexibility and problem-solving abilities as assets. Grathwohl said, “Sometimes a big tech company will come to you and want something very significant. It may be worth millions of dollars but it’s a logistical nightmare. We say, ‘yes, we can do it’ — and then we scramble.”
Lopez echoed this view, telling the online audience, “You need problem solvers; quick, technology-aware, and mobile project managers.”
Grathwohl concluded, “The industry is evolving so quickly and the needs are changing from project to project so fast. That’s one of the reasons it’s exciting to be in this space.”