On December 15, 2021, Poland-headquartered language service provider Summa Linguae announced its acquisition of Belgium-based language data provider Datamundi.
Datamundi provides language data in all languages to NLP system builders, which use the data to create and enhance their machine translation (MT) engines or optimize search relevance. Its client base centers on the big-tech collective known as “FAMGA” — Facebook, Apple, Microsoft, Google, and Amazon.
Summa Linguae CEO Krzysztof Zdanowski told Slator that Datamundi was owned by Gert Van Assche (Managing Partner) and his wife prior to the sale. “We were contacted by Datamundi earlier in 2021 when they decided to start an exit process” and “our offer has been selected as the winning bid,” he said.
The transaction will see Summa Linguae pay EUR 5m (USD 5.7m), or 2.5x revenue, for Datamundi. The acquisition is being financed from Summa Linguae’s own funds — EUR 1.5m in shares and EUR 3.5m in cash.
Summa Linguae generated revenues of PLN 88.5m (USD 23.7m) in 2020, up 13.2% from 2019. Datamundi’s revenues are several times smaller, at EUR 2m (USD 2.3m). The language data provider has historically grown rapidly, at 40–50% annually, although revenues remained flat in 2021 versus 2020, the CEO told Slator.
According to Zdanowski, Datamundi will be integrated under Summa Linguae’s brand and “pro forma consolidated revenues for 2021 are in the USD 26–27m range.” Summa Linguae is aiming to deliver 15–20% growth in 2022, consistent with prior years.
Van Assche, an industry veteran, will join Summa Linguae as CTO, while Datamundi’s managers, developers, data scientists, and program managers will also stay on.
Voice, Image, Language
The two companies had no relationship before Datamundi made contact with Summa Linguae, but they knew of each other through one shared client, for which they delivered different services.
Zdanowski explained the rationale behind the acquisition, saying “Datamundi complements our data solutions portfolio very well.” Summa Linguae was previously focused on voice and image data, while Datamundi’s sole focus is language data.
Since 2017, Summa Linguae has been shifting its strategic focus toward data solutions, which have since become an increasingly important part of the business. Zdanowski described the acquisition of Datamundi as “yet another milestone on this path” and said Datamundi’s focus on the FAMGA client base is well-aligned with the company’s strategy of servicing large, global accounts.
Summa Linguae had already expanded its data services capabilities in 2019, when it acquired Canada-based data annotation company Globalme. That deal brought data collection services, a post-processing platform, proprietary project and workflow tools, and automation technologies.
On Summa Linguae’s M&A roadmap, Zdanowski said that “given the nature of our ownership structure (Summa is majority-owned by PE), we cannot rule out yet another acquisition in 2022.” He added, “we are always on the lookout for great companies to complement our portfolio.”
According to the CEO, following the Datamundi acquisition, “over 70% of Summa Linguae’s revenue will come from non-localization work, almost entirely with US-based clients.”
Demand for Data Solutions
Datamundi’s experience in language data is broad. The company’s scientists bring expertise in data mining, filtering, and automatic and human labeling methods, as well as data alignment, human alignment and annotation, data-defect detection, and data versioning.
Zdanowski said demand for Datamundi’s services, which include data labeling, evaluation of human translation and MT output quality, as well as automatic data alignment and tagging, is “only growing.”
Datamundi has a data-annotation platform that is used internally. It is not licensed externally but is available on a self-service basis to clients. The platform serves as a production environment to support project management and freelancer tasks.
Although the platform includes a number of automated features — such as an AI tool to detect sexually explicit content on websites, pattern detection, and an alignment tool — the platform is mainly designed for human annotators, Zdanowski said, adding that it also “uses a number of techniques to detect potentially fraudulent behaviors and QA freelancers’ output.”
Pointing out how tasks have “become very technical and very linguistic at the same time,” the Summa Linguae CEO said, “clients go well beyond the top 10–15 languages and build more NLP technology in others, including niche languages, which poses additional challenges and thus opportunities for us.”
Summa Linguae was advised in the transaction by CK Legal Chabasiewicz, Kowalska i Partnerzy (Poland) and Novius (Belgium).