Build vs Buy: Why Evernote Built Its Own Localization Platform from Day One

Popular note-taking app Evernote is used in many countries and many languages. The company is a tech unicorn valued at over USD 1bn, and 70% of the app’s user base is outside the US, according to Director of Localization Igor Afanasyev.

And yet, Evernote’s localization efforts began with zero budget, internally developed open source tech, and crowdsourced translations.

“The main reason we didn’t look at paid tools and translation services was that we initially had literally zero budget for this,” said Afanasyev. “As for localization automation, back in 2008, there were no ready-to-use options at all. Nobody was even talking about localization automation. So we were forced to come up with our own automation scripts which later evolved into a pretty robust automation solution we decided to share with the world.”

Afanasyev started working on localization in 2008. He said the only way a one-man team could cope with the agile pace of product development was automation. So he spent a long time building the company’s localization platform, now called Serge, while also doing actual translation. The internationalization and localization infrastructure was tested on the pilot language: Russian.

Igor Afanasyev

Aside from Serge, Evernote today also uses an open source online translation frontend called Zing.

2.4 Million Words in 25 Languages for USD 160k

The Serge and Zing combination supports the localization of products and marketing materials from English into 25 languages.

“We add about 200,000 source words for translation each year,” said Afanasyev. “In 2016 we translated 2.4 million words, on which we spent about USD 0.16m. A quarter of that volume is product localization; the rest is marketing.”

He said they work with direct freelancers for most of the volume, though a couple of languages require in-house translators.

The localization team is now a duo. Afanasyev keeps an eye on localization infrastructure but also spends “most of the time on non-localization related initiatives.” Meanwhile, a fully automated process means a single localization manager is in charge of day-to-day operations.

Evernote also has “a couple of languages [outside of the official 25] supported entirely by passionate volunteers,” but Afanasyev said the volume is minimal.

Community translations are largely a relic of the early days, when Evernote’s localization program had just started and relied on crowdsourcing. Afanasyev said there are no plans to increase crowdsourced translation volume. Evernote’s platform, however, still supports volunteer translators, and Afanasyev said they can set people up “in a matter of minutes.”

Evernote shifted from crowdsourcing to a more traditional setup for two reasons, according to Afanasyev. First, they had to keep pace with release cycles and increased translation volume. Second, as they shifted to a freemium model, “many users would start perceiving Evernote as a commercial product (even though we had and still have a free tier). Because of that, fewer people would want to do translations for free.”

So today, all of Evernote’s 25 languages are supported by paid translations, and the localization process goes through their homegrown tech stack.

“Localization is Never a Limiting Factor” with Serge + Zing

Afanasyev compared Serge and Zing to a TMS (translation management system) and CAT (computer-assisted translation) tool, respectively, with key differences.

He said Serge has its own database that serves as translation memory (TM). Since all projects are processed within the same platform, all translations are naturally retained in that database. “No specific management is required,” Afanasyev explained. “You won’t find any common TMS functionality like merging TMs. We simply don’t need it.”

Zing, on the other hand, focuses on translating just one format type, .PO, which is a popular format for open source software and libraries. Serge supports multiple resource file formats, so Zing’s only job is translation. “Zing doesn’t have any segmentation/alignment options, you can’t upload or download arbitrary files,” he explained.
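
For readers unfamiliar with the format, a .PO file (the GNU gettext catalog format) pairs each source string (`msgid`) with its translation (`msgstr`), and an empty `msgstr` marks a string that still needs translating. The sketch below is only an illustration: the sample strings and the helper names `parse_po` and `untranslated` are hypothetical and are not taken from Evernote, Serge, or Zing. It handles simple single-line entries only, not escaped quotes, multi-line strings, or plural forms.

```python
import re

# Hypothetical sample catalog; the strings are illustrative only.
SAMPLE_PO = '''
#: ui/notes.c:12
msgid "New note"
msgstr "Новая заметка"

#: ui/notes.c:40
msgid "Delete note"
msgstr ""
'''

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def parse_po(text):
    """Return {msgid: msgstr} for simple single-line entries."""
    return {m.group(1): m.group(2) for m in ENTRY.finditer(text)}


def untranslated(entries):
    """List source strings whose translation is still empty."""
    return [source for source, target in entries.items() if source and not target]


if __name__ == "__main__":
    entries = parse_po(SAMPLE_PO)
    print(untranslated(entries))
```

A production workflow would normally rely on an existing gettext library rather than a regular expression; the point here is only the shape of the data a translator sees.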

Zing does allow translators to use Google’s machine translation (MT) to pre-translate segments, but the translator decides whether to use it, Afanasyev noted. “We don’t bulk-translate using MT.”

This Serge-plus-Zing combo delivers changes almost immediately, Afanasyev said. “We localize about 50 different projects into 25+ languages; one full localization cycle takes about 10 minutes.”

“As soon as an engineer commits a new string to our source code repository, this string will appear for translation within 10 minutes. Same time is needed for a translation to be propagated back from our translation UI into resource files. In our experience, localization is never a limiting factor.”

Afanasyev explained that Serge continuously monitors both changes in source code repositories from developers and new translations from Zing. “This means that engineers only work with one source language (e.g. English) and Serge keeps all localized versions of these resources in sync, with zero effort from engineers,” he said.
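
The idea of one source language kept in sync with every target language can be sketched in a few lines. This is not Serge’s actual implementation: the function `sync_cycle`, its data shapes, and the choice to fall back to the source string when no translation exists are all assumptions made for illustration. Translations are looked up by source text, so a string an engineer changes automatically counts as new.

```python
def sync_cycle(source, translations, languages):
    """Run one hypothetical sync pass.

    source:       {key: source-language string} as found in the repository
    translations: {lang: {source string: translated string}} saved by translators
    languages:    target language codes to generate

    Returns (localized, pending): the per-language resources, and the
    (lang, key) pairs that still wait for a translation.
    """
    localized, pending = {}, []
    for lang in languages:
        saved = translations.get(lang, {})
        resource = {}
        for key, text in source.items():
            translated = saved.get(text)
            if translated:
                resource[key] = translated
            else:
                # Assumption: ship the source string until a translation arrives.
                resource[key] = text
                pending.append((lang, key))
        localized[lang] = resource
    return localized, pending


if __name__ == "__main__":
    source = {"title": "New note", "delete": "Delete note"}
    translations = {"ru": {"New note": "Новая заметка"}}
    localized, pending = sync_cycle(source, translations, ["ru", "de"])
    print(pending)
```

Because the loop is identical for every language, adding a language means adding one entry to `languages`, which matches the claim that the engineering effort does not grow with the language count.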

Translations don’t need merging and conflicts don’t arise, he added. “With Serge, we can support 25 languages, or 100 languages, with the same (almost zero) effort as we support just one language.”

He noted that engineers still need to comply with internationalization guidelines, but that part of the process does not depend on the number of languages they support.

“Localization Only as a Prerequisite”

Afanasyev said he believed measuring the direct impact of localization alone is nearly impossible.

“We treat localization only as a prerequisite to reaching more customers worldwide; then it’s combined with marketing initiatives, better presence in regional social media, regional events and promotions, country-specific pricing and so on,” he said. “We think localization makes other things (gaining regional user base, improving monetization) simpler.”

Afanasyev went on to say that international expansion is not feasible without offering a “base layer” of localized product and website experience. “I’m convinced that bad localization can hurt more than having no localization at all. It’s important to be accurate and consistent with the voice and tone of the brand,” he said.

To this end, he explained they strive to find good linguists that understand their product. These translators are responsible for the quality of their work; there are no separate people for review.

As for quality assurance, Afanasyev said a set of automatic checks runs right in the translation UI and they “do regular checks with an external vendor to get a better understanding of the translation quality.” Additionally, the regular product QA team performs tests in multiple languages, though their foremost concern is internationalization-related issues.

A “pseudo-localization approach” ensures these issues are identified early, Afanasyev said. They also leverage regular support channels for reporting and feedback. He said when the rare report does come in, “we usually fix things almost immediately.”
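
Pseudo-localization generally means generating a fake “translation” from the source text so that internationalization bugs show up before any real translation exists. The article does not describe how Evernote implements it, so the following is a generic sketch under assumed conventions: the function name `pseudo_localize`, the accent mapping, the bracket markers, and the 30% padding are all illustrative choices.

```python
import re

# Accented look-alikes for some ASCII letters (illustrative mapping).
ACCENTS = str.maketrans("AEIOUaeiouyc", "ÀÉÎÕÜàéîõüýç")

# Placeholders such as {name}, %s or %d must survive untouched.
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_localize(text, expansion=0.3):
    """Return a readable fake translation of `text`.

    Letters are accented, placeholders are preserved, the string is padded
    to simulate longer target languages, and brackets mark its boundaries.
    """
    parts = PLACEHOLDER.split(text)
    # With one capturing group, odd indices hold the placeholders.
    body = "".join(
        part if index % 2 else part.translate(ACCENTS)
        for index, part in enumerate(parts)
    )
    padding = "~" * round(len(text) * expansion)
    return f"[{body}{padding}]"


if __name__ == "__main__":
    print(pseudo_localize("Save {count} notes"))
```

In a build that uses such output, any string that still appears in plain English was likely hard-coded, a missing closing bracket points to truncation or string concatenation, and the padding exposes layouts that cannot accommodate longer text.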

As for trends in language services such as neural MT, Afanasyev said he’s excited about the prospect. He said traditional use cases such as product and marketing translation will remain firmly in the hands of human translators, but MT will drive productivity up and rates down.

On the other hand, he said there are certain niches where MT fits really well: offline and real-time support (e.g. chats). “These are the niches where the target audience for a translated content is a single person or a limited group of people, and the primary goal is to understand what the other person has said and apply your own knowledge on top of MT. These use cases will thrive,” he said.