Netflix Credits Shakespeare With Better Localization Testing

Netflix Uses a New Tool Called Shakespeare for A/B Testing Copy and Localization

As one of the biggest purveyors of entertainment content on a global scale, Netflix has a vested interest in making sure the way it serves up movies and episodic content is optimized for global audiences.

Most obviously, this involves providing subtitled and dubbed content in multiple languages so audiences around the world can access series and movies in their preferred language on Netflix.

The streaming giant does more in the way of localization than subtitling and dubbing, however; Netflix’s website and apps — usually the first stop for users before they even begin to stream content — are also localized for the markets they serve.

Given the scale at which Netflix operates, changing a word here or there on the website could lead to a significant number of extra sign-ups. Adding sign-ups equal to just 0.01% of its current base of 200 million subscribers, or about 20,000 new members, would generate additional subscription revenue on the order of millions. That’s a lot of money on the table.
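The revenue claim can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a hypothetical average revenue of $10 per member per month (an illustrative round number, not a figure reported by Netflix) and treats the 0.01% lift as one basis point of the 200 million subscriber base.

```python
# Back-of-the-envelope estimate of the revenue impact of a small conversion lift.
# ASSUMPTION: the average monthly revenue per member is a hypothetical round
# figure chosen for illustration, not a number reported by Netflix.

SUBSCRIBERS = 200_000_000          # subscriber base cited in the article
LIFT_BASIS_POINTS = 1              # 0.01% expressed in basis points (1 bp = 0.01%)
ASSUMED_MONTHLY_REVENUE_USD = 10   # hypothetical average revenue per member


def extra_members(subscribers: int, lift_bp: int) -> int:
    """Number of additional members implied by a lift given in basis points."""
    return subscribers * lift_bp // 10_000


def annual_revenue(members: int, monthly_usd: int) -> int:
    """Yearly subscription revenue from `members` each paying `monthly_usd` per month."""
    return members * monthly_usd * 12


members = extra_members(SUBSCRIBERS, LIFT_BASIS_POINTS)
revenue = annual_revenue(members, ASSUMED_MONTHLY_REVENUE_USD)
print(f"{members:,} extra members -> about ${revenue:,} per year")
# prints "20,000 extra members -> about $2,400,000 per year"
```

Integer arithmetic with basis points avoids floating-point rounding; even if the assumed price is off by a few dollars, the result stays in the millions.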

This is what drives Netflix’s careful handling of user interface (UI) copy on its web and app platforms. But how does it determine which wording works best to attract audiences in the many markets and languages it serves?

A recent post on the Netflix TechBlog on Medium unveiled details of a cross-functional project the company has been working on to improve the way it tests the success of copy in various languages.

The post, titled “Words Matter: Testing Copy With Shakespeare” after the tool itself, was published on March 9, 2021. Its authors are Netflix employees Tim Brandall (Internationalization Team Manager), Shawn Xu and Pu Chen (Internationalization Engineers), and Jen Schaefer (Head of UX Content Design).

Brandall celebrated the team’s achievement on LinkedIn, saying that it was “a great example of how the Netflix Internationalization team is tackling challenges outside the realm of a typical Internationalization team,” and thanking engineers Chen and Xu for “taking something that was originally just a rough diagram written on a piece of paper, all the way to a fully-fledged product that has become the cornerstone of copy testing at Netflix!”

Hacky Even in English

The authors described how A/B copy testing was run in the pre-Shakespearean era, if you will, identifying various pain points, including engineering effort, set-up and configuration requirements, and the time needed to deploy tests.

Moreover, they said, there was “no uniform way to test across platforms” and “there was no easy way to test localized or transcreated copy.”

The authors said that, in general, “the way we handled copy testing was hacky even in English — but when you added in the dozens of languages Netflix is translated into, it became clear that not only was our process inefficient, it couldn’t handle our increasing need for copy testing for various languages and cultures.”

As a result, they pointed out, “copy tests were few and far between, and we were missing out on a lot of language-focused wins — not just in English, but globally.” 

To remove these pain points, Netflix assembled a crack team of engineers, language managers, and content designers, as well as project managers and data scientists. What followed involved delving into the product code and looking at how strings containing copy variants could be fetched for testing.

Major Benefit for Localized Copy

The web-based Shakespeare system is connected to a message repository (a queue), which in turn connects to Netflix’s translation tool. One major benefit of Shakespeare, Netflix said, is that “localized or transcreated copy can be tested independent from the English source.”

Users visiting the website or using one of Netflix’s apps see one of the copy variants being tested. Their responses to calls to action such as “sign up now” or “see prices” are tracked to determine which variant performs best. Previously, the less successful variants had to be removed manually; Shakespeare now overrides all but the winning copy automatically.
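The TechBlog post does not publish Shakespeare’s internals, so the following is only a minimal sketch of the general mechanism, under stated assumptions: users are bucketed into variants by hashing a user ID (a common A/B testing technique, assumed here rather than confirmed by Netflix), conversions on the call to action are counted per variant, and concluding the test overrides every variant except the one with the highest observed conversion rate. `CopyTest` and `assign_variant` are hypothetical names.

```python
import hashlib
from collections import defaultdict

# HYPOTHETICAL sketch: names and logic are illustrative, not Shakespeare's API.


def assign_variant(user_id: str, test_id: str, variants: list[str]) -> str:
    """Deterministically bucket a user so they always see the same copy variant."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


class CopyTest:
    def __init__(self, test_id: str, variants: list[str]):
        self.test_id = test_id
        self.variants = list(variants)
        self.impressions = defaultdict(int)  # variant -> users who saw it
        self.conversions = defaultdict(int)  # variant -> users who acted on the CTA

    def show(self, user_id: str) -> str:
        """Pick the user's variant and count an impression for it."""
        variant = assign_variant(user_id, self.test_id, self.variants)
        self.impressions[variant] += 1
        return variant

    def record_conversion(self, user_id: str) -> None:
        """Credit a conversion to whichever variant this user was shown."""
        variant = assign_variant(user_id, self.test_id, self.variants)
        self.conversions[variant] += 1

    def conversion_rate(self, variant: str) -> float:
        shown = self.impressions[variant]
        return self.conversions[variant] / shown if shown else 0.0

    def winner(self) -> str:
        # Takes the highest observed rate; a production system would also
        # require statistical significance before declaring a winner.
        return max(self.variants, key=self.conversion_rate)

    def conclude(self) -> list[str]:
        """Override all but the winning copy, leaving a single variant live."""
        self.variants = [self.winner()]
        return self.variants


if __name__ == "__main__":
    test = CopyTest("signup-cta", ["Sign up now", "See prices"])
    print("user-42 sees:", test.show("user-42"))
```

Hashing instead of random assignment means a returning user keeps seeing the same copy, which keeps the measurement consistent; once `conclude()` runs, every user is served the winning variant.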

The blog post said Shakespeare has cut the time Netflix needs to deploy A/B copy tests from days or weeks to a matter of minutes, and has removed the need for extensive engineering support. Now, language managers (or whoever is running the test) manage the process by entering a few details into Shakespeare, including the copy variants they want to test and a link to where the copy is stored.
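To illustrate the kind of details a test owner might enter, here is a hypothetical test definition. The field names (`name`, `copy_location`, `locale`, `variants`) and the validation rules are assumptions for illustration; the article only says that test owners supply the copy variants and a link to where the copy is stored. The `locale` field reflects the point that localized copy can be tested independently of the English source.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL: field names and rules are illustrative; the blog post does not
# publish Shakespeare's actual schema.


@dataclass
class CopyTestDefinition:
    name: str
    copy_location: str  # link to where the copy string is stored
    locale: str         # e.g. "en-US" or "de-DE"
    variants: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the test can launch."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not self.copy_location.strip():
            problems.append("copy location is missing")
        if len(self.variants) < 2:
            problems.append("at least two copy variants are needed for an A/B test")
        if len(set(self.variants)) != len(self.variants):
            problems.append("copy variants must be distinct")
        return problems


if __name__ == "__main__":
    definition = CopyTestDefinition(
        name="signup-cta-de",
        copy_location="https://example.com/strings/signup_cta",  # placeholder URL
        locale="de-DE",
        variants=["Jetzt registrieren", "Preise ansehen"],
    )
    print(definition.validate())  # prints []
```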

Other key benefits include the fact that copy tests are “easier to set up, configure, and clean up,” that the teams now “have the availability to make real-time copy updates,” and can “consistently test across platforms” (i.e., web, TV, Android and iOS), Netflix said.

Shakespeare has been rolled out on web, Android, and iOS, and integration with Netflix’s TV app is underway.

According to the same post, Shakespeare has upwards of 50 copy tests under its belt, with more to come. Netflix also plans to explore what additional insights Shakespeare can provide; for example, relating to tone of voice, style, clarity, and context awareness.

One area of interest for the teams is global relevance; that is, accounting for the fact that “sometimes a language hypothesis created in Silicon Valley or L.A. doesn’t resonate in other areas of the world and feels more natural when it’s customized to the market.”

Head over to the original blog post for a more technical explanation of Shakespeare and for additional information on how it was developed.