Inside eBay’s Translation Machine

When I joined the machine translation (MT) team at eBay in early 2014, somebody told me that MT at eBay was completely different from MT anywhere else. And honestly, it took me very little time to understand why. So what makes our MT process so special, and how do we make it work?

The Background

There are currently more than 800 million listings on eBay. With each listing averaging around 300 words, how long do you think it would take a team of linguists to translate them all? Did I mention that some listings may only be online for a day or a week, and that the inventory changes continuously?

So, don’t even pull out your calculator. The answer is simple: human translation is not viable. However, if you really want to know, we estimate it would take 1,000 translators 5 years to translate just the 60 million listings eligible for Russia! For listings, machine translation is clearly a much better fit. Let me make clear at this point that other types of content, such as the UI and customer support content, are localized by our team of language specialists.
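
For anyone who does want to pull out the calculator, here is a quick sketch of the arithmetic behind the Russia estimate. It takes the article's figures (60 million listings, roughly 300 words each, 1,000 translators, 5 years) at face value; the 250 working days per year is an assumption added for illustration, and the calculation treats the inventory as a static snapshot even though it keeps changing.

```python
# Figures from the article; WORKING_DAYS_PER_YEAR is an illustrative assumption.
LISTINGS_ELIGIBLE_FOR_RUSSIA = 60_000_000
WORDS_PER_LISTING = 300          # approximate average
TRANSLATORS = 1_000
YEARS = 5
WORKING_DAYS_PER_YEAR = 250      # assumption, not from the article

# Total volume of source text to translate (a static snapshot of the inventory).
total_words = LISTINGS_ELIGIBLE_FOR_RUSSIA * WORDS_PER_LISTING

# Throughput each translator would need to sustain to finish in the stated time.
words_per_translator_per_year = total_words // (TRANSLATORS * YEARS)
words_per_translator_per_day = words_per_translator_per_year // WORKING_DAYS_PER_YEAR

print(f"Total words: {total_words:,}")                                      # 18,000,000,000
print(f"Words per translator per year: {words_per_translator_per_year:,}")  # 3,600,000
print(f"Words per translator per day: {words_per_translator_per_day:,}")    # 14,400
```

Even this snapshot-only calculation implies about 18 billion words of source text and a sustained pace of roughly 14,400 words per translator per working day, before accounting for listings that come and go within days.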

How hard can it be?

Perhaps one of our biggest challenges is that listings are written by users, not by professional writers or content developers. And users write in very different ways, on different devices, and with different levels of attention. You get the idea. As you can guess, user-generated content is hard for any MT system to translate, because the system needs to learn how to deal with typos, poor grammar, lack of punctuation, foreign words, and much more.

Another huge challenge for our systems is that they must deal with different types of content. There are more than 12,000 categories on eBay, from auto parts to antiques, from original comic art to power tools. That means, for example, that our MT engine has to use the right terminology for each of these categories. Training an MT system for one specific domain is relatively easy, but things get a bit more complicated when you need to cover so many different subjects.

Speaking of domains, here is a third challenge: it is really hard to find in-domain data to train our engines. We use statistical translation models, which, in a nutshell, means that translations are probabilistic results produced by models estimated from huge amounts of parallel language data (aligned texts in a source and a target language). It is not difficult to find parallel data on general subjects (some of it is even publicly available), but try finding a significant amount of bilingual text about baseball gear or action figures. Not that easy.

I recently found a great real-life example that illustrates the issues I just mentioned: “This is a sharp looking bat”. Without any context, can you blame MT for not knowing whether “bat” is an animal or the stick used to play baseball? Can you really blame it for thinking that “looking” is a verb here?
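
To make the “bat” problem concrete, here is a toy sketch of how a statistical system might choose between two Spanish translations of “bat” by combining a translation-model probability with context probabilities. All numbers and table names are invented for illustration: the translation model is assumed to come from general-domain parallel text (where the animal sense dominates), and the context table stands in for what in-domain data could provide. It is a simplified log-linear scoring scheme, not eBay's actual model.

```python
import math
from typing import List

# Hypothetical translation-model probabilities P(target | "bat"), as if estimated
# from general-domain parallel text where the animal sense is more frequent.
TRANSLATION_MODEL = {"murciélago": 0.7, "bate": 0.3}

# Hypothetical context probabilities P(target | context word), standing in for
# what in-domain (e.g. sporting goods) data could teach the system.
CONTEXT_MODEL = {
    ("baseball", "bate"): 0.9,
    ("baseball", "murciélago"): 0.05,
    ("cave", "murciélago"): 0.9,
    ("cave", "bate"): 0.05,
}
UNSEEN = 0.1  # fallback when a context word says nothing about a candidate


def score(candidate: str, context_words: List[str]) -> float:
    """Log-linear score: log P_tm(candidate) plus one log term per context word."""
    total = math.log(TRANSLATION_MODEL[candidate])
    for word in context_words:
        total += math.log(CONTEXT_MODEL.get((word, candidate), UNSEEN))
    return total


def translate_bat(context_words: List[str]) -> str:
    """Return the highest-scoring translation of "bat" given the context words."""
    return max(TRANSLATION_MODEL, key=lambda c: score(c, context_words))


print(translate_bat(["sharp", "looking"]))   # murciélago: no useful context
print(translate_bat(["baseball", "sharp"]))  # bate: an in-domain cue flips the choice
```

With the example sentence, neither “sharp” nor “looking” says anything about baseball, so the general-domain preference wins; a single in-domain cue such as “baseball” is enough to change the outcome, which is why in-domain training data matters so much.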

And as if this were not enough, consider the sheer number of listings on eBay, the more than 150 million active users browsing them, and the more than 30 countries from which those users access the site. All this requires a huge, solid infrastructure.

So how does MT work at eBay?

We have an MT science team and an MT language team that have been working together for years to make this happen. Since I am part of the latter, I am going to focus mostly on the language side of things. As I mentioned before, not everything is translated using MT; we focus on:

  • Search queries: this is what users type in the search box to find what they want to buy. This is the only case in which text is translated from the user’s language into English, so that the query can be matched against the inventory, which is predominantly in English. Queries are really hard to translate because, among other things, they have no context, and many words are polysemous, i.e., they have more than one meaning. For example, when users search for “glasses”, do they want to find sunglasses or beer glasses?
  • Item titles and descriptions: each listing has a title, so you can quickly find out what the item is, and a description, so you can see information like the item’s condition, measurements, specifications, etc. If you are familiar with titles on e-commerce sites, you may have noticed that they are hardly ever full sentences and rarely follow grammar rules. They are usually a string of words, one after the other, sometimes heavily loaded with synonyms. A real example: “1 pcs good quality hard pencil bait fising lures tackle bait crankbait 3d eyes”. An MT system used to translate titles needs to be specifically trained on this type of content to produce translations in the expected “format”.
  • Product reviews: these are the comments users leave about products they buy, and they are often packed with slang and idiomatic expressions (“the best bang for your buck”, “this is an amazing product hands down”); sometimes the context is non-existent (for example, “this is my favorite axe”), or users try to be funny with jokes, wordplay, puns, etc. All of these are extremely hard for MT to get right.
  • Product descriptions: these are professionally written texts that users can read to learn more about products they want to buy.
  • Member-to-member communication (M2MT): sometimes buyers and sellers don’t speak the same language, so we use MT to allow them to communicate with each other.

Training, Testing, and Training Again

So, how does a machine translation system learn to deal with all these complexities? You train it. Our science team generates translations using our MT engine; those translations are post-edited and then reviewed in-house by our machine translation language specialists. If the quality of the post-edited text is good, it is fed back into the engine. The system learns from these improved translations and uses that data to generate better translations in the future.
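
The feedback loop described above can be sketched as follows. The function, class, and parameter names are hypothetical, and the human steps (post-editing and in-house review) are modeled as plain callables; the sketch only shows the data flow, in which MT output is post-edited, reviewed, and only approved segments become new training pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    source: str        # original text, e.g. part of a listing
    mt_output: str     # raw translation produced by the engine
    post_edited: str   # translation after human post-editing


def training_round(
    sources: List[str],
    translate: Callable[[str], str],           # the MT engine (hypothetical stand-in)
    post_edit: Callable[[str, str], str],      # post-editor: (source, mt_output) -> text
    review_passes: Callable[[Segment], bool],  # in-house language specialist review
) -> List[Tuple[str, str]]:
    """Return (source, post-edited) pairs approved for feeding back into training."""
    segments = []
    for src in sources:
        mt = translate(src)
        segments.append(Segment(src, mt, post_edit(src, mt)))
    # Only post-edits that pass review become new parallel training data.
    return [(s.source, s.post_edited) for s in segments if review_passes(s)]


# Usage with toy stand-ins (all hypothetical): a lookup-table "engine",
# a post-editor that fixes one agreement error, and a reviewer who
# rejects empty output.
engine = {"red bat": "bate rojo", "blue glasses": "vasos azul", "axe": ""}
fixes = {"vasos azul": "vasos azules"}
pairs = training_round(
    list(engine),
    translate=lambda src: engine[src],
    post_edit=lambda src, mt: fixes.get(mt, mt),
    review_passes=lambda seg: seg.post_edited != "",
)
print(pairs)  # [('red bat', 'bate rojo'), ('blue glasses', 'vasos azules')]
```

Keeping review as a separate gate means a post-edit that is still not good enough never reaches the training data, which matches the “only if the quality is good” condition in the process.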

But the quality of the MT output needs to be tested to measure the impact of the training process. How do we do that? With human translation (HT). The content is translated the “traditional way” and reviewed in-house, and those translations are compared to the ones produced by the engine. The differences and similarities between MT and HT are used to generate a score, which serves as an indication of the quality of the MT output.
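
The article does not name the scoring metric, but one widely used way to turn an MT-versus-HT comparison into a number is BLEU, which measures how many word n-grams in the MT output also appear in the human reference translation, with a penalty for output that is too short. Below is a minimal single-reference, sentence-level sketch with whitespace tokenization and no smoothing; it illustrates the idea and is not necessarily what eBay uses. Real evaluations usually compute BLEU over a whole test set rather than per sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU of an MT candidate against one human reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any n-gram order with no match gives 0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)


print(bleu("the red bat", "the red bat", max_n=3))                 # 1.0
print(round(bleu("the cat sat", "the cat sat down", max_n=2), 4))  # 0.7165
```

In the second example every n-gram of the MT output matches the reference, so the score drops below 1.0 only because the output is one word shorter than the human translation.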


One Last Thought

MT is critical in expanding eBay’s global presence. As you can see, this is a process with (many) humans in the loop, from scientists to linguists, and our goal is to build systems that can translate well despite all these challenges. Trying to effectively connect buyers and sellers all over the world is not easy, but definitely exciting.