User-generated content (UGC) means big business for machine translation providers and language service providers trying to capture non-critical content where only rock-bottom rates will do.
Knowing how a reviewer’s language influences a user rating for a tourist attraction and determining the relevance of that rating to speakers of other languages should have practical benefits for both buyers and vendors of UGC translations.
Now comes an interesting study by Scott Hale, Data Scientist at the Oxford Internet Institute, on how language affects ratings on TripAdvisor.co.uk. Slator reached out to Hale to learn more about the study.
“For many large user-generated content platforms, less than half the content is in English and many users do not speak English as a native language,” Hale points out in his study. He says future user growth, and the content those users contribute, will predominantly be in languages other than English given that “Internet-penetration rates are already high in most English-speaking countries.”
Hale says that, while the earliest TripAdvisor reviews from 2001 were all in English, from 2006 on “non-English reviews grew quickly,” with French, Spanish, Danish, Italian, Japanese, Portuguese, and Russian among the top eight languages (Figure 2).
Taking a look at all 516,641 reviews on TripAdvisor.co.uk of some 3,040 London tourist attractions in July 2015 (hotels and restaurants not included), he notes, “25% of all reviews of London attractions were not in English. Just over half of all attractions had at least one non-English review, and 175 attractions (6%) had more non-English than English-language reviews.”
Hale’s study also shows that the language a reviewer writes in and, by extension, the reviewer’s culture have an impact on ratings (number of stars): some language pairs show closely matching star ratings, while others diverge widely.
For instance, German, Norwegian, and French star ratings “are strongly correlated” with those in other languages. “In contrast, ratings in languages such as Portuguese and Japanese are less strongly correlated,” the study points out (Figure 1).
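To make the idea of correlated star ratings concrete, here is a minimal sketch of how a correlation between two languages could be computed: average the stars per attraction in each language, then correlate those averages across the attractions both languages have reviewed. The article does not describe the study's exact method, so the use of Pearson correlation, the function names, and the sample reviews are all assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt


def mean_rating_by_attraction(reviews, language):
    """Average star rating per attraction, using only reviews in one language."""
    totals = defaultdict(lambda: [0, 0])  # attraction -> [sum of stars, count]
    for attraction, lang, stars in reviews:
        if lang == language:
            totals[attraction][0] += stars
            totals[attraction][1] += 1
    return {a: s / n for a, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation; None when it is undefined (too few points or no variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def language_correlation(reviews, lang_a, lang_b):
    """Correlate two languages' average ratings over the attractions they share."""
    a = mean_rating_by_attraction(reviews, lang_a)
    b = mean_rating_by_attraction(reviews, lang_b)
    shared = sorted(set(a) & set(b))
    return pearson([a[k] for k in shared], [b[k] for k in shared])


# Invented sample data: (attraction, review language, stars).
sample_reviews = [
    ("museum", "de", 5), ("museum", "fr", 5), ("museum", "fr", 4),
    ("tower", "de", 3), ("tower", "fr", 3),
    ("park", "de", 4), ("park", "fr", 4),
    ("zoo", "de", 2), ("zoo", "fr", 2),
]

print(language_correlation(sample_reviews, "de", "fr"))
```

In this made-up sample the German and French averages move together, so the coefficient comes out close to 1. A language pair with no attractions in common returns `None` rather than a misleading number.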
Asked to interpret these findings, Hale tells Slator there are many reasons why star ratings vary across languages. “If a museum, for example, has an audio guide available only in the five big European languages, then visitors speaking one of those languages probably get more information and, ultimately, have a different experience” compared to those who cannot speak one of the audio-guide languages.
Hale admits, “I do not know in particular why Portuguese and Japanese are less correlated. One possible reason that Japanese ratings are less correlated with other languages is comprehension and comfort in using foreign languages.”
Another study Hale did on bilingual editing of Wikipedia showed that Japanese speakers were less likely to engage with foreign-language content than speakers of other languages.
Hale tells us, “Language can also capture elements of culture, of course, and it may be that people coming from different countries or cultures evaluate tourist attractions with different criteria.”
“Users may derive some utility from the star ratings of reviews in languages they do not read, possibly more from rough machine translations of the review text”—Scott Hale
Stars and Outliers
Regarding outliers among language pairs (Figure 1), such as the low star-rating correlations for Chinese-Danish (-0.11) and Polish-English (0.12) or the high correlation for French-Spanish, Hale explains, “There are relatively few tourist attractions that are reviewed in Chinese and Polish; so, the correlations between these languages and others may simply be noise.”
As for French-Spanish, he says, “The data suggest French and Spanish tourists, in general, review attractions similarly.” These tourists, he says, may also “have similar a priori criteria” for evaluating an attraction, or information could have been available in French and Spanish at the tourist spot.
TripAdvisor’s so-called star ratings are a “form of non-personalized recommendation,” and, based on this study at least, Hale says they “are fairly good at capturing the most common opinion.”
However, anomalies may occur in star ratings as well. Hale shares with Slator a survey of reviews in French and English for the Cirque du Soleil show in London. While French and English speakers generally agree on tourist attraction ratings, he says, “this is one example where they really disagree” (Figure 3).
He notes, though, that “there are far fewer reviews in French than English.” As Hale points out in his TripAdvisor study, “The average star rating (1–5 stars) is sensitive to the number of reviews. With a small number of reviews, a single rating can be over represented.”
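A quick arithmetic sketch shows why a small review count makes the average fragile. The numbers below are invented, not taken from the Cirque du Soleil data; the point is only how far a single one-star review moves the mean when there are four existing reviews versus four hundred.

```python
def mean_stars(ratings):
    """Plain average of a list of 1-5 star ratings."""
    return sum(ratings) / len(ratings)


def shift_from_one_review(ratings, new_rating):
    """How much the average changes when one more rating is added."""
    return mean_stars(ratings + [new_rating]) - mean_stars(ratings)


# Hypothetical attraction with only five-star reviews so far.
few_reviews = [5] * 4
many_reviews = [5] * 400

print(shift_from_one_review(few_reviews, 1))   # four existing reviews
print(shift_from_one_review(many_reviews, 1))  # four hundred existing reviews
```

With four reviews, one dissenting rating pulls the average down by most of a star; with four hundred, the same rating moves it by about a hundredth of a star.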
Moreover, “Users may derive some utility from the star ratings of reviews in languages they do not read and possibly more from rough machine translations of the review text.” Slator has learned from sources that machine translations for TripAdvisor are provided in part by Bangkok-based Asia Online.
Hale recommends that site designers get a handle on user-generated content to know “what to do when there are few or no reviews in a person’s preferred languages.” One way of doing that is “calculating the correlations between languages and countries,” and learning exactly how to deploy MT so star ratings will not be misleading.
Hale asks, “If there are few reviews in Finnish for a Finnish user, would it be better to show Swedish or English reviews as well, or no other reviews at all?”
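One possible policy, sketched here purely for illustration, is to fall back on languages whose ratings correlate well with the user's own when that language has too few reviews. This does not answer Hale's question, which remains open. The function name, both thresholds, and the correlation values are hypothetical and not taken from the study.

```python
def choose_review_languages(preferred, review_counts, correlations,
                            min_reviews=5, min_correlation=0.5):
    """Pick which languages' reviews to show for one attraction.

    review_counts: language -> number of reviews for this attraction.
    correlations:  language -> {other language -> rating correlation}.
    Both thresholds are arbitrary placeholders, not values from the study.
    """
    # Enough reviews in the user's own language: show only those.
    if review_counts.get(preferred, 0) >= min_reviews:
        return [preferred]

    # Otherwise add languages that correlate well and actually have reviews,
    # most strongly correlated first.
    candidates = [
        (corr, lang)
        for lang, corr in correlations.get(preferred, {}).items()
        if corr >= min_correlation and review_counts.get(lang, 0) > 0
    ]
    candidates.sort(reverse=True)
    return [preferred] + [lang for _, lang in candidates]


# Invented figures for a Finnish-speaking user.
counts = {"fi": 2, "sv": 40, "en": 900, "ja": 15}
corrs = {"fi": {"sv": 0.7, "en": 0.6, "ja": 0.2}}

print(choose_review_languages("fi", counts, corrs))
```

With these made-up inputs the Finnish user would see Finnish reviews first, then Swedish and English, while Japanese is left out because its assumed correlation falls below the threshold.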
Hale hopes to answer that question soon. He discloses, “I am currently working on behavioral experiments to understand how people respond to foreign-language reviews.”