At F8: Facebook Says It Now Has a ‘Powerful Tool to Think About Language Problems in a Language Agnostic Way’

Technology · by Gino Diño · May 6, 2019

How does a social media platform with an average of over 1.56 billion daily active users cope with spam, fake accounts, hate speech, and other undesirable forms of content? Facebook thinks the answer is artificial intelligence (AI). At the tech giant’s F8 2019 developer conference, held April 30–May 1, 2019, Facebook explained how it intends to combat harmful content.

While Founder Mark Zuckerberg discussed general developments and curious new directions, Chief Technology Officer Mike Schroepfer devoted a full 10 minutes of his 30-minute presentation to natural language processing (NLP) and neural machine translation (NMT).

AI vs. Bullying

CTO Schroepfer said that AI can help fight harmful content — from simple things like spam to more serious threats such as bullying and terrorist propaganda.


Spam and fake accounts are such a huge ongoing problem that, in the third quarter of 2018 alone, Facebook identified and removed over 1.2 billion pieces of spam and over 700 million fake accounts. On top of that, Facebook has to contend with text, audio, photo, and video content in multiple languages, which is where NLP comes into the picture.

A priority of the Facebook AI team is to leverage developments in NLP to more effectively filter out harmful content. Schroepfer said that while their current NLP technology has improved by leaps and bounds from just four years ago and can detect most types of harmful content at over 90% accuracy, it still struggles with hate speech and harassment.

The company says its current NLP technology successfully acts on hate speech and harassment only 51.6% and 14.9% of the time, respectively; the rest of the time, harmful content has to be reported manually by Facebook users.

Schroepfer noted how developments in machine learning are helping Facebook improve its AI technologies, including the NLP and NMT systems already in deployment, which form the front line of defense against harmful speech. NLP can classify a piece of text as harmful, while NMT ensures that NLP can classify text in multiple languages.
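To make that division of labor concrete, here is a minimal sketch of a translate-then-classify moderation flow. It shows only one way the two systems could be combined, and every function body below is an invented placeholder standing in for a trained model, not Facebook's actual pipeline.

```python
# Minimal sketch of the "NMT feeds NLP" moderation flow described above.
# All function bodies are illustrative placeholders, not Facebook's systems:
# a real pipeline would call trained NMT and classifier models instead.

HARMFUL_KEYWORDS = {"spam-link", "buy-followers"}  # toy stand-in for a trained classifier


def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for an NMT model; returns text unchanged in this sketch."""
    if source_lang == "en":
        return text
    # A real system would run neural machine translation here.
    return text


def classify_harmful(text: str) -> bool:
    """Placeholder for an NLP classifier; a keyword match stands in for a model."""
    return any(keyword in text.lower() for keyword in HARMFUL_KEYWORDS)


def moderate(text: str, source_lang: str) -> str:
    """Translate first so a single classifier can cover many languages."""
    english_text = translate_to_english(text, source_lang)
    return "flag_for_review" if classify_harmful(english_text) else "allow"


print(moderate("Click this spam-link now!", "en"))   # flag_for_review
print(moderate("Hola, ¿cómo estás?", "es"))          # allow
```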

He explained a key development, called self-supervised machine learning, which is increasing Facebook’s AI capabilities across the board, from text to image to video.

From Supervised to Self-Supervised

Self-supervised learning essentially helps machine learning models train with less human intervention compared to supervised learning.

According to Schroepfer, supervised learning is “the basic technology that has powered most of what we’ve progressed in the last five years.”

He explained, “You take a set of data like a bunch of pictures, you take a set of people, they meticulously label those pictures [and] describe everything that’s in them. You build a big enough dataset with that set of people, I can build a machine learning classifier that can identify anything that was found in this dataset.”

However, supervised learning is fairly specialized: categories not represented in the training data will go unrecognized in production. Furthermore, getting from training to deployment is manual and slow because of the human labelling step. A minimal supervised-learning sketch follows below.
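Here is that sketch, in the spirit of Schroepfer's description, using scikit-learn and an invented, hand-labelled toy dataset. The classifier can only ever recognize the categories humans labelled for it.

```python
# Minimal supervised-learning sketch: humans label a dataset, a classifier
# learns only the categories present in that dataset.
# The texts and labels are invented for illustration; scikit-learn is assumed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "win free money now",
    "limited offer click here",
    "see you at the meeting",
    "happy birthday to you",
]
labels = ["spam", "spam", "ok", "ok"]  # the slow, manual human-labelling step

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["free money offer"]))  # learned category, most likely 'spam'
# Categories never labelled (e.g. hate speech) cannot be predicted by this model.
```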


Schroepfer then introduced self-supervised learning, which he explained with the help of a textual example from Google’s BERT or Bidirectional Encoder Representations from Transformers.

“Instead of building these manually labelled training sets, what you do is you take a big pile of data — say 80 million documents — you take the sentences in those data, and then you automatically adjust them in some way. In this case (using text documents), you mask out certain words, and then you train an algorithm to try to predict what those missing words are,” Schroepfer said.

Self-supervised learning yields two major benefits.

First, since the training data was adjusted automatically (in Schroepfer’s example, some text was masked), the answers are already known; there is no need for human supervision through the labelling process. This likely reduces the amount of work sent to companies like Appen, which has had a phenomenal ride curating data for machine learning applications over the past few years.

Second, since the training data and answers are generated at the same time, the process is fully automated, which means massive amounts of data can be used, again without the manual, time-consuming human supervision process.
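A short sketch makes both benefits visible: the "labels" are just the masked-out words, so unlabelled sentences become training pairs automatically. The corpus and masking rule below are invented for illustration, and training the predictor itself is omitted.

```python
# Sketch of the self-supervised setup Schroepfer describes: take raw sentences,
# mask a word automatically, and the "label" (the hidden word) comes for free.
import random

random.seed(0)


def make_masked_examples(sentences, mask_token="[MASK]"):
    """Turn unlabelled sentences into (input, target) pairs with no human labelling."""
    examples = []
    for sentence in sentences:
        words = sentence.split()
        position = random.randrange(len(words))
        target = words[position]          # the answer is known automatically
        words[position] = mask_token
        examples.append((" ".join(words), target))
    return examples


corpus = [
    "the cat sat on the mat",
    "machine translation maps one language to another",
]
for masked_input, answer in make_masked_examples(corpus):
    print(masked_input, "->", answer)
```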

Self-supervised back translation — where synthetic training data is derived from monolingual corpora — is currently in production at Facebook, Schroepfer said. As an example, he cited its use against misinformation and coordinated interference in the ongoing elections in India.
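The mechanics of back translation can be sketched in a few lines. The reverse-direction "translator" below is a fake word-for-word lookup used purely for illustration; in a real setup it would be a trained target-to-source NMT model, and the resulting synthetic pairs would then be used to train the source-to-target model.

```python
# Sketch of back translation: a monolingual target-language corpus is turned
# into synthetic parallel data by translating it "backwards".
# reverse_translate is a placeholder, not a real NMT model.

def reverse_translate(target_sentence: str) -> str:
    """Placeholder target->source model (here: a fake word-by-word lookup)."""
    fake_lexicon = {"hello": "hallo", "world": "welt", "friends": "freunde"}
    return " ".join(fake_lexicon.get(word, word) for word in target_sentence.split())


def build_synthetic_parallel_data(monolingual_target_corpus):
    """Pair each real target sentence with its synthetic source-side translation."""
    return [
        (reverse_translate(sentence), sentence)  # (synthetic source, real target)
        for sentence in monolingual_target_corpus
    ]


corpus = ["hello world", "hello friends"]
for synthetic_source, real_target in build_synthetic_parallel_data(corpus):
    print(synthetic_source, "=>", real_target)
# These pairs can then train a source->target model without human-translated data.
```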

Better Than Human-Supervised

Speaking at the same forum, Manohar Paluri, Director of Artificial Intelligence at Facebook, said, “AI is our best bet to keep people safe on our platforms.”

Facebook’s goal for AI, Paluri said, was to “understand content with less supervision.” In other words, the aim is to remove as much human supervision as possible without compromising the platform’s ability to understand content.

Referring back to the self-supervised machine learning models Schroepfer introduced, Paluri revealed that when they applied this approach to speech recognition, they needed 150 times less labelled data to achieve the same results.


A self-supervised machine learning model trained on 80 hours of audio was actually slightly better at classifying sounds than a supervised model built from over 12,000 hours of manually labelled audio. Paluri said this approach allows Facebook to tackle hate speech with 10 times less labelled data, “while achieving similar accuracy.”

Paluri also spoke on Facebook’s latest progress in unsupervised NMT: “Previously, we needed to build a model for each language. We found a way around this. We trained a model where multiple languages live in the shared representation space; and if two sentences mean the same thing, they are mapped closer to each other in this learned representation space. This gives us a powerful tool to think about language problems in a language agnostic way.”

Paluri said Facebook trains this shared representation model on 93 languages spanning 30 language families and 22 different scripts. He added that this unsupervised NMT model allows Facebook to better tackle potentially harmful content in low-resource languages where it does not have enough training data.
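Here is a toy illustration of what a shared representation space buys you, assuming some multilingual encoder has already mapped sentences into common vectors. The vectors below are hand-picked for the example rather than produced by a real model; the point is only that same-meaning sentences in different languages score as close neighbors.

```python
# Sketch of a shared representation space: sentences from different languages
# map to vectors in one space, and same-meaning sentences land close together.
# The encoder below returns hand-picked toy vectors purely for illustration;
# in practice a trained multilingual model produces them.
import math

TOY_EMBEDDINGS = {
    "the cat is black": [0.9, 0.1, 0.2],       # English
    "el gato es negro": [0.88, 0.12, 0.22],    # Spanish, same meaning -> nearby vector
    "the stock market fell": [0.1, 0.9, 0.3],  # unrelated meaning -> distant vector
}


def encode(sentence):
    """Placeholder for a multilingual sentence encoder."""
    return TOY_EMBEDDINGS[sentence]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(cosine_similarity(encode("the cat is black"), encode("el gato es negro")))
print(cosine_similarity(encode("the cat is black"), encode("the stock market fell")))
# The first score is much higher: same meaning, different languages, shared space.
```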

Wit.AI and FBT

There were also a couple of presentations relating to language technology and internationalization at F8 2019, namely Wit.AI and FBT, Facebook’s internationalization markup language and framework.

Wit.AI is essentially a free NLP service where users can develop multilingual chatbots and integrate them into their Facebook business pages. Wit.AI is built into Facebook Messenger for easy deployment and is available in 23 languages. The technology itself does not translate languages — when a customer interacts with a Wit.AI chatbot in a language other than its default, it can automatically reroute the interaction to the Wit.AI chatbot in the right language, if one is available. Otherwise, it will just attempt to answer in its default language.
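The rerouting behavior can be pictured with a short sketch. This is not the Wit.AI API; the language detector, bot names, and routing table below are invented to illustrate the fallback logic described above.

```python
# Sketch of the rerouting behaviour described above: detect the customer's
# language, hand off to a bot configured for that language if one exists,
# otherwise fall back to the default-language bot.

BOTS_BY_LANGUAGE = {
    "en": "support-bot-en",   # default bot
    "es": "support-bot-es",
}
DEFAULT_LANGUAGE = "en"


def detect_language(message: str) -> str:
    """Placeholder language detector; a real system would use an NLP model."""
    return "es" if "hola" in message.lower() else "en"


def route_message(message: str) -> str:
    language = detect_language(message)
    bot = BOTS_BY_LANGUAGE.get(language, BOTS_BY_LANGUAGE[DEFAULT_LANGUAGE])
    return f"{bot} handles: {message}"


print(route_message("Hola, necesito ayuda"))  # routed to support-bot-es
print(route_message("Bonjour"))               # no French bot -> default English bot
```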

As for FBT, Facebook Software Engineer John Watson explained that Facebook’s internationalization markup language and framework helps developers on Facebook internationalize their work into over 100 languages and dialects.

“Our translators produce over two million translated words every week, an amount equivalent to double the Harry Potter book series,” Watson said, adding that over 57% of Facebook users are non-English speakers. FBT helps developers by “lowering the burden it takes to coordinate translations in their apps” via a self-documenting, codebase-searchable inline markup language.

The FBT framework provides everything outside the translations and their storage, according to Watson, including text extraction and translation dictionary generators. FBT also incorporates built-in constructs for handling common multiplexing pitfalls, such as pluralization and gender.
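FBT itself is JavaScript markup and tooling, so the snippet below is only a language-neutral illustration, written in Python with an invented translation dictionary, of the pluralization and gender lookups such constructs take off developers' hands.

```python
# Illustration of why built-in plural and gender constructs matter: the right
# translated string depends on a count or a gender, not just a key.
# The dictionary entries are invented; this is not the FBT API.

TRANSLATIONS = {
    ("photo_count", "one"): "{count} Foto",      # invented German singular entry
    ("photo_count", "other"): "{count} Fotos",   # invented German plural entry
    ("tagged_you", "female"): "Sie hat dich markiert",
    ("tagged_you", "male"): "Er hat dich markiert",
}


def plural_category(count: int) -> str:
    """Toy plural rule; real frameworks use per-language CLDR plural rules."""
    return "one" if count == 1 else "other"


def translate_plural(key: str, count: int) -> str:
    return TRANSLATIONS[(key, plural_category(count))].format(count=count)


def translate_gendered(key: str, gender: str) -> str:
    return TRANSLATIONS[(key, gender)]


print(translate_plural("photo_count", 1))    # 1 Foto
print(translate_plural("photo_count", 5))    # 5 Fotos
print(translate_gendered("tagged_you", "female"))
```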

Image Source: Facebook Developer Conference 2019

TAGS: Appen, F8 2019, Facebook, machine translation, Mark Zuckerberg, Mike Schroepfer, MT, natural language processing, neural machine translation, NLP, NMT, self-supervised machine learning

By Gino Diño

Content strategy expert and Online Editor for Slator; father, husband, gamer, writer―not necessarily in that order.
