The Masakhane Project Puts Africa on the Machine Translation Map

January 28, 2020

Machine Translation

by Seyma Albarino

Machine translation (MT) has been shown to perform better, and to improve more quickly, for languages with abundant reference data. One area where such data has historically been lacking is Africa, whose 2,000-plus languages are underrepresented in natural language processing (NLP), according to Masakhane project co-founders and chief investigators Laura Martinus and Jade Abbott.

The two South Africans have described a self-defeating cycle in which speakers believe that their languages will not be accepted as primary modes of communication. This, in turn, leads to a lack of funding for translation projects and a dearth of language resources; the resources that do exist are often siloed in country-specific institutions.

Inspired by the Deep Learning Indaba theme for 2018, Martinus and Abbott started the Masakhane project (whose name means “we build together” in isiZulu) to connect NLP professionals in different countries, with the ultimate goal of translating the Internet “and its content into our languages, and vice versa.”

Now, more than 60 participants in 15 countries are involved in a continent-wide effort to build MT models for African languages. (The Masakhane project also collaborates with RAIL Lab at the University of the Witwatersrand and with Translators Without Borders.)

The plan: Gather language data and develop MT models, which will then be analyzed and fine-tuned.

Martinus and Abbott have already trained models to translate English into five of South Africa’s 11 official languages (Afrikaans, isiZulu, Northern Sotho, Setswana, Xitsonga) using Convolutional Sequence-to-Sequence (ConvS2S) and Transformer architectures. They presented their findings at the 2019 Annual Meeting of the Association for Computational Linguistics (ACL). 

Since being profiled by VentureBeat in November 2019, the group has continued its work across a range of languages and has made a point of releasing its results publicly to combat the “low discoverability” of relevant resources, a major challenge for many African languages.

Chief Investigator Kathleen Siminyu told Slator that the project now has 16 languages with benchmarks, which can be seen on the Masakhane project’s GitHub page.

“We are currently getting a lot of submissions, so this number is increasing often,” Martinus told Slator. “There are a few people I know who want to submit benchmarks soon, but have yet to finish up.” 

Abbott also shared a lighter milestone on Twitter. On January 22, 2020, she tweeted that contributor Julia Kreutzer, a PhD student in Germany, had “used JoeyNMT to train an English-to-Afrikaans model and deploy it as a slack bot on our @MasakhaneMt slack account (Afrikaans chosen because as a German speaker, she could sorta figure out that it was sorta working).”

Kreutzer has described JoeyNMT (also available on GitHub) as a “minimalist neural machine translation toolkit […] specifically designed for novices.”

The Masakhane project plans to present at the AfricaNLP workshop set for April 2020 in Ethiopia. “At the moment, it looks like we will submit six papers, maybe more,” Siminyu said.

Martinus added that many Masakhane participants are also currently writing papers for the first workshop on Resources for African Indigenous Languages (RAIL) in May 2020, to be hosted by the South African Centre for Digital Language Resources (SADiLaR).

Seyma Albarino

Staff Writer at Slator. Linguist, music blogger and reader of all things dystopian. Based in Chicago after adventures on three continents.

© 2021 Slator. All rights reserved.
