On September 27, 2023, the Geneva, Switzerland-based World Intellectual Property Organization (WIPO) announced in a detailed report the development of an in-house solution for creating fully automated conference meeting transcripts and machine translations (MT) into various languages.
WIPO, a specialized agency of the United Nations, hosts numerous meetings each year that require simultaneous interpretation into the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish. According to WIPO, producing verbatim reports and translations for these meetings used to be a laborious and time-consuming process, taking anywhere from one to six months. “These were very high quality reports but were costly and time-intensive to produce,” said the authors of the report.
WIPO’s new solution, which combines its speech-to-text (S2T) system and MT system, has now streamlined this process. Within a few hours of a meeting’s conclusion, machine-generated transcripts and translations are readily available. The authors highlighted, “with the help of our solution, […] we provide machine-generated transcripts and their corresponding machine translations in a couple of hours after the conclusion of a meeting.”
WIPO was an early adopter of custom neural machine translation, which the authors described as a “major driver” of its exploration of S2T technology tailored to WIPO’s specific needs.
The authors also explained that cascading S2T with MT was chosen because it outperformed the end-to-end speech-to-translated-text approach: WIPO already had access to highly performant MT models customized for the meetings domain, trained on its own meeting data and related documents, whereas it lacked sufficient training data to build an end-to-end speech translation model.
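The cascaded architecture described above can be sketched in a few lines: audio is transcribed once, and the resulting transcript is fanned out to MT models for each target language. The `transcribe` and `translate` functions below are hypothetical stand-ins for illustration, not WIPO’s actual components.

```python
# Sketch of a cascaded speech-translation pipeline (S2T, then MT).
# Both model calls are placeholder stand-ins, not real systems.

def transcribe(audio_segment: str) -> str:
    # Stand-in for a speech-to-text model call.
    return f"transcript of [{audio_segment}]"

def translate(text: str, target_lang: str) -> str:
    # Stand-in for a domain-customized MT model call.
    return f"{target_lang}: {text}"

def cascade(audio_segment: str, target_langs: list[str]) -> dict[str, str]:
    """Run S2T once, then translate the transcript into each target language."""
    transcript = transcribe(audio_segment)
    return {lang: translate(transcript, lang) for lang in target_langs}

print(cascade("meeting_audio_001", ["fr", "es"]))
```

One practical advantage of this design, reflected in the sketch, is that the expensive S2T step runs once per meeting, while translation into the six official languages reuses the same transcript.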
Data scarcity presented a challenge not only for the end-to-end speech translation approach but also for speech-to-text development. To address this, the team collaborated with other international organizations to leverage historical meeting data, contracted external providers to transcribe WIPO in-domain audio, and acquired out-of-domain proprietary corpora. This ensured that WIPO’s S2T and MT components were well-tailored to the language used in international organization meetings.
The authors evaluated the system using automatic metrics such as Word Error Rate (WER) and BLEU for S2T and MT, as well as business-oriented metrics like fitness for purpose, turnaround time, user experience, and cost savings.
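Of the automatic metrics mentioned, WER is the standard one for S2T: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the system output, divided by the number of words in the reference. A minimal sketch of the computation, not WIPO’s evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("is" -> "was") in a four-word reference: WER = 0.25
print(wer("the meeting is adjourned", "the meeting was adjourned"))
```

BLEU plays the analogous role on the MT side, comparing system translations against reference translations via n-gram overlap.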
Despite occasional errors in the produced texts, WIPO reports that users have overwhelmingly embraced the system due to its rapid availability, convenience, and multilingual support.
User feedback has indicated that the benefits, including reduced turnaround time and cost savings, outweigh the drawbacks. The system’s adoption also aligns with WIPO’s policy of increased digitization and has improved working methods.
The system’s deployment aligns with WIPO’s strong data security and privacy policies, as it is installed on-premises to handle confidential meetings. The authors emphasized, “our solution, based on open-source tools, is installed on-premises, allowing us to meet our strong data security and privacy policies, and is even fit for our confidential meetings.”
After a year-long pilot phase for essential meetings, the system has been adopted for WIPO’s General Assemblies and by many other international organizations, including the United Nations Office at Geneva, the International Labour Organization, the World Trade Organization, and the Court of Justice of the European Union, replacing manually prepared verbatim reports.
WIPO has also experimented with OpenAI’s Whisper models, focusing initially on S2T and planning to explore the translation feature in the future. Customization of pre-trained models using in-domain data has been part of its strategy to improve performance, particularly in recognizing domain-specific terminology.
Looking ahead, WIPO aims to continue improving transcript quality, expanding language coverage, and exploring alternative pathways for generating transcripts in additional languages.
Authors: Akshat Dewan, Michal Ziemski, Henri Meylan, Lorenzo Concina, Bruno Pouliquen