Andrew recalls his path to joining CLEAR Global, with leadership roles at content governance platform Acrolinx and TWB, and why he stepped in to drive the tech piece of the puzzle with CLEAR Tech. He also defines how CLEAR Tech works with TWB and CLEAR Insights, the research and data branch of the organization.
The CEO talks about the use of multilingual chatbots for two-way interactions in marginalized languages, and their plan to train chatbots in code-switching. He also shares how they collaborate with both Big Tech and local talent to build tech for low-resource languages in South Asia, South and Central America, Sub-Saharan Africa, and elsewhere.
The Pod rounds off with Andrew outlining the organization’s current initiatives and what the language industry can do to support the group’s humanitarian efforts, from donating to volunteering language expertise.
First up, Florian and Esther discuss the past three weeks of language industry news, with a recap of the best and worst-performing listed LSPs of 2021, which placed ZOO Digital in first place.
In M&A and Funding news, Florian covers Unbabel’s surprising acquisition of Lingo24, which brings the AI agency deeper into conventional language services. Esther talks about machine translation startup Language I/O’s latest USD 6.5m funding round and its plans to expand into Europe and the UK.
Starting 2022 off, New Zealand-based Straker Translations expanded into Europe with the acquisition of Brussels-based LSP IDEST in a cross-continental deal. This week, GLOBO strengthened its healthcare offering as it secured growth investment from VSS Capital Partners.
Florian: You have a very long background in the language technology space, tell us a bit more about your professional background and your journey in this space?
Andrew: Although I have spent the last 25 years or so doing language technology, I am really a language person. My first degree was in languages, then in translation, and then in linguistics, and I only later got into language technology and AI, particularly natural language processing. The reason I mention that is that I come from the angle of why this stuff is useful and what it is all about rather than being a technologist. I come at it from a desire to communicate and a passion for crossing language barriers and how we can communicate more effectively. Technology is just another fascinating piece of the puzzle as to how to do that well. Having studied lots of things, I ended up doing a PhD in natural language processing. I then moved to Germany in the late nineties to work at the German Research Center for AI and led a couple of projects there. I ran the Transfer Center and out of that came a spinoff that we ran in Berlin for 20 years or so. Alongside that, I started working with Translators without Borders. I was a founding board member of the US entity. Lori Thicke founded the French entity, and then as we expanded, we founded a US entity. I took over as chair when Lori stepped down and became more and more involved in that. As I stepped away from my company at the end of 2019, I suddenly got very involved with TWB. We made a transition to CLEAR Global, and I have since stepped in to help drive the technology piece of the puzzle that we are now doing at CLEAR Global.
Esther: Many of our listeners will be very familiar with Translators without Borders. You were there for more than a decade, tell us a little bit about the pre-CLEAR Global era and some of the key milestones and challenges along the way with TWB.
Andrew: There have probably been three phases to the history of TWB and now CLEAR Global. The whole thing started when Lori Thicke asked some of her translators if they would volunteer their time to work with humanitarian organizations in the Paris area, so it was very much a case of: if you would work for free, could we do translations for these local NGOs? It grew into a large community of volunteers who were offering their spare time as translators, between jobs or on evenings and weekends, to help with this content that needed to be translated. It was very much a volunteer effort and as it grew, it became harder and harder for us to meet the needs of our partners, the big UN agencies and huge international organizations, who are often trying to respond in crisis situations where they need guaranteed turnaround times and well-run projects. So there was a need to put a layer in between: professional project management, technical resources, et cetera. If you have worked in an LSP or similar, you will know that these things are more complicated than just a translator translating stuff, and so there are all those layers at scale that we had to put on top of that. There was a shift from pure volunteer to hiring professional staff and hiring project managers who could help to marshal the resources and work with the community to meet the needs of our major partners. We work with all the big UN agencies, all the big international NGOs, as well as more local organizations worldwide.
That was one big shift that we made and moved into phase two, which is where we were still largely translating but it was translating as a much more professional organization, and then gradually over time, we evolved into adding way more things to our portfolio of services we were offering companies. With our partner organizations that we were working with, language services meant more than just translation. It meant software localization, but it also meant advising them on communication strategies. It meant designing posters and doing radio broadcasts and subtitling, a whole range of other activities that went beyond classic translation, localization, and technology came into the mix. We were being asked, could we provide machine translation for some of these language pairs that no one else was doing for low resource languages? Could we provide voice technology speech recognition? The whole technology thing started and at the same time people were asking us, we are going into Cote d’Ivoire, which languages do we need? Suddenly we needed to research. We needed to have a research profile where we could find out how we can quickly discover what the language needs are? How do we reach people if we want to communicate to people in languages they will understand, so out of that came a need for research, a need for technology, and the traditional language services piece that became also often very embedded in programs.
We had offices in countries where we are providing language services, as well as this global remote community so it became a bigger thing out of that, way more than translators. The translation remains a huge part of what we do, but we were growing outside of it and that was the drive behind needing to give space for those other pieces, so that is the first part of the name change. The second part of the name change is about the without borders piece, the without borders concept captures the idea very well if you are flying doctors from Europe or the global north into Africa for some crisis or the Philippines for a hurricane or a typhoon or whatever, but our whole approach is deeply local first. It is about understanding the local needs and helping local people meet those local needs. It is not about us coming in and saving the people in these places who do not know how to help themselves, they do. All we are doing is enabling some of that or helping some of that or bringing together conversations, making them happen so the without borders was a bit uncomfortable to us. In the sense, it does not capture our approach and our ethos, which is very much local first so that was what was behind the desire to shift away from that slowly. Translators without Borders remains the name for our community, which is a huge global community, which works across borders and they are largely translators so we felt that was a good name for that community, but the other pieces now have a larger mission and room to grow into it.
Esther: Can you give us a sense of the size and the breadth of the organization? In terms of the number of offices, volunteers, the mission, et cetera.
Andrew: We were ready for COVID before it hit so we have been a virtual organization since the beginning. We have no big brick-and-mortar offices anywhere. Now the community has grown itself. People have been flocking to us to volunteer their time and effort, which is hugely appreciated. We have grown the community from when I took over as chair. We were about 3000 people when we hired Aimee, our Executive Director and she and the team have grown it now to over 80,000 people and they are in 148/149 countries. We have hundreds of language pairs, over 200 languages covered and it has become a huge global thing that we are now starting to get our heads around how we can work together with this community. Up until now, it has been largely a resource that has offered their time for language services but there are many other things we would like the community to be doing too.
Florian: How big is the admin group of the organization? Can you speak a little bit about financing and funding and where you have been most successful?
Andrew: We have a team of people who nurture and work with the community. It is a very collaborative effort and largely the funding for our work comes from two or three different sources. The first major part of it is partnerships we have with the big international organizations, so with the UN agencies. We have global partnership annual agreements where we agree to offer them language services of various kinds. We also have similar agreements with big international NGOs, Save the Children, and all those kinds of organizations. That is the partnership model for language services, very similar to a client-vendor situation with an LSP. Although unlike an LSP we are very focused on the mission part of this, so we do not do their press releases, we do not do their websites. We do not do any fundraising activity for them. We are working on the content that they need that helps drive the mission. It is mission-oriented content designed to reach people who are not normally getting access to information, so translation for us has to be for a reason. If you just want to translate your website from English into French, we are not going to provide those services for you. There are plenty of commercial vendors that will do that. We are focused on delivering content where we can provide a humanitarian development aspect where it is not currently being done because it is not commercially viable, so we have those kinds of packages. The second kind of work we do is largely program funded where big international donors, institutional donors, big government agencies fund international developmental crisis response. They provide funding for us in many contexts to support large-scale crisis response, such as in Northeast Nigeria, for example, in DRC around the Ebola crisis, in Bangladesh around the Rohingya situation. 
In many of those situations, there will be a huge international response and we will be there to support that response with their translation, localization, and other communication needs around that. The programs will be funded as a shared service across those responses.
Esther: When you think about CLEAR Global, you have got CLEAR Tech and CLEAR Insights. Walk us through how you collaborate across the organization, and can you give us an example of how teams work together across those groups?
Andrew: I can give you an idea of how it should work in theory. In practice, it never quite works like this, but the broad idea is that we are an evidence-based organization, so we do not want to do anything unless we understand why we are doing it. Is there a genuine justification for doing it, and are we doing the right things to have the most impact? Our mission is to give people access to information and have their voices heard, so it is about these two-way conversations that we are trying to set up. The first thing is we need to do some research to understand what languages are spoken, what channels are available, and what the digital access is. Do people have access digitally to information or is everything on the radio, or how do they get access to the information? What are the levels of literacy and how do we reach the most marginalized people who get left behind in many of these other programs? The first thing is to understand what we are doing. Often people come and say, we want a chatbot, and we will go, are you sure you want a chatbot? They want the new shiny toy, but there is no data to suggest that it will reach more people or have more impact, so we want to do the research to understand whether we are going to have an impact by doing this thing. That will often be collaborative and that would give us confidence that what we are going to do is the right way of reaching these people and engaging them in these conversations.
The second thing is to design a program that might include a technology piece. The best way to reach people might be to make some posters, print them out and stick them on the wall. It is unlikely that such a simple approach would work. Typically, you want to have multiple channels, and where we do use technology, it is usually folded into other forms of community engagement or accountability. Accountability is about giving people access, letting them tell you what they are thinking and how they are engaging with the work that you are doing. Community engagement is about getting them on board, so multiple strategies are usually needed. You want to have multiple channels, but you need to know what they are. The research will tell us what they are, and then we will design those in a combined package. That would then be the most effective way to reach people. Then we have the program work that needs to happen, and that will normally be a collaboration. Most of our projects involve partnerships where we work with big international organizations or preferably local organizations that understand the context well and can understand how all the bits and pieces need to work. The strategy is to design programs, which will often be supported by technology, which gives you that extra scale and that extra reach. Then come partnerships with whoever we need to work with in order to get stuff done on the ground because, as I said, we are largely a virtual organization. We do have staff in countries but typically a big part of our work is provided remotely. So those are the three key bits to it, and underlying that, there is also the community that supports all of these activities as we go along.
Florian: Where is the current focus and how do you pick those projects? Do you pick them, do they pick you? Do you have a list of 100 things that you could work on and then you could have prioritized?
Andrew: We have recently added a few areas. Traditionally our focus has been in South Asia, especially Bangladesh and Sub-Saharan Africa, but over the last year or so we have been running projects in South and Central America, we recently started a project in India. We have partners who are doing work in those countries where we have supported their work, so our reach has been in over 80 countries. We have real brick-and-mortar offices in three countries but we have reached into many, many more. I would say Central America, especially South America also, but Central America is going to continue to grow. We are doing a lot of stuff around the Venezuelan refugee, migrant situation. South Asia is a huge area, we are getting engaged there and we are expanding in Sub-Saharan Africa; Kenya, Nigeria, DRC, Democratic Republic of Congo. Working in Uganda, Rwanda, and other countries as well, so it is an expanding situation. Just starting a project for Amharic, for example, which will involve not people in Ethiopia, but will involve us working with Ethiopian staff.
Florian: When I think of Venezuela, I think Venezuelans crossing the border into Colombia, but that is Spanish-Spanish. Where is the language component?
Andrew: There are regional languages and there are non-Spanish speakers, but you are right. It is not a translation problem that we are facing, but it is an access to information problem. That is one aspect of it. First of all, what we have done is open up a new channel of communication through conversational AI, through a chatbot that we made available in Peru, Ecuador, and Mexico. It is about providing people access to information through a new channel, so it is not translating anything. The information is all there, but this channel is making it accessible to more people. The second thing is the streams of people who are flowing from the South towards the US. They are not just Venezuelan migrants or refugees. We were able to discover that there were lots of French speakers in this flow of humanity. It turns out that there is an established flow of people from West Africa who are joining these channels and trying to get to the US, so there were a lot of questions coming in French around information that they were trying to get ahold of. On top of that came Haitian Creole, because the continued crisis in that country has led those people to join as well. These things are never as simple as they first look.
Esther: How do chatbots enter the picture and where do you use them?
Andrew: Chatbots have become a thing. They were hugely hyped because they are a fascinating new channel for communication with people, but they have suffered the fate of every other technology, which is to be massively over-hyped, and then people get disappointed with them because they make assumptions about them that the technology cannot live up to. Chatbots are not a very well-defined thing. Chatbots can be incredibly dumb. They can be very transaction-oriented. They can help you answer your support question. But if they are used properly, then they can be used for two-way interaction, for conversation. Conversation is a big word, but what we mean by that is that we listen as well as speak, so we are making the effort to elicit information and hear what the person has to say as well as giving information. Obviously, compared to a poster, this is automatically a two-way conversation and we can build interesting experiences. The second thing is they can be multilingual. Everyone in Sub-Saharan Africa is multilingual, so you cannot just put something out in one language and expect it to work for everybody. What the tech community has just discovered is this thing called code-switching, which has been around forever. It is this idea of people switching languages as they speak, and this is hard for traditional tech to do. What we have been working on is building multilingual chatbots that can be two-way and can be comfortable with switching languages in the middle of a conversation, et cetera.
The last part that is key in our approach has been that we take the listening piece seriously, so we look at the conversations. We work out what the conversations are about. For example, our first chatbot went out in the early days of COVID in Sub-Saharan Africa, in DRC. We took the WHO’s information and put it into a chatbot, basically as an FAQ, so you could ask it things about what the WHO was saying about COVID. What turned out was that young mothers, especially, were interested in: can I pass it to my child, or if I am pregnant, will the child get it? Or what is the risk to my baby? WHO had said nothing about that yet, but because we were listening to what the conversations were about, we were able to give them feedback that this was a huge content gap, and then we were able to plug that content gap and move on. That listening piece is not just a slogan, it is something that makes us better in the way we engage with people. We are learning how this works. We have now done a series of chatbot programs in South and Central America and in India. We are doing one in Kenya and Nigeria. Every one is different, so we are still learning what the different ways of using this tool are depending on the context.
Florian: What is the interface, is it a smartphone, and if it is a smartphone or a semi smartphone, does it have to be connected to the internet or can you from time to time reconnect it?
Andrew: There are a couple of different ways. What we have done so far, largely with smartphones, has been text messaging via WhatsApp or Facebook Messenger. We can also do it via SMS gateways, going into lower levels of connectivity where people have that. There is also something that does not run on a phone: a hardware solution which we call Tiles, which is a Raspberry Pi with a little screen on it. You can put conversational AI onto a Raspberry Pi, which is a little credit-card-sized computer, and then you can put that out in health centers or hardware stores, or places of worship or wherever people gather who have no connectivity or perhaps no literacy or low levels of literacy, and give them the opportunity to literally talk to it and have a conversation. The conversational AI as an engine can be used in lots of different modes.
Esther: Where are you spending time and money on R&D when it comes to AI and machine learning-based language technology?
Andrew: We work with existing technology wherever we can, so if language technology exists we are not going to build it again. In many cases, especially for the most marginalized people, they speak languages that are not well supported by commercial software and so we spend time understanding how we can quickly build language technology for those languages often with very little data available. Often we have to create the data, which is another context where we want to mobilize our community to help us create data more quickly for these languages. We built core technology for voice recognition, underlying machinery for chatbots, as well as machine translation, which we continue to work on. At the same time, we are also looking at not just building the engines, but building applications, so opening up channels of communication in those languages. We collaborate with amazing local networks now, especially in Sub-Saharan Africa. There is a network called Masakhane which develops natural language processing technology. We are collaborating with them on developing expertise, working with local experts, often researchers or young students who are doing their Masters or their PhDs in these areas, and helping them also understand where their technology can be used to have a social impact. It is a very collaborative environment and the research is around those key three areas in low resource languages. We also work with Big Tech, Google, Microsoft, Amazon, Facebook, and others on open-source collaborative efforts to drive more availability of technology for low-resource languages. There are many players in this.
Florian: Can you tell us a bit more about Africa specifically? What is the language technology environment there? Also from a talent perspective, is it hard to find these people or convince them to work with you? Are you competing with Big Tech? How does that work?
Andrew: There are huge networks. When we started the whole technology piece, we were the only people doing this seriously, nobody else. We helped Google and Microsoft build their first Swahili machine translation capabilities back in the day. Now, we are a tiny part of a much, much larger network of experts, as well as a lot of people who are getting into this. Any young student who gets into machine learning and AI wants to do either natural language processing or vision. There has been an explosion of expertise and talent, and so it has become less a matter of needing to do this on our own and more a matter of working out how to work with this expertise to help it have a social impact, so that it does not all just remain as research papers.
Esther: Can you tell us a little bit more about how that might work in practice? Are they feeding information or data into you? Do you channel anything back once a project is closed? Where is the benefit for both parties?
Andrew: Clearly, it has to be a win-win. We only work with them on projects which are aligned with our mission and where we are seeing impact for us. One major initiative was a project we ran together with those big organizations I just mentioned around COVID information. Most of the existing machine translation engines did not work very well with COVID information. They did not know what social distancing was or all of the other terminology and jargon that we suddenly invented when confronted with COVID. We worked in a collaborative effort with all of those organizations in a thing called TICO-19, where we built language data to train or improve all of their engines with respect to COVID information, so that all of their engines were getting better. All of the content was open-sourced, released under a Creative Commons license and made widely available, so it was not just us working for someone and giving them content that belongs to them and was locked away. It was an effort that was for the common good and helps drive things forward. Lots of people still use Google Translate and Microsoft's translation software in order to get information, so it is not something we want to ignore just because they are commercial companies that make lots of profit. That is something that is not interesting to us. We are interested in achieving our mission.
Florian: Have you noticed an improvement in all those low-resource language MT engines/applications? We have noticed that there has been a huge interest from Big Tech around low resource and you have these massive multilingual models, which are supposedly translating anything to anything. Is this PR, is it research PR or do you feel there is an actual breakthrough happening here?
Andrew: I think it is over-hyped. These are serious research efforts and the results are impressive and they are directionally interesting. What it does not mean is that we have machine translation capability now for 100 languages like we have for English, French, Spanish, which are now almost at human levels. The translation quality deteriorates quite quickly once you get past those first languages. Often people will think that there is machine translation capability in all of these languages. It is not usable in the way that it is for the big languages, but I am old enough to remember what machine translation was like for English-French 20 years ago, so it is a journey. It is not binary, suddenly it works or it does not work. It is a journey that we are on and these things live by data. They live by getting used and getting people engaging with them and so it is a huge milestone to say, at least we have the capability in those languages and it will start to get used. The key thing is to make sure it gets used and does not stay in the research community because that way it will grow, it will get better and ultimately it will get there. It will get to the place where English, French, Spanish is now, but it is not happening quite as fast as some people would like you to believe.
Esther: How can our listeners get involved with your efforts? How can they support? What should they do? How can they be of use?
Andrew: Obviously, we need money to run the operation, so donations are always very welcome. For anyone with expertise or talent in languages, we have a huge range of things that we are engaged in. You can sign up as a volunteer on our website and join the community, and we will be in touch to discuss how you can get involved. There is a lot for us to do. We have grown incredibly fast over the years, mainly because there is an almost unlimited demand for what we do. This idea of giving people access to information and reaching the most marginalized people has enormous resonance with anyone with a background in language who understands how much of a barrier it can be to be stuck in a situation where you do not understand what is going on around you. Imagine that multiplied by a thousand and imagine the world going on, the internet happening, and you are not able to be part of it. That is a huge motivation for people to get involved. The way in which they get involved will change over time, but join our community and become part of it.
Florian: With so much going on, what are the top two or three exciting initiatives for CLEAR Tech in 2022?
Andrew: We have some new channels. With the Tiles device, literally last year for the first time, we were able to get voice recognition and conversational AI running on a Raspberry Pi. This is a new development and we do not know where it will go. It is now getting down to the size where for $15 you can buy a little computer on a chip half the size of a credit card and put tech on it, so we are interested in where that is going. Where can we put these devices? What kind of reach can we have with those kinds of technologies? Chatbots have only just started. We have scratched the surface on that, so those kinds of interactions are interesting. A big area for us is how we mobilize our community for some of these efforts as well. How can we mobilize them to help create language technology for their own languages? With our community of Hausa speakers in the north of Nigeria, how can we work with them to help build more Hausa technology, validate and label the data that we are seeing, and help us build better language tech? On the research side as well, how can we gain more insights from our community that will help us do the right kinds of things? We used to have this team called Special Projects who were working on little tech things to try out, proof-of-concept type things, and this is going to be the year where that moves into the mainstream, and that is going to be a lot of fun.