The co-founder of Cohere, Nick Frosst, joins SlatorPod to talk about rapidly scaling one of the most exciting natural language processing (NLP) startups. Nick shares his journey to establishing Cohere alongside Aidan Gomez and Ivan Zhang, with Attention Is All You Need, the seminal paper Aidan co-authored, serving as an inspiration for the startup.
He reflects on his experience on the Google Brain team in Toronto, where he worked on neural network research with Geoffrey Hinton, and traces the evolution of neural networks up to today's state-of-the-art large language models.
Nick discusses the level of machine learning experience needed to deploy on Cohere's NLP platform, and explains that Cohere's goal is to eventually become the default NLP toolkit for all types of developers.
Nick gives his thoughts on generative AI models such as DALL-E and Stable Diffusion and their possible business applications. He also shares the rationale behind partnering with Google Cloud to meet Cohere's huge demand for raw computing power.
The pod rounds off with the startup's plans to add a multilingual offering, specifically multilingual embeddings.
Florian: Nick is the co-founder of Cohere AI, a startup that provides easy-to-use NLP products to developers and businesses. Cohere also raised USD 170 million across a couple of rounds this year. To start, give us the elevator pitch for Cohere so we get a better understanding of what you and the company do.
Nick: Cohere is a natural language processing company. We sell an API that allows developers to solve any language problem they might have, whether they need content generation, semantic search, entity extraction, or classification. Our API can do all of those things, powered by large neural networks and state-of-the-art language AI.
Florian: Tell us a bit more about your professional background. You were with Google prior to Cohere. What drew you into the NLP space, given that there are a lot of other avenues open when you are with a tech giant like Google?
Nick: I was at Google for a while. I worked in the Brain research office in Toronto, where I worked with Geoff Hinton on things like vision, adversarial examples, and capsule networks, so stuff that was pretty unrelated to language. One of the interesting things about AI these days is that the same strategy that works best for vision also works best for language, and is now the best in almost every domain: neural networks can be used for all of these things. I was researching neural networks, got very excited about their application to language, and so I made the switch to found Cohere with Aidan and Ivan. I joined in January 2020.
Florian: Can you tell us the origin story behind founding Cohere with Aidan Gomez and Ivan Zhang?
Nick: Aidan is one of the co-authors of a paper called Attention Is All You Need. For your audience members who are not familiar with neural networks: neural networks are a form of machine learning, and transformers are a type of neural network. It was this 2017 paper, Attention Is All You Need, that introduced this new type of neural network. A few years later, other researchers showed that this type of neural network could be scaled up into a huge neural network, and when they did that, they got great performance on language. That was very motivating and inspiring for us. Aidan in particular, after seeing this, came to the conclusion that there was an opportunity for a company to make very big neural networks that are very good at language and give companies access to them. Sometimes we think about our technology as a power plant. You are in a house. I am in a house. We are both using electricity, but neither of us has built our own generator. We are plugged into a centralized generator and we pay for our electricity usage. That is what Cohere is trying to be for natural language. We bear the upfront cost of creating these massive transformer neural networks, then we hook companies up to them and they pay for usage, so it is a win-win for everybody. Companies get access to something they could not build themselves, and we get to build a company that adds value.
Florian: That was one of the foundational papers in that space and a lot has happened since.
Nick: A lot has happened since, but that original paper is what introduced the idea.
Florian: You started in late 2019, early 2020, so what was the trajectory early on? Was it in stealth for a while and then you raised a seed round and a Series A?
Nick: We started with a seed round from Radical, a Toronto firm that specializes in AI, then we did a Series A with Index and a Series B with Tiger Global.
Florian: Let us talk about large language models because they have become such a big part of not just language, but also vision and other applications. Can you break down the concept of large language models for our listeners, for the interested lay audience rather than the ML expert?
Nick: People have been trying to get computers to understand language for a very long time. Ever since computers were invented, that has been one of the very cool, promising things we hoped they would one day do. For a long time, people tried to get computers to understand language by building in hard-coded rules about the way we thought language worked, or particularly the way we thought English worked. We would write a rule that says every sentence has a subject, a verb, and an object, and we would write parsers that would look at a sentence and try to figure out which word was what. Was this a noun or a particle? The parser would try to work out the grammatical structure of the sentence and then infer its meaning based on dictionaries that describe words and the sentence that was constructed. We tried that for a very long time and it never really worked. You could never get a computer to respond sensibly or take an action based on the sentence it was given, and the reason is that language is very messy. Even as I am talking to you now, I am starting and stopping in the middle of sentences. I am adding conjunctions. I am giving you run-on sentences forever. I do not even finish half of the sentences I start, and yet you are able to understand. That structured, rule-based approach looks good on paper, but in practice it gets very complicated.
In the past handful of years, another approach has shown real promise, and that is machine learning. Instead of trying to hard-code rules, we set up a system where we give the computer a bunch of examples of what it should do, and it learns how to do the task. That is what machine learning is. Neural networks are a particular type of machine learning, and transformers, which were introduced by that seminal paper, are a particular type of neural network. The reason they work so well is that they have a particularly good mechanism for looking at sequences, and that is very good for text. They are also very easy to scale. It is easy to make big versions of them, and it turns out that when you make a really big version, you get very good performance. So now, instead of teaching the computer by writing rules, we show it as much text as we can possibly get. We train the system by showing it some text and trying to get it to predict the text that comes next. In doing so, we end up with something that can take in the first half of a sentence and write the second half, or take in text and do entity extraction, or take in text and write summaries. It can even do translation now, because those are all text problems and they can be handled by a system that understands text.
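To make the "predict the next text" idea concrete, here is a deliberately tiny sketch in Python. It is a bigram word counter, not a transformer and not Cohere's training method: it simply learns which word tends to follow which from a handful of example sentences, then continues a prompt one predicted word at a time. The corpus and function names are made up for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Each adjacent pair (current, next) is one "predict the next word" example.
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_text(counts: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt one predicted word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break  # no known continuation for the last word
        words.append(nxt)
    return " ".join(words)


corpus = [
    "the model reads text",
    "the model writes text",
    "the model reads more text",
    "a person reads text",
]
counts = train_bigrams(corpus)
print(predict_next(counts, "model"))              # reads
print(continue_text(counts, "the", max_words=3))  # the model reads text
```

A large language model replaces the word counts with billions of trained parameters and looks at far more context than one previous word, but the training signal, guessing what comes next, is the same shape.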
Florian: You guys are offering three products on your website, Classify, Generate, and Embed. Can you talk a bit about that?
Nick: We offer three endpoints. An endpoint is a function that we run on our server: if your computer makes a request to our server, we can send back one of these three things. Generate takes in text and then writes more text. That is what it does, and it turns out you can use that for all kinds of things. You can use it to get a summary, to write blog post content, or to do entity extraction. The way you do that is through prompt engineering. If you are clever about the text you give the model, then the model will give you back something sensible. If I want a summary of a paragraph, I can give the model the paragraph and then write "TL;DR:", and the model will know it needs to write a summary there. You can do similar things for classification and so on. We also offer Classify, which is now based on embeddings simply because that performs better. You give it text and a handful of examples of how you would like things classified, and you get a classification back. Finally, we offer Embed, and embeddings involve the most math. They are used mostly by people who are doing more complicated things with this type of technology. An embedding is a vector, and a vector is a list of numbers, so you give it some text and you get a list of numbers back. It turns out you can use those lists of numbers for semantic search or clustering by measuring the distance between them in that vector space. Effectively, you put in text, you get a list of numbers, and now you can do arithmetic on those lists of numbers for various purposes.
Florian: That sounds a lot less consumer-friendly than Generate, where you give it a prompt and it will write a blog post for you.
Nick: The embedding stuff is super useful. We are pushing the boundaries, and a lot is happening with embeddings these days; we are gearing up to release some cool things. You need to be more of a technical expert to make use of embeddings, but if you know what you are doing, you can build some cool stuff with them.
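The "do arithmetic on lists of numbers" idea can be sketched without calling any API. In the hypothetical example below, the three-dimensional vectors are written by hand to stand in for real embeddings (a real model returns hundreds or thousands of dimensions), and `cosine_similarity`, `search`, and `classify` are illustrative helpers, not Cohere's SDK. Semantic search ranks documents by how close their vectors are to the query vector; the classifier, one simple embedding-based approach among several, picks the label whose example vectors are closest on average.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, documents):
    """Rank (text, vector) pairs by similarity to the query, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in documents]
    return [text for _, text in sorted(scored, reverse=True)]


def classify(query_vec, labeled_examples):
    """Return the label whose example vectors are, on average, most similar to the query."""
    best_label, best_score = None, -math.inf
    for label, vectors in labeled_examples.items():
        score = sum(cosine_similarity(query_vec, v) for v in vectors) / len(vectors)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hand-made "embeddings" standing in for real model output (hypothetical values).
documents = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("Our office opens at 9am", [0.1, 0.9, 0.1]),
    ("I forgot my login details", [0.8, 0.2, 0.1]),
]
query = [0.85, 0.15, 0.05]  # pretend embedding of "I can't sign in"
print(search(query, documents))

examples = {
    "account": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "logistics": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}
print(classify(query, examples))  # account
```

Cosine similarity is used here because it ignores vector length and compares direction only; Euclidean distance is another common choice for the "distance within that vector space".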
Florian: Let us dwell on AI content generation for a bit because there have been a bunch of products that have come out recently. Do you see this as a vast field and the more players the merrier or is it a bit of a land grab at the moment?
Nick: I think the space is big. There are a lot of people in this industry trying to provide natural language solutions. That is because it is mostly a good idea. I think most people agree transformers are cool. They are difficult to make, so it is useful to create one and give people access to it. It is a pretty big field. To me, it is very odd that we spend our whole lives learning how to communicate via language, and then when we sit in front of a computer, we do not use that skill and instead have to learn how to communicate with the computer. It should be that you sit down in front of your computer and communicate with it the way you learned to communicate with people, so there is a lot of ground to cover. There is a lot to build.
Florian: On your website, you say Cohere wants to become the default NLP toolkit for developers. Can you tell us a little more about what type of developer you have in mind and what level of sophistication these developers need to build with it?
Nick: In the long term, I would love for any developer in the world to sit down and think, I have this language problem and Cohere could be the solution. That is the goal. I want to build something super easy to use that anybody could come to and solve whatever language problem they have. I do not think we are quite there yet, but we are getting there. Our classification endpoint is very easy to use. You do not need to understand how classification works or what technology is backing it. You can come, give it a few examples, and get classifications out. You can do the same with Generate, and do things like entity extraction and summarization, but you need to understand how prompt engineering works, and that is a barrier. Embeddings are useful but definitely require more expertise. I have seen some novices with no exposure to machine learning or NLP build cool stuff with it. I have seen more people with some experience build cool stuff with it, but as time goes on, we are pushing that barrier as low as possible. The goal is to make it so that anybody could jump on and build something cool.
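Prompt engineering, the barrier Nick mentions, comes down to building the input text carefully. The sketch below shows the two patterns from the conversation as plain string templates: a paragraph followed by a "TL;DR:" cue, and a few labeled examples followed by a blank label for the model to fill in. The exact wording and the helper names are assumptions for illustration; the resulting strings would then be sent to a text-generation model, which is not shown here.

```python
from __future__ import annotations


def summary_prompt(paragraph: str) -> str:
    """Append a TL;DR cue so a text-continuation model is likely to write a summary next."""
    return f"{paragraph.strip()}\n\nTL;DR:"


def few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    """Show labeled examples, then leave the final label blank for the model to fill in."""
    blocks = [f"Text: {text}\nCategory: {label}" for text, label in examples]
    blocks.append(f"Text: {new_input}\nCategory:")
    return "\n\n".join(blocks)


print(summary_prompt("Cohere sells an API for language problems. It offers three endpoints."))
print(few_shot_prompt(
    [("I forgot my password", "account"), ("Where is my parcel?", "shipping")],
    "My login stopped working",
))
```

The model never sees an instruction like "summarize this"; it only sees text whose most natural continuation happens to be a summary or a category label.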
Florian: Within the enterprise, would there be quite large enterprises with somewhat sophisticated tech teams that have a couple of people that are super interested in this? Or is it quite broad in terms of the type of companies that you are targeting?
Nick: It is quite broad. I have seen people do very cool stuff with it at large companies, and I have seen high school students who have never worked on this stuff before build cool things with it on their own, for fun.
Florian: How does multilingual figure into your offering? How do you handle different languages? Could I put in a German prompt and get something out of it or is it restricted somewhat?
Nick: Right now we evaluate our models in English. We measure the performance of our generative model in English, and the reason we do that is that we mostly speak English, most of the data is in English, and the eval sets that are out there are in English. As our company has grown, it has become very obvious that there is a huge opportunity in multilingual. In particular, multilingual embeddings are something that we think is very promising. It would be very cool if you could embed a sentence in one language and find similar sentences in another language, or embed a whole document and see where similar things have been discussed, regardless of language. There will be some cool things coming out from us in that regard, so multilingual is very exciting.
Florian: Recently there has been a lot of noise on Twitter about DALL-E and Stable Diffusion, and more recently about Whisper from OpenAI. Are you sensing breakthroughs in terms of productization right now, or do you feel it is an incremental step?
Nick: All these things are super cool. I would not hesitate to use the word breakthrough for Stable Diffusion and DALL-E. They are cool applications, and some really smart decisions went into making them. The diffusion approach was very cool, and it obviously captured the attention of tons of people because it is very fun to get AI to generate something. I think it will be a while before that technology lands in a particularly useful place, but it has obviously landed in a fun consumer market that people are willing to spend money on and get value out of. I recently saw somebody start using Stable Diffusion as a CAPTCHA on a website, which was smart, so there are business applications for these things too. DALL-E, Stable Diffusion, and Whisper are all multimodal. Originally, the machine learning community thought about transformers in terms of text, because that is where they were amazing. Then people slowly started applying them to other things, and now people are trying out different neural network strategies on different modalities. Stable Diffusion is a different architecture, but it is very good at taking in text and creating an image, and Whisper is very good at taking in audio and creating text. One day all of these modalities will be mapped by the same model, and we will be able to give the model whatever modality we want and get out whatever modality we want. That is very exciting.
Florian: One thing that is required is massive compute, so you are partnering with Google Cloud. Is there anything you can share about partnering with Google and how it may differ or not from other cloud offerings to run your tech on?
Nick: These things require a whole bunch of compute, and the reason is that neural networks have billions of parameters. A parameter is a number, so when people say a neural network has a million parameters, what they mean is that it has a million numbers that need to be trained. State-of-the-art transformers for language now have billions of parameters or more, so that is a huge quantity of numbers to train. The way you train them is by showing the network examples of what you want it to do and then changing the numbers a little bit so that it gets better at that task. You need large computers to store all these numbers. You need large computers to do the math required to figure out how to change the numbers to make the network better at the task you are giving it. You need huge amounts of memory to store all the data you are going to show it. The end result is that you need massive supercomputers to do this in any reasonable amount of time, so we have partnered with Google. We work on Google Cloud and we use TPUs. TPUs are a specialized type of hardware: just as a GPU is a graphics processing unit, a TPU is a tensor processing unit, a chip designed specifically for this kind of work. A tensor is a generalization of a matrix to more than two dimensions, and we use tensors when training neural networks. This particular type of computer is especially good at this task, and we get it from Google.
Florian: You probably need to hire a lot of engineers now, in very specialized roles, and AI is super hot while the rest of the economy is taking a little bit of a breather. How do you find hiring and retaining this type of super-specialized talent?
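Nick's description of training, where a parameter is a number and training nudges those numbers so the model gets better at a task, can be shown at the smallest possible scale. The sketch below fits the single parameter `w` of a hypothetical model `y = w * x` by gradient descent on mean squared error, then does back-of-the-envelope arithmetic for how much memory the parameters alone occupy, assuming 4 bytes per parameter (32-bit floats). Both the toy model and the 4-byte figure are assumptions for illustration; real training also stores gradients, optimizer state, and activations, so actual memory needs are several times larger.

```python
def train_single_weight(xs, ys, lr=0.01, steps=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # the model's only parameter, starting from an arbitrary value
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)**2) with respect to w is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # change the number a little in the direction that reduces the error
    return w


def weights_memory_gb(num_params, bytes_per_param=4):
    """Memory needed just to hold the parameters, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with the "true" weight w = 2
print(round(train_single_weight(xs, ys), 3))  # 2.0
print(weights_memory_gb(1_000_000))           # a million parameters
print(weights_memory_gb(10_000_000_000))      # ten billion parameters: 40.0
```

At 10 billion parameters, the weights alone come to about 40 GB at 32-bit precision, and every training step repeats the gradient calculation across all of them, which is why specialized hardware such as TPUs matters.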
Nick: We are always looking for great people. We have built an incredible team. In the past year, we have hired some people who are phenomenal, so it has not been super challenging. Sometimes I think about why that has been the case. One of the reasons is that what we are working on is very exciting and fun. It is a cool thing to work on and so we have been very lucky. When I step back and look at our team, it is an amazing group of people and so we try to make sure it is a good place to work. We try to make sure they are working on fun stuff and in that way keep them around.
Florian: Are you guys mostly remote or do you have an office where people come in?
Nick: We have a few offices. We like offices. I find that people enjoy going to them. We are also remote-friendly. We have some people who are spread around the world. We have some people who live down the street from the office but never come in and then we have some people who live far away, but still like to come in every day. It is all over the place, but we open up offices when there is a large enough group of people in any city. That means we now have an office in Toronto, which is our headquarters, one in Palo Alto, and we have recently opened one in London.
Florian: Without giving away any secret sauce, is there anything on the product roadmap you can share for the next 12 or 18 months? Anything that you can pre-announce?
Nick: Some of the things we are working on now are very cool and very relevant to you guys and in particular in the multilingual space, so if there are people who are listening who are excited about trying to map one language to another language, stay tuned to Cohere.