Inside the Large Language Model Revolution with Nikola Nikolov

SlatorPod #160 - NLP expert Nikola Nikolov

In this week’s SlatorPod, we are joined by Nikola Nikolov, an experienced researcher, engineer, YouTuber, and consultant in natural language processing (NLP) and machine learning.

Nikola talks about the evolution of large language models (LLMs), where the core technology remains the same, but the number of parameters has grown exponentially and the capacity to fine-tune models on human data via reinforcement learning from human feedback has turbocharged the models’ capabilities.

Nikola unpacks the rapid increase in front-end use cases, with companies like Google and Microsoft already integrating LLMs into their products. At the same time, he speculates about what will happen to the hundreds of startups that are using APIs to build similar tools, such as writing assistants or summarizers.

Nikola shares the limitations of an API-only approach, which include relying on a model that is limited to the data it has seen on the internet and is not fine-tuned to a specific domain or use case.


He discusses how LLMs perform when it comes to machine translation (MT). Although GPT is trained on large amounts of multilingual data, it’s not specialized in translation, so machine translation providers will retain their edge over ChatGPT for now.

Nikola predicts two different scenarios when it comes to the future of LLMs: the first is where large corporations quickly integrate LLMs into their products, competing with startups and putting many of them out of business. The second scenario is where startups will create novel use cases and integrate multimodal technology to build something completely new and different from big companies.

Transcript

Nikola: Thanks for having me, and maybe I should also say a few words about myself. As you said, I’ve been working in NLP for seven years now. I was involved in research during my PhD and still am, and I’m currently working at a startup focused on edtech while also doing some consulting. For those of you interested in talking to me, you can check out my website at nlplab.tech. I’m very excited about natural language processing, machine learning and the power it has to impact the world, so happy to be here again.

Florian: We’re trying to unpack all of these latest advances, which have come so fast and furious with the large language models, of course now with ChatGPT and all these other models, so we needed to bring experts on to discuss this and make sense of it. Before that, I noticed you’re using Synthesia for your YouTube channel. Synthesia was on the podcast as well, and we’ve talked about the company in the past. Tell us about this, because it’s a nice mix of NLP and applied NLP in terms of the text output. Tell us a bit more about that and why you chose to use it for your YouTube channel.

Nikola: I’ve been doing YouTube, focusing specifically on natural language processing topics, for the last couple of years. I started off doing the videos myself, but I came across Synthesia a few months ago, gave it a shot, and realized that I’m actually able to make videos several times faster with Synthesia, so I can produce many more videos and they look quite professional too. So I think it’s a really great tool. What I’m doing is I actually have my own avatar as well. With a paid subscription, they now offer the option to record a two or three-minute video of yourself, and then they can create an avatar of you automatically, which you can embed in your videos, so it’s really a great help and lets me do a lot more.

Florian: I watched it, and initially I noticed it wasn’t fully live, but it took me a minute or so to understand: oh yeah, this is your avatar on Synthesia. Anyway, a great way to do a lot of YouTube output. So get us started on your views on large language models and the state of natural language processing today. Lay out the landscape for us, over the past maybe six months, but also going back a bit further.

Nikola: I think we are actually at a very, very interesting time in terms of AI. We have seen some really impressive use cases and demonstrations of AI, in particular large language models, talking specifically about NLP and language technologies. The technology itself, language models, is actually quite an old one. The first statistical language models date back to the 80s already, and we saw neural language models being released in the early 2000s. But now it’s a very interesting situation, and we have seen technologies such as ChatGPT and large language model APIs released by various players such as OpenAI, Cohere and Anthropic.

My interpretation of this is that the core technology itself is not new, but we have seen some pretty amazing advancements in the last few years which have made it possible to get to the current fluency levels of those language models, and there are a few things that come to mind in particular.

One is the scale of the models. Language models in the past were much smaller in terms of capacity, in terms of the number of parameters, as well as in the amount of training data they had seen. LLMs like GPT-3, GPT-4 and ChatGPT have hundreds of billions of parameters. For GPT-3 it’s about 175 billion, and for GPT-4 it’s not really known, but the speculation out there is about 1 trillion parameters. Those models are also trained on huge amounts of raw text, hundreds of gigabytes. This is one component that makes a huge difference to me compared to what was there in the past.

Another component is the architecture of those models. Statistical models were just based on co-occurrence statistics, basically using n-grams, sequences of words, to predict the next word. Then we got to the deep learning revolution in the last ten years, and now we are using transformers, which are the best architecture we’ve had so far for doing this sort of language modelling, so that also has made an impact. Transformer-based language models are much better at handling long-range dependencies, and they’re easier to train, in some sense, than previous architectures, so the architecture is also a component in why we are seeing this explosion of large language models.

And finally, a component that is very, very important as well, and also more recent, is our capacity to fine-tune these models on human data. The base language models up to 2020 or so, for example GPT-3, released in 2020 by OpenAI, were only trained on the raw datasets available on the internet, so stuff like the whole of Wikipedia and pretty much all the web pages they were able to get their hands on. What’s interesting since the introduction of ChatGPT is the fine-tuning on human datasets, the so-called reinforcement learning from human feedback, which they have shown better aligns the large language models with human expectations. It gives the language models this really great capacity to understand what the goal is, what request we want to achieve, and it enables them to follow the instruction and produce something that is actually useful, something related to what we want to get. Previous models from two or three years ago, even though they were large, were not really able to do that to the same degree.
So all of those things combined have led to the current state, where we have really powerful dialogue models like ChatGPT and GPT-4. They’re able to do a lot more than in the past. It’s really remarkable what they can do. Of course, they also have various limitations, and this has led to a very interesting situation at the moment. As we have discussed in the past, there’s an explosion of startups using this technology in various capacities, either building foundational technology or building applications on top of it. We’re also seeing, interestingly, that the big companies are jumping on board really quickly as well. We have seen a lot of releases of prototypes by really big companies, so that has been very interesting to watch. As someone who has been doing this for a while, this speed has been very exciting too.
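To make the contrast Nikola draws concrete, here is a minimal, hedged sketch of the older statistical approach: a bigram model that simply counts which word follows which in a toy corpus and predicts the most frequent follower. The corpus is invented for illustration; real statistical LMs were estimated over billions of words, and transformers replace these counts with learned representations that capture much longer-range dependencies.

```python
from collections import defaultdict, Counter

# Toy corpus; a real statistical language model would be estimated from billions of words.
corpus = "the model predicts the next word given the previous word".split()

# Count bigram co-occurrences: how often word `nxt` follows word `prev`.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word` in the corpus."""
    followers = bigram_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

print(predict_next("the"))  # picks whichever follower of "the" was counted most often
```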


Florian: How big of a deal do you think GPT-4 is, from a technology point of view, in the grander scheme of large language models? Is it mostly being first to market, and they were just bolder in coming out with something that a big company like Google would have been a lot more hesitant to launch? Or do you think they actually have a lasting technological edge, at least for the next, let’s say, six to 12 months? That’s I guess what I mean by lasting, because I don’t think they have anything for the next maybe five, six years. But how big of a deal is it now in terms of the technology? You mentioned reinforcement learning from human feedback; maybe, behind the curtain, they were doing a lot of this for months and months, and now everybody else is trying to catch up. But do you think they’re going to catch up relatively quickly, or do you think ChatGPT and OpenAI have a lasting advantage over the next year or two?

Nikola: That is a good question. In terms of GPT-4 specifically, we actually don’t really know a lot about it; there are not a lot of details released to the public, only a few outlined in a blog post and a report by OpenAI. So we don’t really know whether there have been fundamental advances or not, to be fair. My gut feeling is that the main advances have been in terms of scaling the existing systems. They have changed the training dataset, perhaps cleaning it up a little better. They have increased the dataset used to train the model on human feedback. They have increased the model size and the training time of the model as well; the speculation is that it has been trained for many more epochs than previous models, which of course helps the model to learn better, to better memorize and distil the patterns from the large corpus it’s using. In terms of the competitive landscape and what will happen, it is quite difficult to predict, to be honest. My gut feeling there would be that the technology behind OpenAI is fundamentally open source. We don’t know all the details, but a lot of it is open, and Google is a very big company with a lot of resources and a lot of very well-trained researchers, so I’m sure they’ll be able to put together something that can compete with GPT-4 fairly quickly. There are a lot of tricks, so OpenAI has some advantages. Mainly, they have been doing this type of work for quite a few years now, specifically focusing on large language models. There might be some advantage on their side in terms of how they train the models, in terms of internal software and tools for training and deploying them. But in terms of replicating it, I think it’s possible to get something similar, and we’re also going to see more and more open-source models coming out over the coming months that are closer to GPT-4. So yeah, hopefully that answers your question.

Florian: Who are the key players? We had Nick Frosst from Cohere on the podcast actually before the whole ChatGPT craze started, so we have Cohere. We have OpenAI. You mentioned Anthropic just before, there’s I guess Google with Bard now. It’s kind of hard to keep track, but I guess those would be called the foundation models, right, and then is there anybody you would want to add to that and then tell us what is a foundation model? Because afterwards, I want to get your thoughts on building on top of these, but first I would like to understand, okay, what’s a foundation model and who’s in that space?

Nikola: A foundation model is basically a base large language model that can then be used for various downstream applications through the prompting that people are quite used to nowadays. Basically, a foundation model is a large language model that has been trained to be a general-purpose model, and it can then be accessed by anyone through an API, or directly if you’re a big company like Google. You want to have this base pre-trained model that’s useful for whatever you want to do. In the case of Google, maybe you want to integrate it into Google Docs, maybe you want to integrate it into Google Meet to do meeting summarization, and various similar use cases. And yes, as you mentioned, there are quite a few players providing these foundational language models as APIs. It’s a growing number, quite tough to keep track of indeed, and all of them have slightly different models. Basically, they differ in terms of the training data they use and also in terms of the human feedback data. I think all of them, to a certain extent, mimic what OpenAI is doing in terms of training on human feedback data, like RLHF, reinforcement learning from human feedback, as a type of fine-tuning.
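As a concrete illustration of accessing a foundation model through an API with a prompt, here is a minimal sketch using OpenAI’s hosted chat completions endpoint. The model name, system message and truncated summarization prompt are placeholder choices, and providers such as Cohere and Anthropic expose broadly similar request/response shapes.

```python
import os
import requests

# Prompt a hosted foundation model over HTTP (OpenAI's chat completions endpoint).
# The model name and messages below are illustrative placeholders.
API_KEY = os.environ["OPENAI_API_KEY"]

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",  # general-purpose, instruction-tuned model
        "messages": [
            {"role": "system", "content": "You are a meeting assistant."},
            {"role": "user", "content": "Summarize the following meeting notes:\n..."},
        ],
        "temperature": 0.2,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```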

Florian: Where do the foundation models end and where does somebody building on top start? For example, recently I saw that Bloomberg has its own LLM now, so would that also be considered a kind of foundation model, just trained on a ton of Bloomberg data, or would that already be built on somebody else’s framework? I’m just trying to understand the pyramid: what’s at the foundation of the pyramid, and then, going up, people building on top of it? So, if you’ve seen the Bloomberg example, what would that be?

Nikola: That is also a foundation language model that has been pre-trained on a lot of financial data, and the idea there is that Bloomberg can take this model and then apply it to various use cases they might have internally, focusing on, for example, summarizing financial news articles or using it to predict the markets, or something similar. So basically the foundation model finishes where a specific use case starts, where you want to take the model and really max it out, optimize it for the specific use case you want to get high accuracy on.

Florian: Would you have to be a company the size of Bloomberg to want to do this at the foundation-model level, as opposed to saying, all right, now I use GPT-4 and build something on top, or I build something on Cohere? What enterprise size do you need, or what pool of data do you need, to want to build this on your own?

Nikola: Replicating a model like GPT is very difficult to do. It requires a lot of funding, several million at least. And this might be necessary because what people have shown is that to get the most out of these foundation models, to get them to the level where they really start to become useful, you need to get to a scale of at least 50 to 100 billion parameters. Getting to this level is very, very difficult unless you are a very well-funded startup or a very big company. For most people, therefore, the focus should be on existing pre-trained foundation models: seeing if they can fine-tune them for their specific use case, or calling APIs like GPT directly and doing things like prompt engineering. Those APIs also offer the option to fine-tune directly on the platform. And we’re going to see a lot more of those services coming up in the next couple of months and years, offering companies the option to fine-tune their own foundation language model checkpoint, which they can directly integrate into their products.
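For the “fine-tune an existing pre-trained checkpoint” route Nikola describes, here is a minimal sketch using the Hugging Face transformers Trainer. The base model, data file and hyperparameters are illustrative assumptions rather than a production recipe; in practice you would pick a much larger checkpoint and a carefully curated domain corpus.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Start from an open pre-trained checkpoint instead of training from scratch.
base_model = "EleutherAI/gpt-neo-125m"  # small open model, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Domain-specific text collected for the target use case (hypothetical file path).
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-domain-lm",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```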

Florian: Very interesting. So, you mentioned building on top of these. I’m seeing a lot of front-ends with GPT now, with OpenAI. Anybody who gets their hands on an API is building some type of front-end. So is the major explosion right now really that a lot of people are taking this and trying to get a quick win by building some type of front-end and getting a few subscriptions? What types of businesses are you seeing very early on, now that APIs from the likes of Cohere and OpenAI are available? I know there are hundreds or thousands, but do you see a trend already emerging?

Nikola: Yes, the impact of these foundation models will for sure be huge in the industry and across various use cases. It is a very interesting question to think about which companies are going to be successful in the end, let’s say a year from now. Right now we’re seeing a lot of movement across the board: big companies are integrating those foundation models, and a lot of startups are coming up which are exploring novel use cases for them. For example, writing assistance or summarization is a very big use case. I’m sure there will be a lot of applications in machine translation or cross-lingual types of services too, and we can talk about how I see that if you would like. It’s tough to predict, of course. I think now is a very interesting time and there’s a lot of opportunity.

In terms of winners, a part of me is a little bit sceptical and a part of me also sees the opportunity. I see that the big companies are going to be major winners for sure out of this situation. Companies like Google and Microsoft are already integrating these models into their products. Microsoft announced Copilot for Office, I don’t know if you saw that one, which basically gives you access to ChatGPT-like functionality directly within Office, Excel, Word and Teams. You’ll be able to generate emails, content, whatever you want, and images, directly within Microsoft. That will be a major win, so for these big companies there’s going to be a big impact.

At the same time, there are a lot of small startups popping up that are trying to use the APIs to build writing assistants and various other tools. I find it more difficult to say which of those will be successful, especially because many of them are very similar; they just call the API. In my opinion, there will be a lot of companies that will not make it, unfortunately, because they will not be able to bring an edge over products like Microsoft’s Copilot, which already has a large user base and can basically be a writing assistant. It can generate whatever copy you want, whether that’s advertisement material, material for your website, or blog posts. So it’s a question of what will happen with all of those hundreds of startups popping up in this space at the moment, and there is an opportunity there too for products to come out. However, it’s getting more difficult, and I think the companies that make it will be the ones bringing in something new that differentiates them from Microsoft and the big companies. That could come in terms of datasets they might be able to collect, in terms of novel use cases that are very niche and not really interesting to the big companies, or maybe in terms of new technology, although that’s very difficult for smaller players to bring in, as I mentioned, because developing cutting-edge foundational language models or NLP is becoming increasingly difficult unless you are very well funded as a startup. That’s how I see it. And one segment of companies that will for sure be successful is the companies bringing in foundational technology, both in terms of hardware and software.
So companies like Nvidia are for sure going to profit from the whole generative AI and large language model situation, because everyone needs GPUs to run those foundational language models, as will companies like OpenAI or Cohere. There will be a lot of interest in general from small, medium-sized and even large companies in using those APIs directly, especially companies that don’t have the capacity or the motivation to develop those technologies themselves. They might want to just call an API and bring some AI-like features to their products, and they will have to do this if they want to remain competitive, because all of those startups are going to come up with the same product using the API. And those medium-sized companies, which could also be a translation company, for example, can just implement those same features easily by calling the API themselves, and they already have the user base. They’ll potentially be able to put smaller companies out of business in the future.


Florian: You put out a post on the Global NLP Lab also about the limitations of the API-only approach, right? I mean, you touched on some of the limitations already, but maybe let’s just go through this quickly again. Again, from a startup point of view, you’re just relying on the API and you’re building a front-end. What are some of the limitations that you listed in that piece?

Nikola: One major limitation of just calling an API is that you don’t have a model that is fine-tuned to your specific domain and use case of interest. Let’s say you’re focusing on an application targeting biology: the API approach is basically limited to what the model has learned from the tens or hundreds of gigabytes of biology data it has seen on Wikipedia and similar sources on the internet. However, the model has seen not only biology documents but also financial data, news articles, and various topics unrelated to what you’re trying to achieve. A model fine-tuned on biology will of course outperform a general model trained on the whole of the internet. So one limitation is that with APIs you cannot easily get a model that is specialized to the really narrow domain and application you want to target, and that’s something to consider if you’re looking into using APIs directly for your use case. I think there’s big value in using APIs to quickly put together application prototypes, and at the moment there’s still a buffer period if you want to make a startup in this space. There is still a possibility, I think, to put together a very powerful product nobody has thought about. But in my opinion you need to quickly collect custom datasets and you need to think about what the next step will be, because what will happen is that you’re going to have 10 or 20 more companies, both small startups and big ones, which might have data you don’t have access to, which will fine-tune the model and be able to outperform you in the long term. So that’s one thing to consider: no specialization.

Another one is the lack of differentiation you will have if you just call an API. If you want to get investment for your startup, the question will of course come up: what do you bring to the table? Do you just have an interface calling this API, and is it magically going to work in the long term? Probably not, because you don’t really have anything that keeps you ahead of the competition, the potentially hundreds of other companies in the same space. They might be doing something slightly different, but they might just decide to integrate an API themselves and target the same use case you are targeting, and if they do that, you will be out of business. So this is something to consider for anyone looking to make this technology a core business strength.

And the final thing to keep in mind is that if you’re just calling an API, you don’t really have ownership of the technology. In that sense, you don’t really understand what precisely is going on in the background, and you cannot give precise guarantees to your customers about what they can expect. You might also have privacy and data-handling concerns: you cannot really promise your users how what they provide to your app is being treated and processed by those API providers. So you have limited flexibility when you’re using APIs as a key component of your business.

Florian: Talking about privacy, literally a couple of days ago Italy banned ChatGPT, right? Privacy concerns were the key reasons. What’s your hot take on this? Is this the start of a much wider ban or limitation on these models? Or is it just Italy blazing the trail and staying alone? I don’t know.

Nikola: There’s a lot of concern and a lot of discussion going on, for example within universities or within governments, I’m sure, about the impact this technology might have, because I also heard someone on the internet mentioning that a lot of employees at big companies are throwing a lot of data into ChatGPT and using it for various use cases. I think companies are starting to become increasingly aware of this and are thinking about the impact: will the OpenAI team get access to our proprietary data? Will they be able to somehow reverse engineer or use it? I think we are moving in the right direction. OpenAI has also been strengthening its stance on this; for example, they are saying that they’ll no longer use data from ChatGPT and from the APIs to improve their core offerings. But at the same time, there are a lot of API providers, and a lot more transparency needs to be provided if those foundational models are to be used in big companies where data sensitivity is a prime concern, for example banks or other institutions, let’s say law firms. They have very, very strict requirements and concerns about this. There’s also an opportunity there: perhaps you’re going to see a lot of foundational language model companies showing up that specifically target that, and it opens up opportunities for companies offering dedicated capacity to come in-house and build your own foundational language model, fine-tuned on data you hold internally and can use yourself. I’m sure we’re going to have a lot more companies in that space as well. I have seen quite a few, actually.

Florian: We’re actually piloting a project there where we give all of our data and are trying to get an answer box that’s specifically trained on all of our content, so maybe it’s more of a temporary issue. Italy is concerned about all of the personal data that the model was A) maybe trained on, and B) that people are inputting there. But I’m sure that’s going to get worked out over the next year or two. I’m sure, who knows? I guess it will. Now, let’s talk about LLMs and machine translation, specifically machine translation, this being a podcast centered around the language industry. Very stupid question: why wouldn’t an LLM the size of ChatGPT massively outperform any other machine translation model? It’s trained on everything and every possible input. It has a trillion parameters or however many, so, so much. It’s competitive, and we’ve covered this, but it’s not massively outperforming. It doesn’t seem to obviate the need for narrower models. Why is that? That’s number one. Number two, maybe first, how does it translate at all? Is that fundamentally different from the neural machine translation models we’re currently using? Or is it the same fundamental approach?

Nikola: Actually, GPT-4 has made quite a big advancement in that space, not specifically in machine translation but in terms of multilingual use cases. One interesting result they report in the blog post is that GPT-4, across 40 languages, outperformed GPT-3.5 in English on a question answering benchmark. So it seems that the foundation models, especially those trained on large amounts of multilingual data, are powerful and do have the potential to unlock various NLP use cases in lower-resource languages. In terms of machine translation, there are limitations. The general limitation of GPT coming into play here is that it’s a general-purpose model trained on the whole of the internet; it’s not really specialized for translation. One funny paper title I saw recently comes to mind: “ChatGPT: Jack of all trades, master of none”. GPT models are very good general-purpose models, but for specific use cases there’s still a lot of value that can be added by fine-tuning them on specific datasets, as for machine translation. It’s very difficult for a general-purpose model to not only do dialogue like ChatGPT and solve various tasks like sentiment analysis, classification, summarization, anything you can think of, but also get state-of-the-art performance on machine translation; it’s very difficult to get there. Certainly I’m quite optimistic that we’re going to be able to get very, very good accuracy, but in my opinion you will always be better off using a dedicated model for machine translation accuracy. Also because machine translation is a special case where quite often you want to specialize in a specific vocabulary for a narrow domain. You want to focus, for example, on biology only, or you want to use specific terminology relevant to the company you’re targeting. We’re going to see a lot of interesting use cases of machine translation using ChatGPT, GPT or foundation language models, I’m sure. But I have the feeling it will be very difficult for them to match a dedicated, large enough model specialized on your data, and I would be sceptical if that happened. For example, I think DeepL will be better than ChatGPT for the foreseeable future just because of the unique datasets they’re using for training.
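One practical middle ground Nikola alludes to, using company-specific terminology with a general-purpose LLM, is to inject a glossary into the prompt. The sketch below only builds such a prompt; the glossary entries are made-up examples, and the prompt would be sent to an LLM API as in the earlier chat completions sketch.

```python
# Inject client-specific terminology into a translation prompt for a
# general-purpose LLM. The glossary entries below are invented examples.
glossary = {
    "large language model": "großes Sprachmodell",
    "fine-tuning": "Feinabstimmung",
}

source_text = "Fine-tuning a large language model on in-domain data improves accuracy."

terminology_block = "\n".join(f"- {src} -> {tgt}" for src, tgt in glossary.items())
prompt = (
    "Translate the following text from English into German.\n"
    "Use exactly these term translations:\n"
    f"{terminology_block}\n\n"
    f"Text: {source_text}"
)
print(prompt)  # send this prompt to an LLM API, as in the earlier sketch
```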


Florian: Sometimes I feel like when I’m playing around with ChatGPT it goes through English as a pivot. Do you think that’s the case? Even if I prompt it in German to do something, and I’m blanking on a specific example here, sometimes I feel it machine translates my request into English, processes it in English, and then gives me back an almost machine-translated German output. Do you think that could be the case? Or if I prompt it in German or any other language that’s not English, does it actually run this in that particular language? It’s hard to frame, but I feel it sometimes pivots into English, does what it has to do, and machine translates the result back into German. Do you think that’s a reasonable hypothesis or does it make no sense at all?

Nikola: That’s a very interesting property that GPT has, and I’ve also been quite surprised by it. You can, for example, request GPT to summarize an article and give an instruction like “your output should be in German”, and ChatGPT is able to do that. It’s quite a remarkable capacity that, as far as I know, only came out with ChatGPT; I’m not sure if previous iterations were able to do this. I’m not sure there’s a lot of insight into how it’s able to do this, and there’s not a lot of research around it. Because it’s a very deep model, I think what it’s doing is precisely something like what you’re saying. Earlier layers in the network handle the task itself: they figure out what the response to you should be, in perhaps some sort of internal GPT representation. Representations are produced that encapsulate what the output should be, and later layers then produce the final output, a stream of tokens in the target language and target format, handling the case you want the model to handle. That’s my intuition for this, at least, and I’m sure there’s a lot more to uncover about what those models are doing precisely. This is one of the unique properties we have seen specifically in large language models, larger than 100 billion parameters, and we don’t know yet precisely why it happens or which model size and properties lead to these really amazing qualities that are also super useful.

Florian: Do you think OpenAI knows what’s going on inside, or they don’t either, or they barely know?

Nikola: I think they probably know a little bit more than us, but not that much. Also, from what I have seen, for example, I watched an interview Lex Fridman did with the CEO of OpenAI, and there he mentioned that they were also very surprised by the capacity of ChatGPT and by the popularity it gained. I’m sure they’re also very surprised by many of the qualities we have seen and many of the amazing use cases in the demos. The multilingual qualities are quite impressive as well, and in the demo of GPT-4 they took an image of a website and were able to generate the source code of the website, which compiled and actually did what they wanted it to do. So we’re learning day by day, and I’m sure there’s a lot more to uncover.

Florian: If I were a founder and I liked, let’s say, the translation and multilingual content problem, what should I build? What do you think are the top three to five ideas to build right now?

Nikola: I think there’s potential for personalization. One really cool thing about those foundation models is that you can personalize them on a massive scale, for many people. Talking about the translation problem, for example, you will be able to target a use case where everyone gets a translation system personalized specifically for them. Let’s say that you, Florian, want to target a very narrow use case, say translating the podcast into multiple languages. What you can do with those foundation models is provide a bunch of examples, and very quickly you get a personalized translation system that uses your vocabulary, for instance. This is an advantage over a player like DeepL or Google Translate, where the personalization is quite limited as far as I know; I’m not really a user of DeepL Pro or similar services, and they might offer something similar, but basically you’ll be able to build a personalized translation system much more quickly with those technologies. So that’s something that comes to mind, and I think there is potential there. I don’t know what the capacity is on the business side, because the question there, of course, is whether you can actually do much better than DeepL. If you’re not able to at least match DeepL, there will be little incentive for people to switch from those translation providers to your personalized system.

Another idea that comes to mind is related to the multimodal space: in terms of translation, translating not only text but going to the audio or video level. Perhaps this would be much more interesting with foundation models. You can also combine, for example, models for images and videos; maybe you can generate videos in different languages. That’s where a use case like Synthesia comes into play. This is also something they promise on their website that I don’t think they have looked into yet, but it would be amazing. Right now I’m making my YouTube videos in English, but in the near-term future I will be able to generate, with a click of a button, a variant of my YouTube videos targeting 50 languages, and I can have 50 YouTube channels targeting whatever language I want. And if I’m a big company using YouTube as a marketing tool, like content marketing, this could have a major impact, because right now I’m reaching audiences only in English, since it’s too prohibitive for me to publish blog posts and YouTube videos in multiple languages. Now I can have a much larger audience, and for some languages and countries the adoption of English is not as significant, especially for B2C types of products, so I see that this could have a major impact. Those are two ideas that come to mind. Regarding other ones, I’m not sure. Do you actually have some yourself that come to mind?
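The “provide a bunch of examples” personalization Nikola describes is essentially few-shot prompting. Here is a minimal sketch that primes a general LLM with a user’s own past translations so the output picks up their vocabulary and style; the example pairs and the new segment are invented for illustration, and the resulting prompt would be passed to any chat or completions API.

```python
# Few-shot "personalization": prime a general LLM with a user's own past
# translations so the output matches their vocabulary and style.
# The example pairs below are invented for illustration.
my_translation_memory = [
    ("Welcome to this week's episode.", "Willkommen zur Folge dieser Woche."),
    ("Let's unpack the latest research.", "Schauen wir uns die neueste Forschung an."),
]

new_segment = "Let's unpack what large language models mean for translators."

examples = "\n\n".join(
    f"English: {src}\nGerman: {tgt}" for src, tgt in my_translation_memory
)
prompt = (
    "Translate from English to German, matching the style and vocabulary "
    "of the examples.\n\n"
    f"{examples}\n\nEnglish: {new_segment}\nGerman:"
)
print(prompt)  # pass this prompt to a chat/completions API
```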

Florian: You mentioned YouTube, so their multilingual audio option is now much more broadly available, right? They used to have it only with MrBeast and a couple of other giant YouTubers, and now it’s much more broadly available; I’m not sure if it’s in general availability. What do I mean by that? You don’t actually have to have multiple YouTube channels in multiple languages; you can just switch the audio in an existing YouTube video, which would make it even easier to scale this. I was asking you because I think if I were starting out in 2023 in translation and multilingual content generation, and I didn’t have like 100 million in VC money lined up, I don’t know what I would build. It’s very challenging on the generative side, not in the B2B workflow side, where I see a lot of potential. Recently a company came out more publicly called Blackbird, which is like a Zapier for various language industry applications. I like that. But on the generative side it’s very, very tough, I think, to compete, because no matter where you go there’s already somebody more established in there. But yeah, that’s why I’m not a founder. I’m an observer.

Nikola: It depends on what layer you want to approach. There are different layers in the startup or AI space at the moment. One layer is, for example, the hardware or foundation-type layer, where you would be competing with companies like DeepL or OpenAI and Cohere, providing the foundational services that other people use as an API. There it’s very difficult, I think, especially specifically for language translation types of services, unless it’s a very, very niche use case that nobody else is really interested in. Maybe you have domain expertise, or maybe you have some custom data that you can use, maybe you have connections in the space, and then you come in and you’re able to produce a product that is somehow substantially better than what those other companies are providing. That is one option that I see. Another layer is the application layer, where you use those tools, those APIs, you basically call them, and you put together a novel use case. Maybe you also collect some data from your customers, and basically you’re trying to gather attention and be a first mover with this technology. Honestly, I think it’s very tough at the moment to build something there, just because the space is changing so rapidly. You have tools like ChatGPT plugins that just came out; I don’t know if you’ve seen this, but it’s a very interesting development.

Florian: It’s not available though, right?

Nikola: It’s not available yet; I think it’s just a waitlist at the moment. But basically, ChatGPT plugins could potentially put many of those startups that are just focusing on the application layer out of business. They’re just calling the APIs and putting together novel use cases, and people can just come in and put together a ChatGPT plugin in a few weeks that targets the same use case, and the plugin already has distribution power because OpenAI has hundreds of millions of users at the moment, users who are also paying for this service. Why would someone then switch to your new application if they can just use OpenAI, which they’re already integrated with? I mean, it’s a pretty complex topic, to be honest, a very interesting situation at the moment.

Florian: I hope they get a better UI soon. It’s incredible that they can sign up hundreds of millions of users on this super basic UI. Now, let’s speculate a bit. The next three, six, 12 months, where does this go? Where do you see this going? Is it that twelve months from now I’ll just go to my Google Docs and use Bard, or I won’t even care whether it’s Bard or OpenAI or a plugin or an API, I’ll just have gotten used to these various new cool things I can do? Or where do you see this going over the next three to six to 12 months?

Nikola: There are two scenarios that I see playing out here, and I haven’t made up my mind yet which of the two is the most likely, just because the space is changing so rapidly and there’s a lot of new stuff coming out. Basically, I can tell you about the two options that I see.

Option one is the scenario where the main beneficiaries of this technology are the large corporations which are already jumping into it very quickly, companies like Microsoft, Google, Adobe, Duolingo, who are also integrating large language models. All of those companies are jumping on board and will be able to quickly integrate them into their products, and having this capacity, at least a basic version of it, will become the norm. When this happens, it will put many of those startups that are just calling those APIs and trying to do the same out of business, especially if they’re in direct competition with what some of those big companies are already doing. It’s going to be very difficult for them to compete, because if I’m a big company already using the Microsoft suite, Teams and Office, there’s no chance I’ll pay an extra $50 for a writing assistant when everything is already integrated into Microsoft or Google. In that scenario, the successful smaller companies will be the ones targeting really narrow, niche use cases where the interest from the big companies is much less prominent. I think there’s an opportunity there for sure. We might see a couple of pretty big companies emerge that target very narrow, specific areas, let’s say healthcare, in a specific use case like summarizing clinical notes, where a big company like Google or Microsoft has no interest in jumping in. There might be some interesting new companies getting bigger and bigger, but they also need to be very fast: they need to iterate, collect their own proprietary datasets, and potentially come up with new technology, not just rely on APIs, because other similar companies are going to come up that want to do the same.

Another class of companies in this scenario that will most likely be successful, at least in the short term, are the foundation language model companies. Right now there’s a lot of interest in using their models as they are, but in the long term I actually think they will be less prominent, especially as people realize that ultimately, if you want to have an edge, you need your own foundation language model, fine-tuned on the dataset you collect from your users; that’s where the value is. Many of the companies that are now integrating APIs out of the box will switch to developing more and more of their own custom tech in the next three, six to 12 months, as many of their startup competitors start to do the same. That will become the edge. So this is scenario one, where big tech and foundation model companies make money and are successful, but foundation model providers might lose power over time unless they keep up and come up with new models, because there will also be pressure from open source, which is going to replicate what the foundation models are doing. It will get easier and easier.
Basically, you wouldn’t have to rely on an API; you could just download a model from Hugging Face and use that one. So that’s scenario one, and of course companies like Nvidia, or companies which are more fundamental to the whole ecosystem, will profit at the end of the day; no matter what, they’re going to benefit from these developments.

Scenario two would be a little more optimistic for the startups, because this technology is really revolutionary at the end of the day, at least in what you can do with it. There might be some really novel use cases that don’t put big companies out of business but potentially create new industries, completely different from what the big companies are doing. Here, for example, search or assistants could come in. We might see some really innovative assistants; right now our current assistants, like Siri and Google Assistant, are pretty bad, they’re not really good. So perhaps integrating advanced LLM technology, also multimodal technology, let’s say images, videos, dialogue, and building something really new that is completely different from what the big companies are doing at the moment could allow some startups to get an edge, which might actually give them an advantage because of their speed, because they’re building completely new products. It might create some new big companies out of this situation, rather than just the big companies benefiting and providing this technology to the users.

Those are the two scenarios I’m seeing. Either the big companies benefit, or basically the users benefit and everyone gets this technology for free sooner or later; everyone gets ChatGPT-like functionality for free within Google Docs, within Microsoft, and nobody pays for custom software because everything is already covered by those big companies. Or we might see even more innovation. It could be that I’m short-sighted with scenario one and underestimating what can be achieved in terms of innovation and new products, because the field is changing a lot every day. We’re seeing new models coming out every day, multimodal models, models that can do more and more, that are better aligned. So who knows, we have to see what happens.
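For a sense of what “just download a model from Hugging Face and use that one” looks like in practice, here is a minimal sketch that runs an open model locally with the transformers pipeline instead of calling a hosted API. The checkpoint name is an illustrative choice; in practice you would pick an open instruction-tuned model of suitable size for your hardware.

```python
from transformers import pipeline

# Run an open model locally instead of calling a hosted API.
# The checkpoint name is an illustrative choice of open instruction-tuned model.
generator = pipeline("text-generation", model="databricks/dolly-v2-3b")

result = generator(
    "Explain in one sentence why open-source language models matter:",
    max_new_tokens=60,
)
print(result[0]["generated_text"])
```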

Florian: Finally, would you sign the “pause giant AI experiments” letter? Do you see any risks here in the short to medium term, or do you think this is all a bit overdone and we’re not about to be extinguished by a super-intelligent AI going rogue?

Nikola: I don’t really see any danger from large language models at the moment. The main danger I see is the huge number of fake blog posts or the inaccurate information being propagated by this technology. That is potentially a risk, because more and more people are using these models to write blog posts and to prepare materials both for the internet, like social media, and internally for companies. Students are using them to write essays and to prepare for exams, so there’s a risk that we get into a bubble where we just have this AI garbage in, AI garbage out situation. But at the same time, there’s so much we can do with this and so many possibilities that I don’t think it’s rational to constrain it. Also, the constraint would be difficult to enforce: let’s say companies in the US are able to constrain it, but in Europe or in other countries they’re not going to be able to enforce that type of constraint, so it would potentially create an imbalance, and we don’t want that. We want the technology to be free and to have an open market where anyone can come up with a new large language model and potentially outperform the others.

It is getting more and more difficult for smaller companies to compete in this space, though. That is one concern: this gap between OpenAI and startups that might have innovative ideas but don’t have the resources. It’s going to get more and more difficult, and I don’t really have a good answer for this. Perhaps open source is one answer: we should push more towards open-source models and open-source software to allow anyone to develop their own large language model, and to allow small companies to build something new on top of their own data to compete with the big companies. Unless they’re able to do this, unless they have the capacity to utilize their data and fine-tune large enough models to compete with OpenAI, it might be that they end up just calling the APIs, and we might end up with a kind of simplified AI for everyone, which might not be ideal because it would limit what can be achieved in various industries.