On March 14, 2023, OpenAI announced the release of GPT-4, the most advanced iteration in the company’s series of AI language models. Building on the success of GPT-3.5, GPT-4 introduces new capabilities in language understanding, creative problem-solving, and multimodal support.
Over the past two years, OpenAI has been hard at work rebuilding its entire deep-learning stack in collaboration with Azure, Microsoft’s cloud computing platform. GPT-3.5 served as a test run for the new supercomputer, designed specifically for AI workloads.
With the system refined and its theoretical foundations improved, GPT-4 emerges as a robust, dependable model that builds on the advances of its predecessors.
The difference between GPT-3.5 and GPT-4 is most evident when tackling complex tasks. GPT-4 is “more reliable, creative, and able to handle much more nuanced instructions” than its predecessor, according to OpenAI. Furthermore, GPT-4 is multimodal, i.e., capable of handling both image and text inputs.
GPT-4 was tested on a variety of benchmarks, including simulated exams originally designed for humans, such as the bar exam, academic Olympiads, the SAT, and LeetCode problems. The results reveal GPT-4’s impressive performance, especially in comparison to GPT-3.5.
Latvian, Welsh, and Swahili…
One of the most significant improvements in GPT-4 is its non-English language performance. Using Microsoft’s translation engine, the team converted 14,000 Multiple Choice Question Answering (MCQA) items from the Massive Multitask Language Understanding (MMLU) dataset into a variety of languages.
GPT-4 beat GPT-3.5 and other large language models (such as DeepMind’s Chinchilla and Google’s PaLM) in 24 out of 26 languages studied, including in low-resource languages like Latvian, Welsh, and Swahili.
However, Slator’s internal exploration of Indic language generation via ChatGPT Plus, which involved instructing GPT-4 to generate children’s short stories and poems in Telugu and Tamil, showed that the model still lacks human-level generation and text coherence capabilities.
Early Adopters
GPT-4’s text input capability is available via the ChatGPT Plus subscription. API access is being rolled out gradually through a waitlist. The model’s image input capability is not yet available to the general public and is in private testing with Be My Eyes, an app that connects visually impaired people with virtual volunteers.
While the official announcement of GPT-4 was made on March 14, 2023, several companies were given access a few months earlier and have been building on top of the model internally.
Duolingo, for example, has built an AI-powered language tutor on top of GPT-4 and offers two new AI-powered features. “Explain My Answer” gives users more context on lesson quizzes, while “Roleplay” lets learners simulate interactions with virtual characters (such as a coffee shop barista) in various languages.
Intercom has released “Fin”, a GPT-4-based AI bot it calls “ChatGPT for customer service”. Companies can point the bot at their user support articles; the bot learns from them and responds to customer queries. Where the bot cannot help, it redirects the query to human customer support.
Novel Use Cases
Now that GPT-4 can take both images and text as input, new use cases become possible. Some of these applications are outlined below.
Translation with visual context
When translating, GPT-4 can take into account the visual context that accompanies a source text. This can potentially increase the accuracy of the output, particularly in scenarios where the meaning of a phrase or sentence is heavily dependent on visual elements.
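As a minimal sketch of what this could look like, the snippet below sends a source sentence together with a product photo to the OpenAI chat completions endpoint and asks for a context-aware translation. Image input was not publicly exposed via the API at the time of GPT-4’s launch, so the model name, the image-bearing message format, and the example URL are assumptions for illustration only.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example: translate smartwatch UI copy, using a product photo
# as visual context (model name and image availability are assumptions).
response = client.chat.completions.create(
    model="gpt-4",  # placeholder; a vision-enabled GPT-4 variant is assumed
    messages=[
        {
            "role": "system",
            "content": "You are a translator. Use the attached image to resolve ambiguity in the source text.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Translate into German: 'Press the crown to start.'"},
                {"type": "image_url", "image_url": {"url": "https://example.com/smartwatch.jpg"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```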
Image-relevant transcription
GPT-4 can also potentially improve transcription accuracy by supplying visual context. Consider a recorded video presentation, for example: the slides could be described visually, and that description passed as an input prompt when transcribing the audio with a service like OpenAI’s Whisper, reducing errors in the resulting transcript.
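The Whisper API already accepts an optional prompt parameter that biases transcription toward the supplied vocabulary, so a rough sketch of this workflow might look as follows; the slide description and file name are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder slide description; in practice this could come from GPT-4's
# image input summarizing the deck that accompanies the recording.
slide_context = (
    "Slide 3, 'Quarterly KPIs': bar chart comparing EBITDA, ARR, and churn "
    "across Q1-Q4; the speaker refers to these figures by name."
)

with open("presentation_audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        # The prompt nudges Whisper toward domain terms that appear on the
        # slides, reducing misrecognitions of acronyms such as EBITDA or ARR.
        prompt=slide_context,
    )

print(transcript.text)
```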
Meme and multimedia content generation
GPT-4 can understand and generate meme texts for images. This can be useful for social media marketing or creating engaging content for online platforms.
Rich image cataloging
GPT-4 can generate concise summaries or descriptions of images, making it useful for cataloging, organizing, or annotating visual content. This can be particularly beneficial for enabling rich semantic search, beyond text alone, in documents and PDFs.
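One way to wire this up, sketched below under the assumption that GPT-4 has already produced a caption for each image, is to embed those captions with OpenAI’s text embedding model and rank them against a search query; the captions and query are invented for illustration.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical captions, assumed to have been generated by GPT-4 from
# images embedded in a set of PDFs.
captions = [
    "Bar chart of 2022 revenue by region, with EMEA leading at 42%",
    "Photo of the new Berlin office lobby",
    "Wireframe of the mobile checkout flow",
]

def embed(texts):
    # text-embedding-ada-002 is OpenAI's general-purpose text embedding model
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vectors = embed(captions)
query_vector = embed(["Which figure shows revenue split by region?"])[0]

# Rank captions by cosine similarity to the query
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(captions[int(np.argmax(scores))])
```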
Design critique and feedback
GPT-4 can analyze and provide feedback on design elements within an image, such as color schemes, typography, layout, or composition. This can be a valuable tool for designers and artists seeking to improve their work.
Board drawing to UI with code
GPT-4 can interpret visual representations of user interfaces, such as wireframes or mockups, and generate code to build functional web or app interfaces. This can streamline the development process and improve collaboration between designers and developers. A version of this was demoed in GPT-4’s launch livestream.
Not So Open After All?
The response to GPT-4’s release has been mixed. Several users expressed disappointment that the official paper didn’t carry any information about the size of the GPT-4 model or the training dataset.
The paper covered the model’s capabilities, while omitting information about its architecture, dataset construction, and training method. OpenAI explained the omission in section 2 of the paper, citing the “competitive landscape and the safety implications of large-scale models like GPT-4.”
In conclusion, the release of GPT-4 marks a significant step forward in the world of AI. With its enhanced language understanding, creative problem-solving abilities, and multimodal support, GPT-4 has the potential to impact various industries and applications.