Researchers have aimed to launch the new AI, code-named Strawberry (previously called Q*, pronounced Q Star), as part of a chatbot—possibly within ChatGPT—as soon as this fall, said two people who have been involved in the effort. Strawberry can solve math problems it hasn't seen before—something today’s chatbots cannot reliably do—and also has been trained to solve problems involving programming. But it’s not limited to answering technical questions.
The Takeaway
• OpenAI demonstrated Strawberry to national security officials
• Strawberry aims to improve upcoming ‘Orion’ large language model
• Smaller version of Strawberry could launch in chatbot form

When given additional time to “think,” the Strawberry model can also answer customers’ questions about more subjective topics, such as product marketing strategies. To demonstrate Strawberry’s prowess with language-related tasks, OpenAI employees have shown their co-workers how Strawberry can, for example, solve New York Times Connections, a complex word puzzle.
The effort to launch Strawberry is part of OpenAI’s never-ending battle to stay ahead of other well-funded rivals vying for supremacy in conversational AI, or large language models. The technology also has implications for future products known as agents that aim to solve multistep tasks. OpenAI and its rivals hope the agents can open up more revenue opportunities.
OpenAI’s business is growing at an incredible rate: Its sales of LLMs to corporations and of ChatGPT subscriptions have roughly tripled to $283 million in monthly revenue compared to a year ago, though its monthly losses are likely higher than that. The company is privately valued at $86 billion.
But OpenAI’s prospects rest in part on the eventual launch of a new flagship LLM it is currently developing, code-named Orion. That model seeks to improve upon its existing flagship LLM, GPT-4, which it launched early last year. By now, other rivals have launched LLMs that perform roughly as well as GPT-4.
It isn’t clear whether a chatbot version of Strawberry that can boost the performance of GPT-4 and ChatGPT will be good enough to launch this year. The chatbot version is a smaller, simplified version of the original Strawberry model, known as a distillation. It seeks to maintain the same level of performance as a bigger model while being easier and less costly to operate.
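For readers unfamiliar with distillation, here is a minimal, purely illustrative sketch of the core idea (it does not reflect OpenAI's actual training code): a small "student" model is trained to match the output distribution of a large "teacher" model, so it keeps much of the teacher's behavior while being cheaper to run.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw model scores into a probability distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's temperature-softened output
    distribution and the student's. Minimizing this pushes the student
    to imitate the teacher -- the core objective in distillation."""
    p = softmax(teacher_logits, temperature)  # soft targets from the big model
    q = softmax(student_logits, temperature)  # the small model's predictions
    return float(-(p * np.log(q + 1e-12)).sum())
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is why a well-distilled model can approach a bigger model's quality at a fraction of the serving cost.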
However, OpenAI is also using the bigger version of Strawberry to generate data for training Orion, said a person with knowledge of the situation. That kind of AI-generated data is known as “synthetic.” It means that Strawberry could help OpenAI overcome limitations on obtaining enough high-quality data to train new models from real-world data such as text or images pulled from the internet.
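The synthetic-data pipeline described above can be pictured with a short, hypothetical sketch: a stronger "teacher" model writes answers to seed questions, and the resulting pairs become training examples for a new model. The `teacher_answer` function below is a stand-in for a real model call; nothing here describes OpenAI's actual pipeline.

```python
def teacher_answer(question: str) -> str:
    # Placeholder: a real pipeline would query the stronger model here.
    return f"worked solution for: {question}"

def build_synthetic_dataset(questions):
    """Turn a list of seed questions into (prompt, target) training pairs
    written by the teacher model, i.e., "synthetic" data."""
    return [(q, teacher_answer(q)) for q in questions]
```

Because the teacher, not the internet, produces the targets, a lab can generate far more training text than it could scrape, which is the limitation the article says Strawberry may help overcome.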
In addition, Strawberry could aid upcoming OpenAI agents, this person said. (Read more about OpenAI's development of agents, including those that use computers, here.)
Reducing Hallucinations
Using Strawberry to generate higher-quality training data could help OpenAI reduce the number of errors its models generate, otherwise known as hallucinations, said Alex Graveley, CEO of agent startup Minion AI and former chief architect of GitHub Copilot.
Imagine “a model without hallucinations, a model where you ask it a logic puzzle and it’s right on the first try,” Graveley said. The reason why the model is able to do that is because “there is less ambiguity in the training data, so it’s guessing less.”
Earlier this month, CEO Sam Altman tweeted an image of strawberries without elaborating, fanning the flames of speculation about an upcoming release. OpenAI also gave demonstrations of Strawberry to national security officials this summer, said a person with direct knowledge of those meetings. (Read more about this in AI Agenda.)
“We feel like we have enough [data] for this next model,” Altman said at an event in May, likely referring to Orion. “We have done all sorts of experiments including generating synthetic data.”
He is also looking to secure more money for the company and find ways to reduce its losses. OpenAI has raised about $13 billion from Microsoft since 2019 as part of a business partnership with the enterprise software giant contracted to last through 2030, said a person who was briefed about it. The terms of the partnership could change, including how OpenAI pays Microsoft to rent cloud servers for developing its AI, this person said. Cloud servers are the biggest cost for OpenAI.
An OpenAI spokesperson did not have a comment for this article. Reuters earlier reported on the Strawberry name and its reasoning goals.
A Lucrative Application
AI that solves tough math problems could be a potentially lucrative application, given that existing AI isn’t great at math-heavy fields such as aerospace and structural engineering. It’s a goal that has tripped up AI researchers, who have found that conversational AI—ChatGPT and its ilk—is prone to giving wrong answers that would flunk any math student.
Improvements in mathematical reasoning could also help AI models reason better about conversational queries, such as customer service requests.
Google and a number of startups are also hard at work developing reasoning technology. Last month, Google DeepMind said its AI could outperform most human participants in the International Mathematical Olympiad. Another major rival, Anthropic, said its latest LLM could write more-complicated software code than its prior LLMs could, and answer questions about charts and graphs, thanks to improvements in its reasoning capabilities.
To improve models’ reasoning, some startups have been using a crude hack: breaking a problem down into smaller steps the model answers one at a time. But these workarounds are slow and expensive to run.
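The step-decomposition workaround can be sketched in a few lines. This is a hypothetical illustration, not any startup's actual code: `ask_model` stands in for a real LLM API call, and each sub-question's answer is fed back as context for the next one, which is why the approach requires many calls and is slow and costly.

```python
def ask_model(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[answer to: {prompt}]"

def solve_stepwise(question: str, steps: list[str]) -> str:
    """Break a problem into smaller sub-questions, answer each in turn,
    and carry earlier answers forward as context for later prompts."""
    context = []
    for step in steps:
        prompt = "\n".join(context + [f"Next step: {step}"])
        answer = ask_model(prompt)  # one model call per step
        context.append(f"{step} -> {answer}")
    # Final pass: ask the original question with the intermediate work attached.
    final_prompt = "\n".join(context + [f"Question: {question}"])
    return ask_model(final_prompt)
```

Each step costs a separate model call, so a problem decomposed into many steps multiplies both latency and spend, matching the article's "slow and expensive" caveat.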
Regardless of whether Strawberry launches as a product, expectations are running high for Orion as OpenAI looks to stay ahead of its rivals and continue its remarkable revenue growth. Earlier this month, for instance, Google beat OpenAI to launching an AI-powered voice assistant flexible enough to handle interruptions and sudden topic changes from users, even though OpenAI had announced its version first, in May.
And LLMs from other model developers like Google, xAI, Anthropic and Meta Platforms are quickly catching up to OpenAI’s on leaderboards such as the Lmsys Chatbot Arena, though OpenAI models are far and away the top choice for business buyers and AI application developers.
What Ilya Saw
Strawberry has its roots in research started years ago by Ilya Sutskever, then OpenAI's chief scientist, who recently left to found a competing AI lab. Before his departure, OpenAI researchers Jakub Pachocki and Szymon Sidor built on Sutskever's work by developing a new math-solving model, Q*, alarming some researchers focused on AI safety.
The breakthrough, and the safety conflicts it stirred at OpenAI, came just before the company's board directors, led by Sutskever, fired Altman, only to quickly rehire him.
Last year, in the leadup to Q*, OpenAI researchers developed a variation of a concept known as test-time computation, meant to boost LLMs’ problem-solving abilities. The method gives a model more time to consider all parts of a command or question before it responds. At the time, Sutskever published a blog post related to this work.
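One published flavor of test-time computation can be sketched concretely: spend extra compute at inference time by sampling several candidate answers and keeping the most common one (sometimes called majority voting or self-consistency). This is a hedged illustration, not OpenAI's method; `sample_answer` is a deterministic stand-in for re-sampling a stochastic model.

```python
from collections import Counter

# Hypothetical canned outputs standing in for repeated model samples.
CANNED_SAMPLES = ["42", "42", "7", "42", "3", "42"]

def sample_answer(question: str, i: int) -> str:
    # Placeholder: a real system would re-sample the LLM here.
    return CANNED_SAMPLES[i % len(CANNED_SAMPLES)]

def answer_with_more_compute(question: str, n_samples: int = 6) -> str:
    """Draw several candidate answers and return the majority vote.
    More samples means more compute spent "thinking" per question."""
    votes = Counter(sample_answer(question, i) for i in range(n_samples))
    return votes.most_common(1)[0][0]
```

Because occasional wrong samples get outvoted, accuracy tends to rise with the number of samples, trading inference cost for reliability.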