Before OpenAI’s ChatGPT became famous for being able to make up interesting phrases, a small company called Latitude was wowing customers with its AI Dungeon game, which let players use artificial intelligence to make up stories based on their suggestions.
But as AI Dungeon became more popular, Latitude CEO Nick Walton said, the cost of keeping the text-based role-playing game running went through the roof. AI Dungeon’s text-generation software was powered by GPT language technology from OpenAI, an artificial intelligence research lab backed by Microsoft. The more people played AI Dungeon, the more Latitude had to pay OpenAI.
Compounding the problem, Walton discovered that content marketers were using AI Dungeon to write promotional copy, a use his team had not anticipated but one that added to the company’s AI bill.
Walton estimates that at its peak in 2021, Latitude was spending roughly $200,000 per month on OpenAI’s so-called generative AI software and Amazon Web Services to keep up with the millions of user queries it had to process each day.
“We joked that we had human workers and AI employees, and we spent roughly the same amount on both,” Walton added. “We spent hundreds of thousands of dollars every month on AI, and we are not a large firm, so it was a huge expenditure.”
By the end of 2021, Latitude had switched from OpenAI’s GPT software to cheaper but still capable language software from AI21 Labs. According to Walton, the company has also added free and open source language models to its service to keep costs down. Walton says Latitude’s generative AI costs have fallen to less than $100,000 per month, and the company charges players a monthly subscription fee for more advanced AI features to help cover the expense.
Latitude’s high AI bills show an uncomfortable side of the recent boom in generative AI technologies: developing and maintaining the software can be very expensive, both for businesses that build the underlying technology, often called “large language models” or “foundation models,” and for businesses that use artificial intelligence to power their own software.
The high cost of machine learning is an uncomfortable reality for the industry as venture capitalists look for companies that could be worth trillions of dollars and big companies like Microsoft, Meta, and Google use their vast capital to build a lead in the technology that smaller competitors cannot match.
But if profit margins for AI applications stay permanently lower than past software-as-a-service margins because of the high cost of computing, the current boom could be dampened.
The enormous cost of training and “inference,” that is, actually running, large language models is a structural cost that distinguishes this computing boom from prior ones. Even after the software has been built, or trained, running large language models demands a massive amount of computing power because they perform billions of calculations every time they respond to a prompt. By comparison, serving web applications or pages requires far fewer calculations.
These calculations also require specialized hardware. Traditional computer processors can run machine learning models, but they are slow. Most training and inference now takes place on graphics processors, or GPUs, which were originally designed for 3D gaming but have become the standard for AI applications because they can perform many simple calculations simultaneously.
Nvidia makes most of the GPUs used in the AI industry, and its primary data center workhorse chip costs $10,000. Scientists who build these models often joke that they “melt GPUs.”
According to analysts and technologists, the critical process of training a large language model such as GPT-3 could cost more than $4 million. More advanced language models could cost in the “high single digit millions” to train, according to Rowan Curran, a Forrester analyst who focuses on AI and machine learning.
For example, Meta’s largest LLaMA model, released last month, used 2,048 Nvidia A100 GPUs to train on 1.4 trillion tokens (750 words is about 1,000 tokens), which took about 21 days, according to the company.
Training took about 1 million GPU hours. At AWS’s dedicated pricing, that would cost more than $2.4 million. And at 65 billion parameters, the model is smaller than OpenAI’s current GPT models, such as GPT-3, which has 175 billion parameters.
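The GPU-hour figure follows from the numbers Meta reported. Below is a minimal sketch of that arithmetic, assuming the GPUs ran around the clock for the full 21 days. The hourly rate is not given in the article; it is only back-calculated here from the $2.4 million estimate and should not be read as a published AWS price.

```python
# Back-of-the-envelope check of the LLaMA training figures quoted above.
GPUS = 2_048          # Nvidia A100 GPUs, per Meta
DAYS = 21             # approximate training time, per Meta
HOURS_PER_DAY = 24    # assumption: continuous, uninterrupted use


def gpu_hours(gpus: int, days: int, hours_per_day: int = HOURS_PER_DAY) -> int:
    """Total GPU hours consumed by a training run."""
    return gpus * days * hours_per_day


def implied_hourly_rate(total_cost: float, hours: int) -> float:
    """Price per GPU hour implied by a total bill (illustrative only)."""
    return total_cost / hours


total_hours = gpu_hours(GPUS, DAYS)
rate = implied_hourly_rate(2_400_000, total_hours)

print(f"{total_hours:,} GPU hours")
print(f"${rate:.2f} per GPU hour implied by a $2.4 million bill")
```

Under these assumptions the run comes to slightly more than 1 million GPU hours, which matches the rounded figure in the text.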
Hugging Face CEO Clement Delangue said that training the company’s Bloom large language model took more than two and a half months and required access to a supercomputer with “the equivalent of 500 GPUs.”
Because large language models are so expensive to train, organizations that build them must be cautious about how often they retrain the software, a step that improves its abilities.
“It’s important to realize that these models are not trained all the time, like every day,” Delangue said, noting that’s why some models, like ChatGPT, don’t have knowledge of recent events. ChatGPT’s knowledge stops in 2021, he said.
“We are actually doing training right now for version two of Bloom, and it’s going to cost no more than $10 million to retrain,” Delangue said. “So that’s the kind of thing that we don’t want to do every week.”
Inference and who pays for it
To use a trained machine learning model to make predictions or generate text, engineers run it in a process called “inference,” which can be much more expensive than training because the model may need to run millions of times for a popular product.
Curran estimates that for a product as popular as ChatGPT, which UBS estimates reached 100 million monthly active users in January, it may have cost OpenAI $40 million to process the millions of prompts people fed into the software that month.
The costs climb steeply when these tools are used billions of times a day. Analysts estimate that Microsoft’s Bing AI chatbot, which is powered by an OpenAI ChatGPT model, needs at least $4 billion of infrastructure to serve responses to all Bing users.
Latitude did not have to pay to train the underlying OpenAI language model it was accessing, but it did have to account for inference costs that were “half a cent per call” on “a few million queries per day,” according to a Latitude spokeswoman.
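To see how a per-call price turns into a monthly bill, here is a rough sketch. The half-cent price comes from the Latitude spokeswoman. The daily query volumes and the 30-day month are illustrative assumptions standing in for “a few million,” so the output is not a reconstruction of Latitude’s actual invoice.

```python
from decimal import Decimal

# "Half a cent per call", per the Latitude spokeswoman (in dollars).
PRICE_PER_CALL = Decimal("0.005")


def monthly_inference_cost(queries_per_day: int, days: int = 30) -> Decimal:
    """Monthly bill in dollars for a flat per-call price.

    `days=30` is an assumption; real billing periods vary.
    """
    return PRICE_PER_CALL * queries_per_day * days


# Hypothetical daily volumes; the article only says "a few million".
for queries in (1_000_000, 2_000_000, 3_000_000):
    cost = monthly_inference_cost(queries)
    print(f"{queries:>9,} queries/day -> ${cost:,.0f} per month")
```

Because the price is flat per call, the bill scales linearly with usage, which is why unexpected traffic such as marketers generating promotional copy fed directly into Latitude’s costs.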
“And I was being quite cautious in my estimations,” Curran stated of his calculations.
To sow the seeds of the current AI boom, venture capitalists and tech giants have poured billions of dollars into startups specializing in generative AI technology. According to media reports in January, Microsoft invested as much as $10 billion in OpenAI, the developer of GPT. Salesforce Ventures, the venture capital arm of Salesforce, has launched a $250 million fund dedicated to generative AI startups.
According to Semil Shah of the venture capital companies Haystack and Lightspeed Venture Partners, “VC monies switched from financing your cab trip and burrito delivery to LLMs and generative AI computing.”
Several businesses see risks in relying on potentially subsidized AI models that they do not control and pay for only on a per-use basis.
“When I speak to my AI colleagues at startup conferences, I urge them not to rely only on OpenAI, ChatGPT, or any other huge language models,” said Suman Kanuganti, creator of personal.ai, a chatbot now in beta phase. “Since businesses change, aren’t they all controlled by giant tech companies? If they take away your access, you’re out.”
Companies such as enterprise software firm Conversica are exploring how to use the technology through Microsoft’s Azure cloud service at its currently discounted price.
Although Conversica CEO Jim Kaskade declined to say how much the company is paying, he acknowledged that the discounted cost is welcome while the company explores how language models can be used effectively.
“If they were actually attempting to break even, they’d charge a lot more,” Kaskade said.
How it could change
It is unclear whether AI computation will remain expensive as the industry matures. Companies building the foundation models, chipmakers, and startups all see business opportunities in lowering the cost of running AI software.
Nvidia, which controls over 95% of the AI chip market, continues to develop more powerful versions designed specifically for machine learning, although improvements in overall chip power across the industry have slowed in recent years.
Still, Nvidia CEO Jensen Huang predicts that in 10 years AI will be “a million times” more efficient thanks to improvements not only in chips but also in software and other computer components.
“In its greatest days, Moore’s Law would have produced 100x in a decade,” Huang remarked on an earnings call last month. “By developing new processors, systems, interconnects, frameworks, and algorithms, as well as collaborating with data scientists and AI researchers on new models, we’ve made huge language model processing a million times quicker throughout that whole range.”
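To put the two growth figures in Huang’s remark on a common scale, the sketch below converts each ten-year gain into an equivalent doubling time. It assumes smooth exponential growth over the decade, which is a simplification that Huang did not state.

```python
import math


def doubling_time_years(total_gain: float, years: float = 10.0) -> float:
    """Years per doubling needed to reach `total_gain` over `years`.

    Assumes steady exponential growth: total_gain = 2 ** (years / doubling_time).
    """
    return years / math.log2(total_gain)


moore = doubling_time_years(100)         # "100x in a decade"
huang = doubling_time_years(1_000_000)   # "a million times"

print(f"100x per decade: doubling about every {moore * 12:.0f} months")
print(f"1,000,000x per decade: doubling about every {huang * 12:.0f} months")
```

On this reading, a hundredfold gain per decade means doubling roughly every year and a half, while a millionfold gain means doubling roughly every six months.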
Some startups have focused on the high cost of AI as a business opportunity.
“No one was suggesting that you construct something specifically designed for inference. How would it look?” said Sid Sheth, founder of D-Matrix, a startup developing a system to save money on inference by doing more processing in memory rather than on a GPU.
“Now, most inference is done on GPUs, namely NVIDIA GPUs. They purchase the expensive DGX systems sold by NVIDIA. The issue with inference is that if the demand jumps extremely quickly, like it did with ChatGPT, it went from zero to a million users in five days. Your GPU capability cannot keep up with that since it was not designed to do so. It was designed for training and graphics acceleration,” he said.
Hugging Face CEO Delangue believes that instead of the large language models that have received the most attention, many organizations would be better served by focusing on smaller, more specialized models that are cheaper to train and run.
Meanwhile, OpenAI said last month that it is lowering the price businesses pay to use its GPT models. It now charges one-fifth of a cent for about 750 words of output.
Latitude, the maker of AI Dungeon, has taken notice of OpenAI’s lower prices.
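For a sense of scale, the sketch below applies the price quoted above to larger volumes of text. It assumes a flat rate of one-fifth of a cent per 750 words of output and ignores any charge for input text, which the article does not describe.

```python
from decimal import Decimal

# One-fifth of a cent, in dollars, for about 750 words of output.
PRICE_PER_UNIT = Decimal("0.002")
WORDS_PER_UNIT = 750   # roughly 1,000 tokens


def output_cost(words: int) -> Decimal:
    """Approximate dollar cost of generating `words` words of output."""
    return PRICE_PER_UNIT * words / WORDS_PER_UNIT


def words_per_dollar() -> int:
    """How many words of output one dollar buys at the quoted rate."""
    return int(WORDS_PER_UNIT / PRICE_PER_UNIT)


print(f"75,000 words of output: ${output_cost(75_000):.2f}")
print(f"Words of output per dollar: {words_per_dollar():,}")
```

At that rate a novel-length 75,000 words of generated text costs about 20 cents, which helps explain why the price cut drew attention from heavy users.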
“I think it’s fair to say that it’s obviously a major transition in the market that we’re thrilled to see happen, and we’re continually reviewing how we can give the greatest experience to consumers,” a Latitude representative said. “Latitude will continue to analyze all AI algorithms to ensure that we have the greatest game available.”