1.3 Million GPUs for AI Training
Jan. 25th, 2025 03:10 pm
“We'll end the year with more than 1.3 million GPUs,” Mark Zuckerberg wrote in a Facebook post on Friday (more details: https://www.facebook.com/zuck/posts/pfbid0219ude255AKkmk4JAueXZeZ9zpjNYio2tBkd7bNmCaRbJ6iJaVVjypUgDg78CNdq5l).
The statement suggests Zuckerberg plans to more than double the company’s stock of enterprise GPUs, which excel at training ever-larger AI models. A year ago, Meta’s CEO said his company was on track to acquire the equivalent of 600,000 Nvidia H100 GPUs by the end of 2024.
The goal is to power cutting-edge AI assistants through the company’s upcoming Llama 4 model, which will be released later this year to compete with OpenAI’s ChatGPT and Google’s Gemini. “In 2025, I expect Meta AI will be the leading assistant serving more than 1 billion people and Llama 4 will become the leading state of the art model,” Zuckerberg said.
He also envisions developing an “AI engineer” that’ll be smart enough to contribute computer code to Meta’s R&D efforts (the company recently laid off 3,600 employees). To power the AI programs, Zuckerberg is preparing to build an exceptionally large data center, one so vast it “would cover a significant part of Manhattan,” he said.
The data center will also use a massive amount of power—over 2 gigawatts, or 2,000 megawatts. For comparison, the fastest US supercomputer runs on around 30 megawatts of electricity.
According to Zuckerberg, the upcoming data center will go online later this year starting with 1 gigawatt of compute power. His company also plans to invest up to $65 billion in capital expenditures this year to build the data center.
“This is a massive effort, and over the coming years it will drive our core products and business, unlock historic innovation, and extend American technology leadership,” he added.
This appears to be Zuckerberg’s response to the competition. Elon Musk has been building his own AI supercomputer in Memphis, Tennessee, which already has 200,000 GPUs and is set to house at least 1 million of them. Earlier this week, Sam Altman also announced Project Stargate, a $500 billion joint effort with SoftBank and Oracle to build data centers in the US dedicated to AI development.
Earlier this month, Microsoft said it's on track to spend about $80 billion to build out AI-enabled data centers in FY 2025 "to train AI models and deploy AI and cloud-based applications around the world." Last fall, Redmond also signed a deal to restart a nuclear facility on Three Mile Island in Pennsylvania to address the energy demands of generative AI.
Next-generation AI data centers will likely need even more power. According to The New York Times, OpenAI envisions building data centers that use 5 gigawatts of electricity, more than five times what a single nuclear power plant can typically generate.