Amazon's AI Chips Are Finally Winning Over Developers, Here's Why Now
Amazon's custom AI chips are starting to mount a serious challenge to Nvidia's dominance, thanks to major software improvements and backing from leading AI companies like Anthropic. After years of development, Amazon's Trainium and Graviton chips are now attracting smaller developers and startups who previously saw them as too difficult to use. The shift reflects a broader industry trend in which cloud providers build their own silicon to improve economics and reduce dependence on Nvidia.
Why Are Amazon's AI Chips Suddenly More Attractive?
For years, Amazon's Trainium chips struggled to gain adoption outside of Amazon's own operations. The main problem was not the hardware itself, but the software ecosystem. Nvidia's CUDA platform, a proprietary software layer that developers have spent years learning, created a powerful moat around Nvidia's graphics processing units (GPUs). Amazon had to build its own equivalent, called Neuron, and convince developers it was worth the switch.
That calculus changed recently. Nvidia's severe chip shortage has made Amazon's pitch more compelling. Sales representatives from Amazon have been telling startups that Nvidia GPU availability is limited, while Amazon has signaled it has more Trainium capacity available and is willing to negotiate on price. At the same time, Amazon has made dramatic improvements to Trainium's documentation and support, removing what many developers saw as the primary barrier to adoption.
"Our response has always been the lack of software support being a barrier. That's the thing that changed in the last couple months. That barrier has been removed," said Daniel Svonava, CEO of Superlinked, an infrastructure firm that helps companies run AI models on rented infrastructure.
Amazon has backed this push with concrete support. Superlinked, for example, received $200,000 worth of AWS credits to test Trainium chips. Other developers report that Amazon has improved how Trainium works with popular open-source tools and models, making the transition less painful.
How Is Anthropic Driving Amazon's Chip Strategy?
The real turning point came through Amazon's partnership with Anthropic, one of the world's leading artificial intelligence (AI) companies. In 2023, Amazon announced that Anthropic would use Trainium and Inferentia chips to train and run its large language models (LLMs), which are AI systems trained on vast amounts of text to generate human-like responses. By 2024, Amazon had committed $8 billion to Anthropic, cementing the partnership.
This was not a passive investment. Amazon and Anthropic engineers worked closely together to optimize Trainium for Anthropic's specific needs. The collaboration went deep, involving software improvements that could benefit other customers as well. Some of this work focused on helping Trainium perform more processes simultaneously, making models cheaper and faster to run.
"The collaboration between Anthropic and AWS on the NKI has been very, very deep," explained Carlos Escapa, a former AWS executive who worked on selling Anthropic models, referring to Amazon's Neuron Kernel Interface software that lets developers fine-tune how models run on Trainium chips.
By the end of 2024, Amazon had launched its second-generation Trainium chip and announced Project Rainier, a large Trainium cluster dedicated to Anthropic. Inside Amazon, Trainium use began picking up, with Amazon's own Nova large language model starting to use Trainium in 2024 and ramping up since then.
What Are the Business Implications for Amazon?
Amazon's custom silicon business, including Trainium and Graviton chips, has reached a more than $20 billion annualized run rate, according to CEO Andy Jassy. If measured as a standalone chip seller, that would translate to roughly $50 billion in revenue. This figure reflects revenue from customers using Trainium and Graviton directly through Amazon's EC2 service, which is Amazon's core cloud computing offering.
The economics matter because Amazon is betting that Trainium can improve the profitability of its AI cloud business. As Jassy noted in a January interview, Amazon plans to continue buying Nvidia chips, but the company would be strategically disadvantaged without its own custom silicon for large-scale inference workloads, which are the computations needed to run trained AI models in production.
Amazon's Bedrock service, which gives customers access to AI models from Anthropic and other providers, initially relied on GPUs. However, as Trainium software matured, more Bedrock workloads moved to Trainium chips. Amazon now says Trainium runs the majority of Bedrock inference across more than 125,000 customers. The company is also planning to train its largest internal models on Trainium going forward.
Steps to Evaluate Amazon's AI Chips for Your Workload
- Assess Your Inference Needs: If your primary use case involves running trained AI models in production rather than training new models from scratch, Trainium may offer better cost and performance than Nvidia GPUs, especially if you have flexibility in your software stack.
- Check Software Compatibility: Review whether your preferred AI frameworks and open-source tools now support Trainium. Amazon has significantly improved documentation and support for popular tools, but compatibility varies by use case.
- Compare Pricing and Availability: Request pricing quotes from Amazon for Trainium capacity and compare them to Nvidia GPU pricing. Given current Nvidia shortages, Amazon may offer more flexible terms and faster availability.
- Test with AWS Credits: Contact Amazon to discuss whether your organization qualifies for AWS credits to test Trainium on a small workload before committing to a larger migration.
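The pricing comparison in the steps above only makes sense once quotes are normalized by throughput. A minimal sketch of that normalization follows, assuming you have an hourly quote for each option and a tokens-per-second figure measured on your own workload. The `Quote` class, the instance labels, and every number here are hypothetical placeholders for illustration, not real Amazon or Nvidia prices or benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    name: str                 # label for the option being compared (hypothetical)
    hourly_price_usd: float   # quoted price per instance-hour (placeholder value)
    tokens_per_second: float  # throughput measured on your own workload


def cost_per_million_tokens(q: Quote) -> float:
    """Normalize an hourly price by measured throughput."""
    tokens_per_hour = q.tokens_per_second * 3600
    return q.hourly_price_usd / tokens_per_hour * 1_000_000


# Placeholder figures only; substitute your own quotes and measurements.
quotes = [
    Quote("gpu-instance", hourly_price_usd=40.0, tokens_per_second=10_000),
    Quote("trainium-instance", hourly_price_usd=25.0, tokens_per_second=8_000),
]

# Print the options from cheapest to most expensive per million tokens.
for q in sorted(quotes, key=cost_per_million_tokens):
    print(f"{q.name}: ${cost_per_million_tokens(q):.2f} per 1M tokens")
```

A lower hourly price does not guarantee a lower cost per token: an option that is cheaper per hour but much slower on your model can still lose, which is why a small test run on your own workload matters before committing.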
What Challenges Remain for Amazon?
Despite the progress, Amazon still faces hurdles. Early in the generative AI boom, some Amazon teams did not use Trainium broadly, and Amazon's Nova large language models were initially trained on Nvidia GPUs rather than Amazon's own chips. This suggests internal skepticism about Trainium's capabilities existed even within Amazon.
External adoption has also been uneven. Developers at Hugging Face, a popular platform for open-source AI models, experienced frustration with Trainium support. Amazon was sometimes slow to support newer models on the platform. However, recent improvements in documentation and support suggest Amazon is addressing these gaps.
The broader context is that Nvidia's software moat remains formidable. Developers have spent years building expertise around CUDA, and switching to a new platform requires retraining and code rewrites. Amazon's success will depend on whether the cost savings and availability advantages of Trainium outweigh the friction of switching, especially as Nvidia's supply constraints ease over time.
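The trade-off between switching friction and ongoing savings can be framed as a simple break-even calculation. The sketch below assumes the friction can be expressed as a one-time migration cost (engineering time for retraining and code rewrites) and that the monthly bills before and after are known. The function name and all figures are hypothetical and only illustrate the reasoning.

```python
from typing import Optional


def breakeven_months(
    migration_cost_usd: float,
    monthly_cost_current_usd: float,
    monthly_cost_new_usd: float,
) -> Optional[float]:
    """Months until cumulative savings cover a one-time migration cost.

    Returns None when the new platform is not cheaper per month,
    because the migration cost would then never be recovered.
    """
    monthly_savings = monthly_cost_current_usd - monthly_cost_new_usd
    if monthly_savings <= 0:
        return None
    return migration_cost_usd / monthly_savings


# Placeholder figures only: a $120,000 migration effort and a monthly
# bill that drops from $50,000 to $40,000.
months = breakeven_months(120_000, 50_000, 40_000)
print(f"Break-even after {months:.1f} months")
```

If GPU supply loosens and prices fall, the monthly savings shrink and the break-even point moves further out, which is the risk the paragraph above describes.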
For now, Amazon's momentum is real. The combination of Anthropic's endorsement, improved software, and Nvidia's supply constraints has created a window of opportunity. Whether Amazon can sustain this momentum and build a durable alternative to Nvidia remains to be seen, but an annualized run rate of more than $20 billion in custom silicon suggests the bet is paying off.